More on Random Text Generation

Tags:

I’ve been tracking random text generator for the past a couple of days. The reason will be revealed in a separate post. 

What I have seen so far are two types of algorithms, represented by two classic programs.

The "cut-up" method 

book coverThe idea is simple — using a Markov Chain to model n-gram frequency of a text, and then generate text based on the transition matrix. The idea goes back to Shannon’s 1948 classic Mathematical Theory of Communication, and perhaps even earlier.

In terms of implementations, the classic example is the Code from The Practice of Programming. The AWK implementation is the easiest to understand.

More of these programs can be found at here and here. An on-line service can be found here. A new breed (such as GoogleTalk) is to use search engines results instead of building a transition matrix.

The Dada Engine

A big problem with the Markov models is that there is no guarantee that the output will be grammatical. The outcome is in a non-insignificant way determined by the size of the input text. Often one finds the paradoxical result that the larger the text input, the less grammatical the output will be. On the other hand, when the input is too short (and higher-order Markovian models are used) the model will spit the original text back. These are all inherent problems with the model-less Markovian approach.

In contrast, one could start with a grammar — often context-free grammar — and generate texts from there. Building a grammar is no trival matter, however. The Dada Engine by Andrew Blucher, was power behind the Postmodernism Generator I blogged a couple of days ago. NovelWriting is a Python re-implementation of the Dada Engine. An example of the grammar is here.

The texts generated, while grammatically sound, are incoherent, because of the context-free nature of the grammar. However, they may not  be meaningless. It depends on what is built into the grammar.

Leave a Reply

If the above Image does not contain text, use this secure code: PK4jjWW