MarkovChainText

Generates text samples from a loaded Markov chain. To create your own chain, run the MarkovChainBuilder tool:
java -cp pdgf.jar pdgf.util.text.MarkovChainBuilder

The tool is line-oriented: every input line is used separately to train the Markov model.
Tokens are split on whitespace by a java.io.StreamTokenizer.
Example input (two lines):
------------
I have many cheeseburgers.
You have many cheeseburgers.
------------
This produces the following suffix lists (## marks the start and the end of a line):
n-gram: suffix list
{##,##}: {I,You}
{##,I}: {have}
{##,You}: {have}
{I,have}: {many}
{You,have}: {many}
{have,many}: {cheeseburgers}
{many,cheeseburgers}: {##}
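
The table above can be reproduced with a minimal sketch. This is not the MarkovChainBuilder implementation: the class name SuffixTableSketch and the method build are hypothetical, the n-gram order is assumed to be 2 as in the example, and tokens are split with String.split on whitespace instead of a StreamTokenizer. The input lines are given without the trailing period so that the tokens match the table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of pdgf.jar.
public class SuffixTableSketch {
    // Marker used in the table above for the start and the end of a line.
    static final String BOUNDARY = "##";

    // Maps every pair of consecutive tokens to the set of tokens seen after it.
    static Map<String, Set<String>> build(List<String> lines) {
        Map<String, Set<String>> table = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // nothing to learn from an empty line
            }
            // Pad with two start markers and one end marker (order-2 assumption).
            List<String> tokens = new ArrayList<>();
            tokens.add(BOUNDARY);
            tokens.add(BOUNDARY);
            tokens.addAll(Arrays.asList(trimmed.split("\\s+")));
            tokens.add(BOUNDARY);
            // Slide a window of two tokens and record the token that follows it.
            for (int i = 0; i + 2 < tokens.size(); i++) {
                String key = "{" + tokens.get(i) + "," + tokens.get(i + 1) + "}";
                table.computeIfAbsent(key, k -> new LinkedHashSet<>()).add(tokens.get(i + 2));
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "I have many cheeseburgers",
                "You have many cheeseburgers");
        build(lines).forEach((ngram, suffixes) -> System.out.println(ngram + ": " + suffixes));
    }
}
```

Running the sketch yields the same seven n-gram entries as the table, listed in the order in which they are first encountered.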

If the command-line options of MarkovChainBuilder are not sufficient (e.g. you require multiple lines to be treated as one "sample", or you need another tokenizer), you can interface with MarkovChainBuilder and program your own wrapper.
If you need even more fine-grained control, you can interface directly with pdgf.util.text.MarkovChain. In both cases, use the provided toBytes() method to obtain an optimized, serialized binary version of your built chain, which can be loaded by this generator.
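
For the multi-line "sample" case, a simpler alternative to writing a wrapper may be to preprocess the training text so that every sample ends up on one line before it is handed to the line-oriented tool. The following is a minimal sketch of that preprocessing step only; it does not call any PDGF class. The class name JoinSamplesSketch, the method joinSamples, and the convention that samples are separated by blank lines are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical preprocessing helper; not part of pdgf.jar.
public class JoinSamplesSketch {
    // Collapses each block of lines (blocks are separated by blank lines,
    // an assumed convention) into one single-line sample.
    static List<String> joinSamples(List<String> lines) {
        List<String> samples = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                // A blank line closes the current sample, if there is one.
                if (current.length() > 0) {
                    samples.add(current.toString());
                    current.setLength(0);
                }
            } else {
                // Join the lines of one sample with a single space.
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(trimmed);
            }
        }
        // Flush the last sample when the input does not end with a blank line.
        if (current.length() > 0) {
            samples.add(current.toString());
        }
        return samples;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "I have many", "cheeseburgers.", "",
                "You have many", "cheeseburgers.");
        joinSamples(input).forEach(System.out::println);
    }
}
```

Each printed line can then be written to the training file as one sample. A different tokenizer cannot be emulated this way and still requires a wrapper.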

Attributes
Name | Description | Required | Min | Max | Allowed Values
seed | Random number generator seed of this Element. Overrides default seeding behavior. | no | 0 | 1 |
name | (Class)Name of this element. Used to identify plugin Class. Full name is required. Example: com.en.myPluginPackage.myPuginClass | no | 0 | 1 |
id | Identification String of this element. May be used to uniquely identify a field within the children of an Element. | no | 0 | 1 |
Nodes
Name | Description | Required | Min | Max | Allowed Values
max | Content type: Long. Sets the maximal sample length. | yes | 1 | 1 |
file | Content type: String (must be a valid filesystem path). Specifies the file with the Markov chain to load. | yes | 1 | 1 |
min | Content type: Long. Sets the minimal sample length. | yes | 1 | 1 |
2.6_#1486_b758 | 2016-05-24