Probability

Chooses and runs one of the specified sub-generators according to the given probabilities. Requires at least one probability element, each with a value attribute and a sub-generator.

Attributes
seed
    Random number generator seed of this element. Overrides the default seeding behavior.
    Required: no, Min: 0, Max: 1
name
    (Class) name of this element. Used to identify the plugin class. The full name is required. Example: com.en.myPluginPackage.myPluginClass
    Required: no, Min: 0, Max: 1
id
    Identification string of this element. May be used to uniquely identify a field within the children of an element.
    Required: no, Min: 0, Max: 1
Nodes
chunkSize
    Content type: Long
    Sets a chunk size (used in combination with disableRng). The default chunk size is the size of the whole table. More details can be found in the description of disableRng.
    Required: no, Min: 0, Max: 1
probability
    Content type: Sub-generator
    Needs an attribute "value" that indicates the likelihood of running the specified sub-generator. The likelihoods (probabilities) do not need to add up to 1.0 or 100 and can be any value. The final likelihood is defined as: probabilityValue / sum(probabilityValues)
    Required: yes, Min: 1, Max: 1
sameChoiceAs
    Content type: Empty
    Requires the attributes "field" and "generatorByID" (referring to a generator in the same table) to pick the row number from. If specified, this Probability does not choose a random row but uses the same row as the referenced generator.
    Required: no, Min: 0, Max: 1
disableRng
    Content type: Boolean
    If this is true, random mixing of values is disabled. The values are generated in the order specified in the XML (in relation to the current ID).
    Example 1:
    chunkSize is unset (default: whole table), the table has 10000 lines, and three sub-generators are defined with probabilities of 0.5, 0.3, and 0.2, respectively. Setting disableRng to true then generates the first 0.5 * 10000 lines with generator[0], the next 0.3 * 10000 lines with generator[1], and the last 0.2 * 10000 lines with generator[2].
    Example 2:
    chunkSize is 100; the remaining numbers are identical to Example 1:
    0.5 * 100 lines with generator[0], 0.3 * 100 lines with generator[1], 0.2 * 100 lines with generator[2], 0.5 * 100 lines with generator[0], ... (until the table is filled).
    Required: no, Min: 0, Max: 1
    Allowed values: true, false, 0, 1
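
The normalization rule for the probability node (probabilityValue / sum(probabilityValues)) can be illustrated with a few lines of Python. This is only a sketch of the arithmetic, not PDGF code; the function name final_likelihoods and the sample values 1, 3, and 4 are made up for illustration.

```python
from fractions import Fraction


def final_likelihoods(values):
    """Normalize raw probability values: value / sum(values)."""
    # Fractions keep the arithmetic exact (no floating-point drift).
    weights = [Fraction(str(v)) for v in values]
    total = sum(weights)
    if total <= 0:
        raise ValueError("probability values must sum to a positive number")
    return [w / total for w in weights]


# Hypothetical raw values that sum to neither 1.0 nor 100.
for raw, share in zip([1, 3, 4], final_likelihoods([1, 3, 4])):
    print(raw, float(share))
```

With these values the three sub-generators would be chosen with final likelihoods of 1/8, 3/8, and 1/2.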

Examples

  1. Simple Probability

    Generate 26 grades (numbers 1 to 6) according to the given probabilities in random order

    Schema config for Simple Probability
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <!--
    /*******************************************************************************
     * Copyright (c) 2013, bankmark and/or its affiliates. All rights reserved.
     * bankmark UG PROPRIETARY/CONFIDENTIAL. Use is subject to license terms.
     ******************************************************************************/
    --><schema xmlns:doc="http://bankmark.de/pdgf/doc" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" name="demo" xsi:noNamespaceSchemaLocation="structure/pdgfSchema.xsd">
    
      <seed>23</seed>
    
      <property name="SF" type="double">1</property>
    
      <table name="students">
        <size>26</size>
    
        <!--Simple Probability-->
          <!--Generate 26 grades (numbers 1 to 6) according to the given probabilities in random order-->
          <field name="grades_random" size="" type="INTEGER">
            <gen_Probability>
              <!-- 1 -->
              <probability value="0.10">
                <gen_StaticValue>
                  <value>1</value>
                </gen_StaticValue>
              </probability>
    
              <!-- 2 -->
              <probability value="0.15">
                <gen_StaticValue>
                  <value>2</value>
                </gen_StaticValue>
              </probability>
    
              <!-- 3 -->
              <probability value="0.25">
                <gen_StaticValue>
                  <value>3</value>
                </gen_StaticValue>
              </probability>
    
              <!-- 4 -->
              <probability value="0.25">
                <gen_StaticValue>
                  <value>4</value>
                </gen_StaticValue>
              </probability>
    
              <!-- 5 -->
              <probability value="0.15">
                <gen_StaticValue>
                  <value>5</value>
                </gen_StaticValue>
              </probability>
    
              <!-- 6 -->
              <probability value="0.10">
                <gen_StaticValue>
                  <value>6</value>
                </gen_StaticValue>
              </probability>
            </gen_Probability>
          </field>
          </table>
    </schema>
    
    Output for Simple Probability
    3
    6
    6
    3
    2
    3
    4
    1
    4
    5
    2
    1
    1
    2
    2
    2
    3
    1
    2
    4
    3
    4
    5
    5
    5
    3
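
As a rough illustration of what the Simple Probability example does conceptually, the following Python sketch draws 26 grades with the same weights using the standard library. It is not PDGF's implementation and uses a different random number generator, so it will not reproduce the output listed above even with the same seed value; the names GRADES, WEIGHTS, and random_grades are hypothetical.

```python
import random

GRADES = [1, 2, 3, 4, 5, 6]
WEIGHTS = [0.10, 0.15, 0.25, 0.25, 0.15, 0.10]  # values from the schema above


def random_grades(rows, seed):
    """Draw `rows` grades at random, weighted by WEIGHTS."""
    rng = random.Random(seed)  # seeded, so repeated runs give the same list
    # random.choices treats the weights as relative, so they need not sum to 1.
    return rng.choices(GRADES, weights=WEIGHTS, k=rows)


grades = random_grades(26, seed=23)
print("\n".join(str(g) for g in grades))
```

Because the choice is random per row, the counts per grade in a small table of 26 rows only approximate the configured probabilities.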
  2. Ordered Probability

    Generate 26 grades (numbers 1 to 6) according to the given probabilities in the given order. (6, 5, ..., 1)

    Schema config for Ordered Probability
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <!--
    /*******************************************************************************
     * Copyright (c) 2013, bankmark and/or its affiliates. All rights reserved.
     * bankmark UG PROPRIETARY/CONFIDENTIAL. Use is subject to license terms.
     ******************************************************************************/
    --><schema xmlns:doc="http://bankmark.de/pdgf/doc" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" name="demo" xsi:noNamespaceSchemaLocation="structure/pdgfSchema.xsd">
    
      <seed>23</seed>
    
      <property name="SF" type="double">1</property>
    
      <table name="students">
        <size>26</size>
    
        <!--Ordered Probability-->
          <!--Generate 26 grades (numbers 1 to 6) according to the given probabilities in the given order. (6, 5, ..., 1)-->
          <field name="grades_ordered" size="" type="INTEGER">
            <gen_Probability>
              <disableRng>true</disableRng>
    
              <!-- 6 -->
              <probability value="0.10">
                <gen_StaticValue>
                  <value>6</value>
                </gen_StaticValue>
              </probability>
    
              <!-- 5 -->
              <probability value="0.15">
                <gen_StaticValue>
                  <value>5</value>
                </gen_StaticValue>
              </probability>
    
              <!-- 4 -->
              <probability value="0.25">
                <gen_StaticValue>
                  <value>4</value>
                </gen_StaticValue>
              </probability>
    
              <!-- 3 -->
              <probability value="0.25">
                <gen_StaticValue>
                  <value>3</value>
                </gen_StaticValue>
              </probability>
    
    
              <!-- 2 -->
              <probability value="0.15">
                <gen_StaticValue>
                  <value>2</value>
                </gen_StaticValue>
              </probability>
    
              <!-- 1 -->
              <probability value="0.10">
                <gen_StaticValue>
                  <value>1</value>
                </gen_StaticValue>
              </probability>
            </gen_Probability>
          </field>
          </table>
    </schema>
    
    Output for Ordered Probability
    6
    6
    6
    5
    5
    5
    5
    4
    4
    4
    4
    4
    4
    3
    3
    3
    3
    3
    3
    3
    2
    2
    2
    1
    1
    1
  3. Ordered and Chunked Probability

    We want to simulate a die with six faces, numbered one through six. Every face is equally likely, with a probability of 1/6 (0.1666666667). Since disableRng is used with a chunk size of 12, every face occurs twice per chunk (12 * 1/6), in the specified order (6, 5, 4, ..., 1). When the last item (1) is reached, the sequence starts again with the first item (6) until the number of elements given by the table size (26) is reached.

    Schema config for Ordered and Chunked Probability
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <!--
    /*******************************************************************************
     * Copyright (c) 2013, bankmark and/or its affiliates. All rights reserved.
     * bankmark UG PROPRIETARY/CONFIDENTIAL. Use is subject to license terms.
     ******************************************************************************/
    --><schema xmlns:doc="http://bankmark.de/pdgf/doc" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" name="demo" xsi:noNamespaceSchemaLocation="structure/pdgfSchema.xsd">
    
      <seed>23</seed>
    
      <property name="SF" type="double">1</property>
    
      <table name="students">
        <size>26</size>
    
        <!--Ordered and Chunked Probability-->
          <!--
            We want to simulate a die with six faces, numbered one through six. Every face is equally likely, with a
            probability of 1/6 (0.1666666667). Since disableRng is used with a chunk size of 12, every face occurs twice
            per chunk (12 * 1/6), in the specified order (6, 5, 4, ..., 1). When the last item (1) is reached, the sequence
            starts again with the first item (6) until the number of elements given by the table size (26) is reached.
          -->
          <field name="dice_roll_chunk" size="" type="INTEGER">
            <gen_Probability>
              <disableRng>true</disableRng>
              <chunkSize>12</chunkSize>
    
              <!-- 6 -->
              <probability value="0.1666666667">
                <gen_StaticValue>
                  <value>6</value>
                </gen_StaticValue>
              </probability>
    
              <!-- 5 -->
              <probability value="0.1666666667">
                <gen_StaticValue>
                  <value>5</value>
                </gen_StaticValue>
              </probability>
    
              <!-- 4 -->
              <probability value="0.1666666667">
                <gen_StaticValue>
                  <value>4</value>
                </gen_StaticValue>
              </probability>
    
              <!-- 3 -->
              <probability value="0.1666666667">
                <gen_StaticValue>
                  <value>3</value>
                </gen_StaticValue>
              </probability>
    
              <!-- 2 -->
              <probability value="0.1666666667">
                <gen_StaticValue>
                  <value>2</value>
                </gen_StaticValue>
              </probability>
    
              <!-- 1 -->
              <probability value="0.1666666667">
                <gen_StaticValue>
                  <value>1</value>
                </gen_StaticValue>
              </probability>
            </gen_Probability>
          </field>
          </table>
    </schema>
    
    Output for Ordered and Chunked Probability
    6
    6
    5
    5
    4
    4
    3
    3
    2
    2
    1
    1
    6
    6
    5
    5
    4
    4
    3
    3
    2
    2
    1
    1
    6
    6
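
The ordered behavior of the two disableRng examples above can be sketched in Python as follows. This is an illustration, not PDGF's code. In particular, the rule for rows that do not divide evenly (cumulative boundaries rounded half up to the nearest row) is an assumption chosen because it reproduces both sample outputs shown above; the actual rule used by PDGF is not stated in this document. The function name ordered_rows is hypothetical.

```python
from fractions import Fraction
from math import floor


def ordered_rows(values, weights, rows, chunk_size=None):
    """Assign values to rows in the given order, chunk by chunk."""
    # Default chunk size is the whole table, as described for chunkSize.
    chunk = rows if chunk_size is None else chunk_size
    exact = [Fraction(str(w)) for w in weights]
    total = sum(exact)

    # End position of each sub-generator inside one chunk.
    # ASSUMPTION: boundaries are rounded half up to a whole row.
    bounds = []
    running = Fraction(0)
    for w in exact:
        running += w
        bounds.append(floor(running / total * chunk + Fraction(1, 2)))

    result = []
    for row in range(rows):
        pos = row % chunk  # position of this row inside its chunk
        index = next(i for i, end in enumerate(bounds) if pos < end)
        result.append(values[index])
    return result


faces = [6, 5, 4, 3, 2, 1]
grades = ordered_rows(faces, [0.10, 0.15, 0.25, 0.25, 0.15, 0.10], rows=26)
rolls = ordered_rows(faces, [0.1666666667] * 6, rows=26, chunk_size=12)
print(grades)
print(rolls)
```

Under the stated rounding assumption, grades matches the Ordered Probability output and rolls matches the Ordered and Chunked Probability output, including the partial third chunk (6, 6).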
  4. Conditional Probability

    Generates gender (female/male) and samples from the appropriate first name dictionary to generate the first name based on the gender.

    Schema config for Conditional Probability
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <!--
    /*******************************************************************************
     * Copyright (c) 2013, bankmark and/or its affiliates. All rights reserved.
     * bankmark UG PROPRIETARY/CONFIDENTIAL. Use is subject to license terms.
     ******************************************************************************/
    --><schema xmlns:doc="http://bankmark.de/pdgf/doc" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" name="demo" xsi:noNamespaceSchemaLocation="structure/pdgfSchema.xsd">
    
      <seed>23</seed>
    
      <property name="SF" type="double">1</property>
    
      <table name="students">
        <size>26</size>
    
        <!--Conditional Probability-->
          <!--
            Generates gender (female/male) and samples from the appropriate first name dictionary to generate the first name
            based on the gender.
          -->
          <field name="gender" size="" type="VARCHAR">
            <gen_Probability id="gender_gen">
    
              <probability value="0.5">
                <gen_StaticValue>
                  <value>female</value>
                </gen_StaticValue>
              </probability>
    
              <probability value="0.5">
                <gen_StaticValue>
                  <value>male</value>
                </gen_StaticValue>
              </probability>
    
            </gen_Probability>
          </field>
    
          <field name="first_name" type="VARCHAR">
            <gen_Probability>
              <sameChoiceAs field="gender" generatorByID="gender_gen"/>
    
              <probability value="0.5">
                <gen_DictList>
                  <file>dicts/female.dict</file>
                </gen_DictList>
              </probability>
    
              <probability value="0.5">
                <gen_DictList>
                  <file>dicts/male.dict</file>
                </gen_DictList>
              </probability>
            </gen_Probability>
          </field>
          </table>
    </schema>
    
    Output for Conditional Probability
    male|Christoph
    male|Björn
    female|Leyla
    female|Kate
    male|Michel
    male|Ruben
    female|Melinda
    female|Malea
    male|Ilias
    male|Enrico
    female|Marika
    female|Sena
    male|Quinn
    male|Adam
    female|Marika
    male|Noel
    female|Miley
    male|Nick
    male|Hennes
    female|Gina
    male|Lucien
    female|Angelie
    female|Tamia
    female|Enya
    female|Helin
    male|Stanley
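
The idea behind sameChoiceAs can be sketched in Python: the choice is made once for the gender field, and the first_name field reuses that choice instead of drawing its own. This is a conceptual sketch only, not PDGF's implementation. The short name lists are stand-ins for dicts/female.dict and dicts/male.dict (whose full contents are not shown here), and the helper names pick_index and make_row are hypothetical.

```python
import random

GENDERS = ["female", "male"]
# Stand-ins for the dictionary files; names taken from the sample output.
FEMALE_NAMES = ["Leyla", "Kate", "Melinda"]
MALE_NAMES = ["Christoph", "Ruben", "Adam"]
NAME_LISTS = [FEMALE_NAMES, MALE_NAMES]


def pick_index(rng, weights):
    """Pick the index of one sub-generator according to its weight."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def make_row(rng):
    """Build one (gender, first_name) row with a shared choice."""
    choice = pick_index(rng, [0.5, 0.5])        # made once, for "gender"
    gender = GENDERS[choice]
    first_name = rng.choice(NAME_LISTS[choice])  # same choice reused
    return gender, first_name


rng = random.Random(23)
table = [make_row(rng) for _ in range(26)]
for gender, first_name in table:
    print(f"{gender}|{first_name}")
```

Without the shared choice, the two fields would pick independently and a row such as male|Leyla could occur.
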
2.6_#1486_b758 | 2016-05-24