Bryan Robbins edited this page Jan 16, 2015 · 2 revisions

User Scenario

I am a user of getDataGen.com. I am testing a web form with 4 fields:

  • Name: A person’s name, in English
  • Birthday: A person’s birthday
  • Account number: A 10-digit account number
  • US State

My experience should be something like this:

  • Specify that I need one variable “Name” which is a String of 3-12 English characters
  • Specify that I need one variable “Birthday” which is a Date between 1/1/1950 and 1/1/2000
  • Specify that I need one variable “AccountNum” which is a sequence of digits of length 10
  • Specify that I need one variable “State” whose value is a valid US state abbreviation, randomly selected from all available states
  • Specify that I want 5000 total “rows” of output
  • Click “GENERATE”
  • Receive a link to my file (in the browser) once the generation is complete
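
The kind of output this scenario asks for can be sketched in a few lines of Python. This is only an illustration of the four field constraints, not the getDataGen.com implementation: the function names, the seeded random generator, and the shortened `STATES` list are all assumptions made for the example.

```python
import random
import string
from datetime import date, timedelta

# Hypothetical subset for illustration; a real list would hold every US state abbreviation.
STATES = ["CA", "MD", "NY", "TX", "WA"]


def random_name(rng, min_len=3, max_len=12):
    """Return a string of 3-12 English letters."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(string.ascii_letters) for _ in range(length))


def random_birthday(rng, start=date(1950, 1, 1), end=date(2000, 1, 1)):
    """Return a date between start and end, inclusive."""
    span_days = (end - start).days
    return start + timedelta(days=rng.randint(0, span_days))


def random_account_number(rng, length=10):
    """Return a sequence of decimal digits of the given length."""
    return "".join(rng.choice(string.digits) for _ in range(length))


def generate_rows(count, seed=None):
    """Generate `count` rows, each with one value per variable."""
    rng = random.Random(seed)
    return [
        {
            "Name": random_name(rng),
            "Birthday": random_birthday(rng),
            "AccountNum": random_account_number(rng),
            "State": rng.choice(STATES),
        }
        for _ in range(count)
    ]


rows = generate_rows(5000, seed=1)
```

Each element of `rows` is one “row” of output, keyed by variable name.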

Requirements

  • There are three phases to using this UI: Configuration, Generation, and Acquisition.

    • During Configuration, the user describes the constraints of their data.
    • During Generation, the hosted environment generates data.
    • During Acquisition, the user acquires the generated data.
  • Configuration

    • During configuration, the user describes their desired output data in terms of Variables, Equivalence Classes, and Generation Technique.
    • Possible variable values are defined by one or more equivalence classes. Following from existing theory in software testing, all values from the same equivalence class are considered to be equal for purposes of generating output data.
    • Equivalence classes are defined by a Template and its Parameters.
    • The system shall provide, at a minimum, the following templates (with parameter lists in parentheses below):
      • Literal(Value)
      • RegularExpression(Expression)
          • Custom regular expressions may only generate strings of 200 characters or fewer. If this limit is exceeded, generation fails and stops immediately (to avoid long-running or even infinite expression evaluation).
      • DigitSequence(Length)
    • The system shall allow the user to choose from the following possible generation strategies:
      • All Combinations
      • Pairwise Combinations
    • The system shall allow the user to specify a maximum number of lines to be generated (even though restricting output will prevent coverage goals from being achieved).
    • At any point while using the system, the user should be able to acquire via download a portable, textual representation of the current configuration.
    • Once configuration is complete, the system shall allow the user to indicate (e.g., via a button) that the Generation phase should be triggered.
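
As a rough sketch of the Configuration concepts above, an equivalence class can be modeled as a template name plus its parameters, and a whole configuration can be serialized to text. Everything here is an assumption made for illustration: the `EquivalenceClass` dataclass, the dictionary layout, the key names, and the choice of JSON as the portable format are not specified by the requirements. The RegularExpression template is left out because generating strings from a regular expression needs more than the standard library.

```python
import json
import random
import string
from dataclasses import asdict, dataclass


@dataclass
class EquivalenceClass:
    template: str      # e.g. "Literal" or "DigitSequence"
    parameters: dict   # e.g. {"Value": "MD"} or {"Length": 10}


def generate_value(eq_class, rng):
    """Pick one value from an equivalence class."""
    if eq_class.template == "Literal":
        return eq_class.parameters["Value"]
    if eq_class.template == "DigitSequence":
        length = int(eq_class.parameters["Length"])
        return "".join(rng.choice(string.digits) for _ in range(length))
    raise ValueError(f"Unsupported template: {eq_class.template}")


# Hypothetical configuration layout: variables, strategy, and a line limit.
config = {
    "variables": {
        "AccountNum": [EquivalenceClass("DigitSequence", {"Length": 10})],
        "State": [
            EquivalenceClass("Literal", {"Value": "MD"}),
            EquivalenceClass("Literal", {"Value": "VA"}),
        ],
    },
    "strategy": "All Combinations",
    "max_lines": 5000,
}


def to_text(config):
    """Serialize the configuration to a portable JSON string."""
    serializable = dict(config)
    serializable["variables"] = {
        name: [asdict(eq_class) for eq_class in classes]
        for name, classes in config["variables"].items()
    }
    return json.dumps(serializable, indent=2, sort_keys=True)


config_text = to_text(config)
```

The string in `config_text` is the sort of artifact a user could download at any point during Configuration and load again later.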
  • Generation

    • By default, the system shall select values from each equivalence class at random during generation.
    • The output of generation shall be a List of Data Sets.
    • A single Data Set is a set of (variable, value) pairs, one pair per variable defined. A Data Set can also be represented as a Row, with one value per column. In this form, the order of columns (with one variable per column) must be pre-defined.
    • "All Combinations" generation should produce one output Data Set for every unique combination of equivalence classes across variables. For example, consider a variable A with equivalence classes A1, A2; variable B with B1, B2; and variable C with C1, C2. All Combinations generation over these variables and their equivalence classes produces 8 unique data sets: (A1, B1, C1), (A1, B1, C2), (A1, B2, C1), (A1, B2, C2), (A2, B1, C1), (A2, B1, C2), (A2, B2, C1), and (A2, B2, C2).
    • "All Pairs" generation (the "Pairwise Combinations" strategy) should produce a list of Data Sets in which every pair of equivalence classes drawn from two different variables appears in at least one Data Set. For the same scenario as above, there are 12 pairs to be covered: (A1, B1), (A1, B2), (A2, B1), (A2, B2), (A1, C1), (A1, C2), (A2, C1), (A2, C2), (B1, C1), (B1, C2), (B2, C1), (B2, C2). Because a single Data Set over three variables covers three of these pairs at once, fewer than 12 Data Sets are needed. One valid All Pairs output is: (A1, B1, C1), (A1, B2, C2), (A2, B1, C2), (A2, B2, C1).
    • In the hosted version of the tool, data generation will be limited to 1 million rows per use.
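
The two generation strategies described above can be illustrated with the A/B/C example. The sketch below is not the tool's algorithm: the function names are hypothetical, and the greedy selection used for All Pairs is one simple approach chosen for the example. It enumerates every combination first, so it is only practical for small inputs.

```python
from itertools import combinations, product

variables = {"A": ["A1", "A2"], "B": ["B1", "B2"], "C": ["C1", "C2"]}


def all_combinations(variables):
    """One Data Set per unique combination of equivalence classes."""
    names = list(variables)
    return [
        dict(zip(names, combo))
        for combo in product(*(variables[name] for name in names))
    ]


def required_pairs(variables):
    """Every pair of equivalence classes taken from two different variables."""
    pairs = set()
    for first, second in combinations(variables, 2):
        for x in variables[first]:
            for y in variables[second]:
                pairs.add(frozenset({(first, x), (second, y)}))
    return pairs


def pairs_in(data_set):
    """The pairs that a single Data Set covers."""
    return {
        frozenset({(first, data_set[first]), (second, data_set[second])})
        for first, second in combinations(data_set, 2)
    }


def all_pairs(variables):
    """Greedy sketch: repeatedly take the Data Set covering the most uncovered pairs."""
    candidates = all_combinations(variables)
    uncovered = required_pairs(variables)
    chosen = []
    while uncovered:
        best = max(candidates, key=lambda ds: len(pairs_in(ds) & uncovered))
        chosen.append(best)
        uncovered -= pairs_in(best)
    return chosen


combos = all_combinations(variables)
pairwise = all_pairs(variables)
```

For the example variables, `combos` holds the 8 Data Sets of All Combinations, while `pairwise` holds a smaller list that still covers all 12 required pairs.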
  • Acquisition

    • After generation is complete, the resulting data should be made available to the user via a downloadable file.