Log Generator Driver

Log Generator Driver command line interface

bin/generator log --help
Log Generator
Usage: generator [options]

  -e <value> | --eventsPerSec <value>
        number of log events to generate per second; use this to throttle the generator
  -o <value> | --outputFormat <value>
        format of the string to write to the file, defaults to: 'tsv'
        where,
            text - string formatted by tabs in between columns
            avro - string formatted using avro serialization
  -d <value> | --destination <value>
        destination where the generator writes data to, defaults to: 'file'
        where,
            file - outputs directly to flat files
            kafka - output to the specified kafka topic
            kinesis - output to the specified kinesis stream
  -r <value> | --fileRollSize <value>
        size of the file to roll in bytes, defaults to: Int.MaxValue (don't roll files)
  -p <value> | --filePath <value>
        path of the file where the data should be generated, defaults to: '/tmp'
  -t <value> | --totalEvents <value>
        total number of events to generate, default: 1000
  -b <value> | --flushBatch <value>
        number of events to flush to file at a single time, defaults to: 10000
  --kafkaBrokerList <value>
        list of kafka brokers to write to, defaults to: 'localhost:9092'
  --kafkaTopicName <value>
        name of the kafka topic to write data to, defaults to: 'logs'
  --kinesisStreamName <value>
        name of the kinesis stream to write data to, defaults to: 'logevents'
  --kinesisShardCount <value>
        number of kinesis shards to create, defaults to: '1'
  --ipSessionCount <value>
        number of times an IP can appear in a session, defaults to: '25'
  --ipSessionLength <value>
        size of the session, defaults to: '50'
  --threadsCount <value>
        number of threads to use for write and read operations, defaults to: 1
  --threadPoolSize <value>
        size of the thread pool, defaults to: 10
  --awsAccessKey <value>
        AWS access key (required for kinesis)
  --awsSecretKey <value>
        AWS secret key (required for kinesis)
  --awsEndPoint <value>
        AWS service end point to connect to (required for kinesis)
  --loggingLevel <value>
        Logging level to set, defaults to: INFO
  --help
        prints this usage text
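
These options can be combined freely. For instance, a throttled run that writes tab-separated events in smaller flush batches might look like the sketch below (the flag names come from the usage text above; the values are only illustrative):

     bin/generator log --totalEvents 50000 --eventsPerSec 500 --flushBatch 1000 --outputFormat text --filePath /tmp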

Examples:

  1. To generate 100000 events to /tmp

     bin/generator log --totalEvents 100000 --filePath /tmp
    
  2. To generate 100M events to /tmp and roll the file every 64 MB (64 × 1024 × 1024 = 67108864 bytes)

     bin/generator log --totalEvents 100000000 --filePath /tmp --fileRollSize 67108864
    
  3. To generate 100M events to /tmp/ concurrently using 5 threads

     bin/generator log --totalEvents 100000000 --filePath /tmp --fileRollSize 67108864 --threadsCount 5
    
  4. To generate 100M events to /tmp/ concurrently using 5 threads in 'avro' format

     bin/generator log --totalEvents 100000000 --filePath /tmp --fileRollSize 67108864 --threadsCount 5 --outputFormat avro
    
  5. Writing directly to Kafka (a quick way to inspect the topic contents is shown after these examples)

     bin/generator log --totalEvents 1000 --destination kafka --kafkaBrokerList "localhost:9092" --kafkaTopicName logs --threadsCount 5
    
  6. Writing to Kinesis

     bin/generator log --totalEvents 1000 --eventsPerSec 100 --flushBatch 500 --destination kinesis --kinesisStreamName logevents --kinesisShardCount 2 --awsAccessKey [ACCESS_KEY] --awsSecretKey [SECRET_KEY] --awsEndPoint [ENDPOINT_URL_KINESIS]
    
  7. Checking the Kinesis stream using the AWS command line tool (decoding the returned records is covered after these examples)

    • Describe the stream and get the shard-id

        aws kinesis describe-stream --stream-name logevents
      
    • Get the shard-iterator

        aws kinesis get-shard-iterator --shard-id shardId-000000000000 --shard-iterator-type TRIM_HORIZON --stream-name logevents
      
    • Get the records

        SHARD_ITERATOR=$(aws kinesis get-shard-iterator --shard-id shardId-000000000000 --shard-iterator-type TRIM_HORIZON --stream-name logevents --query 'ShardIterator' --output text)
        aws kinesis get-records --shard-iterator $SHARD_ITERATOR --debug
      
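To verify the data written to Kafka in example 5, the console consumer that ships with a standard Kafka distribution can be pointed at the topic. This is only a sketch: it assumes a recent Kafka install with its scripts on the PATH, and the consumer tool belongs to Kafka, not to this generator.

     kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic logs --from-beginning --max-messages 10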
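The records returned by get-records in example 7 carry their payload base64-encoded in the Data field. A small sketch for extracting and decoding the first record with the same shard iterator (assumes the base64 utility from GNU coreutils):

     aws kinesis get-records --shard-iterator $SHARD_ITERATOR --query 'Records[0].Data' --output text | base64 --decode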