senzing-garage/stream-producer

stream-producer

If you are beginning your journey with Senzing, please start with Senzing Quick Start guides.

You are in the Senzing Garage where projects are "tinkered" on. Although this GitHub repository may help you understand an approach to using Senzing, it's not considered to be "production ready" and is not considered to be part of the Senzing product. Heck, it may not even be appropriate for your application of Senzing!

Synopsis

Populate a queue with records to be consumed by stream-loader.

Overview

The stream-producer.py Python script reads files in several formats (JSON, CSV, Parquet, Avro) and publishes their records to a queue (RabbitMQ, Kafka, AWS SQS). The senzing/stream-producer Docker image wraps the script for use in Docker formations (e.g., docker-compose, Kubernetes).
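Input files typically carry one record per line of Senzing-mapped JSON. As a minimal sketch (the file path and attribute values below are illustrative assumptions, not required names beyond DATA_SOURCE and RECORD_ID):

```shell
# Write a small JSON Lines input file; each line is one record.
# DATA_SOURCE and RECORD_ID identify the record to Senzing.
cat > /tmp/test-data.json <<'EOF'
{"DATA_SOURCE": "TEST", "RECORD_ID": "1", "NAME_FULL": "Jane Smith"}
{"DATA_SOURCE": "TEST", "RECORD_ID": "2", "NAME_FULL": "John Doe"}
EOF

# Sanity-check that every line parses as JSON.
while read -r line; do
  printf '%s' "$line" | python3 -c 'import json, sys; json.load(sys.stdin)'
done < /tmp/test-data.json
```

A file like this could then be published with, for example, the json-to-stdout subcommand.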

To see all of the subcommands, run:

$ ./stream-producer.py --help
usage: stream-producer.py [-h]
                          {avro-to-kafka,avro-to-rabbitmq,avro-to-sqs,avro-to-sqs-batch,avro-to-stdout,csv-to-kafka,csv-to-rabbitmq,csv-to-sqs,csv-to-sqs-batch,csv-to-stdout,gzipped-json-to-kafka,gzipped-json-to-rabbitmq,gzipped-json-to-sqs,gzipped-json-to-sqs-batch,gzipped-json-to-stdout,json-to-kafka,json-to-rabbitmq,json-to-sqs,json-to-sqs-batch,json-to-stdout,parquet-to-kafka,parquet-to-rabbitmq,parquet-to-sqs,parquet-to-sqs-batch,parquet-to-stdout,websocket-to-kafka,websocket-to-rabbitmq,websocket-to-sqs,websocket-to-sqs-batch,websocket-to-stdout,sleep,version,docker-acceptance-test}
                          ...

Queue messages. For more information, see https://github.com/Senzing/stream-producer

positional arguments:
  {avro-to-kafka,avro-to-rabbitmq,avro-to-sqs,avro-to-sqs-batch,avro-to-stdout,csv-to-kafka,csv-to-rabbitmq,csv-to-sqs,csv-to-sqs-batch,csv-to-stdout,gzipped-json-to-kafka,gzipped-json-to-rabbitmq,gzipped-json-to-sqs,gzipped-json-to-sqs-batch,gzipped-json-to-stdout,json-to-kafka,json-to-rabbitmq,json-to-sqs,json-to-sqs-batch,json-to-stdout,parquet-to-kafka,parquet-to-rabbitmq,parquet-to-sqs,parquet-to-sqs-batch,parquet-to-stdout,websocket-to-kafka,websocket-to-rabbitmq,websocket-to-sqs,websocket-to-sqs-batch,websocket-to-stdout,sleep,version,docker-acceptance-test}
                              Subcommands (SENZING_SUBCOMMAND):
    avro-to-kafka             Read Avro file and send to Kafka.
    avro-to-rabbitmq          Read Avro file and send to RabbitMQ.
    avro-to-sqs               Read Avro file and send to AWS SQS.
    avro-to-stdout            Read Avro file and print to STDOUT.

    csv-to-kafka              Read CSV file and send to Kafka.
    csv-to-rabbitmq           Read CSV file and send to RabbitMQ.
    csv-to-sqs                Read CSV file and send to AWS SQS.
    csv-to-stdout             Read CSV file and print to STDOUT.

    gzipped-json-to-kafka     Read gzipped JSON file and send to Kafka.
    gzipped-json-to-rabbitmq  Read gzipped JSON file and send to RabbitMQ.
    gzipped-json-to-sqs       Read gzipped JSON file and send to AWS SQS.
    gzipped-json-to-stdout    Read gzipped JSON file and print to STDOUT.

    json-to-kafka             Read JSON file and send to Kafka.
    json-to-rabbitmq          Read JSON file and send to RabbitMQ.
    json-to-sqs               Read JSON file and send to AWS SQS.
    json-to-stdout            Read JSON file and print to STDOUT.

    parquet-to-kafka          Read Parquet file and send to Kafka.
    parquet-to-rabbitmq       Read Parquet file and send to RabbitMQ.
    parquet-to-sqs            Read Parquet file and send to AWS SQS.
    parquet-to-stdout         Read Parquet file and print to STDOUT.

    sleep                     Do nothing but sleep. For Docker testing.
    version                   Print version of program.
    docker-acceptance-test    For Docker acceptance testing.

optional arguments:
  -h, --help                  show this help message and exit

Contents

  1. Demonstrate using Docker
  2. Demonstrate using docker-compose
  3. Demonstrate using Command Line Interface
    1. Prerequisites for CLI
    2. Download
    3. Run command
  4. Configuration
  5. AWS configuration
  6. References

Preamble

At Senzing, we strive to create GitHub documentation in a "don't make me think" style. For the most part, instructions are copy and paste. Whenever thinking is needed, it's marked with a "thinking" icon 🤔. Whenever customization is needed, it's marked with a "pencil" icon ✏️. If the instructions are not clear, please let us know by opening a new Documentation issue describing where we can improve. Now on with the show...

Legend

  1. 🤔 - A "thinker" icon means that a little extra thinking may be required. Perhaps there are some choices to be made. Perhaps it's an optional step.
  2. ✏️ - A "pencil" icon means that the instructions may need modification before performing.
  3. ⚠️ - A "warning" icon means that something tricky is happening, so pay attention.

Expectations

  • Space: This repository and demonstration require 6 GB free disk space.
  • Time: Budget 40 minutes to get the demonstration up-and-running, depending on CPU and network speeds.
  • Background knowledge: This repository assumes a working knowledge of Docker.

Demonstrate using Docker

  1. Run Docker container. This command will show help. Example:

    docker run \
      --rm \
      senzing/stream-producer --help
    
  2. For more examples of use, see Examples of Docker.
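As a sketch of non-help usage, the subcommand and its options can also be passed as environment variables; the help output above shows that the subcommand maps to SENZING_SUBCOMMAND, while SENZING_INPUT_URL and the sample URL below are assumptions for illustration:

```shell
# Hypothetical example: read a JSON file over HTTP
# and print each record to STDOUT.
docker run \
  --rm \
  --env SENZING_SUBCOMMAND=json-to-stdout \
  --env SENZING_INPUT_URL=https://example.com/test-data.json \
  senzing/stream-producer
```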

Demonstrate using docker-compose

  1. Deploy the Backing Services required by the Stream Loader.

  2. Specify a directory to place artifacts in. Example:

    export SENZING_VOLUME=~/my-senzing
    mkdir -p ${SENZING_VOLUME}
    
  3. Download docker-compose.yaml file. Example:

    curl -X GET \
      --output ${SENZING_VOLUME}/docker-compose.yaml \
      https://raw.githubusercontent.com/Senzing/stream-producer/main/docker-compose.yaml
    
  4. Bring up docker-compose stack. Example:

    docker-compose -f ${SENZING_VOLUME}/docker-compose.yaml up
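🤔 Optional: when finished with the demonstration, the stack can be brought down again with standard docker-compose usage:

```shell
docker-compose -f ${SENZING_VOLUME}/docker-compose.yaml down
```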
    

Demonstrate using Command Line Interface

Prerequisites for CLI

🤔 The following tasks need to be complete before proceeding. These are "one-time tasks" which may already have been completed.

  1. Install Python prerequisites. Example:

    pip3 install -r https://raw.githubusercontent.com/Senzing/stream-producer/main/requirements.txt
    
    1. See requirements.txt for list.
      1. Installation hints
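🤔 Optional: if you prefer not to install the prerequisites into the system Python, a virtual environment can be created first (the path below is only an example):

```shell
# Create and activate a virtual environment (example path).
python3 -m venv /tmp/senzing-venv
. /tmp/senzing-venv/bin/activate

# pip3 now resolves inside the virtual environment.
command -v pip3
```

Then run the pip3 install command above inside the activated environment.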

Download

  1. Get a local copy of stream-producer.py. Example:

    1. ✏️ Specify where to download file. Example:

      export SENZING_DOWNLOAD_FILE=~/stream-producer.py
      
    2. Download file. Example:

      curl -X GET \
        --output ${SENZING_DOWNLOAD_FILE} \
        https://raw.githubusercontent.com/Senzing/stream-producer/main/stream-producer.py
      
    3. Make file executable. Example:

      chmod +x ${SENZING_DOWNLOAD_FILE}
      
  2. 🤔 Alternative: The entire git repository can be downloaded by following the instructions at Clone repository.

Run command

  1. Run the command. Example:

    ${SENZING_DOWNLOAD_FILE} --help
    
  2. For more examples of use, see Examples of CLI.

Configuration

Configuration values may be specified by environment variable or command line parameter.
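Each value can typically be supplied either way; for example, the subcommand given on the command line can instead be selected with the SENZING_SUBCOMMAND environment variable shown in the help output. A sketch (the --input-url option and SENZING_INPUT_URL variable are assumptions for illustration):

```shell
# Command line parameters:
./stream-producer.py json-to-stdout --input-url file:///tmp/test-data.json

# Equivalent environment variables:
export SENZING_SUBCOMMAND=json-to-stdout
export SENZING_INPUT_URL=file:///tmp/test-data.json
./stream-producer.py
```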

AWS configuration

stream-producer.py uses the AWS SDK for Python (Boto3) to access AWS services. The library may be configured via environment variables or the ~/.aws/config file.

Example environment variables for configuration:
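A minimal sketch using Boto3's standard environment variables (the values are AWS's documented placeholders, not real credentials):

```shell
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
export AWS_DEFAULT_REGION=us-east-1
```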

References

  1. Boto3 Configuration
  2. Boto3 Credentials
  3. Development
  4. Errors
  5. Examples
  6. Related artifacts:
    1. DockerHub