Skip to content

Latest commit

 

History

History
65 lines (34 loc) · 1.38 KB

README.md

File metadata and controls

65 lines (34 loc) · 1.38 KB

kafka-streaming-greenplum

Streaming data to Pivotal Greenplum through Kafka and Pivotal Kafka Connector

ssh -i MOD.pem [email protected]

Pivotal Greenplum Command Center link:

http://ec2-35-158-228-50.eu-central-1.compute.amazonaws.com:28080/ gpmon / changeme

Apache Zeppelin link:

http://ec2-35-158-228-50.eu-central-1.compute.amazonaws.com:8090/ no authentication required

Start Kafka:

./kafka_start.sh

List existing Kafka Topics:

kafka-topics.sh --list --zookeeper localhost:2181

If crime_topic doesn't exist, create it:

kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic crime_topic

Create table as your Greenplum database:

cd crime

psql < create_crime_table.sql

Run gpkafka (connector process)

./rum_gpkafka.sh

Stream data to kafka from the web source:

./web_stream_data.sh

you can streal from local file if no access to web data:

./local_stream_data.sh

Check progress:

  • at psql selecting count from crimes table

  • check commits done by gpkafka:

    gpkafka check --show-commit-history all crime_cfg.yaml

Run Zeppelin notebooks.

NOTES:

Describe Kafka topic

kafka-topics.sh --describe --zookeeper localhost:2181 --topic crime_topic

Alter number of pertitions for topic:

kafka-topics.sh --alter --zookeeper localhost:2181 --topic crime_topic --partitions 3