[DEPRECATED] This repository has moved to https://github.com/kimaina/openmrs-elt
- The motivation of this project is to provide the ability to process data in real time from various sources such as OpenMRS, EID, etc.
Make sure you have the latest Docker and Docker Compose:
- Install Docker.
- Install Docker Compose.
- Clone this repository.
You only have to run three commands to get the entire cluster running. Open up your terminal and run:
```shell
# this will start 7 containers (mysql, kafka, connect (dbz), openmrs, zookeeper, portainer and cAdvisor)
export DEBEZIUM_VERSION=0.8
docker-compose -f docker-compose.yaml up

# Start the MySQL connector (VERY IMPORTANT)
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" \
  http://localhost:8083/connectors/ -d @register-mysql.json
```
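The `register-mysql.json` file posted above configures the Debezium MySQL connector. As a rough sketch of what such a payload typically contains (the connector name, credentials, and server id below are assumptions based on Debezium 0.8 tutorial defaults; the repo's `register-mysql.json` is authoritative), built here in Python:

```python
import json

# Sketch of a Debezium 0.8 MySQL connector registration payload.
# Connector name, credentials, and server id are assumptions for illustration;
# consult the repo's register-mysql.json for the real values.
payload = {
    "name": "openmrs-connector",  # hypothetical connector name
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "debezium",   # assumed credentials
        "database.password": "dbz",
        "database.server.id": "184054",
        # logical server name; becomes the topic prefix (dbserver1.openmrs.*)
        "database.server.name": "dbserver1",
        "database.whitelist": "openmrs",
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "database.history.kafka.topic": "schema-changes.openmrs",
    },
}

# This JSON is what curl sends to http://localhost:8083/connectors/
print(json.dumps(payload, indent=2))
```

Note how `database.server.name` and `database.history.kafka.topic` line up with the topic names consumed later in this README.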
# Real-time streaming and processing
Please use either Spark (Scala), PySpark, or KSQL. For this project I'll demo using KSQL.
To avoid containers crashing (i.e. exit code 137, out of memory), increase your Docker VM's memory to more than 8 GB and its CPUs to more than 4 cores, as shown in the figure below.
If everything runs as expected, you should see all of these containers running:
You can access Portainer here: http://localhost:9000
The OpenMRS application will eventually be accessible at http://localhost:8080/openmrs. Credentials for the shipped demo data:
- Username: admin
- Password: Admin123
```shell
conda install pyspark=2.4.5
jupyter notebook encounter_job.ipynb
```
- Master Node: http://localhost:4040/
- Worker Node 1: http://localhost:8100/
- Worker Node 2: http://localhost:8200/
- Worker Node 3: http://localhost:8300/
- Worker Node 4: http://localhost:8400/
Based on: https://github.com/big-data-europe/docker-spark/blob/master/README.md
For Spark on Kubernetes deployment, see: https://github.com/big-data-europe/docker-spark/blob/master/README.md
```shell
# open a MySQL client against the openmrs database
docker-compose -f docker-compose.yaml exec mysql bash -c 'mysql -u $MYSQL_USER -p$MYSQL_PASSWORD openmrs'
```
```shell
docker-compose -f docker-compose.yaml exec kafka /kafka/bin/kafka-console-consumer.sh \
    --bootstrap-server kafka:9092 \
    --from-beginning \
    --property print.key=true \
    --topic schema-changes.openmrs
```
```shell
# list the registered connectors
curl -H "Accept:application/json" localhost:8083/connectors/

# shut down the cluster
docker-compose -f docker-compose.yaml down
```
- All you have to do is change the topic to `dbserver1.openmrs.<table_name>`, e.g. `--topic dbserver1.openmrs.obs`:
```shell
docker-compose -f docker-compose.yaml exec kafka /kafka/bin/kafka-console-consumer.sh \
    --bootstrap-server kafka:9092 \
    --from-beginning \
    --property print.key=true \
    --topic dbserver1.openmrs.obs
```
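Each message printed by the consumer is a Debezium change-event envelope with `before`, `after`, and `op` fields. As a minimal sketch of how a downstream job might unpack one (the sample event below is fabricated for illustration; it is not real OpenMRS data, and real events carry many more fields):

```python
import json

# Fabricated Debezium change-event envelope, trimmed for illustration.
# Field values are made up; real obs events follow the OpenMRS obs schema.
raw_event = json.dumps({
    "payload": {
        "before": None,
        "after": {"obs_id": 1, "person_id": 42, "concept_id": 5089},
        "op": "c",          # c = create, u = update, d = delete, r = snapshot read
        "ts_ms": 1546300800000,
    }
})

def unpack(event_json):
    """Return (operation, row-state-after-change) from a Debezium envelope."""
    payload = json.loads(event_json)["payload"]
    return payload["op"], payload["after"]

op, row = unpack(raw_event)
print(op, row["obs_id"])  # → c 1
```

The same `before`/`after`/`op` structure applies to every `dbserver1.openmrs.*` topic, which is what makes a single generic stream-processing job possible.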
```shell
docker run --network openmrs-etl_default --rm --interactive --tty \
    confluentinc/cp-ksql-cli:5.2.2 \
    http://ksql-server:8088
```
After running the above command, an interactive KSQL CLI session will open.
You can run any KSQL streaming SQL command, as highlighted at https://docs.confluent.io/current/ksql/docs/tutorials/index.html. Here are a few examples:
```sql
SHOW TOPICS;
```
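You could also declare a stream over one of the Debezium topics and query it. The column names below are assumptions for illustration, not the actual OpenMRS obs schema; adjust them to the real event structure:

```sql
-- Hedged sketch: declare a KSQL stream over the Debezium obs topic.
-- Column names are illustrative, not the actual OpenMRS schema.
CREATE STREAM obs_stream (obs_id BIGINT, person_id BIGINT, concept_id BIGINT)
  WITH (KAFKA_TOPIC='dbserver1.openmrs.obs', VALUE_FORMAT='JSON');

SELECT * FROM obs_stream;
```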
For more KSQL streaming commands, please visit https://docs.confluent.io/current/ksql
This section attempts to explain how the cluster works by breaking everything down.
Everything here has been dockerized, so you don't need to perform these steps manually.
```
project
│   README.md
│   kafka.md
│   debezium.md
│   spark.md
│   docker-compose.yaml
│
└───template
    │   java
    │   python
    │   scala
    └───subfolder1
        │   file111.txt
        │   file112.txt
        │   ...
```