Live Market Data:
- During Market Hours
- Quote Data for AAPL
- Bar Data for AAPL, MSFT, GOOG, SPY, META, DIA, TSLA
- Outside Market Hours
- Quote Data for FAKEPACA
- Bar Data for FAKEPACA
WebSocket:
- wss://stream.data.alpaca.markets/v2/iex
- wss://stream.data.alpaca.markets/v2/test
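For a quick connectivity check outside market hours, here is a minimal sketch against the test stream using the websocket-client package (installed further down). The ALPACA_KEY / ALPACA_SECRET environment variable names are assumptions; the auth/subscribe message shapes follow Alpaca's v2 streaming protocol.

```python
# Minimal sketch: connect to the Alpaca test stream and subscribe to FAKEPACA.
# ALPACA_KEY / ALPACA_SECRET are assumed environment variable names.
import json
import os

import websocket  # from the websocket-client package

STREAM_URL = "wss://stream.data.alpaca.markets/v2/test"

def on_open(ws):
    ws.send(json.dumps({
        "action": "auth",
        "key": os.environ["ALPACA_KEY"],
        "secret": os.environ["ALPACA_SECRET"],
    }))
    ws.send(json.dumps({
        "action": "subscribe",
        "quotes": ["FAKEPACA"],
        "bars": ["FAKEPACA"],
    }))

def on_message(ws, message):
    print(message)  # each message is a JSON array of quote/bar events

if __name__ == "__main__":
    websocket.WebSocketApp(STREAM_URL, on_open=on_open, on_message=on_message).run_forever()
```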
- Create EMR Cluster
- Configure Cluster
- Connect to the primary node using SSH
  - Edit the EC2 security groups to allow access from your local machine's IP
  - SSH into the EMR cluster from your local machine
    - Navigate to the directory containing aws-key.pem
    - ssh into the EMR cluster
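The cluster creation above can also be scripted from the local machine with boto3 (installed in the Kafka setup step below). A minimal sketch under assumed settings; the release label, instance types and counts, region, and key name are placeholders, not this project's actual configuration.

```python
# Hypothetical sketch: create the EMR cluster with boto3 instead of the console.
# Release label, instance types/counts, region, and key name are placeholders.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="kafka-spark-streaming",
    ReleaseLabel="emr-6.9.0",
    Applications=[{"Name": "Spark"}, {"Name": "Ganglia"}],
    Instances={
        "InstanceGroups": [
            {"Name": "Primary", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE",
             "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "Ec2KeyName": "aws-key",             # matches the aws-key.pem used for SSH
        "KeepJobFlowAliveWhenNoSteps": True,  # keep the cluster running after startup
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])
```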
- Set up Kafka
- pip install kafka-python
- pip install boto3
- pip install websocket-client streamlit watchdog plotly
- wget https://downloads.apache.org/kafka/3.5.2/kafka_2.13-3.5.2.tgz
- tar -xzf kafka_2.13-3.5.2.tgz
- cd kafka_2.13-3.5.2
- nano config/server.properties
- Look for advertised.listeners and change it to the primary (master) node's private DNS name [ ip-10-x-x.ec2.internal ]
- Look for zookeeper.connect and change it to the primary (master) node's private DNS name as well
- bin/kafka-server-start.sh config/server.properties
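Before moving on to the producer, a quick kafka-python check run on the primary node can confirm the broker is reachable at its advertised listener. The host below is a placeholder; substitute your primary node's private DNS name.

```python
# Quick sanity check that the broker is reachable at its advertised listener.
# Replace the placeholder host with your primary node's private DNS name.
from kafka import KafkaConsumer, KafkaProducer

BOOTSTRAP = "ip-10-x-x.ec2.internal:9092"  # placeholder

producer = KafkaProducer(bootstrap_servers=BOOTSTRAP)
producer.send("test_topic", b"hello from the EMR primary node")  # auto topic creation is on by default
producer.flush()

consumer = KafkaConsumer(bootstrap_servers=BOOTSTRAP)
print(consumer.topics())  # should include test_topic once the broker accepts the message
```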
- Set up Producer.py
- vim producer.py
- Update the Kafka bootstrap server address to the primary node's private DNS name [ ip-10-x-x.ec2.internal:9092 ]
- bin/kafka-topics.sh --create --bootstrap-server ip-10-0-x-x.ec2.internal:9092 --replication-factor 1 --partitions 1 --topic symbol_topic
- bin/kafka-topics.sh --create --bootstrap-server ip-10-0-x-x.ec2.internal:9092 --replication-factor 1 --partitions 1 --topic symbol_topic2
- spark-submit producer.py
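A rough sketch of what producer.py might look like: bridge the Alpaca stream into the two Kafka topics. The quote/bar routing, symbol list, and credential environment variables are assumptions about this project.

```python
# Hypothetical sketch of producer.py: forward Alpaca stream events into Kafka.
# The quote/bar topic split, symbol list, and env var names are assumptions.
import json
import os

import websocket
from kafka import KafkaProducer

BOOTSTRAP = "ip-10-x-x.ec2.internal:9092"   # primary node's private DNS name
STREAM_URL = "wss://stream.data.alpaca.markets/v2/iex"
SYMBOLS = ["AAPL", "MSFT", "GOOG", "SPY", "META", "DIA", "TSLA"]

producer = KafkaProducer(
    bootstrap_servers=BOOTSTRAP,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def on_open(ws):
    ws.send(json.dumps({"action": "auth",
                        "key": os.environ["ALPACA_KEY"],
                        "secret": os.environ["ALPACA_SECRET"]}))
    ws.send(json.dumps({"action": "subscribe",
                        "quotes": ["AAPL"], "bars": SYMBOLS}))

def on_message(ws, message):
    for event in json.loads(message):
        # route quote events and bar events to separate topics (assumed split)
        topic = "symbol_topic" if event.get("T") == "q" else "symbol_topic2"
        producer.send(topic, event)

if __name__ == "__main__":
    websocket.WebSocketApp(STREAM_URL, on_open=on_open, on_message=on_message).run_forever()
```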
- Set up Consumers (consumer1.py, consumer2.py)
- vim consumer1.py
- Update the Kafka bootstrap server address to the primary node's private DNS name [ ip-10-x-x.ec2.internal:9092 ]
- bin/kafka-topics.sh --create --bootstrap-server ip-10-0-x-x.ec2.internal:9092 --replication-factor 1 --partitions 1 --topic visual_topic
- bin/kafka-topics.sh --create --bootstrap-server ip-10-0-x-x.ec2.internal:9092 --replication-factor 1 --partitions 1 --topic visual_topic2
- mkdir -p /home/hadoop/consumer1
- nano /home/hadoop/consumer1/log4j.properties
- spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1 \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/home/hadoop/consumer1/log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/home/hadoop/consumer1/log4j.properties" \
  consumer1.py
- vim consumer2.py
- Update the Kafka bootstrap server address to the primary node's private DNS name [ ip-10-x-x.ec2.internal:9092 ]
- spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1 consumer2.py
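A minimal sketch of what one of the consumer scripts could look like with Spark Structured Streaming: read raw events from symbol_topic and republish them to visual_topic for the visualizer. The pass-through transformation and checkpoint path are assumptions; the real consumers likely aggregate or enrich the events.

```python
# Hypothetical sketch of a consumer: Spark Structured Streaming, Kafka to Kafka.
# The pass-through transformation and checkpoint path are assumptions.
from pyspark.sql import SparkSession

BOOTSTRAP = "ip-10-x-x.ec2.internal:9092"  # primary node's private DNS name

spark = SparkSession.builder.appName("consumer1").getOrCreate()

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", BOOTSTRAP)
       .option("subscribe", "symbol_topic")
       .option("startingOffsets", "latest")
       .load())

# The Kafka sink requires a string/binary `value` column; forward the payload as-is.
out = raw.selectExpr("CAST(value AS STRING) AS value")

query = (out.writeStream
         .format("kafka")
         .option("kafka.bootstrap.servers", BOOTSTRAP)
         .option("topic", "visual_topic")
         .option("checkpointLocation", "/home/hadoop/consumer1/checkpoint")
         .start())

query.awaitTermination()
```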
- Set up Visualizer (Viz.py)
- ~/.local/bin/streamlit run viz.py --server.port 8501 --server.address 0.0.0.0
- EC2 -> Security Groups -> Edit inbound rules -> add a rule for port 8501 from 0.0.0.0/0 -> Save rules
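A rough sketch of the kind of app viz.py could be: poll visual_topic with kafka-python and render the latest points with Plotly inside Streamlit. The JSON field names in the payload are assumptions.

```python
# Hypothetical sketch of viz.py: read records from visual_topic and plot them.
# The JSON field names ("t" for timestamp, "c" for close) are assumptions.
import json

import plotly.graph_objects as go
import streamlit as st
from kafka import KafkaConsumer

BOOTSTRAP = "ip-10-x-x.ec2.internal:9092"  # primary node's private DNS name

st.title("Live Market Data")

consumer = KafkaConsumer(
    "visual_topic",
    bootstrap_servers=BOOTSTRAP,
    auto_offset_reset="latest",
    consumer_timeout_ms=2000,  # stop iterating when no new messages arrive
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

times, prices = [], []
for record in consumer:
    event = record.value
    times.append(event.get("t"))
    prices.append(event.get("c"))

fig = go.Figure(go.Scatter(x=times, y=prices, mode="lines+markers"))
st.plotly_chart(fig, use_container_width=True)
```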
- Set up Log Visualizer (log_viz.py)
- sudo sysctl fs.inotify.max_user_watches=524288 (raises the inotify watch limit so Streamlit's watchdog file watcher can run)
- ~/.local/bin/streamlit run log_viz.py --server.port 8502 --server.address 0.0.0.0
- EC2 -> Security Groups -> Edit inbound rules -> add a rule for port 8502 from 0.0.0.0/0 -> Save rules
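log_viz.py presumably reads the consumer logs written under /home/hadoop/consumer1 via the log4j configuration above; here is a small sketch, with the log file name being a guess.

```python
# Hypothetical sketch of log_viz.py: show the tail of the consumer's log file.
# The log file path/name is a guess based on the log4j.properties location above.
import pathlib

import streamlit as st

LOG_FILE = pathlib.Path("/home/hadoop/consumer1/consumer1.log")  # assumed name

st.title("Consumer Logs")

if LOG_FILE.exists():
    lines = LOG_FILE.read_text().splitlines()
    st.text("\n".join(lines[-100:]))  # show the most recent 100 lines
else:
    st.warning(f"Log file not found: {LOG_FILE}")
```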
- Ganglia (data monitoring)
- Look for the Ganglia application link under the EMR cluster; it is only reachable through the SSH tunnel and proxy configuration below
- Open a dynamic SSH tunnel (SOCKS proxy) on local port 8050 [ ssh -i ./aws-master-node.pem -ND 8050 hadoop@<primary-node-public-dns> ]
- Copy the Ganglia URI from the EMR console into a new browser tab
- Configure the browser to use a SOCKS5 proxy at localhost:8050
- Reload the page