Commit a2c6a75

init: first commit
0 parents  commit a2c6a75

14 files changed, with 394 additions and 0 deletions

README.md

Lines changed: 183 additions & 0 deletions
@@ -0,0 +1,183 @@
# RealTime StockStream

RealTime StockStream is a streamlined system for processing live stock market data. It uses Apache Kafka for data ingestion, Apache Spark for stream processing, and Apache Cassandra for storage, which makes it a powerful yet easy-to-use tool for financial data analysis.

![real-time-stock-stream.gif](./assets/background.jpg)

## Getting Started

This guide walks you through setting up and running RealTime StockStream on your local machine for development and testing.

### Prerequisites

Ensure you have the following software installed:
- Docker
- Python (version 3.11 or higher)

### Installation

Follow these steps to set up your development environment:

#### Setting Up Kafka

1. **Create a Kafka Topic**:
   Once the Kafka broker is running (see the Docker Compose section below), create the `stocks` topic:
   ```bash
   kafka-topics.sh --create --topic stocks --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
   ```

#### Configuring Cassandra

1. **Create a Keyspace and Table**:
   Execute the following CQL commands to set up your Cassandra database:
   ```sql
   CREATE KEYSPACE stockdata WITH replication = {'class':'SimpleStrategy', 'replication_factor' : 1};

   CREATE TABLE stockdata.stocks (
       stock text,
       trade_id uuid,
       price decimal,
       quantity int,
       trade_type text,
       trade_date date,
       trade_time time,
       PRIMARY KEY (stock, trade_id)
   );
   ```

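To make the effect of `PRIMARY KEY (stock, trade_id)` concrete: `stock` is the partition key and `trade_id` is the clustering column, so a write that repeats both values overwrites the earlier one instead of adding a row. The following is a minimal in-memory Python sketch of that behaviour, not Cassandra client code. The `StocksTable` class, its methods, and the sample values are hypothetical names introduced only for illustration.

```python
import uuid
from collections import defaultdict
from decimal import Decimal


class StocksTable:
    """Toy model of stockdata.stocks keyed like PRIMARY KEY (stock, trade_id)."""

    def __init__(self):
        # partition key (stock) -> clustering key (trade_id) -> row
        self._partitions = defaultdict(dict)

    def upsert(self, row):
        # A write with the same (stock, trade_id) overwrites the earlier one.
        # This toy model simplifies that to replacing the whole row.
        self._partitions[row["stock"]][row["trade_id"]] = row

    def trades_for(self, stock):
        # Rows in a partition are ordered by the clustering column; Python's
        # UUID ordering stands in for Cassandra's here.
        partition = self._partitions.get(stock, {})
        return [partition[key] for key in sorted(partition)]


table = StocksTable()
first_id = uuid.uuid4()
table.upsert({"stock": "AAPL", "trade_id": first_id, "price": Decimal("189.50"), "quantity": 10})
table.upsert({"stock": "AAPL", "trade_id": first_id, "price": Decimal("189.75"), "quantity": 10})
table.upsert({"stock": "AAPL", "trade_id": uuid.uuid4(), "price": Decimal("190.00"), "quantity": 5})

print(len(table.trades_for("AAPL")))  # two distinct trade_ids remain
```

Because every trade for a symbol lands in the same partition, a query filtered on `stock` reads a single partition.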
## System Architecture

![system-architecture](./assets/systemArchitecture.svg)

#### Docker Compose

1. **Launch Services**:
   Use Docker Compose to start Kafka, Zookeeper, Cassandra, and Spark services:
   ```yaml
   version: '3.9'
   services:
     zookeeper:
       image: bitnami/zookeeper:latest
       ports:
         - "2181:2181"
       environment:
         - ALLOW_ANONYMOUS_LOGIN=yes
       networks:
         stock-net:
           ipv4_address: 172.28.1.1

     kafka:
       image: bitnami/kafka:latest
       ports:
         - "9092:9092"
       environment:
         - KAFKA_BROKER_ID=1
         - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092
         - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://172.28.1.2:9092
         - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
         - ALLOW_PLAINTEXT_LISTENER=yes
       depends_on:
         - zookeeper
       networks:
         stock-net:
           ipv4_address: 172.28.1.2

     cassandra:
       image: cassandra:latest
       ports:
         - "9042:9042"
       volumes:
         - ./init-cassandra:/init-cassandra
       environment:
         - CASSANDRA_START_RPC=true
       networks:
         stock-net:
           ipv4_address: 172.28.1.3

     spark:
       image: bitnami/spark:latest
       volumes:
         - ./spark:/opt/bitnami/spark/jobs
       ports:
         - "8080:8080"
       depends_on:
         - kafka
       networks:
         stock-net:
           ipv4_address: 172.28.1.4

   networks:
     stock-net:
       driver: bridge
       ipam:
         config:
           - subnet: 172.28.0.0/16
   ```

2. **Run Docker Compose**:
   ```bash
   docker-compose up -d
   ```

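The Compose file pins each service to a static address on the `stock-net` bridge network. If you change those addresses, a quick sanity check is to confirm that each one is unique and still falls inside the declared subnet. The sketch below does that with Python's standard `ipaddress` module. The addresses are copied from the Compose file above, while the `check_addresses` helper is a hypothetical name used only for this illustration.

```python
import ipaddress

# Values copied from the docker-compose configuration shown above.
SUBNET = ipaddress.ip_network("172.28.0.0/16")
STATIC_ADDRESSES = {
    "zookeeper": "172.28.1.1",
    "kafka": "172.28.1.2",
    "cassandra": "172.28.1.3",
    "spark": "172.28.1.4",
}


def check_addresses(subnet, addresses):
    """Return a list of problems; an empty list means the layout is consistent."""
    problems = []
    seen = {}
    for service, raw in addresses.items():
        ip = ipaddress.ip_address(raw)
        if ip not in subnet:
            problems.append(f"{service}: {ip} is outside {subnet}")
        if ip in seen:
            problems.append(f"{service}: {ip} is already used by {seen[ip]}")
        seen[ip] = service
    return problems


print(check_addresses(SUBNET, STATIC_ADDRESSES))  # → []
```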
### Dependencies

Install the necessary Python packages:

- Kafka Python client:
  ```bash
  pip install kafka-python==2.0.2
  ```
- PySpark:
  ```bash
  pip install pyspark==3.5.0
  ```

### Usage

1. **Run the Spark Job**:
   Use the `spark-submit` command to run your Spark job. The connector versions passed to `--packages` should match the Spark and Scala versions you are running; in particular, the `spark-sql-kafka-0-10` artifact version is expected to equal the Spark version.
   ```bash
   spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.1,com.datastax.spark:spark-cassandra-connector_2.12:3.0.0 spark_job.py stocks
   ```

2. **Produce and Consume Data**:
   Start producing data to the `stocks` topic and monitor the pipeline's output.

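As an illustration of the producing step, here is a minimal sketch of a test producer built on `kafka-python`, the client listed under Dependencies. The message format is an assumption: each message is a JSON object whose keys mirror the columns of `stockdata.stocks`. The Spark job in this repository may expect a different layout. The function names, ticker symbols, and `"buy"`/`"sell"` values are likewise hypothetical.

```python
import json
import random
import uuid
from datetime import datetime, timezone


def make_trade(stock):
    """Build one JSON-serialisable trade shaped like the stockdata.stocks columns."""
    now = datetime.now(timezone.utc)
    return {
        "stock": stock,
        "trade_id": str(uuid.uuid4()),
        "price": f"{random.uniform(50, 500):.2f}",  # string keeps decimal precision
        "quantity": random.randint(1, 1000),
        "trade_type": random.choice(["buy", "sell"]),  # assumed values
        "trade_date": now.date().isoformat(),
        "trade_time": now.time().isoformat(timespec="milliseconds"),
    }


def produce(count=10, topic="stocks", bootstrap_servers="localhost:9092"):
    # Imported here so make_trade() works without kafka-python installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        value_serializer=lambda value: json.dumps(value).encode("utf-8"),
    )
    for _ in range(count):
        symbol = random.choice(["AAPL", "MSFT", "GOOG"])  # hypothetical tickers
        producer.send(topic, value=make_trade(symbol))
    producer.flush()  # block until buffered messages are delivered
    producer.close()


if __name__ == "__main__":
    print(json.dumps(make_trade("AAPL"), indent=2))
    # produce()  # uncomment once the Kafka broker is running
```

Sending the price as a string avoids floating-point rounding before the value reaches the `decimal` column.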
## Monitoring and Logging

Check each service's logs in its respective directory for monitoring and debugging.

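The README does not say where those log directories are. Since the services are started with Docker Compose, one alternative (an assumption, not something this project documents) is to read container logs with `docker-compose logs`. The sketch below builds that command for each service named in the Compose file. The `logs_command` and `show_logs` helpers are hypothetical.

```python
import subprocess

# Service names as defined in the docker-compose configuration.
SERVICES = ["zookeeper", "kafka", "cassandra", "spark"]


def logs_command(service, tail=50):
    """Build the docker-compose command that prints the last `tail` log lines."""
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service}")
    return ["docker-compose", "logs", f"--tail={tail}", service]


def show_logs(service, tail=50):
    # Must be run from the directory that contains docker-compose.yaml.
    subprocess.run(logs_command(service, tail), check=True)


if __name__ == "__main__":
    # Print the commands only; call show_logs(name) to actually run them.
    for name in SERVICES:
        print(" ".join(logs_command(name)))
```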
## Testing

![docker-compose-d](./assets/docker-compose-d.png)

![docker-monitoring](./assets/docker-monitoring.png)

![docker-ps](./assets/docker-ps.png)

![cqlsh](./assets/cqlsh.png)

![stocks-data-before](./assets/stocks-data-before.png)

![create-kafka-topic](./assets/create-kafka-topic.png)

## Contributing

Contributions to RealTime StockStream are welcome. Please read [CONTRIBUTING.md](CONTRIBUTING.md) for details on our code of conduct and the submission process.

## Authors

- [Abdullah 🚀](https://github.com/qahta0)

## License

This project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details.

assets/background.jpg

1.36 MB

assets/cqlsh.png

40.9 KB

assets/create-kafka-topic.png

81.5 KB

assets/docker-compose-d.png

35.4 KB

assets/docker-monitoring.png

143 KB

assets/docker-ps.png

37.8 KB

assets/stocks-data-before.png

32.5 KB

assets/systemArchitecture.svg

Lines changed: 4 additions & 0 deletions

docker-compose.yaml

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
version: '3.9'

name: "realtime-stock-market"

services:
  zookeeper:
    image: bitnami/zookeeper:latest
    ports:
      - "2181:2181"
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
    networks:
      stock-net:
        ipv4_address: 172.28.1.1

  kafka:
    image: bitnami/kafka:latest
    ports:
      - "9092:9092"
    environment:
      - KAFKA_BROKER_ID=1
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://172.28.1.2:9092
      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
      - ALLOW_PLAINTEXT_LISTENER=yes
    depends_on:
      - zookeeper
    networks:
      stock-net:
        ipv4_address: 172.28.1.2

  cassandra:
    image: cassandra:latest
    ports:
      - "9042:9042"
    volumes:
      - ./init-cassandra:/init-cassandra
    environment:
      - CASSANDRA_START_RPC=true
    networks:
      stock-net:
        ipv4_address: 172.28.1.3

  spark:
    image: bitnami/spark:latest
    volumes:
      - ./spark:/opt/bitnami/spark/jobs
    ports:
      - "8080:8080"
    depends_on:
      - kafka
    networks:
      stock-net:
        ipv4_address: 172.28.1.4

networks:
  stock-net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.28.0.0/16
