Python implementation of the Spark video stream analytics project. The original project was implemented in Java and can be found here.
First, make sure you have a Python environment set up. You can use the following command to create a new environment:
python3 -m venv ./venv
Then source the environment with the following command:
source venv/bin/activate
Finally, install the required packages with the following command:
python3 -m pip install -r requirements.txt
Make sure to deactivate the environment when you are done with the following command:
deactivate
Whenever you want to work on the project, make sure to source the environment before running the code.
Currently, the project uses the pyspark library, version 3.5.1.
Spark runs on Java 8/11/17, Scala 2.12/2.13, Python 3.8+, and R 3.5+. Java 8 prior to version 8u371 support is deprecated as of Spark 3.5.0.
So, in order to run the project without issues, make sure you have Java 8/11/17 installed on your machine.
Even though the project is using the pyspark library, it is necessary to have the Spark service installed and running on your machine when launching the project.
Before installing Spark, make sure you have Java 8/11/17 installed on your machine and that the JAVA_HOME variable is correctly set in your environment variables.
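As a quick check, you can verify the Java installation and point JAVA_HOME at it. A minimal sketch, assuming a typical Linux OpenJDK layout (the JDK path below is an assumption; adjust it to wherever your JDK lives):

```bash
# check which Java version is on the PATH
java -version

# point JAVA_HOME at your JDK install (path is an assumption)
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"
```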
Now, the following steps will guide you through the installation of Spark:
- Download the latest version of Spark from the official website.
- Extract the downloaded file to a directory of your choice.
- Set the SPARK_HOME variable in your environment variables to the directory where you extracted the Spark files.
- Test the installation by running the following command:
# launch Scala Based Spark
spark-shell
# launch PySpark
pyspark
If you see the Spark shell, then the installation was successful.
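For reference, the variables could be set like this in your shell profile; the extraction path below is an assumption, use your own directory:

```bash
# path is an assumption: use the directory where you extracted Spark
export SPARK_HOME=/opt/spark-3.5.1
export PATH="$SPARK_HOME/bin:$PATH"
```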
Even though the project uses the kafka-python library to interact with Kafka, it is necessary to have the Kafka and Zookeeper services running on your machine. To achieve that, these services are configured to run in a Docker container.
So, having Docker installed and working on your machine is a requirement to run the project.
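For orientation, this is roughly how kafka-python talks to the broker exposed by the container. A minimal sketch, assuming a broker on localhost:9092 and a hypothetical video-stream-event topic (both are assumptions; check the collector configuration for the actual values):

```python
from kafka import KafkaProducer, KafkaConsumer

# broker address and topic name are assumptions, see the project config
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("video-stream-event", b"hello from the collector")
producer.flush()

consumer = KafkaConsumer(
    "video-stream-event",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop iterating if no message arrives
)
for message in consumer:
    print(message.value)
```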
Now, the following steps will guide you through the installation of the container with Kafka and Zookeeper:
In the project root, run the following command to build the containers:
make build
To stand up Kafka and Spark services, run:
make run
This command will start both Kafka and Spark. You can also start the Spark services with 3 workers using:
make run-scaled
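To sanity-check that the services came up, list the running containers (the exact container names depend on the compose file, so they will vary):

```bash
# you should see Kafka, Zookeeper, and Spark containers in the output
docker ps
```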
Before running the scripts, you must create and activate a virtual environment:
virtualenv venv
source venv/bin/activate
To run the video stream collector, run the following command:
python src/video-stream-collector.py --config {{ CONFIG_FILE }}
Where CONFIG_FILE is the path to the configuration file. Multiple example configuration files can be found in the config/collector directory, and any of them can be used to test the video stream collector.
Example:
python src/stream_collector.py --config config/collector/file_cam_local.yaml
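If you want to inspect a configuration before passing it to the collector, you can load it with PyYAML. A minimal sketch, assuming the files are plain YAML; nothing is assumed here about the keys inside them:

```python
import yaml

# load and print an example collector configuration
with open("config/collector/file_cam_local.yaml") as f:
    config = yaml.safe_load(f)
print(config)
```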
To run stream_processor.py, run:
make submit app=src/stream_processor.py
There are several commands to build and manage the standalone Spark cluster. You can check the Makefile to see them all. The simplest command to build is:
make build
To run the motion detection demo, run the following command:
python src/motion-demo.py
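For context, frame-differencing motion detection with OpenCV looks roughly like the sketch below. This illustrates the technique, not the demo's actual code; the camera index and thresholds are assumptions:

```python
import cv2

cap = cv2.VideoCapture(0)  # default camera; index 0 is an assumption
prev_gray = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (21, 21), 0)
    if prev_gray is not None:
        # difference against the previous frame and threshold it
        delta = cv2.absdiff(prev_gray, gray)
        mask = cv2.threshold(delta, 25, 255, cv2.THRESH_BINARY)[1]
        if cv2.countNonZero(mask) > 500:  # sensitivity is an assumption
            print("motion detected")
    prev_gray = gray

cap.release()
```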
To run the video stream processor, we need to have the Spark service running. Once the service is running, run the following command:
pyspark < src/video_stream_processor.py
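For orientation, reading the collector's output back from Kafka with Structured Streaming looks roughly like this. A minimal sketch, assuming the spark-sql-kafka connector is pulled in via spark.jars.packages and a hypothetical video-stream-event topic (broker address, topic name, and connector version are all assumptions):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("video-stream-processor-sketch")
    # connector version must match your Spark version (assumption: 3.5.1)
    .config(
        "spark.jars.packages",
        "org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1",
    )
    .getOrCreate()
)

# subscribe to the topic the collector writes to (names are assumptions)
frames = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "video-stream-event")
    .load()
)

# print raw records to the console; real processing would decode the frames
query = (
    frames.selectExpr("CAST(key AS STRING)", "value")
    .writeStream
    .format("console")
    .start()
)
query.awaitTermination()
```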