This project demonstrates distributed execution of a machine learning task with Apache Spark on a cluster of 4 instances. The task is to train a model that predicts wine quality; the SVC model achieves an F1 score of 0.7634 on the validation dataset. The instructions below cover the complete setup, from configuring the instances to running the Spark job with Docker.
Link to Docker image: https://hub.docker.com/r/shreyasshende/wine-quality-eval
Log into your 4 instances using SSH. Replace <instance-ip> with the IP address of each instance.
ssh -i /path/to/your/private-key.pem ubuntu@<instance-ip>
On each instance, generate an SSH key pair to enable passwordless communication.
ssh-keygen -t rsa -N "" -f /home/ubuntu/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub
Copy the public key from each instance and add it to the authorized_keys file of every other instance.
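The key exchange for one pair of nodes can be sketched as follows (a minimal sketch; `dd1` stands for any of the other instances, and this must be repeated for every node pair):

```shell
# Read this node's public key, then append it to the target node's
# authorized_keys with the permissions sshd requires (dd1 is illustrative)
PUBKEY=$(cat ~/.ssh/id_rsa.pub)
ssh ubuntu@dd1 "mkdir -p ~/.ssh && chmod 700 ~/.ssh \
  && echo '$PUBKEY' >> ~/.ssh/authorized_keys \
  && chmod 600 ~/.ssh/authorized_keys"
```

Alternatively, `ssh-copy-id -i ~/.ssh/id_rsa.pub ubuntu@dd1` performs the same append in one step.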
On each instance, map the hostnames of all instances in the /etc/hosts file.
sudo vim /etc/hosts
Add the following entries (replace <ip-address> with the actual instance IPs):
<ip-address> nn
<ip-address> dd1
<ip-address> dd2
<ip-address> dd3
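To confirm the mappings took effect, name resolution can be checked from any node (an illustrative check, assuming the hostnames above):

```shell
# getent consults /etc/hosts, so each name should resolve to its mapped IP
getent hosts nn dd1 dd2 dd3
# Optionally confirm passwordless SSH works end to end
ssh dd1 hostname
```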
Install Java, Maven, and Spark on all instances.
Install Java:
sudo apt update
sudo apt install openjdk-8-jdk -y
Install Maven:
sudo apt install maven -y
Install Spark:
- Download and extract Spark:
wget https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-bin-hadoop3.tgz
tar -xvzf spark-3.4.1-bin-hadoop3.tgz
- Set environment variables:
echo "export SPARK_HOME=/home/ubuntu/spark-3.4.1-bin-hadoop3" >> ~/.bashrc
echo "export PATH=\$SPARK_HOME/bin:\$PATH" >> ~/.bashrc
source ~/.bashrc
Copy the workers.template file to workers and update it:
cp $SPARK_HOME/conf/workers.template $SPARK_HOME/conf/workers
vim $SPARK_HOME/conf/workers
Add the following lines:
localhost
dd1
dd2
dd3
(Use the hostnames mapped in /etc/hosts, or the instances' IP addresses.)
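With the workers file in place, the standalone cluster can be brought up from the master node (a sketch, assuming the default sbin scripts shipped with Spark 3.4.1):

```shell
# On the master node (nn): start the Spark master process
$SPARK_HOME/sbin/start-master.sh
# Start every worker listed in conf/workers (relies on passwordless SSH)
$SPARK_HOME/sbin/start-workers.sh
```

The master web UI should then be reachable at http://<master-ip>:8080, listing all connected workers.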
Create Training and Eval directories on all instances:
mkdir ~/Training
mkdir ~/Eval
Place the Java code files for training and evaluation into these directories.
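If the Java sources are Maven projects (as the SNAPSHOT jar names below suggest), each would be built in place before submission; a sketch, assuming a standard Maven layout:

```shell
# Build the training project; Maven places the jar under target/
cd ~/Training
mvn -q package
ls target/wine-quality-train-1.0-SNAPSHOT.jar
```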
Use the following command to execute the training code with Spark:
spark-submit --master spark://<master-ip>:7077 --class com.example.WineQualityEval /home/ubuntu/Training/wine-quality-train-1.0-SNAPSHOT.jar
Replace <master-ip> with the Spark master instance's IP address.
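The training code itself is not shown in this README. A hypothetical sketch of an SVC-based trainer with Spark ML follows; the class name, dataset path, separator, and column names are assumptions for illustration, not the project's actual code:

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.ml.PipelineStage;
import org.apache.spark.ml.classification.LinearSVC;
import org.apache.spark.ml.classification.OneVsRest;
import org.apache.spark.ml.feature.VectorAssembler;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class WineQualityTrainSketch {
    public static void main(String[] args) throws IOException {
        SparkSession spark = SparkSession.builder()
                .appName("WineQualityTrain").getOrCreate();

        // Assumed input: a semicolon-separated CSV with numeric feature
        // columns plus a "quality" label column (path/layout not confirmed)
        Dataset<Row> train = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .option("sep", ";")
                .csv("/home/ubuntu/TrainingDataset.csv");

        // Assemble every non-label column into a single feature vector
        String[] features = Arrays.stream(train.columns())
                .filter(c -> !c.equals("quality"))
                .toArray(String[]::new);
        VectorAssembler assembler = new VectorAssembler()
                .setInputCols(features)
                .setOutputCol("features");

        // LinearSVC is binary, so OneVsRest lifts it to the multi-class label
        LinearSVC svc = new LinearSVC().setLabelCol("quality");
        OneVsRest ovr = new OneVsRest().setClassifier(svc).setLabelCol("quality");

        PipelineModel model = new Pipeline()
                .setStages(new PipelineStage[]{assembler, ovr})
                .fit(train);

        // Persist where the Dockerfile below expects the model
        model.write().overwrite().save("/home/ubuntu/WineQualityPredictionModel");
        spark.stop();
    }
}
```

The evaluation job would mirror this: load the saved PipelineModel, transform ValidationDataset.csv, and score it with MulticlassClassificationEvaluator using the "f1" metric.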
Create a Docker image to package your application.
Dockerfile:
# Use the official Spark image as a base image
FROM bitnami/spark:3.4.1
# Set the working directory inside the container
WORKDIR /app
# Copy WineQualityEval (containing the JAR) to the container
COPY WineQualityEval /app/WineQualityEval
# Copy WineQualityPredictionModel to /home/ubuntu
COPY WineQualityPredictionModel /home/ubuntu/WineQualityPredictionModel
# Copy ValidationDataset.csv to /home/ubuntu
COPY ValidationDataset.csv /home/ubuntu/ValidationDataset.csv
# Set the command to run your Spark job
CMD ["spark-submit", "--master", "local", "--class", "com.example.WineQualityEval", "/app/WineQualityEval/target/wine-quality-eval-1.0-SNAPSHOT.jar"]
Build and Push Docker Image:
sudo docker build -t shreyasshende/wine-quality-eval:latest .
sudo docker push shreyasshende/wine-quality-eval:latest
Pull the Docker image on each instance:
sudo docker pull shreyasshende/wine-quality-eval:latest
Run the container:
sudo docker run shreyasshende/wine-quality-eval:latest
The F1 score achieved on the validation dataset is:
F1 Score: 0.7634
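For context, F1 is the harmonic mean of precision and recall; Spark's MulticlassClassificationEvaluator averages it across the quality classes. A tiny self-contained illustration of the per-class formula (the counts below are made up, not taken from this project):

```java
public class F1Demo {
    // Per-class F1 from true positives, false positives, false negatives
    static double f1(int tp, int fp, int fn) {
        double precision = tp / (double) (tp + fp);
        double recall = tp / (double) (tp + fn);
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Example: 30 TP, 10 FP, 10 FN -> precision 0.75, recall 0.75
        System.out.printf("%.4f%n", f1(30, 10, 10)); // prints 0.7500
    }
}
```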