CBIR-system-using-PySpark-and-Alluxio

Content-Based Image Retrieval (CBIR) is becoming increasingly demanding as the number of images available on the internet grows. Given an input image from the user, the task is to retrieve similar images. To speed up this computation, the project uses Apache Spark and Alluxio (formerly known as Tachyon). Spark is an open-source engine for processing big data; its parallelism reduces computation time. Alluxio is a virtual distributed storage system. Although Spark-based CBIR models have been proposed before, this model aims to reduce image retrieval time by modifying the feature extraction mechanism. The Histogram of Oriented Gradients (HOG) feature descriptor is used to measure the similarity between images, and the K Nearest Neighbours (KNN) algorithm is used, in an optimized form, to compute the top K images most similar to the query image.
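The retrieval idea described above can be illustrated with a minimal NumPy sketch. It is not the code from feature_extraction.py or similarity.ipynb: `orientation_histogram` is a hypothetical, simplified stand-in for a full HOG descriptor (a single magnitude-weighted orientation histogram over the whole image, with no cells or block normalization), and `top_k_similar` is a plain brute-force nearest-neighbour search using Euclidean distance, which is an assumed metric.

```python
import numpy as np


def orientation_histogram(image, bins=9):
    """Simplified HOG-style descriptor: one magnitude-weighted histogram of
    unsigned gradient orientations over the whole image (no cells or blocks)."""
    image = np.asarray(image, dtype=np.float64)
    gy, gx = np.gradient(image)  # gradients along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    # Fold directions into the unsigned range [0, 180) degrees.
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, 180.0), weights=magnitude)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist


def top_k_similar(query, features, k=5):
    """Return the indices of the k rows of `features` closest to `query`
    by Euclidean distance, nearest first."""
    features = np.asarray(features, dtype=np.float64)
    query = np.asarray(query, dtype=np.float64)
    distances = np.linalg.norm(features - query, axis=1)
    k = min(k, len(distances))
    return np.argsort(distances, kind="stable")[:k]


if __name__ == "__main__":
    # Synthetic 8x8 "images": intensity ramps in different directions.
    horizontal = np.tile(np.arange(8.0), (8, 1))
    vertical = horizontal.T
    diagonal = horizontal + vertical
    database = np.stack([orientation_histogram(img)
                         for img in (horizontal, vertical, diagonal)])
    query = orientation_histogram(horizontal * 2.0)  # same direction, steeper ramp
    print(top_k_similar(query, database, k=2))
```

In the actual project the descriptors would come from a real HOG implementation and the search would be distributed with Spark; the sketch only shows the shape of the computation: one feature vector per image, then a ranking by distance to the query vector.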

Documents

TinyImageNet.tar.xz - dataset archive
README.md - readme file
CBIR Report_19BCE1328_19BCE1295_19BCE1614.docx - project report
Code - uploadIMages.py, feature_extraction.py, similarity.ipynb
Reference Papers.zip - archive of reference papers
CBIR REVIEW 3 PPT.pptx - final presentation
CBIR_video - demonstration video

Requirements

Prerequisites:

A system with more than 4 GB of RAM (more than 8 GB is recommended for better performance)
Any Linux-based operating system (Ubuntu 20.04 preferred)
Apache Spark installed
Alluxio installed

The other libraries and packages include:

opencv-python
numpy
pandas
pyspark
scikit-image (provides the skimage module)
pillow

Steps to run

Clone the GitHub repository.
Extract TinyImageNet.tar.xz inside the cloned repository to obtain the dataset.
Start Alluxio using the following commands:

$ cd <PATH_TO_ALLUXIO>
$ ./bin/alluxio format
$ ./bin/alluxio-start.sh local SudoMount

Visit http://localhost:19999 (master web UI) and http://localhost:30000 (worker web UI) to check whether Alluxio has started.
Run UploadIMages.py to store the images in the Alluxio file system.
Run feature_extraction.py to extract HOG features from the images and store them in Alluxio in Parquet format.
Finally, run similarity.ipynb to retrieve similar images with KNN.
To stop Alluxio:

$ ./bin/alluxio-stop.sh local
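
The browser check in the steps above can also be scripted. The following is a small sketch, not part of the repository: `is_port_open` is a hypothetical helper that only tests whether something accepts TCP connections on the two web UI ports, which is a weaker check than loading the pages.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 19999 and 30000 are the web UI ports mentioned in the steps above.
    for name, port in (("master web UI", 19999), ("worker web UI", 30000)):
        state = "reachable" if is_port_open("localhost", port) else "not reachable"
        print(f"Alluxio {name} on port {port}: {state}")
```

If either port is reported as not reachable after alluxio-start.sh has finished, the Alluxio logs are the next place to look before running the upload script.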

SPARK STEPS

To install Spark on your system, refer to the Spark links listed under References.

ALLUXIO STEPS

Download Alluxio from the download page listed under References. Select the desired release, then the distribution built for the default Hadoop version. Unpack the downloaded file with the following commands:

$ tar -xzf alluxio-2.7.2-bin.tar.gz
$ cd alluxio-2.7.2

In the ${ALLUXIO_HOME}/conf directory, create the alluxio-site.properties configuration file by copying the template file:

$ cp conf/alluxio-site.properties.template conf/alluxio-site.properties

Set alluxio.master.hostname in conf/alluxio-site.properties to localhost.

$ echo "alluxio.master.hostname=localhost" >> conf/alluxio-site.properties
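
Because the echo command appends to the file, running it more than once leaves duplicate entries. A quick way to confirm what the file ends up declaring is sketched below. `read_properties` is a hypothetical helper that handles only simple key=value lines, not the full Java properties syntax (no line continuations or escapes), and it assumes the last entry for a key is the one that takes effect.

```python
from pathlib import Path


def read_properties(path):
    """Parse simple key=value lines from a .properties file into a dict.

    Blank lines and comment lines starting with '#' or '!' are skipped.
    """
    props = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()  # a later entry overrides an earlier one
    return props


if __name__ == "__main__":
    site = Path("conf/alluxio-site.properties")  # relative to the Alluxio directory
    if site.exists():
        print(read_properties(site).get("alluxio.master.hostname"))
```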

Alluxio provides commands to ensure the system environment is ready for running Alluxio services. Run the following command to validate the environment for running Alluxio locally:

$ ./bin/alluxio validateEnv local

Alluxio must be formatted before it is started. The first command below formats the Alluxio journal and worker storage directories; the second starts Alluxio locally.

$ ./bin/alluxio format
$ ./bin/alluxio-start.sh local SudoMount

SPARK SETUP FOR ALLUXIO

The Alluxio client jar must be distributed to all nodes where Spark drivers or executors run. Place the client jar at the same local path (e.g. /<PATH_TO_ALLUXIO>/client/alluxio-2.7.2-client.jar) on each node, then add the following lines to spark/conf/spark-defaults.conf:

spark.driver.extraClassPath   /<PATH_TO_ALLUXIO>/client/alluxio-2.7.2-client.jar
spark.executor.extraClassPath /<PATH_TO_ALLUXIO>/client/alluxio-2.7.2-client.jar
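
For a notebook or script, the same two properties can instead be passed when the session is built. The sketch below is an assumption about how this could be done, not the repository's code: `alluxio_uri` and `build_session` are hypothetical helpers, 19998 is Alluxio's default master RPC port, and the classpath settings only take effect if no Spark JVM is already running in the process.

```python
def alluxio_uri(path, host="localhost", port=19998):
    """Build an alluxio:// URI; 19998 is Alluxio's default master RPC port."""
    return f"alluxio://{host}:{port}/{path.lstrip('/')}"


def build_session(client_jar, app_name="cbir"):
    """Create a SparkSession with the Alluxio client jar on both classpaths.

    Must be called before any other SparkSession or SparkContext is created,
    because the driver classpath is fixed once the JVM has started.
    """
    from pyspark.sql import SparkSession  # imported lazily; requires pyspark

    return (
        SparkSession.builder.appName(app_name)
        .config("spark.driver.extraClassPath", client_jar)
        .config("spark.executor.extraClassPath", client_jar)
        .getOrCreate()
    )
```

With Alluxio running, a call such as `build_session("/<PATH_TO_ALLUXIO>/client/alluxio-2.7.2-client.jar").read.parquet(alluxio_uri("features"))` would then read Parquet data from Alluxio, where `features` is a placeholder path rather than the one used by feature_extraction.py.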

Sample Output

References

https://spark.apache.org/downloads.html
https://computingforgeeks.com/how-to-install-apache-spark-on-ubuntu-debian/
https://www.alluxio.io/download/
https://docs.alluxio.io/os/user/stable/en/Overview.html
https://www.sciencedirect.com/science/article/pii/S1319157818307146
