Logovision - Logo Detection and Recognition

Overview

This project implements a logo detection and recognition system using the RF-DETR model trained on the LogoDet-3K dataset. The goal is to detect bounding boxes for logos in images, with the model generalizing to logos beyond those in the training set. A Gradio-based UI allows users to upload images, view detected logo bounding boxes, and send cropped logo regions to the Gemini API for recognition.

Setup and Execution Instructions

Prerequisites

Python 3.8+
Git
Gemini API key (to be stored in a .env file)

Installation

Clone the repository:

git clone https://github.com/Gk-rohan/Logovision
cd Logovision

Install dependencies:
```
pip install -r requirements.txt
```
Download the RF-DETR model weights:
```
python download_weights.py
```
Set up the Gemini API key:
- Create a .env file in the project root with the following content:
```
GEMINI_API_KEY=your-api-key
```
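- The application reads the key with os.getenv. Note that os.getenv only reads variables already present in the process environment, so the .env file has to be loaded into the environment first (for example with python-dotenv's load_dotenv).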
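The key-loading step can be sketched as follows. This is a minimal, stdlib-only illustration: `load_env_file` is a hypothetical helper, not necessarily what gradio_demo.py does, and many projects use the python-dotenv package for this instead.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env-style file into os.environ.

    Hypothetical helper for illustration. Variables that are already set
    in the environment are not overwritten.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    loaded = {}
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop optional surrounding quotes
        if key:
            loaded[key] = value
            os.environ.setdefault(key, value)
    return loaded


if __name__ == "__main__":
    load_env_file()
    api_key = os.getenv("GEMINI_API_KEY")
    print("Gemini key found" if api_key else "GEMINI_API_KEY is not set")
```

Running the file from the project root reports whether GEMINI_API_KEY is visible to the process, which is a quick check before launching the demo.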

Local Training

Open the training notebook:

notebooks/rf-detr_training.ipynb

Local UI Demo

Launch the Gradio UI:
```
python gradio_demo.py
```
Open the provided URL (e.g., http://127.0.0.1:7860) in your browser.
Upload an image to detect logos. The UI displays the image annotated with bounding boxes, sends the cropped logo regions to the Gemini API for recognition, and shows the recognized brand names. Try a lower confidence threshold to see more logos on harder images.
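To illustrate why a lower confidence threshold surfaces more logos, here is a minimal sketch of threshold filtering. The `Detection` structure and `filter_by_confidence` function are hypothetical names for illustration; the demo's actual data structures and default threshold are not shown in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One predicted box in pixel coordinates (hypothetical structure)."""
    x1: float
    y1: float
    x2: float
    y2: float
    confidence: float


def filter_by_confidence(detections, threshold=0.5):
    """Keep detections scoring at least `threshold`, highest score first."""
    kept = [d for d in detections if d.confidence >= threshold]
    return sorted(kept, key=lambda d: d.confidence, reverse=True)


if __name__ == "__main__":
    # Made-up scores, only to show the effect of the threshold.
    predictions = [
        Detection(10, 10, 60, 40, 0.91),
        Detection(100, 20, 180, 90, 0.42),
        Detection(30, 120, 90, 170, 0.27),
    ]
    for threshold in (0.5, 0.25):
        kept = filter_by_confidence(predictions, threshold)
        print(f"threshold={threshold}: {len(kept)} box(es) kept")
```

Lowering the threshold trades precision for recall: more true logos appear, but so do more false positives, each of which costs an extra recognition request.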

GenAI Script

The gradio_demo.py script handles both logo detection and recognition:
- Detection: Uses the trained RF-DETR model to generate bounding boxes.
- Recognition: Crops detected regions and sends them to the Gemini API for logo identification.
The script reads the Gemini API key, defined in the .env file, from the environment using os.getenv.
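The detect-then-recognize flow described above can be sketched as below. This is an illustration under stated assumptions, not the code in gradio_demo.py: the image is modeled as a plain list of pixel rows, boxes are assumed to be (x1, y1, x2, y2) in pixels, and `recognize` is a caller-supplied stand-in for the Gemini request. All function names here are hypothetical.

```python
def clamp_box(box, width, height):
    """Round an (x1, y1, x2, y2) box to ints and clip it to the image bounds."""
    x1, y1, x2, y2 = box
    x1 = max(0, min(int(round(x1)), width))
    y1 = max(0, min(int(round(y1)), height))
    x2 = max(0, min(int(round(x2)), width))
    y2 = max(0, min(int(round(y2)), height))
    return x1, y1, x2, y2


def crop_region(pixels, box):
    """Crop a row-major pixel grid (list of rows) to the given box."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    x1, y1, x2, y2 = clamp_box(box, width, height)
    if x2 <= x1 or y2 <= y1:
        return []  # Degenerate box: nothing worth sending for recognition.
    return [row[x1:x2] for row in pixels[y1:y2]]


def recognize_logos(pixels, boxes, recognize):
    """Crop each box and pass the crop to `recognize` (a stand-in callable)."""
    names = []
    for box in boxes:
        crop = crop_region(pixels, box)
        if crop:
            names.append(recognize(crop))
    return names


if __name__ == "__main__":
    # A 4x5 dummy "image" whose pixel values encode their own position.
    image = [[(r, c) for c in range(5)] for r in range(4)]
    stub = lambda crop: f"logo {len(crop[0])}x{len(crop)}"  # fake recognizer
    print(recognize_logos(image, [(1, 1, 3, 3), (10, 10, 20, 20)], stub))
```

Clamping before cropping matters because detectors can emit boxes that extend slightly past the image edge; skipping empty crops avoids spending API requests on them.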

Public URL

The Gradio demo is hosted publicly at: https://huggingface.co/spaces/gupta005/RF_DETR_LOGO_DET.

Model and Dataset Details

Model: RF-DETR (GitHub)
- A transformer-based object detection model fine-tuned for logo detection.
- Trained to predict bounding boxes for logos in images.
- Generalizes to unseen logos through robust feature learning on the diverse LogoDet-3K dataset.
Dataset: LogoDet-3K
- Contains 3,000 logo classes with annotated bounding boxes.
- Used for training and validation to ensure model generalization.
Gemini API:
- Used for logo recognition on cropped bounding box regions.
- The API key is kept out of the source code in the .env file and read at runtime with os.getenv.

Sample Results

Below are sample images showing bounding boxes generated by the RF-DETR model for two test images:

- img1-output.jpeg
- img2-output.jpeg

These images are located in the results/ folder and demonstrate the model's ability to detect logos accurately.

Assumptions

The LogoDet-3K dataset is representative of diverse logo types, enabling generalization.
The Gemini API provides reliable logo recognition for cropped regions.
Users have access to a GPU for faster training and inference.
Input images are of reasonable quality and resolution for accurate detection.

Challenges

Generalization: Some logos not in the training set were harder to detect, requiring careful hyperparameter tuning in the notebook. Lowering the confidence threshold at inference time can help recover such logos.
API Rate Limits: The Gemini API has rate limits, which may affect real-time performance for multiple requests.
Model Size: RF-DETR large is computationally intensive, which may pose challenges for deployment on low-resource devices.
RT-DETRv2 Attempt: I attempted to train the RT-DETRv2 model using the notebook notebooks/rt-detrv2_training.ipynb, but training could not be completed due to insufficient GPU memory.
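One common way to soften the API rate-limit issue noted above is to retry failed requests with exponential backoff. The sketch below is generic and stdlib-only; `call_with_backoff` is a hypothetical helper, and because this README does not state which exception the Gemini client raises on rate limiting, `retry_on` is left as a placeholder for the caller to narrow.

```python
import random
import time


def call_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call `func()` and retry on failure with exponential backoff and jitter.

    `retry_on` should be narrowed to the client's rate-limit exception;
    the broad default is only a placeholder for this sketch.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            # Delay cap doubles each attempt: base, 2*base, 4*base, ...
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter spreads out retries from concurrent requests.
            sleep(random.uniform(0, delay))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_request():
        # Simulated request that fails twice before succeeding.
        state["calls"] += 1
        if state["calls"] < 3:
            raise RuntimeError("simulated rate limit")
        return "recognized"

    print(call_with_backoff(flaky_request, base_delay=0.01))
```

Since each detected logo triggers its own recognition request, an image with many boxes is where this kind of retry logic is most likely to matter.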

Potential Improvements

Model Training: Consider further fine-tuning the RF-DETR model for more accurate logo detection in real-world scenarios.
Model Benchmarks: Conduct benchmarks to compare RF-DETR with other state-of-the-art object detection models.
Data Augmentation: Apply advanced augmentation techniques (e.g., rotation, color jitter) in the training notebook to improve model robustness.
Model Optimization: Explore model pruning or quantization to reduce inference time for deployment.
Alternative APIs: Test other vision APIs/Models (e.g., Google Vision, AWS Rekognition) for improved recognition accuracy.
Active Learning: Use active learning to iteratively improve the model by selecting challenging samples for retraining.
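For the model benchmarks suggested above, detector comparisons typically rest on intersection over union (IoU) between predicted and ground-truth boxes. The following is a minimal sketch assuming boxes in (x1, y1, x2, y2) pixel format; it is not taken from the project's notebooks.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, floored at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = max(0.0, ax2 - ax1) * max(0.0, ay2 - ay1)
    area_b = max(0.0, bx2 - bx1) * max(0.0, by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0, 0, 10, 10)
    ground_truth = (5, 0, 15, 10)
    print(f"IoU = {iou(predicted, ground_truth):.3f}")
```

A prediction is usually counted as correct when its IoU with a ground-truth box exceeds a chosen cutoff, and metrics such as mean average precision are built on that matching step.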
