This project implements a logo detection and recognition system using the RF-DETR model trained on the LogoDet-3K dataset. The goal is to detect bounding boxes for logos in images, with the model generalizing to logos beyond those in the training set. A Gradio-based UI allows users to upload images, view detected logo bounding boxes, and send cropped logo regions to the Gemini API for recognition.
- Python 3.8+
- Git
- Gemini API key (to be stored in a .env file)
- Clone the repository:
  git clone https://github.com/Gk-rohan/Logovision
  cd Logovision
- Install dependencies:
  pip install -r requirements.txt
- Download the RF-DETR model weights:
  python download_weights.py
- Set up the Gemini API key:
  - Create a .env file in the project root with the following content:
    GEMINI_API_KEY=your-api-key
  - The application loads the .env file automatically and reads the key with os.getenv.
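The key lookup described above can be sketched as follows. This is a minimal, standard-library illustration rather than the project's actual code: `load_env_file` and `get_gemini_key` are hypothetical helpers, and a real application would more likely use a package such as python-dotenv to populate the environment before calling os.getenv.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a .env-style file into os.environ.

    Hypothetical stand-in for a dotenv loader; variables that are
    already set in the environment are left untouched.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip("'\""))


def get_gemini_key():
    """Return GEMINI_API_KEY, or raise a clear error if it is missing."""
    key = os.getenv("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to the .env file.")
    return key
```

Failing early with an explicit message makes a missing key easier to diagnose than an authentication error returned later by the API.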
Open the training notebook: notebooks/rf-detr_training.ipynb
- Launch the Gradio UI:
  python gradio_demo.py
- Open the provided URL (e.g., http://127.0.0.1:7860) in your browser.
- Upload an image to detect logos. The UI displays the image annotated with bounding boxes, sends the cropped logo regions to the Gemini API for recognition, and shows the recognized brand names. Try a lower confidence threshold to see more logos on harder images.
- The gradio_demo.py script handles both logo detection and recognition:
  - Detection: Uses the trained RF-DETR model to generate bounding boxes.
  - Recognition: Crops detected regions and sends them to the Gemini API for logo identification.
- The script loads the Gemini API key from the .env file using os.getenv.
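The step between detection and recognition (keeping confident boxes and turning them into crop regions) can be illustrated with a small sketch. It assumes detections arrive as plain `(x_min, y_min, x_max, y_max, confidence)` tuples, which is an illustrative format and not necessarily what RF-DETR returns; `select_crop_boxes` is a hypothetical helper, not a function from gradio_demo.py.

```python
def select_crop_boxes(detections, image_size, threshold=0.5):
    """Return integer crop boxes for detections at or above the threshold.

    detections: iterable of (x_min, y_min, x_max, y_max, confidence)
    tuples (an assumed format for illustration).
    image_size: (width, height) of the source image in pixels.
    """
    width, height = image_size
    boxes = []
    for x_min, y_min, x_max, y_max, confidence in detections:
        if confidence < threshold:
            continue
        # Clamp to the image so a crop never reaches outside the frame.
        left = max(0, int(x_min))
        top = max(0, int(y_min))
        right = min(width, int(x_max))
        bottom = min(height, int(y_max))
        # Drop boxes that collapse to zero area after clamping.
        if right > left and bottom > top:
            boxes.append((left, top, right, bottom))
    return boxes


detections = [
    (10.0, 20.0, 110.0, 80.0, 0.91),
    (-5.0, 300.0, 60.0, 500.0, 0.42),
    (200.0, 50.0, 260.0, 90.0, 0.18),
]
high = select_crop_boxes(detections, image_size=(640, 480), threshold=0.5)
low = select_crop_boxes(detections, image_size=(640, 480), threshold=0.3)
```

Here `high` keeps only the first detection, while `low` also keeps the second one, clamped to the image bounds. Each tuple is in (left, top, right, bottom) order, which is the order Pillow's `Image.crop` expects.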
The Gradio demo is hosted publicly at: https://huggingface.co/spaces/gupta005/RF_DETR_LOGO_DET.
- Model: RF-DETR (GitHub)
  - A transformer-based object detection model fine-tuned for logo detection.
  - Trained to predict bounding boxes for logos in images.
  - Generalizes to unseen logos through robust feature learning on the diverse LogoDet-3K dataset.
- Dataset: LogoDet-3K
  - Contains 3,000 logo classes with annotated bounding boxes.
  - Used for training and validation to ensure model generalization.
- Gemini API:
  - Used for logo recognition on cropped bounding box regions.
  - API key is securely managed via the .env file and loaded with os.getenv.
Below are sample images showing bounding boxes generated by the RF-DETR model for two test images:
These images are located in the results/ folder and demonstrate the model's ability to detect logos accurately.
- The LogoDet-3K dataset is representative of diverse logo types, enabling generalization.
- The Gemini API provides reliable logo recognition for cropped regions.
- Users have access to a GPU for faster training and inference.
- Input images are of reasonable quality and resolution for accurate detection.
- Generalization: Some logos not in the training set were harder to detect and required careful hyperparameter tuning in the notebook. Reducing the confidence threshold at inference time helps surface more of them.
- API Rate Limits: The Gemini API has rate limits, which may affect real-time performance for multiple requests.
- Model Size: RF-DETR large is computationally intensive, which may pose challenges for deployment on low-resource devices.
- RT-DETRv2 Attempt: I attempted to train the RT-DETRv2 model using the notebook notebooks/rt-detrv2_training.ipynb, but training was not completed due to insufficient GPU memory.
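For the API rate limits mentioned above, one common mitigation is to retry failed requests with exponential backoff. The sketch below is an assumption about how that could look, not something the project currently does: `call_with_backoff` is a hypothetical helper, and `request` stands for any zero-argument callable that wraps a single recognition call.

```python
import random
import time


def call_with_backoff(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request() and retry with exponential backoff plus jitter.

    request: zero-argument callable wrapping one recognition call
    (hypothetical; the real client call is not shown here).
    sleep: injectable for testing; defaults to time.sleep.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except Exception:
            # A real client should catch only its rate-limit error type.
            if attempt == max_attempts - 1:
                raise
            # Delays grow as 1x, 2x, 4x, ... of base_delay, plus small jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)
```

The jitter spreads out retries when several crops from the same image are sent at once, so they are less likely to hit the limit again at the same moment.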
- Model Training: Consider further fine-tuning the RF-DETR model for more accurate logo detection in real-world scenarios.
- Model Benchmarks: Conduct benchmarks to compare RF-DETR with other state-of-the-art object detection models.
- Data Augmentation: Apply advanced augmentation techniques (e.g., rotation, color jitter) in the training notebook to improve model robustness.
- Model Optimization: Explore model pruning or quantization to reduce inference time for deployment.
- Alternative APIs: Test other vision APIs/Models (e.g., Google Vision, AWS Rekognition) for improved recognition accuracy.
- Active Learning: Use active learning to iteratively improve the model by selecting challenging samples for retraining.
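As an illustration of the active learning idea in the list above, the sketch below uses least-confidence sampling, one common heuristic for choosing challenging samples. It is an assumption about how the selection could work, not part of the project: `pick_uncertain_images` is a hypothetical helper, and the input format (image name mapped to a list of detection confidences) is chosen only for the example.

```python
def pick_uncertain_images(predictions, budget=10):
    """Return up to `budget` image names with the lowest best confidence.

    predictions: dict mapping image name -> list of detection confidences
    (assumed format). Images with no detections score 0.0 and are
    therefore selected first.
    """
    scored = [
        (max(confidences, default=0.0), name)
        for name, confidences in predictions.items()
    ]
    # Lowest best-confidence first; ties are broken by image name.
    scored.sort()
    return [name for _, name in scored[:budget]]


predictions = {
    "a.jpg": [0.95, 0.40],
    "b.jpg": [0.35],
    "c.jpg": [],
    "d.jpg": [0.60],
}
to_label = pick_uncertain_images(predictions, budget=2)
```

With these inputs, `to_label` contains "c.jpg" and "b.jpg", the two images the model is least sure about. These would be the first candidates for manual annotation and retraining.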

