Logovision detects and recognizes logos in images using the RF-DETR model and LogoDet-3K dataset, with a Gradio UI for real-time detection and brand identification via the Gemini API.

Gk-rohan/Logovision

Logovision - Logo Detection and Recognition

Overview

This project implements a logo detection and recognition system using the RF-DETR model trained on the LogoDet-3K dataset. The goal is to detect bounding boxes for logos in images, with the model generalizing to logos beyond those in the training set. A Gradio-based UI allows users to upload images, view detected logo bounding boxes, and send cropped logo regions to the Gemini API for recognition.

Setup and Execution Instructions

Prerequisites

  • Python 3.8+
  • Git
  • Gemini API key (to be stored in a .env file)

Installation

  1. Clone the repository:

    git clone https://github.com/Gk-rohan/Logovision
    cd Logovision
  2. Install dependencies:

    pip install -r requirements.txt
  3. Download the RF-DETR model weights:

    python download_weights.py
  4. Set up the Gemini API key:

    • Create a .env file in the project root with the following content:
      GEMINI_API_KEY=your-api-key
    • The application reads the key from the environment with os.getenv once the .env file has been loaded at startup.
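
Note that os.getenv only reads variables already present in the process environment, so something has to load the .env file first. The sketch below shows one way that step could look. It uses a small stdlib-only loader as a stand-in for whatever the application actually uses (python-dotenv's load_dotenv is a common choice, but that is an assumption here). The helper names load_env_file and get_gemini_key are hypothetical and do not come from the repository.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a .env file into os.environ (hypothetical helper)."""
    env_path = Path(path)
    if not env_path.is_file():
        return False
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Do not override variables already set in the real environment.
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))
    return True


def get_gemini_key():
    """Return the Gemini API key, or fail early with a clear message."""
    key = os.getenv("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; check your .env file")
    return key
```

Failing early when the key is missing gives a clearer error than a rejected API request later in the pipeline.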

Local Training

Open the training notebook:

notebooks/rf-detr_training.ipynb

Local UI Demo

  1. Launch the Gradio UI:
    python gradio_demo.py
  2. Open the provided URL (e.g., http://127.0.0.1:7860) in your browser.
  3. Upload an image to detect logos. The UI displays the frame annotated with bounding boxes, sends the cropped logo regions to the Gemini API for recognition, and shows the recognized brand names. Try a lower confidence threshold to see more logos on harder images.

GenAI Script

  • The gradio_demo.py script handles both logo detection and recognition:
    • Detection: Uses the trained RF-DETR model to generate bounding boxes.
    • Recognition: Crops detected regions and sends them to the Gemini API for logo identification.
  • The script reads the Gemini API key, defined in the .env file, with os.getenv.
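
The step between detection and recognition (filtering by confidence, then cropping) can be sketched as follows. This is a minimal illustration, not the code in gradio_demo.py: the Detection container, the crop_boxes function, and the default threshold of 0.5 are all assumptions, and the real model output format may differ.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    # Hypothetical container; the real model output format may differ.
    x1: float
    y1: float
    x2: float
    y2: float
    confidence: float


def crop_boxes(detections, width, height, threshold=0.5):
    """Return integer (left, top, right, bottom) boxes for confident detections."""
    boxes = []
    for det in detections:
        # Lowering the threshold keeps more (and less certain) detections.
        if det.confidence < threshold:
            continue
        # Clamp to the image bounds so the crop never reads outside the frame.
        left = max(0, min(int(det.x1), width))
        top = max(0, min(int(det.y1), height))
        right = max(0, min(int(det.x2), width))
        bottom = max(0, min(int(det.y2), height))
        # Drop degenerate boxes that collapse to zero area after clamping.
        if right > left and bottom > top:
            boxes.append((left, top, right, bottom))
    return boxes
```

Each returned tuple has the (left, upper, right, lower) layout that Pillow's Image.crop accepts, so the cropped regions could be produced with image.crop(box) before being sent for recognition.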

Public URL

The Gradio demo is hosted publicly at: https://huggingface.co/spaces/gupta005/RF_DETR_LOGO_DET.

Model and Dataset Details

  • Model: RF-DETR (GitHub)
    • A transformer-based object detection model fine-tuned for logo detection.
    • Trained to predict bounding boxes for logos in images.
    • Generalizes to unseen logos through robust feature learning on the diverse LogoDet-3K dataset.
  • Dataset: LogoDet-3K
    • Contains 3,000 logo classes with annotated bounding boxes.
    • Used for training and validation to ensure model generalization.
  • Gemini API:
    • Used for logo recognition on cropped bounding box regions.
    • The API key is kept out of the source code in the .env file and read with os.getenv.

Sample Results

Below are sample images showing bounding boxes generated by the RF-DETR model for two test images:

  • img1-output.jpeg: Sample 1
  • img2-output.jpeg: Sample 2

These images are located in the results/ folder and demonstrate the model's ability to detect logos accurately.

Assumptions

  • The LogoDet-3K dataset is representative of diverse logo types, enabling generalization.
  • The Gemini API provides reliable logo recognition for cropped regions.
  • Users have access to a GPU for faster training and inference.
  • Input images are of reasonable quality and resolution for accurate detection.

Challenges

  • Generalization: Some logos not in the training set were harder to detect and required careful hyperparameter tuning in the notebook. Reducing the confidence threshold at inference time helps surface them.
  • API Rate Limits: The Gemini API has rate limits, which may affect real-time performance for multiple requests.
  • Model Size: RF-DETR large is computationally intensive, which may pose challenges for deployment on low-resource devices.
  • RT-DETRv2 Attempt: I attempted to train the RT-DETRv2 model using the notebook notebooks/rt-detrv2_training.ipynb, but training could not be completed because of insufficient GPU memory.
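
One common way to soften the rate-limit problem is to retry failed recognition calls with exponential backoff. The sketch below is an assumption about how that could be done, not something the repository implements. call_with_backoff is a hypothetical helper, and func stands for whatever function sends a cropped region to the Gemini API.

```python
import random
import time


def call_with_backoff(func, *args, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call func(*args), retrying with exponential backoff plus jitter on failure."""
    for attempt in range(retries + 1):
        try:
            return func(*args)
        except Exception:
            # A real client should catch only its rate-limit error type here.
            # Out of retries: surface the last error to the caller.
            if attempt == retries:
                raise
            # Delay doubles each attempt (base, 2*base, 4*base, ...) plus jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1 * base_delay)
            sleep(delay)
```

The sleep parameter is injectable so the wrapper can be exercised without real waiting. With several logos per image, backoff trades latency for reliability, so the UI would feel slower when the limit is hit.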

Potential Improvements

  • Model Training: Fine-tune the RF-DETR model further to improve logo detection accuracy in real-world scenarios.
  • Model Benchmarks: Conduct benchmarks to compare RF-DETR with other state-of-the-art object detection models.
  • Data Augmentation: Apply advanced augmentation techniques (e.g., rotation, color jitter) in the training notebook to improve model robustness.
  • Model Optimization: Explore model pruning or quantization to reduce inference time for deployment.
  • Alternative APIs: Test other vision APIs/Models (e.g., Google Vision, AWS Rekognition) for improved recognition accuracy.
  • Active Learning: Use active learning to iteratively improve the model by selecting challenging samples for retraining.
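
Benchmarks that compare detectors usually score a predicted box against a ground-truth box by intersection over union (IoU). A minimal sketch of that metric follows, assuming boxes in (x1, y1, x2, y2) pixel format. The function is illustrative and not part of the repository.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Corners of the overlapping rectangle, if there is one.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Negative width or height means the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A prediction is typically counted as correct when its IoU with a ground-truth box exceeds a chosen cutoff such as 0.5, which gives a common basis for comparing RF-DETR with other models.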
