Omni3D with Custom Object Tracker

Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild
This repository extends the Omni3D model by integrating a custom object tracking mechanism, enhancing 3D detection with continuous tracking across frames.

Table of Contents:

  1. Overview
  2. Installation
  3. Running the Demo
  4. Running the Demo with the Tracker
  5. Training
  6. Inference
  7. Tracker Implementation
  8. Citing Omni3D
  9. Chat with Phi-3 Vision
  10. License
  11. Contributing

Overview

Omni3D, originally developed by Garrick Brazil et al., is a state-of-the-art model for 3D object detection. This project incorporates a custom tracking mechanism that extends its detection capabilities, enabling real-time object tracking across a variety of environments.

For more details on the Omni3D project, refer to the original repository.

Installation

Follow the steps below to set up the environment:

# Create and activate a new conda environment
conda create -n cubercnn python=3.8
conda activate cubercnn

# Install main dependencies
conda install -c fvcore -c iopath -c conda-forge -c pytorch3d -c pytorch fvcore iopath pytorch3d pytorch=1.8 torchvision=0.9.1 cudatoolkit=10.1

# Install additional dependencies
pip install cython opencv-python
pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html
conda install -c conda-forge scipy seaborn
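
After installation, a quick import check (an illustrative snippet, not part of the repository) confirms the pinned dependencies resolve and CUDA is visible:

# check_env.py -- illustrative sanity check, not part of the repository
import torch
import detectron2
import pytorch3d

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("detectron2", detectron2.__version__)
print("pytorch3d", pytorch3d.__version__)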

Running the Demo

# Download sample images
sh demo/download_demo_COCO_images.sh

# Run the 3D detection demo
python demo/demo_detection.py \
--config-file cubercnn://omni3d/cubercnn_DLA34_FPN.yaml \
--input-folder "datasets/coco_examples" \
--threshold 0.25 --display \
MODEL.WEIGHTS cubercnn://omni3d/cubercnn_DLA34_FPN.pth \
OUTPUT_DIR output/demo_with_tracking

Running the Demo with the Tracker

# Run the tracker demo on a sample video
python demo/demo_tracker.py \
--config-file cubercnn://omni3d/cubercnn_DLA34_FPN.yaml \
--input-video "demo/video_indoor2.mp4" \
--threshold 0.40 --display 

Training

To train the Omni3D model with tracking:

python tools/train_net.py \
  --config-file configs/Base_Omni3D.yaml \
  OUTPUT_DIR output/omni3d_with_tracking

Inference

To evaluate a pretrained model:

python tools/train_net.py \
  --eval-only --config-file cubercnn://omni3d/cubercnn_DLA34_FPN.yaml \
  MODEL.WEIGHTS cubercnn://omni3d/cubercnn_DLA34_FPN.pth \
  OUTPUT_DIR output/evaluation

Tracker Implementation

This project adds an object tracker to the original Omni3D model. The tracker associates detected objects across frames using a custom matching algorithm (sketched after the feature list below) based on:

  • 3D bounding box information
  • GIoU and 3D IoU computation
  • Object centers
  • Category types
  • Chamfer Distance

Key features:

  • Custom matching logic for continuous tracking across frames
  • High-cost match handling for challenging detections
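
The matching logic lives in the tracker code; the snippet below is only a minimal sketch of how such a matcher can combine 3D center distance, 3D IoU, and category gating into a cost matrix solved with the Hungarian algorithm. It assumes box corners in PyTorch3D's (8, 3) corner convention; the function name, weights, and thresholds are illustrative rather than the repository's actual API, and the GIoU and Chamfer distance terms are omitted for brevity.

# Illustrative matching sketch; names, weights, and thresholds are hypothetical,
# not the repository's actual implementation.
import torch
from pytorch3d.ops import box3d_overlap           # pairwise 3D IoU of oriented boxes
from scipy.optimize import linear_sum_assignment  # Hungarian algorithm

FORBIDDEN = 1e6  # sentinel cost for category mismatches
MAX_COST = 5.0   # matches costlier than this are rejected ("high-cost match handling")

def match_tracks(track_corners, track_cats, det_corners, det_cats,
                 w_center=0.5, w_iou=0.5):
    """track_corners: (T, 8, 3) and det_corners: (D, 8, 3) float tensors of box
    corners in the PyTorch3D ordering; *_cats are lists of category ids.
    Returns matched (track_idx, det_idx) pairs and unmatched detection indices."""
    _, iou = box3d_overlap(track_corners, det_corners)   # (T, D) 3D IoU
    t_centers = track_corners.mean(dim=1)                # (T, 3) box centers
    d_centers = det_corners.mean(dim=1)                  # (D, 3)
    center_dist = torch.cdist(t_centers, d_centers)      # (T, D) Euclidean distance

    cost = (w_center * center_dist + w_iou * (1.0 - iou)).numpy()
    for i, tc in enumerate(track_cats):                  # forbid cross-category matches
        for j, dc in enumerate(det_cats):
            if tc != dc:
                cost[i, j] = FORBIDDEN

    rows, cols = linear_sum_assignment(cost)             # optimal one-to-one assignment
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] < MAX_COST]
    matched = {c for _, c in matches}
    unmatched = [j for j in range(len(det_cats)) if j not in matched]
    return matches, unmatched

In a typical tracking-by-detection loop, unmatched detections would spawn new tracks, and tracks that fail to match for several consecutive frames would be retired.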


Citing Omni3D

Please cite the original Omni3D paper:

@inproceedings{brazil2023omni3d,
  author =       {Garrick Brazil and Abhinav Kumar and Julian Straub and Nikhila Ravi and Justin Johnson and Georgia Gkioxari},
  title =        {{Omni3D}: A Large Benchmark and Model for {3D} Object Detection in the Wild},
  booktitle =    {CVPR},
  address =      {Vancouver, Canada},
  month =        {June},
  year =         {2023},
  organization = {IEEE},
}

Chat with Phi-3 Vision

This folder is part of the Monocular 3D Object Detection and Tracking project and contains a Streamlit-based application that lets users interact with a vision model, Phi-3 Vision, to analyze images and obtain detailed descriptions.

Overview

The application is designed to:

  • Receive images from a server via a socket connection
  • Allow users to submit specific queries about the images
  • Leverage Phi-3 Vision to provide detailed descriptions of image content

Key features:

  • Real-time communication with a server for image acquisition
  • Text-based interaction for specific image details
  • Automatic frame fetching upon query submission
  • Customizable chat interface that resets after each response
  • Responsive design for smooth user experience
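
As a rough illustration of this flow, the sketch below assumes the server sends each frame as a 4-byte big-endian length prefix followed by JPEG bytes; the host, port, and the query_phi3_vision helper are hypothetical, not the app's actual code.

# Illustrative sketch only; the real app.py differs. Assumes a length-prefixed
# JPEG stream from the image server.
import socket
import struct

import streamlit as st

SERVER_ADDR = ("localhost", 9999)  # hypothetical host/port

def recv_exact(sock, n):
    """Read exactly n bytes from the socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("server closed the connection")
        buf += chunk
    return buf

def fetch_frame(addr=SERVER_ADDR):
    """Fetch one frame: 4-byte big-endian length, then JPEG bytes."""
    with socket.create_connection(addr) as sock:
        (length,) = struct.unpack(">I", recv_exact(sock, 4))
        return recv_exact(sock, length)

st.title("Chat with Phi-3 Vision")

query = st.chat_input("Ask something about the current frame")
if query:
    frame = fetch_frame()  # a fresh frame is fetched when a query is submitted
    st.image(frame, caption="Current frame")
    # answer = query_phi3_vision(frame, query)  # hypothetical Phi-3 Vision wrapper
    # st.write(answer)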

Installation

  1. Clone the repository:

    git clone https://github.com/ramonatarantino/mocular-3d-object-detection-tracking.git
    cd mocular-3d-object-detection-tracking/chat-with-phi-3-vision
  2. Install dependencies:

    pip install -r requirements.txt
  3. Run the application:

    streamlit run app.py
  4. Start the server: Ensure the server providing the image stream is running.

License (Chat with Phi-3 Vision)

This project is licensed under the MIT License.
The server code is based on the work from this repository.

Contributing

Contributions are welcome! Feel free to open issues or submit pull requests.

