Video Object Detection (YOLOv5) and Multimodal Vision-Language Model (LLaVA-13B)
This project integrates YOLOv5 (You Only Look Once) object detection with a multimodal large language model (LLaVA-13B), combining fast, state-of-the-art detection with language-based reasoning to produce detailed contextual descriptions of the detected objects. It is aimed at applications such as autonomous systems, surveillance, and AI-driven analytics.
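A minimal sketch of how the two stages might be glued together: YOLOv5-style detections are summarized into a text prompt that can then be sent to LLaVA alongside the frame. The detection dict format, field names, and prompt wording here are illustrative assumptions, not the project's actual code.

```python
# Sketch: convert YOLOv5-style detections into a prompt for a
# vision-language model such as LLaVA-13B. The detection format
# (label / confidence / box) mirrors what YOLOv5's results expose,
# but the exact schema and wording are assumptions for illustration.

def detections_to_prompt(detections):
    """Build a text prompt summarizing detected objects.

    `detections` is a list of dicts with keys 'label', 'confidence',
    and 'box' (x1, y1, x2, y2 pixel coordinates).
    """
    if not detections:
        return "No objects were detected. Describe the scene."
    lines = []
    for d in detections:
        x1, y1, x2, y2 = d["box"]
        lines.append(
            f"- {d['label']} (confidence {d['confidence']:.2f}) "
            f"at box ({x1}, {y1}, {x2}, {y2})"
        )
    return (
        "The object detector found:\n"
        + "\n".join(lines)
        + "\nDescribe what these objects are doing and their context."
    )

# Example usage with mock detections:
dets = [
    {"label": "person", "confidence": 0.91, "box": (10, 20, 110, 220)},
    {"label": "dog", "confidence": 0.78, "box": (120, 150, 200, 230)},
]
print(detections_to_prompt(dets))
```

In the full pipeline, this prompt would accompany the video frame (or object crops) into the vision-language model, which returns a contextual description grounded in the detector's output.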
Fine-tuned model checkpoints are uploaded to the Hugging Face Hub: https://huggingface.co/AgamP/LLM_Custom_1/tree/main