-
Let me convert to Q&A.
-
@Rasantis based on your question, I think it is a detection issue. Normally, if tracking IDs change too often, it's because the detections themselves change too much. I also see some low confidence scores, so it could just be that your training data isn't enough, IMO.
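One way to act on this, sketched below, is to filter out low-confidence detections before they reach the tracker. This is a minimal sketch, not a tuned fix: the 0.5 cutoff is illustrative, and `results` / `byte_tracker` are assumed to come from the script further down in this thread.

```python
import supervision as sv

# Hedged sketch: drop low-confidence detections before tracking so ByteTrack
# is only fed boxes the detector is reasonably sure about. The 0.5 cutoff is
# illustrative; `results` is a single Ultralytics result object.
detections = sv.Detections.from_ultralytics(results)
detections = detections[detections.confidence > 0.5]
detections = byte_tracker.update_with_detections(detections)
```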
-
Sure, here are the videos. I'm sending 3 videos: the first is the normal video, the second is just running the inference, and the third is the line counter with tracking: 1.mp4 2.mp4 3.mp4
-
Hi @Rasantis. In my experience, the orientation of the camera matters quite a bit to the performance of the detector. If you don't have much training data from that angle, your detector will not perform well. (Think about how different a person looks from the top than from the side.)
I have experimented with many different multi-object trackers and have found that no matter which tracker is used, if your detector performs poorly, the tracker will perform poorly (garbage in, garbage out).
But if you want to improve the performance of your tracker without getting more training data from that angle, I would first try reducing the 'minimum_matching_threshold' and ensuring that the frame rate passed to the tracker matches your video. The frame rate of your video looks quite low, which means the tracker's predictions won't be as accurate: the distance an object moves from one frame to the next is large compared to a high-frame-rate video. The first change lowers the threshold for the overlap measurement between the tracker's prediction and the detector's input. The second change helps the tracker know how long to keep predicting tracks after they are lost.
Be aware that these changes probably won't help much without a better detector. The tracker is really designed for a majority of frames to have correct detections, with only a few misses sprinkled in. Try to get the confidence of the detections into the .75-.90 range for best performance.
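A minimal sketch of those two changes, assuming a recent supervision release where `ByteTrack` exposes `minimum_matching_threshold` (older releases, like the script below, call it `match_thresh`); the values are illustrative, not recommendations:

```python
import supervision as sv

# Hedged sketch: lower the IoU matching threshold and pass the video's real
# FPS. Keyword names assume a recent supervision release; 0.6 and 10 are
# illustrative values only.
byte_tracker = sv.ByteTrack(
    minimum_matching_threshold=0.6,  # default 0.8; lower tolerates bigger frame-to-frame jumps
    frame_rate=10,                   # set to the actual FPS of the source video
)
```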
-
Question
Hello Supervision, I'm working with your line crossing counting solution. I created the .py script below, and it works in common scenarios with the camera in the ideal position (which is great). But I'm working in a scenario where the camera is mounted on the ceiling, pointing at the floor. It can still detect objects, but the tracking does not remain stable, apparently because it relies on a center point for each ID, which I think is hard to maintain since the image is not three-dimensional. How could I adapt the tracking, or this center anchor, for this type of scenario? The model I'm using here is already a custom model.
Additional
I'm using this code to run the inference/counting:
```python
import os
import numpy as np
import supervision as sv
from ultralytics import YOLO

# Define the source and target video paths
# HOME = os.getcwd()
SOURCE_VIDEO_PATH = os.path.join("Identificacao-vendedor.mp4")
TARGET_VIDEO_PATH = SOURCE_VIDEO_PATH.replace(".mp4", "_output.mp4")

# Initialize the YOLO model
MODEL = "best (5).pt"
model = YOLO(MODEL)
model.fuse()

# Define the classes of interest (e.g. ID 0 for 'person')
selected_classes = [0, 1]

# Line zone and annotator setup
LINE_START = sv.Point(300, 600)
LINE_END = sv.Point(292, 50)
line_zone = sv.LineZone(start=LINE_START, end=LINE_END)
line_zone_annotator = sv.LineZoneAnnotator(thickness=4, text_thickness=4, text_scale=1)

# ByteTrack tracker configuration
byte_tracker = sv.ByteTrack(track_thresh=0.05, track_buffer=30, match_thresh=0.8, frame_rate=24)

# Additional annotators
box_annotator = sv.BoxAnnotator(thickness=3, text_thickness=0, text_scale=0.5)
trace_annotator = sv.TraceAnnotator(thickness=4, trace_length=50)

# Callback to process each frame
def callback(frame: np.ndarray, index: int) -> np.ndarray:
    results = model(frame, verbose=True, device='0', conf=0.015, iou=0.02, imgsz=1280)[0]
    detections = sv.Detections.from_ultralytics(results)
    detections = detections[np.isin(detections.class_id, selected_classes)]
    detections = byte_tracker.update_with_detections(detections)
    line_zone.trigger(detections)
    # Annotate and return the frame so the sink does not receive None
    annotated_frame = box_annotator.annotate(scene=frame.copy(), detections=detections)
    annotated_frame = trace_annotator.annotate(scene=annotated_frame, detections=detections)
    return line_zone_annotator.annotate(annotated_frame, line_counter=line_zone)

# Function to process the video
def process_video(source_path: str, target_path: str, callback, debug: bool = False) -> None:
    source_video_info = sv.VideoInfo.from_video_path(video_path=source_path)
    with sv.VideoSink(target_path=target_path, video_info=source_video_info) as sink:
        for index, frame in enumerate(sv.get_video_frames_generator(source_path=source_path)):
            result_frame = callback(frame, index)
            sink.write_frame(frame=result_frame)
            if debug and index == 0:  # stop after the first frame when debugging
                break

# Run the video processing
process_video(source_path=SOURCE_VIDEO_PATH, target_path=TARGET_VIDEO_PATH, callback=callback, debug=False)
```
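One usage note on the frame-rate point raised above: rather than hard-coding `frame_rate=24`, the FPS can be read from the video itself. A minimal sketch, using the same older `ByteTrack` keywords as the script:

```python
import supervision as sv

SOURCE_VIDEO_PATH = "Identificacao-vendedor.mp4"

# Hedged sketch: derive frame_rate from the source video instead of
# hard-coding 24; keyword names match the older ByteTrack API used above.
video_info = sv.VideoInfo.from_video_path(video_path=SOURCE_VIDEO_PATH)
byte_tracker = sv.ByteTrack(
    track_thresh=0.05,
    track_buffer=30,
    match_thresh=0.8,
    frame_rate=video_info.fps,  # the video's actual frames per second
)
```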