
Calibrate confidence scores #273

Open
sfmig opened this issue Aug 16, 2024 · 2 comments
Labels
enhancement New optional feature

Comments

@sfmig
Contributor

sfmig commented Aug 16, 2024

Is your feature request related to a problem? Please describe.
We usually interpret confidence scores as a proxy for the error in the keypoint predictions. However, it is well known that neural networks tend to be "overly confident" in their predictions. For example, for the multiclass classification case, reference [1] says:

the softmax output of modern neural networks, which typically is interpreted as a categorical distribution in classification, is poorly calibrated.

It would be very useful to be able to produce calibrated confidence scores of the keypoint predictions. That would allow us to compare results across frameworks, better filter high/low confidence values, and better interpret model performance.
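As an aside, a standard remedy for this overconfidence in classification is temperature scaling: dividing the logits by a scalar T > 1 before the softmax, with T fitted on held-out data. A minimal sketch (the logit values are made up for illustration) of how temperature softens an overconfident distribution:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Softmax with temperature; T > 1 softens the distribution."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([5.0, 1.0, 0.5])
# Uncalibrated: the top class gets a near-1 probability
p_raw = softmax(logits)
# With T = 3 the same logits yield a less "confident" distribution
p_scaled = softmax(logits, temperature=3.0)
```

In practice T is chosen to minimise negative log-likelihood on a validation set; the point here is only that the ranking of classes is preserved while the claimed confidence drops.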

Describe the solution you'd like
We could consider having a method in movement that calibrates confidence scores.

We could implement something similar to what keypoint-moseq does. They have functionality to fit a linear model to the relationship between keypoint error and confidence:

[the function] creates a widget for interactive annotation in jupyter lab. Users mark correct keypoint locations for a sequence of frames, and a regression line is fit to the log(confidence), log(error) pairs obtained through annotation. The regression coefficients are used during modeling to set a prior on the noise level for each keypoint on each frame.

Describe alternatives you've considered

Additional context
Nice explanations exist for the classification case (note that pose estimation is a regression problem, not a classification one). From a quick search I found:

  • [1] this paper, on the calibration of human pose estimation. They propose a neural network that learns specific adjustments for a given pose estimator. It seems out of scope for movement, but may be a useful read to understand the problem better.
  • this paper, on calibration for object detection, which could be similarly useful.
@sfmig sfmig added the enhancement New optional feature label Aug 16, 2024
@sfmig
Contributor Author

sfmig commented Aug 27, 2024

This EuroSciPy tutorial may be useful for this work.

@sfmig
Contributor Author

sfmig commented Mar 3, 2025

Note that pose estimation is a regression problem, not a classification one, so we probably want to look into ways of applying these calibration methods to a regression setting in a reasonable way.

For example, maybe we can "transform" the problem into a classification one by deeming a keypoint correctly predicted if it lies close enough to the ground-truth label. This seems reasonable at first glance.
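A minimal sketch of that idea (the 5 px threshold, function names, and binning scheme are all illustrative assumptions): binarise correctness with a PCK-style distance threshold, then compare binned confidence against empirical accuracy, as one would in a reliability diagram:

```python
import numpy as np

def to_binary_correctness(pred_xy, true_xy, threshold_px=5.0):
    """Label a prediction 'correct' if within threshold of ground truth
    (a PCK-style criterion; the 5 px threshold is an arbitrary example)."""
    err = np.linalg.norm(np.asarray(pred_xy) - np.asarray(true_xy), axis=-1)
    return (err < threshold_px).astype(float)

def reliability_bins(confidence, correct, n_bins=10):
    """Mean confidence vs. empirical accuracy per confidence bin; for a
    well-calibrated score the two should match within each bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(confidence, edges) - 1, 0, n_bins - 1)
    stats = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            stats.append((confidence[mask].mean(), correct[mask].mean()))
    return stats
```

Once correctness is binary, standard classification-calibration tools (Platt scaling, isotonic regression, etc.) could in principle be applied to the confidence scores, though the result would depend on the chosen distance threshold.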
