Fix training-loss double-shift in Florence2 and CohereASR #46916

Closed
muhamedfazalps wants to merge 4 commits into huggingface:main from muhamedfazalps:fix/training-loss-double-shift-florence2-cohereasr

Conversation

@muhamedfazalps

In Florence2 and CohereASR, self.loss_function uses ForCausalLMLoss, which shifts the labels internally. The forward method has already shifted labels right to build decoder_input_ids, so this second shift scores the logits at each position against labels[..., 1:] (the next token) instead of the label that position is supposed to predict.

This is the same pattern as the Moonshine fix in #46784. This PR replaces self.loss_function with a plain CrossEntropyLoss so that the loss is computed against the original labels without a second shift.

Fixes #46897 (Florence2) and #46894 (CohereASR).
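
To make the misalignment concrete, here is a minimal pure-Python sketch of the double shift described above. It does not use the transformers code: `shift_tokens_right`, `cross_entropy`, and `causal_lm_style_loss` are hypothetical stand-ins written for illustration, and the assumption that the causal-LM helper pads the labels with the ignore index and drops the first one is taken from the description in this PR, not copied from the library source.

```python
import math

IGNORE_INDEX = -100  # assumed ignore value, matching the usual PyTorch convention


def shift_tokens_right(labels, decoder_start_token_id):
    """Build decoder inputs: prepend the start token and drop the last label."""
    return [decoder_start_token_id] + labels[:-1]


def cross_entropy(logits, targets, ignore_index=IGNORE_INDEX):
    """Mean per-token cross-entropy, skipping positions marked with ignore_index."""
    losses = []
    for row, target in zip(logits, targets):
        if target == ignore_index:
            continue
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        losses.append(log_z - row[target])
    return sum(losses) / len(losses)


def causal_lm_style_loss(logits, labels, ignore_index=IGNORE_INDEX):
    """Hypothetical stand-in for a causal-LM loss: shift labels left, then score."""
    shifted_labels = labels[1:] + [ignore_index]
    return cross_entropy(logits, shifted_labels, ignore_index)


labels = [5, 7, 2]
vocab_size = 8

# First shift, done in forward(): the decoder sees the previous label at each step.
decoder_input_ids = shift_tokens_right(labels, decoder_start_token_id=0)

# Logits from an idealized model that predicts labels[t] at every position t.
perfect_logits = [
    [10.0 if token == label else 0.0 for token in range(vocab_size)]
    for label in labels
]

# Plain cross-entropy against the original labels: close to zero.
plain_loss = cross_entropy(perfect_logits, labels)

# Second shift inside the loss: the same logits are scored against labels[1:],
# so an already-correct model is penalized heavily.
double_shift_loss = causal_lm_style_loss(perfect_logits, labels)

print(decoder_input_ids, plain_loss, double_shift_loss)
```

In this toy setup the idealized model gets a near-zero loss only when scored against the unshifted labels, which is the behavior the switch to a plain CrossEntropyLoss is meant to restore.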

muhamedfazalps and others added 3 commits June 26, 2026 16:36
Replace self.loss_function (ForCausalLMLoss) with plain CrossEntropyLoss
to prevent double-shifting of labels. The forward method already shifts
labels into decoder_input_ids, so the loss must be computed against the
original labels without a second shift.

Same pattern as the Moonshine fix in huggingface#46784.

Fixes huggingface#46897 (Florence2) and huggingface#46894 (CohereASR)
@github-actions

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: cohere_asr, florence2

@github-actions

Contributor

CI Dashboard: View test results in Grafana

@Rocketknight1

Member

Duplicate of #46898!

Development

Successfully merging this pull request may close these issues.

Florence2 training-loss double-shift bug (same pattern as Moonshine #46784)

2 participants