[KD] add uld and jsd #2253

Open · wants to merge 1 commit into kd-trainer

Conversation

@kashif commented on Jan 11, 2025

Description

Add the Universal Logit Distillation (ULD) loss and the Jensen-Shannon divergence (JSD) KD loss.

Paper

https://arxiv.org/abs/2402.12030
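
The linked paper introduces the ULD loss: a Wasserstein-1 distance between the rank-sorted student and teacher probability distributions, which works even when the two models use different tokenizers. The JSD loss, by contrast, assumes a shared vocabulary. For context, here is a minimal sketch of a generalized JSD KD loss; the function name, beta parameter, and reduction are illustrative assumptions, not the PR's exact implementation.

import math
import torch
import torch.nn.functional as F

def jsd_kd_loss(student_logits, teacher_logits, beta=0.5):
    # Generalized Jensen-Shannon divergence between the student and teacher
    # next-token distributions; beta=0.5 recovers the standard symmetric JSD.
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    teacher_log_probs = F.log_softmax(teacher_logits, dim=-1)
    # log of the mixture m = beta * p_student + (1 - beta) * p_teacher,
    # computed stably in log space via logsumexp
    mixture_log_probs = torch.logsumexp(
        torch.stack([
            student_log_probs + math.log(beta),
            teacher_log_probs + math.log(1.0 - beta),
        ]),
        dim=0,
    )
    # KL(student || m) and KL(teacher || m); log_target=True because both
    # arguments are log-probabilities
    kl_student = F.kl_div(mixture_log_probs, student_log_probs,
                          log_target=True, reduction="batchmean")
    kl_teacher = F.kl_div(mixture_log_probs, teacher_log_probs,
                          log_target=True, reduction="batchmean")
    return beta * kl_student + (1.0 - beta) * kl_teacher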

@winglian force-pushed the kd-trainer branch 2 times, most recently from 2dcbc0d to 35a84f2 on January 15, 2025 04:30
Comment on lines +60 to +70
# Get masked student probabilities
student_probs_masked = student_probs[valid_mask]

# Get masked teacher probabilities
teacher_probs_masked = teacher_probs[valid_mask]

# Sort student probabilities in descending order
student_probs_sorted, _ = torch.sort(student_probs_masked, dim=-1, descending=True)

# For teacher probs, we already have top-K, so just ensure they're sorted
teacher_probs_sorted, _ = torch.sort(teacher_probs_masked, dim=-1, descending=True)
Collaborator:

@kashif since the token_ids don't match, would it be better to sort, then just take the top_k of the student distribution?
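
A minimal sketch of that suggestion (hypothetical code, assuming teacher_probs_masked keeps only the teacher's top-K entries, so K can be read off its last dimension):

# Sort the full student distribution, then keep only its K largest entries so
# the student and teacher tensors line up by rank rather than by token_id.
K = teacher_probs_masked.size(-1)
student_probs_sorted, _ = torch.sort(student_probs_masked, dim=-1, descending=True)
student_probs_topk = student_probs_sorted[..., :K]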

kashif (Author):

So let me check this against the definition of the Wasserstein loss.
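
For reference, the usual 1-D optimal-transport identity (stated here as background, not quoted from the paper): once both distributions are sorted in the same order and brought to a common length K, the Wasserstein-1 term reduces to an elementwise L1 sum,

$$\mathcal{L}_{\text{ULD}} = \sum_{i=1}^{K} \left| p^{S}_{(i)} - p^{T}_{(i)} \right|$$

where $p_{(i)}$ denotes the $i$-th largest probability, which is what the two sorted tensors in the diff above would feed into.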
