Add TVD Loss Kernel #324

saurabhkoshatwar · 2024-10-26T03:41:40Z

Summary

Resolves #281. Implements the TVD (Total Variation Distance) kernel by computing both the loss and gradient in the forward pass.
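For readers unfamiliar with the approach, here is a minimal plain-Python sketch of what "loss and gradient in the forward pass" can mean for a single pair of probability vectors. It is not the Triton kernel from this PR; the function name `tvd_forward` is hypothetical, and the gradient is taken with respect to `p` only.

```python
def tvd_forward(p, q):
    """Return (loss, grad_wrt_p) for one pair of probability vectors.

    Illustrative only: the name and the list-based interface are
    assumptions, not part of the PR.
    """
    loss = 0.0
    grad = []
    for pi, qi in zip(p, q):
        # Each element contributes 0.5 * |p_i - q_i| to the distance.
        loss += 0.5 * abs(pi - qi)
        # Derivative of 0.5 * |p_i - q_i| w.r.t. p_i is +0.5 or -0.5.
        grad.append(0.5 if pi > qi else -0.5)
    return loss, grad


loss, grad = tvd_forward([0.5, 0.5], [0.25, 0.75])
```

Because the gradient is produced alongside the loss, a backward pass only has to rescale the stored gradient by the incoming upstream gradient.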

Testing Done

Implemented tests to verify that the results of the forward and backward passes match the Torch implementation. Additionally, added a script to benchmark the memory usage and speed of the Liger implementation compared to Torch, with the results shown below.

* Hardware Type: Nvidia H100 (80GB PCIe)
* run `make test` to ensure correctness
* run `make checkstyle` to ensure code style
* run `make test-convergence` to ensure convergence

* Add fused tvd loss

saurabhkoshatwar · 2024-10-26T03:51:37Z

@ByronHsu @qingquansong @lancerts Please let me know if any changes are required.

yundai424

Thanks a lot for the contribution! 😄

yundai424 · 2024-11-08T17:44:32Z

test/transformers/test_tvd.py

+        pytest.param(
+            torch.bfloat16,
+            1e-8,
+            5e-2,


Could you experiment to find the lowest rtol that does not fail this test for bf16? Thanks!
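One way to run such an experiment is to sweep candidate tolerances from smallest to largest and report the first one that passes. The sketch below is plain Python and uses the same closeness rule as `torch.allclose`, `|actual - expected| <= atol + rtol * |expected|`. The helper names are hypothetical, and reading the `1e-8` in the diff as atol and the `5e-2` as rtol is an assumption.

```python
def allclose(actual, expected, atol, rtol):
    """Elementwise check: |a - e| <= atol + rtol * |e| for every pair."""
    return all(abs(a - e) <= atol + rtol * abs(e)
               for a, e in zip(actual, expected))


def lowest_passing_rtol(actual, expected, atol, candidates):
    """Return the smallest candidate rtol that passes, or None if none do."""
    for rtol in sorted(candidates):
        if allclose(actual, expected, atol, rtol):
            return rtol
    return None


# Made-up values standing in for kernel output vs. reference output.
best = lowest_passing_rtol(
    actual=[1.02, 2.0],
    expected=[1.0, 2.0],
    atol=1e-8,
    candidates=[1e-3, 1e-2, 5e-2],
)
```

In practice the two lists would be the flattened outputs of the Liger and Torch implementations for the bf16 case.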

yundai424 · 2024-11-08T17:46:54Z

test/transformers/test_tvd.py

+from liger_kernel.transformers.tvd import LigerTVDLoss
+
+
+class TorchTVDLoss(torch.nn.Module):


I feel it would be very helpful to add ignore_index support in this PR to make TVD complete, similar to how JSD does it: https://github.com/linkedin/Liger-Kernel/blob/main/src/liger_kernel/ops/jsd.py

+1, this would be very helpful for covering broader use cases.
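As a rough illustration of the request, ignore-index handling can be sketched as follows: rows whose label equals the ignored value contribute no loss and a zero gradient, and the result is averaged over the remaining rows only. This is plain Python, not the Triton kernel; the function name, the per-row `labels` argument, the default of `-100` (the usual PyTorch convention), and the averaging over non-ignored rows are all assumptions.

```python
def tvd_with_ignore_index(p_rows, q_rows, labels, ignore_index=-100):
    """TVD averaged over rows whose label is not ignore_index.

    Returns (loss, grads) where grads has one list per input row.
    Hypothetical helper for illustration only.
    """
    kept = 0
    total = 0.0
    grads = []
    for p, q, label in zip(p_rows, q_rows, labels):
        if label == ignore_index:
            # Ignored rows: no loss contribution, zero gradient.
            grads.append([0.0] * len(p))
            continue
        kept += 1
        total += sum(0.5 * abs(pi - qi) for pi, qi in zip(p, q))
        grads.append([0.5 if pi > qi else -0.5 for pi, qi in zip(p, q)])
    # Guard against division by zero when every row is ignored.
    denom = max(kept, 1)
    loss = total / denom
    grads = [[g / denom for g in row] for row in grads]
    return loss, grads


loss, grads = tvd_with_ignore_index(
    p_rows=[[0.5, 0.5], [1.0, 0.0]],
    q_rows=[[0.25, 0.75], [0.0, 1.0]],
    labels=[3, -100],
)
```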

qingquansong

Thanks for the effort! Could you also add this to the `__init__.py` in the transformers folder, the same way JSD is exported? https://github.com/linkedin/Liger-Kernel/blob/main/src/liger_kernel/transformers/__init__.py#L10

Tcc0403 · 2024-11-08T18:59:30Z

src/liger_kernel/ops/tvd.py

+        # TVD(P || Q) = 0.5 * |P - Q|
+        tv_loss = 0.5 * tl.abs(p - q)
+
+        grad_res = tl.where(p > q, 0.5, -0.5)


Since we already compute gradients in the forward pass, we can divide them by BT (or BT * V), depending on the reduction mode, right here. That avoids extra computation in the backward pass and removes the need to save the reduction mode in ctx.
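One way to read this suggestion, sketched in plain Python rather than Triton: apply the reduction scale to the stored gradient once in the forward pass, so the backward pass only multiplies by the upstream gradient. The function names are hypothetical, and the mapping of modes to divisors (`"batchmean"` divides by BT, `"mean"` divides by BT * V, `"sum"` and `"none"` leave the gradient unscaled) is an assumption based on PyTorch-style reduction names.

```python
def scale_grad_in_forward(grad_rows, reduction):
    """Pre-scale per-element gradients according to the reduction mode.

    grad_rows is a BT x V nested list of +/-0.5 values. Hypothetical helper.
    """
    bt = len(grad_rows)
    v = len(grad_rows[0])
    if reduction == "batchmean":
        divisor = bt
    elif reduction == "mean":
        divisor = bt * v
    elif reduction in ("sum", "none"):
        divisor = 1
    else:
        raise ValueError(f"unsupported reduction: {reduction}")
    return [[g / divisor for g in row] for row in grad_rows]


def backward(scaled_grad_rows, grad_output):
    """Backward needs only the stored gradient and the upstream scalar."""
    return [[g * grad_output for g in row] for row in scaled_grad_rows]


raw = [[0.5, -0.5], [0.5, -0.5]]  # BT = 2, V = 2
stored = scale_grad_in_forward(raw, "batchmean")
dp = backward(stored, grad_output=1.0)
```

With this arrangement, `backward` never inspects the reduction mode, which is the saving the comment points at.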

saurabhkoshatwar and others added 4 commits October 25, 2024 18:42

Feature/tvd loss fused (#1)

da24657

* Add fused tvd loss

Add TVD to README.md

7736e32

checkstyle fixes

a45f6ce

Merge branch 'main' into main

bc906d8

lancerts requested a review from ByronHsu October 26, 2024 16:31

ByronHsu mentioned this pull request Oct 31, 2024

2024 Q4 Roadmap #285

Open

Merge branch 'main' into main

0097d15

ByronHsu requested review from qingquansong and yundai424 November 8, 2024 17:40

yundai424 reviewed Nov 8, 2024

qingquansong reviewed Nov 8, 2024

Tcc0403 reviewed Nov 8, 2024
