
@141forever (Contributor) commented Jan 22, 2026

What does this PR do?

In the GOLD algorithm, computing the loss requires a mapping between the student and the teacher. In the original implementation, this step repeatedly moved data back and forth between the CPU and the GPU. That added significant time overhead and also tended to fragment GPU memory. This PR fixes the issue by precomputing the mapping and keeping it on the GPU.
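The general pattern can be illustrated with a minimal PyTorch sketch. It assumes the mapping is a lookup from student token ids to teacher token ids. `build_vocab_mapping` and `align_teacher_logits` are hypothetical names, and this is not the code merged in this PR. The mapping is built once as a device tensor, so each loss call only needs an on-device `index_select` and does not round-trip ids through the CPU.

```python
import torch


def build_vocab_mapping(student_to_teacher, student_vocab_size, device):
    """Hypothetical helper: build the student->teacher id map once, on `device`."""
    # -1 marks student tokens that have no teacher counterpart (assumption).
    mapping = torch.full((student_vocab_size,), -1, dtype=torch.long)
    if student_to_teacher:
        s_ids = torch.tensor(list(student_to_teacher.keys()), dtype=torch.long)
        t_ids = torch.tensor(list(student_to_teacher.values()), dtype=torch.long)
        mapping[s_ids] = t_ids
    valid = mapping >= 0
    # Single host->device transfer, done ahead of training rather than per step.
    return mapping.to(device), valid.to(device)


def align_teacher_logits(teacher_logits, mapping, valid):
    """Hypothetical helper: reorder teacher logits (..., V_teacher) into student
    vocab order (..., V_student) without leaving the device."""
    # Clamp -1 to a valid column index; those positions are masked out below.
    gathered = teacher_logits.index_select(-1, mapping.clamp(min=0))
    # Unmapped student tokens get -inf so they carry zero probability after softmax.
    return gathered.masked_fill(~valid, float("-inf"))


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    mapping, valid = build_vocab_mapping({0: 2, 1: 0, 3: 1}, 4, device)
    teacher_logits = torch.tensor([[[10.0, 20.0, 30.0]]], device=device)  # (B=1, T=1, V_teacher=3)
    print(align_teacher_logits(teacher_logits, mapping, valid))
```

Because the mapping tensor lives on the same device as the logits, the per-step path contains no `.cpu()` or `.tolist()` calls and no Python loops over token ids.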

Fixes #4864

Before submitting

  • [YES] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [YES] Did you read the contributor guideline, Pull Request section?
  • [YES] Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case. Issue: #4864 ("If there are any training acceleration techniques?")
  • [YES] Did you make sure to update the documentation with your changes?
  • [NO] Did you write any new necessary tests?

Who can review?

People from Hugging Face.

@141forever changed the title from "training speed up" to "GOLD training speed up" on Jan 22, 2026
@kashif merged commit e106972 into huggingface:main on Jan 26, 2026

Development: merging this pull request may close the issue "If there are any training acceleration techniques?" (#4864).

2 participants