Add new paper: #49

Open
wyzh0912 opened this issue Feb 23, 2025 · 0 comments

@wyzh0912 · Contributor

Title

Exploring Translation Mechanism of Large Language Models

Published Date

2025-02-17

Source

arXiv

Head Name

Source Heads, Indicator Heads, Positional Heads

Summary

  • Innovation: The paper investigates the translation mechanisms of LLMs by using path patching to examine key computational components (e.g., attention heads and MLPs). It identifies a sparse subset of specialized attention heads that contribute substantially to translation and proposes a targeted fine-tuning approach that improves translation performance while updating only a small fraction of the parameters.

  • Tasks: The study uses path patching to identify which components causally affect the model's output during translation. It then analyzes the behavior of these components, characterizing the roles of the attention heads and examining how MLPs interact with translation-relevant tokens. These insights inform a targeted supervised fine-tuning strategy for improving translation.

  • Significant Result: Fewer than 5% of attention heads are crucial for translation, and these heads exhibit specialized functions for processing translation-relevant features. Fine-tuning only 64 of these heads achieves translation performance comparable to full-parameter fine-tuning while preserving the model's general capabilities.
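The summary does not describe the paper's implementation, so the following is only a minimal, hypothetical sketch of the idea behind locating "crucial" heads. It uses the simpler activation-patching variant on a toy model in which each head contributes one scalar and the output is a tanh readout of their sum: run a clean and a corrupted input, restore one head's clean output inside the corrupted run, and measure how much of the output gap it recovers. Path patching, as used in the paper, goes further by restricting the patched effect to specific sender-to-receiver paths. All names here (`make_heads`, `patching_effects`, `top_k_heads`) are illustrative assumptions, not the authors' code.

```python
import math
import random


def make_heads(num_heads, dim, seed=0):
    """Hypothetical toy heads: each head is a weight vector producing one scalar."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_heads)]


def head_outputs(heads, x):
    """Per-head contribution (a single scalar per head in this toy model)."""
    return [sum(w * xi for w, xi in zip(h, x)) for h in heads]


def run(heads, x, patch=None):
    """Toy model output; `patch` maps head index -> output substituted for that head."""
    outs = head_outputs(heads, x)
    for i, value in (patch or {}).items():
        outs[i] = value
    # Nonlinear readout so that head effects are not simply additive.
    return math.tanh(sum(outs))


def patching_effects(heads, clean_x, corrupt_x):
    """Fraction of the clean-vs-corrupt output gap recovered by restoring each head."""
    clean_outs = head_outputs(heads, clean_x)
    clean_y = run(heads, clean_x)
    corrupt_y = run(heads, corrupt_x)
    gap = clean_y - corrupt_y
    if gap == 0:
        # Inputs are indistinguishable at the output; no head can recover anything.
        return [0.0] * len(heads)
    effects = []
    for i in range(len(heads)):
        # Corrupted run, except head i emits what it produced on the clean input.
        patched_y = run(heads, corrupt_x, patch={i: clean_outs[i]})
        effects.append((patched_y - corrupt_y) / gap)
    return effects


def top_k_heads(effects, k):
    """Indices of the k heads with the largest absolute patching effect."""
    order = sorted(range(len(effects)), key=lambda i: abs(effects[i]), reverse=True)
    return order[:k]


if __name__ == "__main__":
    heads = make_heads(num_heads=8, dim=4, seed=1)
    clean = [1.0, 0.5, -0.2, 0.3]
    corrupt = [0.0, 0.5, -0.2, 0.3]
    effects = patching_effects(heads, clean, corrupt)
    print(top_k_heads(effects, k=2))
```

Ranking heads by patching effect and keeping the top k is one plausible way to choose a small subset (such as the 64 heads mentioned above) whose parameters would then be the only ones updated during fine-tuning; in a real transformer the per-head outputs would be captured and overwritten with forward hooks rather than computed directly.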
