Add new paper: #45

Open
wyzh0912 opened this issue Feb 23, 2025 · 0 comments

Comments

@wyzh0912
Contributor

Title

Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference

Published Date

2025-01-22

Source

arXiv

Head Name

Evaluator Heads

Summary

  • Innovation: The paper introduces EHPC, a training-free method that leverages specific attention heads, termed evaluator heads, to compress prompts efficiently by retaining only the most significant tokens during long-context transformer inference, thereby reducing computational cost and improving performance (a conceptual sketch follows this list).
  • Tasks: The study identifies evaluator heads in transformer-based LLMs through pilot experiments with synthetic data, applies these heads for prompt compression across benchmarks such as LongBench and ZeroSCROLLS, and evaluates the method's efficiency in reducing API costs and accelerating long-context inference.
  • Significant Result: EHPC achieves state-of-the-art performance on prompt compression benchmarks, effectively reducing API costs and memory usage while remaining competitive with key-value cache-based methods, and improves direct inference performance by up to 40% on question-answering datasets.
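
A minimal, hypothetical sketch of the core idea: score each prompt token by the attention it receives from designated evaluator heads and keep only the top-scoring tokens. The head indices, the scoring rule (attention from the final query position, averaged over the selected heads), and the `compress_prompt` helper are illustrative assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch of evaluator-head prompt compression (not the paper's code).
# Assumes attention weights of shape (num_heads, seq_len, seq_len) from one layer
# are already extracted; head indices and the scoring rule are illustrative.
import torch

def compress_prompt(attn_weights: torch.Tensor,
                    token_ids: torch.Tensor,
                    evaluator_heads: list[int],
                    keep_ratio: float = 0.5) -> torch.Tensor:
    """Keep the prompt tokens most attended to by the chosen evaluator heads."""
    # Score each key position by the attention it receives from the final
    # query position, averaged over the selected heads.
    scores = attn_weights[evaluator_heads, -1, :].mean(dim=0)   # (seq_len,)
    k = max(1, int(keep_ratio * token_ids.numel()))
    keep = torch.topk(scores, k).indices.sort().values          # preserve token order
    return token_ids[keep]

# Usage with random weights standing in for a real forward pass.
torch.manual_seed(0)
attn = torch.rand(32, 128, 128).softmax(dim=-1)   # (heads, queries, keys)
ids = torch.arange(128)
print(compress_prompt(attn, ids, evaluator_heads=[3, 17], keep_ratio=0.25).shape)
# torch.Size([32])
```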