Title
On Mechanistic Circuits for Extractive Question-Answering
Published Date
2025-02-12
Source
arXiv
Head Name
Attribution Head
Summary
Innovation: The paper introduces a method to extract mechanistic circuits from language models for extractive QA tasks using causal mediation analysis. It demonstrates how these circuits can elucidate the interplay between parametric memory and context usage, leading to practical applications like data attribution and model steering.
Tasks: The study involves designing a probe dataset and applying causal mediation analysis to identify circuits responsible for context and memory faithfulness in language models. The insights from these circuits are then used to develop ATTNATTRIB, a fast data attribution algorithm, and to steer models towards improved context faithfulness in QA tasks.
Significant Result: The research finds that a small set of attention heads within the identified circuits performs reliable data attribution by default, yielding state-of-the-art attribution results on extractive QA benchmarks. Using these attribution insights, the model can also be steered to rely on the provided context over its parametric memory, improving context faithfulness by up to 9% on QA datasets.
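The attribution idea above can be illustrated with a minimal sketch. This is not the paper's ATTNATTRIB implementation; it assumes only the high-level description that a single "attribution head" concentrates attention mass on the context tokens supporting the answer, and attributes the answer to the contiguous context window receiving the most attention. The function name `attribute_span` and the fixed-window heuristic are hypothetical choices for illustration.

```python
# Hypothetical sketch of attention-based data attribution: given one head's
# attention weights over context tokens at the answer-generation step,
# return the contiguous window of tokens with the highest total attention
# mass as the attributed evidence span.

def attribute_span(attn_weights, window=3):
    """attn_weights: list of attention weights, one per context token.
    Returns (start, end) of the highest-mass window, end exclusive."""
    n = len(attn_weights)
    if n == 0:
        return (0, 0)
    w = min(window, n)
    # Sliding-window sum: update the running mass in O(1) per step.
    mass = sum(attn_weights[:w])
    best_start, best_mass = 0, mass
    for start in range(1, n - w + 1):
        mass += attn_weights[start + w - 1] - attn_weights[start - 1]
        if mass > best_mass:
            best_start, best_mass = start, mass
    return (best_start, best_start + w)

# Toy example: attention concentrated on tokens 4-6 of the context.
weights = [0.01, 0.02, 0.01, 0.05, 0.30, 0.40, 0.15, 0.03, 0.03]
print(attribute_span(weights))  # (4, 7)
```

In a real setting the weights would come from a forward pass (e.g. the attention matrix of the identified head, sliced at the answer token), and the span would map back to the source document for attribution.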