Add new paper: #42

Open
wyzh0912 opened this issue Feb 23, 2025 · 0 comments
@wyzh0912 (Contributor)
Title

Understanding and Mitigating Gender Bias in LLMs via Interpretable Neuron Editing

Published Date

2025-01-24

Source

arXiv

Head Name

Gender Head

Summary

  • Innovation: The paper introduces an interpretable neuron editing method that mitigates gender bias in LLMs by targeting the specific neurons responsible for the bias while preserving the model’s original capabilities. The approach combines logit-based and causal-based strategies to select which neurons to edit (see the illustrative sketch after this list).

  • Tasks: The study constructs the CommonWords dataset to evaluate gender bias across five LLMs, analyzes neuron circuits to identify the "gender neurons" and "general neurons" that contribute to the bias, and then runs experiments to validate the proposed neuron editing method.

  • Significant Result: The proposed method reduces gender bias while maintaining the models' original capabilities, outperforming existing fine-tuning and neuron editing approaches on several benchmarks, as shown by higher ICAT scores and smaller entropy differences.
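
To make the "edit specific neurons" idea concrete, below is a minimal sketch of suppressing a handful of MLP neurons in a causal LM via a forward hook. This is not the paper's implementation: the model name, layer index, and neuron indices are placeholder assumptions, and the paper's logit-based and causal-based neuron selection is not reproduced here.

```python
# Minimal sketch (not the paper's method): suppress a few hypothetical
# "gender neurons" in one MLP layer of GPT-2 using a forward hook.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"              # assumption: a causal LM exposing transformer.h[i].mlp.act
LAYER_IDX = 6                    # hypothetical layer
BIASED_NEURONS = [17, 256, 901]  # hypothetical indices a logit-/causal-based analysis might flag

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def edit_neurons(module, inputs, output):
    # Zero the selected hidden units of the MLP activation before the
    # down-projection writes them back into the residual stream.
    output[..., BIASED_NEURONS] = 0.0
    return output

hook = model.transformer.h[LAYER_IDX].mlp.act.register_forward_hook(edit_neurons)

prompt = "The nurse said that"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # next-token logits with the selected neurons suppressed

hook.remove()  # restore the unedited model
```

Comparing the next-token distribution for gendered continuations with and without the hook gives a quick, informal check of whether the edited neurons carry the bias, which is the intuition behind the paper's more principled selection and editing procedure.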
