BiasEdit: Debiasing Stereotyped Language Models via Model Editing [TrustNLP@NAACL 2025]

📃 Paper 💻 Code 🌏 Web

BiasEdit is an efficient model editing method that removes stereotypical bias from language models using small editor networks. It combines a debiasing loss, which guides edits on a subset of the model's parameters, with a remaining loss that preserves language modeling ability during editing. Experimental results show BiasEdit's excellent performance on debiasing, language modeling preservation, and robustness to gender reversal and semantic generality.

📌 Table of Contents

  • 🛠️ Setup
  • 💻 BiasEdit
    • ⌚️ Training Editor Networks
    • 🚀 Debiasing with Editor Networks
  • 👀 Bias Tracing
  • 📝 Citation
  • ✨ Acknowledgements

🛠️ Setup

This codebase uses Python 3.9.18. Other versions may work as well.

Create an environment and install the dependencies:

$ conda create -n biasedit python=3.9
$ conda activate biasedit
(biasedit) $ pip install -r requirements.txt

💻 BiasEdit

Editor networks are first trained on StereoSet to generate parameter shifts for debiasing. The trained editor networks are then used to edit a language model and produce an unbiased model.

⌚️ Training Editor Networks

  • Formatted datasets with train/dev/test splits (gender_test.json, race_test.json, religion_test.json) are in data/stereoset.
  • Configurations are in config: the partial parameters to be edited are specified in editor, and model-related settings, such as the weights to be edited, are in model.
  • Experimental scripts are in scripts; all hyper-parameters are set there. Since hyper-parameters strongly affect hyper-network tuning, tuning them is highly recommended. A hedged example invocation follows this list.
  • For the ablation study on the remaining loss, set editor.loc_coef=0.
  • Metrics can be found in the training log.
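
A minimal, hypothetical training invocation (the script name below is a placeholder; the actual scripts and their hyper-parameters are in scripts):

(biasedit) $ bash scripts/<train_script>.sh                    # placeholder: pick the script matching your model and bias type
(biasedit) $ bash scripts/<train_script>.sh editor.loc_coef=0  # remaining-loss ablation, assuming the script forwards config overrides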

🚀 Debiasing with Editor Networks

  • Set eval_only=True.
  • Set data.valid_path to the path of the test set.
  • Metrics can be found at the end of the debiasing log, in lines like "Test ------- XXX".
  • Experimental scripts are in scripts; a hedged example invocation follows this list.
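
A hedged sketch of an evaluation-only run (the script name is a placeholder and the configuration keys follow the bullets above):

(biasedit) $ bash scripts/<debias_script>.sh eval_only=True data.valid_path=data/stereoset/gender_test.json  # assuming the script forwards config overrides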

👀 Bias Tracing

The code for bias tracing is in the bias_tracing directory.

📝 Citation

If you find this code or paper useful, please consider citing:

@article{xin25BiasEdit,
    title={BiasEdit: Debiasing Stereotyped Language Models via Model Editing},
    author={Xin Xu and Wei Xu and Ningyu Zhang and Julian McAuley},
    year={2025},
    url={https://arxiv.org/pdf/2503.08588}
}

✨ Acknowledgements

  • Thanks to MALMEN for the original code.
  • Thanks to StereoSet for the dataset and to bias-bench for all the baselines.
  • For more model editing methods, please try EasyEdit.