This repository contains the code for the project "Machine Unlearning in Large Language Models".
The project synthesizes methods from the papers "Who’s Harry Potter? Approximate Unlearning in LLMs" and "Locating and Editing Factual Associations in GPT" to develop a framework capable of implementing two distinct unlearning approaches:
- Selective Unlearning - Fine-tunes the model toward generic alternative labels derived from a reinforced model's predictions, selectively removing knowledge of specific content (see the first sketch below).
- Rank-One Model Editing (ROME) - Applies a closed-form rank-one update to a specific MLP weight matrix to rewrite individual factual associations precisely (see the second sketch below).
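Below is a minimal sketch of the selective-unlearning idea, roughly following the generic-label construction from "Who's Harry Potter?". The model names and the reinforced checkpoint path are placeholders, and the actual scripts in `/Selective_Unlearning` may differ in detail. Tokens whose probability the reinforced model boosts are treated as target-specific and pushed toward generic alternatives:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoints: a baseline model and a "reinforced" copy that has
# been fine-tuned further on the content we want to forget.
BASELINE = "gpt2"                   # assumed base model
REINFORCED = "./reinforced-gpt2"    # assumed local reinforced checkpoint

tokenizer = AutoTokenizer.from_pretrained(BASELINE)
baseline = AutoModelForCausalLM.from_pretrained(BASELINE)
reinforced = AutoModelForCausalLM.from_pretrained(REINFORCED)

def generic_label_logits(input_ids, alpha=1.0):
    """Build 'generic' target logits: down-weight tokens whose probability
    rises in the reinforced model, since those are tied to the target content."""
    with torch.no_grad():
        base_logits = baseline(input_ids).logits
        reinf_logits = reinforced(input_ids).logits
    return base_logits - alpha * F.relu(reinf_logits - base_logits)

def unlearning_loss(model, input_ids):
    """Cross-entropy toward the adjusted (generic) next-token targets."""
    generic_targets = generic_label_logits(input_ids).argmax(dim=-1)
    logits = model(input_ids).logits
    # Shift so position t predicts token t+1, as in causal LM training.
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        generic_targets[:, 1:].reshape(-1),
    )

# Example: loss on a passage about the content to forget; in practice this
# loss would drive a short fine-tuning run of the baseline model.
batch = tokenizer(["Example passage about the content to forget."],
                  return_tensors="pt")
loss = unlearning_loss(baseline, batch["input_ids"])
```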
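And a sketch of the closed-form rank-one update at the heart of ROME, using random placeholder tensors. Deriving the key vector `k*` (from subject representations) and the value vector `v*` (via a small optimization against the target fact) is omitted here; the full procedure lives in `/ROME`:

```python
import torch

def rank_one_edit(W, k_star, v_star, C):
    """ROME-style rank-one update of a linear layer weight.

    W:      (d_out, d_in) MLP projection weight being edited
    k_star: (d_in,)  key vector representing the subject at this layer
    v_star: (d_out,) value vector that makes the model emit the new fact
    C:      (d_in, d_in) uncentered covariance of keys over broad text
    """
    C_inv_k = torch.linalg.solve(C, k_star)        # C^{-1} k*
    residual = v_star - W @ k_star                 # what the edit must add
    update = torch.outer(residual, C_inv_k) / (k_star @ C_inv_k)
    return W + update

# Tiny shape/consistency check with random tensors (not a real edit).
d_in, d_out = 16, 8
W = torch.randn(d_out, d_in)
k = torch.randn(d_in)
v = torch.randn(d_out)
M = torch.eye(d_in) + 0.1 * torch.randn(d_in, d_in)
C = M @ M.T                                        # positive-definite covariance
W_new = rank_one_edit(W, k, v, C)
# After the edit, the key k maps (approximately) to the new value v.
print(torch.allclose(W_new @ k, v, atol=1e-4))
```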
- `/Selective_Unlearning`: Contains scripts and notebooks implementing the selective unlearning process.
- `/ROME`: Includes the implementation of Rank-One Model Editing (ROME) for precise factual modifications in models.
- Mannal Kamble - [email protected]
- Karthvik Sarvade - [email protected]
- Thanks to Professor Gustavo Sandoval for his guidance and mentorship throughout this project.
- Inspired by the methodologies detailed in recent unlearning research papers.