


JayZhang42/SLED


SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models [NeurIPS 2024]

The official implementation for our NeurIPS 2024 paper "SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models"

Jianyi Zhang1, Da-Cheng Juan2, Cyrus Rashtchian2, Chun-Sung Ferng2, Heinrich Jiang2, Yiran Chen1

1Duke University

2Google Research

📌News

[2024.11.27] - We released the latest code on GitHub.
[2024.11.26] - The official project website is now live!
[2024.11.01] - The paper is available on arXiv.
[2024.09.25] - Our SLED paper was accepted to NeurIPS 2024!

🧨 Why Choose SLED?

  • Model Versatility: Compatible with most large language model (LLM) families, such as LLaMA 2, LLaMA 3, Gemma, and MoE LLMs, since it relies only on their multi-layered structure; scales from 2B to 70B parameters.
  • Task Versatility: Improves factual accuracy across a wide range of tasks and benchmarks, including TruthfulQA, StrategyQA, FACTOR, GSM8K, HotPotQA, Natural Questions, and TriviaQA.
  • High Compatibility: SLED can be flexibly combined with other decoding methods to enhance their performance.
  • High-Quality Outputs: Reduces repetition and ensures fluent responses.
  • Negligible Computational Overhead: Adds minimal extra cost, making it suitable for real-time use.
  • Interpretability: Provides new insights into inference-time computing algorithms.

🔮Overview of SLED

(Figure: Overview of the SLED decoding pipeline.)

We introduce Self Logits Evolution Decoding (SLED), a novel factuality decoding approach that leverages the latent knowledge within LLMs by contrasting the final layer’s logits with those of earlier layers. SLED tracks this logits evolution process to unearth the latent knowledge within the model, and uses it to self-evolve the output distribution, aligning it more closely with real-world facts.
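To make the idea concrete, here is a minimal, simplified sketch of one "self-evolution" step, not the repository's actual implementation: the gap between the final layer's distribution and the early layers' distributions is treated as the direction of the logits evolution, and the final logits are nudged further along it. The function names and the exact update rule are illustrative assumptions; only the `evolution_rate` name mirrors the repo's `--evolution_rate` flag.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sled_step(final_logits, early_logits_list, evolution_rate=2.0):
    """One simplified self-evolution step over the vocabulary logits.

    Illustration only: the gap between the final layer's distribution and
    each early layer's distribution estimates the direction in which the
    logits evolved toward the model's latent knowledge; the final logits
    are nudged further along that (averaged) direction and renormalized.
    """
    p_final = softmax(final_logits)
    vocab = len(final_logits)
    # Average the contrast direction across all early layers.
    contrast = [0.0] * vocab
    for early in early_logits_list:
        p_early = softmax(early)
        for i in range(vocab):
            contrast[i] += (p_final[i] - p_early[i]) / len(early_logits_list)
    evolved = [final_logits[i] + evolution_rate * contrast[i] for i in range(vocab)]
    return softmax(evolved)

# Toy example: 4-token vocabulary, a single "early layer".
probs = sled_step([2.0, 1.0, 0.5, 0.1], [[1.0, 1.5, 0.2, 0.1]])
```

In the actual method, this evolution happens over all earlier layers of the model during decoding; the sketch only conveys the contrast-and-nudge intuition.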

🛠Installation

  • Hardware: We recommend an NVIDIA A100 80GB GPU for efficient inference; other hardware configurations also work but may yield slightly different performance.
  • Python: Python 3.10 or higher is recommended.
  • PyTorch: We recommend using PyTorch version 2.0.1 with CUDA 11.8. You can install this specific version of PyTorch using the following command:
    pip3 install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118
  • Transformers: Install the transformers library from the local directory included in the project folder.
    pip install -e transformers
  • Other Dependencies:
    pip install -r requirements.txt

📈Evaluation

Below we provide example scripts for running SLED and baseline methods such as DoLa and greedy decoding. For SLED and DoLa, the default setting of --early-exit-layers includes all layers of the LLM before the final output layer.

Dataset Preparation

tar -xzvf demo_dataset.tar.gz

FACTOR (Multiple Choices)

python run_factor.py --model-name meta-llama/Llama-2-7b-hf  --data-path Data/FACTOR/wiki_factor.csv  --output-path output-path.json --num-gpus 1 --decoding_method VanillaGreedy
python run_factor.py --model-name meta-llama/Llama-2-7b-hf  --data-path Data/FACTOR/wiki_factor.csv  --output-path output-path.json --num-gpus 1 --decoding_method dola
python run_factor.py --model-name meta-llama/Llama-2-7b-hf  --data-path Data/FACTOR/wiki_factor.csv  --output-path output-path.json --num-gpus 1 --decoding_method SLED --evolution_rate 2  --evolution_scale 10

TruthfulQA (Multiple Choices)

python run_tfqa.py --model-name meta-llama/Llama-2-7b-hf  --data-path Data/TruthfulQA --output-path output-path.json --num-gpus 1 --decoding_method VanillaGreedy
python run_tfqa.py --model-name meta-llama/Llama-2-7b-hf  --data-path Data/TruthfulQA --output-path output-path.json --num-gpus 1 --decoding_method dola
python run_tfqa.py --model-name meta-llama/Llama-2-7b-hf  --data-path Data/TruthfulQA --output-path output-path.json --num-gpus 1 --decoding_method SLED --evolution_rate 2.5  --evolution_scale 75

StrategyQA

python run_strqa.py  --model-name meta-llama/Llama-2-7b-hf  --data-path Data/StrategyQA --output-path output-path.json --num-gpus 1 --decoding_method VanillaGreedy
python run_strqa.py  --model-name meta-llama/Llama-2-7b-hf  --data-path Data/StrategyQA --output-path output-path.json --num-gpus 1 --decoding_method dola
python run_strqa.py  --model-name meta-llama/Llama-2-7b-hf  --data-path Data/StrategyQA --output-path output-path.json --num-gpus 1 --decoding_method SLED --evolution_rate 1.75 --evolution_scale 5

GSM8K

python run_gsm8k.py  --model-name meta-llama/Llama-2-7b-hf  --data-path Data/gsm8k_test --output-path output-path.json --num-gpus 1 --decoding_method VanillaGreedy
python run_gsm8k.py  --model-name meta-llama/Llama-2-7b-hf  --data-path Data/gsm8k_test --output-path output-path.json --num-gpus 1 --decoding_method dola
python run_gsm8k.py  --model-name meta-llama/Llama-2-7b-hf  --data-path Data/gsm8k_test --output-path output-path.json --num-gpus 1 --decoding_method SLED --evolution_rate 2 --evolution_scale 10

Additional experiments involving various models can be found in the scripts folder.

💡Important Recommendations

We strongly encourage you to try SLED on your own open-ended generation tasks and datasets. For good performance, consider the following recommended parameter ranges:

  • Evolution Rate: Set --evolution_rate within a range of 0.5 to 3.
  • Evolution Scale: Set --evolution_scale to 5, 10, or 20.
  • Repetition Penalty: Set --repetition_penalty between 1.01 and 1.05.
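If you want to explore these ranges systematically, a small helper can generate the corresponding command lines for a grid sweep. The helper itself (`sled_sweep_commands`) is hypothetical and not part of this repo; the flag names and paths are reused from the FACTOR example above.

```python
import itertools

def sled_sweep_commands(script="run_factor.py",
                        data_path="Data/FACTOR/wiki_factor.csv",
                        rates=(0.5, 1.0, 2.0, 3.0),
                        scales=(5, 10, 20)):
    """Build SLED evaluation commands over a grid of the recommended
    hyperparameter ranges. Illustrative helper; the flag names mirror
    the repo's CLI, but this function is not part of the codebase."""
    commands = []
    for rate, scale in itertools.product(rates, scales):
        commands.append(
            f"python {script} --model-name meta-llama/Llama-2-7b-hf "
            f"--data-path {data_path} "
            f"--output-path output_r{rate}_s{scale}.json "
            f"--num-gpus 1 --decoding_method SLED "
            f"--evolution_rate {rate} --evolution_scale {scale}"
        )
    return commands

commands = sled_sweep_commands()
```

Each generated string can then be run directly in the shell or dispatched to a job scheduler.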

We hope this will be a good starting point for your experiments!

Acknowledgement

This codebase is based on the official repo of DoLa. We also highly recommend reading their excellent work.

Citation

We would greatly appreciate it if you cited our SLED paper when you find our repository helpful for your research or projects.

@inproceedings{
  zhang2024sled,
  title={SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models},
  author={Jianyi Zhang and Da-Cheng Juan and Cyrus Rashtchian and Chun-Sung Ferng and Heinrich Jiang and Yiran Chen},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024)},
  year={2024},
  url={https://arxiv.org/abs/2411.02433}
}