ExtAgents

🌐 Github | 📖 Paper | 🤗 Data

Introduction

ExtAgents is a framework for scaling external knowledge input beyond the context length of LLMs via multi-agent collaboration.

[Figure: ExtAgents framework overview]

Setup

conda create -n extagents python=3.10 -y
conda activate extagents
pip install -r requirements.txt

Data

You can download the data with the script:

bash scripts/download_data.sh

Or you can download the data manually from one of the following links:

The data should be organized as follows:

./
└── data/
    ├── sampled_hotpot_questions.json
    ├── rag_1000k.jsonl
    ├── longbook_qa_eng.jsonl
    └── longbook_qa_chn.jsonl

Update: We have uploaded the long-context Q&A datasets, both before and after enhancement, to Hugging Face. original refers to the original dataset, full-enhanced to the fully enhanced dataset, and partial-enhanced to the partially enhanced dataset, in which only samples no longer than 128k tokens were augmented. Feel free to download and use them!
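
To sanity-check the download, each file can be read with the standard json module. A minimal sketch (assuming sampled_hotpot_questions.json is a single JSON document and the .jsonl files hold one JSON object per line; exact field names are not shown here):

import json

# Load the sampled HotpotQA questions (one JSON document).
with open("data/sampled_hotpot_questions.json") as f:
    questions = json.load(f)
print(f"{len(questions)} RAG questions loaded")

# Load the RAG contexts (JSON Lines: one JSON object per line).
with open("data/rag_1000k.jsonl") as f:
    contexts = [json.loads(line) for line in f]
print(f"{len(contexts)} contexts loaded")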

Usage

Generation

We currently support three tasks: RAG, En.QA, and Zh.QA.

The RAG task is a question-answering task whose input is a question and a context. The questions and answers are sampled from HotpotQA. Each context is a long text formed by concatenating documents retrieved from Wikipedia using BM25. We use the KILT knowledge source, which is based on the 2019/08/01 Wikipedia dump. The contexts are provided in the data/rag_1000k.jsonl file.

The En.QA and Zh.QA tasks are also question-answering tasks whose input is a question and a context. The questions, answers, and contexts come from InfiniteBench.
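
For reference, BM25 ranks documents by lexical term overlap rather than by dense embeddings. A minimal retrieval sketch using the third-party rank_bm25 package (illustrative only, not the pipeline that built rag_1000k.jsonl):

from rank_bm25 import BM25Okapi

# Toy corpus; the real setup retrieves from the KILT Wikipedia dump.
corpus = [
    "Paris is the capital of France.",
    "Mount Everest is the highest mountain on Earth.",
    "The Eiffel Tower is located in Paris.",
]
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

query = "What is the capital of France?".lower().split()
# Return the top-2 documents by BM25 score.
print(bm25.get_top_n(query, corpus, n=2))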

Here is an example command to generate predictions for the RAG task:

python main.py \
    --task rag \
    --output_dir results_rag \
    --chunk_length 8000 \
    --input_length 128000 \
    --api_url "YOUR_API_URL" \
    --api_key "YOUR_API_KEY" \
    --model "gpt-4o-mini-2024-07-18" \
    --num_workers 8 \
    > rag.log

The generated predictions will be saved in the results_rag directory.

  • --task: Task to run; one of rag, en, zh.
  • --output_dir: Directory in which to save the generated predictions.
  • --chunk_length: Length of each chunk (in tokens) that the input is split into.
  • --input_length: Total length of the external knowledge input (in tokens).
  • --model: Model to use; defaults to gpt-4o-mini-2024-07-18.
  • --api_url: Your API URL; defaults to os.getenv("OPENAI_BASE_URL").
  • --api_key: Your API key; defaults to os.getenv("OPENAI_API_KEY").
  • --num_workers: Number of parallel workers; each worker processes one example.

You can also set the environment variables OPENAI_BASE_URL and OPENAI_API_KEY to avoid passing them on the command line.

export OPENAI_BASE_URL="YOUR_API_URL"
export OPENAI_API_KEY="YOUR_API_KEY"
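
For reference, this fallback is the usual os.getenv default pattern in argparse; a sketch of that pattern (the actual argument parsing in main.py may differ):

import os
import argparse

parser = argparse.ArgumentParser()
# Fall back to the environment variables when the flags are omitted.
parser.add_argument("--api_url", default=os.getenv("OPENAI_BASE_URL"))
parser.add_argument("--api_key", default=os.getenv("OPENAI_API_KEY"))
args = parser.parse_args()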

Evaluation

We provide scripts to evaluate the generated predictions. For the RAG task, evaluation follows HotpotQA; for the En.QA and Zh.QA tasks, it follows InfiniteBench.

For the RAG task:

bash scripts/eval_rag.sh /path/to/your/output_dir

For the En.QA task:

bash scripts/eval_en.sh /path/to/your/output_dir

For the Zh.QA task:

bash scripts/eval_zh.sh /path/to/your/output_dir
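
For reference, HotpotQA-style QA evaluation normalizes answers (lowercasing, stripping punctuation and articles) and computes token-level F1 between the prediction and the gold answer. A minimal sketch of that metric (the bundled scripts may differ in detail):

import re
import string
from collections import Counter

def normalize_answer(s):
    # Lowercase, drop punctuation and articles, collapse whitespace.
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def f1_score(prediction, gold):
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(f1_score("The Eiffel Tower", "Eiffel Tower"))  # -> 1.0 after normalization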

Citation

If you find this project helpful, please cite it as follows:

@article{liu2025extagents,
  title={Scaling External Knowledge Input Beyond The Context Length of LLMs via Multi-Agent Collaboration},
  author={Zijun Liu and Zhennan Wan and Peng Li and Ming Yan and Ji Zhang and Fei Huang and Yang Liu},
  year={2025}
}
