ExtAgents is a framework for scaling external knowledge input beyond the context length of LLMs via multi-agent collaboration.
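At its core, this means splitting external knowledge that exceeds the model's context window across multiple agents and aggregating their outputs. The following is only a minimal sketch of this generic chunk-process-aggregate pattern, not the actual ExtAgents pipeline; `ask_llm` and the prompts are hypothetical placeholders:

```python
from typing import List

def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in for a chat-completion call.
    raise NotImplementedError("plug in your chat-completion call here")

def split_into_chunks(text: str, chunk_length: int) -> List[str]:
    # Naive character-based split; a real system would count tokens.
    return [text[i:i + chunk_length] for i in range(0, len(text), chunk_length)]

def answer_with_agents(question: str, knowledge: str, chunk_length: int = 8000) -> str:
    # Each agent reads only one chunk, so the total external knowledge
    # can exceed any single model's context window.
    notes = [
        ask_llm(f"Extract facts relevant to the question.\nQuestion: {question}\nChunk:\n{chunk}")
        for chunk in split_into_chunks(knowledge, chunk_length)
    ]
    # A final agent aggregates the per-chunk notes into one answer.
    return ask_llm(f"Question: {question}\nNotes:\n" + "\n".join(notes))
```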
conda create -n extagents python=3.10 -y
conda activate extagents
pip install -r requirements.txt
You can download the data with the script:
bash scripts/download_data.sh
Or you can download the data manually from one of the following links:
The data should be organized as follows:
./
└── data/
    ├── sampled_hotpot_questions.json
    ├── rag_1000k.jsonl
    ├── longbook_qa_eng.jsonl
    └── longbook_qa_chn.jsonl
Update: We have uploaded the long-context Q&A datasets, both before and after enhancement, to Hugging Face. `original` refers to the original dataset, `full-enhanced` to the fully enhanced dataset, and `partial-enhanced` to the partially enhanced dataset, in which only samples no longer than 128k tokens have been augmented. Feel free to download and use them!
We currently support three tasks: RAG, En.QA, and Zh.QA.
The RAG task is a question answering task whose input is a question and a context. The questions and answers are sampled from HotpotQA. The context is a long text formed by concatenating documents retrieved from Wikipedia with BM25. We use the KILT knowledge source, which is based on the 2019/08/01 Wikipedia dump. The context is provided in the `data/rag_1000k.jsonl` file.
The En.QA and Zh.QA tasks are question answering tasks whose input is a question and a context. The questions, answers, and contexts are taken from InfiniteBench.
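If you want to inspect the data before running anything, the files are plain JSON/JSONL. Here is a small sketch for peeking at their structure; the field names are not documented here, so the snippet prints the keys rather than assuming a schema:

```python
import json

# Peek at one record from the RAG context file to inspect its schema;
# the field names are undocumented, so we only print the keys.
with open("data/rag_1000k.jsonl", encoding="utf-8") as f:
    record = json.loads(next(f))
print(sorted(record.keys()))

# The HotpotQA sample file is plain JSON.
with open("data/sampled_hotpot_questions.json", encoding="utf-8") as f:
    questions = json.load(f)
print(type(questions), len(questions))
```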
Here is an example command to generate predictions for the RAG task:
python main.py \
--task rag \
--output_dir results_rag \
--chunk_length 8000 \
--input_length 128000 \
--api_url "YOUR_API_URL" \
--api_key "YOUR_API_KEY" \
--model "gpt-4o-mini-2024-07-18" \
--num_workers 8 \
> rag.log
The generated predictions will be saved in the `results_rag` directory.
- `--task`: Task; one of `rag`, `en`, or `zh`.
- `--output_dir`: Directory to save the generated predictions.
- `--chunk_length`: Chunk length (in tokens).
- `--input_length`: Input length (in tokens).
- `--model`: Model to use; defaults to `gpt-4o-mini-2024-07-18`.
- `--api_url`: Your API URL; defaults to `os.getenv("OPENAI_BASE_URL")`.
- `--api_key`: Your API key; defaults to `os.getenv("OPENAI_API_KEY")`.
- `--num_workers`: Number of parallel workers; each worker processes one example.
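The En.QA and Zh.QA tasks use the same interface; for example, the following invocation only swaps `--task`, the output directory, and the log file, relying on the environment variables described below for the API settings:

```bash
python main.py \
    --task en \
    --output_dir results_en \
    --chunk_length 8000 \
    --input_length 128000 \
    --model "gpt-4o-mini-2024-07-18" \
    --num_workers 8 \
    > en.log
```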
You can also set the environment variables `OPENAI_BASE_URL` and `OPENAI_API_KEY` to avoid passing them on the command line.
export OPENAI_BASE_URL="YOUR_API_URL"
export OPENAI_API_KEY="YOUR_API_KEY"
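The defaults presumably resolve in the obvious way, with explicit flags taking precedence over the environment. A minimal sketch, assuming `main.py` follows the fallback behavior stated in the option list above (`resolve_api_settings` is a hypothetical helper, not from the repository):

```python
import os

def resolve_api_settings(api_url=None, api_key=None):
    # CLI flags take precedence; otherwise fall back to the environment
    # variables, matching the documented defaults.
    api_url = api_url or os.getenv("OPENAI_BASE_URL")
    api_key = api_key or os.getenv("OPENAI_API_KEY")
    if not api_url or not api_key:
        raise ValueError("Set --api_url/--api_key or the OPENAI_* environment variables.")
    return api_url, api_key
```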
We provide scripts to evaluate the generated predictions. For the RAG task, the evaluation follows HotpotQA; for the En.QA and Zh.QA tasks, it follows InfiniteBench.
For the RAG task:
bash scripts/eval_rag.sh /path/to/your/output_dir
For the En.QA task:
bash scripts/eval_en.sh /path/to/your/output_dir
For the Zh.QA task:
bash scripts/eval_zh.sh /path/to/your/output_dir
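For reference, HotpotQA answer scoring is based on exact match and token-level F1 over normalized answers. The snippet below is an illustrative re-implementation of that F1 metric, not the exact code invoked by `scripts/eval_rag.sh`:

```python
import re
import string
from collections import Counter

def normalize_answer(s: str) -> str:
    # Lowercase, strip punctuation and articles, collapse whitespace
    # (HotpotQA-style normalization).
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def f1_score(prediction: str, ground_truth: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```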
If you find this project helpful, please cite it as follows:
@article{liu2025extagents,
  title={Scaling External Knowledge Input Beyond The Context Length of LLMs via Multi-Agent Collaboration},
  author={Zijun Liu and Zhennan Wan and Peng Li and Ming Yan and Ji Zhang and Fei Huang and Yang Liu},
  year={2025}
}