LocAgent: Graph-Guided LLM Agents for Code Localization

📑 Paper | 📊 Loc-bench | 🤗 Qwen2.5-Coder-7B-CL | 🤗 Qwen2.5-Coder-32B-CL

ℹ️ Overview

We introduce LocAgent, a framework that addresses code localization through graph-based representation. By parsing codebases into directed heterogeneous graphs, LocAgent creates a lightweight representation that captures code structures and their dependencies, enabling LLM agents to effectively search and locate relevant entities through powerful multi-hop reasoning.

⚙️ Setup

Follow these steps to set up your development environment:

git clone [email protected]:gersteinlab/LocAgent.git
cd LocAgent

conda create -n locagent python=3.12
conda activate locagent
pip install -r requirements.txt

🚀 Launch LocAgent

(Optional but recommended) Parse the codebase for each issue in the benchmark to generate graph indexes in batch.
```
python dependency_graph/batch_build_graph.py \
      --dataset 'czlll/Loc-Bench_V1' \
      --split 'test' \
      --num_processes 50 \
      --download_repo
```
- dataset: select the benchmark (by default it will be SWE-Bench_Lite); you can choose from ['czlll/SWE-bench_Lite', 'czlll/Loc-Bench_V1'](adapted for code localization) and SWE-bench series datasets like ['princeton-nlp/SWE-bench_Lite', 'princeton-nlp/SWE-bench_Verified', 'princeton-nlp/SWE-bench']
- repo_path: the directory where you plan to pull or have already pulled the codebase
- index_dir: the base directory where the generated graph index will be saved
- download_repo: whether to download the codebase to repo_path before indexing
Export the directory of the graph indexes and the BM25 sparse index. If not generated in advance, the graph index will be generated during the localization process.
```
export GRAPH_INDEX_DIR='{INDEX_DIR}/{DATASET_NAME}/graph_index_v2.3'
export BM25_INDEX_DIR='{INDEX_DIR}/{DATASET_NAME}/BM25_index'
```
Run the script scripts/run_lite.sh to lauch LocAgent.
```
python auto_search_main.py \
   --dataset 'czlll/SWE-bench_Lite' \
   --split 'test' \
   --model 'azure/gpt-4o' \
   --localize \
   --merge \
   --output_folder $result_path/location \
   --eval_n_limit 300 \
   --num_processes 50 \
   --use_function_calling \
   --simple_desc
```
- localize: set to start the localization process
- merge: merge the result of multiple samples
- use_function_calling: enable function calling features of LLMs. If disabled, codeact will be used to support function calling
- simple_desc: use simplified function descriptions due to certain LLM limitations. Set to False for better performance when using Claude.
Evaluation After localization, the results will be saved in a JSONL file. You can evaluate them using evaluation.eval_metric.evaluate_results. Refer to evaluation/run_evaluation.ipynb for a demonstration.

📑 Cite Us

@inproceedings{chen-etal-2025-locagent,
 title = "{L}oc{A}gent: Graph-Guided {LLM} Agents for Code Localization",
 author = "Chen, Zhaoling  and
   Tang, Robert  and
   Deng, Gangda  and
   Wu, Fang  and
   Wu, Jialong  and
   Jiang, Zhiwei  and
   Prasanna, Viktor  and
   Cohan, Arman  and
   Wang, Xingyao",
 editor = "Che, Wanxiang  and
   Nabende, Joyce  and
   Shutova, Ekaterina  and
   Pilehvar, Mohammad Taher",
 booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
 month = jul,
 year = "2025",
 address = "Vienna, Austria",
 publisher = "Association for Computational Linguistics",
 url = "https://aclanthology.org/2025.acl-long.426/",
 doi = "10.18653/v1/2025.acl-long.426",
 pages = "8697--8727",
 ISBN = "979-8-89176-251-0",
 abstract = "Code localization{--}identifying precisely where in a codebase changes need to be made{--}is a fundamental yet challenging task in software maintenance. Existing approaches struggle to efficiently navigate complex codebases when identifying relevant code snippets.The challenge lies in bridging natural language problem descriptions with the target code elements, often requiring reasoning across hierarchical structures and multiple dependencies.We introduce LocAgent, a framework that addresses code localization through a graph-guided agent.By parsing codebases into directed heterogeneous graphs, LocAgent creates a lightweight representation that captures code structures and their dependencies, enabling LLM agents to effectively search and locate relevant entities through powerful multi-hop reasoning.Experimental results on real-world benchmarks demonstrate that our approach significantly enhances accuracy in code localization.Notably, our method with the fine-tuned Qwen-2.5-Coder-Instruct-32B model achieves comparable results to SOTA proprietary models at greatly reduced cost (approximately 86{\%} reduction), reaching up to 92.7{\%} accuracy on file-level localization while improving downstream GitHub issue resolution success rates by 12{\%} for multiple attempts (Pass@10). Our code is available at \url{https://github.com/gersteinlab/LocAgent}."
}

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
assets		assets
dependency_graph		dependency_graph
evaluation		evaluation
plugins		plugins
repo_index		repo_index
scripts		scripts
util		util
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
auto_search_main.py		auto_search_main.py
build_bm25_index.py		build_bm25_index.py
requirements.txt		requirements.txt
sft_train.py		sft_train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

LocAgent: Graph-Guided LLM Agents for Code Localization

ℹ️ Overview

⚙️ Setup

🚀 Launch LocAgent

📑 Cite Us

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 3

Languages

License

gersteinlab/LocAgent

Folders and files

Latest commit

History

Repository files navigation

LocAgent: Graph-Guided LLM Agents for Code Localization

ℹ️ Overview

⚙️ Setup

🚀 Launch LocAgent

📑 Cite Us

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 3

Languages

Packages