The repository for "MedChain: Bridging the Gap Between LLM Agents and Real-World Clinical Decision Making"

MedChain

This repository provides the official implementation of MedChain.

MedChain: Bridging the Gap Between LLM Agents and Clinical Practice with Interactive Sequence
Jie Liu1*, Wenxuan Wang2*, Zizhan Ma2, Guolin Huang3, Yihang Su2,
Kao-Jung Chang4,5, Wenting Chen1, Haoliang Li1, Linlin Shen3, Michael Lyu2

1City University of Hong Kong, 2The Chinese University of Hong Kong, 3Shenzhen University,
4National Yang Ming Chiao Tung University, 5Taipei Veterans General Hospital

paper | code | dataset

🚀Overview

In this paper, we introduce MedChain, a novel benchmark designed to bridge the gap between Large Language Model (LLM) agents and real-world clinical decision-making (CDM). Unlike existing medical benchmarks that focus on isolated tasks, MedChain emphasizes three core features of clinical practice: personalization, interactivity, and sequentiality.

MedChain comprises 12,163 rigorously validated clinical cases spanning 19 medical specialties and 156 sub-categories, including 7,338 medical images with reports. Each case progresses through five sequential stages: 1️⃣ Specialty Referral 2️⃣ History-taking 3️⃣ Examination 4️⃣ Diagnosis 5️⃣ Treatment
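The five-stage sequence above can be sketched as a pipeline in which each stage reads the case state accumulated so far and writes its own result, so an early mistake propagates into every later stage. This is a minimal illustration only; the `CaseState` fields and stage functions are hypothetical placeholders for the benchmark's actual LLM-driven stages.

```python
# Illustrative sketch of a five-stage sequential CDM pipeline.
# All stage functions and CaseState fields are hypothetical, not the repo's API.
from dataclasses import dataclass, field

@dataclass
class CaseState:
    chief_complaint: str
    specialty: str = ""
    history: list = field(default_factory=list)
    findings: list = field(default_factory=list)
    diagnosis: str = ""
    treatment: str = ""

def run_pipeline(state, stages):
    # Each stage consumes the state produced so far and adds its own output,
    # so an error in an early stage propagates into every later stage.
    for stage in stages:
        state = stage(state)
    return state

# Toy stage implementations (placeholders for LLM agent calls).
def specialty_referral(s):
    s.specialty = "Cardiology" if "chest pain" in s.chief_complaint else "General"
    return s

def history_taking(s):
    s.history.append("onset: 2 hours ago")
    return s

def examination(s):
    s.findings.append("ECG: ST elevation")
    return s

def diagnosis(s):
    s.diagnosis = "Acute MI" if "ST elevation" in " ".join(s.findings) else "Unclear"
    return s

def treatment(s):
    s.treatment = "PCI referral" if s.diagnosis == "Acute MI" else "Observation"
    return s

case = run_pipeline(
    CaseState("chest pain"),
    [specialty_referral, history_taking, examination, diagnosis, treatment],
)
print(case.diagnosis, "->", case.treatment)  # Acute MI -> PCI referral
```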

To address the challenges of MedChain, we propose MedChain-Agent, a multi-agent framework integrating:

  • Three specialized agents (General, Summarizing, Feedback) for collaborative decision-making.
  • MedCase-RAG, a retrieval-augmented module that dynamically expands a structured medical case database (12D feature vectors) for context-aware reasoning.
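The MedCase-RAG idea can be sketched as nearest-neighbor retrieval over a growing database in which each case is encoded as a 12-dimensional feature vector. This is a hedged sketch under assumptions: the cosine-similarity metric and the in-memory database layout are illustrative choices, not the repository's implementation.

```python
# Sketch of retrieval over a dynamically expanding case database keyed by
# 12-dimensional feature vectors. The metric and storage are assumptions.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class CaseDB:
    def __init__(self):
        self.cases = []  # list of (12-d feature vector, case record) pairs

    def add(self, vec, record):
        # The database expands dynamically as new cases are encountered.
        assert len(vec) == 12, "each case is encoded as a 12-d feature vector"
        self.cases.append((vec, record))

    def retrieve(self, query_vec, k=3):
        # Return the k most similar stored cases for context-aware reasoning.
        ranked = sorted(self.cases, key=lambda cv: cosine(query_vec, cv[0]),
                        reverse=True)
        return [record for _, record in ranked[:k]]
```

Usage would look like `db.add(vector, case_record)` at ingestion time and `db.retrieve(query_vector, k)` at inference time, with the retrieved cases injected into the agent's context.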

Key Results:

  • MedChain-Agent outperforms state-of-the-art models (e.g., GPT-4o, Claude-3.5) with an average score of 0.5269 across tasks, showcasing superior adaptability in sequential CDM.
  • Ablation studies confirm the critical roles of the feedback mechanism and MedCase-RAG.
  • The benchmark exposes limitations in existing LLMs, with single-agent models scoring ≤0.4327 due to error propagation in sequential stages.

MedChain sets a new standard for evaluating AI in clinical workflows, highlighting the need for frameworks that mirror real-world complexity while enabling reliable, patient-centric decision-making. The dataset and code will be released publicly to foster progress in medical AI.

overview

📦Code

  1. You can find the workflow code for the five core tasks of MedChain in the task_framework directory; each task can be tested individually.

    cd task_framework
    python task1_triage.py         # Specialty Referral
    python task2_interrogation.py  # History-taking
    python task3_image.py          # Examination
    python task4_diagnosis.py      # Diagnosis
    python task5_treatment.py      # Treatment

The doctor_patient_interaction file presents the detailed process of doctor-patient interaction: configure the correct LLM API interface and run the wenzhen_main.py script to reproduce it. The remaining scripts extract the specified information and compute the IoU value.
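The IoU mentioned above can be computed as intersection-over-union of two item sets (e.g. predicted versus reference examination items). A minimal sketch, assuming set-based IoU rather than the repository's exact script:

```python
def iou(predicted, reference):
    """Intersection-over-union of two item sets.

    Returns 1.0 when both sets are empty (perfect agreement on "nothing").
    """
    p, r = set(predicted), set(reference)
    if not p and not r:
        return 1.0
    return len(p & r) / len(p | r)

# One shared item ("ECG") out of three distinct items overall -> 1/3.
print(iou(["ECG", "chest X-ray"], ["ECG", "troponin"]))
```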

  2. You can locate the core workflow code for MedChain-Agent, MedCase-RAG, and the feedback mechanism in the main.py file, and run it to evaluate MedChain-Agent.

    python main.py
  3. Below are the comparison and ablation study results for MedChain-Agent.

comparison

comparison

For more details, please see our paper.

🔍 Insights

  1. Sequential clinical decision-making exposes critical gaps in current AI systems, with single-agent models achieving only 43.27% average accuracy (Claude-3.5), while MedChain-Agent improves performance to 52.69% through multi-agent collaboration and error mitigation.
  2. Structured medical knowledge retrieval (MedCase-RAG) drives significant performance gains, contributing an 8.11% improvement in clinical task accuracy by enabling dynamic case matching through 12-dimensional feature vectors.
  3. Iterative feedback mechanisms are pivotal for clinical reasoning, reducing error propagation and boosting average scores by 3.73% through continuous refinement across sequential stages.
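The iterative feedback mechanism in insight 3 can be pictured as a bounded refinement loop in which a feedback agent critiques each candidate answer before the pipeline moves on. The function names below are hypothetical stand-ins for LLM agent calls, not the repository's actual interfaces.

```python
def answer_with_feedback(generate, critique, max_rounds=3):
    """Iteratively refine an answer until the feedback agent accepts it.

    `generate(feedback)` returns a candidate answer (feedback is None on the
    first round); `critique(answer)` returns None if the answer is acceptable,
    otherwise a feedback string. Both stand in for LLM agent calls.
    """
    feedback = None
    answer = None
    for _ in range(max_rounds):
        answer = generate(feedback)
        feedback = critique(answer)
        if feedback is None:
            break  # accepted: stop refining before errors reach later stages
    return answer
```

Bounding the loop with `max_rounds` keeps cost predictable while still catching most errors before they propagate into the next sequential stage.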

🎈Acknowledgements

We are particularly indebted to the administrators of the iiyi website for their generosity in allowing us to utilize their data for our research purposes. We would like to acknowledge the assistance provided by Claude-3.5 in proofreading our manuscript for grammatical accuracy and in facilitating the creation of LaTeX tables.

📜Citation

If you find our work helpful for your research, please consider giving a star ⭐ and citation 📝

@misc{liu2024medchainbridginggapllm,
      title={MedChain: Bridging the Gap Between LLM Agents and Clinical Practice with Interactive Sequence}, 
      author={Jie Liu and Wenxuan Wang and Zizhan Ma and Guolin Huang and Yihang SU and Kao-Jung Chang and Wenting Chen and Haoliang Li and Linlin Shen and Michael Lyu},
      year={2024},
      eprint={2412.01605},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.01605}, 
}
