This repository contains our solution for the MICCAI24 AMOS-MM: Abdominal Multimodal Analysis Challenge.
Requirements:
- Python ≥ 3.10.12 and < 3.12
Setup steps:
- Create a Python (or conda) virtual environment:

  ```bash
  python -m venv mllm
  source mllm/bin/activate
  ```

- Clone the repository:

  ```bash
  git clone https://github.com/bowang-lab/AMOS-MM-Solution.git
  cd AMOS-MM-Solution
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
To replicate or expand upon our experiments, download the AMOS-MM dataset from here. Once downloaded, you can proceed with dataset preparation.
The dataset requires a JSON file structured similarly to `Data/dataset.json`. To generate it, run the following command:
```bash
python prepare_data.py \
--report_json <PATH_TO_report_generation_train_val.json> \
--vqa_json <PATH_TO_vqa_train_val.json> \
--output <PATH_TO_OUTPUT_DIR> \
--train_src <PATH_TO_imagesTr> \
--val_src <PATH_TO_imagesVa>
```
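If you want to verify the result before training, a quick sanity check such as the one below can help. This is a minimal sketch, assuming the generated file sits in your `--output` directory; the exact path and filename are assumptions, so compare the printed layout against the reference `Data/dataset.json` in the repo:

```python
import json

# Example path: point this at the file produced by prepare_data.py
# (the filename here is an assumption; check your --output directory).
with open("output/dataset.json") as f:
    dataset = json.load(f)

# Print the top-level layout so it can be compared against Data/dataset.json.
print(type(dataset).__name__)
if isinstance(dataset, dict):
    for key, value in dataset.items():
        size = len(value) if hasattr(value, "__len__") else value
        print(f"  {key}: {type(value).__name__} ({size})")
```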
Once data preparation is complete, train the LLaMA 3.1 model for report generation using:
```bash
PYTHONPATH=. accelerate launch --num_processes 1 --main_process_port 29500 LaMed/src/train/amos_train.py \
--version v0 \
--model_name_or_path meta-llama/Meta-Llama-3.1-8B-Instruct \
--cache_dir <WHERE_MODEL_WILL_BE_SAVED> \
--model_type llama \
--freeze_llm True \
--vision_tower vit3d \
--pretrain_vision_model <PATH_TO_PRETRAINED_VISION_MODEL> \
--bf16 True \
--output_dir <WHERE_TO_SAVE_MODEL> \
--num_train_epochs 100 \
--per_device_train_batch_size 2 \
--evaluation_strategy "no" \
--do_eval False \
--eval_accumulation_steps 1 \
--save_strategy "steps" \
--save_steps 2000 \
--save_total_limit 1 \
--learning_rate 5e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 0.001 \
--gradient_checkpointing False \
--dataloader_pin_memory True \
--dataloader_num_workers 4 \
--report_to none \
--prompt "simple" \
--task mrg \
--json_path <PATH_TO_DATASET_JSON> \
--image_size "32, 256, 256" \
--with_template True \
--model_max_length 768
```
- The `json_path` should point to the JSON file prepared earlier.
- Set `cache_dir` and `pretrain_vision_model` appropriately.
- The vision model we used is the 3D ViT from M3D.
- Additional arguments:
  - `zoom_in`: uses organ segmentation masks for region cropping.
  - `prompt`: controls the prompt format (e.g. `"simple"` in `LaMed/src/dataset/prompts.py`).
To fine-tune the model for VQA, change the `--task` argument to `vqa`. Additional arguments include:

- `only_letter`: restricts answers to single letters.
- `with_reason`: includes reasoning in the answers.
For Binary-based Questioning (BQ), first prepare triplets:
```bash
python scripts/triplet_extraction.py \
    --json_path <PATH_TO_DATASET_JSON> \
    --openai_key <OPEN_AI_KEY>
```
- You can modify the model used for triplet extraction inside the script.
- The triplet files will be named to align with the report files for seamless training.
To train the triplet model, use the same training command as above, adding `--triplet True`.
Run inference for medical report generation:
```bash
CUDA_VISIBLE_DEVICES="0" accelerate launch --num_processes 1 --main_process_port 29500 infer.py \
--model_name_or_path <PATH_TO_TRAINED_MODEL> \
--json_path <PATH_TO_DATA_JSON> \
--model_max_length 768 \
--prompt "simple" \
--post_process "normality" "bq" \
--triplet_model_path <PATH_TO_TRAINED_TRIPLET_MODEL> \
--proj_out_num 256
```
Note:
- If you did not train a triplet model, omit the `"bq"` argument and `--triplet_model_path`.
- The `post_process` argument enables:
  - Knowledge-based normality inference.
  - Focused questioning based on specific findings.
- The knowledge base is defined in `utils/postprocessor.py`. Adapt it for different datasets.
Run VQA inference with:
```bash
CUDA_VISIBLE_DEVICES="0" accelerate launch --num_processes 1 --main_process_port 29500 infer_vqa.py \
--model_name_or_path <PATH_TO_TRAINED_MODEL> \
--json_path <PATH_TO_DATA_JSON> \
--model_max_length 512 \
--proj_out_num 256
```
- The optional `--with_acc` argument computes VQA accuracy if ground-truth answers are available in the competition format.
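For reference, the idea behind the accuracy computation is simply matching the predicted option letter against the ground truth. Below is a minimal, hypothetical sketch of that logic (the real implementation lives in `infer_vqa.py`; the function name and exact matching rule here are assumptions):

```python
def letter_accuracy(predictions, answers):
    """Hypothetical sketch: fraction of predictions whose first character
    (the chosen option letter) matches the ground-truth letter."""
    matches = sum(
        p.strip().upper()[:1] == a.strip().upper()[:1]
        for p, a in zip(predictions, answers)
    )
    return matches / len(answers) if answers else 0.0

# Example: three questions, two answered with the correct letter.
print(letter_accuracy(["A", "C. Liver", "B"], ["A", "B", "B"]))  # ~0.667
```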
Our paper introduces two report augmentation methods:
- Naive Normality (NN)
- Binary-based Questioning (BQ)
Both methods rely on a pre-defined knowledge base specific to AMOS-MM. To customize it for other datasets, edit the mappings in `utils/postprocessor.py`.
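As a rough illustration, here is a minimal, hypothetical sketch of the Naive Normality idea built on such a knowledge base (the dictionary name, sentences, and string-matching rule are assumptions, not the repo's actual implementation):

```python
# Hypothetical knowledge base: one default "normal" sentence per organ.
NORMALITY_KB = {
    "liver": "The liver is normal in size and morphology with no focal lesions.",
    "gallbladder": "The gallbladder is unremarkable.",
    "pancreas": "The pancreas shows no masses or abnormal density.",
}

def apply_naive_normality(report: str, kb: dict = NORMALITY_KB) -> str:
    """Append a default normality sentence for each organ that the
    generated report never mentions (the NN augmentation idea)."""
    additions = [sentence for organ, sentence in kb.items()
                 if organ not in report.lower()]
    return " ".join([report] + additions)

# Organs absent from the report receive their default normality sentence.
print(apply_naive_normality("Hypodense lesion in the liver."))
```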
- We thank the organizers of the MICCAI24 AMOS-MM challenge for their efforts.
- This codebase builds upon the M3D repository, and we gratefully acknowledge its authors.
```bibtex
@InProceedings{Bah_Exploring_MICCAI2025,
    author    = {Baharoon, Mohammed and Ma, Jun and Fang, Congyu and Toma, Augustin and Wang, Bo},
    title     = {{Exploring the Design Space of 3D MLLMs for CT Report Generation}},
    booktitle = {Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
    year      = {2025},
    publisher = {Springer Nature Switzerland}
}
```