CVC

Code for "Boosting Visual Knowledge-Intensive Training for LVLMs through Causality-driven Visual Object Completion" (IJCAI 2025)


Data Preparation

High-Causality Entity Collection

  1. Download the COCO dataset using LAVIS.

  2. Format the input into a JSON list (a conversion sketch follows this list). Each entry should contain:

    {
        "image": "image file",
        "text_input": "image caption"
    }
  3. Extract entities for each caption:

    python cvc/data_preparation/1-0_entity_extractor.py
  4. Tag the causality for each entity:

    python cvc/data_preparation/1-1_causality_tagger.py
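
A minimal conversion sketch for step 2, assuming the LAVIS-downloaded caption file is a JSON list with "image" and "caption" keys (the file names below are also assumptions; adjust them to your local copy):

    # build_cvc_input.py (hypothetical helper, not part of the repository)
    # Converts LAVIS-style COCO caption annotations into the JSON list
    # expected by the data-preparation scripts.
    import json

    with open("coco_karpathy_train.json", "r") as f:   # assumed LAVIS annotation file
        annotations = json.load(f)

    entries = [
        {"image": ann["image"], "text_input": ann["caption"]}
        for ann in annotations
    ]

    with open("cvc_input.json", "w") as f:
        json.dump(entries, f, indent=4)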

Image Occlusion

  1. Use GLIP to detect bounding boxes of high-causality entities. Download the GLIP checkpoint and run the following script within the GLIP repository:

    python cvc/data_preparation/2-1_detect_bbox.py
  2. Use SAM to mask high-causality objects (see the masking sketch after this list):

    python cvc/data_preparation/2-2_segment.py
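
The masking step can be pictured with the sketch below, which occludes one GLIP-detected box using SAM's box prompt; the checkpoint path, box coordinates, and the grey fill are assumptions and may differ from 2-2_segment.py:

    # occlude_with_sam.py (illustrative sketch, not the repository script)
    import cv2
    import numpy as np
    from segment_anything import sam_model_registry, SamPredictor

    # Assumed checkpoint path and model type; use the SAM weights you downloaded.
    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
    predictor = SamPredictor(sam)

    image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
    predictor.set_image(image)

    # Bounding box of a high-causality entity from GLIP, in XYXY pixel coordinates (assumed values).
    box = np.array([100, 150, 320, 400])
    masks, _, _ = predictor.predict(box=box, multimask_output=False)

    # Occlude the segmented object, here by filling it with grey (one possible choice).
    occluded = image.copy()
    occluded[masks[0]] = 127
    cv2.imwrite("example_occluded.jpg", cv2.cvtColor(occluded, cv2.COLOR_RGB2BGR))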

Instruction Generation

  1. Generate the specific instruction for each high-causality entity (an example instance is sketched after this step):

    python cvc/data_preparation/3_instruction_generator.py
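
For intuition, a generated CVC instance might look like the hypothetical record below; the field names and prompt wording are assumptions, not the exact output of 3_instruction_generator.py:

    # Hypothetical shape of one generated CVC instance; field names and the
    # instruction wording are assumptions, not the repository's exact format.
    instance = {
        "image": "example_occluded.jpg",   # image with the entity masked out
        "instruction": "An object in this image has been masked. "
                       "Infer what the occluded object is and explain your reasoning.",
        "answer": "dog",                   # the high-causality entity that was masked
    }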

Model Training

Trial Sampling

  1. Sample multiple rationales (trials) for each CVC instance:

    python cvc/model_training/1_cot_generator_llava.py
  2. Extract the final answer from each trial:

    python cvc/model_training/2_answer_extractor.py
  3. Verify the correctness of each trial's answer using soft matching with the BGE-M3 embedding model (see the matching sketch after this list):

    python cvc/model_training/3_answer_checker.py
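
A minimal sketch of the soft-matching check in step 3, using BGE-M3 dense embeddings via FlagEmbedding; the 0.8 threshold is an assumed value, not necessarily the one used in 3_answer_checker.py:

    # Illustrative soft matching with BGE-M3 (via FlagEmbedding).
    import numpy as np
    from FlagEmbedding import BGEM3FlagModel

    model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

    def is_correct(predicted: str, gold: str, threshold: float = 0.8) -> bool:
        # Encode both strings and compare their dense embeddings by cosine similarity.
        vecs = model.encode([predicted, gold])["dense_vecs"]
        pred_vec, gold_vec = vecs[0], vecs[1]
        sim = float(np.dot(pred_vec, gold_vec) /
                    (np.linalg.norm(pred_vec) * np.linalg.norm(gold_vec)))
        return sim >= threshold

    print(is_correct("a small puppy", "dog"))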

Trial Learning

  1. Collect challenging successful CVC instances and construct the training data using hybrid formats. The resulting dataset is combined with the instruction data of LLaVA-1.5 (a merging sketch follows this list):

    python cvc/model_training/4_hybrid_format.py
  2. Download the pretrained checkpoint of LLaVA-1.5 and use the official LLaVA training script for model training.
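
As a rough sketch of the combination step, each retained CVC instance can be converted into LLaVA's conversation format and appended to the LLaVA-1.5 instruction data; the CVC file name and its fields below are assumptions:

    # Hypothetical merge of CVC training instances with LLaVA-1.5 instruction data.
    # "cvc_hybrid.json" and its fields are assumed; "llava_v1_5_mix665k.json" is the
    # public LLaVA-1.5 instruction-tuning file.
    import json

    with open("cvc_hybrid.json") as f:
        cvc_data = json.load(f)
    with open("llava_v1_5_mix665k.json") as f:
        llava_data = json.load(f)

    merged = llava_data[:]
    for i, item in enumerate(cvc_data):
        # LLaVA's training script expects "image" plus a "conversations" list
        # alternating "human" and "gpt" turns; "<image>" marks the image slot.
        merged.append({
            "id": f"cvc_{i}",
            "image": item["image"],
            "conversations": [
                {"from": "human", "value": "<image>\n" + item["instruction"]},
                {"from": "gpt", "value": item["response"]},
            ],
        })

    with open("llava_plus_cvc.json", "w") as f:
        json.dump(merged, f)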

🤝 Acknowledgements

This project builds upon the excellent work of several open-source repositories. We sincerely thank the authors for their contributions:

  • LLaVA: for the base LVLM architecture and training pipeline
  • LAVIS: for dataset downloading
  • GLIP: for object detection

Please make sure to install all required dependencies as specified in the respective repositories.
