Visual Question Answering (VQA) is an emerging interdisciplinary problem that demands knowledge of both Computer Vision (CV) and Natural Language Processing (NLP). Many domain-specific VQA tasks have emerged in the last few years, and VQA in the medical domain is one that can provide meaningful assistance to both doctors and patients. For example, doctors could use a VQA model's answers to support medical diagnosis, while patients could ask the system questions about their medical images to better understand their physical condition.
A VQA system takes as input an image and a natural language question about it and produces an answer consistent with the visual content of the image. To facilitate research on VQA in the medical domain, ImageCLEF, part of the Conference and Labs of the Evaluation Forum (CLEF), has been conducting annual VQA-Med challenges since 2018. In this project, we use the dataset from the VQA-Med 2020 challenge for training. To the best of our knowledge, recent VQA-Med challenge participants have not used self-supervised learning (SSL) techniques, focusing instead on transfer learning, ensemble models, etc. Thus, as a possible approach to the VQA-Med problem, we implement two contrastive learning frameworks, MoCo and Barlow Twins, pretrained on different medical datasets and fine-tuned on the VQA-Med 2020 dataset.
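To make the second framework concrete: Barlow Twins embeds two augmented views of each image and drives the cross-correlation matrix of the embeddings toward the identity matrix, so that matching dimensions agree (invariance) while distinct dimensions decorrelate (redundancy reduction). The sketch below is a minimal, illustrative implementation of that objective; the function name and default coefficients are ours, not taken from this repository:

```python
# Minimal, illustrative Barlow Twins loss (names and defaults are ours,
# not this repository's code). z1 and z2 are embeddings of two augmented
# views of the same batch of images.
import torch

def barlow_twins_loss(z1: torch.Tensor, z2: torch.Tensor, lambd: float = 5e-3) -> torch.Tensor:
    n, _ = z1.shape
    # Normalize each embedding dimension across the batch.
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    # Empirical cross-correlation matrix between the two views, (dim, dim).
    c = (z1.T @ z2) / n
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()               # invariance term
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # redundancy reduction
    return on_diag + lambd * off_diag
```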
This repository contains two folders, Barlow Twins and MoCo, each holding the code for the corresponding contrastive learning framework.
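For orientation, MoCo (Momentum Contrast) trains a query encoder against a momentum-updated key encoder and a queue of negative keys using an InfoNCE loss. The following is a minimal sketch of that idea; names and hyperparameters are illustrative, not this repository's actual code:

```python
# Minimal MoCo sketch (illustrative only): a momentum-updated key encoder
# and a queue of negative keys feed an InfoNCE loss.
import torch
import torch.nn.functional as F

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m: float = 0.999):
    # Key encoder tracks the query encoder via an exponential moving average.
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1 - m)

def moco_loss(q: torch.Tensor, k: torch.Tensor, queue: torch.Tensor, t: float = 0.07):
    """q, k: (batch, dim) L2-normalized embeddings of two views; queue: (dim, K) negatives."""
    l_pos = (q * k).sum(dim=1, keepdim=True)   # (batch, 1) positive logits
    l_neg = q @ queue                          # (batch, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / t
    # The positive key sits at index 0 of each row of logits.
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)
```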
Install dependencies
```bash
# clone project
git clone https://github.com/numanai/Visual-Question-Answering-for-Medical-domain
cd Visual-Question-Answering-for-Medical-domain

# [OPTIONAL] create conda environment
conda env create -f conda_env_gpu.yaml -n your_env_name
conda activate your_env_name

# install requirements
pip install -r requirements.txt
```
The instructions to run the pretraining and fine-tuning code for both SSL methods, i.e., MoCo and Barlow Twins, can be found in the respective directories; a rough sketch of the fine-tuning setup follows below.
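While the exact training commands live in those directories, the sketch below illustrates one plausible fine-tuning architecture under our assumptions: a contrastively pretrained image encoder is combined with a question encoder, and the answer is predicted over a fixed answer vocabulary. All class names, dimensions, and defaults here are hypothetical, not the repository's actual API:

```python
# Hypothetical fine-tuning architecture: an SSL-pretrained image encoder
# plus an LSTM question encoder, fused into an answer classifier.
import torch
import torch.nn as nn
import torchvision

class MedicalVQA(nn.Module):
    def __init__(self, num_answers: int, hidden: int = 512):
        super().__init__()
        # Backbone to be initialized from SSL-pretrained weights (see below).
        self.image_encoder = torchvision.models.resnet50(weights=None)
        self.image_encoder.fc = nn.Identity()  # expose 2048-d pooled features
        # Question encoder over pre-embedded word vectors (e.g., 300-d GloVe).
        self.question_encoder = nn.LSTM(input_size=300, hidden_size=hidden, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(2048 + hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_answers),
        )

    def forward(self, image: torch.Tensor, question_emb: torch.Tensor) -> torch.Tensor:
        v = self.image_encoder(image)             # (batch, 2048)
        _, (h, _) = self.question_encoder(question_emb)
        q = h[-1]                                 # (batch, hidden), last LSTM layer
        return self.classifier(torch.cat([v, q], dim=1))
```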
We further provide sample inputs and outputs of the model. The inputs are natural language questions about the images, and the outputs are predicted diagnoses. Some of the predicted answers are consistent with the ground truth, while others are not.
Datasets used or referenced in this project:
- ImageCLEF VQA-Med datasets from 2018 to 2020: ImageCLEF-2018, ImageCLEF-2019, ImageCLEF-2020. Please note that a signed and approved End User Agreement (EUA) is required to use these datasets.
- CheXpert dataset
- MIMIC-CXR
The CheXpert and MIMIC-CXR datasets were not used directly in this project; rather, the weights of models pretrained on them were adapted to initialize our own models.
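As a rough illustration of that initialization step (the checkpoint path, key prefix, and answer-vocabulary size below are assumptions, not values from this repository; MoCo checkpoints commonly store the query encoder under an `encoder_q.` prefix):

```python
# Illustrative only: adapt SSL-pretrained weights to the (hypothetical)
# MedicalVQA model sketched earlier.
import torch

checkpoint = torch.load("pretrained/mimic_cxr_moco.pth", map_location="cpu")  # assumed path
state_dict = checkpoint.get("state_dict", checkpoint)

# Keep only the image-encoder weights and strip the framework prefix.
encoder_state = {
    k.replace("encoder_q.", "", 1): v
    for k, v in state_dict.items()
    if k.startswith("encoder_q.")
}

model = MedicalVQA(num_answers=330)  # answer-vocabulary size is dataset-dependent
# strict=False tolerates heads (projector/classifier) absent from the checkpoint.
missing, unexpected = model.image_encoder.load_state_dict(encoder_state, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```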
- ImageCLEF-2020 dataset from the ImageCLEF 2020 VQA in the medical domain challenge
Developed by Elnura Zhalieva (ryuzakizh) and Numan Saeed (numanai).