This project is a web interface built with Gradio for running image-to-text inference with Google's medgemma-4b-it vision-language model. MedGemma is a collection of Gemma 3 variants trained for strong performance on medical text and image comprehension.
- Upload and analyze medical images (X-ray, MRI, etc.). Accepts `.jpg` and `.png`; raw `.dicom` files are not supported and must first be converted to a standard image format (see the conversion sketch after this list)
- Ask free-form medical questions related to the image
- Powered by `medgemma-4b-it`, a multimodal transformer
- Clean, interactive, and easy-to-understand web UI built with Gradio
- Must be run as a non-root user
- Python 3.10+
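
Since raw DICOM files are not accepted, here is a minimal conversion sketch using `pydicom` (an extra package, not in the dependency list below); the file names are placeholders:

```python
# Minimal DICOM-to-PNG conversion sketch; input/output names are placeholders.
import numpy as np
import pydicom
from PIL import Image

ds = pydicom.dcmread("scan.dicom")          # read the raw DICOM file
pixels = ds.pixel_array.astype(np.float32)  # pixel data as a float array

# Normalize to 0-255 so the result saves as an 8-bit grayscale PNG.
pixels -= pixels.min()
pixels /= max(pixels.max(), 1.0)
Image.fromarray((pixels * 255).astype(np.uint8)).save("scan.png")
```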
- Download the `medgemma-4b-it` model from Hugging Face:
  - Clone the project from GitHub and navigate into its folder:

    ```
    git clone https://github.com/Google-Health/medgemma.git
    cd medgemma
    ```

  - Download the model weights (requires a Hugging Face account):

    ```
    huggingface-cli download google/medgemma-4b-it --local-dir checkpoints/medgemma-4b-it
    ```
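
If the download fails with an authorization error, you will likely need to authenticate first and accept the model license on its Hugging Face page; `huggingface-cli login` prompts for an access token:

```
huggingface-cli login
```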
Install the Python library dependencies:

```
pip install torch transformers accelerate bitsandbytes gradio pillow
```
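
As an optional sanity check (an addition, not one of the original steps), the snippet below confirms that the core libraries import and reports whether a CUDA GPU is visible; CPU-only inference works but is slow for a 4B-parameter model:

```python
# Optional environment check: verify imports and GPU visibility.
import torch
import transformers
import gradio

print("torch", torch.__version__,
      "| transformers", transformers.__version__,
      "| gradio", gradio.__version__)
print("CUDA available:", torch.cuda.is_available())
```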
- Run the app:

  ```
  python medgemma.py
  ```

- The program will report that it is serving on `0.0.0.0:7860`; open `http://<server-IP>:7860` in a web browser
- Upload a medical image (e.g., `X-ray.jpg`).
- Ask a question like:
  - “What is shown in this image?”
  - “Is there evidence of pneumonia?”
- Click Submit.
- The model will respond with an expert-level interpretation in the right-hand panel.
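
For reference, below is a minimal sketch of what `medgemma.py` might contain, assuming the app loads the downloaded checkpoint through the `transformers` image-text-to-text pipeline; this is an illustrative reconstruction, not the repository's actual code:

```python
# Illustrative sketch of a MedGemma Gradio app; not the project's actual code.
# Assumes the model was downloaded to checkpoints/medgemma-4b-it as above.
import gradio as gr
import torch
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="checkpoints/medgemma-4b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # place the model on a GPU if one is available
)

def answer(image, question):
    # MedGemma takes a chat-style message list mixing image and text parts.
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": question},
        ],
    }]
    out = pipe(text=messages, max_new_tokens=300)
    # The pipeline returns the whole conversation; the last turn is the reply.
    return out[0]["generated_text"][-1]["content"]

demo = gr.Interface(
    fn=answer,
    inputs=[gr.Image(type="pil", label="Medical image"),
            gr.Textbox(label="Question")],
    outputs=gr.Textbox(label="Model response"),
    title="MedGemma image Q&A",
)

# Bind to all interfaces so the UI is reachable at http://<server-IP>:7860.
demo.launch(server_name="0.0.0.0", server_port=7860)
```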