Google MedGemma WebUI

This project is a web interface built with Gradio to run image-to-text inference using Google's medgemma-4b-it vision-language model. MedGemma is a collection of Gemma 3 variants that are trained for performance on medical text and image comprehension.

Features

  • Upload and analyze medical images (X-ray, MRI, etc.). Accepts ".jpg" and ".png" files; raw DICOM images are not supported and must first be converted to a standard image format
  • Ask free-form medical questions related to the image
  • Powered by medgemma-4b-it, a multimodal transformer
  • Clean, interactive, and easy to understand web UI using Gradio
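
Since DICOM files must be converted before upload, here is a minimal conversion sketch. It assumes the `pydicom` package is installed (`pip install pydicom`); the `to_uint8` windowing helper is illustrative and not part of this repository:

```python
# Hypothetical helper: convert a DICOM file to a PNG the UI can accept.
import numpy as np
from PIL import Image

def to_uint8(pixels):
    """Scale a raw pixel array into the 0-255 range expected by PNG."""
    pixels = pixels.astype(np.float32)
    lo, hi = float(pixels.min()), float(pixels.max())
    if hi > lo:
        pixels = (pixels - lo) / (hi - lo)
    else:
        pixels = np.zeros_like(pixels)
    return (pixels * 255).astype(np.uint8)

def dicom_to_png(dicom_path, png_path):
    import pydicom  # imported lazily; assumption: pydicom is installed
    ds = pydicom.dcmread(dicom_path)
    Image.fromarray(to_uint8(ds.pixel_array)).save(png_path)
```

Usage: `dicom_to_png("scan.dcm", "scan.png")`, then upload the resulting PNG in the UI.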

Requirements

  • Must be run as a non-root user
  • Python 3.10+
  • Download the medgemma-4b-it model from Hugging Face:
    • Clone the project from GitHub and navigate into its folder:
      git clone https://github.com/Google-Health/medgemma.git
      cd medgemma
    • Download the model files (requires a Hugging Face account; you must also accept the model's terms of use on its Hugging Face page):
      huggingface-cli download google/medgemma-4b-it --local-dir checkpoints/medgemma-4b-it

Install Python library dependencies:

pip install torch transformers accelerate bitsandbytes gradio pillow
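
For orientation, the sketch below shows how such a Gradio app is typically wired to the model with the `transformers` `image-text-to-text` pipeline; the actual `medgemma.py` in this repository may differ, and the `checkpoints/medgemma-4b-it` path assumes the download step above:

```python
# Sketch of a Gradio front-end for medgemma-4b-it (not the repo's exact script).
def build_messages(image, question):
    """Chat-format payload expected by the image-text-to-text pipeline."""
    return [{"role": "user",
             "content": [{"type": "image", "image": image},
                         {"type": "text", "text": question}]}]

def main():
    # Heavy imports kept inside main() so the module loads without the model.
    import torch
    import gradio as gr
    from transformers import pipeline

    pipe = pipeline("image-text-to-text",
                    model="checkpoints/medgemma-4b-it",
                    torch_dtype=torch.bfloat16,
                    device_map="auto")

    def analyze(image, question):
        out = pipe(text=build_messages(image, question), max_new_tokens=300)
        return out[0]["generated_text"][-1]["content"]

    gr.Interface(fn=analyze,
                 inputs=[gr.Image(type="pil"), gr.Textbox(label="Question")],
                 outputs=gr.Textbox(label="Model response"),
                 title="Google MedGemma WebUI").launch(
                     server_name="0.0.0.0", server_port=7860)

# Call main() to load the model and start the server.
```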

Usage

  • Run the app:

    python medgemma.py
  • The program will print that it is serving on "0.0.0.0:7860"; open "http://server-IP:7860" in a web browser (replace server-IP with the machine's address, or use localhost when browsing from the same machine)

  • Upload a medical image (e.g., X-ray.jpg).

  • Ask a question like:

    “What is shown in this image?”

    “Is there evidence of pneumonia?”

  • Click Submit.

  • The model's interpretation appears in the output panel on the right
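
The same steps can also be driven programmatically. A hypothetical client sketch, assuming the `gradio_client` package is installed and that the app exposes the default two-input interface (image, then question):

```python
# Hypothetical client for a running instance of the UI; not part of this repo.
def ask(server_url, image_path, question):
    from gradio_client import Client, handle_file  # assumption: installed
    client = Client(server_url)
    # Positional arguments follow the Interface inputs: image, then question.
    return client.predict(handle_file(image_path), question)

# Example (replace server-IP with your server's address):
# print(ask("http://server-IP:7860", "X-ray.jpg", "Is there evidence of pneumonia?"))
```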
