A set of ComfyUI custom nodes that integrate Ovis2, a multimodal large language model, for analyzing images and videos.
Features:
- Image Captioning: Generate detailed descriptions of images
- Multi-Image Analysis: Compare and analyze up to 4 images simultaneously
- Video Description: Process video frames for scene understanding
- Auto-Download: Automatically download models from Hugging Face
- Multiple Models: Support for all Ovis2 model sizes (1B to 34B parameters)
Installation with ComfyUI Manager:
- Install ComfyUI Manager if you haven't already
- Open ComfyUI Manager
- Go to "Install Custom Nodes" tab
- Click "Install from Git URL"
- Enter the repository URL: https://github.com/Andro-Meta/ComfyUI-Ovis2.git
- Click Install
Manual installation:
- Navigate to your ComfyUI installation folder
- Go to the custom_nodes directory (create it if it doesn't exist)
- Clone this repository:
git clone https://github.com/Andro-Meta/ComfyUI-Ovis2.git
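- Install the required dependencies, running the command from the ComfyUI installation folder in the same Python environment that runs ComfyUI: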
pip install -r custom_nodes/ComfyUI-Ovis2/requirements.txt
- Restart ComfyUI
Requirements:
- transformers>=4.46.2
- huggingface-hub>=0.23.0
- torch>=2.4.0
- pillow>=10.3.0
- flash-attn>=2.7.0
- numpy>=1.25.0
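To confirm that the Python environment ComfyUI runs in meets these minimums, a small standalone check like the one below can help. It is a sketch, not part of this node set: it uses only the standard library (`importlib.metadata`), it compares only the leading numeric part of each version (so pre-release and local tags such as `+cu121` are ignored), and the helper names `check_requirements` and `at_least` are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the requirements list above.
REQUIREMENTS = [
    "transformers>=4.46.2",
    "huggingface-hub>=0.23.0",
    "torch>=2.4.0",
    "pillow>=10.3.0",
    "flash-attn>=2.7.0",
    "numpy>=1.25.0",
]


def parse_requirement(spec: str) -> tuple[str, tuple[int, ...]]:
    """Split 'name>=X.Y.Z' into the package name and an integer version tuple."""
    name, minimum = spec.split(">=")
    return name.strip(), tuple(int(part) for part in minimum.split("."))


def leading_numbers(installed: str) -> tuple[int, ...]:
    """Keep the leading numeric release, e.g. '2.4.1+cu121' -> (2, 4, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", installed)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def at_least(installed: tuple[int, ...], minimum: tuple[int, ...]) -> bool:
    """Compare versions after padding with zeros, so (2, 4) counts as (2, 4, 0)."""
    width = max(len(installed), len(minimum))
    padded_installed = installed + (0,) * (width - len(installed))
    padded_minimum = minimum + (0,) * (width - len(minimum))
    return padded_installed >= padded_minimum


def check_requirements(specs: list[str]) -> dict[str, str]:
    """Map each package name to 'ok', 'missing', or 'too old (<installed version>)'."""
    report = {}
    for spec in specs:
        name, minimum = parse_requirement(spec)
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = at_least(leading_numbers(installed), minimum)
        report[name] = "ok" if ok else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for package, status in check_requirements(REQUIREMENTS).items():
        print(f"{package}: {status}")
```

Run it with the same interpreter that launches ComfyUI; a different interpreter may report a different set of installed packages.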
After installation, you'll find four new nodes in the "Ovis2" category:
Load Ovis2 Model loads the Ovis2 model with these configurable settings:
- model_name: Choose which Ovis2 model to load
- precision: Set the numerical precision
- max_token_length: Maximum context length
- device: Choose CPU or CUDA for inference
- auto_download: Enable or disable automatic model downloading
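ComfyUI nodes declare their inputs through an `INPUT_TYPES` class method, and ComfyUI discovers them through `NODE_CLASS_MAPPINGS`. The sketch below shows how a loader with the settings above could be declared; it is not this repository's code. The class name, the `OVIS2_MODEL` return type, the model list (assumed to match the published Ovis2 sizes from 1B to 34B), the precision options, and the `max_token_length` range are all hypothetical.

```python
# Hypothetical list; assumed to match the published Ovis2 sizes (1B to 34B).
MODEL_NAMES = ["Ovis2-1B", "Ovis2-2B", "Ovis2-4B", "Ovis2-8B", "Ovis2-16B", "Ovis2-34B"]


class Ovis2ModelLoaderSketch:
    """Illustrative ComfyUI node declaration; not this repository's implementation."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                # A list of strings declares a dropdown (combo) input.
                "model_name": (MODEL_NAMES,),
                "precision": (["bfloat16", "float16", "float32"],),  # assumed options
                "max_token_length": (
                    "INT",
                    {"default": 32768, "min": 1024, "max": 65536, "step": 1024},  # assumed range
                ),
                "device": (["cuda", "cpu"],),
                "auto_download": ("BOOLEAN", {"default": True}),
            }
        }

    RETURN_TYPES = ("OVIS2_MODEL",)  # hypothetical custom type passed to the other nodes
    FUNCTION = "load_model"          # name of the method ComfyUI calls
    CATEGORY = "Ovis2"               # category named in this README

    def load_model(self, model_name, precision, max_token_length, device, auto_download):
        raise NotImplementedError("model loading is outside the scope of this sketch")


# ComfyUI discovers custom nodes through these module-level dictionaries.
NODE_CLASS_MAPPINGS = {"Ovis2ModelLoaderSketch": Ovis2ModelLoaderSketch}
NODE_DISPLAY_NAME_MAPPINGS = {"Ovis2ModelLoaderSketch": "Load Ovis2 Model"}
```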
Ovis2 Image Caption generates detailed descriptions of images from these inputs:
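- model: Connect to the output of the Load Ovis2 Model node
- image: Connect to an image output, such as the one from a Load Image node
- prompt: Instructions for the model
- max_new_tokens: Maximum length of the generated text, in tokens
- temperature: Controls randomness; higher values produce more varied text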
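Under the usual definition of sampling temperature, the model's next-token scores (logits) are divided by the temperature before the softmax, so values below 1 sharpen the distribution and values above 1 flatten it. The standalone sketch below illustrates that effect on made-up scores; it assumes the caption node forwards `temperature` to a standard sampler and does not reproduce the node's actual code.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw next-token scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
    for t in (0.5, 1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
```

At low temperature the top token takes almost all of the probability, which makes output close to deterministic; at high temperature less likely tokens are sampled more often.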
Ovis2 Multi-Image Analysis analyzes multiple images together:
- Supports up to 4 images simultaneously
- Great for comparison or sequence analysis
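A multi-image node typically exposes several optional image slots and skips the ones left unconnected. The sketch below shows that collection step under the assumption that unconnected slots arrive as `None`; the function name `collect_images` is hypothetical, and only the limit of 4 images comes from this README.

```python
from typing import Any, Optional

MAX_IMAGES = 4  # limit stated in this README


def collect_images(*slots: Optional[Any]) -> list:
    """Return the connected images in slot order, skipping empty (None) slots."""
    images = [image for image in slots if image is not None]
    if not images:
        raise ValueError("connect at least one image")
    if len(images) > MAX_IMAGES:
        raise ValueError(f"at most {MAX_IMAGES} images are supported, got {len(images)}")
    return images
```

A node method declared with hypothetical parameters such as `image2=None, image3=None, image4=None` could call `collect_images(image1, image2, image3, image4)` and pass the result to the model in order.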
The video description node processes video frames:
- Works with ComfyUI's standard video frame output format
- Controls for frame_skip and max_frames to handle longer videos
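This README does not define `frame_skip` and `max_frames` precisely. Under one plausible reading, which is an assumption here (keep the first frame, skip `frame_skip` frames between kept frames, and stop after `max_frames`), frame selection reduces to a stepped range. The helper `select_frame_indices` is hypothetical.

```python
def select_frame_indices(total_frames: int, frame_skip: int, max_frames: int) -> list[int]:
    """Pick which frames to send to the model.

    Assumed semantics (hypothetical): keep frame 0, then skip `frame_skip`
    frames between each kept frame, and keep at most `max_frames` frames.
    """
    if total_frames < 0 or frame_skip < 0 or max_frames < 1:
        raise ValueError("need total_frames >= 0, frame_skip >= 0, max_frames >= 1")
    step = frame_skip + 1
    return list(range(0, total_frames, step)[:max_frames])


if __name__ == "__main__":
    print(select_frame_indices(total_frames=10, frame_skip=2, max_frames=3))
```

ComfyUI typically passes video frames as an IMAGE batch tensor shaped [frames, height, width, channels], so indices like these could be applied with `frames[indices]`.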
To caption a single image:
- Add a "Load Image" node and select an image
- Add a "Load Ovis2 Model" node and choose your preferred model size
- Add an "Ovis2 Image Caption" node
- Connect:
  - The image output to the "image" input on the caption node
  - The model output to the "model" input on the caption node
- Run the workflow to generate a detailed caption
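The same workflow can also be queued through ComfyUI's HTTP API, which accepts an "API format" graph as JSON posted to the `/prompt` endpoint (port 8188 by default). In the sketch below, `LoadImage` is ComfyUI's built-in Load Image node, but the `class_type` names `Ovis2ModelLoader` and `Ovis2ImageCaption` and all input values are hypothetical placeholders; export your own workflow in API format from ComfyUI to get the real names. Depending on how the caption node is implemented, ComfyUI may also require an output node to display the text.

```python
import json
import urllib.request

# Hypothetical class_type names for the Ovis2 nodes; check an API-format export for the real ones.
LOADER_CLASS = "Ovis2ModelLoader"
CAPTION_CLASS = "Ovis2ImageCaption"


def build_caption_workflow(image_file: str, prompt: str) -> dict:
    """Build an API-format graph: Load Image -> caption node <- model loader.

    Links are written as [source_node_id, output_index].
    """
    return {
        "1": {"class_type": "LoadImage", "inputs": {"image": image_file}},
        "2": {
            "class_type": LOADER_CLASS,
            "inputs": {  # placeholder values for the loader settings
                "model_name": "Ovis2-1B",
                "precision": "bfloat16",
                "max_token_length": 32768,
                "device": "cuda",
                "auto_download": True,
            },
        },
        "3": {
            "class_type": CAPTION_CLASS,
            "inputs": {
                "model": ["2", 0],  # first output of the loader
                "image": ["1", 0],  # first output (IMAGE) of Load Image
                "prompt": prompt,
                "max_new_tokens": 512,
                "temperature": 0.7,
            },
        },
    }


def queue_workflow(workflow: dict, server: str = "http://127.0.0.1:8188") -> dict:
    """POST the graph to a running ComfyUI server and return its JSON reply."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt", data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    graph = build_caption_workflow("example.png", "Describe this image in detail.")
    print(json.dumps(graph, indent=2))  # inspect before calling queue_workflow(graph)
```

The image file name refers to a file in ComfyUI's input folder, which is where the built-in Load Image node looks for images.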
To compare multiple images:
- Load two or more images using "Load Image" nodes
- Add a "Load Ovis2 Model" node
- Add an "Ovis2 Multi-Image Analysis" node
- Connect:
  - The model output to the "model" input
  - Each image output to one of the node's image inputs
- Set a prompt like "Compare these images and describe their similarities and differences"
- Run the workflow to get a comparative analysis
Models are stored in the models/ovis directory inside your ComfyUI installation. The nodes create this directory automatically if it doesn't exist.
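The directory handling described above amounts to a `mkdir` with `parents=True` and `exist_ok=True`. The sketch below assumes each model lives in a subfolder named after `model_name`; that layout and the helper name `ovis_model_dir` are hypothetical.

```python
from pathlib import Path
from typing import Union


def ovis_model_dir(comfyui_root: Union[str, Path], model_name: str) -> Path:
    """Return <ComfyUI root>/models/ovis/<model_name>, creating models/ovis if needed."""
    base = Path(comfyui_root) / "models" / "ovis"
    base.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return base / model_name
```

Only the shared models/ovis folder is created here; the per-model folder would be filled in by whatever performs the download.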
If you encounter CUDA out of memory errors, try:
- Using a smaller model (Ovis2-1B or Ovis2-2B)
- Reducing the image size before processing
- Switching to "float16" precision
- Reducing max_token_length
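A quick way to see why the first and third tips help: weight memory is roughly the parameter count times the bytes per parameter. The sketch below uses the nominal sizes from the model names and counts the weights only; activations, the key-value cache, and image tokens add more on top, and the helper name `weight_memory_gb` is hypothetical. Note that float16 and bfloat16 both use 2 bytes per parameter, so the saving comes from moving off 32-bit weights, not from switching between the two 16-bit formats.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params: int, precision: str) -> float:
    """Approximate memory for the model weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for name, params in [("Ovis2-1B", 1_000_000_000), ("Ovis2-34B", 34_000_000_000)]:
        for precision in ("float32", "float16"):
            print(f"{name} {precision}: ~{weight_memory_gb(params, precision):.0f} GB")
```

For example, a nominal 34B model needs about 68 GB just for 16-bit weights, while a nominal 1B model needs about 2 GB.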
If the model fails to download or load:
- Check that auto_download is enabled
- Make sure you have a working internet connection on the first run
- Check whether the model files are already in the correct location (models/ovis)
- Verify that all dependencies are correctly installed
- Check the ComfyUI console for specific error messages
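To check a model folder by hand, a sanity check like the one below can help. It assumes a Hugging Face-style model folder containing `config.json` and `.safetensors` weight files, which is typical for models downloaded from Hugging Face but not confirmed by this README; the helper name `check_model_folder` and the example path are hypothetical.

```python
from pathlib import Path


def check_model_folder(model_dir) -> list[str]:
    """Return the problems found in a downloaded model folder; an empty list means it looks complete."""
    folder = Path(model_dir)
    if not folder.is_dir():
        return [f"{folder} does not exist"]
    problems = []
    if not (folder / "config.json").is_file():
        problems.append("config.json is missing")
    if not any(folder.glob("*.safetensors")):
        problems.append("no .safetensors weight files found")
    return problems


if __name__ == "__main__":
    import sys

    # Example path (hypothetical): ComfyUI/models/ovis/Ovis2-1B
    target = sys.argv[1] if len(sys.argv) > 1 else "ComfyUI/models/ovis/Ovis2-1B"
    print(check_model_folder(target) or "looks complete")
```

An empty or partial folder usually means the first download was interrupted; deleting it and rerunning with auto_download enabled is one way to retry.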
This project is licensed under the MIT License - see the LICENSE file for details.