Checkpoints Preparation

This section provides the instructions and scripts for setting up the checkpoints used by Vitron.

The List of Checkpoints

| Model | Function | Saving Path | Downloading Link |
| --- | --- | --- | --- |
| GLIGEN | image generation & editing | ./checkpoints/gligen | Link |
| i2vgen-xl | image-to-video generation | ./checkpoints/i2vgen-xl | Link |
| LanguageBind | image & video encoder | ./checkpoints/LanguageBind | Link |
| OpenCLIP | image & text encoder | ./checkpoints/openai | Link |
| SEEM | image & video segmentation | ./checkpoints/seem | Link |
| StableVideo | video editing | ./checkpoints/stablevideo | Link |
| Vitron-base | reasoning | ./checkpoints/Vitron-base | Link |
| Vitron-lora | reasoning | ./checkpoints/Vitron-lora | Link |
| ZeroScope | video generation | ./checkpoints/zeroscope | Link |
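
For the manual route, it can help to create the saving paths before downloading anything. The following is a minimal sketch, not part of the Vitron repository: the helper name `create_checkpoint_dirs` and the `CHECKPOINT_DIRS` mapping are hypothetical, and the paths are assumed to be relative to the repository root, with the top-level directory named `checkpoints` as in the file structure.

```python
from pathlib import Path

# Saving paths copied from the table above.
# Assumption: they are relative to the repository root.
CHECKPOINT_DIRS = {
    "GLIGEN": "checkpoints/gligen",
    "i2vgen-xl": "checkpoints/i2vgen-xl",
    "LanguageBind": "checkpoints/LanguageBind",
    "OpenCLIP": "checkpoints/openai",
    "SEEM": "checkpoints/seem",
    "StableVideo": "checkpoints/stablevideo",
    "Vitron-base": "checkpoints/Vitron-base",
    "Vitron-lora": "checkpoints/Vitron-lora",
    "ZeroScope": "checkpoints/zeroscope",
}


def create_checkpoint_dirs(repo_root="."):
    """Create every saving path under repo_root and return the resulting paths."""
    created = []
    for rel_path in CHECKPOINT_DIRS.values():
        target = Path(repo_root) / rel_path
        # parents=True builds checkpoints/ itself; exist_ok=True makes reruns harmless.
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created
```

Calling `create_checkpoint_dirs(".")` from the repository root only creates empty directories; the checkpoint files still have to be downloaded into them.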

The File Structure

checkpoints
├── gligen
│   ├── demo_config_legacy
│   │   ├── gligen-generation-text-box.pth
│   │   ├── gligen-generation-text-image-box.pth
│   │   └── gligen-inpainting-text-box.pth
│   ├── gligen-generation-text-box
│   │   └── diffusion_pytorch_model.bin
│   ├── gligen-generation-text-image-box
│   │   └── diffusion_pytorch_model.bin
│   └── gligen-inpainting-text-box
│       └── diffusion_pytorch_model.bin
├── i2vgen-xl
│   ├── feature_extractor
│   │   └── preprocessor_config.json
│   ├── image_encoder
│   │   ├── config.json
│   │   ├── model.fp16.safetensors
│   │   └── model.safetensors
│   ├── model_index.json
│   ├── models
│   │   ├── i2vgen_xl_00854500.pth
│   │   ├── open_clip_pytorch_model.bin
│   │   ├── stable_diffusion_image_key_temporal_attention_x1.json
│   │   └── v2-1_512-ema-pruned.ckpt
│   ├── scheduler
│   │   └── scheduler_config.json
│   ├── text_encoder
│   │   ├── config.json
│   │   ├── model.fp16.safetensors
│   │   └── model.safetensors
│   ├── tokenizer
│   │   ├── merges.txt
│   │   ├── special_tokens_map.json
│   │   ├── tokenizer_config.json
│   │   └── vocab.json
│   ├── unet
│   │   ├── config.json
│   │   ├── diffusion_pytorch_model.fp16.safetensors
│   │   └── diffusion_pytorch_model.safetensors
│   └── vae
│       ├── config.json
│       ├── diffusion_pytorch_model.fp16.safetensors
│       └── diffusion_pytorch_model.safetensors
├── LanguageBind
│   ├── LanguageBind_Image
│   │   └── ...
│   ├── LanguageBind_Video
│   │   └── ...
│   └── LanguageBind_Video_merge
│       └── ...
├── openai
│   ├── clip-vit-base-patch32
│   │   └── ...
│   └── clip-vit-large-patch14
│       └── ...
├── seem
│   └── seem_focall_v1.pt
├── stablevideo
│   ├── cldm_v15.yaml
│   ├── control_sd15_canny.pth
│   ├── control_sd15_depth.pth
│   ├── download.py
│   ├── dpt_hybrid-midas-501f0c75.pt
│   └── flan-t5-xl
│       └── ...
├── Vitron-base
│   ├── config.json
│   ├── generation_config.json
│   ├── pytorch_model-00001-of-00002.bin
│   ├── pytorch_model-00002-of-00002.bin
│   ├── pytorch_model.bin.index.json
│   ├── README.md
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   └── tokenizer.model
├── Vitron-lora
│   ├── adapter_config.json
│   ├── adapter_model.bin
│   ├── config.json
│   ├── non_lora_trainables.bin
│   └── trainer_state.json
└── zeroscope
    ├── model_index.json
    ├── scheduler
    │   └── scheduler_config.json
    ├── text_encoder
    │   ├── config.json
    │   └── pytorch_model.bin
    ├── tokenizer
    │   ├── merges.txt
    │   ├── special_tokens_map.json
    │   ├── tokenizer_config.json
    │   └── vocab.json
    ├── unet
    │   ├── config.json
    │   └── diffusion_pytorch_model.bin
    ├── vae
    │   ├── config.json
    │   └── diffusion_pytorch_model.bin
    └── zs2_576w
        ├── open_clip_pytorch_model.bin
        └── text2video_pytorch_model.pth

Downloading Checkpoints

You can obtain the model checkpoints in either of two ways:

  • Manually: download each checkpoint using the provided links and place it in its directory as outlined above.
  • Automatically: run the download script from inside the checkpoints directory:
cd checkpoints
bash download.sh
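
After either option, the layout can be checked against the file structure above. The sketch below is an illustration under stated assumptions rather than a script shipped with Vitron: `find_missing`, `EXPECTED_FILES`, and `EXPECTED_DIRS` are hypothetical names, and the lists hold only a sample of entries from the tree. Directories whose contents are elided with `...` are checked for existence only.

```python
from pathlib import Path

# A sample of files taken from the file structure above; extend as needed.
EXPECTED_FILES = [
    "gligen/gligen-generation-text-box/diffusion_pytorch_model.bin",
    "i2vgen-xl/model_index.json",
    "seem/seem_focall_v1.pt",
    "stablevideo/cldm_v15.yaml",
    "Vitron-base/config.json",
    "Vitron-lora/adapter_config.json",
    "zeroscope/model_index.json",
]

# Directories whose contents are elided ("...") in the tree:
# only their existence is checked.
EXPECTED_DIRS = [
    "LanguageBind/LanguageBind_Image",
    "LanguageBind/LanguageBind_Video",
    "LanguageBind/LanguageBind_Video_merge",
    "openai/clip-vit-base-patch32",
    "openai/clip-vit-large-patch14",
    "stablevideo/flan-t5-xl",
]


def find_missing(checkpoint_root="checkpoints"):
    """Return the expected entries that are absent under checkpoint_root."""
    root = Path(checkpoint_root)
    missing = [f for f in EXPECTED_FILES if not (root / f).is_file()]
    missing += [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    return missing
```

An empty list from `find_missing("checkpoints")`, run from the repository root, means every sampled entry is in place; any returned path points at a checkpoint that still needs to be downloaded or moved.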