
EditVerse

This repository contains the evaluation code for EditVerseBench, the instruction-based video editing benchmark introduced in the paper "EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning".

Xuan Ju<sup>1,2</sup>, Tianyu Wang<sup>1</sup>, Yuqian Zhou<sup>1</sup>, He Zhang<sup>1</sup>, Qing Liu<sup>1</sup>, Nanxuan Zhao<sup>1</sup>, Zhifei Zhang<sup>1</sup>, Yijun Li<sup>1</sup>, Yuanhao Cai<sup>3</sup>, Shaoteng Liu<sup>1</sup>, Daniil Pakhomov<sup>1</sup>, Zhe Lin<sup>1</sup>, Soo Ye Kim<sup>1</sup>\*, Qiang Xu<sup>2</sup>\*
<sup>1</sup>Adobe Research <sup>2</sup>The Chinese University of Hong Kong <sup>3</sup>Johns Hopkins University \*Corresponding Author

🌐 Project Page | 📜 arXiv | 🤗 Benchmark | 📹 Slides | 👀 Comparison

Setup Environment

(Optional) Create a Conda environment

conda create -n EditVerse python=3.10
conda activate EditVerse

Install PyTorch

(You may need to adjust the version or CUDA build depending on your hardware.)

pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124

Install required packages

pip install -r requirements.txt
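
A quick, generic sanity check (not part of this repo) to confirm that PyTorch was installed with working CUDA support:

```python
# Generic check that PyTorch is installed and sees your GPU.
import torch

print(torch.__version__)          # expect 2.5.1 with the command above
print(torch.cuda.is_available())  # True if the CUDA build matches your driver
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```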

Download Benchmark & Results

Download benchmark dataset

git lfs install
git clone https://huggingface.co/datasets/sooyek/EditVerseBench

Download the videos

The source videos cannot be redistributed directly due to licensing restrictions. Instead, download them with the provided script, which uses the Pixabay API. (The network connection may occasionally fail, so you may need to run the script several times.)

⚠️ Note: Remember to replace the API key in download_source_video.py with your own. You can find your API key here (listed under Parameters → key (required) on that page). The API is free, but you must sign up for an account to get a key.

cd EditVerseBench
python download_source_video.py
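
If the script keeps failing for a few videos, the per-video download it performs can be approximated with a direct call to the Pixabay videos API. The sketch below is an illustration based on Pixabay's public API docs, not the script's actual code; the key and video ID are placeholders:

```python
# Sketch: fetch one Pixabay video by ID (key and ID are placeholders).
import requests

API_KEY = "your-pixabay-api-key"
video_id = "174008"  # ID taken from the "<video1> link" URL in test.json

resp = requests.get(
    "https://pixabay.com/api/videos/",
    params={"key": API_KEY, "id": video_id},
    timeout=30,
)
resp.raise_for_status()
hit = resp.json()["hits"][0]
url = hit["videos"]["large"]["url"]  # sizes: large / medium / small / tiny
with open(f"{video_id}.mp4", "wb") as f:
    f.write(requests.get(url, timeout=120).content)
```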

The benchmark file structure should look like this:

EditVerseBench/
  ├── test.json
  ├── depths/
  │   ├── xx.mp4
  ├── edited_first_frame/
  │   ├── xx.mp4
  ├── images/
  │   ├── xx.mp4
  ├── inpaint_video_and_masks/
  │   ├── xx.mp4
  ├── poses/
  │   ├── xx.mp4
  ├── sketchs/
  │   ├── xx.mp4
  ├── videos/
  │   ├── xx.mp4

Unpack comparison results

cd EditVerseBench
tar -zxvf EditVerse_Comparison_Results.tar.gz
rm EditVerse_Comparison_Results.tar.gz

Evaluation

Command

python eval.py --metrics [metrics] \
--test_json_path EditVerseBench/EditVerseBench/test.json \
--generate_results_dir [results_dir] \
--output_csv [output_csv] \
--gpt_api_key [your_api_key]

Arguments

  • metrics: Use all to evaluate all metrics.

    To select specific metrics, provide a comma-separated list (no spaces). Example: clip_temporal_consistency,dino_temporal_consistency

    Supported metrics include:

    • clip_temporal_consistency
    • dino_temporal_consistency
    • frame_text_alignment
    • video_text_alignment
    • pick_score_video_quality
    • editing_vlm_evaluation
  • test_json_path: Path to the benchmark entrypoint JSON file.

  • generate_results_dir: Directory containing generated results (must follow the required structure).

  • output_csv: Path to save the evaluation CSV file.

  • gpt_api_key: OpenAI API key (required for editing_vlm_evaluation).

Example

Evaluate the provided EditVerse results and save output to EditVerse_eval.csv:

python eval.py --metrics all \
--test_json_path EditVerseBench/EditVerseBench/test.json \
--generate_results_dir EditVerseBench/EditVerse_Comparison_Results/EditVerse \
--output_csv EditVerse_eval.csv \
--gpt_api_key [Your API key]

👉 Pre-computed evaluation results for EditVerse and previous methods are available at: EditVerseBench/automatic_evaluation_results.

Evaluate Your Own Model

You can also evaluate your model outputs by following the same format.

Step 1: Refer to benchmark JSON format

See EditVerseBench/EditVerseBench/test.json for reference.

Each entry looks like this:

{
    "0": {
        "<text>": "<video1> Add a small golden crown ...",
        "<video1>": "videos/174008-850361316.mp4",
        "<video1> link": "https://pixabay.com/videos/woman-smile-communication-gesture-174008/",
        "direction": "horizontal",
        "target_prompt": "A young woman stands outside in front of ...",
        "type": "add object",
        "source_prompt": "A young woman stands outside in front of ..."
    },
    "1": {
        ...
    },
    ...
}

Key fields:

  • <text>: A natural language instruction describing the required edit in an interleaved format.
    • The instruction may include special tags such as <video1>, <video2>, or <image1>.
    • Each tag corresponds to a specific key field defined in the same JSON entry.
  • <video1>: The local file path of the source video.
  • <video1> link: The reference URL pointing to the source video’s original location.
  • direction: horizontal or vertical.
  • target_prompt: A detailed textual description of the desired edited video outcome.
  • type: The category of the edit (e.g., add object).
  • source_prompt: A description of the original, unedited video.
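
To make the format concrete, here is a minimal sketch of resolving the interleaved tags in <text> to local file paths; the substitution logic is an illustration, not code shipped with this repo:

```python
# Sketch: resolve <video1>/<image1>-style tags in an instruction to file paths.
import json
import re

with open("EditVerseBench/EditVerseBench/test.json") as f:
    bench = json.load(f)

entry = bench["0"]
instruction = entry["<text>"]
for tag in re.findall(r"<(?:video|image)\d+>", instruction):
    if tag in entry:  # e.g. "<video1>" -> "videos/174008-850361316.mp4"
        instruction = instruction.replace(tag, entry[tag])
print(instruction)
```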

Step 2: Format your results

After generating results with your model, arrange files as follows:

Your_Folder/
  ├── 0/
  │   ├── generate.mp4   # model-generated video
  │   └── video1.mp4     # source video
  ├── 1/
  │   ├── generate.mp4
  │   └── video1.mp4
  ...
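
As a concrete illustration, the layout above can be produced with a loop like the one below; run_my_model is a hypothetical stand-in for your own inference call:

```python
# Sketch: write results in the layout eval.py expects (run_my_model is hypothetical).
import json
import shutil
from pathlib import Path

bench_dir = Path("EditVerseBench/EditVerseBench")
out_dir = Path("Your_Folder")
bench = json.loads((bench_dir / "test.json").read_text())

for idx, entry in bench.items():
    sample = out_dir / idx
    sample.mkdir(parents=True, exist_ok=True)
    shutil.copy(bench_dir / entry["<video1>"], sample / "video1.mp4")  # source video
    # edited = run_my_model(entry["<text>"], bench_dir / entry["<video1>"])
    # shutil.copy(edited, sample / "generate.mp4")                     # edited video
```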

Step 3: Run evaluation

python eval.py --metrics all \
--test_json_path EditVerseBench/EditVerseBench/test.json \
--generate_results_dir [Your_Folder] \
--output_csv [Your_Results.csv] \
--gpt_api_key [your_api_key]

Benchmark Results

| Method | VLM Editing Quality ↑ | Pick Score ↑ | Frame Text Alignment ↑ | Video Text Alignment ↑ | CLIP Temporal Consistency ↑ | DINO Temporal Consistency ↑ |
| --- | --- | --- | --- | --- | --- | --- |
| *Attention Manipulation (Training-free)* |  |  |  |  |  |  |
| TokenFlow | 5.26 | 19.73 | 25.57 | 22.70 | 98.36 | 98.09 |
| STDF | 4.41 | 19.45 | 25.24 | 22.26 | 96.04 | 95.22 |
| *First-Frame Propagation (w/ End-to-End Training)* |  |  |  |  |  |  |
| Señorita-2M | 6.97 | 19.71 | 26.34 | 23.24 | 98.05 | 97.99 |
| *Instruction-Guided (w/ End-to-End Training)* |  |  |  |  |  |  |
| InsV2V | 5.21 | 19.39 | 24.99 | 22.54 | 97.15 | 96.57 |
| Lucy Edit | 5.89 | 19.67 | 26.00 | 23.11 | 98.49 | 98.38 |
| **EditVerse (Ours)** | **7.65** | **20.07** | **26.73** | **23.93** | **98.56** | **98.42** |

License

Files under ./automatic_evaluation/viclip are from InternVideo and are covered by the Apache 2.0 License. Files under ./automatic_evaluation, except those in the viclip folder, are modified from awesome-diffusion-v2v (MIT License), and Adobe's modifications are under the Adobe Research License. All other materials are licensed under the Adobe Research License.

Cite Us

If you find our work useful for your research, please consider citing our paper:

@article{ju2025editverse,
  title   = {EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning},
  author  = {Xuan Ju and Tianyu Wang and Yuqian Zhou and He Zhang and Qing Liu and Nanxuan Zhao and Zhifei Zhang and Yijun Li and Yuanhao Cai and Shaoteng Liu and Daniil Pakhomov and Zhe Lin and Soo Ye Kim and Qiang Xu},
  journal = {arXiv preprint arXiv:2509.20360},
  year    = {2025},
  url     = {https://arxiv.org/abs/2509.20360}
}
