Feature request
Introduce a new pipeline to extend existing image inpainting capabilities (StableDiffusionInpaintPipeline) to videos. The goal is to provide a native, GPU-optimized API within Diffusers that performs temporally coherent video inpainting instead of independent per-frame processing.
Motivation
Current video inpainting approaches in the community simply loop over frames and call the image inpainting pipeline repeatedly.
This leads to:
- Temporal flicker and inconsistent textures between frames.
- Poor GPU utilization and high memory overhead.
- Lack of tools to maintain motion coherence or reuse diffusion latents across time.
A built-in VideoInpaintPipeline would make it possible to remove objects, restore scenes, or creatively edit videos using diffusion models while keeping motion and lighting consistent across frames.
Your contribution
I plan to:
- Implement VideoInpaintPipeline as a subclass of DiffusionPipeline, leveraging StableDiffusionInpaintPipeline under the hood (a rough structural sketch appears after this list).
- Add temporal consistency mechanisms, such as latent reuse between frames and optional optical-flow-guided warping (RAFT / GMFlow); see the flow-warping sketch below.
- Optimize performance through batched FP16 inference, scheduler noise reuse, and optional torch.compile acceleration; a baseline sketch with the existing inpainting pipeline is included at the end.
- Provide a clean user API compatible with existing pipelines:
```python
from diffusers import VideoInpaintPipeline

pipe = VideoInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    use_optical_flow=True,
    compile=True,
)

result = pipe(
    video_path="input.mp4",
    mask_path="mask.mp4",
    prompt="replace background with a snowy mountain",
    num_inference_steps=10,
)
result.video.save("output.mp4")
```
- Contribute documentation, tests demonstrating temporal coherence, performance benchmarks, and example notebooks for real-world use.
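For reference, here is a rough structural sketch of how the pipeline class could be organized. The component set mirrors StableDiffusionInpaintPipeline (safety checker omitted for brevity), and the `__call__` body is only an outline for discussion, not a committed design:

```python
import torch
from diffusers import DiffusionPipeline


class VideoInpaintPipeline(DiffusionPipeline):
    """Proposed skeleton: reuses the standard SD inpainting components
    and adds a frame loop with temporal consistency on top."""

    def __init__(self, vae, text_encoder, tokenizer, unet, scheduler):
        super().__init__()
        # register_modules is the standard DiffusionPipeline mechanism that
        # makes components work with save_pretrained / from_pretrained.
        self.register_modules(
            vae=vae,
            text_encoder=text_encoder,
            tokenizer=tokenizer,
            unet=unet,
            scheduler=scheduler,
        )

    @torch.no_grad()
    def __call__(self, video_path, mask_path, prompt, num_inference_steps=50):
        # 1. Decode video and mask frames.
        # 2. Encode frames to latents; reuse / flow-warp latents across frames
        #    (see the optical-flow sketch below).
        # 3. Run the denoising loop per frame or in temporal batches.
        # 4. Decode latents and re-encode the output video.
        raise NotImplementedError("Sketch only; see the discussion above.")
```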
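For the optical-flow-guided warping, one possible building block is to estimate backward flow with torchvision's RAFT and warp the previous frame's latents into the current frame's geometry before denoising. This is a sketch of the idea only (GMFlow could be plugged in the same way), and the resolution handling would need more care in the real implementation:

```python
import torch
import torch.nn.functional as F
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights

device = "cuda"
flow_model = raft_large(weights=Raft_Large_Weights.DEFAULT).to(device).eval()


@torch.no_grad()
def estimate_backward_flow(curr_frame, prev_frame):
    # Frames: (1, 3, H, W) tensors in [-1, 1], H and W divisible by 8.
    # Flow from the current frame to the previous one, so warping becomes a
    # simple backward lookup into the previous latents. RAFT returns a list
    # of refinements; the last entry is the final (1, 2, H, W) flow.
    return flow_model(curr_frame, prev_frame)[-1]


def warp_latents(prev_latents, flow, vae_scale_factor=8):
    # prev_latents: (1, C, h, w) with h = H // vae_scale_factor.
    # Downsample the pixel-space flow to latent resolution, rescale its
    # magnitude accordingly, then backward-warp with grid_sample.
    _, _, h, w = prev_latents.shape
    flow = F.interpolate(flow, size=(h, w), mode="bilinear",
                         align_corners=False) / vae_scale_factor

    ys, xs = torch.meshgrid(
        torch.arange(h, device=flow.device),
        torch.arange(w, device=flow.device),
        indexing="ij",
    )
    grid_x = (xs + flow[:, 0]) / (w - 1) * 2 - 1  # normalize to [-1, 1]
    grid_y = (ys + flow[:, 1]) / (h - 1) * 2 - 1
    grid = torch.stack((grid_x, grid_y), dim=-1).to(prev_latents.dtype)
    return F.grid_sample(prev_latents, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)
```

The warped latents could then replace (or be blended with) the independently sampled noise for the current frame, which is the latent-reuse part of the proposal.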
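As a rough baseline for the performance and consistency items, here is what the idea reduces to when expressed with today's StableDiffusionInpaintPipeline: FP16 weights, an optionally compiled UNet, and the same fixed seed (hence the same initial noise) for every frame. Loading the frames via imageio is an assumption for illustration; only the prompt and file names come from the example above:

```python
import imageio.v3 as iio  # needs an ffmpeg/pyav backend for .mp4 files
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead")  # optional

frames = [Image.fromarray(f) for f in iio.imread("input.mp4")]
masks = [Image.fromarray(f).convert("L") for f in iio.imread("mask.mp4")]

edited = []
for frame, mask in zip(frames, masks):
    out = pipe(
        prompt="replace background with a snowy mountain",
        image=frame,
        mask_image=mask,
        num_inference_steps=10,
        # Re-seeding per frame means every frame starts from identical noise,
        # which already removes a large part of the per-frame flicker.
        generator=torch.Generator("cuda").manual_seed(0),
    )
    edited.append(out.images[0])

# Re-encode `edited` back into a video with your tool of choice (ffmpeg, etc.).
```

The proposed pipeline would go beyond this baseline by batching frames, reusing scheduler noise across timesteps, and warping latents with optical flow rather than only sharing the seed.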