Replicate Init 2.0
Replicate READMEs
zsxkib committed Jan 23, 2024
1 parent 97bf07a commit 4c64b28
Showing 5 changed files with 295 additions and 0 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -158,3 +158,6 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

# Cog
.cog
1 change: 1 addition & 0 deletions README.md
@@ -3,6 +3,7 @@
<a href='https://arxiv.org/abs/2401.07519'><img src='https://img.shields.io/badge/Technique-Report-red'></a>
<a href='https://huggingface.co/papers/2401.07519'><img src='https://img.shields.io/static/v1?label=Paper&message=Huggingface&color=orange'></a>
<a href='https://huggingface.co/spaces/InstantX/InstantID'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>
[![Replicate](https://replicate.com/zsxkib/instant-id/badge)](https://replicate.com/zsxkib/instant-id)

**InstantID : Zero-shot Identity-Preserving Generation in Seconds**

31 changes: 31 additions & 0 deletions cog.yaml
@@ -0,0 +1,31 @@
# Configuration for Cog ⚙️
# Reference: https://github.com/replicate/cog/blob/main/docs/yaml.md

build:
# set to true if your model requires a GPU
gpu: true
# cuda: "12.1"

# a list of ubuntu apt packages to install
system_packages:
- "libgl1-mesa-glx"
- "libglib2.0-0"

# python version in the form '3.11' or '3.11.4'
python_version: "3.11"

# a list of packages in the format <package-name>==<version>
python_packages:
- "opencv-python==4.9.0.80"
- "transformers==4.37.0"
- "accelerate==0.26.1"
- "insightface==0.7.3"
- "diffusers==0.25.1"
- "onnxruntime==1.16.3"

# commands run after the environment is set up
run:
- curl -o /usr/local/bin/pget -L "https://github.com/replicate/pget/releases/download/v0.6.0/pget_linux_x86_64" && chmod +x /usr/local/bin/pget

# predict.py defines how predictions are run on your model
predict: "cog/predict.py:Predictor"
60 changes: 60 additions & 0 deletions cog/README.md
@@ -0,0 +1,60 @@
# InstantID Cog Model

[![Replicate](https://replicate.com/zsxkib/instant-id/badge)](https://replicate.com/zsxkib/instant-id)

## Overview
This repository contains the implementation of [InstantID](https://github.com/InstantID/InstantID) as a [Cog](https://github.com/replicate/cog) model.

[Cog](https://github.com/replicate/cog) lets anyone with a GPU run the model locally with minimal setup, without the hassle of manually downloading weights, installing libraries, or managing CUDA versions. Everything just works.

## Development
To push your own fork of InstantID to [Replicate](https://replicate.com), follow the [Model Pushing Guide](https://replicate.com/docs/guides/push-a-model).

## Basic Usage
To make predictions using the model, execute the following command from the root of this project:

```bash
cog predict \
-i image=@examples/sam_resize.png \
-i prompt="analog film photo of a man. faded film, desaturated, 35mm photo, grainy, vignette, vintage, Kodachrome, Lomography, stained, highly detailed, found footage, masterpiece, best quality" \
-i negative_prompt="nsfw" \
-i width=680 \
-i height=680 \
-i ip_adapter_scale=0.8 \
-i controlnet_conditioning_scale=0.8 \
-i num_inference_steps=30 \
-i guidance_scale=5
```
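
Note that the first prediction on a fresh machine is slower than subsequent ones: `setup()` downloads the model weights with `pget` before loading the pipeline, so expect some extra startup time.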

<table>
<tr>
<td>
<p align="center">Input</p>
<img src="https://replicate.delivery/pbxt/KGy0R72cMwriR9EnCLu6hgVkQNd60mY01mDZAQqcUic9rVw4/musk_resize.jpeg" alt="Sample Input Image" width="90%"/>
</td>
<td>
<p align="center">Output</p>
<img src="https://replicate.delivery/pbxt/oGOxXELcLcpaMBeIeffwdxKZAkuzwOzzoxKadjhV8YgQWk8IB/result.jpg" alt="Sample Output Image" width="100%"/>
</td>
</tr>
</table>

## Input Parameters

The following table provides details about each input parameter for the `predict` function:

| Parameter                       | Description                        | Default                                                                                                       | Type / Range |
| ------------------------------- | ---------------------------------- | ------------------------------------------------------------------------------------------------------------- | ------------ |
| `image`                         | Input image                        | (required, no default)                                                                                         | file path    |
| `prompt`                        | Input prompt                       | "analog film photo of a man. faded film, desaturated, 35mm photo, grainy, vignette, vintage, Kodachrome, ..."  | string       |
| `negative_prompt`               | Input negative prompt              | (empty string)                                                                                                 | string       |
| `width`                         | Width of output image              | 640                                                                                                            | 512 - 2048   |
| `height`                        | Height of output image             | 640                                                                                                            | 512 - 2048   |
| `ip_adapter_scale`              | Scale for IP adapter               | 0.8                                                                                                            | 0.0 - 1.0    |
| `controlnet_conditioning_scale` | Scale for ControlNet conditioning  | 0.8                                                                                                            | 0.0 - 1.0    |
| `num_inference_steps`           | Number of denoising steps          | 30                                                                                                             | 1 - 500      |
| `guidance_scale`                | Scale for classifier-free guidance | 5                                                                                                              | 1 - 50       |

Use this table as a quick reference when inspecting or tuning the inputs to a prediction; the same parameter names work against the hosted model, as in the sketch below.
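
If you prefer calling the hosted model from Python instead of the Cog CLI, a minimal sketch using the official `replicate` client might look like this (the model slug comes from the badge above; pinning a specific version hash is recommended but omitted here):

```python
# Minimal sketch, not part of this repository: requires `pip install replicate`
# and a REPLICATE_API_TOKEN environment variable.
import replicate

output = replicate.run(
    "zsxkib/instant-id",  # resolves to the latest published version
    input={
        "image": open("examples/sam_resize.png", "rb"),
        "prompt": "analog film photo of a man. faded film, desaturated, 35mm photo",
        "negative_prompt": "nsfw",
        "width": 680,
        "height": 680,
        "ip_adapter_scale": 0.8,
        "controlnet_conditioning_scale": 0.8,
        "num_inference_steps": 30,
        "guidance_scale": 5,
    },
)
print(output)  # URL (or file object) pointing at the generated image
```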


200 changes: 200 additions & 0 deletions cog/predict.py
@@ -0,0 +1,200 @@
# Prediction interface for Cog ⚙️
# https://github.com/replicate/cog/blob/main/docs/python.md

import os
import sys

import time
import subprocess
from cog import BasePredictor, Input, Path

import cv2
import torch
import numpy as np
from PIL import Image

from diffusers.utils import load_image
from diffusers.models import ControlNetModel

from insightface.app import FaceAnalysis

sys.path.append(os.path.join(os.path.dirname(__file__), '..'))
from pipeline_stable_diffusion_xl_instantid import (
StableDiffusionXLInstantIDPipeline,
draw_kps,
)

# for `ip-adapter`, `ControlNetModel`, and `stable-diffusion-xl-base-1.0`
CHECKPOINTS_CACHE = "./checkpoints"
CHECKPOINTS_URL = (
"https://weights.replicate.delivery/default/InstantID/checkpoints.tar"
)

# for `models/antelopev2`
MODELS_CACHE = "./models"
MODELS_URL = "https://weights.replicate.delivery/default/InstantID/models.tar"


def resize_img(
input_image,
max_side=1280,
min_side=1024,
size=None,
pad_to_max_side=False,
mode=Image.BILINEAR,
base_pixel_number=64,
):
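    """Resize input_image so its long side is max_side (via an intermediate
    rescale of the short side to min_side), snapping both dimensions down to
    multiples of base_pixel_number. If `size` is given it is used directly;
    with pad_to_max_side the result is centered on a white max_side square."""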
w, h = input_image.size
if size is not None:
w_resize_new, h_resize_new = size
else:
ratio = min_side / min(h, w)
w, h = round(ratio * w), round(ratio * h)
ratio = max_side / max(h, w)
input_image = input_image.resize([round(ratio * w), round(ratio * h)], mode)
w_resize_new = (round(ratio * w) // base_pixel_number) * base_pixel_number
h_resize_new = (round(ratio * h) // base_pixel_number) * base_pixel_number
input_image = input_image.resize([w_resize_new, h_resize_new], mode)

if pad_to_max_side:
res = np.ones([max_side, max_side, 3], dtype=np.uint8) * 255
offset_x = (max_side - w_resize_new) // 2
offset_y = (max_side - h_resize_new) // 2
res[
offset_y : offset_y + h_resize_new, offset_x : offset_x + w_resize_new
] = np.array(input_image)
input_image = Image.fromarray(res)
return input_image


def download_weights(url, dest):
start = time.time()
print("downloading url: ", url)
print("downloading to: ", dest)
subprocess.check_call(["pget", "-x", url, dest], close_fds=False)
print("downloading took: ", time.time() - start)


class Predictor(BasePredictor):
def setup(self) -> None:
"""Load the model into memory to make running multiple predictions efficient"""
if not os.path.exists(CHECKPOINTS_CACHE):
download_weights(CHECKPOINTS_URL, CHECKPOINTS_CACHE)

if not os.path.exists(MODELS_CACHE):
download_weights(MODELS_URL, MODELS_CACHE)

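        # Default face-detection size; predict() re-runs prepare() when a
        # different output resolution is requested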
self.width, self.height = 640, 640
self.app = FaceAnalysis(
name="antelopev2",
root="./",
providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
self.app.prepare(ctx_id=0, det_size=(self.width, self.height))

# Path to InstantID models
        face_adapter = "./checkpoints/ip-adapter.bin"
        controlnet_path = "./checkpoints/ControlNetModel"

# Load pipeline
self.controlnet = ControlNetModel.from_pretrained(
controlnet_path,
torch_dtype=torch.float16,
cache_dir=CHECKPOINTS_CACHE,
local_files_only=True,
)

base_model_path = "stabilityai/stable-diffusion-xl-base-1.0"
self.pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
base_model_path,
controlnet=self.controlnet,
torch_dtype=torch.float16,
cache_dir=CHECKPOINTS_CACHE,
local_files_only=True,
)
self.pipe.cuda()
self.pipe.load_ip_adapter_instantid(face_adapter)

def predict(
self,
image: Path = Input(description="Input image"),
prompt: str = Input(
description="Input prompt",
default="analog film photo of a man. faded film, desaturated, 35mm photo, grainy, vignette, vintage, Kodachrome, Lomography, stained, highly detailed, found footage, masterpiece, best quality",
),
negative_prompt: str = Input(
description="Input Negative Prompt",
default="",
),
width: int = Input(
description="Width of output image",
default=640,
ge=512,
le=2048,
),
height: int = Input(
description="Height of output image",
default=640,
ge=512,
le=2048,
),
ip_adapter_scale: float = Input(
description="Scale for IP adapter",
default=0.8,
ge=0,
le=1,
),
controlnet_conditioning_scale: float = Input(
description="Scale for ControlNet conditioning",
default=0.8,
ge=0,
le=1,
),
num_inference_steps: int = Input(
description="Number of denoising steps",
default=30,
ge=1,
le=500,
),
guidance_scale: float = Input(
description="Scale for classifier-free guidance",
default=5,
ge=1,
le=50,
),
) -> Path:
"""Run a single prediction on the model"""
if self.width != width or self.height != height:
print(f"[!] Resizing output to {width}x{height}")
self.width = width
self.height = height
self.app.prepare(ctx_id=0, det_size=(self.width, self.height))

face_image = load_image(str(image))
face_image = resize_img(face_image)

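        # insightface expects BGR input, so convert from PIL's RGB before detection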
face_info = self.app.get(cv2.cvtColor(np.array(face_image), cv2.COLOR_RGB2BGR))
        face_info = sorted(
            face_info,
            key=lambda x: (x["bbox"][2] - x["bbox"][0]) * (x["bbox"][3] - x["bbox"][1]),
            reverse=True,
        )[0]  # keep only the largest detected face
face_emb = face_info["embedding"]
face_kps = draw_kps(face_image, face_info["kps"])

self.pipe.set_ip_adapter_scale(ip_adapter_scale)
image = self.pipe(
prompt=prompt,
negative_prompt=negative_prompt,
image_embeds=face_emb,
image=face_kps,
controlnet_conditioning_scale=controlnet_conditioning_scale,
num_inference_steps=num_inference_steps,
guidance_scale=guidance_scale,
).images[0]

output_path = "result.jpg"
image.save(output_path)
return Path(output_path)
