
Welcome to the wiki!

How to use:

The interface has forty-one sub-tabs (some with their own sub-tabs) grouped into seven main tabs (Text, Image, Video, 3D, Audio, Extras and Interface): LLM, TTS-STT, MMS, SeamlessM4Tv2, LibreTranslate, StableDiffusion, Kandinsky, Flux, HunyuanDiT, Lumina-T2X, Kolors, AuraFlow, Würstchen, DeepFloydIF, PixArt, PlaygroundV2.5, Wav2Lip, LivePortrait, ModelScope, ZeroScope 2, CogVideoX, Latte, StableFast3D, Shap-E, SV34D, Zero123Plus, StableAudio, AudioCraft, AudioLDM 2, SunoBark, RVC, UVR, Demucs, Upscale (Real-ESRGAN), FaceSwap, MetaData-Info, Wiki, Gallery, ModelDownloader, Settings and System. Select the one you need and follow the instructions below

Text:

LLM:

  1. First upload your models to the folder: inputs/text/llm_models
  2. Select your model from the drop-down list
  3. Select model type (transformers or llama)
  4. Set up the model according to the parameters you need
  5. Type (or speak) your request
  6. Click the Submit button to receive the generated text and audio response

Optional: you can enable TTS mode and select the voice and language needed to receive an audio response. You can enable multimodal and upload an image to get its description. You can enable websearch for Internet access. You can enable libretranslate to get the translation. You can also choose a LORA model to improve generation

Voice samples = inputs/audio/voices

LORA = inputs/text/llm_models/lora

The voice must be pre-processed (22050 Hz, mono, WAV)
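
If your voice sample is not already in this format, a minimal conversion sketch assuming the librosa and soundfile packages are available (the input file name is a placeholder) could look like this:

```python
# Sketch: convert a recording to 22050 Hz, mono, 16-bit WAV for inputs/audio/voices.
# "my_voice.mp3" is a placeholder for your own recording.
import librosa
import soundfile as sf

audio, sample_rate = librosa.load("my_voice.mp3", sr=22050, mono=True)  # resample and downmix
sf.write("inputs/audio/voices/my_voice.wav", audio, sample_rate, subtype="PCM_16")
```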

You can change the LLM avatars in the avatars folder

TTS-STT:

  1. Type text for text to speech
  2. Input audio for speech to text
  3. Click the Submit button to receive the generated text and audio response

Voice samples = inputs/audio/voices

The voice must be pre-processed (22050 Hz, mono, WAV); see the conversion sketch in the LLM section above

MMS (text-to-speech and speech-to-text):

  1. Type text for text to speech
  2. Input audio for speech to text
  3. Click the Submit button to receive the generated text or audio response

SeamlessM4Tv2:

  1. Type (or speak) your request
  2. Select source, target and dataset languages
  3. Set up the model according to the parameters you need
  4. Click the Submit button to get the translation

LibreTranslate:

  1. Enter the text you want to translate
  2. Select source and target languages
  3. Click the Submit button to get the translation

Optional: you can save the translation history by turning on the corresponding button

Image:

StableDiffusion has twenty sub-tabs:

txt2img:

  1. First upload your models to the folder: inputs/image/sd_models
  2. Select your model from the drop-down list
  3. Select model type (SD, SD2 or SDXL)
  4. Set up the model according to the parameters you need
  5. Enter your request (+ and - for prompt weighting)
  6. Click the Submit button to get the generated image

Optional: You can select your vae, embedding and lora models to improve the generation results

vae = inputs/image/sd_models/vae

lora = inputs/image/sd_models/lora

embedding = inputs/image/sd_models/embedding

img2img:

  1. First upload your models to the folder: inputs/image/sd_models
  2. Select your model from the drop-down list
  3. Select model type (SD, SD2 or SDXL)
  4. Set up the model according to the parameters you need
  5. Upload the initial image to be used for generation
  6. Enter your request (+ and - for prompt weighting)
  7. Click the Submit button to get the generated image

Optional: You can select your vae model to improve the generation results

vae = inputs/image/sd_models/vae

depth2img:

  1. Upload the initial image
  2. Set up the model according to the parameters you need
  3. Enter your request (+ and - for prompt weighting)
  4. Click the Submit button to get the generated image

pix2pix:

  1. Upload the initial image
  2. Set up the model according to the parameters you need
  3. Enter your request (+ and - for prompt weighting)
  4. Click the Submit button to get the generated image

controlnet:

  1. First upload your stable diffusion models to the folder: inputs/image/sd_models
  2. Upload the initial image
  3. Select your stable diffusion and controlnet models from the drop-down lists
  4. Set up the models according to the parameters you need
  5. Enter your request (+ and - for prompt weighting)
  6. Click the Submit button to get the generated image

upscale (latent):

  1. Upload the initial image
  2. Select your model
  3. Set up the model according to the parameters you need
  4. Click the Submit button to get the upscaled image

refiner (SDXL):

  1. Upload the initial image
  2. Click the Submit button to get the refined image

inpaint:

  1. First upload your models to the folder: inputs/image/sd_models/inpaint
  2. Select your model from the drop-down list
  3. Select model type (SD, SD2 or SDXL)
  4. Set up the model according to the parameters you need
  5. Upload the image to be used for generation to both the initial image and the mask image fields
  6. In the mask image, select the brush, then the palette, and change the color to #FFFFFF (white); a sketch for preparing a mask outside the interface is shown below
  7. Draw the area to be generated and enter your request (+ and - for prompt weighting)
  8. Click the Submit button to get the inpainted image

Optional: You can select your vae model to improve the generation results

vae = inputs/image/sd_models/vae
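
If you prefer to prepare the inpaint mask outside the interface, a minimal Pillow sketch (the image size, rectangle coordinates and output name are placeholders; white marks the area to be regenerated) could look like this:

```python
# Sketch: build a black mask with a white rectangle over the region to inpaint.
# The mask should have the same dimensions as the initial image.
from PIL import Image, ImageDraw

mask = Image.new("RGB", (512, 512), "#000000")                        # black = keep as is
ImageDraw.Draw(mask).rectangle((128, 128, 384, 384), fill="#FFFFFF")  # white = regenerate
mask.save("my_mask.png")
```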

outpaint:

  1. First upload your models to the folder: inputs/image/sd_models/inpaint
  2. Select your model from the drop-down list
  3. Select model type (SD, SD2 or SDXL)
  4. Set up the model according to the parameters you need
  5. Upload the image to be used for generation to the initial image field
  6. Enter your request (+ and - for prompt weighting)
  7. Click the Submit button to get the outpainted image

gligen:

  1. First upload your models to the folder: inputs/image/sd_models
  2. Select your model from the drop-down list
  3. Select model type (SD, SD2 or SDXL)
  4. Set up the model according to the parameters you need
  5. Enter your prompt (+ and - for prompt weighting) and the GLIGEN phrases (in "" for each box)
  6. Enter the GLIGEN boxes (for example, [0.1387, 0.2051, 0.4277, 0.7090] for a box; see the sketch after this list)
  7. Click the Submit button to get the generated image
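
A short sketch of how such a box relates to the image, assuming (as in the diffusers GLIGEN pipeline) that each box is a normalized [x_min, y_min, x_max, y_max] quadruple in the 0 to 1 range:

```python
# Sketch: convert a normalized GLIGEN box to pixel coordinates for a 512x512 image.
box = [0.1387, 0.2051, 0.4277, 0.7090]    # [x_min, y_min, x_max, y_max], normalized
width, height = 512, 512                  # assumed output resolution
pixel_box = [round(box[0] * width), round(box[1] * height),
             round(box[2] * width), round(box[3] * height)]
print(pixel_box)  # [71, 105, 219, 363]
```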

animatediff:

  1. First upload your models to the folder: inputs/image/sd_models
  2. Select your model from the drop-down list
  3. Set up the model according to the parameters you need
  4. Enter your request (+ and - for prompt weighting)
  5. Click the Submit button to get the generated animation

Optional: you can select a motion LORA to control your generation

hotshot-xl:

  1. Enter your request
  2. Set up the model according to the parameters you need
  3. Click the Submit button to get the generated GIF-image

video:

  1. Upload the initial image
  2. Select your model
  3. Enter your request (for I2VGen-XL)
  4. Set up the model according to the parameters you need
  5. Click the Submit button to get the video from image

ldm3d:

  1. Enter your request
  2. Set up the model according to the parameters you need
  3. Click the Submit button to get the generated images

sd3 (txt2img, img2img, controlnet, inpaint):

  1. Enter your request
  2. Set up the model according to the parameters you need
  3. Click the Submit button to get the generated image

cascade:

  1. Enter your request
  2. Set up the model according to the parameters you need
  3. Click the Submit button to get the generated image

t2i-ip-adapter:

  1. Upload the initial image
  2. Select the options you need
  3. Click the Submit button to get the modified image

ip-adapter-faceid:

  1. Upload the initial image
  2. Select the options you need
  3. Click the Submit button to get the modified image

riffusion (text-to-image, image-to-audio, audio-to-image):

  • text-to-image:
      1. Enter your request
      2. Set up the model according to the parameters you need
      3. Click the Submit button to get the generated image
  • image-to-audio:
      1. Upload the initial image
      2. Select the options you need
      3. Click the Submit button to get the audio from image
  • audio-to-image:
      1. Upload the initial audio
      2. Select the options you need
      3. Click the Submit button to get the image from audio

Kandinsky (txt2img, img2img, inpaint):

  1. Enter your prompt
  2. Select a model from the drop-down list
  3. Set up the model according to the parameters you need
  4. Click Submit to get the generated image

Flux:

  1. Enter your prompt
  2. Select your model
  3. Set up the model according to the parameters you need
  4. Click Submit to get the generated image

Optional: You can select your lora models to improve the generation results. You can also use quantized models by clicking the Enable quantize button if you have low VRAM, but you need to download the models yourself: FLUX.1-dev or FLUX.1-schnell, as well as the VAE, CLIP and T5XXL (a download sketch is shown below)

lora = inputs/image/flux-lora

Quantize models = inputs/image/quantize-flux
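
A minimal download sketch with huggingface_hub; the repository and file names below are examples only, not confirmed by this wiki, so check which quantized builds the interface actually expects:

```python
# Sketch: fetch a quantized FLUX checkpoint into the folder named above.
# repo_id and filename are example values; the VAE, CLIP and T5XXL files are fetched the same way.
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="city96/FLUX.1-dev-gguf",
    filename="flux1-dev-Q4_K_S.gguf",
    local_dir="inputs/image/quantize-flux",
)
```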

HunyuanDiT (txt2img, controlnet):

  1. Enter your prompt
  2. Set up the model according to the parameters you need
  3. Click Submit to get the generated image

Lumina-T2X:

  1. Enter your prompt
  2. Set up the model according to the parameters you need
  3. Click Submit to get the generated image

Kolors (txt2img, img2img, ip-adapter-plus):

  1. Enter your prompt
  2. Set up the model according to the parameters you need
  3. Click Submit to get the generated image

Optional: You can select your lora models to improve the generation results

lora = inputs/image/kolors-lora

AuraFlow:

  1. Enter your prompt
  2. Set up the model according to the parameters you need
  3. Click Submit to get the generated image

Optional: You can select your lora models and enable AuraSR to improve the generation results

lora = inputs/image/auraflow-lora

Würstchen:

  1. Enter your prompt
  2. Set up the model according to the parameters you need
  3. Click Submit to get the generated image

DeepFloydIF (txt2img, img2img, inpaint):

  1. Enter your prompt
  2. Set up the model according to the parameters you need
  3. Click Submit to get the generated image

PixArt:

  1. Enter your prompt
  2. Select your model
  3. Set up the model according to the parameters you need
  4. Click Submit to get the generated image

PlaygroundV2.5:

  1. Enter your prompt
  2. Set up the model according to the parameters you need
  3. Click Submit to get the generated image

Video:

Wav2Lip:

  1. Upload the initial image of a face
  2. Upload the initial audio of a voice
  3. Set up the model according to the parameters you need
  4. Click the Submit button to receive the lip-sync

LivePortrait:

  1. Upload the initial image of a face
  2. Upload the initial video of the face moving
  3. Click the Submit button to receive the animated face image

ModelScope:

  1. Enter your prompt
  2. Set up the model according to the parameters you need
  3. Click Submit to get the generated video

ZeroScope 2:

  1. Enter your prompt
  2. Set up the model according to the parameters you need
  3. Click Submit to get the generated video

CogVideoX:

  1. Enter your prompt
  2. Set up the model according to the parameters you need
  3. Click Submit to get the generated video

Latte:

  1. Enter your prompt
  2. Set up the model according to the parameters you need
  3. Click Submit to get the generated video

3D:

StableFast3D:

  1. Upload the initial image
  2. Set up the model according to the parameters you need
  3. Click the Submit button to get the generated 3D object

Shap-E:

  1. Enter your request or upload the initial image
  2. Set up the model according to the parameters you need
  3. Click the Submit button to get the generated 3D object

SV34D:

  1. Upload the initial image (for 3D) or video (for 4D)
  2. Set up the model according to the parameters you need
  3. Click the Submit button to get the generated 3D video

Zero123Plus:

  1. Upload the initial image
  2. Set up the model according to the parameters you need
  3. Click the Submit button to get the generated 3D rotation of image

Audio:

StableAudio:

  1. Set up the model according to the parameters you need
  2. Enter your request
  3. Click the Submit button to get the generated audio

AudioCraft:

  1. Select a model from the drop-down list
  2. Select model type (musicgen, audiogen or magnet)
  3. Set up the model according to the parameters you need
  4. Enter your request
  5. (Optional) upload the initial audio if you are using a melody model
  6. Click the Submit button to get the generated audio

Optional: You can enable multiband diffusion to improve the generated audio

AudioLDM 2:

  1. Select a model from the drop-down list
  2. Set up the model according to the parameters you need
  3. Enter your request
  4. Click the Submit button to get the generated audio

SunoBark:

  1. Type your request
  2. Set up the model according to the parameters you need
  3. Click the Submit button to receive the generated audio response

RVC:

  1. First upload your models to the folder: inputs/audio/rvc_models
  2. Upload the initial audio
  3. Select your model from the drop-down list
  4. Set up the model according to the parameters you need
  5. Click the Submit button to receive the cloned voice

UVR:

  1. Upload the initial audio to separate
  2. Click the Submit button to get the separated audio

Demucs:

  1. Upload the initial audio to separate
  2. Click the Submit button to get the separated audio

Extras (Image, Video, Audio):

  1. Upload the initial file
  2. Select the options you need
  3. Click the Submit button to get the modified file

Upscale (Real-ESRGAN):

  1. Upload the initial image
  2. Select your model
  3. Set up the model according to the parameters you need
  4. Click the Submit button to get the upscaled image

FaceSwap:

  1. Upload the source image of the face
  2. Upload the target image or video of the face
  3. Select the options you need
  4. Click the Submit button to get the face-swapped image

Optional: you can enable FaceRestore to upscale and restore your face image/video

MetaData-Info:

  1. Upload a generated file
  2. Click the Submit button to get the metadata info from the file
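
For reference, a minimal sketch of reading such metadata yourself, assuming the application stores its generation parameters as PNG text chunks (the file name is a placeholder):

```python
# Sketch: print the text metadata stored in a generated PNG.
from PIL import Image

image = Image.open("outputs/example.png")   # placeholder path to a generated file
for key, value in image.info.items():       # PNG text chunks, if any were written
    print(f"{key}: {value}")
```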

Interface:

Wiki:

  • Here you can view the online or offline wiki of the project

Gallery:

  • Here you can view files from the outputs directory

ModelDownloader:

  • Here you can download LLM and StableDiffusion models. Just choose the model from the drop-down list and click the Submit button

LLM models are downloaded here: inputs/text/llm_models

StableDiffusion models are downloaded here: inputs/image/sd_models
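
If you prefer to fetch models manually from HuggingFace instead, a minimal sketch with huggingface_hub (the repository name is an example only) could look like this:

```python
# Sketch: download a full model repository into the LLM models folder.
# "openchat/openchat_3.5" is only an example repo_id; pick the model you actually want.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="openchat/openchat_3.5",
    local_dir="inputs/text/llm_models/openchat_3.5",
)
```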

Settings:

  • Here you can change the application settings

System:

  • Here you can see the readings from your computer's sensors

Additional Information:

  1. All generations are saved in the outputs folder. You can open the outputs folder using the Outputs button
  2. You can turn off the application using the Close terminal button

Where can I get models and voices?

  • LLM models can be taken from HuggingFace or from ModelDownloader inside interface
  • StableDiffusion, vae, inpaint, embedding and lora models can be taken from CivitAI or from ModelDownloader inside interface
  • RVC models can be taken from VoiceModels
  • StableAudio, AudioCraft, AudioLDM 2, TTS, Whisper, MMS, SeamlessM4Tv2, Wav2Lip, LivePortrait, SunoBark, MoonDream2, Upscalers (Latent and Real-ESRGAN), Refiner, GLIGEN, Depth, Pix2Pix, Controlnet, AnimateDiff, HotShot-XL, Videos, LDM3D, SD3, Cascade, T2I-IP-ADAPTER, IP-Adapter-FaceID, Riffusion, Rembg, Roop, CodeFormer, DDColor, PixelOE, Real-ESRGAN, StableFast3D, Shap-E, SV34D, Zero123Plus, UVR, Demucs, Kandinsky, Flux, HunyuanDiT, Lumina-T2X, Kolors, AuraFlow, AuraSR, Würstchen, DeepFloydIF, PixArt, PlaygroundV2.5, ModelScope, ZeroScope 2, CogVideoX, Latte and Multiband diffusion models are downloaded automatically into the inputs folder when they are used
  • You can take voices from anywhere: record your own, take a recording from the Internet, or just use those already included in the project. The main thing is that the voice is pre-processed!