
Releases: soten355/MetalDiffusion

v0.6543210 - Diffusers!

06 Aug 05:06
767f8a3

v0.6543210 - The Diffusers Update!

After a couple of months of debugging, and with the release of PyTorch 2.0, I'm excited to offer Diffusers as a render engine for MetalDiffusion! This brings a whole host of new features, including more samplers, LoRAs, Token Merging, and much more.

There's also a major overhaul of the Gradio GUI, setting the groundwork for future changes to the GUI options.

Animation takes over from Video and makes a huge debut with an improved camera movement system.

Additionally, there are some quality of life improvements, including a restructured folder layout, settings import via PNG, and better memory efficiency.

Finally, I share my thoughts on the future of TensorFlow for MetalDiffusion.

Diffusers

This is the biggest update to MetalDiffusion. Diffusers is a Python library developed by Hugging Face that uses "pipelines" to create images with diffusion models (Stable Diffusion in particular). Until May, Diffusers relied on PyTorch 1.x, which wasn't as stable on Intel Macs. PyTorch 2.0, however, has been incredibly stable in my tests, and I can safely say it's faster than TensorFlow for image generation on an Intel Mac.

Incorporating Diffusers was fairly easy thanks to its excellent documentation. As a user, you'll notice almost no difference aside from some new features. To switch between render engines, simply go to Advanced Settings and select the render engine you want. Gradio will automatically hide/unhide options specific to each render engine, including weights that only work with one of them.
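For readers curious what that looks like in code, here is a minimal sketch of a Diffusers pipeline on Apple's Metal (MPS) backend; the weights identifier below is just a placeholder, and MetalDiffusion wires all of this up through its GUI:

```python
# Minimal sketch of a Diffusers text-to-image pipeline on Apple's Metal
# (MPS) backend. The weights identifier is a placeholder; any
# Diffusers-format weights folder works.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float32,  # float32 is the safe choice on MPS
)
pipe = pipe.to("mps")  # render on the Metal GPU instead of the CPU

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("result.png")
```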

LoRAs

LoRAs, an incredibly popular tool, are now available in MetalDiffusion, exclusively in the Diffusers render engine. Place them in the models/LoRA folder.

When using them, make sure to select DPM Solver for best results.
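For the curious, loading a LoRA through Diffusers is a one-liner; a sketch, where the file name is hypothetical and `pipe` is the pipeline from the example above:

```python
# Hypothetical example: apply a LoRA stored in models/LoRA to the
# Diffusers pipeline (`pipe`) created earlier; the file name is made up.
pipe.load_lora_weights("models/LoRA", weight_name="myStyle.safetensors")

image = pipe("portrait, myStyle", num_inference_steps=25).images[0]
```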

Samplers

Diffusers allows MetalDiffusion to finally use the most common and popular samplers, such as DPM Solver, Euler, and DDIM. These samplers are for the Diffusers render engine only; selecting them with the TensorFlow engine will do nothing.
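In Diffusers terms, switching samplers just means swapping the pipeline's scheduler; a sketch, with `pipe` as above:

```python
# Sketch: samplers in Diffusers are schedulers, swapped in place on the
# pipeline while reusing the existing scheduler configuration.
from diffusers import DPMSolverMultistepScheduler

pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
# EulerDiscreteScheduler and DDIMScheduler swap in exactly the same way.
```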

Token Merging

By default, Token Merging is activated at 50%. This gives a huge speed increase, but you can change the token merging strength, or deactivate it entirely, in the Advanced Settings. Setting Token Merging to 0% deactivates it.
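Token Merging for Stable Diffusion is commonly provided by the tomesd package; assuming something equivalent is used here (the release notes don't name the library), the mechanics look like this:

```python
# Sketch using the tomesd package (an assumption): patch token merging
# into the pipeline at 50% strength.
import tomesd

tomesd.apply_patch(pipe, ratio=0.5)  # the 50% default mentioned above
# tomesd.remove_patch(pipe)          # equivalent to setting it to 0%
```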

CLIP Skip

Some weights work better with skipping certain layers of the CLIP Text Model. You can now select how many layers to skip in the Advanced Settings.
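Newer Diffusers releases expose this as a `clip_skip` argument at call time; whether MetalDiffusion uses that exact mechanism is an assumption, but the effect is the same:

```python
# Sketch (assumes a Diffusers version with the clip_skip call argument):
# skipping layers means an earlier CLIP hidden state conditions the UNet.
image = pipe("a watercolor landscape", clip_skip=2).images[0]
```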

Animation, the new Video

The video section of MetalDiffusion has been renamed Animation and has received a major code overhaul, specifically around camera movement. The new features are:

  • When you have an input image, you can preview what will change from frame to frame in the Preview tab of the main GUI.
  • XYZ movement and rotation (a rough sketch of the idea follows this list)
  • Focal length
  • Zoom and angle are no longer options; they have been replaced by XYZ rotation/movement.
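As a rough illustration of the idea (not MetalDiffusion's actual code), per-frame XYZ movement and rotation can be approximated by warping the previous frame with a homography built from the rotation angles, pixel translation, and focal length:

```python
# Illustrative sketch only, not the project's implementation: warp a frame
# by a small 3D rotation plus a pixel-space translation using a pinhole
# camera model.
import numpy as np
import cv2

def move_camera(frame, rx=0.0, ry=0.0, rz=0.0, tx=0.0, ty=0.0, focal=500.0):
    h, w = frame.shape[:2]
    # Intrinsics for a pinhole camera centered on the image
    K = np.array([[focal, 0.0, w / 2],
                  [0.0, focal, h / 2],
                  [0.0, 0.0, 1.0]])
    # Rotation matrix from the three axis angles (radians)
    R, _ = cv2.Rodrigues(np.array([rx, ry, rz]))
    # Rotate in camera space, then shift in pixel space (a simplification)
    H = K @ R @ np.linalg.inv(K)
    H[0, 2] += tx
    H[1, 2] += ty
    return cv2.warpPerspective(frame, H, (w, h))
```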

GUI

The GUI has been recoded as a module that dream.py imports. My goal is to move away from Gradio and use PyQt. Ultimately, this would mean I can create a macOS app that installs and launches by double-clicking like any other app, instead of the current terminal-heavy workflow.

Redesign

Inspired by Blender's layout, I've redesigned the GUI to favor bigger preview and result images. This means a big result/preview area on the left with all of the settings on the right.

Previews

ControlNet and Animation previews are now bigger and easier to access. They're under the Preview tab at the top.

Quality of Life

  • Import prior MetalDiffusion settings via .png files (sketched after this list)
  • Convert safetensors weights into either Keras .h5 folders or Diffusers/Hugging Face folders. Really useful!
  • Better memory efficiency from freeing up variables
  • Rich console text using the Rich Python module
  • Major restructure of the folders, particularly models
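The PNG import works because generation settings ride along in the image's text metadata; conceptually, it looks like this (the key names are assumptions):

```python
# Sketch: PNG text chunks carry the generation settings; Pillow exposes
# them as a dict. The exact keys MetalDiffusion writes are assumptions.
from PIL import Image

with Image.open("result.png") as im:
    settings = dict(im.text)  # e.g. {"prompt": ..., "seed": ..., "steps": ...}

print(settings.get("prompt"))
```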

Model Folder Restructure

The models folder, where the weights for Stable Diffusion are stored, has been reorganized to reflect the different render engines. See the ReadMe.md for more information.

Future of TensorFlow

I'm quite proud of the work I did getting TensorFlow to utilize Textual Inversion and ControlNet, but I've been underwhelmed by its performance on Intel Macs (and its performance overall). There are more speed increases I can implement, particularly Token Merging, but it still pales in comparison to PyTorch.

However, I won't remove the TensorFlow render engine because I do think it has its uses, especially once the TensorFlow team figures out how to use multiple GPUs on Metal.

I hope to implement LoRAs for the TensorFlow engine by the end of the year, but my main focus will now shift to implementing more popular features and, most importantly, SDXL. Thankfully, Diffusers will help with that!

v0.54 Star Wars

25 Jun 01:18
fc195cd

v0.54 - The Star Wars Update!

After a few weeks of debugging, puzzle solving, and continuous testing, MetalDiffusion now fully supports ControlNet Tile!

ControlNet Update: Tile

This is the biggest update to MetalDiffusion. ControlNet Tile required reworking the main dream.py file and creating tileSetter.py, which handles cutting an input image into smaller tiles, sending them to MetalDiffusion, and then combining the finished tiles back into a larger image.
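Stripped to its essentials, the tiling idea looks something like this (a simplified sketch, not tileSetter.py itself, which also has to handle overlap and blending):

```python
# Simplified sketch of the cut/render/recombine idea behind tileSetter.py
# (not the actual module).
from PIL import Image

def process_in_tiles(image, tile_size, render_tile):
    w, h = image.size
    result = Image.new("RGB", (w, h))
    for top in range(0, h, tile_size):
        for left in range(0, w, tile_size):
            box = (left, top, min(left + tile_size, w), min(top + tile_size, h))
            tile = image.crop(box)
            result.paste(render_tile(tile), box)  # render_tile: img2img on one tile
    return result
```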

Pre-Processing

Pre-processing is more dynamic in the Gradio application: the available pre-processor options now appear and disappear as you pick a specific option.

Tile pre-processing is the newest option, and it can upscale the tiles with two methods (the ESRGAN path is sketched after this list):

  • Bicubic - Default
  • ESRGAN - TensorFlow version, downloaded automatically from TensorFlow Hub
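The ESRGAN path pulls a pretrained model from TensorFlow Hub; roughly like this (the module handle is the publicly hosted ESRGAN, assumed to be the one used here):

```python
# Sketch: 4x upscaling a tile with the public ESRGAN model on TensorFlow
# Hub (assumed to be the model the release refers to).
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

model = hub.load("https://tfhub.dev/captain-pool/esrgan-tf2/1")

def upscale(tile):  # tile: HxWx3 uint8 NumPy array
    x = tf.cast(tf.expand_dims(tile, 0), tf.float32)
    y = model(x)  # returns a 4x-upscaled batch
    return np.clip(y[0].numpy(), 0, 255).astype(np.uint8)
```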

Tools

The newest tool included with MetalDiffusion is depth mapping: simply input an image and it will output the depth map.
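A common way to produce such a depth map is MiDaS via torch.hub; whether MetalDiffusion uses MiDaS specifically is an assumption:

```python
# Sketch using MiDaS from torch.hub (an assumption, not confirmed as the
# project's implementation): estimate depth from a single image.
import cv2
import torch

midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform
midas.eval()

img = cv2.cvtColor(cv2.imread("input.png"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    depth = midas(transform(img)).squeeze().numpy()  # larger value = closer
```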

v0.426

10 May 22:27
8a15188

v0.426 - The Alien Update!

After a few months of debugging, puzzle solving, and continuous testing, MetalDiffusion now fully supports ControlNet!

ControlNet

This is the biggest update to MetalDiffusion. ControlNet required an entire reworking of the sampler and the UNET model. The program supports all ControlNets in .safetensors and .pth format.

Pre-Processing

The program's built-in pre-processing functions are currently limited. I wanted to keep the frontend of the program agnostic of any specific module (like PyTorch or TensorFlow), so the pre-processing options are limited to:

  • Canny Edge Detection
  • Soft Edge Detection (also known as HED)

That being said, you can always bypass pre-processing by simply unchecking Pre-Process Image? in the ControlNet tab.
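Both detectors are standard image-processing steps. The Canny map, for instance, is essentially this (threshold values are typical defaults, not necessarily MetalDiffusion's):

```python
# Sketch: the kind of edge map a canny ControlNet consumes; thresholds
# are common defaults, not necessarily the ones MetalDiffusion uses.
import cv2

gray = cv2.cvtColor(cv2.imread("input.png"), cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 100, 200)
cv2.imwrite("control.png", edges)
```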

ControlNet Cache

If the input image for ControlNet didn't change, why make ControlNet process it again? ControlNet Cache is an option that temporarily saves the results from the ControlNet section to speed up image creation. Of course, the cache will update itself and run ControlNet again if any of these change (a sketch of the idea follows this list):

  • Image Size
  • ControlNet Weight
  • Input Image (both for Stable Diffusion and ControlNet)
  • Steps
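Conceptually, the cache key is just a fingerprint of those inputs; a sketch of the idea (not the project's actual code):

```python
# Sketch of the invalidation idea, not the actual implementation: hash
# everything that affects the ControlNet pass and reuse results on a hit.
import hashlib

def controlnet_cache_key(image_bytes, width, height, weight, steps):
    h = hashlib.sha256(image_bytes)
    h.update(f"{width}x{height}|{weight}|{steps}".encode())
    return h.hexdigest()

cache = {}
key = controlnet_cache_key(open("control.png", "rb").read(), 512, 512, 1.0, 25)
if key not in cache:
    cache[key] = None  # run ControlNet here and store its outputs
```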

Video

ControlNet is fully integrated into the Video tab, but it is still limited in what it can do frame to frame. The Video tab in general still needs the ability to read frames from a folder as the input image(s) instead of only its own frames. Because ControlNet is fully integrated, though, it will eventually read those frames as well. Needless to say, ControlNet will be ready.

Quality of Life

There are a whole host of quality of life enhancements too, including:

  • Quicker Weight loading
  • Better UI
  • Small Increase in Speed
  • Better Folder Structure
  • Audio Notification (Beep) when image generation finishes
  • Utilizing the KerasCV model structure, an improved version of Divam Gupta's

v0.314 The Pi Update

16 Mar 17:38
5792921

Metal Diffusion

This is a major update to my initial fork release. I finished it on 3/14/2023, so I've dubbed this version the Pi Update.

What's new?!

  • GPU Selection
  • Text Embedding Support
  • Keras ".h5" Weight Support
  • User Interface Facelift
  • More TensorFlow Adoption

GPU Selection

Under Advanced Settings, you can select the render device: any of your available Metal GPUs or your CPU!
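With the tensorflow-metal plugin, that device choice boils down to something like this sketch:

```python
# Sketch: enumerate Metal GPUs (exposed through the tensorflow-metal
# plugin) and pin work to one of them, or to the CPU, with a device scope.
import tensorflow as tf

print(tf.config.list_physical_devices("GPU"))  # the selectable Metal GPUs

with tf.device("/GPU:0"):  # or "/CPU:0" to render on the CPU
    x = tf.random.normal((1, 512, 512, 3))
```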

Text Embedding Support

Text Embeddings/Textual Inversion for inference (the creation of an image) is now fully supported. The program cannot create new text embeddings, but it certainly can use them.

Save your favorite text embeddings into models/embeddings and the program will make them available under Settings > Text Embeddings.
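For reference, a typical textual-inversion file is just one or more token vectors; a sketch of what's inside (the layout varies by training tool, and this is not necessarily how MetalDiffusion reads them):

```python
# Sketch: peek inside an AUTOMATIC1111-style .pt embedding (an assumed
# format; this is not necessarily MetalDiffusion's loader).
import torch

data = torch.load("models/embeddings/myToken.pt", map_location="cpu")
vectors = data["string_to_param"]["*"]  # shape: (num_tokens, embed_dim)
print(vectors.shape)
```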

Keras ".h5" Weight Support

Though still uncommon, pretrained Stable Diffusion models in the Keras ".h5" format are fully supported. Save each model as its own folder inside models/. More details on how to do that here: Keras ".h5" Info

User Interface Facelift

With some advice from current users, the user interface has been reorganized to make more sense. Model selection is now one of the first options, as are input images and text embeddings. More advanced options, especially experimental choices, live in the Advanced Settings.

TensorFlow Adoption

I'm getting the program closer and closer to being pure TensorFlow. There are still a few NumPy sections, but the goal is to get the program to run in graph mode, which, in theory, is faster than eager mode.
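Concretely, graph mode means wrapping the hot paths in tf.function so TensorFlow traces them into a graph once and reuses the optimized version; a toy sketch:

```python
# Sketch of the eager-vs-graph distinction: tf.function traces the body
# into a graph on first call and reuses the compiled version afterwards.
import tensorflow as tf

@tf.function
def diffusion_step(latents, noise):
    return latents - 0.1 * noise  # stand-in for a real denoising step

out = diffusion_step(tf.zeros((1, 64, 64, 4)), tf.ones((1, 64, 64, 4)))
```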