
[LoRA] support loading Flux Control LoRAs with bitsandbytes quantization #10588

Open
sayakpaul opened this issue Jan 15, 2025 · 10 comments
@sayakpaul
Member

#10578 fixed loading LoRAs into 4bit quantized models for Flux.

#10576 added a test to ensure Flux LoRAs can be loaded when 8bit bitsandbytes quantization is applied.

We still need to support all of this for Flux Control LoRAs, since they require quite a bit of shape-expansion gymnastics as well as new layer assignments to make everything work.

Some stuff I wanted to discuss before attempting a PR:

(whenever I say quantization in this thread, assume it means quantization from bitsandbytes)

expanded_module = torch.nn.Linear(
is responsible for initializing an expanded module. This is perfectly fine for non-quantized scenarios, but for quantized models we cannot use nn.Linear. It needs to be configured based on which quantization scheme we're using (4-bit/8-bit).

Same goes for:

original_module = torch.nn.Linear(
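For illustration, here is a rough sketch of what a quantization-aware replacement could look like, assuming bitsandbytes. The helper name and the way the scheme is passed in are hypothetical, and the 4-bit kwargs would need to mirror the model's actual quantization config:

import torch
import bitsandbytes as bnb

def create_linear(in_features, out_features, bias, dtype, quant_scheme=None):
    # Hypothetical helper: pick the layer class from the quantization scheme
    # instead of hard-coding torch.nn.Linear.
    if quant_scheme == "bnb_4bit":
        return bnb.nn.Linear4bit(
            in_features,
            out_features,
            bias=bias,
            compute_dtype=dtype,
            quant_type="nf4",  # should match the model's BitsAndBytesConfig
        )
    if quant_scheme == "bnb_8bit":
        return bnb.nn.Linear8bitLt(
            in_features,
            out_features,
            bias=bias,
            has_fp16_weights=False,
        )
    return torch.nn.Linear(in_features, out_features, bias=bias, dtype=dtype)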

@BenjaminBossan I wanted to pick your brain here to come up with a robust design for the solution. Suggestions?

@sayakpaul sayakpaul self-assigned this Jan 15, 2025
@sayakpaul sayakpaul changed the title [LoRA] support loading Flux Control LoRAs with bitsandbytes` quantization [LoRA] support loading Flux Control LoRAs with bitsandbytes quantization Jan 15, 2025
@BenjaminBossan
Member

Honestly, I don't have a good suggestion for how to tackle this. It's probably best to ask the bnb devs what the recommended way is.

In PEFT, we sometimes encounter similar situations. Not with the shapes changing, but we may have to create a new weight because we want to merge the LoRA weights into the base weight. This has always been a very brittle part of the code, often undocumented, no matter which quantization scheme is used.

If I had a wish, I'd like an API like so:

quantized_layer = ...
quantized_weight = quantized_layer.weight
new_data = ...
new_weight = quantized_weight.new(new_data)  # <= new data but all other properties stay the same

Alas, AFAIK there is no such API.
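The closest thing available today seems to be re-creating the parameter from the new data and letting bnb re-quantize it, roughly along these lines (a sketch only; the helper name is hypothetical and the attribute handling is simplified and version-dependent):

import torch
import bitsandbytes as bnb

def params4bit_like(old: bnb.nn.Params4bit, new_data: torch.Tensor) -> bnb.nn.Params4bit:
    # Carry over the quantization settings of `old` and re-quantize `new_data`
    # by moving the fresh parameter back to the original device.
    return bnb.nn.Params4bit(
        new_data.to("cpu"),
        requires_grad=False,
        quant_type=old.quant_type,
        compress_statistics=old.compress_statistics,
        blocksize=old.blocksize,
    ).to(old.device)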

@sayakpaul
Member Author

This has always been a very brittle part of the code, often undocumented, no matter which quantization scheme is used.

Could you point me to a reference in the peft codebase for this? IIRC, it's through a dispatching system, right?

Cc: @matthewdouglas in case you have more ideas.

@BenjaminBossan
Member

Could you point me to a reference in the peft codebase for this?

It's not really a comparable situation, but I mean places like this:

weight = self.get_base_layer().weight  # <= `self.get_base_layer()` is the bnb layer
kwargs = weight.__dict__
...
kwargs["requires_grad"] = False
kwargs.pop("data", None)
self.get_base_layer().weight = bnb.nn.Params4bit(w_data.to("cpu"), **kwargs).to(weight.device)

https://github.com/huggingface/peft/blob/93d80465a5dd63cda22e0ec1103dad35b7bc35c6/src/peft/tuners/lora/bnb.py#L359-L360
https://github.com/huggingface/peft/blob/93d80465a5dd63cda22e0ec1103dad35b7bc35c6/src/peft/tuners/lora/bnb.py#L387-L389

I think it's clear that this can be very brittle, e.g. when new arguments are introduced. Just recently, for example, I noticed it breaking with torch.compile because it adds spurious attributes. For other quantization techniques, it's pretty much the same picture.

IIRC, it's through a dispatching system, right?

We have such a system for LoRA, but it's only responsible for deciding which type of LoRA layer to use to wrap the base layer.

@sayakpaul
Member Author

It's not really a comparable situation, but I mean places like this:

Sorry for not making that clear. I think it overlaps with our situation.

We have such a system for LoRA, but it's only responsible for deciding which type of LoRA layer to use to wrap the base layer.

So, in our case, it would be deciding which type of Linear layer to use based on the quantization scheme. I think we'd start with bitsandbytes first.

@BenjaminBossan
Member

So, in our case, it would be deciding which type of Linear layer to use based on the quantization scheme. I think we'd start with bitsandbytes first.

I'd suggest not starting with a dispatching mechanism, to keep things simple. In PEFT, we got to a point where it became unwieldy, as we had a giant if...else for each combination of quantization and layer type (linear, conv2d, etc.). As long as it's not that bad in diffusers, I'd suggest keeping it as is.

@chaewon-huh

Hi @sayakpaul , @BenjaminBossan ,

Thanks for all the work on this! I was wondering if there’s an estimated timeline for supporting Flux Control LoRAs with bitsandbytes quantization.

Also, if I want to use it now, are there any specific changes I could make to the current code to get it working? Any guidance would be really helpful.

Appreciate your time!

@sayakpaul
Member Author

@chaewon-huh thanks for your patience. I plan to start working on this very soon (apologies for the delay).

Also, if I want to use it now, are there any specific changes I could make to the current code to get it working? Any guidance would be really helpful.

The first point to address is using the appropriate layer when doing the expansion here:

expanded_module = torch.nn.Linear(

And then we have to consider dequantizing the params when expanding the LoRA state dicts here (see the sketch after the snippet below):

if base_module_shape[1] > lora_A_param.shape[1]:
    shape = (lora_A_param.shape[0], base_weight_param.shape[1])
    expanded_state_dict_weight = torch.zeros(shape, device=base_weight_param.device)
    expanded_state_dict_weight[:, : lora_A_param.shape[1]].copy_(lora_A_param)
    lora_state_dict[f"{prefix}{k}.lora_A.weight"] = expanded_state_dict_weight
    expanded_module_names.add(k)
elif base_module_shape[1] < lora_A_param.shape[1]:
    raise NotImplementedError(
        f"This LoRA param ({k}.lora_A.weight) has an incompatible shape {lora_A_param.shape}. Please open an issue to file for a feature request - https://github.com/huggingface/diffusers/issues/new."
    )
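With bnb 4-bit, base_weight_param above would be a packed Params4bit, so its .shape no longer matches the logical weight shape, and any copying may require dequantizing first. A minimal sketch of the two pieces this needs, assuming bitsandbytes (helper names are hypothetical):

import bitsandbytes as bnb

def logical_shape(weight):
    # Packed 4-bit params keep their original (unpacked) shape in quant_state.
    if isinstance(weight, bnb.nn.Params4bit):
        return tuple(weight.quant_state.shape)
    return tuple(weight.shape)

def dequantized_data(weight):
    # Materialize the full-precision tensor when it has to be copied/expanded.
    if isinstance(weight, bnb.nn.Params4bit):
        return bnb.functional.dequantize_4bit(weight.data, weight.quant_state)
    return weight.data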

@chaewon-huh

@sayakpaul

I’ve been investigating different approaches to enable Flux Control LoRA + bitsandbytes (4-bit) quantization without fully modifying lora_pipeline.py in the ways you described. Specifically:

Fusing LoRA weights first and then quantizing:

  • One idea is to load the model in float/half precision, perform a full LoRA fusion, and only then apply bitsandbytes quantization to the final fused model.
  • Would that be a viable workaround to avoid the shape expansion logic in _maybe_expand_transformer_param_shape_or_error_ and _maybe_expand_lora_state_dict?

Are any of these alternatives known to work reliably, or is modifying the bitsandbytes layer creation/expansion logic in lora_pipeline.py still the recommended path?

I’d be grateful for any guidance or suggestions you might have.

@sayakpaul
Member Author

Well, fusion is definitely an option, and if that works for you, please go ahead, as it should already work out of the box.

But if you wanted to change the LoRA scale, that won't be possible :/
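For anyone who wants to try the fuse-then-quantize route in the meantime, a rough, untested sketch is below; the repo ids, the save path, and the config values are illustrative, and it assumes diffusers' FluxControlPipeline and BitsAndBytesConfig:

import torch
from diffusers import BitsAndBytesConfig, FluxControlPipeline, FluxTransformer2DModel

# 1. Load in bf16, load the Control LoRA, and fuse it into the base weights.
pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.load_lora_weights("black-forest-labs/FLUX.1-Depth-dev-lora")
pipe.fuse_lora()
pipe.unload_lora_weights()

# 2. Save the fused transformer and reload it with 4-bit quantization.
pipe.transformer.save_pretrained("flux-control-fused-transformer")
pipe.transformer = FluxTransformer2DModel.from_pretrained(
    "flux-control-fused-transformer",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
    ),
    torch_dtype=torch.bfloat16,
)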

@sayakpaul
Member Author

@chaewon-huh could you give #10990 a test?
