
[LoRA] support loading Flux Control LoRAs with bitsandbytes quantization #10588

Closed

Description

@sayakpaul

#10578 fixed loading LoRAs into 4bit quantized models for Flux.

#10576 added a test to ensure Flux LoRAs can be loaded when 8bit bitsandbytes quantization is applied.

We still need to support all of this for Flux Control LoRAs, since loading them involves quite a bit of expansion gymnastics as well as new layer assignments to make everything work.

Some stuff I wanted to discuss before attempting a PR:

(Whenever I say quantization in this thread, assume bitsandbytes quantization.)

expanded_module = torch.nn.Linear(

is responsible for initializing an expanded module. This is perfectly fine for non-quantized scenarios, but under quantization we cannot use nn.Linear; the module needs to be constructed according to the quantization scheme in use (4-bit/8-bit).

Same goes for:

original_module = torch.nn.Linear(

@BenjaminBossan I wanted to pick your brain here so we can come up with a robust design for the solution. Suggestions?
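
For concreteness, below is a minimal sketch of the kind of dispatch I have in mind. This is only an assumption of how it could look, not existing diffusers code: the helper name `create_expanded_linear` and the way a `BitsAndBytesConfig`-style `quantization_config` is threaded through are hypothetical.

```python
import torch
import bitsandbytes as bnb


def create_expanded_linear(in_features, out_features, bias, quantization_config=None, dtype=torch.float32):
    """Hypothetical helper: build the expanded/original module with a layer class
    that matches the model's quantization scheme instead of always using nn.Linear."""
    if quantization_config is None:
        # Non-quantized path: keep using a plain nn.Linear, as we do today.
        return torch.nn.Linear(in_features, out_features, bias=bias, dtype=dtype)

    if getattr(quantization_config, "load_in_4bit", False):
        # 4-bit path: mirror the compute dtype / quant type of the existing quantized layers.
        return bnb.nn.Linear4bit(
            in_features,
            out_features,
            bias=bias,
            compute_dtype=quantization_config.bnb_4bit_compute_dtype,
            quant_type=quantization_config.bnb_4bit_quant_type,
        )

    if getattr(quantization_config, "load_in_8bit", False):
        # 8-bit path: inference-style Linear8bitLt without fp16 master weights.
        return bnb.nn.Linear8bitLt(
            in_features,
            out_features,
            bias=bias,
            has_fp16_weights=False,
        )

    raise ValueError(f"Unsupported quantization config: {quantization_config}")
```

The point is simply that the expanded module (and the re-created original module) should use the same layer class and settings as the rest of the quantized model, rather than a plain nn.Linear.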
