Llama-3.3-70B-Instruct-4bit LoRA Fine-Tuning: No Change (or Instability) - Adapter Issue? #1147
Hi everyone. The core problem is that the LoRA adapter seems to have no usable effect on the model's output, despite training completing normally (the loss decreases as expected). It's not a matter of tuning the adapter scale - the adaptation either does nothing at all or breaks the model. Here's what I've tried:
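The basic setup is the standard mlx_lm LoRA train-then-generate flow; the sketch below uses placeholder paths, prompt, and hyperparameters rather than my exact commands:

```shell
# Train a LoRA adapter on the 4-bit model.
# Model repo, data path, iteration count, and adapter directory are placeholders.
mlx_lm.lora \
  --model mlx-community/Llama-3.3-70B-Instruct-4bit \
  --train \
  --data ./data \
  --iters 600 \
  --adapter-path ./adapters

# Generate with the trained adapter applied.
mlx_lm.generate \
  --model mlx-community/Llama-3.3-70B-Instruct-4bit \
  --adapter-path ./adapters \
  --prompt "A prompt covering the fine-tuned behavior" \
  --max-tokens 200
```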
I'm really stuck here, and any insights or suggestions would be greatly appreciated!
Replies: 1 comment 3 replies
I tried training this:
And then evaluating it like this:
And it generated the following which is very reasonable:
So I'm not sure where things are going wrong for you. A few suggestions:
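One common culprit is the adapter not actually being loaded at generation time. A minimal sketch of what to check, assuming the standard mlx_lm CLI and placeholder paths:

```shell
# Point --adapter-path at the directory the training run wrote its weights to.
# If this flag is omitted (or points at an empty or mismatched directory),
# generation silently uses the base model, which looks exactly like the
# adapter having no effect.
mlx_lm.generate \
  --model mlx-community/Llama-3.3-70B-Instruct-4bit \
  --adapter-path ./adapters \
  --prompt "Same prompt you used to test the fine-tuned behavior" \
  --max-tokens 200
```

It's also worth confirming that the evaluation prompt is formatted the same way as your training data (chat template, system prompt), since a mismatch there can make the fine-tuned behavior hard to see.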