Some weights of the model checkpoint at `finetune_starcoder2/final_checkpoint were not used when initializing Starcoder2ForCausalLM

I get the following error after finetuning this model on the R dataset following the example in the README.

```console
Some weights of the model checkpoint at finetune_starcoder2/final_checkpoint were not used when initializing Starcoder2ForCausalLM: ['model.layers.0.self_attn.k_proj.base_layer.bias', 'model.layers.0.self_attn.k_proj.base_layer.weight', 'model.layers.0.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.0.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.0.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.0.self_attn.k_proj.lora_A.default.weight', 'model.layers.0.self_attn.k_proj.lora_B.default.weight', 'model.layers.0.self_attn.o_proj.base_layer.bias', 'model.layers.0.self_attn.o_proj.base_layer.weight', 'model.layers.0.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.0.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.0.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.0.self_attn.o_proj.lora_A.default.weight', 'model.layers.0.self_attn.o_proj.lora_B.default.weight', 'model.layers.0.self_attn.q_proj.base_layer.bias', 'model.layers.0.self_attn.q_proj.base_layer.weight', 'model.layers.0.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.0.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.0.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.0.self_attn.q_proj.lora_A.default.weight', 'model.layers.0.self_attn.q_proj.lora_B.default.weight', 'model.layers.0.self_attn.v_proj.base_layer.bias', 'model.layers.0.self_attn.v_proj.base_layer.weight', 'model.layers.0.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.0.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.0.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.0.self_attn.v_proj.lora_A.default.weight', 'model.layers.0.self_attn.v_proj.lora_B.default.weight', 'model.layers.1.self_attn.k_proj.base_layer.bias', 'model.layers.1.self_attn.k_proj.base_layer.weight', 'model.layers.1.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.1.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.1.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.1.self_attn.k_proj.lora_A.default.weight', 'model.layers.1.self_attn.k_proj.lora_B.default.weight', 'model.layers.1.self_attn.o_proj.base_layer.bias', 'model.layers.1.self_attn.o_proj.base_layer.weight', 'model.layers.1.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.1.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.1.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.1.self_attn.o_proj.lora_A.default.weight', 'model.layers.1.self_attn.o_proj.lora_B.default.weight', 'model.layers.1.self_attn.q_proj.base_layer.bias', 'model.layers.1.self_attn.q_proj.base_layer.weight', 'model.layers.1.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.1.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.1.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.1.self_attn.q_proj.lora_A.default.weight', 'model.layers.1.self_attn.q_proj.lora_B.default.weight', 'model.layers.1.self_attn.v_proj.base_layer.bias', 'model.layers.1.self_attn.v_proj.base_layer.weight', 'model.layers.1.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.1.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.1.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.1.self_attn.v_proj.lora_A.default.weight', 'model.layers.1.self_attn.v_proj.lora_B.default.weight', 'model.layers.10.self_attn.k_proj.base_layer.bias', 'model.layers.10.self_attn.k_proj.base_layer.weight', 'model.layers.10.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.10.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.10.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.10.self_attn.k_proj.lora_A.default.weight', 'model.layers.10.self_attn.k_proj.lora_B.default.weight', 'model.layers.10.self_attn.o_proj.base_layer.bias', 'model.layers.10.self_attn.o_proj.base_layer.weight', 'model.layers.10.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.10.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.10.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.10.self_attn.o_proj.lora_A.default.weight', 'model.layers.10.self_attn.o_proj.lora_B.default.weight', 'model.layers.10.self_attn.q_proj.base_layer.bias', 'model.layers.10.self_attn.q_proj.base_layer.weight', 'model.layers.10.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.10.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.10.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.10.self_attn.q_proj.lora_A.default.weight', 'model.layers.10.self_attn.q_proj.lora_B.default.weight', 'model.layers.10.self_attn.v_proj.base_layer.bias', 'model.layers.10.self_attn.v_proj.base_layer.weight', 'model.layers.10.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.10.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.10.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.10.self_attn.v_proj.lora_A.default.weight', 'model.layers.10.self_attn.v_proj.lora_B.default.weight', 'model.layers.11.self_attn.k_proj.base_layer.bias', 'model.layers.11.self_attn.k_proj.base_layer.weight', 'model.layers.11.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.11.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.11.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.11.self_attn.k_proj.lora_A.default.weight', 'model.layers.11.self_attn.k_proj.lora_B.default.weight', 'model.layers.11.self_attn.o_proj.base_layer.bias', 'model.layers.11.self_attn.o_proj.base_layer.weight', 'model.layers.11.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.11.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.11.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.11.self_attn.o_proj.lora_A.default.weight', 'model.layers.11.self_attn.o_proj.lora_B.default.weight', 'model.layers.11.self_attn.q_proj.base_layer.bias', 'model.layers.11.self_attn.q_proj.base_layer.weight', 'model.layers.11.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.11.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.11.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.11.self_attn.q_proj.lora_A.default.weight', 'model.layers.11.self_attn.q_proj.lora_B.default.weight', 'model.layers.11.self_attn.v_proj.base_layer.bias', 'model.layers.11.self_attn.v_proj.base_layer.weight', 'model.layers.11.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.11.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.11.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.11.self_attn.v_proj.lora_A.default.weight', 'model.layers.11.self_attn.v_proj.lora_B.default.weight', 'model.layers.12.self_attn.k_proj.base_layer.bias', 'model.layers.12.self_attn.k_proj.base_layer.weight', 'model.layers.12.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.12.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.12.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.12.self_attn.k_proj.lora_A.default.weight', 'model.layers.12.self_attn.k_proj.lora_B.default.weight', 'model.layers.12.self_attn.o_proj.base_layer.bias', 'model.layers.12.self_attn.o_proj.base_layer.weight', 'model.layers.12.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.12.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.12.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.12.self_attn.o_proj.lora_A.default.weight', 'model.layers.12.self_attn.o_proj.lora_B.default.weight', 'model.layers.12.self_attn.q_proj.base_layer.bias', 'model.layers.12.self_attn.q_proj.base_layer.weight', 'model.layers.12.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.12.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.12.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.12.self_attn.q_proj.lora_A.default.weight', 'model.layers.12.self_attn.q_proj.lora_B.default.weight', 'model.layers.12.self_attn.v_proj.base_layer.bias', 'model.layers.12.self_attn.v_proj.base_layer.weight', 'model.layers.12.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.12.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.12.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.12.self_attn.v_proj.lora_A.default.weight', 'model.layers.12.self_attn.v_proj.lora_B.default.weight', 'model.layers.13.self_attn.k_proj.base_layer.bias', 'model.layers.13.self_attn.k_proj.base_layer.weight', 'model.layers.13.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.13.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.13.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.13.self_attn.k_proj.lora_A.default.weight', 'model.layers.13.self_attn.k_proj.lora_B.default.weight', 'model.layers.13.self_attn.o_proj.base_layer.bias', 'model.layers.13.self_attn.o_proj.base_layer.weight', 'model.layers.13.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.13.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.13.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.13.self_attn.o_proj.lora_A.default.weight', 'model.layers.13.self_attn.o_proj.lora_B.default.weight', 'model.layers.13.self_attn.q_proj.base_layer.bias', 'model.layers.13.self_attn.q_proj.base_layer.weight', 'model.layers.13.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.13.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.13.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.13.self_attn.q_proj.lora_A.default.weight', 'model.layers.13.self_attn.q_proj.lora_B.default.weight', 'model.layers.13.self_attn.v_proj.base_layer.bias', 'model.layers.13.self_attn.v_proj.base_layer.weight', 'model.layers.13.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.13.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.13.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.13.self_attn.v_proj.lora_A.default.weight', 'model.layers.13.self_attn.v_proj.lora_B.default.weight', 'model.layers.14.self_attn.k_proj.base_layer.bias', 'model.layers.14.self_attn.k_proj.base_layer.weight', 'model.layers.14.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.14.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.14.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.14.self_attn.k_proj.lora_A.default.weight', 'model.layers.14.self_attn.k_proj.lora_B.default.weight', 'model.layers.14.self_attn.o_proj.base_layer.bias', 'model.layers.14.self_attn.o_proj.base_layer.weight', 'model.layers.14.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.14.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.14.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.14.self_attn.o_proj.lora_A.default.weight', 'model.layers.14.self_attn.o_proj.lora_B.default.weight', 'model.layers.14.self_attn.q_proj.base_layer.bias', 'model.layers.14.self_attn.q_proj.base_layer.weight', 'model.layers.14.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.14.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.14.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.14.self_attn.q_proj.lora_A.default.weight', 'model.layers.14.self_attn.q_proj.lora_B.default.weight', 'model.layers.14.self_attn.v_proj.base_layer.bias', 'model.layers.14.self_attn.v_proj.base_layer.weight', 'model.layers.14.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.14.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.14.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.14.self_attn.v_proj.lora_A.default.weight', 'model.layers.14.self_attn.v_proj.lora_B.default.weight', 'model.layers.15.self_attn.k_proj.base_layer.bias', 'model.layers.15.self_attn.k_proj.base_layer.weight', 'model.layers.15.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.15.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.15.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.15.self_attn.k_proj.lora_A.default.weight', 'model.layers.15.self_attn.k_proj.lora_B.default.weight', 'model.layers.15.self_attn.o_proj.base_layer.bias', 'model.layers.15.self_attn.o_proj.base_layer.weight', 'model.layers.15.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.15.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.15.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.15.self_attn.o_proj.lora_A.default.weight', 'model.layers.15.self_attn.o_proj.lora_B.default.weight', 'model.layers.15.self_attn.q_proj.base_layer.bias', 'model.layers.15.self_attn.q_proj.base_layer.weight', 'model.layers.15.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.15.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.15.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.15.self_attn.q_proj.lora_A.default.weight', 'model.layers.15.self_attn.q_proj.lora_B.default.weight', 'model.layers.15.self_attn.v_proj.base_layer.bias', 'model.layers.15.self_attn.v_proj.base_layer.weight', 'model.layers.15.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.15.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.15.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.15.self_attn.v_proj.lora_A.default.weight', 'model.layers.15.self_attn.v_proj.lora_B.default.weight', 'model.layers.16.self_attn.k_proj.base_layer.bias', 'model.layers.16.self_attn.k_proj.base_layer.weight', 'model.layers.16.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.16.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.16.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.16.self_attn.k_proj.lora_A.default.weight', 'model.layers.16.self_attn.k_proj.lora_B.default.weight', 'model.layers.16.self_attn.o_proj.base_layer.bias', 'model.layers.16.self_attn.o_proj.base_layer.weight', 'model.layers.16.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.16.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.16.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.16.self_attn.o_proj.lora_A.default.weight', 'model.layers.16.self_attn.o_proj.lora_B.default.weight', 'model.layers.16.self_attn.q_proj.base_layer.bias', 'model.layers.16.self_attn.q_proj.base_layer.weight', 'model.layers.16.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.16.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.16.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.16.self_attn.q_proj.lora_A.default.weight', 'model.layers.16.self_attn.q_proj.lora_B.default.weight', 'model.layers.16.self_attn.v_proj.base_layer.bias', 'model.layers.16.self_attn.v_proj.base_layer.weight', 'model.layers.16.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.16.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.16.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.16.self_attn.v_proj.lora_A.default.weight', 'model.layers.16.self_attn.v_proj.lora_B.default.weight', 'model.layers.17.self_attn.k_proj.base_layer.bias', 'model.layers.17.self_attn.k_proj.base_layer.weight', 'model.layers.17.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.17.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.17.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.17.self_attn.k_proj.lora_A.default.weight', 'model.layers.17.self_attn.k_proj.lora_B.default.weight', 'model.layers.17.self_attn.o_proj.base_layer.bias', 'model.layers.17.self_attn.o_proj.base_layer.weight', 'model.layers.17.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.17.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.17.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.17.self_attn.o_proj.lora_A.default.weight', 'model.layers.17.self_attn.o_proj.lora_B.default.weight', 'model.layers.17.self_attn.q_proj.base_layer.bias', 'model.layers.17.self_attn.q_proj.base_layer.weight', 'model.layers.17.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.17.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.17.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.17.self_attn.q_proj.lora_A.default.weight', 'model.layers.17.self_attn.q_proj.lora_B.default.weight', 'model.layers.17.self_attn.v_proj.base_layer.bias', 'model.layers.17.self_attn.v_proj.base_layer.weight', 'model.layers.17.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.17.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.17.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.17.self_attn.v_proj.lora_A.default.weight', 'model.layers.17.self_attn.v_proj.lora_B.default.weight', 'model.layers.18.self_attn.k_proj.base_layer.bias', 'model.layers.18.self_attn.k_proj.base_layer.weight', 'model.layers.18.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.18.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.18.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.18.self_attn.k_proj.lora_A.default.weight', 'model.layers.18.self_attn.k_proj.lora_B.default.weight', 'model.layers.18.self_attn.o_proj.base_layer.bias', 'model.layers.18.self_attn.o_proj.base_layer.weight', 'model.layers.18.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.18.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.18.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.18.self_attn.o_proj.lora_A.default.weight', 'model.layers.18.self_attn.o_proj.lora_B.default.weight', 'model.layers.18.self_attn.q_proj.base_layer.bias', 'model.layers.18.self_attn.q_proj.base_layer.weight', 'model.layers.18.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.18.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.18.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.18.self_attn.q_proj.lora_A.default.weight', 'model.layers.18.self_attn.q_proj.lora_B.default.weight', 'model.layers.18.self_attn.v_proj.base_layer.bias', 'model.layers.18.self_attn.v_proj.base_layer.weight', 'model.layers.18.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.18.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.18.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.18.self_attn.v_proj.lora_A.default.weight', 'model.layers.18.self_attn.v_proj.lora_B.default.weight', 'model.layers.19.self_attn.k_proj.base_layer.bias', 'model.layers.19.self_attn.k_proj.base_layer.weight', 'model.layers.19.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.19.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.19.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.19.self_attn.k_proj.lora_A.default.weight', 'model.layers.19.self_attn.k_proj.lora_B.default.weight', 'model.layers.19.self_attn.o_proj.base_layer.bias', 'model.layers.19.self_attn.o_proj.base_layer.weight', 'model.layers.19.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.19.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.19.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.19.self_attn.o_proj.lora_A.default.weight', 'model.layers.19.self_attn.o_proj.lora_B.default.weight', 'model.layers.19.self_attn.q_proj.base_layer.bias', 'model.layers.19.self_attn.q_proj.base_layer.weight', 'model.layers.19.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.19.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.19.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.19.self_attn.q_proj.lora_A.default.weight', 'model.layers.19.self_attn.q_proj.lora_B.default.weight', 'model.layers.19.self_attn.v_proj.base_layer.bias', 'model.layers.19.self_attn.v_proj.base_layer.weight', 'model.layers.19.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.19.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.19.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.19.self_attn.v_proj.lora_A.default.weight', 'model.layers.19.self_attn.v_proj.lora_B.default.weight', 'model.layers.2.self_attn.k_proj.base_layer.bias', 'model.layers.2.self_attn.k_proj.base_layer.weight', 'model.layers.2.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.2.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.2.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.2.self_attn.k_proj.lora_A.default.weight', 'model.layers.2.self_attn.k_proj.lora_B.default.weight', 'model.layers.2.self_attn.o_proj.base_layer.bias', 'model.layers.2.self_attn.o_proj.base_layer.weight', 'model.layers.2.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.2.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.2.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.2.self_attn.o_proj.lora_A.default.weight', 'model.layers.2.self_attn.o_proj.lora_B.default.weight', 'model.layers.2.self_attn.q_proj.base_layer.bias', 'model.layers.2.self_attn.q_proj.base_layer.weight', 'model.layers.2.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.2.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.2.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.2.self_attn.q_proj.lora_A.default.weight', 'model.layers.2.self_attn.q_proj.lora_B.default.weight', 'model.layers.2.self_attn.v_proj.base_layer.bias', 'model.layers.2.self_attn.v_proj.base_layer.weight', 'model.layers.2.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.2.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.2.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.2.self_attn.v_proj.lora_A.default.weight', 'model.layers.2.self_attn.v_proj.lora_B.default.weight', 'model.layers.20.self_attn.k_proj.base_layer.bias', 'model.layers.20.self_attn.k_proj.base_layer.weight', 'model.layers.20.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.20.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.20.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.20.self_attn.k_proj.lora_A.default.weight', 'model.layers.20.self_attn.k_proj.lora_B.default.weight', 'model.layers.20.self_attn.o_proj.base_layer.bias', 'model.layers.20.self_attn.o_proj.base_layer.weight', 'model.layers.20.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.20.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.20.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.20.self_attn.o_proj.lora_A.default.weight', 'model.layers.20.self_attn.o_proj.lora_B.default.weight', 'model.layers.20.self_attn.q_proj.base_layer.bias', 'model.layers.20.self_attn.q_proj.base_layer.weight', 'model.layers.20.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.20.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.20.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.20.self_attn.q_proj.lora_A.default.weight', 'model.layers.20.self_attn.q_proj.lora_B.default.weight', 'model.layers.20.self_attn.v_proj.base_layer.bias', 'model.layers.20.self_attn.v_proj.base_layer.weight', 'model.layers.20.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.20.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.20.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.20.self_attn.v_proj.lora_A.default.weight', 'model.layers.20.self_attn.v_proj.lora_B.default.weight', 'model.layers.21.self_attn.k_proj.base_layer.bias', 'model.layers.21.self_attn.k_proj.base_layer.weight', 'model.layers.21.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.21.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.21.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.21.self_attn.k_proj.lora_A.default.weight', 'model.layers.21.self_attn.k_proj.lora_B.default.weight', 'model.layers.21.self_attn.o_proj.base_layer.bias', 'model.layers.21.self_attn.o_proj.base_layer.weight', 'model.layers.21.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.21.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.21.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.21.self_attn.o_proj.lora_A.default.weight', 'model.layers.21.self_attn.o_proj.lora_B.default.weight', 'model.layers.21.self_attn.q_proj.base_layer.bias', 'model.layers.21.self_attn.q_proj.base_layer.weight', 'model.layers.21.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.21.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.21.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.21.self_attn.q_proj.lora_A.default.weight', 'model.layers.21.self_attn.q_proj.lora_B.default.weight', 'model.layers.21.self_attn.v_proj.base_layer.bias', 'model.layers.21.self_attn.v_proj.base_layer.weight', 'model.layers.21.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.21.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.21.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.21.self_attn.v_proj.lora_A.default.weight', 'model.layers.21.self_attn.v_proj.lora_B.default.weight', 'model.layers.22.self_attn.k_proj.base_layer.bias', 'model.layers.22.self_attn.k_proj.base_layer.weight', 'model.layers.22.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.22.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.22.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.22.self_attn.k_proj.lora_A.default.weight', 'model.layers.22.self_attn.k_proj.lora_B.default.weight', 'model.layers.22.self_attn.o_proj.base_layer.bias', 'model.layers.22.self_attn.o_proj.base_layer.weight', 'model.layers.22.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.22.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.22.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.22.self_attn.o_proj.lora_A.default.weight', 'model.layers.22.self_attn.o_proj.lora_B.default.weight', 'model.layers.22.self_attn.q_proj.base_layer.bias', 'model.layers.22.self_attn.q_proj.base_layer.weight', 'model.layers.22.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.22.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.22.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.22.self_attn.q_proj.lora_A.default.weight', 'model.layers.22.self_attn.q_proj.lora_B.default.weight', 'model.layers.22.self_attn.v_proj.base_layer.bias', 'model.layers.22.self_attn.v_proj.base_layer.weight', 'model.layers.22.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.22.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.22.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.22.self_attn.v_proj.lora_A.default.weight', 'model.layers.22.self_attn.v_proj.lora_B.default.weight', 'model.layers.23.self_attn.k_proj.base_layer.bias', 'model.layers.23.self_attn.k_proj.base_layer.weight', 'model.layers.23.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.23.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.23.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.23.self_attn.k_proj.lora_A.default.weight', 'model.layers.23.self_attn.k_proj.lora_B.default.weight', 'model.layers.23.self_attn.o_proj.base_layer.bias', 'model.layers.23.self_attn.o_proj.base_layer.weight', 'model.layers.23.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.23.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.23.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.23.self_attn.o_proj.lora_A.default.weight', 'model.layers.23.self_attn.o_proj.lora_B.default.weight', 'model.layers.23.self_attn.q_proj.base_layer.bias', 'model.layers.23.self_attn.q_proj.base_layer.weight', 'model.layers.23.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.23.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.23.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.23.self_attn.q_proj.lora_A.default.weight', 'model.layers.23.self_attn.q_proj.lora_B.default.weight', 'model.layers.23.self_attn.v_proj.base_layer.bias', 'model.layers.23.self_attn.v_proj.base_layer.weight', 'model.layers.23.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.23.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.23.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.23.self_attn.v_proj.lora_A.default.weight', 'model.layers.23.self_attn.v_proj.lora_B.default.weight', 'model.layers.24.self_attn.k_proj.base_layer.bias', 'model.layers.24.self_attn.k_proj.base_layer.weight', 'model.layers.24.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.24.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.24.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.24.self_attn.k_proj.lora_A.default.weight', 'model.layers.24.self_attn.k_proj.lora_B.default.weight', 'model.layers.24.self_attn.o_proj.base_layer.bias', 'model.layers.24.self_attn.o_proj.base_layer.weight', 'model.layers.24.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.24.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.24.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.24.self_attn.o_proj.lora_A.default.weight', 'model.layers.24.self_attn.o_proj.lora_B.default.weight', 'model.layers.24.self_attn.q_proj.base_layer.bias', 'model.layers.24.self_attn.q_proj.base_layer.weight', 'model.layers.24.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.24.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.24.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.24.self_attn.q_proj.lora_A.default.weight', 'model.layers.24.self_attn.q_proj.lora_B.default.weight', 'model.layers.24.self_attn.v_proj.base_layer.bias', 'model.layers.24.self_attn.v_proj.base_layer.weight', 'model.layers.24.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.24.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.24.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.24.self_attn.v_proj.lora_A.default.weight', 'model.layers.24.self_attn.v_proj.lora_B.default.weight', 'model.layers.25.self_attn.k_proj.base_layer.bias', 'model.layers.25.self_attn.k_proj.base_layer.weight', 'model.layers.25.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.25.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.25.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.25.self_attn.k_proj.lora_A.default.weight', 'model.layers.25.self_attn.k_proj.lora_B.default.weight', 'model.layers.25.self_attn.o_proj.base_layer.bias', 'model.layers.25.self_attn.o_proj.base_layer.weight', 'model.layers.25.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.25.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.25.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.25.self_attn.o_proj.lora_A.default.weight', 'model.layers.25.self_attn.o_proj.lora_B.default.weight', 'model.layers.25.self_attn.q_proj.base_layer.bias', 'model.layers.25.self_attn.q_proj.base_layer.weight', 'model.layers.25.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.25.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.25.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.25.self_attn.q_proj.lora_A.default.weight', 'model.layers.25.self_attn.q_proj.lora_B.default.weight', 'model.layers.25.self_attn.v_proj.base_layer.bias', 'model.layers.25.self_attn.v_proj.base_layer.weight', 'model.layers.25.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.25.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.25.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.25.self_attn.v_proj.lora_A.default.weight', 'model.layers.25.self_attn.v_proj.lora_B.default.weight', 'model.layers.26.self_attn.k_proj.base_layer.bias', 'model.layers.26.self_attn.k_proj.base_layer.weight', 'model.layers.26.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.26.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.26.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.26.self_attn.k_proj.lora_A.default.weight', 'model.layers.26.self_attn.k_proj.lora_B.default.weight', 'model.layers.26.self_attn.o_proj.base_layer.bias', 'model.layers.26.self_attn.o_proj.base_layer.weight', 'model.layers.26.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.26.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.26.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.26.self_attn.o_proj.lora_A.default.weight', 'model.layers.26.self_attn.o_proj.lora_B.default.weight', 'model.layers.26.self_attn.q_proj.base_layer.bias', 'model.layers.26.self_attn.q_proj.base_layer.weight', 'model.layers.26.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.26.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.26.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.26.self_attn.q_proj.lora_A.default.weight', 'model.layers.26.self_attn.q_proj.lora_B.default.weight', 'model.layers.26.self_attn.v_proj.base_layer.bias', 'model.layers.26.self_attn.v_proj.base_layer.weight', 'model.layers.26.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.26.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.26.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.26.self_attn.v_proj.lora_A.default.weight', 'model.layers.26.self_attn.v_proj.lora_B.default.weight', 'model.layers.27.self_attn.k_proj.base_layer.bias', 'model.layers.27.self_attn.k_proj.base_layer.weight', 'model.layers.27.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.27.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.27.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.27.self_attn.k_proj.lora_A.default.weight', 'model.layers.27.self_attn.k_proj.lora_B.default.weight', 'model.layers.27.self_attn.o_proj.base_layer.bias', 'model.layers.27.self_attn.o_proj.base_layer.weight', 'model.layers.27.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.27.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.27.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.27.self_attn.o_proj.lora_A.default.weight', 'model.layers.27.self_attn.o_proj.lora_B.default.weight', 'model.layers.27.self_attn.q_proj.base_layer.bias', 'model.layers.27.self_attn.q_proj.base_layer.weight', 'model.layers.27.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.27.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.27.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.27.self_attn.q_proj.lora_A.default.weight', 'model.layers.27.self_attn.q_proj.lora_B.default.weight', 'model.layers.27.self_attn.v_proj.base_layer.bias', 'model.layers.27.self_attn.v_proj.base_layer.weight', 'model.layers.27.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.27.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.27.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.27.self_attn.v_proj.lora_A.default.weight', 'model.layers.27.self_attn.v_proj.lora_B.default.weight', 'model.layers.28.self_attn.k_proj.base_layer.bias', 'model.layers.28.self_attn.k_proj.base_layer.weight', 'model.layers.28.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.28.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.28.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.28.self_attn.k_proj.lora_A.default.weight', 'model.layers.28.self_attn.k_proj.lora_B.default.weight', 'model.layers.28.self_attn.o_proj.base_layer.bias', 'model.layers.28.self_attn.o_proj.base_layer.weight', 'model.layers.28.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.28.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.28.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.28.self_attn.o_proj.lora_A.default.weight', 'model.layers.28.self_attn.o_proj.lora_B.default.weight', 'model.layers.28.self_attn.q_proj.base_layer.bias', 'model.layers.28.self_attn.q_proj.base_layer.weight', 'model.layers.28.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.28.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.28.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.28.self_attn.q_proj.lora_A.default.weight', 'model.layers.28.self_attn.q_proj.lora_B.default.weight', 'model.layers.28.self_attn.v_proj.base_layer.bias', 'model.layers.28.self_attn.v_proj.base_layer.weight', 'model.layers.28.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.28.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.28.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.28.self_attn.v_proj.lora_A.default.weight', 'model.layers.28.self_attn.v_proj.lora_B.default.weight', 'model.layers.29.self_attn.k_proj.base_layer.bias', 'model.layers.29.self_attn.k_proj.base_layer.weight', 'model.layers.29.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.29.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.29.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.29.self_attn.k_proj.lora_A.default.weight', 'model.layers.29.self_attn.k_proj.lora_B.default.weight', 'model.layers.29.self_attn.o_proj.base_layer.bias', 'model.layers.29.self_attn.o_proj.base_layer.weight', 'model.layers.29.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.29.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.29.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.29.self_attn.o_proj.lora_A.default.weight', 'model.layers.29.self_attn.o_proj.lora_B.default.weight', 'model.layers.29.self_attn.q_proj.base_layer.bias', 'model.layers.29.self_attn.q_proj.base_layer.weight', 'model.layers.29.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.29.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.29.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.29.self_attn.q_proj.lora_A.default.weight', 'model.layers.29.self_attn.q_proj.lora_B.default.weight', 'model.layers.29.self_attn.v_proj.base_layer.bias', 'model.layers.29.self_attn.v_proj.base_layer.weight', 'model.layers.29.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.29.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.29.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.29.self_attn.v_proj.lora_A.default.weight', 'model.layers.29.self_attn.v_proj.lora_B.default.weight', 'model.layers.3.self_attn.k_proj.base_layer.bias', 'model.layers.3.self_attn.k_proj.base_layer.weight', 'model.layers.3.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.3.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.3.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.3.self_attn.k_proj.lora_A.default.weight', 'model.layers.3.self_attn.k_proj.lora_B.default.weight', 'model.layers.3.self_attn.o_proj.base_layer.bias', 'model.layers.3.self_attn.o_proj.base_layer.weight', 'model.layers.3.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.3.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.3.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.3.self_attn.o_proj.lora_A.default.weight', 'model.layers.3.self_attn.o_proj.lora_B.default.weight', 'model.layers.3.self_attn.q_proj.base_layer.bias', 'model.layers.3.self_attn.q_proj.base_layer.weight', 'model.layers.3.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.3.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.3.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.3.self_attn.q_proj.lora_A.default.weight', 'model.layers.3.self_attn.q_proj.lora_B.default.weight', 'model.layers.3.self_attn.v_proj.base_layer.bias', 'model.layers.3.self_attn.v_proj.base_layer.weight', 'model.layers.3.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.3.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.3.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.3.self_attn.v_proj.lora_A.default.weight', 'model.layers.3.self_attn.v_proj.lora_B.default.weight', 'model.layers.4.self_attn.k_proj.base_layer.bias', 'model.layers.4.self_attn.k_proj.base_layer.weight', 'model.layers.4.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.4.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.4.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.4.self_attn.k_proj.lora_A.default.weight', 'model.layers.4.self_attn.k_proj.lora_B.default.weight', 'model.layers.4.self_attn.o_proj.base_layer.bias', 'model.layers.4.self_attn.o_proj.base_layer.weight', 'model.layers.4.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.4.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.4.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.4.self_attn.o_proj.lora_A.default.weight', 'model.layers.4.self_attn.o_proj.lora_B.default.weight', 'model.layers.4.self_attn.q_proj.base_layer.bias', 'model.layers.4.self_attn.q_proj.base_layer.weight', 'model.layers.4.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.4.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.4.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.4.self_attn.q_proj.lora_A.default.weight', 'model.layers.4.self_attn.q_proj.lora_B.default.weight', 'model.layers.4.self_attn.v_proj.base_layer.bias', 'model.layers.4.self_attn.v_proj.base_layer.weight', 'model.layers.4.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.4.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.4.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.4.self_attn.v_proj.lora_A.default.weight', 'model.layers.4.self_attn.v_proj.lora_B.default.weight', 'model.layers.5.self_attn.k_proj.base_layer.bias', 'model.layers.5.self_attn.k_proj.base_layer.weight', 'model.layers.5.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.5.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.5.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.5.self_attn.k_proj.lora_A.default.weight', 'model.layers.5.self_attn.k_proj.lora_B.default.weight', 'model.layers.5.self_attn.o_proj.base_layer.bias', 'model.layers.5.self_attn.o_proj.base_layer.weight', 'model.layers.5.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.5.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.5.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.5.self_attn.o_proj.lora_A.default.weight', 'model.layers.5.self_attn.o_proj.lora_B.default.weight', 'model.layers.5.self_attn.q_proj.base_layer.bias', 'model.layers.5.self_attn.q_proj.base_layer.weight', 'model.layers.5.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.5.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.5.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.5.self_attn.q_proj.lora_A.default.weight', 'model.layers.5.self_attn.q_proj.lora_B.default.weight', 'model.layers.5.self_attn.v_proj.base_layer.bias', 'model.layers.5.self_attn.v_proj.base_layer.weight', 'model.layers.5.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.5.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.5.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.5.self_attn.v_proj.lora_A.default.weight', 'model.layers.5.self_attn.v_proj.lora_B.default.weight', 'model.layers.6.self_attn.k_proj.base_layer.bias', 'model.layers.6.self_attn.k_proj.base_layer.weight', 'model.layers.6.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.6.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.6.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.6.self_attn.k_proj.lora_A.default.weight', 'model.layers.6.self_attn.k_proj.lora_B.default.weight', 'model.layers.6.self_attn.o_proj.base_layer.bias', 'model.layers.6.self_attn.o_proj.base_layer.weight', 'model.layers.6.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.6.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.6.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.6.self_attn.o_proj.lora_A.default.weight', 'model.layers.6.self_attn.o_proj.lora_B.default.weight', 'model.layers.6.self_attn.q_proj.base_layer.bias', 'model.layers.6.self_attn.q_proj.base_layer.weight', 'model.layers.6.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.6.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.6.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.6.self_attn.q_proj.lora_A.default.weight', 'model.layers.6.self_attn.q_proj.lora_B.default.weight', 'model.layers.6.self_attn.v_proj.base_layer.bias', 'model.layers.6.self_attn.v_proj.base_layer.weight', 'model.layers.6.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.6.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.6.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.6.self_attn.v_proj.lora_A.default.weight', 'model.layers.6.self_attn.v_proj.lora_B.default.weight', 'model.layers.7.self_attn.k_proj.base_layer.bias', 'model.layers.7.self_attn.k_proj.base_layer.weight', 'model.layers.7.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.7.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.7.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.7.self_attn.k_proj.lora_A.default.weight', 'model.layers.7.self_attn.k_proj.lora_B.default.weight', 'model.layers.7.self_attn.o_proj.base_layer.bias', 'model.layers.7.self_attn.o_proj.base_layer.weight', 'model.layers.7.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.7.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.7.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.7.self_attn.o_proj.lora_A.default.weight', 'model.layers.7.self_attn.o_proj.lora_B.default.weight', 'model.layers.7.self_attn.q_proj.base_layer.bias', 'model.layers.7.self_attn.q_proj.base_layer.weight', 'model.layers.7.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.7.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.7.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.7.self_attn.q_proj.lora_A.default.weight', 'model.layers.7.self_attn.q_proj.lora_B.default.weight', 'model.layers.7.self_attn.v_proj.base_layer.bias', 'model.layers.7.self_attn.v_proj.base_layer.weight', 'model.layers.7.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.7.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.7.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.7.self_attn.v_proj.lora_A.default.weight', 'model.layers.7.self_attn.v_proj.lora_B.default.weight', 'model.layers.8.self_attn.k_proj.base_layer.bias', 'model.layers.8.self_attn.k_proj.base_layer.weight', 'model.layers.8.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.8.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.8.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.8.self_attn.k_proj.lora_A.default.weight', 'model.layers.8.self_attn.k_proj.lora_B.default.weight', 'model.layers.8.self_attn.o_proj.base_layer.bias', 'model.layers.8.self_attn.o_proj.base_layer.weight', 'model.layers.8.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.8.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.8.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.8.self_attn.o_proj.lora_A.default.weight', 'model.layers.8.self_attn.o_proj.lora_B.default.weight', 'model.layers.8.self_attn.q_proj.base_layer.bias', 'model.layers.8.self_attn.q_proj.base_layer.weight', 'model.layers.8.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.8.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.8.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.8.self_attn.q_proj.lora_A.default.weight', 'model.layers.8.self_attn.q_proj.lora_B.default.weight', 'model.layers.8.self_attn.v_proj.base_layer.bias', 'model.layers.8.self_attn.v_proj.base_layer.weight', 'model.layers.8.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.8.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.8.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.8.self_attn.v_proj.lora_A.default.weight', 'model.layers.8.self_attn.v_proj.lora_B.default.weight', 'model.layers.9.self_attn.k_proj.base_layer.bias', 'model.layers.9.self_attn.k_proj.base_layer.weight', 'model.layers.9.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.9.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.9.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.9.self_attn.k_proj.lora_A.default.weight', 'model.layers.9.self_attn.k_proj.lora_B.default.weight', 'model.layers.9.self_attn.o_proj.base_layer.bias', 'model.layers.9.self_attn.o_proj.base_layer.weight', 'model.layers.9.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.9.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.9.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.9.self_attn.o_proj.lora_A.default.weight', 'model.layers.9.self_attn.o_proj.lora_B.default.weight', 'model.layers.9.self_attn.q_proj.base_layer.bias', 'model.layers.9.self_attn.q_proj.base_layer.weight', 'model.layers.9.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.9.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.9.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.9.self_attn.q_proj.lora_A.default.weight', 'model.layers.9.self_attn.q_proj.lora_B.default.weight', 'model.layers.9.self_attn.v_proj.base_layer.bias', 'model.layers.9.self_attn.v_proj.base_layer.weight', 'model.layers.9.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.9.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.9.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.9.self_attn.v_proj.lora_A.default.weight', 'model.layers.9.self_attn.v_proj.lora_B.default.weight']
- This IS expected if you are initializing Starcoder2ForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Starcoder2ForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Starcoder2ForCausalLM were not initialized from the model checkpoint at finetune_starcoder2/final_checkpoint and are newly initialized: ['model.layers.0.self_attn.k_proj.bias', 'model.layers.0.self_attn.k_proj.weight', 'model.layers.0.self_attn.o_proj.bias', 'model.layers.0.self_attn.o_proj.weight', 'model.layers.0.self_attn.q_proj.bias', 'model.layers.0.self_attn.q_proj.weight', 'model.layers.0.self_attn.v_proj.bias', 'model.layers.0.self_attn.v_proj.weight', 'model.layers.1.self_attn.k_proj.bias', 'model.layers.1.self_attn.k_proj.weight', 'model.layers.1.self_attn.o_proj.bias', 'model.layers.1.self_attn.o_proj.weight', 'model.layers.1.self_attn.q_proj.bias', 'model.layers.1.self_attn.q_proj.weight', 'model.layers.1.self_attn.v_proj.bias', 'model.layers.1.self_attn.v_proj.weight', 'model.layers.10.self_attn.k_proj.bias', 'model.layers.10.self_attn.k_proj.weight', 'model.layers.10.self_attn.o_proj.bias', 'model.layers.10.self_attn.o_proj.weight', 'model.layers.10.self_attn.q_proj.bias', 'model.layers.10.self_attn.q_proj.weight', 'model.layers.10.self_attn.v_proj.bias', 'model.layers.10.self_attn.v_proj.weight', 'model.layers.11.self_attn.k_proj.bias', 'model.layers.11.self_attn.k_proj.weight', 'model.layers.11.self_attn.o_proj.bias', 'model.layers.11.self_attn.o_proj.weight', 'model.layers.11.self_attn.q_proj.bias', 'model.layers.11.self_attn.q_proj.weight', 'model.layers.11.self_attn.v_proj.bias', 'model.layers.11.self_attn.v_proj.weight', 'model.layers.12.self_attn.k_proj.bias', 'model.layers.12.self_attn.k_proj.weight', 'model.layers.12.self_attn.o_proj.bias', 'model.layers.12.self_attn.o_proj.weight', 'model.layers.12.self_attn.q_proj.bias', 'model.layers.12.self_attn.q_proj.weight', 'model.layers.12.self_attn.v_proj.bias', 'model.layers.12.self_attn.v_proj.weight', 'model.layers.13.self_attn.k_proj.bias', 'model.layers.13.self_attn.k_proj.weight', 'model.layers.13.self_attn.o_proj.bias', 'model.layers.13.self_attn.o_proj.weight', 'model.layers.13.self_attn.q_proj.bias', 'model.layers.13.self_attn.q_proj.weight', 'model.layers.13.self_attn.v_proj.bias', 'model.layers.13.self_attn.v_proj.weight', 'model.layers.14.self_attn.k_proj.bias', 'model.layers.14.self_attn.k_proj.weight', 'model.layers.14.self_attn.o_proj.bias', 'model.layers.14.self_attn.o_proj.weight', 'model.layers.14.self_attn.q_proj.bias', 'model.layers.14.self_attn.q_proj.weight', 'model.layers.14.self_attn.v_proj.bias', 'model.layers.14.self_attn.v_proj.weight', 'model.layers.15.self_attn.k_proj.bias', 'model.layers.15.self_attn.k_proj.weight', 'model.layers.15.self_attn.o_proj.bias', 'model.layers.15.self_attn.o_proj.weight', 'model.layers.15.self_attn.q_proj.bias', 'model.layers.15.self_attn.q_proj.weight', 'model.layers.15.self_attn.v_proj.bias', 'model.layers.15.self_attn.v_proj.weight', 'model.layers.16.self_attn.k_proj.bias', 'model.layers.16.self_attn.k_proj.weight', 'model.layers.16.self_attn.o_proj.bias', 'model.layers.16.self_attn.o_proj.weight', 'model.layers.16.self_attn.q_proj.bias', 'model.layers.16.self_attn.q_proj.weight', 'model.layers.16.self_attn.v_proj.bias', 'model.layers.16.self_attn.v_proj.weight', 'model.layers.17.self_attn.k_proj.bias', 'model.layers.17.self_attn.k_proj.weight', 'model.layers.17.self_attn.o_proj.bias', 'model.layers.17.self_attn.o_proj.weight', 'model.layers.17.self_attn.q_proj.bias', 'model.layers.17.self_attn.q_proj.weight', 'model.layers.17.self_attn.v_proj.bias', 'model.layers.17.self_attn.v_proj.weight', 'model.layers.18.self_attn.k_proj.bias', 'model.layers.18.self_attn.k_proj.weight', 'model.layers.18.self_attn.o_proj.bias', 'model.layers.18.self_attn.o_proj.weight', 'model.layers.18.self_attn.q_proj.bias', 'model.layers.18.self_attn.q_proj.weight', 'model.layers.18.self_attn.v_proj.bias', 'model.layers.18.self_attn.v_proj.weight', 'model.layers.19.self_attn.k_proj.bias', 'model.layers.19.self_attn.k_proj.weight', 'model.layers.19.self_attn.o_proj.bias', 'model.layers.19.self_attn.o_proj.weight', 'model.layers.19.self_attn.q_proj.bias', 'model.layers.19.self_attn.q_proj.weight', 'model.layers.19.self_attn.v_proj.bias', 'model.layers.19.self_attn.v_proj.weight', 'model.layers.2.self_attn.k_proj.bias', 'model.layers.2.self_attn.k_proj.weight', 'model.layers.2.self_attn.o_proj.bias', 'model.layers.2.self_attn.o_proj.weight', 'model.layers.2.self_attn.q_proj.bias', 'model.layers.2.self_attn.q_proj.weight', 'model.layers.2.self_attn.v_proj.bias', 'model.layers.2.self_attn.v_proj.weight', 'model.layers.20.self_attn.k_proj.bias', 'model.layers.20.self_attn.k_proj.weight', 'model.layers.20.self_attn.o_proj.bias', 'model.layers.20.self_attn.o_proj.weight', 'model.layers.20.self_attn.q_proj.bias', 'model.layers.20.self_attn.q_proj.weight', 'model.layers.20.self_attn.v_proj.bias', 'model.layers.20.self_attn.v_proj.weight', 'model.layers.21.self_attn.k_proj.bias', 'model.layers.21.self_attn.k_proj.weight', 'model.layers.21.self_attn.o_proj.bias', 'model.layers.21.self_attn.o_proj.weight', 'model.layers.21.self_attn.q_proj.bias', 'model.layers.21.self_attn.q_proj.weight', 'model.layers.21.self_attn.v_proj.bias', 'model.layers.21.self_attn.v_proj.weight', 'model.layers.22.self_attn.k_proj.bias', 'model.layers.22.self_attn.k_proj.weight', 'model.layers.22.self_attn.o_proj.bias', 'model.layers.22.self_attn.o_proj.weight', 'model.layers.22.self_attn.q_proj.bias', 'model.layers.22.self_attn.q_proj.weight', 'model.layers.22.self_attn.v_proj.bias', 'model.layers.22.self_attn.v_proj.weight', 'model.layers.23.self_attn.k_proj.bias', 'model.layers.23.self_attn.k_proj.weight', 'model.layers.23.self_attn.o_proj.bias', 'model.layers.23.self_attn.o_proj.weight', 'model.layers.23.self_attn.q_proj.bias', 'model.layers.23.self_attn.q_proj.weight', 'model.layers.23.self_attn.v_proj.bias', 'model.layers.23.self_attn.v_proj.weight', 'model.layers.24.self_attn.k_proj.bias', 'model.layers.24.self_attn.k_proj.weight', 'model.layers.24.self_attn.o_proj.bias', 'model.layers.24.self_attn.o_proj.weight', 'model.layers.24.self_attn.q_proj.bias', 'model.layers.24.self_attn.q_proj.weight', 'model.layers.24.self_attn.v_proj.bias', 'model.layers.24.self_attn.v_proj.weight', 'model.layers.25.self_attn.k_proj.bias', 'model.layers.25.self_attn.k_proj.weight', 'model.layers.25.self_attn.o_proj.bias', 'model.layers.25.self_attn.o_proj.weight', 'model.layers.25.self_attn.q_proj.bias', 'model.layers.25.self_attn.q_proj.weight', 'model.layers.25.self_attn.v_proj.bias', 'model.layers.25.self_attn.v_proj.weight', 'model.layers.26.self_attn.k_proj.bias', 'model.layers.26.self_attn.k_proj.weight', 'model.layers.26.self_attn.o_proj.bias', 'model.layers.26.self_attn.o_proj.weight', 'model.layers.26.self_attn.q_proj.bias', 'model.layers.26.self_attn.q_proj.weight', 'model.layers.26.self_attn.v_proj.bias', 'model.layers.26.self_attn.v_proj.weight', 'model.layers.27.self_attn.k_proj.bias', 'model.layers.27.self_attn.k_proj.weight', 'model.layers.27.self_attn.o_proj.bias', 'model.layers.27.self_attn.o_proj.weight', 'model.layers.27.self_attn.q_proj.bias', 'model.layers.27.self_attn.q_proj.weight', 'model.layers.27.self_attn.v_proj.bias', 'model.layers.27.self_attn.v_proj.weight', 'model.layers.28.self_attn.k_proj.bias', 'model.layers.28.self_attn.k_proj.weight', 'model.layers.28.self_attn.o_proj.bias', 'model.layers.28.self_attn.o_proj.weight', 'model.layers.28.self_attn.q_proj.bias', 'model.layers.28.self_attn.q_proj.weight', 'model.layers.28.self_attn.v_proj.bias', 'model.layers.28.self_attn.v_proj.weight', 'model.layers.29.self_attn.k_proj.bias', 'model.layers.29.self_attn.k_proj.weight', 'model.layers.29.self_attn.o_proj.bias', 'model.layers.29.self_attn.o_proj.weight', 'model.layers.29.self_attn.q_proj.bias', 'model.layers.29.self_attn.q_proj.weight', 'model.layers.29.self_attn.v_proj.bias', 'model.layers.29.self_attn.v_proj.weight', 'model.layers.3.self_attn.k_proj.bias', 'model.layers.3.self_attn.k_proj.weight', 'model.layers.3.self_attn.o_proj.bias', 'model.layers.3.self_attn.o_proj.weight', 'model.layers.3.self_attn.q_proj.bias', 'model.layers.3.self_attn.q_proj.weight', 'model.layers.3.self_attn.v_proj.bias', 'model.layers.3.self_attn.v_proj.weight', 'model.layers.4.self_attn.k_proj.bias', 'model.layers.4.self_attn.k_proj.weight', 'model.layers.4.self_attn.o_proj.bias', 'model.layers.4.self_attn.o_proj.weight', 'model.layers.4.self_attn.q_proj.bias', 'model.layers.4.self_attn.q_proj.weight', 'model.layers.4.self_attn.v_proj.bias', 'model.layers.4.self_attn.v_proj.weight', 'model.layers.5.self_attn.k_proj.bias', 'model.layers.5.self_attn.k_proj.weight', 'model.layers.5.self_attn.o_proj.bias', 'model.layers.5.self_attn.o_proj.weight', 'model.layers.5.self_attn.q_proj.bias', 'model.layers.5.self_attn.q_proj.weight', 'model.layers.5.self_attn.v_proj.bias', 'model.layers.5.self_attn.v_proj.weight', 'model.layers.6.self_attn.k_proj.bias', 'model.layers.6.self_attn.k_proj.weight', 'model.layers.6.self_attn.o_proj.bias', 'model.layers.6.self_attn.o_proj.weight', 'model.layers.6.self_attn.q_proj.bias', 'model.layers.6.self_attn.q_proj.weight', 'model.layers.6.self_attn.v_proj.bias', 'model.layers.6.self_attn.v_proj.weight', 'model.layers.7.self_attn.k_proj.bias', 'model.layers.7.self_attn.k_proj.weight', 'model.layers.7.self_attn.o_proj.bias', 'model.layers.7.self_attn.o_proj.weight', 'model.layers.7.self_attn.q_proj.bias', 'model.layers.7.self_attn.q_proj.weight', 'model.layers.7.self_attn.v_proj.bias', 'model.layers.7.self_attn.v_proj.weight', 'model.layers.8.self_attn.k_proj.bias', 'model.layers.8.self_attn.k_proj.weight', 'model.layers.8.self_attn.o_proj.bias', 'model.layers.8.self_attn.o_proj.weight', 'model.layers.8.self_attn.q_proj.bias', 'model.layers.8.self_attn.q_proj.weight', 'model.layers.8.self_attn.v_proj.bias', 'model.layers.8.self_attn.v_proj.weight', 'model.layers.9.self_attn.k_proj.bias', 'model.layers.9.self_attn.k_proj.weight', 'model.layers.9.self_attn.o_proj.bias', 'model.layers.9.self_attn.o_proj.weight', 'model.layers.9.self_attn.q_proj.bias', 'model.layers.9.self_attn.q_proj.weight', 'model.layers.9.self_attn.v_proj.bias', 'model.layers.9.self_attn.v_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
```

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# to use 4bit use `load_in_4bit=True` instead
quantization_config = BitsAndBytesConfig()

checkpoint = "bigcode/starcoder2-3b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained("finetune_starcoder2/final_checkpoint", quantization_config=quantization_config)

inputs = tokenizer.encode("hello_world_function <- function() {", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```

Also, I don't think doing 4-bit quantization as a default for finetuning is a good idea. It should be opt-in with a flag.

I am also wondering why do we use the Stack v1 for finetuning and not the Stack v2?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Some weights of the model checkpoint at `finetune_starcoder2/final_checkpoint were not used when initializing Starcoder2ForCausalLM #7

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Some weights of the model checkpoint at `finetune_starcoder2/final_checkpoint were not used when initializing Starcoder2ForCausalLM #7

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions