Manual model merges #555

Open
dnhkng opened this issue Jul 18, 2024 · 3 comments

dnhkng commented Jul 18, 2024

Hi Turbo,

I am interested in doing some model self-merges. Currently, I do this with a script that works on Hugging Face models.

Basically, I calculate the layer mapping, e.g. to duplicate layer 3:
{1: 1, 2: 2, 3: 3, 4: 3, 5: 4}

Then I go through the safetensors files, duplicate the tensors based on this mapping, and generate new keys with the right layer index (e.g. model.layer.3.up.mlp -> model.layer.6.up.mlp). Finally, I update the model's config.json with the new number of layers. This works for transformers models, but not for exl2 models. What else would I need to do?
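A minimal sketch of that duplication step (the key names of the form model.layers.N. and the num_hidden_layers config field are assumptions from the common Llama-style layout, indices are 0-based here, and sharded index files are ignored):

```python
# Minimal sketch: duplicate layer 3 of a 4-layer model into a 5-layer model.
# Key names (model.layers.N.) and num_hidden_layers are assumptions from the
# common Llama-style layout; adjust for the actual architecture.
import json
from pathlib import Path

from safetensors.torch import load_file, save_file

src, dst = Path("model-in"), Path("model-out")
dst.mkdir(exist_ok=True)

# new layer index -> source layer index (0-based, unlike the 1-based example above)
mapping = {0: 0, 1: 1, 2: 2, 3: 2, 4: 3}

# Gather every tensor from every shard into one dict.
tensors = {}
for shard in sorted(src.glob("*.safetensors")):
    tensors.update(load_file(shard))

out = {}
for new_idx, old_idx in mapping.items():
    old_prefix = f"model.layers.{old_idx}."
    new_prefix = f"model.layers.{new_idx}."
    for key, t in tensors.items():
        if key.startswith(old_prefix):
            out[new_prefix + key[len(old_prefix):]] = t.clone()

# Keep everything that isn't a per-layer tensor (embeddings, final norm, head).
for key, t in tensors.items():
    if not key.startswith("model.layers."):
        out[key] = t

save_file(out, str(dst / "model.safetensors"))

# Update the layer count in config.json.
config = json.loads((src / "config.json").read_text())
config["num_hidden_layers"] = len(mapping)
(dst / "config.json").write_text(json.dumps(config, indent=2))
```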

turboderp (Owner) commented:

This should also work for EXL2 models, assuming you duplicate/rename all the sub-keys for each layer as well. The only difference is that the .weight tensors are split into .q_weight, .q_perm, .q_scale, .q_scale_max, .q_groups and .q_group_map.

Changes to the config.json should be the same as for a HF model.
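As a sketch of the re-keying, matching on the layer index alone picks up all of those quantized sub-keys automatically, with no list of suffixes to maintain (key layout assumed as above):

```python
import re

# Move one tensor key from an old layer index to a new one. Matching only on
# the index means the EXL2 sub-keys (.q_weight, .q_perm, .q_scale,
# .q_scale_max, .q_groups, .q_group_map) are renamed along with everything
# else.
def rekey(key: str, old_idx: int, new_idx: int) -> str:
    return re.sub(rf"^model\.layers\.{old_idx}\.",
                  f"model.layers.{new_idx}.", key)

assert rekey("model.layers.3.mlp.up_proj.q_weight", 3, 4) \
       == "model.layers.4.mlp.up_proj.q_weight"
```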

Do note that the quantization of each layer is calibrated to the expected output of the previous layer, not to a copy of the same layer, so it's hard to predict how well this works if you're not starting from the original model and quantizing afterwards. But then I guess merges and self-merges were never really predictable to begin with.

dnhkng (Author) commented Jul 18, 2024

I used dynamic relayering and it worked well, but duplicating the model layers in the safetensors files didn't. By dynamic, I mean I load the weights into memory, copy.copy the weights, and finally rebuild the cache.
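A minimal sketch of this dynamic approach on a transformers model (the module paths assume a Llama-style architecture, and the layer_idx fix-up is one reading of the cache-rebuild step):

```python
import copy

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # illustrative; any Llama-style model
    torch_dtype=torch.bfloat16,
)
layers = model.model.layers  # nn.ModuleList of decoder layers

# Duplicate decoder layer 3 and splice the copy in as the new layer 4.
# deepcopy gives an independent module; copy.copy, as described above,
# shares the weight tensors instead and so costs almost no extra memory.
layers.insert(4, copy.deepcopy(layers[3]))

# Each attention module indexes the KV cache by its layer_idx, so every
# layer from the insertion point onward needs its index rebuilt.
for i, layer in enumerate(layers):
    layer.self_attn.layer_idx = i

model.config.num_hidden_layers = len(layers)
```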

In fact, my best results are from dynamic exl2 experiments. I can't get the same great results even with the original BFloat16 weights!

dnhkng (Author) commented Sep 19, 2024

@turboderp
Late update: my models now lead the Hugging Face Open LLM Leaderboard, under the name RYS.

I have some questions about caching. Do you have time for an online chat via Gmeet or Zoom?
