Manual model merges #555

Open
dnhkng opened this issue Jul 18, 2024 · 3 comments

dnhkng commented Jul 18, 2024

Hi Turbo,

I am interested in doing some model self-merges. Currently, I do this with a script that works on Hugging Face models.

Basically, I calculate the layer mapping, e.g. to duplicate layer 3:
{1: 1, 2: 2, 3: 3, 4: 3, 5: 4}

Then I go through the safetensors files, duplicate the tensors based on this mapping, and generate new keys with the right layer index (e.g. model.layer.3.up.mlp -> model.layer.6.up.mlp). Finally, I update the model's config.json with the new number of layers. This works for transformers models, but not for exl2 models. What else would I need to do?
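A minimal sketch of that duplication step (the key names of the form model.layers.N. and the num_hidden_layers config field are assumptions from the common Llama-style layout, indices are 0-based here, and sharded index files are ignored):

```python
# Minimal sketch: duplicate layer 3 of a 4-layer model into a 5-layer model.
# Key names (model.layers.N.) and num_hidden_layers are assumptions from the
# common Llama-style layout; adjust for the actual architecture.
import json
from pathlib import Path

from safetensors.torch import load_file, save_file

src, dst = Path("model-in"), Path("model-out")
dst.mkdir(exist_ok=True)

# new layer index -> source layer index (0-based, unlike the 1-based example above)
mapping = {0: 0, 1: 1, 2: 2, 3: 2, 4: 3}

# Gather every tensor from every shard into one dict.
tensors = {}
for shard in sorted(src.glob("*.safetensors")):
    tensors.update(load_file(shard))

out = {}
for new_idx, old_idx in mapping.items():
    old_prefix = f"model.layers.{old_idx}."
    new_prefix = f"model.layers.{new_idx}."
    for key, t in tensors.items():
        if key.startswith(old_prefix):
            out[new_prefix + key[len(old_prefix):]] = t.clone()

# Keep everything that isn't a per-layer tensor (embeddings, final norm, head).
for key, t in tensors.items():
    if not key.startswith("model.layers."):
        out[key] = t

save_file(out, str(dst / "model.safetensors"))

# Update the layer count in config.json.
config = json.loads((src / "config.json").read_text())
config["num_hidden_layers"] = len(mapping)
(dst / "config.json").write_text(json.dumps(config, indent=2))
```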

turboderp (Owner) commented:

This should also work for EXL2 models, assuming you duplicate/rename all the sub-keys for each layer as well. The only difference is that the .weight tensors are split into .q_weight, .q_perm, .q_scale, .q_scale_max, .q_groups and .q_group_map.

Changes to the config.json should be the same as for a HF model.
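As a sketch of the re-keying, matching on the layer index alone picks up all of those quantized sub-keys automatically, with no list of suffixes to maintain (key layout assumed as above):

```python
import re

# Move one tensor key from an old layer index to a new one. Matching only on
# the index means the EXL2 sub-keys (.q_weight, .q_perm, .q_scale,
# .q_scale_max, .q_groups, .q_group_map) are renamed along with everything
# else.
def rekey(key: str, old_idx: int, new_idx: int) -> str:
    return re.sub(rf"^model\.layers\.{old_idx}\.",
                  f"model.layers.{new_idx}.", key)

assert rekey("model.layers.3.mlp.up_proj.q_weight", 3, 4) \
       == "model.layers.4.mlp.up_proj.q_weight"
```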

Do note that the quantization of each layer is calibrated to the expected output of the previous layer, not to a copy of the same layer, so it's hard to predict how well this works if you're not starting from the original model and quantizing afterwards. But then I guess merges and self-merges were never really predictable to begin with.

dnhkng (Author) commented Jul 18, 2024

I used dynamic relayering and it worked well, but duplicating the model layers in the safetensors files didn't. By dynamic, I mean I load the weights into memory, copy.copy the weights, and finally rebuild the cache.
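A minimal sketch of this dynamic approach on a transformers model (the module paths assume a Llama-style architecture, and the layer_idx fix-up is one reading of the cache-rebuild step):

```python
import copy

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # illustrative; any Llama-style model
    torch_dtype=torch.bfloat16,
)
layers = model.model.layers  # nn.ModuleList of decoder layers

# Duplicate decoder layer 3 and splice the copy in as the new layer 4.
# deepcopy gives an independent module; copy.copy, as described above,
# shares the weight tensors instead and so costs almost no extra memory.
layers.insert(4, copy.deepcopy(layers[3]))

# Each attention module indexes the KV cache by its layer_idx, so every
# layer from the insertion point onward needs its index rebuilt.
for i, layer in enumerate(layers):
    layer.self_attn.layer_idx = i

model.config.num_hidden_layers = len(layers)
```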

In fact, my best results are from dynamic exl2 experiments. I can't get the same great results even with the original BFloat16 weights!

dnhkng (Author) commented Sep 19, 2024

@turboderp
Late update: my models now lead the Hugging Face Open LLM Leaderboard, under the name RYS.

I have some questions about caching. Do you have time for an online chat via Gmeet or Zoom?
