Add Phi 3.5 MoE #116
I will give it a try on Tuesday! |
It looks like there is a problem loading the weights:
this is on
And actually the parameters include biases, which is not expected here:
I need to look into this further |
OK, I think SwitchLinear isn't quite right -- it is missing the and I think the dimension mismatch comes down to quantization -- the SwitchLinear isn't being replaced by SwitchLinearQuantized because it doesn't implement the protocol: and I think QuantizedSwitchLinear will need to be a subtype of SwitchLinear for the replacement to happen -- this |
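A minimal sketch of the replacement mechanism described above, written in plain Swift with no MLX dependency. Every name in it (Layer, QuantizableLayer, SwitchLinearSketch, QuantizedSwitchLinearSketch, quantize) is a hypothetical stand-in and not the actual MLX Swift API. It only illustrates the two conditions mentioned: the layer has to adopt the protocol that the quantization pass looks for, and the quantized variant has to subclass the original so it can be stored wherever the original was.

```swift
// Hypothetical stand-ins for illustration; these are NOT the real MLX Swift types.
protocol Layer: AnyObject {}

// A layer opts in to quantization by adopting this protocol.
protocol QuantizableLayer: Layer {
    func toQuantized(groupSize: Int, bits: Int) -> Layer
}

class SwitchLinearSketch: QuantizableLayer {
    let inputDims: Int
    let outputDims: Int
    let numExperts: Int

    init(inputDims: Int, outputDims: Int, numExperts: Int) {
        self.inputDims = inputDims
        self.outputDims = outputDims
        self.numExperts = numExperts
    }

    // Without this conformance the pass below would skip the layer entirely.
    func toQuantized(groupSize: Int, bits: Int) -> Layer {
        QuantizedSwitchLinearSketch(self, groupSize: groupSize, bits: bits)
    }
}

// Subclassing keeps the quantized layer assignable to any property
// that is typed as SwitchLinearSketch.
final class QuantizedSwitchLinearSketch: SwitchLinearSketch {
    let groupSize: Int
    let bits: Int

    init(_ other: SwitchLinearSketch, groupSize: Int, bits: Int) {
        self.groupSize = groupSize
        self.bits = bits
        super.init(
            inputDims: other.inputDims,
            outputDims: other.outputDims,
            numExperts: other.numExperts)
    }

    // Already quantized, so return self instead of quantizing again.
    override func toQuantized(groupSize: Int, bits: Int) -> Layer { self }
}

// A layer that does not adopt the protocol and is therefore left untouched.
final class PlainLayer: Layer {}

// Replace every layer that adopts QuantizableLayer; pass the rest through.
func quantize(_ layers: [Layer], groupSize: Int = 64, bits: Int = 4) -> [Layer] {
    layers.map { layer -> Layer in
        if let quantizable = layer as? QuantizableLayer {
            return quantizable.toQuantized(groupSize: groupSize, bits: bits)
        }
        return layer
    }
}
```

In this sketch, removing the QuantizableLayer conformance from SwitchLinearSketch makes quantize return the layer unchanged, which is the kind of silent skip that could leave full-precision shapes in place while quantized weights are being loaded.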
Thanks for that feedback. I've tried to make those changes, although I don't know how helpful this will be, since I unfortunately can't test it myself. If it's too much trouble and you'd prefer to focus on other priorities, don't worry about it. |
OK, I will see if I can test/finish this -- it may be a few days before I get a chance. |
Sorry for the long delay here -- was busy busy busy! I fixed up the quantized switch layers -- there were some issues with how they got initialized. The couple of lines you had commented as maybe issues were exactly the spots with problems. With a little testing everything is set there.
That takes about 22G to run. I think this is ready to merge once tests pass. |
Actually we may want to merge in the changes from #135 once that merges... Nope, no conflicts. |
Thank you for the contribution!!
Fantastic, thank you! |
This is my attempt to port the Phi 3.5 MoE model from the Python implementation. Unfortunately I can't test it myself, since my MacBook doesn't have enough RAM. I marked two places in PhiMoE.swift with comments starting with !! which need to be checked. You can test this with ModelConfiguration.phi3_5MoE. Go ahead and make any necessary changes if you'd like, since I won't be able to run this myself.