Hello, could you please elaborate on what "Seems buggy, don't use this yet." means for the 8-bit + pipeline parallel example? What is the bug, specifically? Does it affect training results, or is it a tooling issue? I've been waiting a while to be able to fine-tune the 65B model, and if there's anything I can do to help test or fix this bug, I'd love some pointers. Thanks!