This repository has been archived by the owner on Aug 7, 2024. It is now read-only.
[DISCUSSION] fix float8 all-gather in FSDP2 + TP: DTensor(WeightWithDynamicFloat8CastTensor) #179