Migration to latest versions of torch & flash-attn to solve warmstart/fsdp2/weight tying problem #384

flxst · 2025-07-08T14:13:40Z

What does this PR do?

This PR addresses #381. The problem can be traced back to a bug in torch 2.6 that is triggered because we flatten the optimizer state dict here and here.

A solution is to simply migrate to torch 2.7, which in turn requires migrating flash-attn to version 2.8.

This PR includes both migrations, along with minor adjustments of the warmstart config files.

Unit tests pass in GitHub Actions.

General Changes

None

Breaking Changes

None

Checklist before submitting final PR

My PR is minimal and addresses one issue in isolation
I have merged the latest version of the target branch into this feature branch
I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
I have run a sample config for model training
I have checked that all tests run through (python tests/tests.py)
I have updated the internal changelog (CHANGELOG_DEV.md)

flxst · 2025-07-11T10:03:58Z

Everything seems to work if conda is used. However, with uv or python -m venv, the installation of flash-attn==2.8.0.post2 fails, as reported here.

flxst · 2025-07-18T11:39:27Z

Follow-up problem: Dao-AILab/flash-attention#1708

torch==2.6.0 & flash-attn==2.7.4.post1: works
torch==2.7.1 & flash-attn==2.8.0.post2: sometimes fails (depending on platform)
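
To check which of the two combinations above a given environment has, a small helper along these lines might be useful. It is only a sketch: `KNOWN_COMBOS`, `normalize`, `classify_combo` and `installed_version` are hypothetical names that are not part of this repository, the table contains only the two results reported in this comment, and the lookup assumes the packages are installed under the distribution names `torch` and `flash-attn`.

```python
from importlib import metadata
from typing import Optional

# Only the two combinations reported in this thread; everything else is unknown.
KNOWN_COMBOS = {
    ("2.6.0", "2.7.4.post1"): "works",
    ("2.7.1", "2.8.0.post2"): "sometimes fails (depending on platform)",
}


def normalize(version: str) -> str:
    """Drop a local build tag such as '+cu126' from a version string."""
    return version.split("+", 1)[0]


def classify_combo(torch_version: str, flash_attn_version: str) -> str:
    """Look up a torch / flash-attn pair in the table above."""
    key = (normalize(torch_version), normalize(flash_attn_version))
    return KNOWN_COMBOS.get(key, "not covered in this thread")


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    torch_v = installed_version("torch")
    flash_v = installed_version("flash-attn")  # assumed distribution name
    if torch_v is None or flash_v is None:
        print(f"torch={torch_v}, flash-attn={flash_v}: at least one package is missing")
    else:
        print(f"torch={torch_v}, flash-attn={flash_v}: {classify_combo(torch_v, flash_v)}")
```

The script reads package metadata only and never imports torch or flash-attn, so it also runs in an environment where the flash-attn build is broken.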

flxst added 3 commits July 8, 2025 15:05

fix: warmstart configs (avoid hardcoded paths)

2f887f3

chore: Merge branch 'main' into fix/warmstart_fsdp2_weight_tying

87799b0

chore: migrate to torch==2.7.1 and flash-attn=2.8.0.post2

b2d197c

flxst requested a review from le1nux July 8, 2025 14:13

flxst added 3 commits July 10, 2025 16:12

chore: increase wandb init timeout

e2a21c4

fix: warmstart configs (reset weight_tying to false)

4b630c1

chore: pin torch version in uv installation instructions

f280e46

flxst marked this pull request as draft July 11, 2025 11:38

chore: Merge branch 'main' into fix/warmstart_fsdp1_weight_tying

490b7f8
