giordano
Member

@giordano giordano commented Aug 2, 2025

Just an experiment for now, let's see how it goes.

@giordano
Member Author

giordano commented Aug 2, 2025

I expect this to use ~20 GB of memory on each GPU. This is a first test towards pushing the grid size up until it fills the memory, if there aren't problems.

@giordano
Member Author

giordano commented Aug 2, 2025

My prediction was correct, see https://github.com/EnzymeAD/Enzyme-JAX/actions/runs/16698290868/job/47265531418#step:15:745

│    bytes_in_use: 21894366976

but this is still OOM'ing https://github.com/EnzymeAD/Enzyme-JAX/actions/runs/16698290868/job/47265531418#step:15:777

2025-08-02 23:46:25.965629: W external/xla/xla/hlo/transforms/simplifiers/hlo_rematerialization.cc:3423] Can't reduce memory use below 10.64GiB (11429599125 bytes) by rematerialization; only reduced to 67.69GiB (72679996812 bytes), down from 71.05GiB (76287081352 bytes) originally

followed by a resource-exhausted error when running.
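
For reference, the byte counts in the two log excerpts above can be put on the same scale with a bit of arithmetic. This is only a sketch: the numbers are copied from the log lines, the variable names are made up for illustration, and it assumes the first figure in the XLA warning is the budget the rematerialization pass was aiming for.

```python
GIB = 2**30  # bytes per GiB (binary unit, matching the XLA warning)


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / GIB


# Byte counts copied verbatim from the CI log excerpts quoted above.
# The names are illustrative, not identifiers from XLA or the CI job.
bytes_in_use = 21_894_366_976    # "bytes_in_use" line
remat_target = 11_429_599_125    # "Can't reduce memory use below ..."
remat_achieved = 72_679_996_812  # "only reduced to ..."
remat_original = 76_287_081_352  # "down from ... originally"

for label, value in [
    ("bytes_in_use", bytes_in_use),
    ("rematerialization target", remat_target),
    ("after rematerialization", remat_achieved),
    ("before rematerialization", remat_original),
]:
    print(f"{label:>26}: {to_gib(value):6.2f} GiB")

# How much rematerialization saved, and how far it still is from the target.
saved = remat_original - remat_achieved
shortfall = remat_achieved - remat_target
print(f"{'saved':>26}: {to_gib(saved):6.2f} GiB")
print(f"{'still over target by':>26}: {to_gib(shortfall):6.2f} GiB")
```

Under that assumption, rematerialization only trims a few GiB from the estimate and leaves it far above the target, which is consistent with the run then failing for lack of resources.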

@giordano
Member Author

Closing in favour of #1307.

@giordano giordano closed this Aug 19, 2025
@giordano giordano deleted the mg/gb25-larger-grid branch August 19, 2025 15:01
