Commit fb70138

Fix NCCL_ASYNC_ERROR_HANDLING deprecation warning (#711)
* Fix NCCL_ASYNC_ERROR_HANDLING deprecation warning

  It looks like the patch from pytorch/pytorch#114077 landed in torch 2.2.0. Fixes #568.

* Update CHANGELOG.md
1 parent e6d7b02 commit fb70138

File tree

2 files changed: +6 -1 lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -34,6 +34,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 - Update pytests to skip when the required dependencies are not present
 - Bug in data processing script in domino training example
+- Fixed NCCL_ASYNC_ERROR_HANDLING deprecation warning
 
 ### Security
 

modulus/distributed/manager.py

Lines changed: 5 additions & 1 deletion
@@ -332,7 +332,11 @@ def initialize():
         addr = os.getenv("MASTER_ADDR", "localhost")
         port = os.getenv("MASTER_PORT", "12355")
         # https://pytorch.org/docs/master/notes/cuda.html#id5
-        os.environ["NCCL_ASYNC_ERROR_HANDLING"] = "0"
+        # was changed in version 2.2
+        if torch.__version__ < (2, 2):
+            os.environ["NCCL_ASYNC_ERROR_HANDLING"] = "0"
+        else:
+            os.environ["TORCH_NCCL_ASYNC_ERROR_HANDLING"] = "0"
         initialization_method = os.getenv("MODULUS_DISTRIBUTED_INITIALIZATION_METHOD")
         if initialization_method is None:
             try:
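The patch gates on `torch.__version__`, which is a `TorchVersion` object that supports comparison against a `(major, minor)` tuple. A minimal standalone sketch of the same selection logic, using a hypothetical helper `nccl_async_error_var` that parses a plain version string so it runs without torch installed (the helper name and string-parsing approach are illustrative, not from the commit):

```python
import os


def nccl_async_error_var(torch_version: str) -> str:
    """Pick the env var name for the given torch version string.

    PyTorch 2.2 renamed NCCL_ASYNC_ERROR_HANDLING to
    TORCH_NCCL_ASYNC_ERROR_HANDLING (pytorch/pytorch#114077),
    so the variable to set depends on the installed version.
    """
    # Take only the leading "major.minor" components; local suffixes
    # such as "+cu121" only appear after the patch component.
    major, minor = (int(p) for p in torch_version.split(".")[:2])
    if (major, minor) < (2, 2):
        return "NCCL_ASYNC_ERROR_HANDLING"
    return "TORCH_NCCL_ASYNC_ERROR_HANDLING"


# Example: disable async error handling under whichever name applies,
# before the process group is initialized.
os.environ[nccl_async_error_var("2.2.0")] = "0"
```

In the actual `manager.py` no string parsing is needed, since `torch.__version__ < (2, 2)` already does the right comparison; the helper above only exists to make the selection testable in isolation.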

0 commit comments