Minimum reproducible example
(corrdiff) cyh@omnisky:/data/model/modulus-0.9.0/examples/generative/corrdiff$ python train.py --config-name=config_training_mini_diffusion.yaml ++dataset.data_path=/data/model/modulus-0.9.0/examples/generative/corrdiff/hrrr_mini_train.nc ++dataset.stats_path=/data/model/modulus-0.9.0/examples/generative/corrdiff/stats.json ++training.io.regression_checkpoint_path=/data/model/modulus-0.9.0/examples/generative/corrdiff/UNet.0.2000000.mdlus
/data/Downloads/anaconda3/envs/corrdiff/lib/python3.10/site-packages/modulus/distributed/manager.py:346: UserWarning: Could not initialize using ENV, SLURM or OPENMPI methods. Assuming this is a single process job
warn(
[2025-03-06 15:54:42,810][main][INFO] - Saving the outputs in /data/model/modulus-0.9.0/examples/generative/corrdiff/outputs/mini_diffusion
[2025-03-06 15:55:04,139][main][INFO] - Patch-based training disabled
[2025-03-06 15:55:04,527][main][INFO] - Loaded the pre-trained regression model
[2025-03-06 15:55:05,123][main][INFO] - Using 1 gradient accumulation rounds
[2025-03-06 15:55:05,123][checkpoint][WARNING] - Provided checkpoint directory ./checkpoints_diffusion does not exist, skipping load
[2025-03-06 15:55:05,123][main][INFO] - Training for 8000000 images...
Error executing job with overrides: ['++dataset.data_path=/data/model/modulus-0.9.0/examples/generative/corrdiff/hrrr_mini_train.nc', '++dataset.stats_path=/data/model/modulus-0.9.0/examples/generative/corrdiff/stats.json', '++training.io.regression_checkpoint_path=/data/model/modulus-0.9.0/examples/generative/corrdiff/UNet.0.2000000.mdlus']
Traceback (most recent call last):
  File "/data/model/modulus-0.9.0/examples/generative/corrdiff/train.py", line 333, in main
    loss = loss_fn(**loss_fn_kwargs)
  File "/data/Downloads/anaconda3/envs/corrdiff/lib/python3.10/site-packages/modulus/metrics/diffusion/loss.py", line 521, in __call__
    y_mean = self.unet(
  File "/data/Downloads/anaconda3/envs/corrdiff/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/Downloads/anaconda3/envs/corrdiff/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/Downloads/anaconda3/envs/corrdiff/lib/python3.10/site-packages/modulus/models/diffusion/unet.py", line 134, in forward
    F_x = self.model(
  File "/data/Downloads/anaconda3/envs/corrdiff/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/Downloads/anaconda3/envs/corrdiff/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/Downloads/anaconda3/envs/corrdiff/lib/python3.10/site-packages/nvtx/nvtx.py", line 122, in inner
    result = func(*args, **kwargs)
TypeError: SongUNetPosEmbd.forward() got an unexpected keyword argument 'lead_time_label'
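The `TypeError` above says that the caller passes a `lead_time_label` keyword which the installed `SongUNetPosEmbd.forward()` does not declare. One way to confirm this kind of mismatch is to inspect the signature directly. The following is only a minimal sketch using the standard `inspect` module: `accepts_kwarg`, `OldModel`, `NewModel` and `check_installed_modulus` are hypothetical names for illustration, and the import path `modulus.models.diffusion` is an assumption based on the file paths in the traceback.

```python
import inspect


def accepts_kwarg(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs catch-all accepts any keyword as well.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for an older and a newer model class (not Modulus code).
class OldModel:
    def forward(self, x, img_lr, class_labels=None):
        return x


class NewModel:
    def forward(self, x, img_lr, class_labels=None, lead_time_label=None):
        return x


print(accepts_kwarg(OldModel.forward, "lead_time_label"))  # False
print(accepts_kwarg(NewModel.forward, "lead_time_label"))  # True


def check_installed_modulus():
    """Check the installed class; returns None if it cannot be imported."""
    try:
        # Assumed import path; adjust if your version exposes the class elsewhere.
        from modulus.models.diffusion import SongUNetPosEmbd
    except Exception:
        return None
    return accepts_kwarg(SongUNetPosEmbd.forward, "lead_time_label")


print("installed SongUNetPosEmbd:", check_installed_modulus())
```

If the last line prints `False`, the installed library is likely older than the code that calls it.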
This probably happens because your installed Modulus package is older than the CorrDiff example you are running. To make sure the package and the example are compatible, as a workaround you can go to the root directory of the repository you cloned to get the examples (i.e. the directory that contains pyproject.toml) and install Modulus from there.
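To see which Modulus build the interpreter actually picks up, before and after reinstalling, something like the following may help. It is a rough sketch using only the standard library: `describe_install` is a hypothetical helper, the distribution name `nvidia-modulus` is taken from the environment listing in this issue, and `modulus` is the import name seen in the traceback paths.

```python
import importlib.util
from importlib import metadata


def describe_install(dist_name, module_name):
    """Report the installed version of a distribution and where its module lives."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # distribution is not installed in this environment

    try:
        # find_spec locates a top-level module without importing it.
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        spec = None
    location = spec.origin if spec is not None else None

    return {"distribution": dist_name, "version": version, "location": location}


# Names taken from this issue; both values are None if Modulus is not installed.
print(describe_install("nvidia-modulus", "modulus"))
```

Comparing the reported version with the version of the cloned repository gives a quick hint as to whether the library and the example come from the same source tree.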
Version
0.9.0
On which installation method(s) does this occur?
Pip
Describe the issue
Hello
I installed NVIDIA Modulus on an Ubuntu system with a GPU and am trying to run the Mini HRRR CorrDiff example from https://docs.nvidia.com/deeplearning/modulus/modulus-core/examples/generative/corrdiff/readme.html
The regression training runs fine, but the diffusion training fails with an error.
I also modified modulus/metrics/diffusion/loss.py as described in issue #756, but that did not help.
I don't know whether the cause is a particular file or the library version I installed. Any help would be greatly appreciated.
Minimum reproducible example
Relevant log output
Environment details
# Name Version Build Channel
_libgcc_mutex 0.1 main anaconda
_openmp_mutex 5.1 1_gnu anaconda
absl-py 2.1.0 py310h06a4308_0 anaconda
aiobotocore 2.20.0 pypi_0 pypi
aiohappyeyeballs 2.4.6 pypi_0 pypi
aiohttp 3.11.13 pypi_0 pypi
aioitertools 0.12.0 pypi_0 pypi
aiosignal 1.3.2 pypi_0 pypi
alembic 1.14.1 pypi_0 pypi
annotated-types 0.7.0 pypi_0 pypi
antlr4-python3-runtime 4.9.3 pypi_0 pypi
asciitree 0.3.3 pypi_0 pypi
astunparse 1.6.3 pypi_0 pypi
async-timeout 5.0.1 pypi_0 pypi
attrs 25.1.0 pypi_0 pypi
blas 1.0 openblas anaconda
blinker 1.9.0 pypi_0 pypi
blosc 1.21.6 he440d0b_1 conda-forge
botocore 1.36.23 pypi_0 pypi
bzip2 1.0.8 h5eee18b_6 anaconda
c-ares 1.34.4 hb9d3cd8_0 conda-forge
ca-certificates 2025.1.31 hbcca054_0 conda-forge
cached-property 1.5.2 hd8ed1ab_1 conda-forge
cached_property 1.5.2 pyha770c72_1 conda-forge
cachetools 5.5.2 pypi_0 pypi
certifi 2025.1.31 pyhd8ed1ab_0 conda-forge
cftime 1.6.4.post1 pypi_0 pypi
charset-normalizer 3.4.1 pypi_0 pypi
click 8.1.8 pypi_0 pypi
cloudpickle 3.1.1 pypi_0 pypi
contourpy 1.3.1 pypi_0 pypi
cycler 0.12.1 pypi_0 pypi
cython 3.0.12 pypi_0 pypi
dask 2025.2.0 pypi_0 pypi
databricks-sdk 0.44.1 pypi_0 pypi
deprecated 1.2.18 pypi_0 pypi
dm-tree 0.1.9 pypi_0 pypi
docker 7.1.0 pypi_0 pypi
docker-pycreds 0.4.0 pypi_0 pypi
einops 0.8.1 pypi_0 pypi
fasteners 0.19 pypi_0 pypi
filelock 3.17.0 pypi_0 pypi
flask 3.1.0 pypi_0 pypi
fonttools 4.56.0 pypi_0 pypi
frozenlist 1.5.0 pypi_0 pypi
fsspec 2025.2.0 pypi_0 pypi
gast 0.6.0 pypi_0 pypi
gitdb 4.0.12 pypi_0 pypi
gitpython 3.1.44 pypi_0 pypi
google-auth 2.38.0 pypi_0 pypi
graphene 3.4.3 pypi_0 pypi
graphql-core 3.2.6 pypi_0 pypi
graphql-relay 3.2.0 pypi_0 pypi
greenlet 3.1.1 pypi_0 pypi
grpcio 1.62.2 py310h1b8f574_0 conda-forge
gunicorn 23.0.0 pypi_0 pypi
h5netcdf 1.5.0 pyhd8ed1ab_0 conda-forge
h5py 3.12.1 nompi_py310hacc6608_103 conda-forge
hdf4 4.2.15 h2a13503_7 conda-forge
hdf5 1.14.4 nompi_h2d575fe_105 conda-forge
huggingface-hub 0.29.1 pypi_0 pypi
hydra-core 1.3.2 pypi_0 pypi
icu 75.1 he02047a_0 conda-forge
idna 3.10 pypi_0 pypi
importlib-metadata 8.5.0 pypi_0 pypi
itsdangerous 2.2.0 pypi_0 pypi
jinja2 3.1.5 pypi_0 pypi
jmespath 1.0.1 pypi_0 pypi
joblib 1.4.2 pypi_0 pypi
keyutils 1.6.1 h166bdaf_0 conda-forge
kiwisolver 1.4.8 pypi_0 pypi
krb5 1.21.3 h659f571_0 conda-forge
ld_impl_linux-64 2.40 h12ee557_0 anaconda
libabseil 20240116.2 cxx17_h6a678d5_0 anaconda
libaec 1.1.3 h59595ed_0 conda-forge
libcurl 8.12.1 h332b0f4_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 hd590300_2 conda-forge
libffi 3.4.4 h6a678d5_1 anaconda
libgcc 14.2.0 h767d61c_2 conda-forge
libgcc-ng 14.2.0 h69a702a_2 conda-forge
libgfortran 14.2.0 h69a702a_2 conda-forge
libgfortran-ng 14.2.0 h69a702a_2 conda-forge
libgfortran5 14.2.0 hf1ad2bd_2 conda-forge
libgomp 14.2.0 h767d61c_2 conda-forge
libgrpc 1.62.2 h15f2491_0 conda-forge
libiconv 1.18 h4ce23a2_1 conda-forge
libjpeg-turbo 3.0.0 hd590300_1 conda-forge
liblzma 5.6.4 hb9d3cd8_0 conda-forge
libnetcdf 4.9.2 nompi_h5ddbaa4_116 conda-forge
libnghttp2 1.64.0 h161d5f1_0 conda-forge
libnsl 2.0.1 hd590300_0 conda-forge
libopenblas 0.3.21 h043d6bf_0 anaconda
libprotobuf 4.25.3 hd5b35b9_1 conda-forge
libre2-11 2023.09.01 h5a48ba9_2 conda-forge
libsqlite 3.45.2 h2797004_0 conda-forge
libssh2 1.11.1 hf672d98_0 conda-forge
libstdcxx 14.2.0 h8f9b012_2 conda-forge
libstdcxx-ng 14.2.0 h4852527_2 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libxcrypt 4.4.36 hd590300_1 conda-forge
libxml2 2.13.6 h8d12d68_0 conda-forge
libzip 1.11.2 h6991a6a_0 conda-forge
libzlib 1.3.1 hb9d3cd8_2 conda-forge
llvmlite 0.44.0 pypi_0 pypi
locket 1.0.0 pypi_0 pypi
lz4-c 1.10.0 h5888daf_1 conda-forge
mako 1.3.9 pypi_0 pypi
markdown 3.4.1 py310h06a4308_0 anaconda
markupsafe 3.0.2 py310h5eee18b_0 anaconda
matplotlib 3.10.0 pypi_0 pypi
mlflow 2.20.3 pypi_0 pypi
mlflow-skinny 2.20.3 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
multidict 6.1.0 pypi_0 pypi
ncurses 6.4 h6a678d5_0 anaconda
netcdf4 1.7.2 nompi_py310h5146f0f_101 conda-forge
networkx 3.4.2 pypi_0 pypi
numba 0.61.0 pypi_0 pypi
numcodecs 0.13.1 pypi_0 pypi
numpy 2.2.3 pypi_0 pypi
nvidia-cublas-cu12 12.4.5.8 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.4.127 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.4.127 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.4.127 pypi_0 pypi
nvidia-cudnn-cu12 9.1.0.70 pypi_0 pypi
nvidia-cufft-cu12 11.2.1.3 pypi_0 pypi
nvidia-curand-cu12 10.3.5.147 pypi_0 pypi
nvidia-cusolver-cu12 11.6.1.9 pypi_0 pypi
nvidia-cusparse-cu12 12.3.1.170 pypi_0 pypi
nvidia-cusparselt-cu12 0.6.2 pypi_0 pypi
nvidia-dali-cuda120 1.47.0 pypi_0 pypi
nvidia-modulus 0.9.0 pypi_0 pypi
nvidia-nccl-cu12 2.21.5 pypi_0 pypi
nvidia-nvimgcodec-cu12 0.4.1.21 pypi_0 pypi
nvidia-nvjitlink-cu12 12.4.127 pypi_0 pypi
nvidia-nvjpeg2k-cu12 0.8.1.40 pypi_0 pypi
nvidia-nvtiff-cu12 0.4.0.62 pypi_0 pypi
nvidia-nvtx-cu12 12.4.127 pypi_0 pypi
nvtx 0.2.11 pypi_0 pypi
omegaconf 2.3.0 pypi_0 pypi
onnx 1.17.0 pypi_0 pypi
opencv-python 4.11.0.86 pypi_0 pypi
openssl 3.4.1 h7b32b05_0 conda-forge
opentelemetry-api 1.30.0 pypi_0 pypi
opentelemetry-sdk 1.30.0 pypi_0 pypi
opentelemetry-semantic-conventions 0.51b0 pypi_0 pypi
packaging 24.2 pyhd8ed1ab_2 conda-forge
pandas 2.2.3 pypi_0 pypi
partd 1.4.2 pypi_0 pypi
pillow 11.1.0 pypi_0 pypi
pint 0.19.2 pypi_0 pypi
pip 25.0 py310h06a4308_0 anaconda
platformdirs 4.3.6 pypi_0 pypi
propcache 0.3.0 pypi_0 pypi
protobuf 4.25.3 py310he36ed58_1 anaconda
psutil 5.9.0 py310h5eee18b_1 anaconda
pyarrow 19.0.1 pypi_0 pypi
pyasn1 0.6.1 pypi_0 pypi
pyasn1-modules 0.4.1 pypi_0 pypi
pydantic 2.10.6 pypi_0 pypi
pydantic-core 2.27.2 pypi_0 pypi
pyparsing 3.2.1 pypi_0 pypi
python 3.10.13 hd12c33a_1_cpython conda-forge
python-dateutil 2.9.0.post0 pypi_0 pypi
python_abi 3.10 5_cp310 conda-forge
pytz 2025.1 pypi_0 pypi
pyyaml 6.0.2 pypi_0 pypi
re2 2023.09.01 h7f4b329_2 conda-forge
readline 8.2 h5eee18b_0 anaconda
requests 2.32.3 pypi_0 pypi
rsa 4.9 pypi_0 pypi
s3fs 2025.2.0 pypi_0 pypi
safetensors 0.5.3 pypi_0 pypi
scikit-learn 1.6.1 pypi_0 pypi
scipy 1.15.2 pypi_0 pypi
sentry-sdk 2.22.0 pypi_0 pypi
setproctitle 1.3.5 pypi_0 pypi
setuptools 75.8.0 py310h06a4308_0 anaconda
six 1.16.0 pyhd3eb1b0_1 anaconda
smmap 5.0.2 pypi_0 pypi
snappy 1.2.1 h8bd8927_1 conda-forge
sqlalchemy 2.0.38 pypi_0 pypi
sqlite 3.45.2 h2c6b66d_0 conda-forge
sqlparse 0.5.3 pypi_0 pypi
sympy 1.13.1 pypi_0 pypi
tensorboard 2.17.0 py310h06a4308_0 anaconda
tensorboard-data-server 0.7.0 py310h52d8a92_1 anaconda
termcolor 2.5.0 pypi_0 pypi
threadpoolctl 3.5.0 pypi_0 pypi
timm 1.0.15 pypi_0 pypi
tk 8.6.13 noxft_h4845f30_101 conda-forge
toolz 1.0.0 pypi_0 pypi
torch 2.6.0 pypi_0 pypi
torchaudio 2.6.0 pypi_0 pypi
torchvision 0.21.0 pypi_0 pypi
tqdm 4.67.1 pypi_0 pypi
treelib 1.7.0 pypi_0 pypi
triton 3.2.0 pypi_0 pypi
typing-extensions 4.12.2 pypi_0 pypi
tzdata 2025.1 pypi_0 pypi
urllib3 2.3.0 pypi_0 pypi
wandb 0.19.7 pypi_0 pypi
werkzeug 3.1.3 py310h06a4308_0 anaconda
wheel 0.45.1 py310h06a4308_0 anaconda
wrapt 1.17.2 pypi_0 pypi
xarray 2025.1.2 pypi_0 pypi
xz 5.6.4 h5eee18b_1 anaconda
yarl 1.18.3 pypi_0 pypi
zarr 2.18.3 pypi_0 pypi
zipp 3.21.0 pypi_0 pypi
zlib 1.3.1 hb9d3cd8_2 conda-forge
zstd 1.5.7 hb8e6e7a_1 conda-forge