🐛[BUG]: CorrDiff Mini-HRRR #804

Closed
yhchen1112 opened this issue Mar 6, 2025 · 3 comments
Labels: ? - Needs Triage (Need team to review and classify), bug (Something isn't working)

Comments

@yhchen1112

Version

0.9.0

On which installation method(s) does this occur?

Pip

Describe the issue

Hello,
I installed NVIDIA Modulus with pip on an Ubuntu system with a GPU and am trying to run the Mini HRRR CorrDiff example from https://docs.nvidia.com/deeplearning/modulus/modulus-core/examples/generative/corrdiff/readme.html

The regression training runs fine, but the diffusion training fails with the error below.

I also modified modulus/metrics/diffusion/loss.py as suggested in issue #756, but that did not help.

I'm not sure whether the problem is in one of the source files or in the version of the library I installed. Any help would be greatly appreciated.

Minimum reproducible example

```
(corrdiff) cyh@omnisky:/data/model/modulus-0.9.0/examples/generative/corrdiff$ python train.py --config-name=config_training_mini_diffusion.yaml ++dataset.data_path=/data/model/modulus-0.9.0/examples/generative/corrdiff/hrrr_mini_train.nc ++dataset.stats_path=/data/model/modulus-0.9.0/examples/generative/corrdiff/stats.json ++training.io.regression_checkpoint_path=/data/model/modulus-0.9.0/examples/generative/corrdiff/UNet.0.2000000.mdlus
/data/Downloads/anaconda3/envs/corrdiff/lib/python3.10/site-packages/modulus/distributed/manager.py:346: UserWarning: Could not initialize using ENV, SLURM or OPENMPI methods. Assuming this is a single process job
  warn(
[2025-03-06 15:54:42,810][main][INFO] - Saving the outputs in /data/model/modulus-0.9.0/examples/generative/corrdiff/outputs/mini_diffusion
[2025-03-06 15:55:04,139][main][INFO] - Patch-based training disabled
[2025-03-06 15:55:04,527][main][INFO] - Loaded the pre-trained regression model
[2025-03-06 15:55:05,123][main][INFO] - Using 1 gradient accumulation rounds
[2025-03-06 15:55:05,123][checkpoint][WARNING] - Provided checkpoint directory ./checkpoints_diffusion does not exist, skipping load
[2025-03-06 15:55:05,123][main][INFO] - Training for 8000000 images...
Error executing job with overrides: ['++dataset.data_path=/data/model/modulus-0.9.0/examples/generative/corrdiff/hrrr_mini_train.nc', '++dataset.stats_path=/data/model/modulus-0.9.0/examples/generative/corrdiff/stats.json', '++training.io.regression_checkpoint_path=/data/model/modulus-0.9.0/examples/generative/corrdiff/UNet.0.2000000.mdlus']
Traceback (most recent call last):
  File "/data/model/modulus-0.9.0/examples/generative/corrdiff/train.py", line 333, in main
    loss = loss_fn(**loss_fn_kwargs)
  File "/data/Downloads/anaconda3/envs/corrdiff/lib/python3.10/site-packages/modulus/metrics/diffusion/loss.py", line 521, in __call__
    y_mean = self.unet(
  File "/data/Downloads/anaconda3/envs/corrdiff/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/Downloads/anaconda3/envs/corrdiff/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/Downloads/anaconda3/envs/corrdiff/lib/python3.10/site-packages/modulus/models/diffusion/unet.py", line 134, in forward
    F_x = self.model(
  File "/data/Downloads/anaconda3/envs/corrdiff/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/Downloads/anaconda3/envs/corrdiff/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/Downloads/anaconda3/envs/corrdiff/lib/python3.10/site-packages/nvtx/nvtx.py", line 122, in inner
    result = func(*args, **kwargs)
TypeError: SongUNetPosEmbd.forward() got an unexpected keyword argument 'lead_time_label'
```
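The error means a keyword argument (`lead_time_label`) reaches a `forward()` method whose signature does not declare it. The toy sketch below reproduces the same failure mode in plain Python. `OldBackbone`, `Wrapper` and `newer_loss` are hypothetical stand-ins, not Modulus code: newer calling code adds a keyword, a wrapper passes it through unchanged, and an older backbone rejects it.

```python
class OldBackbone:
    """Hypothetical stand-in for an older backbone without a lead_time_label parameter."""

    def forward(self, x, noise_labels=None, class_labels=None):
        return x


class Wrapper:
    """Hypothetical stand-in for a wrapper that passes extra keywords through unchanged."""

    def __init__(self, model):
        self.model = model

    def __call__(self, x, **model_kwargs):
        # Any extra keyword is forwarded to the backbone without being checked.
        return self.model.forward(x, **model_kwargs)


def newer_loss(net, x):
    """Hypothetical stand-in for newer calling code that adds lead_time_label."""
    return net(x, lead_time_label=None)


message = None
try:
    newer_loss(Wrapper(OldBackbone()), 1.0)
except TypeError as err:
    # Same failure mode as the traceback: the older forward() rejects the new keyword.
    message = str(err)
    print(message)
```

In this sketch the mismatch is between two versions of the code, not a bug in either one alone, which is consistent with the maintainers' diagnosis of a version mismatch between the installed package and the example.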

Relevant log output

Environment details

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main    anaconda
_openmp_mutex             5.1                       1_gnu    anaconda
absl-py                   2.1.0           py310h06a4308_0    anaconda
aiobotocore               2.20.0                   pypi_0    pypi
aiohappyeyeballs          2.4.6                    pypi_0    pypi
aiohttp                   3.11.13                  pypi_0    pypi
aioitertools              0.12.0                   pypi_0    pypi
aiosignal                 1.3.2                    pypi_0    pypi
alembic                   1.14.1                   pypi_0    pypi
annotated-types           0.7.0                    pypi_0    pypi
antlr4-python3-runtime    4.9.3                    pypi_0    pypi
asciitree                 0.3.3                    pypi_0    pypi
astunparse                1.6.3                    pypi_0    pypi
async-timeout             5.0.1                    pypi_0    pypi
attrs                     25.1.0                   pypi_0    pypi
blas                      1.0                    openblas    anaconda
blinker                   1.9.0                    pypi_0    pypi
blosc                     1.21.6               he440d0b_1    conda-forge
botocore                  1.36.23                  pypi_0    pypi
bzip2                     1.0.8                h5eee18b_6    anaconda
c-ares                    1.34.4               hb9d3cd8_0    conda-forge
ca-certificates           2025.1.31            hbcca054_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
cachetools                5.5.2                    pypi_0    pypi
certifi                   2025.1.31          pyhd8ed1ab_0    conda-forge
cftime                    1.6.4.post1              pypi_0    pypi
charset-normalizer        3.4.1                    pypi_0    pypi
click                     8.1.8                    pypi_0    pypi
cloudpickle               3.1.1                    pypi_0    pypi
contourpy                 1.3.1                    pypi_0    pypi
cycler                    0.12.1                   pypi_0    pypi
cython                    3.0.12                   pypi_0    pypi
dask                      2025.2.0                 pypi_0    pypi
databricks-sdk            0.44.1                   pypi_0    pypi
deprecated                1.2.18                   pypi_0    pypi
dm-tree                   0.1.9                    pypi_0    pypi
docker                    7.1.0                    pypi_0    pypi
docker-pycreds            0.4.0                    pypi_0    pypi
einops                    0.8.1                    pypi_0    pypi
fasteners                 0.19                     pypi_0    pypi
filelock                  3.17.0                   pypi_0    pypi
flask                     3.1.0                    pypi_0    pypi
fonttools                 4.56.0                   pypi_0    pypi
frozenlist                1.5.0                    pypi_0    pypi
fsspec                    2025.2.0                 pypi_0    pypi
gast                      0.6.0                    pypi_0    pypi
gitdb                     4.0.12                   pypi_0    pypi
gitpython                 3.1.44                   pypi_0    pypi
google-auth               2.38.0                   pypi_0    pypi
graphene                  3.4.3                    pypi_0    pypi
graphql-core              3.2.6                    pypi_0    pypi
graphql-relay             3.2.0                    pypi_0    pypi
greenlet                  3.1.1                    pypi_0    pypi
grpcio                    1.62.2          py310h1b8f574_0    conda-forge
gunicorn                  23.0.0                   pypi_0    pypi
h5netcdf                  1.5.0              pyhd8ed1ab_0    conda-forge
h5py                      3.12.1          nompi_py310hacc6608_103    conda-forge
hdf4                      4.2.15               h2a13503_7    conda-forge
hdf5                      1.14.4          nompi_h2d575fe_105    conda-forge
huggingface-hub           0.29.1                   pypi_0    pypi
hydra-core                1.3.2                    pypi_0    pypi
icu                       75.1                 he02047a_0    conda-forge
idna                      3.10                     pypi_0    pypi
importlib-metadata        8.5.0                    pypi_0    pypi
itsdangerous              2.2.0                    pypi_0    pypi
jinja2                    3.1.5                    pypi_0    pypi
jmespath                  1.0.1                    pypi_0    pypi
joblib                    1.4.2                    pypi_0    pypi
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.8                    pypi_0    pypi
krb5                      1.21.3               h659f571_0    conda-forge
ld_impl_linux-64          2.40                 h12ee557_0    anaconda
libabseil                 20240116.2      cxx17_h6a678d5_0    anaconda
libaec                    1.1.3                h59595ed_0    conda-forge
libcurl                   8.12.1               h332b0f4_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 hd590300_2    conda-forge
libffi                    3.4.4                h6a678d5_1    anaconda
libgcc                    14.2.0               h767d61c_2    conda-forge
libgcc-ng                 14.2.0               h69a702a_2    conda-forge
libgfortran               14.2.0               h69a702a_2    conda-forge
libgfortran-ng            14.2.0               h69a702a_2    conda-forge
libgfortran5              14.2.0               hf1ad2bd_2    conda-forge
libgomp                   14.2.0               h767d61c_2    conda-forge
libgrpc                   1.62.2               h15f2491_0    conda-forge
libiconv                  1.18                 h4ce23a2_1    conda-forge
libjpeg-turbo             3.0.0                hd590300_1    conda-forge
liblzma                   5.6.4                hb9d3cd8_0    conda-forge
libnetcdf                 4.9.2           nompi_h5ddbaa4_116    conda-forge
libnghttp2                1.64.0               h161d5f1_0    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libopenblas               0.3.21               h043d6bf_0    anaconda
libprotobuf               4.25.3               hd5b35b9_1    conda-forge
libre2-11                 2023.09.01           h5a48ba9_2    conda-forge
libsqlite                 3.45.2               h2797004_0    conda-forge
libssh2                   1.11.1               hf672d98_0    conda-forge
libstdcxx                 14.2.0               h8f9b012_2    conda-forge
libstdcxx-ng              14.2.0               h4852527_2    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libxml2                   2.13.6               h8d12d68_0    conda-forge
libzip                    1.11.2               h6991a6a_0    conda-forge
libzlib                   1.3.1                hb9d3cd8_2    conda-forge
llvmlite                  0.44.0                   pypi_0    pypi
locket                    1.0.0                    pypi_0    pypi
lz4-c                     1.10.0               h5888daf_1    conda-forge
mako                      1.3.9                    pypi_0    pypi
markdown                  3.4.1           py310h06a4308_0    anaconda
markupsafe                3.0.2           py310h5eee18b_0    anaconda
matplotlib                3.10.0                   pypi_0    pypi
mlflow                    2.20.3                   pypi_0    pypi
mlflow-skinny             2.20.3                   pypi_0    pypi
mpmath                    1.3.0                    pypi_0    pypi
multidict                 6.1.0                    pypi_0    pypi
ncurses                   6.4                  h6a678d5_0    anaconda
netcdf4                   1.7.2           nompi_py310h5146f0f_101    conda-forge
networkx                  3.4.2                    pypi_0    pypi
numba                     0.61.0                   pypi_0    pypi
numcodecs                 0.13.1                   pypi_0    pypi
numpy                     2.2.3                    pypi_0    pypi
nvidia-cublas-cu12        12.4.5.8                 pypi_0    pypi
nvidia-cuda-cupti-cu12    12.4.127                 pypi_0    pypi
nvidia-cuda-nvrtc-cu12    12.4.127                 pypi_0    pypi
nvidia-cuda-runtime-cu12  12.4.127                 pypi_0    pypi
nvidia-cudnn-cu12         9.1.0.70                 pypi_0    pypi
nvidia-cufft-cu12         11.2.1.3                 pypi_0    pypi
nvidia-curand-cu12        10.3.5.147               pypi_0    pypi
nvidia-cusolver-cu12      11.6.1.9                 pypi_0    pypi
nvidia-cusparse-cu12      12.3.1.170               pypi_0    pypi
nvidia-cusparselt-cu12    0.6.2                    pypi_0    pypi
nvidia-dali-cuda120       1.47.0                   pypi_0    pypi
nvidia-modulus            0.9.0                    pypi_0    pypi
nvidia-nccl-cu12          2.21.5                   pypi_0    pypi
nvidia-nvimgcodec-cu12    0.4.1.21                 pypi_0    pypi
nvidia-nvjitlink-cu12     12.4.127                 pypi_0    pypi
nvidia-nvjpeg2k-cu12      0.8.1.40                 pypi_0    pypi
nvidia-nvtiff-cu12        0.4.0.62                 pypi_0    pypi
nvidia-nvtx-cu12          12.4.127                 pypi_0    pypi
nvtx                      0.2.11                   pypi_0    pypi
omegaconf                 2.3.0                    pypi_0    pypi
onnx                      1.17.0                   pypi_0    pypi
opencv-python             4.11.0.86                pypi_0    pypi
openssl                   3.4.1                h7b32b05_0    conda-forge
opentelemetry-api         1.30.0                   pypi_0    pypi
opentelemetry-sdk         1.30.0                   pypi_0    pypi
opentelemetry-semantic-conventions 0.51b0                   pypi_0    pypi
packaging                 24.2               pyhd8ed1ab_2    conda-forge
pandas                    2.2.3                    pypi_0    pypi
partd                     1.4.2                    pypi_0    pypi
pillow                    11.1.0                   pypi_0    pypi
pint                      0.19.2                   pypi_0    pypi
pip                       25.0            py310h06a4308_0    anaconda
platformdirs              4.3.6                    pypi_0    pypi
propcache                 0.3.0                    pypi_0    pypi
protobuf                  4.25.3          py310he36ed58_1    anaconda
psutil                    5.9.0           py310h5eee18b_1    anaconda
pyarrow                   19.0.1                   pypi_0    pypi
pyasn1                    0.6.1                    pypi_0    pypi
pyasn1-modules            0.4.1                    pypi_0    pypi
pydantic                  2.10.6                   pypi_0    pypi
pydantic-core             2.27.2                   pypi_0    pypi
pyparsing                 3.2.1                    pypi_0    pypi
python                    3.10.13         hd12c33a_1_cpython    conda-forge
python-dateutil           2.9.0.post0              pypi_0    pypi
python_abi                3.10                    5_cp310    conda-forge
pytz                      2025.1                   pypi_0    pypi
pyyaml                    6.0.2                    pypi_0    pypi
re2                       2023.09.01           h7f4b329_2    conda-forge
readline                  8.2                  h5eee18b_0    anaconda
requests                  2.32.3                   pypi_0    pypi
rsa                       4.9                      pypi_0    pypi
s3fs                      2025.2.0                 pypi_0    pypi
safetensors               0.5.3                    pypi_0    pypi
scikit-learn              1.6.1                    pypi_0    pypi
scipy                     1.15.2                   pypi_0    pypi
sentry-sdk                2.22.0                   pypi_0    pypi
setproctitle              1.3.5                    pypi_0    pypi
setuptools                75.8.0          py310h06a4308_0    anaconda
six                       1.16.0             pyhd3eb1b0_1    anaconda
smmap                     5.0.2                    pypi_0    pypi
snappy                    1.2.1                h8bd8927_1    conda-forge
sqlalchemy                2.0.38                   pypi_0    pypi
sqlite                    3.45.2               h2c6b66d_0    conda-forge
sqlparse                  0.5.3                    pypi_0    pypi
sympy                     1.13.1                   pypi_0    pypi
tensorboard               2.17.0          py310h06a4308_0    anaconda
tensorboard-data-server   0.7.0           py310h52d8a92_1    anaconda
termcolor                 2.5.0                    pypi_0    pypi
threadpoolctl             3.5.0                    pypi_0    pypi
timm                      1.0.15                   pypi_0    pypi
tk                        8.6.13          noxft_h4845f30_101    conda-forge
toolz                     1.0.0                    pypi_0    pypi
torch                     2.6.0                    pypi_0    pypi
torchaudio                2.6.0                    pypi_0    pypi
torchvision               0.21.0                   pypi_0    pypi
tqdm                      4.67.1                   pypi_0    pypi
treelib                   1.7.0                    pypi_0    pypi
triton                    3.2.0                    pypi_0    pypi
typing-extensions         4.12.2                   pypi_0    pypi
tzdata                    2025.1                   pypi_0    pypi
urllib3                   2.3.0                    pypi_0    pypi
wandb                     0.19.7                   pypi_0    pypi
werkzeug                  3.1.3           py310h06a4308_0    anaconda
wheel                     0.45.1          py310h06a4308_0    anaconda
wrapt                     1.17.2                   pypi_0    pypi
xarray                    2025.1.2                 pypi_0    pypi
xz                        5.6.4                h5eee18b_1    anaconda
yarl                      1.18.3                   pypi_0    pypi
zarr                      2.18.3                   pypi_0    pypi
zipp                      3.21.0                   pypi_0    pypi
zlib                      1.3.1                hb9d3cd8_2    conda-forge
zstd                      1.5.7                hb8e6e7a_1    conda-forge
@yhchen1112 added the labels ? - Needs Triage and bug on Mar 6, 2025
@jleinonen
Collaborator

This is probably because the installed Modulus package is older than the CorrDiff example you are running. To make sure the package and the example are compatible, a workaround is to install Modulus from the root directory of the repository you cloned to get the examples (i.e. the directory that contains pyproject.toml):

```
pip install -e .
```
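To confirm which copy of Modulus Python actually imports after reinstalling, a small check like the following can help. It is a sketch that assumes the distribution name is `nvidia-modulus` (as in the environment listing above) and the import name is `modulus`; `describe_install` is a hypothetical helper, not part of Modulus.

```python
import importlib.metadata
import importlib.util
from typing import Optional, Tuple


def describe_install(dist_name: str, module_name: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (installed distribution version, file the module is imported from)."""
    try:
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        version = None  # the distribution is not installed
    spec = importlib.util.find_spec(module_name)
    location = spec.origin if spec is not None else None  # None if not importable
    return version, location


if __name__ == "__main__":
    # Assumption: these names match the Modulus release you installed.
    print(describe_install("nvidia-modulus", "modulus"))
```

After an editable install, the reported location should typically point into the cloned repository rather than into `site-packages`; if it still points into `site-packages`, the old copy is probably still the one being imported.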

@yhchen1112
Author

I was able to run the diffusion training successfully. Thank you very much for your suggestions!

@CharlelieLrt
Collaborator

Duplicate of #797, which has already been resolved. You just need to make sure you are using the latest version, installed with pip as suggested above.
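One way to check whether an installed version is new enough for the example is to ask whether the model's `forward()` accepts the keyword from the traceback. The sketch below uses only `inspect.signature`; `accepts_kwarg` is a hypothetical helper, and the commented Modulus import path is an assumption that may differ between releases.

```python
import inspect
from typing import Callable


def accepts_kwarg(fn: Callable, name: str) -> bool:
    """Return True if fn can be called with the keyword argument `name`."""
    for param in inspect.signature(fn).parameters.values():
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            return True  # a **kwargs parameter accepts any keyword
        if param.name == name and param.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False


# Hypothetical usage (import path is an assumption, check your installed package):
# from modulus.models.diffusion import SongUNetPosEmbd
# print(accepts_kwarg(SongUNetPosEmbd.forward, "lead_time_label"))
```

Two caveats: a `**kwargs` parameter makes the check return True even if the keyword is ignored, and if `forward()` is wrapped by a decorator that does not preserve its signature (the traceback shows an nvtx wrapper), the result may also be a false True.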
