
[Feature Request] Warn on missing custom OP during dp --pt convert/dp --pt compress and allow overriding SHARED_LIB_DIR #5368

@ZhouXY-PKU


Summary

Checklist

[x] I have searched existing issues to make sure this feature has not been requested before.
[x] I have checked the documentation of DeepMD-kit.

Description of the problem

When users compile a custom libdeepmd_op_pt.so (e.g., to match a specific CUDA/GCC environment) and place it in a custom directory (not the default deepmd/lib/), the PyTorch backend silently fails to find and load it during dp --pt convert-backend or dp --pt compress.

The root cause is in deepmd/pt/cxx_op.py:

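from pathlib import Path
import torch
import deepmd.lib

SHARED_LIB_DIR = Path(deepmd.lib.__path__[0])
module_file = (SHARED_LIB_DIR / (prefix + module_name)).with_suffix(ext).resolve()
if module_file.is_file():
    torch.ops.load_library(module_file)  # loads the library; otherwise ENABLE_CUSTOMIZED_OP ends up False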

It looks for the .so only in the Python package installation path and ignores environment variables such as LD_LIBRARY_PATH. If the file is missing, ENABLE_CUSTOMIZED_OP becomes False, and the model is serialized with dummy placeholder functions (e.g., a tabulate_fusion_se_a stub that just raises NotImplementedError).
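To make the failure mode concrete, here is a minimal, stdlib-only sketch of the behavior described above: an exact-path file check decides ENABLE_CUSTOMIZED_OP, and a stub takes the OP's place without any warning. The helper names (library_path, load_ops, make_placeholder) and the library directory are hypothetical; this is not DeepMD-kit's actual code, and the real loader would additionally call torch.ops.load_library on the found file.

```python
import platform
from pathlib import Path


def library_path(lib_dir: Path, module_name: str) -> Path:
    """Platform-specific file name, e.g. lib_dir / "libdeepmd_op_pt.so" on Linux."""
    if platform.system() == "Windows":
        prefix, ext = "", ".dll"
    else:
        prefix, ext = "lib", ".so"
    return (lib_dir / (prefix + module_name)).with_suffix(ext)


def load_ops(lib_dir: Path, module_name: str = "deepmd_op_pt") -> bool:
    """Only an existing file at this exact path counts; LD_LIBRARY_PATH is never consulted.

    A real loader would also call torch.ops.load_library(module_file) here;
    this sketch only reports whether the file was found.
    """
    module_file = library_path(lib_dir, module_name).resolve()
    return module_file.is_file()


def make_placeholder(op_name: str):
    """Hypothetical stub that stands in for a customized OP when the library is absent."""

    def placeholder(*args, **kwargs):
        raise NotImplementedError(
            f"{op_name} is unavailable because the customized OP library was not loaded"
        )

    return placeholder


# Hypothetical package directory that does not contain the custom OP library.
ENABLE_CUSTOMIZED_OP = load_ops(Path("/nonexistent/site-packages/deepmd/lib"))
if not ENABLE_CUSTOMIZED_OP:
    # Silent fallback: nothing is printed; the failure only surfaces when the OP is called.
    tabulate_fusion_se_a = make_placeholder("tabulate_fusion_se_a")
```

Because the stub is only invoked at inference time, serialization itself succeeds, which matches the "converts fine, crashes in LAMMPS" symptom.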

This results in a broken .pth model that passes conversion without any warnings, but crashes immediately in LAMMPS.

Currently, the only workaround is to manually symlink the .so into deepmd/lib/, which is non-intuitive and breaks upon environment updates.

Detailed Description

Describe the solution

  1. Add a critical warning/error during model serialization (convert/compress/freeze)
    When a model architecture requires customized OPs (e.g., uses tabulate_fusion_se_a for compressed SeA) but ENABLE_CUSTOMIZED_OP is False, dp --pt convert and dp --pt compress should not proceed silently.
    It should raise an explicit error or a prominent warning like:

    [ERROR] The current model requires customized PyTorch OPs (e.g., tabulate_fusion_se_a), but libdeepmd_op_pt.so was not loaded. The exported model will fail during inference. Please ensure the custom OP library is installed correctly.

  2. Allow overriding SHARED_LIB_DIR via Environment Variable
    In deepmd/pt/cxx_op.py, the path resolution logic should fall back to a directory given by an environment variable (e.g., a dedicated DEEPMD_OP_DIR, or the entries of LD_LIBRARY_PATH) when the hardcoded SHARED_LIB_DIR does not contain the library.

Proposed logic

import os
from pathlib import Path

module_file = (SHARED_LIB_DIR / (prefix + module_name)).with_suffix(ext).resolve()
if not module_file.is_file():
    # Check environment variable override, e.g. DEEPMD_OP_DIR=$HOME/deepmd-kit/lib
    env_dir = os.environ.get("DEEPMD_OP_DIR")
    if env_dir:
        module_file = (Path(env_dir).expanduser() / (prefix + module_name)).with_suffix(ext).resolve()

This would allow users to point to their custom-compiled OP libraries without modifying source code or creating symlinks.
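For point 1, the guard could be as small as the following sketch. It assumes the convert/compress code path can tell which customized OPs the model will call (the required_ops argument); how that list is collected, and the names ensure_custom_ops and CustomOpMissingError, are hypothetical and not part of DeepMD-kit.

```python
from typing import Iterable


class CustomOpMissingError(RuntimeError):
    """Hypothetical error raised when a model needs customized OPs that were not loaded."""


def ensure_custom_ops(required_ops: Iterable[str], enable_customized_op: bool) -> None:
    """Fail loudly instead of serializing a model that would call placeholder OPs.

    required_ops: names of customized OPs the model will call, e.g.
    ["tabulate_fusion_se_a"] for a compressed se_a descriptor (assumed input).
    """
    missing = sorted(set(required_ops))
    if missing and not enable_customized_op:
        raise CustomOpMissingError(
            "The current model requires customized PyTorch OPs "
            f"(e.g., {', '.join(missing)}), but libdeepmd_op_pt.so was not loaded. "
            "The exported model will fail during inference. "
            "Please ensure the custom OP library is installed correctly."
        )
```

The intent is to call it right before the model is written to disk, with enable_customized_op taken from ENABLE_CUSTOMIZED_OP, so a broken .pth is never produced. Raising (rather than only logging) is the stricter of the two options proposed; a warning-only variant would swap the raise for a logging call.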

Describe alternatives you’ve considered

Manually symlinking the .so to site-packages/deepmd/lib/ (Current workaround, fragile).
Modifying cxx_op.py source code directly (Gets overwritten on updates).
Setting LD_LIBRARY_PATH (Does not work because cxx_op.py uses explicit absolute path resolution via Path.is_file(), not torch.ops.load_library("deepmd_op_pt") directly).
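Since the exact-path check bypasses the dynamic loader, honoring LD_LIBRARY_PATH would have to be done explicitly. A minimal sketch of such a search, assuming the order package directory, then DEEPMD_OP_DIR (the variable name suggested in the proposal, not an existing DeepMD-kit setting), then the LD_LIBRARY_PATH entries; the function names are hypothetical.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional, Sequence


def candidate_dirs(package_lib_dir: Path, environ: Mapping[str, str]) -> List[Path]:
    """Directories to search, highest priority first (the order is an assumption)."""
    dirs = [package_lib_dir]
    override = environ.get("DEEPMD_OP_DIR")  # hypothetical override variable
    if override:
        dirs.append(Path(override).expanduser())
    # LD_LIBRARY_PATH is os.pathsep-separated (":" on Linux); skip empty entries.
    for entry in environ.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        if entry:
            dirs.append(Path(entry).expanduser())
    return dirs


def find_op_library(
    dirs: Sequence[Path], file_name: str = "libdeepmd_op_pt.so"
) -> Optional[Path]:
    """Return the first existing library file, or None if no directory has it."""
    for d in dirs:
        candidate = (d / file_name).resolve()
        if candidate.is_file():
            return candidate
    return None
```

In use, the result of find_op_library(candidate_dirs(SHARED_LIB_DIR, os.environ)) would be handed to torch.ops.load_library when it is not None. Keeping the package directory first preserves the current behavior for standard installs and only changes it when the bundled library is missing.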

Additional context

DeepMD-kit version: v3.1.0 (and likely affects earlier v3.x releases with the PT backend as well)
PyTorch version: Built with _GLIBCXX_USE_CXX11_ABI=0 (conda-forge)

How to reproduce:

Compile libdeepmd_op_pt.so in a custom directory (e.g., ~/deepmd-kit/lib/).
Run export LD_LIBRARY_PATH=~/deepmd-kit/lib/:$LD_LIBRARY_PATH.
Run dp --pt convert-backend in.pb out.pth (in.pb uses a descriptor such as se_a).
Observe that no warning is printed. Check ENABLE_CUSTOMIZED_OP: it is False.
Run dp --pt compress out.pth compress.pth
Observe that no warning is printed. Check ENABLE_CUSTOMIZED_OP: it is False.
Running LAMMPS with out.pth works, but running LAMMPS with compress.pth fails with NotImplementedError (similar to #4530).

Further Information, Files, and Links

No response
