
[Question] Support for torch 2.5 #2946

Closed
3 tasks done
Galaxy-Husky opened this issue Dec 24, 2024 · 8 comments
@Galaxy-Husky
Contributor

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

Hi,
I noticed that the restriction on torch version has been relaxed to 2.5.1:

torch<=2.5.1,>=2.0.0

But when I tried installing the latest code via pip install "git+https://github.com/InternLM/lmdeploy.git@main" -U, it removed torch 2.5.1 and installed torch 2.4.0.

Can you tell me how to install lmdeploy without changing the torch version?

Reproduction

pip install "git+https://github.com/InternLM/lmdeploy.git@main" -U

Environment

sys.platform: linux
Python: 3.12.8 | packaged by conda-forge | (main, Dec  5 2024, 14:24:40) [GCC 13.3.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3: NVIDIA A100-SXM4-40GB
CUDA_HOME: None
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.5.1+cu124
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.5.3 (Git Hash 66f0cb9eb66affd2da3bf5f8d897376f04aae6af)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 12.4
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 90.1
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.4, CUDNN_VERSION=9.1.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.5.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, 

TorchVision: 0.19.0+cu121
LMDeploy: 0.6.4+3009f92
transformers: 4.47.0
gradio: Not Found
fastapi: 0.115.6
pydantic: 2.10.3
triton: 3.1.0
NVIDIA Topology: 
	GPU0	GPU1	GPU2	GPU3	CPU Affinity	NUMA Affinity
GPU0	 X 	NV4	NV4	NV4	0-31,64-95	0
GPU1	NV4	 X 	NV4	NV4	0-31,64-95	0
GPU2	NV4	NV4	 X 	NV4	32-63,96-127	1
GPU3	NV4	NV4	NV4	 X 	32-63,96-127	1

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

Error traceback

No response

@RunningLeon
Collaborator

@Galaxy-Husky hi, sorry for the trouble. The torchvision requirement was not updated for torch 2.5.1, so pip installed torch 2.4.0 instead.
This issue has been fixed in PR #2912. Please try the latest main branch.
requirements/runtime_cuda.txt

- torchvision<=0.19.0,>=0.15.0
+ torchvision<=0.20.1,>=0.15.0
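The downgrade follows directly from the old pin: the torchvision 0.20.x builds that pair with torch 2.5.1 fall outside `<=0.19.0`, so the resolver backtracks to an older torchvision and the matching torch 2.4.x. A minimal sketch of that constraint check with the `packaging` library (the variable names are illustrative, not from lmdeploy):

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

old_pin = SpecifierSet("<=0.19.0,>=0.15.0")  # before PR #2912
new_pin = SpecifierSet("<=0.20.1,>=0.15.0")  # after PR #2912

# torchvision 0.20.1 is the release that pairs with torch 2.5.1
tv = Version("0.20.1")
print(tv in old_pin)  # False: resolver must pick an older torchvision (and torch)
print(tv in new_pin)  # True: torch 2.5.1 can stay
```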

@Galaxy-Husky
Contributor Author

Thanks for your reply.
I tried the latest main branch before I submitted this issue. I'm guessing the previous failure happened because I hadn't updated torchvision from 0.19.0 to 0.20.1.
But it still failed after I tried again in a fresh environment: it removed torch 2.5.1 and installed torch 2.4.1. These are my commands:

conda create -n lmdeploy python=3.12
pip3 install torch==2.5.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
pip install "git+https://github.com/InternLM/lmdeploy.git@main" -U 

Could you help me check again?
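A quick way to confirm whether an install like this swapped torch out is to check the installed version afterwards; a small sketch using `importlib.metadata` (the `expect_version` helper is mine, not part of lmdeploy or pip):

```python
import importlib.metadata as md

def expect_version(dist: str, prefix: str) -> bool:
    """Return True if `dist` is installed and its version starts with `prefix`."""
    try:
        return md.version(dist).startswith(prefix)
    except md.PackageNotFoundError:
        return False

# After `pip install ...`, verify torch was not replaced:
print(expect_version("torch", "2.5.1"))
print(expect_version("no-such-distribution", "1.0"))  # False
```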

@RunningLeon
Collaborator

@Galaxy-Husky ok, let me check again whether there are still other problems.

@RunningLeon
Collaborator

@Galaxy-Husky hi, can you try installing again with pip's --use-deprecated=legacy-resolver flag?

pip install "git+https://github.com/InternLM/lmdeploy.git@main" -U --use-deprecated=legacy-resolver --extra-index-url https://download.pytorch.org/whl/cu124

@Galaxy-Husky
Contributor Author

@RunningLeon Hi, it raised an error:

...
Successfully built lmdeploy
Installing collected packages: triton, lmdeploy
  Attempting uninstall: triton
    Found existing installation: triton 3.1.0
    Uninstalling triton-3.1.0:
      Successfully uninstalled triton-3.1.0
  Attempting uninstall: lmdeploy
    Found existing installation: lmdeploy 0.6.4
    Uninstalling lmdeploy-0.6.4:
      Successfully uninstalled lmdeploy-0.6.4
ERROR: pip's legacy dependency resolver does not consider dependency conflicts when selecting packages. This behaviour is the source of the following dependency conflicts.
torch 2.5.1+cu124 requires triton==3.1.0; platform_system == "Linux" and platform_machine == "x86_64" and python_version < "3.13", but you'll have triton 3.0.0 which is incompatible.
Successfully installed lmdeploy-0.6.4 triton-3.0.0
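The warning comes from the `triton==3.1.0; ...` requirement in torch 2.5.1's own metadata: lmdeploy installs triton 3.0.0, which that pin does not satisfy, and the legacy resolver reports rather than resolves the conflict. A sketch of that check with `packaging.requirements` (the requirement string is copied from the error output above):

```python
from packaging.requirements import Requirement

# torch 2.5.1 declares this conditional dependency
req = Requirement(
    'triton==3.1.0; platform_system == "Linux" and platform_machine == "x86_64"'
    ' and python_version < "3.13"'
)
print(req.specifier.contains("3.0.0"))  # False: hence pip's conflict warning
print(req.specifier.contains("3.1.0"))  # True
```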

@RunningLeon
Collaborator

hi, it seems lmdeploy with torch 2.5.1 was installed successfully. The message can be ignored, since the triton version in lmdeploy is pinned to triton==3.0.0.

@Galaxy-Husky
Contributor Author

@RunningLeon I see. BTW, it would be great if you could tell me whether torch 2.5.1 works well with triton==3.0.0. And when will lmdeploy upgrade triton to 3.1.0?

@RunningLeon
Collaborator

hi, triton 3.0.0 works well with torch 2.5.1. As for upgrading triton to 3.1.0, that would need a full-scale test.