TypeError: initialize_ub() got an unexpected keyword argument 'tp_size' #1376

Open
wccccp opened this issue Dec 13, 2024 · 3 comments
wccccp commented Dec 13, 2024

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:18:23_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0

Name: torch
Version: 2.4.0a0+3bcc3cddb5.nv24.7
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3
Location: /usr/local/lib/python3.10/dist-packages
Requires: filelock, fsspec, jinja2, networkx, sympy, typing-extensions
Required-by: flash-attn, lightning-thunder, torch-tensorrt, torchvision

Traceback (most recent call last):
[rank2]: File "/gpfs02/unifiedcsi/gpfs/csi-dfs-ti-platform-fs/AI_center_main/wcp/new_git/Pai-Megatron-Patch/examples/qwen2_5/../qwen2/pretrain_qwen.py", line 279, in
[rank2]: pretrain(
[rank2]: File "/gpfs02/unifiedcsi/gpfs/csi-dfs-ti-platform-fs/AI_center_main/wcp/new_git/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/training/training.py", line 223, in pretrain
[rank2]: initialize_megatron(
[rank2]: File "/gpfs02/unifiedcsi/gpfs/csi-dfs-ti-platform-fs/AI_center_main/wcp/new_git/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/training/initialize.py", line 105, in initialize_megatron
[rank2]: _initialize_tp_communicators()
[rank2]: File "/gpfs02/unifiedcsi/gpfs/csi-dfs-ti-platform-fs/AI_center_main/wcp/new_git/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/training/initialize.py", line 211, in _initialize_tp_communicators
[rank2]: te_module.base.initialize_ub(shape = input_shape, tp_size = args.tensor_model_parallel_size,
[rank2]: TypeError: initialize_ub() got an unexpected keyword argument 'tp_size'

ksivaman (Member) commented Dec 13, 2024

What TE version are you using? Note that tp_size is a positional argument to initialize_ub, not a kwarg. Perhaps we could move this issue to Megatron-LM?
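To see exactly which parameters the installed TE accepts, you can inspect the function's signature directly. A minimal sketch, assuming only that transformer_engine is importable in your environment:

```python
# Print the signature of initialize_ub as exposed by the installed TE.
# The module path matches the one in the traceback above
# (transformer_engine.pytorch.module.base.initialize_ub).
import inspect
from transformer_engine.pytorch.module import base

print(inspect.signature(base.initialize_ub))
# If the printed signature contains tp_size, the Megatron-LM call above is fine;
# if it contains tp_group instead, you are on an older TE (see the next comment).
```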

ksivaman (Member) commented

@wccccp The tp_size argument was introduced in TE v1.9.0, before which it was tp_group. Megatron-LM does a version check, but I do see that they pass the wrong argument for older TE versions, see here.
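For illustration, a minimal sketch of the kind of version-guarded call that check is meant to produce. This is an assumption-laden sketch, not Megatron-LM's actual code: the other arguments initialize_ub takes are omitted, and tp_group is a hypothetical stand-in for however you obtain the tensor-parallel process group:

```python
# Sketch only: guard the initialize_ub call on the installed TE version.
# Assumes transformer_engine exposes __version__; if it does not in your build,
# importlib.metadata.version("transformer-engine") is an alternative.
from packaging.version import Version

import transformer_engine
from transformer_engine.pytorch import module as te_module

def initialize_ub_compat(input_shape, args, tp_group):
    # tp_group: hypothetical stand-in for the tensor-parallel process group.
    if Version(transformer_engine.__version__) >= Version("1.9.0"):
        # TE >= 1.9.0 takes tp_size (remaining kwargs omitted in this sketch).
        te_module.base.initialize_ub(
            shape=input_shape, tp_size=args.tensor_model_parallel_size
        )
    else:
        # Older TE took tp_group instead of tp_size.
        te_module.base.initialize_ub(shape=input_shape, tp_group=tp_group)
```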

ksivaman self-assigned this Dec 13, 2024
wccccp (Author) commented Dec 16, 2024

> @wccccp The tp_size argument was introduced in TE v1.9.0, before which it was tp_group. Megatron-LM does a version check, but I do see that they pass the wrong argument for older TE versions, see here.

Thank you for your response. I have found a compatible TE version.
