What TE version are you using? Note that tp_size is a positional argument to initialize_ub, not a kwarg. Perhaps we could move this issue to Megatron-LM?
@wccccp The tp_size argument was introduced in TE v1.9.0; before that it was tp_group. Megatron-LM does a version check, but I do see that they pass the wrong argument for older TE versions, see here.
Thank you for your response. I have located the appropriate version for compatibility.
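For anyone hitting the same error, a minimal sketch of the version guard discussed above (the helper name ub_kwargs and the plain major/minor parse are illustrative, not Megatron-LM's actual code; per the comments, tp_size replaced tp_group in TE v1.9.0):

```python
def ub_kwargs(te_version, tp_size, tp_group=None):
    """Pick the keyword that initialize_ub expects for a given
    Transformer Engine version string (e.g. "1.8.0").

    TE >= 1.9.0 takes tp_size; older releases take tp_group.
    """
    major, minor = (int(x) for x in te_version.split(".")[:2])
    if (major, minor) >= (1, 9):
        return {"tp_size": tp_size}
    return {"tp_group": tp_group}

# Usage sketch: pass the selected kwarg through to initialize_ub, e.g.
#   te_module.base.initialize_ub(shape=input_shape,
#                                **ub_kwargs(te_version, args.tensor_model_parallel_size))
```

Passing tp_size to a pre-1.9.0 TE produces exactly the TypeError in the traceback below.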
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:18:23_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0
Name: torch
Version: 2.4.0a0+3bcc3cddb5.nv24.7
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3
Location: /usr/local/lib/python3.10/dist-packages
Requires: filelock, fsspec, jinja2, networkx, sympy, typing-extensions
Required-by: flash-attn, lightning-thunder, torch-tensorrt, torchvision
Traceback (most recent call last):
[rank2]: File "/gpfs02/unifiedcsi/gpfs/csi-dfs-ti-platform-fs/AI_center_main/wcp/new_git/Pai-Megatron-Patch/examples/qwen2_5/../qwen2/pretrain_qwen.py", line 279, in <module>
[rank2]: pretrain(
[rank2]: File "/gpfs02/unifiedcsi/gpfs/csi-dfs-ti-platform-fs/AI_center_main/wcp/new_git/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/training/training.py", line 223, in pretrain
[rank2]: initialize_megatron(
[rank2]: File "/gpfs02/unifiedcsi/gpfs/csi-dfs-ti-platform-fs/AI_center_main/wcp/new_git/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/training/initialize.py", line 105, in initialize_megatron
[rank2]: _initialize_tp_communicators()
[rank2]: File "/gpfs02/unifiedcsi/gpfs/csi-dfs-ti-platform-fs/AI_center_main/wcp/new_git/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/training/initialize.py", line 211, in _initialize_tp_communicators
[rank2]: te_module.base.initialize_ub(shape = input_shape, tp_size = args.tensor_model_parallel_size,
[rank2]: TypeError: initialize_ub() got an unexpected keyword argument 'tp_size'