What TE version are you using? Note that tp_size is a positional argument to initialize_ub, not a kwarg. Perhaps we could move this issue to Megatron-LM?
@wccccp The tp_size argument was introduced in TE v1.9.0; before that it was tp_group. Megatron-LM does a version check, but I do see that they pass the wrong argument for older TE versions, see here.
Thank you for your response. I have located the appropriate version for compatibility.
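For anyone hitting the same error, a minimal sketch of the version guard discussed above (the helper name ub_kwargs and the plain major/minor parse are illustrative, not Megatron-LM's actual code; per the comments, tp_size replaced tp_group in TE v1.9.0):

```python
def ub_kwargs(te_version, tp_size, tp_group=None):
    """Pick the keyword that initialize_ub expects for a given
    Transformer Engine version string (e.g. "1.8.0").

    TE >= 1.9.0 takes tp_size; older releases take tp_group.
    """
    major, minor = (int(x) for x in te_version.split(".")[:2])
    if (major, minor) >= (1, 9):
        return {"tp_size": tp_size}
    return {"tp_group": tp_group}

# Usage sketch: pass the selected kwarg through to initialize_ub, e.g.
#   te_module.base.initialize_ub(shape=input_shape,
#                                **ub_kwargs(te_version, args.tensor_model_parallel_size))
```

Passing tp_size to a pre-1.9.0 TE produces exactly the TypeError in the traceback below.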
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:18:23_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0
Name: torch
Version: 2.4.0a0+3bcc3cddb5.nv24.7
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3
Location: /usr/local/lib/python3.10/dist-packages
Requires: filelock, fsspec, jinja2, networkx, sympy, typing-extensions
Required-by: flash-attn, lightning-thunder, torch-tensorrt, torchvision
Traceback (most recent call last):
[rank2]: File "/gpfs02/unifiedcsi/gpfs/csi-dfs-ti-platform-fs/AI_center_main/wcp/new_git/Pai-Megatron-Patch/examples/qwen2_5/../qwen2/pretrain_qwen.py", line 279, in <module>
[rank2]: pretrain(
[rank2]: File "/gpfs02/unifiedcsi/gpfs/csi-dfs-ti-platform-fs/AI_center_main/wcp/new_git/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/training/training.py", line 223, in pretrain
[rank2]: initialize_megatron(
[rank2]: File "/gpfs02/unifiedcsi/gpfs/csi-dfs-ti-platform-fs/AI_center_main/wcp/new_git/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/training/initialize.py", line 105, in initialize_megatron
[rank2]: _initialize_tp_communicators()
[rank2]: File "/gpfs02/unifiedcsi/gpfs/csi-dfs-ti-platform-fs/AI_center_main/wcp/new_git/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/training/initialize.py", line 211, in _initialize_tp_communicators
[rank2]: te_module.base.initialize_ub(shape = input_shape, tp_size = args.tensor_model_parallel_size,
[rank2]: TypeError: initialize_ub() got an unexpected keyword argument 'tp_size'