
The provided qkv memory layout is not supported! When using RoPE #681

Closed
1049451037 opened this issue Feb 26, 2024 · 14 comments

@1049451037

1049451037 commented Feb 26, 2024

I think this problem was solved before, but now it has come back.

How can it be fixed? I have tested both the stable branch and the main branch; neither works.

#544
#455

I just ran the official Megatron-LM text generation example with the --position-embedding-type rope and --no-position-embedding args added:

https://github.com/NVIDIA/Megatron-LM/blob/main/examples/run_text_generation_server_345M.sh

This produced the error: The provided qkv memory layout is not supported!

Moreover, I use the mcore model instead of the legacy model, so you need to switch to it in text_generation_server.py to reproduce the error.

@1049451037
Author

I solved the problem with a hard-coded workaround:

NVIDIA/Megatron-LM#703 (comment)

@cyanguwa
Collaborator

cyanguwa commented Mar 1, 2024

Hi @1049451037, could you provide more details on how to run the job, please? Currently I can start the job but am stuck at:

 > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 76705792
 * Serving Flask app 'megatron.text_generation_server'
 * Debug mode: off
INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5000
 * Running on http://10.176.10.102:5000
INFO:werkzeug:Press CTRL+C to quit

Could you send me your complete run_text_generation_server_345M.sh script please?

@1049451037
Author

I don't see any problem in your log. You have started the server; you just need to start a client using https://github.com/NVIDIA/Megatron-LM/blob/main/tools/text_generation_cli.py:

python tools/text_generation_cli.py 10.176.10.102:5000
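If you prefer not to use the CLI, the server can also be queried directly over HTTP. A minimal sketch with the requests package (the PUT /api endpoint and JSON body are taken from the request logged later in this thread; adjust the address to your server):

import requests

# Hypothetical standalone client, equivalent to the CLI above.
resp = requests.put(
    "http://10.176.10.102:5000/api",
    json={"prompts": ["I am"], "tokens_to_generate": 10},
)
print(resp.json())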

@stgzr

stgzr commented Mar 5, 2024

Same issue. Any solutions?

@cyanguwa
Collaborator

cyanguwa commented Mar 9, 2024

I tested with TE main (8255f87) and Megatron-LM main (8957468) and I'm not seeing the issue above. Let me know if I'm not using the same run script as yours.

Thanks.

# container: nvcr.io/nvidia/pytorch:24.02-py3
# install the latest TransformerEngine and Megatron-LM

$ cat examples/run_text_generation_server_345M.sh
#!/bin/bash
# This example will start serving the 345M model.
DISTRIBUTED_ARGS="--nproc_per_node 1 \
                  --nnodes 1 \
                  --node_rank 0 \
                  --master_addr localhost \
                  --master_port 6000"

CHECKPOINT="" #<Path to checkpoint (e.g /345m)>
VOCAB_FILE=/code/gpt2-vocab.json
MERGE_FILE=/code/gpt2-merges.txt
DATA_PATH=/code/ds/ThePile/BookCorpus2_ftfy_cleaned_id_shuf_text_document

export CUDA_DEVICE_MAX_CONNECTIONS=1

pip install flask-restful

#torchrun $DISTRIBUTED_ARGS tools/run_text_generation_server.py   \
torchrun tools/run_text_generation_server.py   \
       --tensor-model-parallel-size 1  \
       --pipeline-model-parallel-size 1  \
       --num-layers 2  \
       --hidden-size 1024  \
       --num-attention-heads 16  \
       --max-position-embeddings 1024  \
       --tokenizer-type GPT2BPETokenizer  \
       --position-embedding-type rope \
       --no-position-embedding \
       --fp16  \
       --micro-batch-size 1  \
       --seq-length 1024  \
       --vocab-file $VOCAB_FILE  \
       --merge-file $MERGE_FILE  \
       --data-path $DATA_PATH \
       --use-mcore-models \
       --micro-batch-size 2 \
       --global-batch-size 8 \
       --lr 0.00015 \
       --train-iters 50 \
       --lr-decay-iters 320000 \
       --lr-decay-style cosine \
       --min-lr 1.0e-5 \
       --weight-decay 1e-2 \
       --lr-warmup-fraction .01 \
       --clip-grad 1.0 \
       --seed 42

$ pip list | grep transformer
transformer-engine        1.5.0.dev0

$ bash examples/run_text_generation_server_345M.sh
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5000
 * Running on http://10.32.113.152:5000
INFO:werkzeug:Press CTRL+C to quit
request IP: 10.32.113.152
{"prompts": ["I am"], "tokens_to_generate": 10}
start time:  2024-03-09 04:22:05.314579
INFO:werkzeug:10.32.113.152 - - [09/Mar/2024 04:22:10] "PUT /api HTTP/1.1" 200 -

$ python tools/text_generation_cli.py 10.32.113.152:5000
Enter prompt: I am
Enter number of tokens to generate: 10
Megatron Response: 
I am perennlington vehiclesigning protagonistlon Peng surreal nostalgia ignorant
Enter prompt: 

@1049451037
Author

1049451037 commented Mar 13, 2024

You won't see the problem if you just run the example, because the example inference does not use the MCore model. It uses the legacy model, as you can see in model_provider.

@1049451037
Author

1049451037 commented Mar 13, 2024

@cyanguwa You can replace the model provider in the text generation server with the following to reproduce the error:

from megatron import get_args, print_rank_0  # used below; already imported at module level in run_text_generation_server.py
from megatron.core.models.gpt import GPTModel
import megatron.model
from megatron.training import get_model
from megatron.arguments import core_transformer_config_from_args
from megatron.text_generation_server import MegatronServer
from megatron.text_generation import generate_and_post_process
from megatron.text_generation import beam_search_and_post_process
import torch

from megatron.core.transformer.spec_utils import import_module
from megatron.core.models.gpt.gpt_layer_specs import get_gpt_layer_with_transformer_engine_spec

def model_provider(pre_process=True, post_process=True):
    """Build the model."""
    args = get_args()

    print_rank_0('building GPT model ...')
    config = core_transformer_config_from_args(get_args())

    if args.use_mcore_models:
        print("building megatron core model!!!!!!!!!!!!!!")
        if args.spec is not None:
            transformer_layer_spec = import_module(args.spec)
        else:
            transformer_layer_spec = get_gpt_layer_with_transformer_engine_spec(args.num_experts, args.moe_grouped_gemm)

        model = GPTModel(
            config=config,
            transformer_layer_spec=transformer_layer_spec,
            vocab_size=args.padded_vocab_size,
            max_sequence_length=args.max_position_embeddings,
            pre_process=pre_process,
            post_process=post_process,
            fp16_lm_cross_entropy=args.fp16_lm_cross_entropy,
            parallel_output=False,
            share_embeddings_and_output_weights=not args.untie_embeddings_and_output_weights,
            position_embedding_type=args.position_embedding_type,
            rotary_percent=args.rotary_percent,
        )
    else:
        print("building megatron legacy model!!!!!!!!!!!!!!")
        assert False, "Never do this!"
        assert(args.context_parallel_size == 1), "Context parallelism is only supported with Megatron Core!"

        model = megatron.model.GPTModel(
            config,
            num_tokentypes=0,
            parallel_output=False,
            pre_process=pre_process,
            post_process=post_process
        )
    return model

@yaox12
Collaborator

yaox12 commented Mar 14, 2024

We're aware of this bug and will push a fix to MCore.
For now, you can add the following code at https://github.com/NVIDIA/Megatron-LM/blob/89574689447d694bb19dd86fc8a6153b4467ba9d/megatron/core/transformer/custom_layers/transformer_engine.py#L464:

        # In PyTorch, the following two tensors are in fact the same:
        #   Tensor with shape (1, S, H, D) and stride (S*H*D, H*D, D, 1)
        #   Tensor with shape (1, S, H, D) and stride (H*D, H*D, D, 1)
        # We unify them to the first one to pass the stride check in TE
        if value.shape == key.shape and value.stride() != key.stride():
            value = value.as_strided(value.shape, key.stride())
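For background on why this is safe: when the leading (batch) dimension has size 1, its stride never affects which element is addressed, so the two layouts alias exactly the same memory. A standalone PyTorch sketch (illustration only, not Megatron/TE code):

import torch

S, H, D = 4, 2, 8
base = torch.arange(S * H * D, dtype=torch.float32)

# The two stride patterns mentioned in the comment above, over the same data.
a = base.as_strided((1, S, H, D), (S * H * D, H * D, D, 1))
b = base.as_strided((1, S, H, D), (H * D, H * D, D, 1))

assert torch.equal(a, b)          # identical contents
assert a.stride() != b.stride()   # but different stride metadata

# The workaround rewrites b's metadata to match a's, without copying any data.
b_fixed = b.as_strided(b.shape, a.stride())
assert b_fixed.stride() == a.stride() and torch.equal(b_fixed, a)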

@1049451037
Author

No, this doesn't fix the bug. It makes inference work, but it breaks training (the training loss does not converge).

@1049451037
Author

My workaround for now is to add the as_strided call for inference and comment that line out during training... waiting for a more elegant official fix.
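One way to avoid manually commenting the line in and out would be to gate the rewrite on the module's training flag. A hypothetical variant of the snippet above (not the official fix; it assumes the surrounding method belongs to a torch.nn.Module so self.training is available):

        # Apply the stride rewrite only in eval/inference mode, leaving training untouched.
        if (not self.training
                and value.shape == key.shape
                and value.stride() != key.stride()):
            value = value.as_strided(value.shape, key.stride())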

@stgzr

stgzr commented Mar 14, 2024

Maybe it is the qkv_format; you can check whether the tensor format is sbhd or bshd.
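A quick way to see which layout is actually arriving is to print the q/k/v shapes just before the failing call (a hypothetical debug print, assuming the tensors are named query, key, value at that point): with sbhd the first dimension is the sequence length and the second is the micro-batch size, with bshd they are swapped.

print("q shape/stride:", query.shape, query.stride())
print("k shape/stride:", key.shape, key.stride())
print("v shape/stride:", value.shape, value.stride())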

@1049451037
Author

It's sbhd; this is just the official training and inference code in Megatron.

@ptrendx
Member

ptrendx commented May 16, 2024

I believe the MCore issue is fixed now. Is that correct, @yaox12? Can we close this issue?

@1049451037
Author

Yes, the issue is fixed in the latest main branch of Megatron-LM.
