Checklist
1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
I use the same code and environment on two different GPUs (A10 and A800); the run succeeds on the A800 but fails on the A10.
On the A800, the "Convert to turbomind format" progress bar is not shown at all.
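Since the code and environment are identical across both runs, the only variable is the GPU itself. A minimal sketch like the one below (hypothetical, not output attached to this report) can record the per-device differences on each machine, assuming the torch build listed under Environment:

import torch

# Hypothetical diagnostic, not part of the original run: print the device
# name, compute capability and memory that differ between the A10 and A800.
props = torch.cuda.get_device_properties(0)
print("device:", props.name)
print("compute capability:", f"{props.major}.{props.minor}")
print("total memory (GiB):", round(props.total_memory / 1024**3, 1))
print("torch CUDA version:", torch.version.cuda)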
Environment
lmdeploy==0.6.4
transformers==4.46.3
timm==1.0.12
CUDA 12.5
flash-attn==2.6.3
torch==2.4.0
Error traceback
/usr/local/lib/python3.10/dist-packages/timm/models/layers/__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
InternLM2ForCausalLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you're using `trust_remote_code=True`, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.
[TM][WARNING] [LlamaTritonModel] `max_context_token_num` is not set, default to 4096.
2024-12-17 11:09:38,325 - lmdeploy - WARNING - turbomind.py:231 - get 227 model params
[WARNING] gemm_config.in is not found; using default GEMM algo
Convert to turbomind format:   0%|          | 0/32 [00:00<?, ?it/s]
Convert to turbomind format:   6%|▋         | 2/32 [00:00<00:01, 18.66it/s]
Convert to turbomind format:  16%|█▌        | 5/32 [00:00<00:01, 20.17it/s]
Convert to turbomind format:  25%|██▌       | 8/32 [00:00<00:01, 21.11it/s]
Convert to turbomind format:  34%|███▍      | 11/32 [00:00<00:01, 17.78it/s]
Convert to turbomind format:  44%|████▍     | 14/32 [00:00<00:00, 19.13it/s]
Convert to turbomind format:  53%|█████▎    | 17/32 [00:00<00:00, 20.13it/s]
Convert to turbomind format:  62%|██████▎   | 20/32 [00:01<00:00, 15.02it/s]
Convert to turbomind format:  69%|██████▉   | 22/32 [00:01<00:00, 12.11it/s]
Convert to turbomind format:  78%|███████▊  | 25/32 [00:01<00:00, 14.35it/s]
Convert to turbomind format:  88%|████████▊ | 28/32 [00:01<00:00, 16.24it/s]
Convert to turbomind format:  97%|█████████▋| 31/32 [00:01<00:00, 17.76it/s]
terminate called after throwing an instance of 'std::runtime_error'
  what():  [TM][ERROR] pointer_mapping_ does not have information of ptr at 0x7dc2a6000. Assertion fail: /lmdeploy/src/turbomind/utils/allocator.h:284
Reproduction
from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig

backend_config = TurbomindEngineConfig(dtype="float16", tp=1, session_len=4096)
gen_config = GenerationConfig(n=1,
                              top_p=1,
                              top_k=1,
                              temperature=0.0,
                              max_new_tokens=512,
                              do_sample=False,
                              random_seed=3407,
                              min_new_tokens=1,
                              # self.tokenizer is the model's tokenizer in the surrounding class
                              stop_words=[self.tokenizer.eos_token, '<|im_end|>'],
                              skip_special_tokens=True)
model = pipeline(model_path="InternVL2_5-8B", backend_config=backend_config)
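For completeness, gen_config is defined above but not used in the snippet; below is a hedged sketch of how the pipeline is presumably invoked afterwards (placeholder prompt, not the exact production code), plus an assumption-labeled memory experiment, since the crash occurs while the weights are being converted on the 24 GB A10:

# Hypothetical call showing how gen_config would be used; the crash in the
# traceback above happens earlier, while pipeline() converts the weights.
response = model("Hello", gen_config=gen_config)
print(response.text)

# Assumption, not verified in this report: the A10 has far less GPU memory
# than the A800, so a smaller KV-cache fraction could be tried when
# rebuilding the engine to rule out memory pressure.
backend_config = TurbomindEngineConfig(dtype="float16", tp=1, session_len=4096,
                                       cache_max_entry_count=0.4)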