
[Bug] Same code and environment: A800 succeeds, A10 fails #2903

Open · 3 tasks done · LaoWangGB opened this issue Dec 17, 2024 · 2 comments

LaoWangGB commented Dec 17, 2024

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

[screenshot of the failure attached]
I ran the same code and environment on two different GPUs (A10 and A800): the A800 succeeded but the A10 failed.
The A800 run also did not show the "Convert to turbomind format" progress bar.

Reproduction

from lmdeploy import GenerationConfig, TurbomindEngineConfig, pipeline

backend_config = TurbomindEngineConfig(dtype="float16", tp=1, session_len=4096)
gen_config = GenerationConfig(n=1,
                              top_p=1,
                              top_k=1,
                              temperature=0.0,
                              max_new_tokens=512,
                              do_sample=False,
                              random_seed=3407,
                              min_new_tokens=1,
                              # self.tokenizer is defined in the surrounding class of this snippet
                              stop_words=[self.tokenizer.eos_token, '<|im_end|>'],
                              skip_special_tokens=True)
model = pipeline(model_path="InternVL2_5-8B", backend_config=backend_config)
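
For completeness, the pipeline would then be invoked roughly as below; the prompt and image URL are placeholders, not part of the original report, and the surrounding class supplies the real inputs:

from lmdeploy.vl import load_image

# Hypothetical call: the real prompt/image come from the application around this snippet.
image = load_image("https://example.com/sample.jpg")  # placeholder URL
response = model(("describe this image", image), gen_config=gen_config)
print(response.text)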

Environment

lmdeploy==0.6.4
transformers==4.46.3
timm==1.0.12
CUDA 12.5
flash-attn==2.6.3
torch==2.4.0

Error traceback

/usr/local/lib/python3.10/dist-packages/timm/models/layers/__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
InternLM2ForCausalLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you're using `trust_remote_code=True`, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.
[TM][WARNING] [LlamaTritonModel] `max_context_token_num` is not set, default to 4096.
2024-12-17 11:09:38,325 - lmdeploy - WARNING - turbomind.py:231 - get 227 model params
[WARNING] gemm_config.in is not found; using default GEMM algo

Convert to turbomind format:   0%|          | 0/32 [00:00<?, ?it/s]
...
Convert to turbomind format:  97%|█████████▋| 31/32 [00:01<00:00, 17.76it/s]
terminate called after throwing an instance of 'std::runtime_error'
  what():  [TM][ERROR] pointer_mapping_ does not have information of ptr at 0x7dc2a6000. Assertion fail: /lmdeploy/src/turbomind/utils/allocator.h:284
@lzhangzz (Collaborator) commented:
Looks like OOM during warm-up. You may try to lower max_prefill_token_num or cache_max_entry_count on the A10.
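
In TurbomindEngineConfig that maps onto roughly the following; the values below are illustrative starting points for a 24 GB A10, not tuned settings:

from lmdeploy import TurbomindEngineConfig

# Illustrative values: shrink prefill chunks and the KV-cache share to cut warm-up memory.
backend_config = TurbomindEngineConfig(
    dtype="float16",
    tp=1,
    session_len=4096,
    max_prefill_token_num=2048,   # fewer tokens per prefill iteration than the default
    cache_max_entry_count=0.5,    # fraction of free GPU memory given to the KV cache (default 0.8)
)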

@LaoWangGB (Author) commented:

> Looks like OOM during warm-up. You may try to lower max_prefill_token_num or cache_max_entry_count on the A10.

Yes, this helps.
