Checklist
1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
I use the same code and environment on two different GPUs (A10 and A800); the run succeeds on the A800 but fails on the A10.
On the A800, the "Convert to turbomind format" progress bar is not shown at all.
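Since the code and environment are identical across both runs, the only variable is the GPU itself. A minimal sketch like the one below (hypothetical, not output attached to this report) can record the per-device differences on each machine, assuming the torch build listed under Environment:

import torch

# Hypothetical diagnostic, not part of the original run: print the device
# name, compute capability and memory that differ between the A10 and A800.
props = torch.cuda.get_device_properties(0)
print("device:", props.name)
print("compute capability:", f"{props.major}.{props.minor}")
print("total memory (GiB):", round(props.total_memory / 1024**3, 1))
print("torch CUDA version:", torch.version.cuda)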
Environment
lmdeploy==0.6.4
transformers==4.46.3
timm==1.0.12
CUDA 12.5
flash-attn==2.6.3
torch==2.4.0
Error traceback
/usr/local/lib/python3.10/dist-packages/timm/models/layers/__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
InternLM2ForCausalLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you're using `trust_remote_code=True`, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.
[TM][WARNING] [LlamaTritonModel] `max_context_token_num` is not set, default to 4096.
2024-12-17 11:09:38,325 - lmdeploy - WARNING - turbomind.py:231 - get 227 model params
[WARNING] gemm_config.in is not found; using default GEMM algo
Convert to turbomind format:   0%|          | 0/32 [00:00<?, ?it/s]
Convert to turbomind format:   6%|▋         | 2/32 [00:00<00:01, 18.66it/s]
Convert to turbomind format:  16%|█▌        | 5/32 [00:00<00:01, 20.17it/s]
Convert to turbomind format:  25%|██▌       | 8/32 [00:00<00:01, 21.11it/s]
Convert to turbomind format:  34%|███▍      | 11/32 [00:00<00:01, 17.78it/s]
Convert to turbomind format:  44%|████▍     | 14/32 [00:00<00:00, 19.13it/s]
Convert to turbomind format:  53%|█████▎    | 17/32 [00:00<00:00, 20.13it/s]
Convert to turbomind format:  62%|██████▎   | 20/32 [00:01<00:00, 15.02it/s]
Convert to turbomind format:  69%|██████▉   | 22/32 [00:01<00:00, 12.11it/s]
Convert to turbomind format:  78%|███████▊  | 25/32 [00:01<00:00, 14.35it/s]
Convert to turbomind format:  88%|████████▊ | 28/32 [00:01<00:00, 16.24it/s]
Convert to turbomind format:  97%|█████████▋| 31/32 [00:01<00:00, 17.76it/s]
terminate called after throwing an instance of 'std::runtime_error'
  what():  [TM][ERROR] pointer_mapping_ does not have information of ptr at 0x7dc2a6000. Assertion fail: /lmdeploy/src/turbomind/utils/allocator.h:284
Reproduction
from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig

backend_config = TurbomindEngineConfig(dtype="float16", tp=1, session_len=4096)
gen_config = GenerationConfig(n=1,
                              top_p=1,
                              top_k=1,
                              temperature=0.0,
                              max_new_tokens=512,
                              do_sample=False,
                              random_seed=3407,
                              min_new_tokens=1,
                              # self.tokenizer is the model's tokenizer in the surrounding class
                              stop_words=[self.tokenizer.eos_token, '<|im_end|>'],
                              skip_special_tokens=True)
model = pipeline(model_path="InternVL2_5-8B", backend_config=backend_config)
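For completeness, gen_config is defined above but not used in the snippet; below is a hedged sketch of how the pipeline is presumably invoked afterwards (placeholder prompt, not the exact production code), plus an assumption-labeled memory experiment, since the crash occurs while the weights are being converted on the 24 GB A10:

# Hypothetical call showing how gen_config would be used; the crash in the
# traceback above happens earlier, while pipeline() converts the weights.
response = model("Hello", gen_config=gen_config)
print(response.text)

# Assumption, not verified in this report: the A10 has far less GPU memory
# than the A800, so a smaller KV-cache fraction could be tried when
# rebuilding the engine to rule out memory pressure.
backend_config = TurbomindEngineConfig(dtype="float16", tp=1, session_len=4096,
                                       cache_max_entry_count=0.4)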