[Bug] Transformers doesn't recognize LLaVA variant architectures #2532

amosyou · 2024-12-20T11:21:51Z

Checklist

1. I have searched related issues but could not find the help I needed.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
4. If the issue you raised is not a bug but a question, please open a discussion at https://github.com/sgl-project/sglang/discussions/new/choose; otherwise, it will be closed.
5. Please use English, otherwise it will be closed.

Describe the bug

When I try to load LLaVA 1.6 (liuhaotian/llava-v1.6-mistral-7b), loading the config file for LLaVA variants fails: Transformers doesn't recognize the model_type field (llava_mistral) in the config.json file.

Reproduction

$ python3 -m sglang.launch_server --model-path liuhaotian/llava-v1.6-mistral-7b --port 30000
[2024-12-20 03:14:40] server_args=ServerArgs(model_path='liuhaotian/llava-v1.6-mistral-7b', tokenizer_path='liuhaotian/llava-v1.6-mistral-7b', tokenizer_mode='auto', skip_tokenizer_init=False, load_format='auto', trust_remote_code=False, dtype='auto', kv_cache_dtype='auto', quantization=None, context_length=None, device='cuda', served_model_name='liuhaotian/llava-v1.6-mistral-7b', chat_template=None, is_embedding=False, revision=None, host='127.0.0.1', port=30000, mem_fraction_static=0.88, max_running_requests=None, max_total_tokens=None, chunked_prefill_size=2048, max_prefill_tokens=16384, schedule_policy='lpm', schedule_conservativeness=1.0, cpu_offload_gb=0, tp_size=1, stream_interval=1, random_seed=902724198, constrained_json_whitespace_pattern=None, watchdog_timeout=300, download_dir=None, base_gpu_id=0, log_level='info', log_level_http=None, log_requests=False, show_time_cost=False, enable_metrics=False, decode_log_interval=40, api_key=None, file_storage_pth='SGLang_storage', enable_cache_report=False, dp_size=1, load_balance_method='round_robin', ep_size=1, dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, lora_paths=None, max_loras_per_batch=8, attention_backend='flashinfer', sampling_backend='flashinfer', grammar_backend='outlines', disable_radix_cache=False, disable_jump_forward=False, disable_cuda_graph=False, disable_cuda_graph_padding=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, disable_mla=False, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_ep_moe=False, enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=8, torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, num_continuous_decode_steps=1, delete_ckpt_after_loading=False)
config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.63k/1.63k [00:00<00:00, 13.0MB/s]
Traceback (most recent call last):
  File "/home/amosyou/sglang/.venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1038, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/home/amosyou/sglang/.venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 740, in __getitem__
    raise KeyError(key)
KeyError: 'llava_mistral'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/amosyou/.local/share/uv/python/cpython-3.10.14-linux-x86_64-gnu/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/amosyou/.local/share/uv/python/cpython-3.10.14-linux-x86_64-gnu/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/amosyou/sglang/python/sglang/launch_server.py", line 14, in <module>
    launch_server(server_args)
  File "/home/amosyou/sglang/python/sglang/srt/server.py", line 526, in launch_server
    launch_engine(server_args=server_args)
  File "/home/amosyou/sglang/python/sglang/srt/server.py", line 488, in launch_engine
    tokenizer_manager = TokenizerManager(server_args, port_args)
  File "/home/amosyou/sglang/python/sglang/srt/managers/tokenizer_manager.py", line 110, in __init__
    self.model_config = ModelConfig(
  File "/home/amosyou/sglang/python/sglang/srt/configs/model_config.py", line 52, in __init__
    self.hf_config = get_config(
  File "/home/amosyou/sglang/python/sglang/srt/hf_transformers_utils.py", line 72, in get_config
    config = AutoConfig.from_pretrained(
  File "/home/amosyou/sglang/.venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1040, in from_pretrained
    raise ValueError(
ValueError: The checkpoint you are trying to load has model type `llava_mistral` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
/home/amosyou/.local/share/uv/python/cpython-3.10.14-linux-x86_64-gnu/lib/python3.10/multiprocessing/resource_tracker.py:104: UserWarning: resource_tracker: process died unexpectedly, relaunching.  Some resources might leak.
  warnings.warn('resource_tracker: process died unexpectedly, '
Traceback (most recent call last):
  File "/home/amosyou/.local/share/uv/python/cpython-3.10.14-linux-x86_64-gnu/lib/python3.10/multiprocessing/resource_tracker.py", line 209, in main
    cache[rtype].remove(name)
KeyError: '/mp-hdna7c5p'
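The traceback above reduces to a dictionary lookup: AutoConfig.from_pretrained reads model_type from config.json and indexes CONFIG_MAPPING with it (line 1038 of configuration_auto.py), and the resulting KeyError is re-raised as the ValueError shown at line 1040. The sketch below mimics that control flow in plain Python so the failure can be seen without downloading the checkpoint. KNOWN_MODEL_TYPES and resolve_config_class are hypothetical names, and the registry is a tiny illustrative subset, not the real Transformers mapping.

```python
import json

# Hypothetical stand-in for Transformers' CONFIG_MAPPING: a small subset of
# model_type strings mapped to config class names, for illustration only.
KNOWN_MODEL_TYPES = {
    "llava": "LlavaConfig",
    "llava_next": "LlavaNextConfig",
    "mistral": "MistralConfig",
}


def resolve_config_class(config_json: str) -> str:
    """Mimic the model_type lookup that fails in the traceback (hypothetical helper)."""
    config_dict = json.loads(config_json)
    model_type = config_dict.get("model_type")
    try:
        # Same shape as CONFIG_MAPPING[config_dict["model_type"]]
        return KNOWN_MODEL_TYPES[model_type]
    except KeyError:
        # Raising inside the except block chains the two exceptions, which is
        # why the log prints "During handling of the above exception,
        # another exception occurred".
        raise ValueError(f"model type `{model_type}` is not recognized")


if __name__ == "__main__":
    print(resolve_config_class('{"model_type": "llava_next"}'))
    try:
        resolve_config_class('{"model_type": "llava_mistral"}')
    except ValueError as err:
        print(err)
        # The original KeyError is kept on the implicit chain.
        print(type(err.__context__).__name__)
```

Run as a script, this prints LlavaNextConfig, then the ValueError message for llava_mistral, then KeyError, matching the two-stage error in the log.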

Environment

Python: 3.10.14 (main, Aug 14 2024, 05:11:29) [Clang 18.1.8 ]
CUDA available: True
GPU 0,1: Tesla P100-PCIE-16GB
GPU 0,1 Compute Capability: 6.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.6, V12.6.85
CUDA Driver Version: 550.127.08
PyTorch: 2.5.1+cu124
sglang: 0.4.0.post1
flashinfer: 0.1.6+cu124torch2.4
triton: 3.1.0
transformers: 4.47.1
torchao: 0.7.0
numpy: 1.26.4
aiohttp: 3.11.11
fastapi: 0.115.6
hf_transfer: 0.1.8
huggingface_hub: 0.27.0
interegular: 0.3.3
modelscope: 1.21.0
orjson: 3.10.12
packaging: 24.2
psutil: 6.1.1
pydantic: 2.10.4
multipart: 0.0.20
zmq: 26.2.0
uvicorn: 0.34.0
uvloop: 0.21.0
vllm: 0.6.4.post1
openai: 1.58.1
anthropic: 0.42.0
decord: 0.6.0
NVIDIA Topology: 
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      PIX     PHB     PHB     SYS     SYS     SYS     SYS     0-7,16-23       0               N/A
GPU1    PIX      X      PHB     PHB     SYS     SYS     SYS     SYS     0-7,16-23       0               N/A
GPU2    PHB     PHB      X      PIX     SYS     SYS     SYS     SYS     0-7,16-23       0               N/A
GPU3    PHB     PHB     PIX      X      SYS     SYS     SYS     SYS     0-7,16-23       0               N/A
GPU4    SYS     SYS     SYS     SYS      X      PIX     PHB     PHB     8-15,24-31      1               N/A
GPU5    SYS     SYS     SYS     SYS     PIX      X      PHB     PHB     8-15,24-31      1               N/A
GPU6    SYS     SYS     SYS     SYS     PHB     PHB      X      PIX     8-15,24-31      1               N/A
GPU7    SYS     SYS     SYS     SYS     PHB     PHB     PIX      X      8-15,24-31      1               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

ulimit soft: 1048576
