Exllamav2 tokenizer kwargs are not used #1322

Open
cpfiffer opened this issue Dec 5, 2024 · 0 comments
cpfiffer commented Dec 5, 2024

Several of the kwargs documented in the docstring for the exllamav2 inference engine do not appear to be in use:

def exl2(
    model_path: str,
    draft_model_path: Optional[str] = None,
    max_seq_len: Optional[int] = None,
    cache_q4: bool = False,
    paged: bool = True,
    max_chunk_size: Optional[int] = None,
) -> ExLlamaV2Model:
    """
    Load an ExLlamaV2 model.

    Parameters
    ----------
    model_path (str)
        Path to the model directory.
    device (str)
        Device to load the model on. Pass in 'cuda' for GPU or 'cpu' for CPU.
    max_seq_len (Optional[int], optional)
        Maximum sequence length. Defaults to None.
    scale_pos_emb (Optional[float], optional)
        Scale factor for positional embeddings. Defaults to None.
    scale_alpha_value (Optional[float], optional)
        Scale alpha value. Defaults to None.
    no_flash_attn (Optional[bool], optional)
        Disable flash attention. Defaults to None.
    num_experts_per_token (Optional[int], optional)
        Number of experts per token. Defaults to None.
    cache_q4 (bool, optional)
        Use Q4 cache. Defaults to False.
    tokenizer_kwargs (dict, optional)
        Additional keyword arguments for the tokenizer. Defaults to {}.
    gpu_split (str)
        \"auto\", or VRAM allocation per GPU in GB. Auto will use exllama's autosplit feature.
    low_mem (bool, optional)
        Enable VRAM optimizations, potentially trading off speed.
    verbose (bool, optional)
        Enable if you want debugging statements.

    Returns
    -------
    An `ExLlamaV2Model` instance.

    Raises
    ------
    `ImportError` if the `exllamav2` library is not installed.
    """

The following kwargs are mentioned in the docstring but do not seem to be used:

  • tokenizer_kwargs
  • scale_pos_emb
  • scale_alpha_value
  • no_flash_attn
  • num_experts_per_token
  • gpu_split
  • low_mem
  • verbose

Used but not documented:

  • draft_model_path
  • paged
  • max_chunk_size
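
This kind of docstring/signature drift can be caught mechanically by diffing the parameters the docstring names against the function's real signature via `inspect.signature`. A minimal sketch, using a hypothetical stand-in function with the same signature as the `exl2` loader above (not the real outlines import):

```python
import inspect
from typing import Optional

# Hypothetical stand-in mirroring the exl2 signature quoted above.
def exl2(
    model_path: str,
    draft_model_path: Optional[str] = None,
    max_seq_len: Optional[int] = None,
    cache_q4: bool = False,
    paged: bool = True,
    max_chunk_size: Optional[int] = None,
):
    ...

# Parameters the docstring claims to accept (transcribed from the docstring).
documented = {
    "model_path", "device", "max_seq_len", "scale_pos_emb",
    "scale_alpha_value", "no_flash_attn", "num_experts_per_token",
    "cache_q4", "tokenizer_kwargs", "gpu_split", "low_mem", "verbose",
}

# Parameters the function actually declares.
actual = set(inspect.signature(exl2).parameters)

# Set differences expose the two mismatch directions reported above.
documented_but_unused = sorted(documented - actual)
used_but_undocumented = sorted(actual - documented)

print("documented but unused:", documented_but_unused)
print("used but undocumented:", used_but_undocumented)
```

Running this reproduces both lists in this report, which suggests a small test like it could guard the docstrings going forward.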