huggingface / text-generation-inference Public

Issues: huggingface/text-generation-inference

206 Open 1,248 Closed

Issues list

VRAM usage increases in version 3.1.0

#3038 opened Feb 19, 2025 by aW3st

2 of 4 tasks

TGI metrics don't have model_name label to indicate which model the metrics belong to

#3026 opened Feb 17, 2025 by yashaswipiplani

NIX: text_generation_launcher::gpu: Cannot determine GPU compute capability: ImportError: libffi.so.8

#3025 opened Feb 14, 2025 by celsowm

1 of 4 tasks

Unsupported model type xlm-roberta

#3020 opened Feb 13, 2025 by elvizlai

2 of 4 tasks

Warmup fails with Google Flan T5 models

#3019 opened Feb 13, 2025 by TomerG711

2 of 4 tasks

WARN text_generation_launcher: Unkown compute for card nvidia-geforce-rtx-3090

#3014 opened Feb 11, 2025 by bmilesp

Resource underutilization, thread thrashing: CPU affinity ignores allowed CPUs and cannot be switched off

#3011 opened Feb 11, 2025 by askervin

3 of 4 tasks

Warmup fails for Qwen 2 VL on AMD

#3009 opened Feb 11, 2025 by almersawi

1 of 4 tasks

Quantized BNB-4bit models are not working.

#3005 opened Feb 10, 2025 by v3ss0n

2 of 4 tasks

Nonsense responses with n-gram speculative decoding

#2997 opened Feb 6, 2025 by olliestanley

1 of 4 tasks

Request failed during generation: Server error: Value out of range: -29146814772

#2994 opened Feb 5, 2025 by AlperYildirim1

2 of 4 tasks

Mistral Small 3 : chat template with python functions causes error

#2987 opened Feb 3, 2025 by v3ss0n

2 tasks done

Function/tool calling never resolves

#2986 opened Feb 2, 2025 by awmartin

2 of 4 tasks

Error: "new batch size should not exceed padded batch size" when running latest Docker container and sending multiple requests simultaneously

#2985 opened Feb 2, 2025 by BradyBonnette

2 of 4 tasks

Default TGI Inference parameter values

#2978 opened Jan 31, 2025 by ashwincv0112

2 of 4 tasks

no prefill when decoder_input_details=True from InferenceClient

#2973 opened Jan 30, 2025 by lifeng-jin

2 of 4 tasks

Incorrect Tokenization Likely Because of Diacritics in OpenChat and LLaMA 3.2 (TGI v3.0.2 and v2.2.0)

#2969 opened Jan 30, 2025 by biba10

2 of 4 tasks

Structured output doesn't work with open ai endpoint

#2959 opened Jan 27, 2025 by Stealthwriter

2 of 4 tasks

How to give custom model code for TGI to run.

#2956 opened Jan 27, 2025 by ashwani-bhat

Running Qwen2-VL-2B-Instruct on TGI is giving an error

#2955 opened Jan 27, 2025 by ashwani-bhat

2 of 4 tasks

CUDA Out of memory when using the benchmarking tool with batch size greater than 1

#2952 opened Jan 24, 2025 by mborisov-bi

3 of 4 tasks

Serverless Inference API OpenAI /v1/chat/completions route broken

#2946 opened Jan 23, 2025 by pelikhan

1 of 4 tasks

RuntimeError: Cannot load 'awq' weight when running Qwen2-VL-72B-Instruct-AWQ model

#2944 opened Jan 23, 2025 by edesalve

2 of 4 tasks

Allow specifying adapter_id on chat/completions requests

#2939 opened Jan 22, 2025 by tsvisab

Remove Conda from Docker Installation

#2934 opened Jan 21, 2025 by rjmehta1993

2 of 4 tasks
