v1.9

Released by @oobabooga on 05 Jul 03:24

Backend updates

  • 4-bit and 8-bit KV cache options have been added to llama.cpp and llamacpp_HF. They reuse the existing --cache_8bit and --cache_4bit flags (see the first sketch after this list). Thanks @GodEmperor785 for figuring out what values to pass to llama-cpp-python.
  • Transformers:
    • Add eager attention option to make Gemma-2 work correctly (#6188). Thanks @GralchemOz.
    • Automatically detect whether to use bfloat16 or float16 when loading models in 16-bit precision.
    • Automatically apply eager attention to models with Gemma2ForCausalLM architecture.
    • Gemma-2 support: the two changes above automatically detect and apply the optimal settings for this model, so there is no need to set --bf16 --use_eager_attention manually (see the second sketch after this list).
  • Automatically obtain the EOT token from the model's Jinja2 chat template and add it to the stopping strings, fixing Llama-3-Instruct not stopping. There is no need to add <|eot_id|> to the custom stopping strings anymore (see the third sketch after this list).
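
For reference, here is a minimal sketch of what the KV cache flags translate to on the llama-cpp-python side, assuming the GGML_TYPE_* constants exported by the llama_cpp module (the model path is a placeholder):

```python
import llama_cpp
from llama_cpp import Llama

llm = Llama(
    model_path="model.gguf",          # placeholder path
    n_ctx=8192,
    flash_attn=True,                  # llama.cpp requires flash attention to quantize the V cache
    type_k=llama_cpp.GGML_TYPE_Q8_0,  # --cache_8bit; use GGML_TYPE_Q4_0 for --cache_4bit
    type_v=llama_cpp.GGML_TYPE_Q8_0,
)
```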
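The two automatic Transformers behaviors can be illustrated with the public from_pretrained API. The detection heuristic below is a sketch, not the repository's exact code, and the model name is only an example:

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "google/gemma-2-9b-it"  # example model
config = AutoConfig.from_pretrained(model_id)

# 1) Pick bfloat16 or float16 based on the dtype declared in the checkpoint's config.
dtype = torch.bfloat16 if config.torch_dtype == torch.bfloat16 else torch.float16

# 2) Force eager attention for the Gemma2ForCausalLM architecture.
attn = "eager" if "Gemma2ForCausalLM" in (config.architectures or []) else "sdpa"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=dtype,
    attn_implementation=attn,
)
```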
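The idea behind the EOT detection can be sketched as follows: render the chat template around a sentinel assistant message and treat whatever the template appends after the message body as a stopping string. The template fragment below is an assumed example modeled on Llama-3-Instruct, not the repository's exact code:

```python
from jinja2 import Template

# Assumed fragment of a Llama-3-Instruct-style chat template.
chat_template = (
    "{% for m in messages %}"
    "<|start_header_id|>{{ m['role'] }}<|end_header_id|>\n\n"
    "{{ m['content'] }}<|eot_id|>"
    "{% endfor %}"
)

SENTINEL = "\x00content\x00"
rendered = Template(chat_template).render(
    messages=[{"role": "assistant", "content": SENTINEL}]
)

# Everything the template appends after the sentinel is the end-of-turn marker.
eot = rendered.split(SENTINEL, 1)[1]
print(eot)  # <|eot_id|>
```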

UI updates

  • Whisper STT overhaul: this extension has been rewritten, replacing the Gradio microphone component with a custom microphone element that is much more reliable (#6194). Thanks @RandomInternetPreson, @TimStrauven, and @mamei16.
  • Make the character dropdown menu appear in both the "Chat" tab and the "Parameters > Character" tab, after feedback that moving it entirely to the Chat tab made it harder to edit characters.
  • Colors in the light theme have been improved, making it a bit more visually appealing.
  • Increase the chat area on mobile devices.

Bug fixes

  • Fix the API request to AUTOMATIC1111 in the sd-api-pictures extension.
  • Fix a glitch that occurred when switching tabs while "Show controls" was unchecked in the Chat tab and extensions were loaded.

Library updates

  • llama-cpp-python: bump to 0.2.81 (adds Gemma-2 support).
  • Transformers: bump to 4.42 (adds Gemma-2 support).

Support