v1.9

Released by @oobabooga on 05 Jul 03:24

Backend updates

  • 4-bit and 8-bit KV cache options have been added to llama.cpp and llamacpp_HF. They reuse the existing --cache_8bit and --cache_4bit flags (see the first sketch after this list). Thanks @GodEmperor785 for figuring out what values to pass to llama-cpp-python.
  • Transformers:
    • Add eager attention option to make Gemma-2 work correctly (#6188). Thanks @GralchemOz.
    • Automatically detect whether to use bfloat16 or float16 when loading models in 16-bit precision.
    • Automatically apply eager attention to models with Gemma2ForCausalLM architecture.
    • Gemma-2 support: the two changes above automatically detect and apply the optimal settings for this model, so there is no need to set --bf16 --use_eager_attention manually (see the second sketch after this list).
  • Automatically obtain the EOT token from the model's Jinja2 chat template and add it to the stopping strings, fixing Llama-3-Instruct not stopping. There is no need to add <|eot_id|> to the custom stopping strings anymore (see the third sketch after this list).
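
For reference, here is a minimal sketch of what the KV cache flags translate to on the llama-cpp-python side, assuming the GGML_TYPE_* constants exported by the llama_cpp module (the model path is a placeholder):

```python
import llama_cpp
from llama_cpp import Llama

llm = Llama(
    model_path="model.gguf",          # placeholder path
    n_ctx=8192,
    flash_attn=True,                  # llama.cpp requires flash attention to quantize the V cache
    type_k=llama_cpp.GGML_TYPE_Q8_0,  # --cache_8bit; use GGML_TYPE_Q4_0 for --cache_4bit
    type_v=llama_cpp.GGML_TYPE_Q8_0,
)
```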
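The two automatic Transformers behaviors can be illustrated with the public from_pretrained API. The detection heuristic below is a sketch, not the repository's exact code, and the model name is only an example:

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "google/gemma-2-9b-it"  # example model
config = AutoConfig.from_pretrained(model_id)

# 1) Pick bfloat16 or float16 based on the dtype declared in the checkpoint's config.
dtype = torch.bfloat16 if config.torch_dtype == torch.bfloat16 else torch.float16

# 2) Force eager attention for the Gemma2ForCausalLM architecture.
attn = "eager" if "Gemma2ForCausalLM" in (config.architectures or []) else "sdpa"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=dtype,
    attn_implementation=attn,
)
```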
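The idea behind the EOT detection can be sketched as follows: render the chat template around a sentinel assistant message and treat whatever the template appends after the message body as a stopping string. The template fragment below is an assumed example modeled on Llama-3-Instruct, not the repository's exact code:

```python
from jinja2 import Template

# Assumed fragment of a Llama-3-Instruct-style chat template.
chat_template = (
    "{% for m in messages %}"
    "<|start_header_id|>{{ m['role'] }}<|end_header_id|>\n\n"
    "{{ m['content'] }}<|eot_id|>"
    "{% endfor %}"
)

SENTINEL = "\x00content\x00"
rendered = Template(chat_template).render(
    messages=[{"role": "assistant", "content": SENTINEL}]
)

# Everything the template appends after the sentinel is the end-of-turn marker.
eot = rendered.split(SENTINEL, 1)[1]
print(eot)  # <|eot_id|>
```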

UI updates

  • Whisper STT overhaul: this extension has been rewritten, replacing the Gradio microphone component with a custom microphone element that is much more reliable (#6194). Thanks @RandomInternetPreson, @TimStrauven, and @mamei16.
  • Make the character dropdown menu appear in both the "Chat" tab and the "Parameters > Character" tab, after feedback that moving it entirely to the Chat tab made it harder to edit characters.
  • Colors in the light theme have been improved, making it a bit more visually appealing.
  • Increase the chat area on mobile devices.

Bug fixes

  • Fix the API request to AUTOMATIC1111 in the sd-api-pictures extension.
  • Fix a glitch that occurred when switching tabs while "Show controls" was unchecked in the Chat tab and extensions were loaded.

Library updates

  • llama-cpp-python: bump to 0.2.81 (adds Gemma-2 support).
  • Transformers: bump to 4.42 (adds Gemma-2 support).

Support