
v1.8

oobabooga released this 27 Jun 02:38

Releases with version numbers are back! The last one was v1.7 on October 8th, 2023, so I am calling this one v1.8.

From this release on, it will be possible to install past releases by downloading the .zip source and running the start_ script in it. The installation script no longer updates to the latest version automatically. This doesn't apply to snapshots/releases before this one.
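As an illustration, here is a minimal sketch of that workflow, assuming GitHub's standard source-archive URL pattern (the extracted folder name and the start_ script for your OS may differ):

```python
# Hedged sketch: fetch and extract the v1.8 source archive with the standard
# library, then run the start_ script for your OS from the extracted folder.
import io
import urllib.request
import zipfile

TAG = "v1.8"  # any tag from this release onward
URL = f"https://github.com/oobabooga/text-generation-webui/archive/refs/tags/{TAG}.zip"

with urllib.request.urlopen(URL) as response:
    zipfile.ZipFile(io.BytesIO(response.read())).extractall(".")

# Inside the extracted folder, run start_windows.bat, start_linux.sh,
# or start_macos.sh as usual.
```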

New backend

UI updates

  • Improved "past chats" menu: this menu is now a vertical list of text items instead of a dropdown menu, making it a lot easier to switch between past conversations. Only one click is required instead of two.
  • Store the chat history in the browser: if you restart the server without refreshing the browser, your conversation will no longer be accidentally erased.
  • Avoid some unnecessary calls to the backend, making the UI faster and more responsive.
  • Move the "Character" dropdown menu to the main Chat tab to make it faster to switch between different characters.
  • Change limits of RoPE scaling sliders in UI (#6142). Thanks @GodEmperor785.
  • Do not expose "alpha_value" for llama.cpp and "rope_freq_base" for transformers, to keep things simple and avoid conversions between the two (see the sketch after this list).
  • Remove an obsolete info message intended for GPTQ-for-LLaMa.
  • Remove the "Tab" shortcut to switch between the generation tabs and the "Parameter" tabs, as it was awkward.
  • Improved streaming of lists, which would sometimes flicker and temporarily display horizontal lines.
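On the "alpha_value"/"rope_freq_base" point above: the two values are related by the NTK-aware RoPE scaling rule, so exposing only one per loader avoids converting back and forth. Below is a minimal sketch of that conversion, assuming the common defaults of a base of 10000 and a head dimension of 128 (hence the 64/63 exponent); treat the exact exponent as an assumption rather than a guarantee.

```python
# Hedged sketch of the alpha_value <-> rope_freq_base relationship
# (assumptions: original rope_freq_base of 10000, head dimension 128).
def alpha_to_rope_freq_base(alpha_value: float, base: float = 10000.0) -> float:
    return base * alpha_value ** (64 / 63)

def rope_freq_base_to_alpha(rope_freq_base: float, base: float = 10000.0) -> float:
    return (rope_freq_base / base) ** (63 / 64)

print(alpha_to_rope_freq_base(2.5))    # ~25365 under these assumptions
print(rope_freq_base_to_alpha(25365))  # ~2.5
```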

Bug fixes

  • Revert the reentrant generation lock to a simple lock, fixing an issue caused by the change.
  • Fix GGUFs with no BOS token present, mainly qwen2 models (#6119). Thanks @Ph0rk0z.
  • Fix "500 error" issue caused by block_requests.py (#5976). Thanks @nero-dv.
  • Set a default alpha_value and fix loading of some newer DeepSeekCoder GGUFs (#6111). Thanks @mefich.

Library updates

  • llama-cpp-python: bump to 0.2.79 (after a month of wrestling with GitHub Actions).
  • ExLlamaV2: bump to 0.1.6.
  • flash-attention: bump to 2.5.9.post1.
  • PyTorch: bump to 2.2.2. That's the last 2.2 patch version.
  • HQQ: bump to 0.1.7.post3. Makes HQQ functional again.

Other updates

  • Do not "git pull" during installation, allowing previous releases (from this one on) to be installed.
  • Make logs more readable, no more \u7f16\u7801 (#6127). Thanks @Touch-Night.
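For context on that last item: escapes like \u7f16\u7801 are what json.dumps produces for non-ASCII text by default, so a minimal sketch of the general remedy looks like the snippet below (this assumes the log lines came from JSON serialization; the actual change is in #6127).

```python
# json.dumps escapes non-ASCII characters unless ensure_ascii=False is passed,
# which is how strings such as \u7f16\u7801 (编码) end up in logs.
import json

payload = {"prompt": "编码"}
print(json.dumps(payload))                      # {"prompt": "\u7f16\u7801"}
print(json.dumps(payload, ensure_ascii=False))  # {"prompt": "编码"}
```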

Support this project