
v1.8

oobabooga released this 27 Jun 02:38

Releases with version numbers are back! The last one was v1.7 on October 8th, 2023, so I am calling this one v1.8.

From this release on, it will be possible to install past releases by downloading the .zip source and running the start_ script in it. The installation script no longer updates to the latest version automatically. This doesn't apply to snapshots/releases before this one.
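As an illustration, here is a minimal sketch of that workflow, assuming GitHub's standard source-archive URL pattern (the extracted folder name and the start_ script for your OS may differ):

```python
# Hedged sketch: fetch and extract the v1.8 source archive with the standard
# library, then run the start_ script for your OS from the extracted folder.
import io
import urllib.request
import zipfile

TAG = "v1.8"  # any tag from this release onward
URL = f"https://github.com/oobabooga/text-generation-webui/archive/refs/tags/{TAG}.zip"

with urllib.request.urlopen(URL) as response:
    zipfile.ZipFile(io.BytesIO(response.read())).extractall(".")

# Inside the extracted folder, run start_windows.bat, start_linux.sh,
# or start_macos.sh as usual.
```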

New backend

UI updates

  • Improved "past chats" menu: this menu is now a vertical list of text items instead of a dropdown menu, making it a lot easier to switch between past conversations. Only one click is required instead of two.
  • Store the chat history in the browser: if you restart the server without refreshing the browser, your conversation will no longer be accidentally erased.
  • Avoid some unnecessary calls to the backend, making the UI faster and more responsive.
  • Move the "Character" dropdown menu to the main Chat tab to make it faster to switch between different characters.
  • Change limits of RoPE scaling sliders in UI (#6142). Thanks @GodEmperor785.
  • Do not expose "alpha_value" for llama.cpp and "rope_freq_base" for transformers, to keep things simple and avoid conversions between the two (see the sketch after this list).
  • Remove an obsolete info message intended for GPTQ-for-LLaMa.
  • Remove the "Tab" shortcut to switch between the generation tabs and the "Parameter" tabs, as it was awkward.
  • Improved streaming of lists, which would sometimes flicker and temporarily display horizontal lines.
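On the "alpha_value"/"rope_freq_base" point above: the two values are related by the NTK-aware RoPE scaling rule, so exposing only one per loader avoids converting back and forth. Below is a minimal sketch of that conversion, assuming the common defaults of a base of 10000 and a head dimension of 128 (hence the 64/63 exponent); treat the exact exponent as an assumption rather than a guarantee.

```python
# Hedged sketch of the alpha_value <-> rope_freq_base relationship
# (assumptions: original rope_freq_base of 10000, head dimension 128).
def alpha_to_rope_freq_base(alpha_value: float, base: float = 10000.0) -> float:
    return base * alpha_value ** (64 / 63)

def rope_freq_base_to_alpha(rope_freq_base: float, base: float = 10000.0) -> float:
    return (rope_freq_base / base) ** (63 / 64)

print(alpha_to_rope_freq_base(2.5))    # ~25365 under these assumptions
print(rope_freq_base_to_alpha(25365))  # ~2.5
```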

Bug fixes

  • Revert the reentrant generation lock to a simple lock, fixing an issue caused by the change.
  • Fix GGUFs with no BOS token present, mainly qwen2 models (#6119). Thanks @Ph0rk0z.
  • Fix "500 error" issue caused by block_requests.py (#5976). Thanks @nero-dv.
  • Set a default alpha_value and fix loading of some newer DeepSeekCoder GGUFs (#6111). Thanks @mefich.

Library updates

  • llama-cpp-python: bump to 0.2.79 (after a month of wrestling with GitHub Actions).
  • ExLlamaV2: bump to 0.1.6.
  • flash-attention: bump to 2.5.9.post1.
  • PyTorch: bump to 2.2.2. That's the last 2.2 patch version.
  • HQQ: bump to 0.1.7.post3. Makes HQQ functional again.

Other updates

  • Do not "git pull" during installation, allowing previous releases (from this one on) to be installed.
  • Make logs more readable, no more \u7f16\u7801 (#6127). Thanks @Touch-Night.
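For context on that last item: escapes like \u7f16\u7801 are what json.dumps produces for non-ASCII text by default, so a minimal sketch of the general remedy looks like the snippet below (this assumes the log lines came from JSON serialization; the actual change is in #6127).

```python
# json.dumps escapes non-ASCII characters unless ensure_ascii=False is passed,
# which is how strings such as \u7f16\u7801 (编码) end up in logs.
import json

payload = {"prompt": "编码"}
print(json.dumps(payload))                      # {"prompt": "\u7f16\u7801"}
print(json.dumps(payload, ensure_ascii=False))  # {"prompt": "编码"}
```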

Support this project