0.8.0

CheshireCC released this 03 Jun 14:49
· 1 commit to main since this release

HASH

  • CRC32:13530B56
  • MD5:61ED9AF5F712A27DA32EE34A8E88A5A0
  • SHA-1:B415E721D6C54D5E835FD5BD4AD870DA985644AC

0.8.0 Changes

  • Fixed the bug of missing sponsorship channels #126

  • Upgraded faster-whisper to version 1.0.2

    • Added online mode support for the [distil-large-v3](https://huggingface.co/distil-whisper/distil-large-v3-ct2) model #130

      • The latest Distil-Whisper model, distil-large-v3, is intrinsically designed to work with the OpenAI sequential algorithm.
    • Added support for more whisper model parameters

      • Audio segmentation settings

        • max_new_tokens: Maximum number of new tokens to generate per chunk. If not set, the maximum is determined by the default max_length.

        • chunk_length: The length of audio segments. If it is not None, it overrides the default chunk_length of the FeatureExtractor.

        • clip_timestamps: Comma-separated list of start,end,start,end,... timestamps (in seconds) of the clips to process. The last end timestamp defaults to the end of the file. vad_filter is ignored if clip_timestamps is used.

      • Hallucination parameters

        • hallucination_silence_threshold: When word_timestamps is True, skip silent periods longer than this threshold (in seconds) when a possible hallucination is detected.
      • Other settings

        • hotwords: Hotwords/hint phrases to provide to the model. Has no effect if prefix is not None. You can enter a hint such as "the video is about comfyUI".
      • General

        • language_detection_threshold: If the maximum probability of the language tokens is higher than this value, that language is detected.

        • language_detection_segments: Number of segments to consider for language detection.

      • Other new features: https://github.com/SYSTRAN/faster-whisper/releases/tag/v1.0.2

  • Fixed a bug in the copy subtitles feature

  • Updated some UI text

  • Disabled the save parameters and load parameters functions on the transcription parameters page

  • Centered the start/end time and speaker columns

  • Upgraded pytorch to 2.3.0 with CUDA 12

Tips

  • Completely uninstall the old version before installing the new one (the cache folder does not need to be cleaned).
  • ffmpeg must be installed.
  • When using the V3 model, if GPU memory overflows frequently, try updating the graphics card driver to the latest version or rolling back to the previous stable version. Test results with the current version (2024.5.29) are stable.
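
For readers who call faster-whisper directly rather than through the GUI, the parameters listed in the changes above map to keyword arguments of `WhisperModel.transcribe` in faster-whisper 1.0.2. The sketch below is only an illustration under stated assumptions: the helper names `build_transcribe_options` and `transcribe_file` are hypothetical, the `ValueError` check is this helper's own choice (the library does not raise it), and nothing here is taken from the GUI's source code.

```python
from typing import Any, Optional, Sequence


def build_transcribe_options(
    *,
    word_timestamps: bool = False,
    max_new_tokens: Optional[int] = None,
    chunk_length: Optional[int] = None,
    clip_timestamps: Optional[Sequence[float]] = None,
    hallucination_silence_threshold: Optional[float] = None,
    hotwords: Optional[str] = None,
    language_detection_threshold: Optional[float] = None,
    language_detection_segments: int = 1,
) -> dict[str, Any]:
    """Collect the options from the release notes into keyword arguments.

    Options left as None are omitted so the library defaults apply.
    """
    if hallucination_silence_threshold is not None and not word_timestamps:
        # Helper-level check (an assumption of this sketch): the notes describe
        # this option as applying when word_timestamps is True.
        raise ValueError("hallucination_silence_threshold needs word_timestamps=True")

    options: dict[str, Any] = {
        "word_timestamps": word_timestamps,
        "max_new_tokens": max_new_tokens,
        "chunk_length": chunk_length,
        "hallucination_silence_threshold": hallucination_silence_threshold,
        "hotwords": hotwords,
        "language_detection_threshold": language_detection_threshold,
        "language_detection_segments": language_detection_segments,
    }
    if clip_timestamps is not None:
        # Encode as the comma-separated "start,end,start,end,..." form (seconds).
        options["clip_timestamps"] = ",".join(str(t) for t in clip_timestamps)
    return {key: value for key, value in options.items() if value is not None}


def transcribe_file(audio_path: str, options: dict[str, Any]) -> list[tuple[float, float, str]]:
    """Run a transcription with the given options (requires faster-whisper)."""
    # Imported lazily so the helper above works without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel("distil-large-v3")
    segments, _info = model.transcribe(audio_path, **options)
    return [(segment.start, segment.end, segment.text) for segment in segments]


options = build_transcribe_options(
    word_timestamps=True,
    hallucination_silence_threshold=2.0,
    hotwords="the video is about comfyUI",
    clip_timestamps=[0, 12.5, 30],  # second clip runs from 30 s to the end of the file
)
```

Calling `transcribe_file("audio.mp3", options)` (with a hypothetical file name) would then return `(start, end, text)` tuples. Because `clip_timestamps` is set in this example, any `vad_filter` setting would be ignored, as noted in the parameter description.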