
Releases: CheshireCC/faster-whisper-GUI

0.8.0

03 Jun 14:49

Hash

  • CRC32: 13530B56
  • MD5: 61ED9AF5F712A27DA32EE34A8E88A5A0
  • SHA-1: B415E721D6C54D5E835FD5BD4AD870DA985644AC

0.8.0 Changes

  • Fixed a bug where no sponsorship channels were available #126

  • Upgraded faster-whisper to version 1.0.2

    • Added online-mode support for the [distil-large-v3](https://huggingface.co/distil-whisper/distil-large-v3-ct2) model #130

      • The latest Distil-Whisper model, distil-large-v3, is designed to work with OpenAI's sequential algorithm.
    • Support for initializing more Whisper model parameters

      • Audio segmentation settings

        • max_new_tokens: maximum number of new tokens to generate per chunk. If not set, the maximum is determined by the default max_length.

        • chunk_length: the length of audio segments. If it is not None, it overrides the default chunk_length of the FeatureExtractor.

        • clip_timestamps: comma-separated list of timestamps (in seconds) of the clips to process, in the form start,end,start,end,... The last end timestamp defaults to the end of the file. VAD settings (vad_filter) are ignored if clip_timestamps is used.

      • Hallucination parameters

        • hallucination_silence_threshold: when word_timestamps is True, skip silent periods longer than this threshold (in seconds) when a possible hallucination is detected.
      • Other settings

        • hotwords: hotwords/hint phrases to provide to the model. Has no effect if prefix is not None. You can enter a hint such as "the video is about comfyUI".
      • General

        • language_detection_threshold: if the maximum probability of the language tokens is higher than this value, that language is detected.

        • language_detection_segments: number of segments to consider for language detection.

      • Other new features: https://github.com/SYSTRAN/faster-whisper/releases/tag/v1.0.2

  • Fixed a bug in the Copy Subtitles function

  • Updated some UI text

  • Disabled the Save Parameters and Load Parameters functions on the Transcription Parameters page

  • The Start/End Time and Speaker columns are now center-aligned

  • Upgraded PyTorch to 2.3.0 with CUDA 12
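The sketch below shows how a `clip_timestamps` string of the form described above maps to clips. It is not faster-whisper's own code: in faster-whisper the value is simply passed as `clip_timestamps` to `WhisperModel.transcribe`. The helper name `parse_clip_timestamps` is hypothetical, and clamping an end that runs past the file length is an assumption.

```python
def parse_clip_timestamps(spec: str, duration: float) -> list[tuple[float, float]]:
    """Parse "start,end,start,end,..." (seconds) into (start, end) clips.

    Hypothetical helper: an unpaired final start is closed at `duration`,
    matching the note that the last end timestamp defaults to the end of the file.
    """
    values = [float(v) for v in spec.split(",") if v.strip()]
    if len(values) % 2 == 1:
        values.append(duration)  # last end defaults to the end of the file
    clips = []
    for start, end in zip(values[0::2], values[1::2]):
        end = min(end, duration)  # assumption: clamp ends that run past the file
        if start < 0 or start >= end:
            raise ValueError(f"invalid clip {start}-{end}")
        clips.append((start, end))
    return clips


print(parse_clip_timestamps("0,30,60", 120.0))  # [(0.0, 30.0), (60.0, 120.0)]
print(parse_clip_timestamps("0", 95.5))         # [(0.0, 95.5)]
```

A single start value such as "0" produces one clip that covers the whole file.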

Tips

  • Completely uninstall the old version before installing the new one (the cache folder does not need to be cleaned).
  • ffmpeg must be installed.
  • When using the V3 model, if GPU memory overflows frequently, try updating the graphics card driver to the latest version or rolling back to the previous stable version. With the current version (2024.5.29), test results are stable.

0.7.6

17 Apr 04:24

0.7.6 Changes

  • Fixed a crash after transcription finished #111

  • Fixed a bug where tables could not be closed after multiple subtitles were added manually

  • Demucs: added combined output of all non-vocal tracks #110

    • Added two-way output: a vocals track and an "other" track
  • Updated subtitle display and editing functions

    • Subtitle editing: added batch advancing/delaying of timestamps

    • Subtitle table display:

      • Timestamps whose duration is too short are now highlighted with a background color

      • Speaker is shown as a separate column in the table
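The batch timestamp shift and the short-duration highlight can be sketched as follows. This illustration is not the GUI's code. The `Segment` type and both function names are hypothetical, and times are clamped at 0 when advancing. The 0.5-second cutoff for "too short" is an assumed default, because the notes do not give one.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def shift_segments(segments: list[Segment], offset: float) -> list[Segment]:
    """Advance (negative offset) or delay (positive offset) every timestamp.

    Assumption: times are clamped at 0 so a large advance cannot go negative.
    """
    return [
        replace(s, start=max(0.0, s.start + offset), end=max(0.0, s.end + offset))
        for s in segments
    ]


def too_short(segments: list[Segment], min_duration: float = 0.5) -> list[int]:
    """Return the row indices whose duration is below `min_duration` seconds."""
    return [i for i, s in enumerate(segments) if s.end - s.start < min_duration]


subs = [Segment(1.0, 3.0, "Hello"), Segment(3.0, 3.2, "uh"), Segment(4.0, 6.5, "World")]
print(shift_segments(subs, -1.5)[0])  # Segment(start=0.0, end=1.5, text='Hello')
print(too_short(subs))                # [1]
```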

Hash

  • SHA-1 : 37C8C46BE3D297AD06FA4C887A69E2FB46CB49AB
  • MD5 : 09681381F2AF06749BB70030A411DFB6
  • CRC32 : D6FA10B2

Tips

  • Completely uninstall the old version before installing the new one (the cache folder does not need to be cleaned).
  • ffmpeg must be installed.
  • When using the V3 model, if GPU memory overflows, try turning off word-level timestamps. If it still overflows, change the quantization type to float16 or int8.

0.7.2

29 Mar 05:08

0.7.2 Changes

  • Fixed incomplete UI translation #106

  • Fixed a logic bug when adding tables

  • Fixed a bug where whisperX could not process files in batches

  • Reduced the installer package size

Tips

  • Completely uninstall the old version before installing the new one (the cache folder does not need to be cleaned).
  • ffmpeg must be installed.
  • The software may occasionally crash after transcription finishes. If this happens, turn off the automatic jump to the results page, wait a moment after transcription finishes, and then click to jump to the results page.

0.7.0

19 Mar 18:23

0.7.0 Changes

  • JSON subtitle support
    • Subtitles and word-level timestamps can now be saved in JSON format
  • ASS subtitle support
    • Subtitles can be exported as .ass files following the SSA v4.00+ standard
  • Reading JSON subtitle files
    • JSON is the preferred format when subtitles are loaded automatically
    • Subtitles and word-level timestamps can be read from existing JSON files
  • Fixed a bug where closing a tab did not delete its table
  • Fixed several bugs in the SMI subtitle format
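In the ASS output described above (SSA v4.00+), each subtitle becomes a `Dialogue:` line in the `[Events]` section, and timestamps use the `H:MM:SS.cc` form (centiseconds). The sketch below covers only that mapping, and the helper names are hypothetical. It assumes the script's `[V4+ Styles]` section defines a style called `Default`. Putting the speaker in the `Name` field is also an assumption, not something the release notes state.

```python
def ass_time(seconds: float) -> str:
    """Format seconds as an ASS timestamp H:MM:SS.cc (centisecond precision)."""
    cs = round(seconds * 100)
    h, rem = divmod(cs, 360_000)
    m, rem = divmod(rem, 6_000)
    s, cc = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{cc:02d}"


def ass_dialogue(start: float, end: float, text: str,
                 speaker: str = "", style: str = "Default") -> str:
    """Build one [Events] line in the v4.00+ field order:
    Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
    """
    text = text.replace("\n", r"\N")  # ASS uses a literal backslash-N as a hard line break
    return f"Dialogue: 0,{ass_time(start)},{ass_time(end)},{style},{speaker},0,0,0,,{text}"


print(ass_time(3725.456))  # 1:02:05.46
print(ass_dialogue(1.0, 2.5, "Hello, world", speaker="SPEAKER_00"))
# Dialogue: 0,0:00:01.00,0:00:02.50,Default,SPEAKER_00,0,0,0,,Hello, world
```

Commas inside the text need no escaping, because Text is the last field of the line.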

Tips

  • Completely uninstall the old version before installing the new one (the cache folder does not need to be cleaned).
  • ffmpeg must be installed.
  • The software may occasionally crash after transcription finishes. If this happens, turn off the automatic jump to the results page, wait a moment after transcription finishes, and then click to jump to the results page.

0.6.7

07 Mar 09:31

0.6.7 Changes

  • Added aggregation of subtitle content by speaker #82

    • Added related settings to the Parameters page

    • When exporting subtitles in txt format, the speech of the same speaker can be grouped together in order

  • Data annotation function #78

    • Outputs annotation information to a CSV file in the format vocal_path,speaker_name,language,text
  • Fixed a bug caused by spaces in file paths #71

    • Temporarily fixed the problem where directory names containing spaces were forcibly stripped from paths
  • Fixed a bug in the whisperX parameters

    • Fixed abnormal handling of the min_speakers and max_speakers parameters
  • Further improved subtitle editing

    • Added a right-click menu option to change the speaker of multiple rows at once
    • Added a right-click menu option to merge subtitle lines
  • Simplified/Traditional Chinese conversion #77

    • Added the options Simplified Chinese (zhs) and Traditional Chinese (zht) to the Language parameter

    • Text is converted to Simplified or Traditional Chinese automatically after transcription finishes

    • Existing subtitle files are converted automatically when opened

  • Fixed a table column-width bug

    • Corrected the cell column-width logic

    • Corrected adaptive column width
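The speaker aggregation and the annotation CSV can be sketched with the standard library. The sketch reads "in order" as merging consecutive runs of the same speaker. Grouping every line of a speaker regardless of position would need a dict instead. Other assumptions: segments are plain `(speaker, text)` tuples, and no CSV header row is written, because the notes give only the column order. The function names are hypothetical.

```python
import csv
import io
from itertools import groupby


def aggregate_by_speaker(segments: list[tuple[str, str]], sep: str = " ") -> list[tuple[str, str]]:
    """Merge consecutive (speaker, text) segments that share a speaker, keeping their order."""
    return [
        (speaker, sep.join(text for _, text in group))
        for speaker, group in groupby(segments, key=lambda seg: seg[0])
    ]


def annotation_csv(rows: list[tuple[str, str, str, str]]) -> str:
    """Serialize (vocal_path, speaker_name, language, text) rows as CSV text.

    Fields containing commas or quotes are quoted by the csv module; no header row is written.
    """
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


segs = [("A", "Hi."), ("A", "How are you?"), ("B", "Fine."), ("A", "Good.")]
print(aggregate_by_speaker(segs))
# [('A', 'Hi. How are you?'), ('B', 'Fine.'), ('A', 'Good.')]
```

For Chinese text, pass `sep=""` so that merged lines are not joined with spaces.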

Tips

  • Completely uninstall the old version before installing the new one (the cache folder does not need to be cleaned).

0.6.0

03 Mar 16:37

0.6.0 Changes

  • Added a Cantonese model for whisperX

    • In the Whisper model parameters, select yue for "Language" to transcribe Cantonese. The output is colloquial spoken Cantonese, not standard written Chinese.

    • whisperX can now align timestamps in Cantonese mode.

  • Fixed recognition failures for audio and video files that contain subtitle tracks #90

  • Added support for the distil models

    • Upgraded the faster-whisper backend to version 1.0.1
    • Upgraded CTranslate2 to version 4.0.0
  • Upgraded the PyTorch and CUDA engines

    • Upgraded PyTorch to 2.2.1

    • Upgraded CUDA support to 12.1

  • Updated the automatic closing behavior of pop-up windows

    • The "success" pop-up now closes automatically after 5 seconds.

Tips

  • The distil models currently support English output only.

0.5.7

11 Jan 10:20

0.5.7 Changes

  • Fixed a bug where the current transcription result was not updated when a table was closed

  • File list updates #66

    • Added reading file names from the clipboard and pasting them into the file list
    • Added one-click clearing of the file list
    • Improved the logic for removing files when multiple files are selected in the file list
    • Drag and drop now supports folders
    • Drag and drop now recurses into subfolders
  • Added manual export and import of the configuration

  • Added scrolling to the Settings page

  • Fixed a problem where, during repeated transcription, files with the same name but different paths caused tables to be overwritten and fail to be added #61

  • Fixed online download of the V3 model

    • Upgraded faster-whisper to 0.10.0
  • Fixed a bug where word-level timestamps used too much GPU memory, causing slowdowns or even crashes

    • CTranslate2 has been upgraded to the latest version. If the problem persists, please update your graphics card driver.
  • Added the ability to change the theme color

  • Fixed (again) a bug where the audio stream of some audio/video files could not be recognized
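Recursive folder drag-and-drop amounts to expanding each dropped path into the media files beneath it. The sketch below is hypothetical, including the function name and the extension list. It keys files by their resolved full path rather than by file name, so same-named files in different folders stay distinct. That distinction is the situation behind #61.

```python
from pathlib import Path

# Hypothetical extension list; the GUI's actual accepted formats are not stated here.
MEDIA_EXTS = {".wav", ".mp3", ".flac", ".m4a", ".mp4", ".mkv"}


def expand_dropped_paths(paths: list[str], exts: set[str] = MEDIA_EXTS) -> list[str]:
    """Expand dropped files and folders (recursing into subfolders) into a
    sorted, de-duplicated list of absolute media file paths."""
    found: set[Path] = set()
    for p in map(Path, paths):
        candidates = p.rglob("*") if p.is_dir() else [p]
        for f in candidates:
            if f.is_file() and f.suffix.lower() in exts:
                found.add(f.resolve())  # full path, not just the name, identifies a file
    return sorted(str(f) for f in found)
```

Dropping a folder together with a file inside it yields that file only once.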

Tips

  • If manually unloading the Whisper model fails or the software crashes, set the Temperature parameter to 0 and the number of temperature candidates to 1.
  • The window may crash when there are many transcription results. Turning off the automatic jump to the results page is recommended.
  • Because I often forget to bundle ffmpeg when building the installer (sad), future installers may no longer include ffmpeg, so please install ffmpeg yourself. This release provides ffmpeg.7z as a separate download: extract it to any directory and add that directory to the PATH environment variable, or place it in the software installation directory.
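You can check both ffmpeg install options from the last tip in code with `shutil.which`, which searches `PATH` (and applies `PATHEXT` on Windows). This is a sketch. `find_ffmpeg` is a hypothetical helper, and the sub-folder names it tries under the installation directory are guesses about how the extracted archive is laid out.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_ffmpeg(app_dir: str) -> Optional[str]:
    """Return a path to ffmpeg, checking PATH first and then the install directory.

    The sub-folder names below are assumptions about where an extracted
    ffmpeg archive might sit; adjust them to the real layout.
    """
    found = shutil.which("ffmpeg")  # option 1: ffmpeg's directory was added to PATH
    if found:
        return found
    base = Path(app_dir)
    candidates = [base, base / "ffmpeg", base / "ffmpeg" / "bin"]
    # option 2: ffmpeg was placed inside the software installation directory
    return shutil.which("ffmpeg", path=os.pathsep.join(str(p) for p in candidates))
```

A `None` result means neither option is in place, which matches the tip's two remedies.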

0.5.4

01 Jan 11:45

0.5.4 Changes

  • Added splitting of audio output by speaker and subtitle timestamps (#54)

    • Splits the audio into multiple segments according to subtitle timestamps and speakers and outputs them
  • Upgraded subtitle display and editing

    • The subtitle table displays timestamps in hh:mm:ss format
    • Improved subtitle timestamp editing
    • Subtitle timestamp editing now supports word-level timestamps
    • Changing a speaker automatically updates all entries with the same speaker
  • Fixed a bug where files remained locked (in use) after being added to the file list

  • Fixed a bug where manually added existing subtitle files could not be edited or saved

  • Fixed a bug where the table display state did not follow the configuration file

  • Fixed a bug where some audio/video files could not be read (#55)

    • Fixed audio/video files possibly failing to load when their file tags contain special characters
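Two parts of this release lend themselves to a short sketch: formatting seconds as `hh:mm:ss`, and cutting audio by subtitle timestamps and speaker. Both helpers are hypothetical. Dropping the sub-second part in `hms` is an assumption about the table display. The audio is modeled as a plain sequence of mono samples at a known sample rate. Writing each clip to a file (for example with the standard `wave` module) is left out.

```python
from typing import Sequence


def hms(seconds: float) -> str:
    """Format seconds as hh:mm:ss (assumption: the sub-second part is truncated)."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def split_by_speaker(samples: Sequence, sample_rate: int,
                     segments: list[tuple[float, float, str]]) -> dict:
    """Cut mono samples into clips grouped by speaker.

    segments holds (start_sec, end_sec, speaker); returns {speaker: [clip, ...]}.
    Works for lists or NumPy arrays because it only uses len() and slicing.
    """
    clips: dict = {}
    for start, end, speaker in segments:
        a = max(0, round(start * sample_rate))
        b = min(len(samples), round(end * sample_rate))
        if a < b:
            clips.setdefault(speaker, []).append(samples[a:b])
    return clips


print(hms(3725.9))  # 01:02:05
```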

Tips

  • If manually unloading the Whisper model fails or the software crashes, set the Temperature parameter to 0 and the number of temperature candidates to 1.

0.5.0 patch

19 Nov 11:08

0.5.0 Emergency Fix

  • Fixed a bug where only English output was produced
    • Download 0.5.0 patch.7z and extract it, then copy the extracted files and folders into the 0.5.0 installation directory, replacing the original files.

0.5.0

18 Nov 08:12

0.5.0 Changes

  • Redesigned the UI layout of the Model Parameters page
    • Deprecated the model conversion function
    • Redesigned the layout of parameter items
  • Other UI optimizations
  • Applied the output file encoding parameter to more output files
    • The file encoding can now be set for all output formats (.srt, .vtt, .txt, .lrc and .smi), not just .srt.
  • Added a Settings page
    • Added an option to save the software's configuration
    • Added an option to load the model automatically #33
    • Added a button to clear the software's temporary storage
    • Added a button to open the temporary storage directory
    • Added an option to choose whether to jump to the results page automatically after transcription finishes #38
    • Added an option to clean up temporary files automatically
    • Added a button to open the log file
    • Added a language setting #34
  • Added automatic saving of the software configuration; whether to auto-save is set on the Settings page
    • Automatic saving and loading of theme settings #38
    • Automatic saving and loading of model parameters
    • Automatic saving and loading of VAD parameters
    • Automatic saving and loading of transcription parameters
    • Automatic saving and loading of Demucs parameters
    • Automatic saving and loading of subtitle table style parameters
    • Automatic saving and loading of whisperX parameters
  • Added automatic model loading
    • If the Auto-load model option on the Settings page is enabled, the software loads the model at startup using the saved Model Parameters configuration.
    • This requires the previous Model Parameters configuration to have been saved correctly, so the Auto-save configuration option must also be enabled.
  • Added more log information
    • Added a detailed faster_whisper log file, faster_whisper.log
  • When file transcription is canceled, any results already produced are shown on the output page
  • Input file checking now runs in a separate thread, so the UI no longer freezes and message windows still appear when many files are added
    • A child thread checks the contents of input files
  • Fixed a spelling mistake on the loading page #38
  • Finally found a home for the Hugging Face user token parameter
  • Adjusted the output file logic
    • A global variable now holds the currently active transcription result, so every function can work and output results independently, including whisperX timestamp alignment and speaker diarization
  • Fixed a crash during consecutive transcriptions
  • Fixed a possible crash after transcription finishes

Tips

  • If the whisperX function misbehaves and the log file shows "Error: [WinError 2] The system cannot find the file specified.", make sure ffmpeg is installed correctly. If you have not installed ffmpeg, download ffmpeg.zip from this release, extract it, and place the entire folder in the software installation directory.
  • Baidu Netdisk update link: https://pan.baidu.com/s/18Yq6pH_6KB_Ht4U03AgkZA?pwd=hbie (subscriptions welcome)
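The configuration features of 0.5.0 can be sketched as a JSON file merged over defaults. In this sketch, the output encoding applies to every subtitle format, and automatic model loading requires auto-save, as the notes describe. All key and function names are hypothetical, because these notes do not document the GUI's real configuration format.

```python
import json
from pathlib import Path

# Hypothetical setting names; the GUI's real configuration keys are not documented here.
DEFAULTS = {"auto_save_config": True, "auto_load_model": False, "output_encoding": "utf-8"}


def load_config(path: Path) -> dict:
    """Return saved settings merged over DEFAULTS (a missing file yields the defaults)."""
    config = dict(DEFAULTS)
    if path.is_file():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    return config


def save_config(path: Path, config: dict) -> None:
    """Write settings to disk only when auto-save is enabled."""
    if config.get("auto_save_config"):
        path.write_text(json.dumps(config, ensure_ascii=False, indent=2), encoding="utf-8")


def should_auto_load_model(config: dict) -> bool:
    """Auto-loading relies on saved model parameters, so it also requires auto-save."""
    return bool(config.get("auto_load_model") and config.get("auto_save_config"))


def write_output(path: Path, text: str, config: dict) -> None:
    """Write any output format (.srt, .vtt, .txt, .lrc, .smi) with the configured encoding."""
    path.write_text(text, encoding=config["output_encoding"])
```

Merging over defaults lets an older config file that lacks newer keys still load.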