Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

给 AiShell/Deepspeech 增加中文说明文档 #3891

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from

Conversation

Liyulingyue
Copy link
Contributor

PR types

Others

PR changes

Docs

Describe

Copy link

paddle-bot bot commented Nov 15, 2024

Thanks for your contribution!

@mergify mergify bot added the Example label Nov 15, 2024
@@ -0,0 +1,208 @@
# 使用Aishell数据集训练DeepSpeech2的离线/在线ASR模型
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

整篇注意下中英文之间加空格

您需要的所有脚本都在`run.sh`中。`run.sh`中有几个阶段,每个阶段都有其功能。
| 阶段 | 功能 |
|:---- |:----------------------------------------------------------- |
| 0 | 数据处理。包括:<br> (1) 下载数据集 <br> (2) 计算训练数据集的CMVN <br> (3) 获取词汇文件 <br> (4) 获取训练、开发和测试数据集的manifest文件 |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vocab.txt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

和英文对齐,英文也没有

Copy link
Collaborator

@zxcd zxcd Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个词汇文件就是指的vocab.txt,如果翻译成文字的话可以翻译成词典

|:---- |:----------------------------------------------------------- |
| 0 | 数据处理。包括:<br> (1) 下载数据集 <br> (2) 计算训练数据集的CMVN <br> (3) 获取词汇文件 <br> (4) 获取训练、开发和测试数据集的manifest文件 |
| 1 | 训练模型 |
| 2 | 通过平均前k个最佳模型来获得最终模型,设置k=1表示选择最佳模型 |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

前->最好

```bash
source path.sh
```
需要先运行此脚本。
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

运行程序前需要先执行此脚本 感觉更加通顺

source path.sh
```
需要先运行此脚本。
另一个脚本也需要运行:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

另一个同样需要执行的脚本 感觉更加通顺

## 局部变量
在`run.sh`中设置了一些局部变量。
`gpus`表示您想使用的GPU数量。如果您设置`gpus=`,则表示仅使用CPU。
`stage`表示您想在实验中从哪个阶段开始。
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

中->的

Copy link
Contributor Author

@Liyulingyue Liyulingyue Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stage表示您想在实验中 从哪个阶段 开始。

在`run.sh`中设置了一些局部变量。
`gpus`表示您想使用的GPU数量。如果您设置`gpus=`,则表示仅使用CPU。
`stage`表示您想在实验中从哪个阶段开始。
`stop_stage`表示您想在实验中结束于哪个阶段。
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不太通

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stop_stage表示您想在实验中 结束于 哪个 阶段。

`stage`表示您想在实验中从哪个阶段开始。
`stop_stage`表示您想在实验中结束于哪个阶段。
`conf_path`表示模型的配置路径。
`avg_num`表示要平均的前k个最佳模型的数量,以获得最终模型。
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

前->最好

`stop_stage`表示您想在实验中结束于哪个阶段。
`conf_path`表示模型的配置路径。
`avg_num`表示要平均的前k个最佳模型的数量,以获得最终模型。
`model_type`表示模型类型:离线或在线
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

流式或者非流式

`avg_num`表示要平均的前k个最佳模型的数量,以获得最终模型。
`model_type`表示模型类型:离线或在线
`audio_file`表示在阶段6中您想进行推理的单个文件的路径。
`ckpt`表示模型的检查点前缀,例如"deepspeech2"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

文件名?

bash run.sh --gpus 0,1 --avg_num 1
```
## 阶段0:数据处理
要使用此示例,您需要先处理数据,可以使用`run.sh`中的阶段0来完成此操作。代码如下:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

改成在使用此示例前,您需要先进行数据处理是不是好点

@zxcd zxcd added the README label Dec 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants