Confirmation checklist
- I have read the README.md and dependencies.md files
- I have confirmed that no existing issue or discussion covers this BUG
- I have confirmed the problem occurs in the latest code or the stable release
- I have confirmed the problem is unrelated to the API
- I have confirmed the problem is unrelated to the WebUI
- I have confirmed the problem is unrelated to Finetune
Your issue
Hi,
I am planning to fine-tune ChatTTS using my own dataset, and I would like to confirm a few details regarding the data format and requirements.
1. Data Structure and .list File Format
Based on the documentation and examples, I have organized my data as follows:
File Structure
datasets/
└── data_speaker_a/
    ├── speaker_a/
    │   ├── 1.wav
    │   ├── 2.wav
    │   └── ... (more audio files)
    └── speaker_a.list
.list File Format
Each line in the .list file is formatted as filepath|speaker|lang|text, where:
- filepath: relative path to the audio file (relative to the directory containing the .list file)
- speaker: name of the speaker
- lang: language code (e.g., ZH for Chinese, EN for English)
- text: transcription of the audio content
Example:
speaker_a/1.wav|John|ZH|你好
speaker_a/2.wav|John|EN|Hello
Could you please confirm if this structure and format are correct?
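For reference, this is the small sanity check I run over my .list file before training. It only assumes the pipe-delimited filepath|speaker|lang|text layout described above (the field order and language codes are my assumptions), so please correct me if the expected format differs:

```python
import os
import sys

def check_list_file(list_path, allowed_langs=("ZH", "EN")):
    """Validate each filepath|speaker|lang|text line of a .list file."""
    base_dir = os.path.dirname(os.path.abspath(list_path))
    problems = []
    with open(list_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            parts = line.split("|")
            if len(parts) != 4:
                problems.append(f"line {lineno}: expected 4 fields, got {len(parts)}")
                continue
            filepath, speaker, lang, text = parts
            # Audio paths are assumed to be relative to the .list file's directory.
            if not os.path.isfile(os.path.join(base_dir, filepath)):
                problems.append(f"line {lineno}: missing audio file {filepath}")
            if lang not in allowed_langs:
                problems.append(f"line {lineno}: unexpected lang code {lang!r}")
            if not text:
                problems.append(f"line {lineno}: empty transcription")
    return problems

if __name__ == "__main__":
    for issue in check_list_file(sys.argv[1]):
        print(issue)
```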
2. Audio Data Specifications
I am planning to use 100 audio files, each approximately 10 seconds long, with a sampling rate of 24000 Hz for training.
Is this a suitable setup for fine-tuning the model? Are there any specific recommendations or requirements?
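In case it matters, this is roughly how I plan to normalize the clips to 24 kHz mono beforehand. It is only a sketch using torchaudio; the downmix and resampling steps are my own assumptions, not something taken from the ChatTTS docs:

```python
import torchaudio
import torchaudio.transforms as T

TARGET_SR = 24000  # sampling rate I intend to train with

def load_and_resample(path, target_sr=TARGET_SR):
    """Load a clip, downmix to mono, and resample to the target rate."""
    waveform, sr = torchaudio.load(path)
    if waveform.size(0) > 1:  # stereo -> mono
        waveform = waveform.mean(dim=0, keepdim=True)
    if sr != target_sr:
        waveform = T.Resample(orig_freq=sr, new_freq=target_sr)(waveform)
    return waveform, target_sr

wav, sr = load_and_resample("datasets/data_speaker_a/speaker_a/1.wav")
print(wav.shape, sr)  # expect roughly (1, ~240000) samples for a 10-second clip
```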
Thank you for your assistance!