[datapipe] Support wenet datapipe #182

Open
wants to merge 5 commits into
base: main

Conversation

mlxu995 (Collaborator) commented Feb 5, 2025

No description provided.

a122760 commented Feb 5, 2025

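Could this support a dataloader that balances positive and negative samples? Is there a good way to implement that within the current framework? That is, every iteration should contain a balanced mix of positive and negative samples. Would it work to keep the positive and negative samples in two separate lists?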

mlxu995 (Collaborator, Author) commented Feb 6, 2025

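> Could this support a dataloader that balances positive and negative samples? Is there a good way to implement that within the current framework? That is, every iteration should contain a balanced mix of positive and negative samples. Would it work to keep the positive and negative samples in two separate lists?

Good request. The only approach I have thought of so far is also to split the data into two lists.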
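
For illustration, the two-list idea could look roughly like the sketch below. It is not code from this PR and does not use the wenet datapipe API; `balanced_batches` and its parameters are hypothetical names, and it assumes the samples fit in memory as plain Python sequences. Each batch takes half of its items from the positive list and half from the negative list, and whichever list runs out first is reshuffled and reused (i.e. oversampled).

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def balanced_batches(positives: Sequence[T],
                     negatives: Sequence[T],
                     batch_size: int,
                     num_batches: int,
                     seed: int = 0) -> Iterator[List[T]]:
    """Yield batches with equal numbers of positive and negative samples.

    Hypothetical helper for illustration only; not part of wekws or wenet.
    """
    if batch_size % 2 != 0:
        raise ValueError("batch_size must be even for a 50/50 split")
    if not positives or not negatives:
        raise ValueError("both sample lists must be non-empty")

    rng = random.Random(seed)
    half = batch_size // 2

    def cycle_shuffled(items: Sequence[T]) -> Iterator[T]:
        # Endless stream over one list: reshuffle every time it is
        # exhausted, so the smaller list gets oversampled.
        while True:
            order = list(items)
            rng.shuffle(order)
            yield from order

    pos_stream = cycle_shuffled(positives)
    neg_stream = cycle_shuffled(negatives)

    for _ in range(num_batches):
        batch = [next(pos_stream) for _ in range(half)]
        batch += [next(neg_stream) for _ in range(half)]
        rng.shuffle(batch)  # avoid a fixed positive/negative order in the batch
        yield batch


# Usage: 3 positive and 10 negative items, batches of 4 (2 + 2).
pos = [("pos", i) for i in range(3)]
neg = [("neg", i) for i in range(10)]
for batch in balanced_batches(pos, neg, batch_size=4, num_batches=2):
    print(batch)
```

In a streaming pipeline the same interleaving would have to happen over two sample streams rather than two in-memory lists, before the batching step; how that maps onto the datapipe in this PR is left open here.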

mlxu995 marked this pull request as ready for review February 21, 2025 07:44
mlxu995 requested a review from cdliang11 February 21, 2025 07:50

cdliang11 (Contributor) commented

Good job!

mlxu995 (Collaborator, Author) commented Feb 21, 2025

  • Results with e2e loss on keyword "hi xiao wen":
    [image]
    and on keyword "ni hao wen wen":
    [image]
    [image]

  • Results with CTC loss on keyword "hi xiao wen":
    [image]
    and on keyword "ni hao wen wen":
    [image]

@@ -15,4 +15,6 @@ pyflakes==2.2.0
lmdb
scipy
tqdm
langid
pypinyin

cdliang11 (Contributor) commented Feb 22, 2025

@mlxu995 wenet needs to be added here as well, in the form git+https://github.com/<username>/<repository>.git@<commit_hash>

reverb_lmdb=args.reverb_lmdb,
noise_lmdb=args.noise_lmdb)
cv_dataset = Dataset(args.cv_data, cv_conf)
# train_dataset = Dataset(args.train_data,

Contributor

If this commented-out code is no longer needed, it can be deleted.

3 participants