Inaccurate Labels in Dataset #29

Open
lmxue opened this issue Jul 20, 2024 · 0 comments

I have encountered inaccuracies in the labels provided in the dataset at https://huggingface.co/datasets/parler-tts/mls-eng-10k-tags_tagged_10k_generated.

The code:
from datasets import load_dataset
test_set = load_dataset("parler-tts/mls-eng-10k-tags_tagged_10k_generated", split="test")
test_set[0]

The output:
{'original_path': 'http://www.archive.org/download/lesmis3_0911_0911/lesmiserables_vol3_22_hugo_64kb.mp3', 'begin_time': 119.15, 'end_time': 132.26, 'audio_duration': 13.109999999999983, 'speaker_id': '7171', 'book_id': '3158', 'utterance_pitch_mean': 172.13397216796875, 'utterance_pitch_std': 71.41407012939453, 'snr': 47.84040069580078, 'c50': 57.13105392456055, 'speaking_rate': 'slightly slowly', 'phonemes': 'ʌnd hi nu ðʌ ʌndʒʌst ʃeɪm ʌnd ðʌ pɔɪnjʌnt blʌʃʌz ʌv ædmɜ˞ʌbʌl ʌnd tɛɹʌbʌl tɹaɪʌl fɹʌm wɪtʃ ðʌ fibʌl ɪmɜ˞dʒ beɪs fɹʌm wɪtʃ ðʌ stɹɔŋ ɪmɜ˞dʒ sʌblaɪm', 'gender': 'male', 'pitch': 'very high pitch', 'noise': 'moderate ambient sound', 'reverberation': 'very confined sounding', 'speech_monotony': 'slightly expressive', 'text_description': ' A man speaks with a slightly expressive tone in a confined space, his voice echoing slightly but overall sounding quite clear, with moderate ambient sound in the background. His pitch is very high, but his delivery is only slightly slower than normal.', 'original_text': 'and he knew the unjust shame and the poignant blushes of wretchedness admirable and terrible trial from which the feeble emerge base from which the strong emerge sublime', 'text': 'And he knew the unjust shame and the poignant blushes of wretchedness. Admirable and terrible trial from which the feeble emerge base, from which the strong emerge sublime.'}

Analysis:

After listening to the audio at http://www.archive.org/download/doublelifeofalfredburton_1801_librivox/doublelifealfredburton_14_oppenheim_64kb.mp3,
I found that the segment with 'text': 'Mr Cowper looked at his visitor in amazement, my young friend. He said: are you going to tell me that you have seen one of these beans? Not only that, but i have eaten one. Burton said, in fact, i have eaten two.' begins at 1.59 minutes and ends at 2.11 minutes in the recording. These times do not match the begin_time and end_time labels in the dataset.
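
To make the comparison reproducible, here is a minimal sketch of how the check could be scripted. The helper names (`mmss_to_seconds`, `rows_for_recording`, `alignment_offsets`) are hypothetical and not part of `datasets`. It also assumes that "1.59" and "2.11" are read as minutes.seconds (1 min 59 s and 2 min 11 s). If they are decimal minutes instead, the conversion would differ.

```python
from typing import Iterable, List, Mapping, Tuple


def mmss_to_seconds(stamp: str) -> float:
    """Convert 'm.ss' or 'm:ss' to seconds.

    ASSUMPTION: '1.59' means 1 minute 59 seconds, not 1.59 decimal minutes.
    """
    minutes, seconds = stamp.replace(":", ".").split(".", 1)
    return int(minutes) * 60 + float(seconds)


def rows_for_recording(rows: Iterable[Mapping], original_path: str) -> List[Mapping]:
    """Return every row whose 'original_path' equals the given recording URL."""
    return [row for row in rows if row["original_path"] == original_path]


def alignment_offsets(row: Mapping, observed_begin_s: float,
                      observed_end_s: float) -> Tuple[float, float]:
    """Return (label - observed) for begin and end times, in seconds."""
    return (row["begin_time"] - observed_begin_s,
            row["end_time"] - observed_end_s)


# Times heard in the recording, converted under the assumption above.
observed_begin = mmss_to_seconds("1.59")  # 119.0
observed_end = mmss_to_seconds("2.11")    # 131.0
```

With the `test_set` loaded above, `rows_for_recording(test_set, url)` should list the rows labelled with that recording, and `alignment_offsets(row, observed_begin, observed_end)` gives the difference between the labels and what is heard. Large offsets would point to a misaligned row.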
