Inaccurate Labels in Dataset #29

lmxue · 2024-07-20T16:23:45Z

I have encountered inaccuracies in the labels provided in the dataset at https://huggingface.co/datasets/parler-tts/mls-eng-10k-tags_tagged_10k_generated.

The code:
from datasets import load_dataset
test_set = load_dataset("parler-tts/mls-eng-10k-tags_tagged_10k_generated", split="test")
print(test_set[0])  # show the first example of the test split

The output:
{'original_path': 'http://www.archive.org/download/lesmis3_0911_0911/lesmiserables_vol3_22_hugo_64kb.mp3', 'begin_time': 119.15, 'end_time': 132.26, 'audio_duration': 13.109999999999983, 'speaker_id': '7171', 'book_id': '3158', 'utterance_pitch_mean': 172.13397216796875, 'utterance_pitch_std': 71.41407012939453, 'snr': 47.84040069580078, 'c50': 57.13105392456055, 'speaking_rate': 'slightly slowly', 'phonemes': 'ʌnd hi nu ðʌ ʌndʒʌst ʃeɪm ʌnd ðʌ pɔɪnjʌnt blʌʃʌz ʌv ædmɜ˞ʌbʌl ʌnd tɛɹʌbʌl tɹaɪʌl fɹʌm wɪtʃ ðʌ fibʌl ɪmɜ˞dʒ beɪs fɹʌm wɪtʃ ðʌ stɹɔŋ ɪmɜ˞dʒ sʌblaɪm', 'gender': 'male', 'pitch': 'very high pitch', 'noise': 'moderate ambient sound', 'reverberation': 'very confined sounding', 'speech_monotony': 'slightly expressive', 'text_description': ' A man speaks with a slightly expressive tone in a confined space, his voice echoing slightly but overall sounding quite clear, with moderate ambient sound in the background. His pitch is very high, but his delivery is only slightly slower than normal.', 'original_text': 'and he knew the unjust shame and the poignant blushes of wretchedness admirable and terrible trial from which the feeble emerge base from which the strong emerge sublime', 'text': 'And he knew the unjust shame and the poignant blushes of wretchedness. Admirable and terrible trial from which the feeble emerge base, from which the strong emerge sublime.'}

Analysis:

'original_path': 'http://www.archive.org/download/lesmis3_0911_0911/lesmiserables_vol3_22_hugo_64kb.mp3'
'text': 'And he knew the unjust shame and the poignant blushes of wretchedness. Admirable and terrible trial from which the feeble emerge base, from which the strong emerge sublime.'
'begin_time': 119.15, 'end_time': 132.26 (in seconds), which correspond to 1.9858 and 2.2043 minutes
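
For reference, the unit conversion can be sketched as below. This is a minimal illustration only: the helper names are hypothetical and not part of the dataset or the `datasets` library. It shows the same offsets both as decimal minutes and as minutes:seconds, since audio players usually display the latter and the two notations are easy to confuse.

```python
def seconds_to_decimal_minutes(seconds: float) -> float:
    """Convert seconds to decimal minutes (e.g. 90 s -> 1.5 min)."""
    return seconds / 60.0


def seconds_to_mmss(seconds: float) -> str:
    """Format seconds as m:ss.cc, the way an audio player typically shows time."""
    # Work in whole centiseconds to avoid floating-point edge cases.
    total_cs = round(seconds * 100)
    minutes, cs = divmod(total_cs, 6000)
    return f"{minutes}:{cs / 100:05.2f}"


# Values taken from the dataset example above.
begin_time, end_time = 119.15, 132.26

for label, value in (("begin_time", begin_time), ("end_time", end_time)):
    decimal_minutes = round(seconds_to_decimal_minutes(value), 4)
    print(f"{label}: {value} s = {decimal_minutes} min = {seconds_to_mmss(value)}")
```

When comparing against times read off a player, it helps to confirm which of the two notations is being used before concluding that the labels are misaligned.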

However, after listening to the audio at http://www.archive.org/download/doublelifeofalfredburton_1801_librivox/doublelifealfredburton_14_oppenheim_64kb.mp3,
I found that the beginning and end times of 'text': 'Mr Cowper looked at his visitor in amazement, my young friend. He said: are you going to tell me that you have seen one of these beans? Not only that, but i have eaten one. Burton said, in fact, i have eaten two.' in the audio are 1.59 and 2.11 minutes, which do not align with the labels in the dataset.
