Describe the bug
Hello! I want to create a dataset of parquet files, with the audio stored as separate .mp3 files. However, accessing an example raises "No such file or directory" (see the reproducing code).
Steps to reproduce the bug
Creating a dataset
from pathlib import Path
from datasets import Dataset, load_dataset, Audio
Path('my_dataset/audio').mkdir(parents=True, exist_ok=True)
Path('my_dataset/audio/file.mp3').touch(exist_ok=True)
Dataset.from_list(
    [{'audio': {'path': 'audio/file.mp3'}}]
).to_parquet('my_dataset/data.parquet')
Trying to load the dataset
dataset = (
    load_dataset('my_dataset', split='train')
    .cast_column('audio', Audio(sampling_rate=16_000))
)
dataset[0]
>>> FileNotFoundError: [Errno 2] No such file or directory: 'audio/file.mp3'
Expected behavior
I expect the dataset to load correctly.
I've found 2 workarounds, but they are not very good:
I can specify an absolute path to the audio, however, when I move the folder or upload to HF it will stop working.
I can set 'path': 'file.mp3', and load with load_dataset('my_dataset', data_dir='audio') - it seems to work, but does this mean that anyone from Hugging Face who wants to use this dataset should also pass the data_dir argument, otherwise it won't work?
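A possible third stopgap, sketched here only to illustrate the problem, is to keep relative paths in the parquet file and resolve them against the dataset folder at load time. The helper name resolve_audio_path and its root parameter are hypothetical and not part of datasets; this is a minimal sketch under the assumption that each row has an 'audio' dict with a 'path' key, as in the reproducing code.

```python
from pathlib import Path


def resolve_audio_path(example, root):
    """Return a copy of `example` whose audio path is joined onto `root`.

    Hypothetical helper (not part of the `datasets` library). Paths that are
    already absolute are left unchanged.
    """
    audio = dict(example["audio"])  # copy so the input row is not mutated
    path = Path(audio["path"])
    if not path.is_absolute():
        audio["path"] = str(Path(root) / path)
    return {**example, "audio": audio}


if __name__ == "__main__":
    row = {"audio": {"path": "audio/file.mp3"}}
    print(resolve_audio_path(row, "my_dataset")["audio"]["path"])
```

It could presumably be applied with dataset.map(resolve_audio_path, fn_kwargs={'root': 'my_dataset'}) before the cast_column call. This still requires every user to know the dataset root, so it does not answer the question of how to make a plain load_dataset(dataset_name) work.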
Environment info
datasets 3.1.0, Ubuntu 24.04.1
@lhoestq thank you, but there are two problems with using AudioFolder:
The docs say that AudioFolder requires metadata.csv. However, my dataset is too large and contains nested and np.ndarray fields, so I can't use CSV.
The docs say that I need to load the dataset with load_dataset("audiofolder", ...). However, if possible, I want my dataset to be loadable as usual with load_dataset(dataset_name) after I upload it to HF.