Unexpected cache behaviour using load_dataset #7323

Open
Moritz-Wirth opened this issue Dec 12, 2024 · 0 comments

Describe the bug

Following the Cache management documentation (https://huggingface.co/docs/datasets/en/cache) and the behaviour of datasets version 2.18.0, it should be possible to change the cache directory. Previously, all downloaded, extracted, and other files ended up in that folder. Since I recently updated to the latest version, this is no longer the case: downloaded files are stored in ~/.cache/huggingface/hub.
When I pass the cache_dir argument to load_dataset, the cache directory is created and contains some files, but the bulk is still in ~/.cache/huggingface/hub.

I believe this could be solved by adding the cache_dir argument here
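
A possible interim workaround, sketched under assumptions and not verified against datasets 3.2.0: huggingface_hub and datasets read the environment variables HF_HOME, HF_HUB_CACHE and HF_DATASETS_CACHE at import time, so setting them before the first import should redirect both caches. The helper name point_hf_caches_at is hypothetical and only there for illustration.

```python
import os


def point_hf_caches_at(root):
    """Set the Hugging Face cache environment variables to live under `root`.

    Hypothetical helper; must be called before importing datasets or
    huggingface_hub, since both read these variables at import time.
    """
    root = os.path.abspath(os.path.expanduser(root))
    os.environ["HF_HOME"] = root
    # Hub downloads (the datasets--<user>--<name> folders) go here
    os.environ["HF_HUB_CACHE"] = os.path.join(root, "hub")
    # Prepared Arrow files written by datasets go here
    os.environ["HF_DATASETS_CACHE"] = os.path.join(root, "datasets")
    return root


point_hf_caches_at("~/custom/cache/path")

# Import only after the variables are set:
# from datasets import load_dataset
# ds = load_dataset("ashraq/esc50", "default")
```

This changes the cache location for the whole process rather than per call, so it is not a replacement for a working cache_dir argument.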

Steps to reproduce the bug

For example using https://huggingface.co/datasets/ashraq/esc50:

from datasets import load_dataset
ds = load_dataset("ashraq/esc50", "default", cache_dir="~/custom/cache/path/esc50")
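
To see where the files actually ended up after running the snippet above, the on-disk size of both directories can be compared. This is a minimal sketch using only the standard library; the helper name dir_size_bytes is hypothetical, and the two paths are the ones mentioned in this report.

```python
import os


def dir_size_bytes(path):
    """Total size in bytes of regular files under `path` (0 if it does not exist)."""
    root = os.path.expanduser(path)
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks so files that are linked from elsewhere
            # in the same tree are not counted twice
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


for p in (
    "~/custom/cache/path/esc50",
    "~/.cache/huggingface/hub/datasets--ashraq--esc50",
):
    print(p, dir_size_bytes(p))
```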

Expected behavior

I would expect the bulk of files related to the dataset to be stored somewhere in ~/custom/cache/path/esc50, but it seems they are in ~/.cache/huggingface/hub/datasets--ashraq--esc50.

Environment info

  • datasets version: 3.2.0
  • Platform: Linux-5.14.0-503.15.1.el9_5.x86_64-x86_64-with-glibc2.34
  • Python version: 3.10.14
  • huggingface_hub version: 0.26.5
  • PyArrow version: 17.0.0
  • Pandas version: 2.2.2
  • fsspec version: 2024.6.1