Cannot use Hugging Face datasets and models online #11

Open
bencaocs opened this issue Nov 2, 2023 · 4 comments

Comments

@bencaocs

bencaocs commented Nov 2, 2023

No description provided.

@bencaocs bencaocs changed the title cannt use huggingface datasets and model in cannt use huggingface datasets and model in online way Nov 2, 2023
@bencaocs
Author

bencaocs commented Nov 2, 2023

If I cannot use Hugging Face datasets and models online, is there another way to use this code?
I downloaded the dataset (Tevatron/msmarco-passage-corpus) to disk and processed it with process_marco.py, which worked fine.

But when I run run.py, it gives me this error:
Traceback (most recent call last):
  File "/home/bio-3090ti/anaconda3/envs/DSI-transform/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1724, in from_pretrained
    resolved_vocab_files[file_id] = cached_path(
  File "/home/bio-3090ti/anaconda3/envs/DSI-transform/lib/python3.8/site-packages/transformers/file_utils.py", line 1921, in cached_path
    output_path = get_from_cache(
  File "/home/bio-3090ti/anaconda3/envs/DSI-transform/lib/python3.8/site-packages/transformers/file_utils.py", line 2177, in get_from_cache
    raise ValueError(
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

Thanks.

@ArvinZhuang
Owner

Hi, can you also try downloading the Hugging Face t5-base or t5-large model (for example from https://huggingface.co/t5-base) to disk and loading the model from there?
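A minimal sketch of what loading from disk could look like, assuming the model files were copied into a local folder from a machine that has internet access. The helper names `missing_model_files` and `load_local_t5` are hypothetical, and the list of expected files is an assumption based on the files published in the t5-base repository. Passing a local directory to `from_pretrained` is standard transformers behavior.

```python
import os

# Files a locally saved T5 checkpoint is assumed to contain (assumption:
# taken from the t5-base repository listing). Older transformers releases
# may only read pytorch_model.bin, not model.safetensors.
REQUIRED_FILES = ("config.json", "spiece.model")
WEIGHT_FILES = ("pytorch_model.bin", "model.safetensors")


def missing_model_files(model_dir):
    """Return the names of expected files that are absent from model_dir."""
    present = set(os.listdir(model_dir)) if os.path.isdir(model_dir) else set()
    missing = [name for name in REQUIRED_FILES if name not in present]
    if not any(name in present for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing


def load_local_t5(model_dir):
    """Load the tokenizer and model from a local directory (no download)."""
    # Imported lazily so the file check above works without transformers.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    missing = missing_model_files(model_dir)
    if missing:
        raise FileNotFoundError(f"{model_dir} is missing: {', '.join(missing)}")
    tokenizer = T5Tokenizer.from_pretrained(model_dir)
    model = T5ForConditionalGeneration.from_pretrained(model_dir)
    return tokenizer, model
```

Running `missing_model_files` on the folder first gives a clearer message than the connection error when a file was not copied over.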

@bencaocs
Author

bencaocs commented Nov 2, 2023

Hi, can you also try downloading the Hugging Face t5-base or t5-large model (for example from https://huggingface.co/t5-base) to disk and loading the model from there?

Thanks for your fast reply. I think that may be a good approach, but I am not sure about the file structure. Currently, my file structure is:

DSI-QG
- __pycache__
- cache
    - downloads
    - Tevatron__msmarco-passage-corpus
        - default
- CE
- data
    - msmarco_data
        - 100k
        - X.tsv
- other files (.py, .sh, etc.)

If the directory layout is correct, where should I store t5-base after I download it? Should it go in the same cache directory?

Thank you very much.

@ArvinZhuang
Owner

Simply set --model_name in the run command to the directory where you saved the downloaded model.
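As a rough illustration of that advice, the sketch below assembles a run command that points `--model_name` at a local folder. The helper name `build_offline_command` and the folder name in the usage note are hypothetical, and any further arguments that run.py needs are left to the caller rather than guessed here. `TRANSFORMERS_OFFLINE` and `HF_DATASETS_OFFLINE` are environment variables recognized by recent transformers and datasets releases; whether the installed versions honor them is an assumption.

```python
import os


def build_offline_command(model_dir, extra_args=()):
    """Return (argv, env) for running run.py against a local model directory."""
    # An absolute path avoids surprises if run.py changes the working directory.
    model_dir = os.path.abspath(model_dir)
    argv = ["python", "run.py", "--model_name", model_dir, *extra_args]

    # Copy the current environment and ask the libraries not to go online.
    env = dict(os.environ)
    env["TRANSFORMERS_OFFLINE"] = "1"
    env["HF_DATASETS_OFFLINE"] = "1"
    return argv, env
```

For example, `argv, env = build_offline_command("t5-base-local")` followed by `subprocess.run(argv, env=env, check=True)` would launch the script. The model folder does not have to live inside the dataset cache, since any readable directory can be passed.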
