Issue while running run_eval_model.sh #3

Open · damnfarooq opened this issue Aug 9, 2023 · 2 comments

damnfarooq commented Aug 9, 2023

I am having this problem; can you help me fix it?

(Farooq_thesis) phd-research@phd-research:~/research_space/w2v2-air-traffic$ bash src/run_eval_model.sh
*** About to evaluate a Wav2Vec 2.0 model***
*** Dataset in: experiments/data/uwb_atcc/test ***
*** Output folder: experiments/results/baselines/wav2vec2-base/uwb_atcc/0.0ld_0.0ad_0.0attd_0.0fpd_0.01mtp_12mtl_0.0mfp_12mfl_2acc/output ***
Integrating a LM by shallow fusion, results should be better
*** Loading the Wav2Vec 2.0 model, loading... ***
/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/models/wav2vec2/processing_wav2vec2.py:53: FutureWarning: Loading a tokenizer inside Wav2Vec2Processor from a config that does not include a tokenizer_class attribute is deprecated and will be removed in v5. Please add 'tokenizer_class': 'Wav2Vec2CTCTokenizer' attribute to either your config.json or tokenizer_config.json file to suppress this warning:
warnings.warn(
Traceback (most recent call last):
File "/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/models/wav2vec2/processing_wav2vec2.py", line 51, in from_pretrained
return super().from_pretrained(pretrained_model_name_or_path, **kwargs)
File "/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/processing_utils.py", line 182, in from_pretrained
args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
File "/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/processing_utils.py", line 226, in _get_arguments_from_pretrained
args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
File "/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 640, in from_pretrained
return tokenizer_class_py.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1761, in from_pretrained
raise EnvironmentError(
OSError: Can't load tokenizer for 'experiments/results/baselines/wav2vec2-base/uwb_atcc/0.0ld_0.0ad_0.0attd_0.0fpd_0.01mtp_12mtl_0.0mfp_12mfl_2acc/checkpoint-10000'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'experiments/results/baselines/wav2vec2-base/uwb_atcc/0.0ld_0.0ad_0.0attd_0.0fpd_0.01mtp_12mtl_0.0mfp_12mfl_2acc/checkpoint-10000' is the correct path to a directory containing all relevant files for a Wav2Vec2CTCTokenizer tokenizer.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/phd-research/research_space/w2v2-air-traffic/src/eval_model.py", line 250, in <module>
main()
File "/home/phd-research/research_space/w2v2-air-traffic/src/eval_model.py", line 152, in main
processor, processor_ctc_kenlm, model = get_kenlm_processor(path_model, path_lm)
File "/home/phd-research/research_space/w2v2-air-traffic/src/eval_model.py", line 47, in get_kenlm_processor
processor = AutoProcessor.from_pretrained(path_tokenizer)
File "/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/models/auto/processing_auto.py", line 254, in from_pretrained
return PROCESSOR_MAPPING[type(config)].from_pretrained(pretrained_model_name_or_path, **kwargs)
File "/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/models/wav2vec2/processing_wav2vec2.py", line 63, in from_pretrained
tokenizer = Wav2Vec2CTCTokenizer.from_pretrained(pretrained_model_name_or_path, **kwargs)
File "/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1761, in from_pretrained
raise EnvironmentError(
OSError: Can't load tokenizer for 'experiments/results/baselines/wav2vec2-base/uwb_atcc/0.0ld_0.0ad_0.0attd_0.0fpd_0.01mtp_12mtl_0.0mfp_12mfl_2acc/checkpoint-10000'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'experiments/results/baselines/wav2vec2-base/uwb_atcc/0.0ld_0.0ad_0.0attd_0.0fpd_0.01mtp_12mtl_0.0mfp_12mfl_2acc/checkpoint-10000' is the correct path to a directory containing all relevant files for a Wav2Vec2CTCTokenizer tokenizer.
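
For anyone hitting the same error: this OSError typically means the directory contains model weights but not the tokenizer files. A minimal diagnostic sketch, assuming the standard Hugging Face layout (checkpoint-* folders saved during training often hold only weights and optimizer state, not the tokenizer):

```python
# Check whether the checkpoint directory contains the files that
# Wav2Vec2CTCTokenizer.from_pretrained() needs (path taken from the traceback).
import os

ckpt = ("experiments/results/baselines/wav2vec2-base/uwb_atcc/"
        "0.0ld_0.0ad_0.0attd_0.0fpd_0.01mtp_12mtl_0.0mfp_12mfl_2acc/"
        "checkpoint-10000")

for name in ("vocab.json", "tokenizer_config.json", "special_tokens_map.json"):
    status = "found" if os.path.exists(os.path.join(ckpt, name)) else "MISSING"
    print(f"{name}: {status}")
```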

damnfarooq (Author) commented:

This is the output from the previous command; I got stuck at run_eval_model.sh after this.

(Farooq_thesis) phd-research@phd-research:~/research_space/w2v2-air-traffic$ bash /home/phd-research/research_space/w2v2-air-traffic/src/run_train_kenlm.sh

*** About to start the KenLM ***
*** Dataset name: uwb_atcc ***
*** Output folder: experiments/data/uwb_atcc/train/lm ***
uwb_atcc experiments/data/uwb_atcc/train/text

Exporting dataset to text file experiments/data/uwb_atcc/train/lm/4_corpus.txt...
lmplz -o 4 --text experiments/data/uwb_atcc/train/lm/4_corpus.txt --arpa experiments/data/uwb_atcc/train/lm/uwb_atcc_4g_no_fix.arpa
=== 1/5 Counting and sorting n-grams ===
Reading /home/phd-research/research_space/w2v2-air-traffic/experiments/data/uwb_atcc/train/lm/4_corpus.txt
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100

Unigram tokens 113301 types 1766
=== 2/5 Calculating and sorting adjusted counts ===
Chain sizes: 1:21192 2:2261260544 3:4239863552 4:6783781888
Statistics:
1 1765 D1=0.595645 D2=0.962202 D3+=1.62725
2 16099 D1=0.732908 D2=1.05218 D3+=1.47953
3 38208 D1=0.799969 D2=1.12127 D3+=1.28138
4 60883 D1=0.823461 D2=1.16559 D3+=1.23074
Memory estimate for binary LM:
type kB
probing 2387 assuming -p 1.5
probing 2712 assuming -r models -p 1.5
trie 950 without quantization
trie 472 assuming -q 8 -b 8 quantization
trie 895 assuming -a 22 array pointer compression
trie 418 assuming -a 22 -q 8 -b 8 array pointer compression and quantization
=== 3/5 Calculating and sorting initial probabilities ===
Chain sizes: 1:21180 2:257584 3:764160 4:1461192
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
####################################################################################################
=== 4/5 Calculating and writing order-interpolated probabilities ===
Chain sizes: 1:21180 2:257584 3:764160 4:1461192
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
####################################################################################################
=== 5/5 Writing ARPA model ===
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100

Name:lmplz VmPeak:13164676 kB VmRSS:9084 kB RSSMax:2609216 kB user:0.196349 sys:0.464826 CPU:0.661188 real:0.647472
corrected Ken LM in experiments/data/uwb_atcc/train/lm/uwb_atcc_4g.arpa
build_binary trie experiments/data/uwb_atcc/train/lm/uwb_atcc_4g.arpa experiments/data/uwb_atcc/train/lm/uwb_atcc_4g.binary
Reading experiments/data/uwb_atcc/train/lm/uwb_atcc_4g.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100

Identifying n-grams omitted by SRI
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100

Writing trie
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100

SUCCESS
done doing training of KenLM
check the output folder: experiments/data/uwb_atcc/train/lm
Done training 4-gram in experiments/data/uwb_atcc/train/lm
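
For context, the binary 4-gram built here is what run_eval_model.sh later fuses with the acoustic model ("Integrating a LM by shallow fusion"). Below is a minimal sketch of how such a binary KenLM is typically attached to a Wav2Vec 2.0 processor with pyctcdecode; this is illustrative and may differ from what the repo's get_kenlm_processor() actually does:

```python
# Build a CTC beam-search decoder backed by the trained KenLM and wrap it,
# together with the existing tokenizer/feature extractor, into a processor.
from pyctcdecode import build_ctcdecoder
from transformers import AutoProcessor, Wav2Vec2ProcessorWithLM

path_model = ("experiments/results/baselines/wav2vec2-base/uwb_atcc/"
              "0.0ld_0.0ad_0.0attd_0.0fpd_0.01mtp_12mtl_0.0mfp_12mfl_2acc")
path_lm = "experiments/data/uwb_atcc/train/lm/uwb_atcc_4g.binary"

processor = AutoProcessor.from_pretrained(path_model)
# Sort tokens by vocabulary index so the labels line up with the CTC logits.
vocab = processor.tokenizer.get_vocab()
labels = [tok for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1])]

decoder = build_ctcdecoder(labels, kenlm_model_path=path_lm)
processor_with_lm = Wav2Vec2ProcessorWithLM(
    feature_extractor=processor.feature_extractor,
    tokenizer=processor.tokenizer,
    decoder=decoder,
)
```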

damnfarooq (Author) commented:

I fixed the issue by changing the model path in run_eval_model.sh (see below). I got the output, but there are some warnings related to unigrams, so I am not sure whether it worked as expected:

path_to_model="experiments/results/baselines/wav2vec2-base/uwb_atcc/0.0ld_0.0ad_0.0attd_0.0fpd_0.01mtp_12mtl_0.0mfp_12mfl_2acc"
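
This likely works because the training run saves the processor/tokenizer files (vocab.json, tokenizer_config.json, ...) in the run's root directory, while checkpoint-10000 typically holds only intermediate training state. A quick sanity check for the corrected path, assuming that layout:

```python
# Loading from the run root should now succeed where the checkpoint dir failed.
from transformers import AutoProcessor

path_to_model = ("experiments/results/baselines/wav2vec2-base/uwb_atcc/"
                 "0.0ld_0.0ad_0.0attd_0.0fpd_0.01mtp_12mtl_0.0mfp_12mfl_2acc")
processor = AutoProcessor.from_pretrained(path_to_model)  # no OSError expected
print(type(processor).__name__)
```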

(Farooq_thesis) phd-research@phd-research:~/research_space/w2v2-air-traffic$ bash src/run_eval_model.sh
*** About to evaluate a Wav2Vec 2.0 model***
*** Dataset in: experiments/data/uwb_atcc/test ***
*** Output folder: /home/phd-research/research_space/w2v2-air-traffic/experiments/results/baselines/wav2vec2-base/uwb_atcc/output ***
Integrating a LM by shallow fusion, results should be better
*** Loading the Wav2Vec 2.0 model, loading... ***
Unigrams not provided and cannot be automatically determined from LM file (only arpa format). Decoding accuracy might be reduced.
Found entries of length > 1 in alphabet. This is unusual unless style is BPE, but the alphabet was not recognized as BPE type. Is this correct?
No known unigrams provided, decoding results might be a lot worse.
*** Loading the dataset... ***
Using custom data configuration test-085e5dd7a4b8bb1c
Downloading and preparing dataset atc_data_loader/test to /home/phd-research/research_space/w2v2-air-traffic/.cache/eval/experiments/data/uwb_atcc/test/atc_data_loader/test-085e5dd7a4b8bb1c/0.0.0/f2633cc53c6abe32cddd4152eebde1a4e3c9953e1446e190b8d9a13330cddaa4...
Dataset atc_data_loader downloaded and prepared to /home/phd-research/research_space/w2v2-air-traffic/.cache/eval/experiments/data/uwb_atcc/test/atc_data_loader/test-085e5dd7a4b8bb1c/0.0.0/f2633cc53c6abe32cddd4152eebde1a4e3c9953e1446e190b8d9a13330cddaa4. Subsequent calls will reuse this data.
67%|████████████████████████████████████████████████████ | 2/3 [00:47<00:23, 23.59s/ba]
#0: 0%| | 0/706 [00:00<?, ?ex/s]
/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/models/wav2vec2/processing_wav2vec2.py:154: UserWarning: as_target_processor is deprecated and will be removed in v5 of Transformers. You can process your labels by using the argument text of the regular __call__ method (either in the same call as your audio inputs, or in a separate call.
warnings.warn(
/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/models/wav2vec2/processing_wav2vec2.py:154: UserWarning: as_target_processor is deprecated and will be removed in v5 of Transformers. You can process your labels by using the argument text of the regular __call__ method (either in the same call as your audio inputs, or in a separate call.
warnings.warn(
/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/models/wav2vec2/processing_wav2vec2.py:154: UserWarning: as_target_processor is deprecated and will be removed in v5 of Transformers. You can process your labels by using the argument text of the regular __call__ method (either in the same call as your audio inputs, or in a separate call.
warnings.warn(
/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/models/wav2vec2/processing_wav2vec2.py:154: UserWarning: as_target_processor is deprecated and will be removed in v5 of Transformers. You can process your labels by using the argument text of the regular __call__ method (either in the same call as your audio inputs, or in a separate call.
warnings.warn(
#0: 100%|██████████████████████████████████████████████████████████████████████| 706/706 [00:13<00:00, 50.44ex/s]
#3: 100%|██████████████████████████████████████████████████████████████████████| 706/706 [00:14<00:00, 50.02ex/s]
#1: 100%|██████████████████████████████████████████████████████████████████████| 706/706 [00:14<00:00, 49.39ex/s]
#2: 100%|██████████████████████████████████████████████████████████████████████| 706/706 [00:15<00:00, 46.99ex/s]
#2: 94%|██████████████████████████████████████████████████████████████████▏ | 667/706 [00:14<00:00, 51.16ex/s]
#2: 100%|██████████████████████████████████████████████████████████████████████| 706/706 [00:15<00:00, 45.76ex/s]
Performing inference on dataset... Loading

inference: 100%|█████████████████████████████████████████████████████████████| 2824/2824 [16:40<00:00, 2.82ex/s]
Downloading builder script: 100%|███████████████████████████████████████████| 5.60k/5.60k [00:00<00:00, 7.62MB/s]
*** printing the ASR results in /home/phd-research/research_space/w2v2-air-traffic/experiments/results/baselines/wav2vec2-base/uwb_atcc/0.0ld_0.0ad_0.0attd_0.0fpd_0.01mtp_12mtl_0.0mfp_12mfl_2acc/output/uwb_atcc/hypo ***
Done!
Done evaluating model in /home/phd-research/research_space/w2v2-air-traffic/experiments/results/baselines/wav2vec2-base/uwb_atcc/0.0ld_0.0ad_0.0attd_0.0fpd_0.01mtp_12mtl_0.0mfp_12mfl_2acc with LM
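
About the warnings: "Unigrams not provided ..." comes from pyctcdecode when it is handed a binary KenLM file, whose vocabulary it cannot read (only ARPA files allow that); decoding still runs, just possibly less accurately. The "entries of length > 1 in alphabet" message is typically triggered by multi-character special tokens in a character-level vocabulary and is usually harmless. A sketch of passing the unigrams explicitly to silence the first warning, assuming the LM training corpus from the earlier log is still on disk:

```python
# Collect the word list from the corpus used to train the 4-gram and hand it
# to pyctcdecode so it no longer has to warn about missing unigrams.
from pyctcdecode import build_ctcdecoder
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained(
    "experiments/results/baselines/wav2vec2-base/uwb_atcc/"
    "0.0ld_0.0ad_0.0attd_0.0fpd_0.01mtp_12mtl_0.0mfp_12mfl_2acc"
)
labels = [tok for tok, _ in sorted(processor.tokenizer.get_vocab().items(),
                                   key=lambda kv: kv[1])]

with open("experiments/data/uwb_atcc/train/lm/4_corpus.txt", encoding="utf-8") as f:
    unigrams = sorted({word for line in f for word in line.split()})

decoder = build_ctcdecoder(
    labels,
    kenlm_model_path="experiments/data/uwb_atcc/train/lm/uwb_atcc_4g.binary",
    unigrams=unigrams,
)
```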
