Issue while running run_eval_model.sh #3

Open · damnfarooq opened this issue Aug 9, 2023 · 2 comments

damnfarooq commented Aug 9, 2023

I am having this problem; can you help me fix it?

(Farooq_thesis) phd-research@phd-research:~/research_space/w2v2-air-traffic$ bash src/run_eval_model.sh
*** About to evaluate a Wav2Vec 2.0 model***
*** Dataset in: experiments/data/uwb_atcc/test ***
*** Output folder: experiments/results/baselines/wav2vec2-base/uwb_atcc/0.0ld_0.0ad_0.0attd_0.0fpd_0.01mtp_12mtl_0.0mfp_12mfl_2acc/output ***
Integrating a LM by shallow fusion, results should be better
*** Loading the Wav2Vec 2.0 model, loading... ***
/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/models/wav2vec2/processing_wav2vec2.py:53: FutureWarning: Loading a tokenizer inside Wav2Vec2Processor from a config that does not include a tokenizer_class attribute is deprecated and will be removed in v5. Please add 'tokenizer_class': 'Wav2Vec2CTCTokenizer' attribute to either your config.json or tokenizer_config.json file to suppress this warning:
warnings.warn(
Traceback (most recent call last):
File "/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/models/wav2vec2/processing_wav2vec2.py", line 51, in from_pretrained
return super().from_pretrained(pretrained_model_name_or_path, **kwargs)
File "/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/processing_utils.py", line 182, in from_pretrained
args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
File "/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/processing_utils.py", line 226, in _get_arguments_from_pretrained
args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
File "/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 640, in from_pretrained
return tokenizer_class_py.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1761, in from_pretrained
raise EnvironmentError(
OSError: Can't load tokenizer for 'experiments/results/baselines/wav2vec2-base/uwb_atcc/0.0ld_0.0ad_0.0attd_0.0fpd_0.01mtp_12mtl_0.0mfp_12mfl_2acc/checkpoint-10000'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'experiments/results/baselines/wav2vec2-base/uwb_atcc/0.0ld_0.0ad_0.0attd_0.0fpd_0.01mtp_12mtl_0.0mfp_12mfl_2acc/checkpoint-10000' is the correct path to a directory containing all relevant files for a Wav2Vec2CTCTokenizer tokenizer.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/phd-research/research_space/w2v2-air-traffic/src/eval_model.py", line 250, in <module>
main()
File "/home/phd-research/research_space/w2v2-air-traffic/src/eval_model.py", line 152, in main
processor, processor_ctc_kenlm, model = get_kenlm_processor(path_model, path_lm)
File "/home/phd-research/research_space/w2v2-air-traffic/src/eval_model.py", line 47, in get_kenlm_processor
processor = AutoProcessor.from_pretrained(path_tokenizer)
File "/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/models/auto/processing_auto.py", line 254, in from_pretrained
return PROCESSOR_MAPPING[type(config)].from_pretrained(pretrained_model_name_or_path, **kwargs)
File "/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/models/wav2vec2/processing_wav2vec2.py", line 63, in from_pretrained
tokenizer = Wav2Vec2CTCTokenizer.from_pretrained(pretrained_model_name_or_path, **kwargs)
File "/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1761, in from_pretrained
raise EnvironmentError(
OSError: Can't load tokenizer for 'experiments/results/baselines/wav2vec2-base/uwb_atcc/0.0ld_0.0ad_0.0attd_0.0fpd_0.01mtp_12mtl_0.0mfp_12mfl_2acc/checkpoint-10000'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'experiments/results/baselines/wav2vec2-base/uwb_atcc/0.0ld_0.0ad_0.0attd_0.0fpd_0.01mtp_12mtl_0.0mfp_12mfl_2acc/checkpoint-10000' is the correct path to a directory containing all relevant files for a Wav2Vec2CTCTokenizer tokenizer.
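
For anyone hitting the same error: this OSError typically means the directory contains model weights but not the tokenizer files. A minimal diagnostic sketch, assuming the standard Hugging Face layout (checkpoint-* folders saved during training often hold only weights and optimizer state, not the tokenizer):

```python
# Check whether the checkpoint directory contains the files that
# Wav2Vec2CTCTokenizer.from_pretrained() needs (path taken from the traceback).
import os

ckpt = ("experiments/results/baselines/wav2vec2-base/uwb_atcc/"
        "0.0ld_0.0ad_0.0attd_0.0fpd_0.01mtp_12mtl_0.0mfp_12mfl_2acc/"
        "checkpoint-10000")

for name in ("vocab.json", "tokenizer_config.json", "special_tokens_map.json"):
    status = "found" if os.path.exists(os.path.join(ckpt, name)) else "MISSING"
    print(f"{name}: {status}")
```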

damnfarooq (Author) commented:

This is the output from the previous command; I got stuck at run_eval_model.sh after this.

(Farooq_thesis) phd-research@phd-research:~/research_space/w2v2-air-traffic$ bash /home/phd-research/research_space/w2v2-air-traffic/src/run_train_kenlm.sh

*** About to start the KenLM ***
*** Dataset name: uwb_atcc ***
*** Output folder: experiments/data/uwb_atcc/train/lm ***
uwb_atcc experiments/data/uwb_atcc/train/text

Exporting dataset to text file experiments/data/uwb_atcc/train/lm/4_corpus.txt...
lmplz -o 4 --text experiments/data/uwb_atcc/train/lm/4_corpus.txt --arpa experiments/data/uwb_atcc/train/lm/uwb_atcc_4g_no_fix.arpa
=== 1/5 Counting and sorting n-grams ===
Reading /home/phd-research/research_space/w2v2-air-traffic/experiments/data/uwb_atcc/train/lm/4_corpus.txt
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100

Unigram tokens 113301 types 1766
=== 2/5 Calculating and sorting adjusted counts ===
Chain sizes: 1:21192 2:2261260544 3:4239863552 4:6783781888
Statistics:
1 1765 D1=0.595645 D2=0.962202 D3+=1.62725
2 16099 D1=0.732908 D2=1.05218 D3+=1.47953
3 38208 D1=0.799969 D2=1.12127 D3+=1.28138
4 60883 D1=0.823461 D2=1.16559 D3+=1.23074
Memory estimate for binary LM:
type kB
probing 2387 assuming -p 1.5
probing 2712 assuming -r models -p 1.5
trie 950 without quantization
trie 472 assuming -q 8 -b 8 quantization
trie 895 assuming -a 22 array pointer compression
trie 418 assuming -a 22 -q 8 -b 8 array pointer compression and quantization
=== 3/5 Calculating and sorting initial probabilities ===
Chain sizes: 1:21180 2:257584 3:764160 4:1461192
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
####################################################################################################
=== 4/5 Calculating and writing order-interpolated probabilities ===
Chain sizes: 1:21180 2:257584 3:764160 4:1461192
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
####################################################################################################
=== 5/5 Writing ARPA model ===
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100

Name:lmplz VmPeak:13164676 kB VmRSS:9084 kB RSSMax:2609216 kB user:0.196349 sys:0.464826 CPU:0.661188 real:0.647472
corrected Ken LM in experiments/data/uwb_atcc/train/lm/uwb_atcc_4g.arpa
build_binary trie experiments/data/uwb_atcc/train/lm/uwb_atcc_4g.arpa experiments/data/uwb_atcc/train/lm/uwb_atcc_4g.binary
Reading experiments/data/uwb_atcc/train/lm/uwb_atcc_4g.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100

Identifying n-grams omitted by SRI
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100

Writing trie
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100

SUCCESS
done doing training of KenLM
check the output folder: experiments/data/uwb_atcc/train/lm
Done training 4-gram in experiments/data/uwb_atcc/train/lm
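
For context, the binary 4-gram built here is what run_eval_model.sh later fuses with the acoustic model ("Integrating a LM by shallow fusion"). Below is a minimal sketch of how such a binary KenLM is typically attached to a Wav2Vec 2.0 processor with pyctcdecode; this is illustrative and may differ from what the repo's get_kenlm_processor() actually does:

```python
# Build a CTC beam-search decoder backed by the trained KenLM and wrap it,
# together with the existing tokenizer/feature extractor, into a processor.
from pyctcdecode import build_ctcdecoder
from transformers import AutoProcessor, Wav2Vec2ProcessorWithLM

path_model = ("experiments/results/baselines/wav2vec2-base/uwb_atcc/"
              "0.0ld_0.0ad_0.0attd_0.0fpd_0.01mtp_12mtl_0.0mfp_12mfl_2acc")
path_lm = "experiments/data/uwb_atcc/train/lm/uwb_atcc_4g.binary"

processor = AutoProcessor.from_pretrained(path_model)
# Sort tokens by vocabulary index so the labels line up with the CTC logits.
vocab = processor.tokenizer.get_vocab()
labels = [tok for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1])]

decoder = build_ctcdecoder(labels, kenlm_model_path=path_lm)
processor_with_lm = Wav2Vec2ProcessorWithLM(
    feature_extractor=processor.feature_extractor,
    tokenizer=processor.tokenizer,
    decoder=decoder,
)
```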

damnfarooq (Author) commented:

I fixed the issue by changing the model path in run_eval_model.sh (see below). I got the output, but there are some warnings related to unigrams, so I am not sure whether it worked as expected:

path_to_model="experiments/results/baselines/wav2vec2-base/uwb_atcc/0.0ld_0.0ad_0.0attd_0.0fpd_0.01mtp_12mtl_0.0mfp_12mfl_2acc"
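
This likely works because the training run saves the processor/tokenizer files (vocab.json, tokenizer_config.json, ...) in the run's root directory, while checkpoint-10000 typically holds only intermediate training state. A quick sanity check for the corrected path, assuming that layout:

```python
# Loading from the run root should now succeed where the checkpoint dir failed.
from transformers import AutoProcessor

path_to_model = ("experiments/results/baselines/wav2vec2-base/uwb_atcc/"
                 "0.0ld_0.0ad_0.0attd_0.0fpd_0.01mtp_12mtl_0.0mfp_12mfl_2acc")
processor = AutoProcessor.from_pretrained(path_to_model)  # no OSError expected
print(type(processor).__name__)
```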

(Farooq_thesis) phd-research@phd-research:~/research_space/w2v2-air-traffic$ bash src/run_eval_model.sh
*** About to evaluate a Wav2Vec 2.0 model***
*** Dataset in: experiments/data/uwb_atcc/test ***
*** Output folder: /home/phd-research/research_space/w2v2-air-traffic/experiments/results/baselines/wav2vec2-base/uwb_atcc/output ***
Integrating a LM by shallow fusion, results should be better
*** Loading the Wav2Vec 2.0 model, loading... ***
Unigrams not provided and cannot be automatically determined from LM file (only arpa format). Decoding accuracy might be reduced.
Found entries of length > 1 in alphabet. This is unusual unless style is BPE, but the alphabet was not recognized as BPE type. Is this correct?
No known unigrams provided, decoding results might be a lot worse.
*** Loading the dataset... ***
Using custom data configuration test-085e5dd7a4b8bb1c
Downloading and preparing dataset atc_data_loader/test to /home/phd-research/research_space/w2v2-air-traffic/.cache/eval/experiments/data/uwb_atcc/test/atc_data_loader/test-085e5dd7a4b8bb1c/0.0.0/f2633cc53c6abe32cddd4152eebde1a4e3c9953e1446e190b8d9a13330cddaa4...
Dataset atc_data_loader downloaded and prepared to /home/phd-research/research_space/w2v2-air-traffic/.cache/eval/experiments/data/uwb_atcc/test/atc_data_loader/test-085e5dd7a4b8bb1c/0.0.0/f2633cc53c6abe32cddd4152eebde1a4e3c9953e1446e190b8d9a13330cddaa4. Subsequent calls will reuse this data.
67%|████████████████████████████████████████████████████ | 2/3 [00:47<00:23, 23.59s/ba]
#0: 0%| | 0/706 [00:00<?, ?ex/s]
/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/models/wav2vec2/processing_wav2vec2.py:154: UserWarning: as_target_processor is deprecated and will be removed in v5 of Transformers. You can process your labels by using the argument text of the regular __call__ method (either in the same call as your audio inputs, or in a separate call.
warnings.warn(
/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/models/wav2vec2/processing_wav2vec2.py:154: UserWarning: as_target_processor is deprecated and will be removed in v5 of Transformers. You can process your labels by using the argument text of the regular __call__ method (either in the same call as your audio inputs, or in a separate call.
warnings.warn(
/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/models/wav2vec2/processing_wav2vec2.py:154: UserWarning: as_target_processor is deprecated and will be removed in v5 of Transformers. You can process your labels by using the argument text of the regular __call__ method (either in the same call as your audio inputs, or in a separate call.
warnings.warn(
/home/phd-research/anaconda3/envs/Farooq_thesis/lib/python3.10/site-packages/transformers/models/wav2vec2/processing_wav2vec2.py:154: UserWarning: as_target_processor is deprecated and will be removed in v5 of Transformers. You can process your labels by using the argument text of the regular __call__ method (either in the same call as your audio inputs, or in a separate call.
warnings.warn(
#0: 100%|██████████████████████████████████████████████████████████████████████| 706/706 [00:13<00:00, 50.44ex/s]
#3: 100%|██████████████████████████████████████████████████████████████████████| 706/706 [00:14<00:00, 50.02ex/s]
#1: 100%|██████████████████████████████████████████████████████████████████████| 706/706 [00:14<00:00, 49.39ex/s]
#2: 100%|██████████████████████████████████████████████████████████████████████| 706/706 [00:15<00:00, 46.99ex/s]
#2: 94%|██████████████████████████████████████████████████████████████████▏ | 667/706 [00:14<00:00, 51.16ex/s]
#2: 100%|██████████████████████████████████████████████████████████████████████| 706/706 [00:15<00:00, 45.76ex/s]
Performing inference on dataset... Loading

inference: 100%|█████████████████████████████████████████████████████████████| 2824/2824 [16:40<00:00, 2.82ex/s]
Downloading builder script: 100%|███████████████████████████████████████████| 5.60k/5.60k [00:00<00:00, 7.62MB/s]
*** printing the ASR results in /home/phd-research/research_space/w2v2-air-traffic/experiments/results/baselines/wav2vec2-base/uwb_atcc/0.0ld_0.0ad_0.0attd_0.0fpd_0.01mtp_12mtl_0.0mfp_12mfl_2acc/output/uwb_atcc/hypo ***
Done!
Done evaluating model in /home/phd-research/research_space/w2v2-air-traffic/experiments/results/baselines/wav2vec2-base/uwb_atcc/0.0ld_0.0ad_0.0attd_0.0fpd_0.01mtp_12mtl_0.0mfp_12mfl_2acc with LM
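
About the warnings: "Unigrams not provided ..." comes from pyctcdecode when it is handed a binary KenLM file, whose vocabulary it cannot read (only ARPA files allow that); decoding still runs, just possibly less accurately. The "entries of length > 1 in alphabet" message is typically triggered by multi-character special tokens in a character-level vocabulary and is usually harmless. A sketch of passing the unigrams explicitly to silence the first warning, assuming the LM training corpus from the earlier log is still on disk:

```python
# Collect the word list from the corpus used to train the 4-gram and hand it
# to pyctcdecode so it no longer has to warn about missing unigrams.
from pyctcdecode import build_ctcdecoder
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained(
    "experiments/results/baselines/wav2vec2-base/uwb_atcc/"
    "0.0ld_0.0ad_0.0attd_0.0fpd_0.01mtp_12mtl_0.0mfp_12mfl_2acc"
)
labels = [tok for tok, _ in sorted(processor.tokenizer.get_vocab().items(),
                                   key=lambda kv: kv[1])]

with open("experiments/data/uwb_atcc/train/lm/4_corpus.txt", encoding="utf-8") as f:
    unigrams = sorted({word for line in f for word in line.split()})

decoder = build_ctcdecoder(
    labels,
    kenlm_model_path="experiments/data/uwb_atcc/train/lm/uwb_atcc_4g.binary",
    unigrams=unigrams,
)
```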
