shared_list does not have data_set in forward block with TIMIT tutorial #157
Comments
# --------FORWARD--------#
for forward_data in forward_data_lst:

    # Compute the number of chunks
    N_ck_forward = compute_n_chunks(out_folder, forward_data, ep, N_ep_str_format, 'forward')
    N_ck_str_format = '0' + str(max(math.ceil(np.log10(N_ck_forward)), 1)) + 'd'

    processes = list()
    info_files = list()

    for ck in range(N_ck_forward):

        if not is_production:
            print('Testing %s chunk = %i / %i' % (forward_data, ck + 1, N_ck_forward))
        else:
            print('Forwarding %s chunk = %i / %i' % (forward_data, ck + 1, N_ck_forward))

        # output file
        info_file = out_folder + '/exp_files/forward_' + forward_data + '_ep' + format(ep, N_ep_str_format) + '_ck' + format(ck, N_ck_str_format) + '.info'
        config_chunk_file = out_folder + '/exp_files/forward_' + forward_data + '_ep' + format(ep, N_ep_str_format) + '_ck' + format(ck, N_ck_str_format) + '.cfg'

        # Do forward if the chunk was not already processed
        if not os.path.exists(info_file):

            # Doing forward
            # getting the next chunk
            next_config_file = cfg_file_list[op_counter]

            # run chunk processing
            if _run_forwarding_in_subprocesses(config):
                shared_list = list()
                print("shared list", shared_list)
                output_folder = config['exp']['out_folder']
                save_gpumem = strtobool(config['exp']['save_gpumem'])
                use_cuda = strtobool(config['exp']['use_cuda'])
                p = read_next_chunk_into_shared_list_with_subprocess(read_lab_fea, shared_list, config_chunk_file, is_production, output_folder, wait_for_process=True)
                data_name, data_end_index_fea, data_end_index_lab, fea_dict, lab_dict, arch_dict, data_set_dict = extract_data_from_shared_list(shared_list)
                print("shared list", shared_list)
                print("output folder", output_folder)
                print("data_set_dict", type(data_set_dict))
                print("data_set_dict", data_set_dict)
                data_set_inp, data_set_ref = convert_numpy_to_torch(data_set_dict, save_gpumem, use_cuda)
When is shared_list overwritten?
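For anyone debugging this, a small helper like the one below (my own sketch, not part of run_exp.py; the name check_chunk_was_loaded is made up) turns the silent None into an explicit error right after extract_data_from_shared_list, so it is clear whether the subprocess ever produced a data_set:

```python
def check_chunk_was_loaded(shared_list, data_set_dict, config_chunk_file, output_folder):
    # Debugging helper (my own, not in run_exp.py): turn the silent failure
    # into an explicit error pointing at the chunk config and the log file.
    if data_set_dict is None or len(shared_list) == 0:
        raise RuntimeError(
            'The data-loading subprocess left shared_list with %i entries and '
            'data_set_dict=None for chunk config %s. Check the feature/label '
            'paths in that config and the log.log file in %s.'
            % (len(shared_list), config_chunk_file, output_folder)
        )

# Call it right after extract_data_from_shared_list(...):
# check_chunk_was_loaded(shared_list, data_set_dict, config_chunk_file, output_folder)
```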
Hi! Isn't it simply a problem with the path of the test dataset in the config file?
Yes, it looks like that!
I will check again.
I'm still in trouble. Error message: data_name, data_end_index_fea, data_end_index_lab, lab_dict, and data_set_dict are None. The chunk cfg (with the lab_folder settings) is exp/TIMIT_MLP_basic/exp_files/forward_TIMIT_test_ep23_ck0.cfg
Did you find a solution to this? I am having the exact same issue. I double-checked all paths in my cfg file and the same error is occurring. Note: I am using PyTorch-Kaldi on WSL without CUDA (there is still no CUDA support on WSL); I am not sure if this makes a difference.
It looks like an error in reading features and labels with Kaldi.
To debug, you can try to "manually" read the features in this way:
1- Select one ark file listed in /mnt/mscteach_home/s1870525/dissertation/PruninNeuralNetworksSpeech/s5/data/test_dev93/feats.scp (e.g., quick_test/fbank/raw_fbank_dev.1.ark).
2- Run copy-feats ark:your_ark_file.ark ark,t:- . If everything works, you should see a lot of numbers in standard output. If it doesn't work, take a look at the error.
3- If it works, you can add the other options and write: copy-feats ark:your_ark.ark ark:- | apply-cmvn --utt2spk=ark:/mnt/mscteach_home/s1870525/dissertation/PruninNeuralNetworksSpeech/s5/data/test_dev93/utt2spk ark:/mnt/mscteach_home/s1870525/dissertation/PruninNeuralNetworksSpeech/s5/data/test_dev93/data/cmvn_test_dev93.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark,t:- If it doesn't work, take a look at the error message.
You can also take a look at the log.log file you find in the output folder.
Please let me know if you are able to solve the data loading issue...
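If it helps, here is a rough Python version of the same check (my own sketch, not part of pytorch-kaldi). The ark/utt2spk/cmvn paths below are placeholders for your own files, and it assumes the Kaldi binaries are on your PATH:

```python
import subprocess

# Placeholder paths -- replace with your own ark, utt2spk and cmvn files.
ark = 'quick_test/fbank/raw_fbank_dev.1.ark'
utt2spk = 'data/dev/utt2spk'
cmvn = 'data/dev/cmvn_dev.ark'

# Same pipeline as in the steps above: copy-feats | apply-cmvn | add-deltas.
pipeline = (
    'copy-feats ark:%s ark:- | '
    'apply-cmvn --utt2spk=ark:%s ark:%s ark:- ark:- | '
    'add-deltas --delta-order=2 ark:- ark,t:-' % (ark, utt2spk, cmvn)
)

# Run it through a shell, discard the (huge) feature dump, keep stderr.
result = subprocess.run(pipeline, shell=True,
                        stdout=subprocess.DEVNULL, stderr=subprocess.PIPE)
if result.returncode != 0:
    print('Feature reading failed:')
    print(result.stderr.decode())
else:
    print('Features read correctly.')
```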
Thank you for the quick reply. I apologize if these are basic questions; I am new to using Kaldi and this toolkit. So I ran copy-feats ark:/home/spencer/kaldi/egs/timit/s5/mfcc/raw_mfcc_dev.1.ark ark,t:- and it ran just like you said it should, with a lot of numbers output to the terminal. After that I ran copy-feats ark:/home/spencer/kaldi/egs/timit/s5/mfcc/raw_mfcc_dev.1.ark ark:- | apply-cmvn --utt2spk=ark:/home/spencer/kaldi/egs/timit/data/dev/utt2spk ark:/home/spencer/kaldi/egs/timit/s5/data/cmvn_dev.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark,t:- and got the attached error. One thing I noticed is that there is no cmvn_dev.ark in my data folder (no .ark files at all in that folder). Is that meant to be the output, or should there be a .ark file there? It seems like the error is centered around that file.
[image: TIMITError]
<https://user-images.githubusercontent.com/49201733/66129779-8fc35180-e5be-11e9-8b3c-d0ea6a826948.PNG>

Does /home/spencer/kaldi/egs/timit/s5/data/cmvn_dev.ark exist?
Mirco
No, like I said, there are no .ark files in that folder (or its subfolders). I thought this might be an output folder, but it looks like the issue is in the creation of those files.
This cmvn file is created by Kaldi during the feature extraction phase and it performs mean and variance normalization. You should probably have the cmvn file somewhere else, like in data/dev/cmvn* or mfcc/cmvn*
Mirco
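If you want to locate them quickly, a throwaway snippet like this prints every cmvn file Kaldi produced (the egs path is the one used earlier in this thread; adjust it to your own checkout):

```python
import glob

# List every CMVN statistics file under the TIMIT egs directory.
for path in sorted(glob.glob('/home/spencer/kaldi/egs/timit/s5/**/cmvn*', recursive=True)):
    print(path)
```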
In case anyone else has this issue: I resolved it by bypassing the if statement on line 328 of run_exp.py. There was some issue in how the shared_list object was being created that I could not figure out, but the else branch runs the run_nn function in a similar fashion to the training and validation steps. So I commented out line 328 and created another variable set to False to bypass that if statement:
test=False  # if _run_forwarding_in_subprocesses(config)
if test:
This is weird, are you sure that you don't have a path problem only?
Yes, I checked all the paths in the config file and they were all correct. Bypassing that if statement, though, gave a result that looked very similar to the one in the tutorial.
[image: TIMITResult]
<https://user-images.githubusercontent.com/49201733/67582253-6ad28200-f717-11e9-9d6e-40d0d73a7744.PNG>

Interesting, we haven't experienced this issue on our side.
There is still an error in the log.log file apparently (I had not checked that file when I got the correct result). Something to do with decode_dnn.sh. It looks like the forward_TIMIT_test_ep*_ck*_out_dnn1_to_decode.ark files are not being created for some reason, though for whatever reason this does not seem to affect the outcome.
[image: TIMITError3]
<https://user-images.githubusercontent.com/49201733/67587312-965a6a00-f721-11e9-8b54-54dcbcebeef6.PNG>

Maybe this file has not been created because there is a problem with the test data. Could you check the test data more carefully?
Mirco
I am also having an error at the testing phase. When I printed shared_list, [...]
I used the same validation data [...]
@kumarh22 I got the same problem as you; have you solved it?
@mravanelli I also got the error in the test phase: Testing TIMIT_test chunk = 1 / 1. I "manually" read the features to debug as you said above. Step 2 works, and step 3 does not raise an error either (step 3 runs for a very long time but without error; the same happens with the eval file), and the log.log is just [...]
Is the problem happening if you use the validation or training set as the test set?
Yes. I use the validation set as the test set, but it still happens.
I find that when I use the GPU version, the problem does not appear anymore.
Had the same issue today. Here are some findings.
Why does it only happen when running on CPU? Because when the CPU is used, the forward pass runs in a subprocess, and the method that runs the forward pass in a subprocess uses another version of the read_lab_fea method here, read_lab_fea_refac01, while the same-process forward pass uses the original read_lab_fea method.
So why does it crash when using the read_lab_fea_refac01 method? First of all, because it switches to production mode when reading fea_dict, lab_dict, and arch_dict. By removing this line I fixed the initial issue. But there is another problem.
How to fix: you can update this method to return False. I tried to use read_lab instead of read_lab_fea_refac01 here, but it crashes anyway when trying to unpack the [...]
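For reference, this is roughly what the "return False" workaround looks like (a sketch only; _run_forwarding_in_subprocesses lives in the pytorch-kaldi code, I am only guessing at its original body, so adapt this to your own checkout):

```python
def _run_forwarding_in_subprocesses(config):
    # Workaround from this thread: always forward in the main process so the
    # original read_lab_fea path is used instead of read_lab_fea_refac01.
    # (As described above, the original function returns True on CPU-only
    # runs, i.e. roughly "not use_cuda".)
    return False
```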