NaN loss and only OOV in the greedy output #42

Open
debajyotidatta opened this issue Nov 2, 2018 · 2 comments

@debajyotidatta

The loss was initially decreasing until it became NaN and stayed that way. I am running it on the SQuAD dataset, and the exact command used for running it is:

python train.py --train_tasks squad --device 0 --data ./.data --save ./results/ --embeddings ./.embeddings/ --train_batch_tokens 2000

So the only change is setting --train_batch_tokens to 2000, since my GPU was running out of memory. I am attaching a screenshot. Is there anything I am missing? Should I try something else?

[Screenshot of the training log, 2018-11-02 14:35]

@bmccann
Contributor

bmccann commented Nov 16, 2018

Well that's no good. Let me try running your exact command on my side to see if I get the same thing. Do you know which iteration this first started on? Is it 438000?
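
To pin down the first bad iteration, a small guard in the training loop can record where the loss first turns into NaN. Below is a minimal sketch of such a check; the check_loss helper, the loop variables, and the clipping value are hypothetical placeholders, not decaNLP's actual train.py code.

```python
import math


def check_loss(loss, iteration):
    """Raise as soon as the loss becomes NaN or inf, recording the iteration.

    `loss` is assumed to be a scalar tensor from the forward pass and
    `iteration` the current training step counter.
    """
    loss_value = loss.item()
    if math.isnan(loss_value) or math.isinf(loss_value):
        raise RuntimeError(f"loss became {loss_value} at iteration {iteration}")
    return loss_value


# Hypothetical usage inside a PyTorch training loop:
# for iteration, batch in enumerate(train_iter):
#     loss = model(batch)            # placeholder forward pass
#     check_loss(loss, iteration)    # flags the first bad iteration
#     loss.backward()
#     torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
#     optimizer.step()
#     optimizer.zero_grad()
```

Gradient clipping (shown in the commented usage) is a common mitigation when a loss blows up, though it may not address the root cause here.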

@Llaneige

> Well that's no good. Let me try running your exact command on my side to see if I get the same thing. Do you know which iteration this first started on? Is it 438000?

I had the same issue when I ran:
nvidia-docker run -it --rm -v `pwd`:/decaNLP/ -u $(id -u):$(id -g) bmccann/decanlp:cuda9_torch041 bash -c "python /decaNLP/train.py --train_tasks squad --device 0"
It started at iteration_316800.
