NaN loss and only OOV in the greedy output #42

Open
debajyotidatta opened this issue Nov 2, 2018 · 2 comments

@debajyotidatta

The loss was initially decreasing until it became NaN and stayed that way. I am running it on the SQuAD dataset, and the exact command used for running it is:

python train.py --train_tasks squad --device 0 --data ./.data --save ./results/ --embeddings ./.embeddings/ --train_batch_tokens 2000

So the only change is setting --train_batch_tokens to 2000, since my GPU was running out of memory. I am attaching a screenshot. Is there anything I am missing? Should I try something else?

[Screenshot of the training log, 2018-11-02 14:35]

@bmccann
Contributor

bmccann commented Nov 16, 2018

Well that's no good. Let me try running your exact command on my side to see if I get the same thing. Do you know which iteration this first started on? Is it 438000?
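
To pin down the first bad iteration, a small guard in the training loop can record where the loss first turns into NaN. Below is a minimal sketch of such a check; the check_loss helper, the loop variables, and the clipping value are hypothetical placeholders, not decaNLP's actual train.py code.

```python
import math


def check_loss(loss, iteration):
    """Raise as soon as the loss becomes NaN or inf, recording the iteration.

    `loss` is assumed to be a scalar tensor from the forward pass and
    `iteration` the current training step counter.
    """
    loss_value = loss.item()
    if math.isnan(loss_value) or math.isinf(loss_value):
        raise RuntimeError(f"loss became {loss_value} at iteration {iteration}")
    return loss_value


# Hypothetical usage inside a PyTorch training loop:
# for iteration, batch in enumerate(train_iter):
#     loss = model(batch)            # placeholder forward pass
#     check_loss(loss, iteration)    # flags the first bad iteration
#     loss.backward()
#     torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
#     optimizer.step()
#     optimizer.zero_grad()
```

Gradient clipping (shown in the commented usage) is a common mitigation when a loss blows up, though it may not address the root cause here.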

@Llaneige

> Well that's no good. Let me try running your exact command on my side to see if I get the same thing. Do you know which iteration this first started on? Is it 438000?

I had the same issue when I ran:
nvidia-docker run -it --rm -v `pwd`:/decaNLP/ -u $(id -u):$(id -g) bmccann/decanlp:cuda9_torch041 bash -c "python /decaNLP/train.py --train_tasks squad --device 0"
It started at iteration_316800.
