CUDA ERROR occurred in training #23

Open
lydia07 opened this issue Apr 22, 2020 · 1 comment

Comments

@lydia07

lydia07 commented Apr 22, 2020

The command-line output is as follows:
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [604,0,0], thread: [31,0,0] Assertion srcIndex < srcSelectDimSize failed.
Traceback (most recent call last):
  File "main.py", line 16, in <module>
    main()
  File "main.py", line 9, in main
    trainer.train()
  File "/data/hyx/workspace/test/neural-question-generation/trainer.py", line 81, in train
    batch_loss = self.step(train_data)
  File "/data/hyx/workspace/test/neural-question-generation/trainer.py", line 124, in step
    enc_outputs, enc_states = self.model.encoder(src_seq, src_len, tag_seq)
  File "/data/hyx/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/hyx/workspace/test/neural-question-generation/model.py", line 56, in forward
    packed = pack_padded_sequence(embedded, src_len, batch_first=True)
  File "/data/hyx/miniconda3/lib/python3.7/site-packages/torch/nn/utils/rnn.py", line 223, in pack_padded_sequence
    lengths = torch.as_tensor(lengths, dtype=torch.int64)
RuntimeError: CUDA error: device-side assert triggered
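CUDA kernels run asynchronously, so a device-side assert is usually reported at the next synchronization point rather than at the operation that failed. Here the failing kernel (`indexSelectLargeIndex`, the row lookup behind an embedding) most likely ran when `embedded` was computed just before line 56 of model.py, and the error only surfaced inside `pack_padded_sequence`. Setting the `CUDA_LAUNCH_BLOCKING=1` environment variable makes kernel launches synchronous, so the traceback points at the real call site. A minimal sketch, assuming training is started via `main.py` as in the traceback; the helper names `with_blocking_launches` and `rerun_training` are hypothetical:

```python
import os
import subprocess
import sys


def with_blocking_launches(base_env=None):
    """Return a copy of an environment mapping with CUDA_LAUNCH_BLOCKING=1.

    Hypothetical helper: synchronous kernel launches make a device-side
    assert raise at the operation that actually triggered it.
    """
    env = dict(os.environ if base_env is None else base_env)  # never mutate the caller's mapping
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def rerun_training(script="main.py"):
    # The variable has to be in place before CUDA is initialised, so the
    # safest option is a fresh interpreter launched with the modified environment.
    return subprocess.run([sys.executable, script], env=with_blocking_launches())
```

From a shell, the equivalent is `CUDA_LAUNCH_BLOCKING=1 python main.py`. Expect training to run slower with this set; it is meant for debugging only.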

This problem is strange, because I previously trained this model successfully.
To get a more informative error message, I tried running on the CPU, but I couldn't get that to work.
Could you please help? Do you know anything about this issue?
Thanks a lot!

@seanie12
Owner

seanie12 commented Jun 1, 2020

Maybe some token IDs are greater than the vocab size. I've updated my repo. Please run the code again.
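The failed assertion `srcIndex < srcSelectDimSize` is the bounds check of that row lookup: every token ID passed to the embedding must satisfy `0 <= id < vocab_size`, so an ID equal to the vocab size is already out of range. One way to confirm this diagnosis without the GPU is to scan the ID sequences on the CPU before they become CUDA tensors. The sketch below is pure Python and assumes each batch is a list of integer ID lists together with a known `vocab_size`; the functions `find_out_of_range_ids` and `check_batch` are hypothetical and not part of this repository.

```python
from typing import List, Sequence, Tuple


def find_out_of_range_ids(
    batch: Sequence[Sequence[int]], vocab_size: int
) -> List[Tuple[int, int, int]]:
    """Return (sequence index, position, token id) for every id outside [0, vocab_size)."""
    bad = []
    for i, seq in enumerate(batch):
        for j, token_id in enumerate(seq):
            # Same condition the CUDA kernel asserts, plus the lower bound.
            if not 0 <= token_id < vocab_size:
                bad.append((i, j, token_id))
    return bad


def check_batch(batch: Sequence[Sequence[int]], vocab_size: int) -> None:
    """Raise a readable error on the CPU instead of a device-side assert on the GPU."""
    bad = find_out_of_range_ids(batch, vocab_size)
    if bad:
        i, j, token_id = bad[0]
        raise ValueError(
            f"{len(bad)} token id(s) out of range for vocab_size={vocab_size}; "
            f"first: sequence {i}, position {j}, id {token_id}"
        )
```

For example, `find_out_of_range_ids([[2, 5, 9], [1, 10, 0]], vocab_size=10)` returns `[(1, 1, 10)]`. The same check would apply to any other index input, such as `tag_seq`, if it feeds its own embedding table.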
