Won't the code lead to different input shapes for different batches? #26

Open
MingleiLI opened this issue Oct 24, 2018 · 0 comments

In the file data_util.py, the code is as follows:
`def batch(inputs):
    batch_size = len(inputs)

    # Number of sentences in each document; this differs from batch to batch.
    document_sizes = np.array([len(doc) for doc in inputs], dtype=np.int32)
    document_size = document_sizes.max()  # Largest sentence count of any document in the batch.

    sentence_sizes_ = [[len(sent) for sent in doc] for doc in inputs]  # Length of every sentence in each document.
    sentence_size = max(map(max, sentence_sizes_))  # The maximum sentence length.

    b = np.zeros(shape=[batch_size, document_size, sentence_size], dtype=np.int32)  # == PAD

    sentence_sizes = np.zeros(shape=[batch_size, document_size], dtype=np.int32)
    for i, document in enumerate(inputs):
        for j, sentence in enumerate(document):
            sentence_sizes[i, j] = sentence_sizes_[i][j]
            for k, word in enumerate(sentence):
                b[i, j, k] = word

    return b, document_sizes, sentence_sizes`
The shape of the output batch depends on the inputs. Won't this lead to a different shape of b for each batch, since the input is not padded beforehand? Each document may have a different number of sentences, and each sentence may have a different number of words.
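
To make the concern concrete, here is a minimal sketch that runs the function quoted above on two toy batches. The word ids and the names `batch_a` and `batch_b` are made up for illustration and do not come from the repository. Each batch is padded only up to its own maximum document and sentence sizes, so the shape of `b` changes from one batch to the next.

```python
import numpy as np


def batch(inputs):
    # Same logic as the function quoted above, repeated so the snippet runs on its own.
    batch_size = len(inputs)
    document_sizes = np.array([len(doc) for doc in inputs], dtype=np.int32)
    document_size = document_sizes.max()
    sentence_sizes_ = [[len(sent) for sent in doc] for doc in inputs]
    sentence_size = max(map(max, sentence_sizes_))
    b = np.zeros(shape=[batch_size, document_size, sentence_size], dtype=np.int32)
    sentence_sizes = np.zeros(shape=[batch_size, document_size], dtype=np.int32)
    for i, document in enumerate(inputs):
        for j, sentence in enumerate(document):
            sentence_sizes[i, j] = sentence_sizes_[i][j]
            for k, word in enumerate(sentence):
                b[i, j, k] = word
    return b, document_sizes, sentence_sizes


# Hypothetical toy data: each document is a list of sentences,
# each sentence is a list of integer word ids.
batch_a = [[[1, 2, 3], [4, 5]], [[6]]]
batch_b = [[[1, 2, 3, 4, 5]], [[6, 7], [8], [9, 10]], [[11]]]

b_a, doc_sizes_a, sent_sizes_a = batch(batch_a)
b_b, doc_sizes_b, sent_sizes_b = batch(batch_b)

print(b_a.shape)  # (2, 2, 3)
print(b_b.shape)  # (3, 3, 5)
```

Whether this is a problem depends on how the model declares its inputs, which the quoted code does not show. If the input tensors leave these dimensions unspecified, per-batch shapes would be acceptable; if they are fixed, every batch would need padding to the same global sizes.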
