In the file data_util.py, the code is as follows:
```python
import numpy as np

def batch(inputs):
    batch_size = len(inputs)
    document_sizes = np.array([len(doc) for doc in inputs], dtype=np.int32)  # Sentences per document;
                                                                             # varies from batch to batch.
    document_size = document_sizes.max()  # Maximum number of sentences in this batch.
    sentence_sizes_ = [[len(sent) for sent in doc] for doc in inputs]  # Length of every sentence in each document.
    sentence_size = max(map(max, sentence_sizes_))  # Maximum sentence length in this batch.
    b = np.zeros(shape=[batch_size, document_size, sentence_size], dtype=np.int32)  # 0 == PAD
    sentence_sizes = np.zeros(shape=[batch_size, document_size], dtype=np.int32)
    for i, document in enumerate(inputs):
        for j, sentence in enumerate(document):
            sentence_sizes[i, j] = sentence_sizes_[i][j]
            for k, word in enumerate(sentence):
                b[i, j, k] = word
    return b, document_sizes, sentence_sizes
```
The shape of the output batch depends on the inputs. Won't this lead to different shapes of `b`, since the inputs are not padded beforehand? Each document may have a different number of sentences, and each sentence may have a different number of words.
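As a quick check (a minimal sketch using made-up toy word IDs, assuming the `batch` function above is in scope), calling it on two different batches shows that `b` is padded only up to each batch's own maxima, so its shape does vary across batches:

```python
# Two toy batches of documents; the word IDs are arbitrary placeholders.
batch_a = [
    [[1, 2, 3], [4, 5]],  # document with 2 sentences, longest has 3 words
    [[6]],                # document with 1 sentence of 1 word
]
batch_b = [
    [[1, 2, 3, 4, 5], [6, 7], [8, 9, 10]],  # 3 sentences, longest has 5 words
]

b_a, doc_sizes_a, sent_sizes_a = batch(batch_a)
b_b, doc_sizes_b, sent_sizes_b = batch(batch_b)

print(b_a.shape)  # (2, 2, 3) -- padded to this batch's maxima
print(b_b.shape)  # (1, 3, 5) -- different maxima, so a different shape
```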