Prob. Values are not consistent if the padding amount changes #32

Open
agupta74 opened this issue Mar 18, 2018 · 8 comments

Comments

@agupta74

The final softmax prob. values are not the same if the padding amount changes. It looks like for some functions, such as reduce_mean and reduce_max, the padding is not masked out. Also, the word/char embeddings for zero-padded tokens are not zero.

@zhiguowang
Owner

Not sure which part you are talking about. But I always multiply by a mask to filter out the influence of padding values.

@agupta74
Author

agupta74 commented Apr 4, 2018

In match_utils.py, tf.reduce_max and tf.reduce_mean (e.g., lines 150 and 151; there are other places in match_utils.py using these functions) are computed without masking. So if you vary the zero padding for the same sentence/passage or query, the final prob. values are different. The amount of zero padding should not impact the final softmax prob. values for a given query/passage.
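
Here is a minimal repro (an illustrative sketch, not code from the repo, using the TF 1.x API): the same two real values give a different mean once extra zero padding is appended.

```python
import tensorflow as tf

# The same two real values (3 and 5) with different amounts of zero padding.
a = tf.constant([3.0, 5.0, 0.0])             # padded to length 3
b = tf.constant([3.0, 5.0, 0.0, 0.0, 0.0])   # padded to length 5

with tf.Session() as sess:
    print(sess.run(tf.reduce_mean(a)))  # 8/3 ≈ 2.667
    print(sess.run(tf.reduce_mean(b)))  # 8/5 = 1.6
```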

@agupta74
Author

agupta74 commented Apr 4, 2018

With regard to the word/char embeddings for "zero" word/char tokens in the zero-padded query/passage, should these embeddings not be a zero vector? Any non-zero values for zero-padded tokens seem to impact the intermediate cosine similarity score (and therefore the final softmax score) when I change the zero padding amount.
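
One way to guarantee this (a rough TF 1.x sketch with hypothetical names, not code from the repo) is to multiply the embedding lookup by the token mask so that padded rows become exact zero vectors:

```python
import tensorflow as tf

word_embeddings = tf.random_normal([2, 4, 3])   # [batch, seq_len, dim], hypothetical
word_mask = tf.constant([[1., 1., 0., 0.],
                         [1., 1., 1., 0.]])     # 1.0 = real token, 0.0 = padding

# Broadcast the mask over the embedding dimension; padded rows become zeros.
masked_embeddings = word_embeddings * tf.expand_dims(word_mask, -1)
```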

@zhiguowang
Owner

I multiply a mask at this line https://github.com/zhiguowang/BiMPM/blob/master/src/match_utils.py#L145 before lines 150 and 151.

Before any matching layer, I also multiply a mask to filter out padding values: https://github.com/zhiguowang/BiMPM/blob/master/src/match_utils.py#L139

@agupta74
Author

agupta74 commented Apr 4, 2018

I agree that you mask out the values. However, for the mean calculation the number of padded zeros still impacts the mean value, even though the sum does not get impacted. So if my sentence length changes due to a different amount of padding (based on the max sentence length in a batch), the mean value would change even though the padded values are masked out to zero.
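
A possible fix (a hypothetical sketch, not code from the repo): divide the masked sum by the number of real tokens instead of the padded length, so extra padding cannot change the mean.

```python
import tensorflow as tf

x = tf.constant([[3., 5., 0., 0.]])      # values with padding already zeroed
mask = tf.constant([[1., 1., 0., 0.]])   # 1.0 = real token, 0.0 = padding

# Divide by the count of real tokens (guarded against all-padding rows).
masked_mean = tf.reduce_sum(x * mask, axis=1) / tf.maximum(tf.reduce_sum(mask, axis=1), 1.0)
# -> 4.0 no matter how many padding zeros are appended
```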

Similarly, for the max calculation: since the padded values are masked out to zero, the max will be zero if the actual values in the matrix are all negative. We need to remove the padded values from the max calculation, since a value of zero impacts the result (the padded values need to be set to negative infinity so that they do not impact the max calculation).
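
Something along these lines would do it (again a hypothetical sketch, not code from the repo):

```python
import tensorflow as tf

x = tf.constant([[-3., -5., 0., 0.]])    # all-negative real values, padding zeroed
mask = tf.constant([[1., 1., 0., 0.]])   # 1.0 = real token, 0.0 = padding

# Push padded positions to a very negative value before taking the max.
masked_max = tf.reduce_max(x + (1.0 - mask) * -1e30, axis=1)
# -> -3.0 (a plain tf.reduce_max over the zero-masked row would wrongly return 0.0)
```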

I can see that in the latest codebase you have masked out the question/passage representation for the padded word tokens. I guess in the older codebase that did not happen. Thanks for fixing this issue!

@zhiguowang
Owner

Oh, I see what you mean. You are right, the padding zeros will affect the mean calculation. I will think about it. Thanks!

@agupta74
Author

agupta74 commented Apr 9, 2018

Thanks! Also, please look into the max calculation issue mentioned in my comments above.

@agupta74
Author

@zhiguowang Any update on this? I can submit a PR for the fix if that helps, so you can verify it, run any other required testing, and then merge. Thanks!
