Question on crf layer, why loop through batch before crf layer? #37

Open
lkqnaruto opened this issue Nov 27, 2021 · 4 comments
@lkqnaruto

I checked out the code, where you said:

 # 1- the CRF package assumes the mask tensor cannot have interleaved
# zeros and ones. In other words, the mask should start with True
# values, transition to False at some moment and never transition
# back to True. That can only happen for simple padded sequences.
# 2- The first column of mask tensor should be all True, and we
# cannot guarantee that because we have to mask all non-first
# subtokens of the WordPiece tokenization.

Can you explain that a little bit? I'm still confused about what you mean here. What does it mean by "interleaved zeros and ones"?

Thank you

@fabiocapsouza
Contributor

Hi @lkqnaruto ,

By interleaved zeros and ones, I meant a mask like [0, 1, 0, 1, 1, 0, 0, 0, 1, ...] instead of [1, 1, 1, 1, 0, 0, 0]. Because we are using WordPiece, which is a subword tokenization, word continuation tokens (the ones that start with ##) do not get an associated tag prediction for the NER task; otherwise, words that are tokenized into 2+ tokens would have multiple predictions.

For instance, suppose we have these tokens:
tokens = ["[CLS]", "Al", "##bert", "Ein", "##stein", ...]
The mask would be
mask = [0, 1, 0, 1, 0, ...]
which is incompatible with the CRF package. So we have to index the sequence using the mask and pass only
["Al", "Ein", ...] to CRF.

The mask is different for each sequence of the batch, and the masks have different lengths (sums of 1's), so this masking is not trivial to do without an explicit for loop.
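Editor's note: a minimal sketch of the per-sequence loop described above, assuming the pytorch-crf package (`torchcrf.CRF`) with `batch_first=True`. The variable names (`emissions`, `tags`, `wordpiece_mask`) are illustrative, not the repository's exact code.

```python
import torch
from torchcrf import CRF

num_tags = 5
crf = CRF(num_tags, batch_first=True)

batch_size, seq_len = 2, 6
emissions = torch.randn(batch_size, seq_len, num_tags)  # BERT scores per subtoken
tags = torch.randint(num_tags, (batch_size, seq_len))   # gold tags per subtoken

# 1 only on the first subtoken of each word, so zeros and ones are interleaved,
# e.g. [0, 1, 0, 1, 0, 0] for ["[CLS]", "Al", "##bert", "Ein", "##stein", "[SEP]"]
wordpiece_mask = torch.tensor([[0, 1, 0, 1, 0, 0],
                               [0, 1, 1, 0, 1, 0]], dtype=torch.bool)

# The CRF package only accepts left-aligned masks (all True first, then False),
# so each sequence is indexed by its own mask and passed to the CRF separately.
losses = []
for i in range(batch_size):
    keep = wordpiece_mask[i]                            # positions that get a tag
    seq_emissions = emissions[i, keep].unsqueeze(0)     # (1, n_words, num_tags)
    seq_tags = tags[i, keep].unsqueeze(0)               # (1, n_words)
    # after indexing, the implicit mask is all-True, which the CRF accepts
    losses.append(-crf(seq_emissions, seq_tags, reduction='mean'))
loss = torch.stack(losses).mean()
```

Because each sequence keeps a different number of positions, the gathered tensors have different lengths and cannot simply be stacked back into one batch without re-padding, which is why the loop is used.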

@lkqnaruto
Author


Thank you for the reply. A follow-up question: why do we have to loop through each sequence in the batch before the CRF? I think the CRF package can handle batch-wise calculation.

@ViktorooReps

Hi @fabiocapsouza,

I'm experimenting with different ways of subword handling for the CRF layer. Why have you chosen to just take the first subtoken? Wouldn't some sort of pooling of subword representations work better?

I would greatly appreciate it if you could share your thoughts on the matter!

@fabiocapsouza
Contributor

Hi @ViktorooReps ,
I used the first subtoken because that is the way BERT does it for NER, so it is the simplest way to add a CRF on top of it. Yeah, maybe some sort of pooling could be better, even though subword representations are already contextual. It would be a nice experiment.
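Editor's note: a hypothetical sketch of the pooling alternative discussed above, averaging all subtoken vectors of a word instead of keeping only the first one. The names (`hidden`, `word_ids`) are illustrative and not from the repository.

```python
import torch

hidden = torch.randn(7, 768)                      # BERT hidden states for one sequence
# word index for each subtoken, -1 for special tokens,
# e.g. ["[CLS]", "Al", "##bert", "Ein", "##stein", "is", "[SEP]"]
word_ids = torch.tensor([-1, 0, 0, 1, 1, 2, -1])

word_vectors = []
for w in range(word_ids.max().item() + 1):
    word_vectors.append(hidden[word_ids == w].mean(dim=0))  # mean-pool subtokens
word_vectors = torch.stack(word_vectors)          # (n_words, 768), one vector per word
# word_vectors would then feed the tag classifier / CRF instead of first-subtoken vectors
```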
