This repository has been archived by the owner on Jun 7, 2023. It is now read-only.
Hi,

First, thank you for releasing this. Great work!

I was trying to run a model with `syntaxgym` and found that the `ngram` model fails under the `syntaxgym run` command. The cause seems to be incorrect spec information in the model (the `cased` field), probably defined at line 31 of lm-zoo/models/ngram/spec.template.json (commit 5c72f5a).

This should probably be `false`. Because the ngram tokenizer outputs uncased tokens, the mismatch seems to break alignment in the `tokenize_regions` method of the `Sentence` class. The error message looks like:
File "/.../lib/python3.7/site-packages/syntaxgym/agg_surprisals.py", line 58, in aggregate_surprisals
raise utils.TokenMismatch(token, sent_tokens[t_idx], t_idx+2)
syntaxgym.utils.TokenMismatch:
tokens "painting" and "the" do not match (line 2 in surprisal file)
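
To illustrate the suspected failure mode, here is a minimal sketch of a case-sensitive alignment check. It is not syntaxgym's actual code: `align`, its `cased` parameter, and the local `TokenMismatch` class are hypothetical stand-ins, and the real alignment logic may handle tokens differently, so this does not reproduce the exact token pair in the message above.

```python
class TokenMismatch(Exception):
    """Hypothetical stand-in for syntaxgym.utils.TokenMismatch."""


def align(text, model_tokens, cased):
    """Compare expected tokens against the model's tokens, position by position.

    `cased` plays the role of the spec flag: when True, the expected tokens
    keep their original case; when False, they are lowercased first.
    """
    expected = text.split()
    if not cased:
        expected = [tok.lower() for tok in expected]
    for idx, (exp, got) in enumerate(zip(expected, model_tokens)):
        if exp != got:
            raise TokenMismatch(
                f'tokens "{exp}" and "{got}" do not match (position {idx})'
            )
    return list(zip(expected, model_tokens))


sentence = "The painting is old"
# Assumed behavior of an uncased tokenizer: every token is lowercased.
uncased_output = [tok.lower() for tok in sentence.split()]

# With the spec claiming cased=True, "The" is compared against "the" and fails.
try:
    align(sentence, uncased_output, cased=True)
    failed_when_cased = False
except TokenMismatch:
    failed_when_cased = True

# With cased=False, the same tokens align without error.
aligned = align(sentence, uncased_output, cased=False)
```

In this sketch, flipping the flag to `False` is enough to make the check pass, which is the change I am suggesting for the spec.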
Thank you!