This repository has been archived by the owner on Jun 7, 2023. It is now read-only.

Implausibly high surprisal for </s> in ngram model #56

Open
rlevy opened this issue Sep 12, 2020 · 1 comment

Comments

rlevy commented Sep 12, 2020

This seems wrong: why do we get such a huge surprisal for sentence-end after a period? The input file was:

This is a short sentence.

Command & output:

$ lm-zoo get-surprisals ngram ~/tmp/sentences.txt
reading /opt/srilm/checkpoint/model.lm in binary format
sentence_id	token_id	token	surprisal
1	1	this	5.29354
1	2	is	3.1117
1	3	a	2.92768
1	4	short	9.45191
1	5	sentence	12.0459
1	6	.	3.6674900000000004
1	7	</s>	28.1537

This doesn't happen for GRNN (the -0.0 is a tiny bit funny but probably not worth worrying about):

$ lm-zoo get-surprisals GRNN ~/tmp/sentences.txt
sentence_id	token_id	token	surprisal
1	1	This	0.0
1	2	is	1.7249029999999999
1	3	a	1.4204510000000001
1	4	short	8.294603
1	5	sentence	10.343164
1	6	.	3.59838
1	7	<eos>	-0.0
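
To put the numbers in perspective, here is a minimal sketch that converts each reported surprisal back into the probability it implies. It assumes the surprisals are in bits (base 2), which the output itself does not state; `parse_surprisals` and `implied_probability` are hypothetical helper names, not part of lm-zoo. Under that assumption the `</s>` surprisal of 28.1537 corresponds to a probability on the order of 1e-9 for ending the sentence after a period. The last lines show one way a `-0.0` can arise: negating a log-probability that is exactly 0.0. Whether lm-zoo computes the value this way is not confirmed here.

```python
import csv
import io
import math

# Rows copied from the ngram output above (tab-separated).
NGRAM_OUTPUT = "\n".join([
    "sentence_id\ttoken_id\ttoken\tsurprisal",
    "1\t1\tthis\t5.29354",
    "1\t2\tis\t3.1117",
    "1\t3\ta\t2.92768",
    "1\t4\tshort\t9.45191",
    "1\t5\tsentence\t12.0459",
    "1\t6\t.\t3.6674900000000004",
    "1\t7\t</s>\t28.1537",
])


def parse_surprisals(text):
    """Return (token, surprisal) pairs from tab-separated output."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [(row["token"], float(row["surprisal"])) for row in reader]


def implied_probability(surprisal, base=2.0):
    """Invert surprisal = -log_base(p). The base is an assumption (bits)."""
    return base ** (-surprisal)


rows = parse_surprisals(NGRAM_OUTPUT)
for token, surprisal in rows:
    print(f"{token}\t{surprisal:.4f}\t{implied_probability(surprisal):.3e}")

# Negating a log-probability of exactly 0.0 yields IEEE 754 negative zero,
# which Python prints as -0.0 even though it compares equal to 0.0.
negated_zero = -math.log2(1.0)
print(negated_zero, negated_zero == 0.0)
```

If the surprisals were instead in nats or base 10, passing `base=math.e` or `base=10.0` gives the corresponding probabilities; in either case the implied probability for `</s>` would be even smaller.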
@bnicenboim

Isn't it strange that "This" has a surprisal of 0.0 as well?
@rlevy, I haven't seen any reaction in the issues or in the chat (https://gitter.im/lm-zoo/community). Is this project still alive?
