Test data is truncated #18

Open
KiddoKiddo opened this issue Oct 21, 2017 · 2 comments

@KiddoKiddo

Hi,
My test data has 36790 rows in total, but after running prediction.py the output file has only 36560 rows. I have already checked dev_parsed.json, and it has the full data. Can you help pinpoint the problem? Many thanks.
Regards,
Thy
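
One way to narrow this down is to work out which of the 230 rows (36790 - 36560) never reach the output. A minimal sketch, assuming each input row and each output row can be mapped to a comparable key such as an ID or the raw question text. The function name `missing_keys` and the choice of key are hypothetical and not part of this repository:

```python
from collections import Counter


def missing_keys(expected, produced):
    """Return the keys in `expected` that have no counterpart in `produced`.

    Duplicates are respected: if a key appears twice in `expected` but only
    once in `produced`, the second occurrence is reported as missing.
    The result keeps the order of `expected`.
    """
    remaining = Counter(produced)
    missing = []
    for key in expected:
        if remaining[key] > 0:
            # This expected row is matched by one produced row.
            remaining[key] -= 1
        else:
            missing.append(key)
    return missing
```

Passing in the keys read from the test file and from the output file would list the dropped rows in input order, which may make it easier to see whether they have something in common, for example unusual characters.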

mahnerak assigned mahnerak and MartinXPN and unassigned mahnerak on Oct 25, 2017
@EireneX

EireneX commented Nov 6, 2017

I encountered the same issue. Could it be related to word2vec?
I saw some WARN messages in the log during preprocessing:

 WARN edu.stanford.nlp.process.PTBLexer - Untokenizable: ‍ (U+200D, decimal: 8205)
[main] WARN edu.stanford.nlp.process.PTBLexer - Untokenizable: ‍ (U+200D, decimal: 8205)
[main] WARN edu.stanford.nlp.process.PTBLexer - Untokenizable: ‍ (U+200D, decimal: 8205)
[main] WARN edu.stanford.nlp.process.PTBLexer - Untokenizable: ‍ (U+200D, decimal: 8205)
[main] WARN edu.stanford.nlp.process.PTBLexer - Untokenizable: ‍ (U+200D, decimal: 8205)
[main] WARN edu.stanford.nlp.process.PTBLexer - Untokenizable: ‍ (U+200D, decimal: 8205)
[main] WARN edu.stanford.nlp.process.PTBLexer - Untokenizable: ‍ (U+200D, decimal: 8205)
[main] WARN edu.stanford.nlp.process.PTBLexer - Untokenizable: ‍ (U+200D, decimal: 8205)
[main] WARN edu.stanford.nlp.process.PTBLexer - Untokenizable: ‍ (U+200D, decimal: 8205)
[main] WARN edu.stanford.nlp.process.PTBLexer - Untokenizable: ‍ (U+200D, decimal: 8205)
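
Whether these warnings are related to the missing rows is not confirmed in this thread. U+200D is the zero-width joiner, an invisible Unicode format character (general category `Cf`). A minimal sketch for locating such characters in the raw text and, as one possible workaround, stripping them before preprocessing. The helper names are hypothetical and not part of this repository:

```python
import unicodedata


def find_format_chars(text):
    """Return (index, code point) pairs for every Unicode 'Cf' character."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


def strip_format_chars(text):
    """Return `text` with all Unicode 'Cf' (format) characters removed."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
```

Note that `Cf` also covers characters such as the zero-width non-joiner and the soft hyphen, which carry meaning in some scripts, so stripping the whole category is an assumption that may not suit every dataset.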

@YouseYeung

Hi,
I would also like to test this model on my own dataset, but I got an error. Can you tell me how to define my own dataset in a file?
