error message from spacy training #13715
Unanswered
mcveigh-h16
asked this question in
Help: Coding & Implementations
Replies: 1 comment
-
Follow-up: I tried removing all non-ASCII characters, which I was sure would help, but I am still encountering the same error, so clearly that wasn't it.
-
I am encountering an error when trying to train a spaCy pipeline. I suspect a problem with the training data but am not sure what the issue is. I created the NER labels with NER Annotator (https://arunmozhi.in/ner-annotator/). I then used spacy convert to create the .spacy file from the .json file:
python -m spacy convert ./annotations_strain1.json ./
The training command then generates an error:
python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./train.spacy
⚠ Aborting and saving the final best model. Encountered exception:
ValueError('[E986] Could not create any training batches: check your input. Are the train and dev paths defined? Is discard_oversize set appropriately? ')
I also tried debug data and got this:
python -m spacy debug data config.cfg --paths.train ./train.spacy --paths.dev ./train.spacy
============================ Data file validation ============================
✔ Pipeline can be initialized with data
✔ Corpus is loadable
=============================== Training stats ===============================
Language: en
Training pipeline:
0 training docs
0 evaluation docs
✔ No overlap between training and evaluation data
✘ Low number of examples to train a new pipeline (0)
============================== Vocab & Vectors ==============================
ℹ 0 total word(s) in the data (0 unique)
ℹ No word vectors present in the package
================================== Summary ==================================
✔ 3 checks passed
✘ 1 error
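The "0 training docs" line suggests the .spacy file itself may be empty, rather than the training config being at fault. A minimal sketch for checking that directly, assuming the file written by the convert step is ./train.spacy (the helper name summarize_docbin is hypothetical, used only for illustration):

```python
from pathlib import Path

import spacy
from spacy.tokens import DocBin


def summarize_docbin(path, lang="en"):
    """Return (number of docs, number of entity spans) stored in a .spacy file."""
    nlp = spacy.blank(lang)  # blank pipeline, only needed for its vocab
    doc_bin = DocBin().from_disk(Path(path))
    docs = list(doc_bin.get_docs(nlp.vocab))
    n_ents = sum(len(doc.ents) for doc in docs)
    return len(docs), n_ents


# Example call (path is an assumption taken from the commands above):
# print(summarize_docbin("./train.spacy"))
```

If this reports zero docs, the conversion step produced no examples, and the problem would lie before training rather than in config.cfg.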
Any clue as to where the problem is? The training data .json is attached. It contains many non-ASCII characters, which I suspect could be the issue, but I can't find any documentation on that.
annotations_strain1.json
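One thing worth ruling out is the layout of the JSON itself: for .json input, spacy convert expects spaCy's own (v2-style) JSON training format, so a file in a different layout may convert to an empty corpus. The sketch below is a standard-library check that assumes the annotation tool exports a dict of the form {"classes": [...], "annotations": [[text, {"entities": [[start, end, label], ...]}], ...]}. That layout and the helper name check_annotations are assumptions, not something confirmed by the attached file.

```python
import json


def check_annotations(data):
    """Count usable examples and collect entity spans with invalid offsets.

    Assumes data is a dict with an "annotations" list whose items look like
    [text, {"entities": [[start, end, label], ...]}] (an assumed layout).
    """
    examples = 0
    bad_spans = []
    for i, item in enumerate(data.get("annotations", [])):
        if not item:  # skip null or empty entries
            continue
        text, meta = item
        examples += 1
        for start, end, label in meta.get("entities", []):
            # Python indexes strings by code point, so a non-ASCII
            # character occupies a single position in text.
            if not (0 <= start < end <= len(text)):
                bad_spans.append((i, start, end, label))
    return examples, bad_spans


# Example call (file name taken from the attachment above):
# with open("annotations_strain1.json", encoding="utf-8") as f:
#     print(check_annotations(json.load(f)))
```

If the example count is non-zero here while the converted .spacy file holds zero docs, the mismatch would point at the conversion step rather than at the non-ASCII characters.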