I reproduced this problem.
For the MASC sentence corpus, the MascReader returns 1865 empty CASes and 13754 normal
CASes.
I copied a version of the MascReader into a local project and ran the following pipeline:
String patterns = "round*/*-v/*-wn.xml";
SimplePipeline.runPipeline(
    createReaderDescription(
        MascReader.class,
        MascReader.PARAM_IGNORE_TIES, true,
        MascReader.PARAM_SOURCE_LOCATION, MASCDirectory,
        MascReader.PARAM_PATTERNS, new String[] {
            ResourceCollectionReaderBase.INCLUDE_PREFIX + patterns
        }),
    createEngineDescription(LanguageToolSegmenter.class),
    createEngineDescription(MascProblemFinder.class)
    //createEngineDescription(CasDumpWriter.class)
);
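For anyone who wants to check the empty/non-empty counts without the full pipeline, the counting step can be sketched in plain Java. This is only an illustration: `EmptyTextCounter` is a hypothetical name, it has no UIMA dependency, and it assumes that "empty" means a null or whitespace-only document text. It is not the code of MascProblemFinder, which is not shown in this thread.

```java
import java.util.List;

/** Hypothetical helper: tallies empty vs. non-empty document texts. */
final class EmptyTextCounter {
    private int empty;
    private int nonEmpty;

    /** Records one document text; null or whitespace-only counts as empty (assumption). */
    void record(String documentText) {
        if (documentText == null || documentText.trim().isEmpty()) {
            empty++;
        } else {
            nonEmpty++;
        }
    }

    int emptyCount() {
        return empty;
    }

    int nonEmptyCount() {
        return nonEmpty;
    }

    /** Convenience method: counts a whole list of document texts at once. */
    static EmptyTextCounter countAll(List<String> documentTexts) {
        EmptyTextCounter counter = new EmptyTextCounter();
        for (String text : documentTexts) {
            counter.record(text);
        }
        return counter;
    }
}
```

Inside an analysis engine, the same `record` call could be fed with the document text of each CAS, and the two counters printed once the collection has been processed.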
I modified the MascReader to return a placeholder sentence instead of an empty CAS. This is
where the problem is introduced:
// if no tie between annotators is discovered
if (documentText != null) {
    setDocumentMetadata(jCas, node);
    jCas.setDocumentText(documentText);
}
else {
    setDocumentMetadata(jCas, node);
    jCas.setDocumentText("This is an empty Cas.");
    //jCas.reset(); // TODO here the CAS is emptied
}
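To make the "tie between annotators" case concrete, here is a minimal sketch of how a tie could be detected. It assumes a tie means that no single label has strictly more votes than every other label. The class `AnnotatorVote` and the method `majorityLabel` are hypothetical names, and the actual tie logic of the MascReader is not shown in this thread, so it may differ.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: picks the majority label, or nothing when the vote is tied. */
final class AnnotatorVote {

    /**
     * Returns the label with strictly the most votes, or Optional.empty()
     * if the list is empty or the top count is shared by several labels.
     */
    static Optional<String> majorityLabel(List<String> labels) {
        // Count how many annotators chose each label.
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }

        String best = null;
        int bestCount = 0;
        boolean tied = false;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            int count = entry.getValue();
            if (count > bestCount) {
                // New strict leader: any earlier tie at a lower count no longer matters.
                best = entry.getKey();
                bestCount = count;
                tied = false;
            } else if (count == bestCount) {
                // Another label shares the current top count.
                tied = true;
            }
        }
        return (best == null || tied) ? Optional.empty() : Optional.of(best);
    }
}
```

With a helper like this, a reader could skip a tied instance entirely rather than emit an empty CAS for it. Whether that is the right behaviour depends on what the option behind `PARAM_IGNORE_TIES` is meant to do.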
I don't recall much about the MASC corpus format, so I don't have much context to help
me interpret this problem report. I take it from reading the code that the empty CAS
was returned only in those cases where there was a tie between the annotators. Is
this perhaps the intended behaviour? If not, is your modified code above intended
to fix the problem?
Originally reported on Google Code with ID 65
Reported by MedKhemakhemFSEGS on 2014-12-03 18:54:40
- _Attachment: [mascModified.groovy](https://storage.googleapis.com/google-code-attachments/dkpro-wsd/issue-65/comment-0/mascModified.groovy)_