Merging a noun_chunk slice for Hearst Pattern Detection #5450
-
How to reproduce the behaviour

I'm attempting to implement the code from this repository using the spaCy Matcher in place of regex, and I am having problems with the retokenizer when merging noun_chunks. The overall problem is to separate out modifier terms such as "other" and "some other": they are normally included within the span of a noun_chunk, but they need to be kept separate because such terms are predicates for particular Hearst patterns. The following code has been written to address this problem:
`
`
Do you know what the problem here is?
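For illustration, the modifier-stripping step described above might be sketched as follows in plain Python, with token texts standing in for spaCy tokens; the MODIFIERS list and strip_modifiers function are hypothetical names, not the code from the original post:

```python
# Hypothetical sketch: separate Hearst-pattern predicates such as
# "other" / "some other" from the front of a noun chunk.
MODIFIERS = [("some", "other"), ("other",)]  # longest patterns first

def strip_modifiers(chunk_tokens):
    """Return (modifier, remainder) for a chunk given as a list of token texts."""
    lowered = [t.lower() for t in chunk_tokens]
    for mod in MODIFIERS:
        if tuple(lowered[:len(mod)]) == mod:
            return list(chunk_tokens[:len(mod)]), list(chunk_tokens[len(mod):])
    return [], list(chunk_tokens)

print(strip_modifiers(["other", "European", "countries"]))
# → (['other'], ['European', 'countries'])
```

In real code the remainder would then be re-expressed as a doc-level Span before being handed to the retokenizer.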
Replies: 6 comments
-
Sorry to hear you're running into trouble. To help us investigate what may be going on, could you provide a minimal working code snippet that we can run and that exhibits the errors you're getting? You can make the code above self-contained by adding an example text. It would help to be able to execute this, as then we can also access the error stack trace.
-
I suspect the problem is related to trying to retokenize a 0-length span, in effect something like this:
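A hedged reconstruction of how such a span can arise, using plain (start, end) token indices in place of a spaCy Span (the numbers are made up):

```python
# Suppose a noun chunk covers doc tokens 3..4 and consists entirely of the
# two-token modifier "some other"; stripping the modifier leaves nothing.
chunk_start, chunk_end = 3, 5      # half-open interval, like a spaCy Span
n_modifier_tokens = 2

merged_start = chunk_start + n_modifier_tokens
merged_end = chunk_end
print(merged_end - merged_start)   # → 0: a zero-length span, e.g. doc[5:5]
```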
The retokenizer should reject this but doesn't, and the resulting doc is malformed.
-
Thank you both for the prompt feedback, I'll take a look at the 0-length span problem.
-
Thank you both, that has actually solved the problem: it was trying to retokenize a span of zero length. For the phrase 'one kind', the loop was triggered first by the predicate, which left a zero-length slice to merge. I may have uncovered a second bug with merge_noun_chunks, and will post that as a separate issue.
-
Sorry, while I thought this was fixed, there seems to be a problem when trying to merge a slice of an existing noun chunk. The code has been modified as follows:
`
`
The problem is happening at the merge call. While it is possible to create a custom attribute containing the filtered spans, I need the noun_chunk spans to be merged within the doc itself. Do you have any ideas as to where I'm going wrong here?
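As a hypothetical sketch of the slice arithmetic involved (index pairs standing in for spaCy Span objects; the numbers and the guard are illustrative, not the original code):

```python
# A noun chunk covering doc tokens 5..8, e.g. "some other European countries",
# from which the two leading modifier tokens should be excluded before merging.
chunk = (5, 9)            # half-open (start, end), like a spaCy Span
n_modifiers = 2

slice_span = (chunk[0] + n_modifiers, chunk[1])
print(slice_span)         # → (7, 9): the doc-level slice to merge
# Refuse zero-length results before ever calling retokenizer.merge():
assert slice_span[1] > slice_span[0]
```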
-
Solved: the problem indeed is merging a noun_chunk slice of zero length. Have developed the following to prevent zero-length chunks:
`
`
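The guard being described might look something like the following sketch; (start, end) index pairs stand in for spaCy Span objects, and drop_empty_spans is an illustrative name rather than the author's actual code:

```python
def drop_empty_spans(spans):
    """Keep only spans whose end index is strictly greater than their start,
    so retokenizer.merge() is never handed a zero-length span."""
    return [(start, end) for (start, end) in spans if end > start]

candidates = [(0, 2), (3, 3), (4, 7)]    # (3, 3) is a zero-length slice
print(drop_empty_spans(candidates))      # → [(0, 2), (4, 7)]
```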