feat: add required & include parameter & support for span_getter in eds.contextual_matcher
percevalw committed May 12, 2024
1 parent 0c6361b commit 53bcc0e
Showing 8 changed files with 381 additions and 236 deletions.
2 changes: 2 additions & 0 deletions changelog.md
@@ -6,6 +6,8 @@

- Expose the default patterns of `eds.negation`, `eds.hypothesis`, `eds.family`, `eds.history` and `eds.reported_speech` under a `eds.negation.default_patterns` attribute
- Added a `context_getter` SpanGetter argument to the `eds.matcher` class to only retrieve entities inside the spans returned by the getter
- Added a `filter_expr` parameter to scorers to filter the documents to score
- Added a new `required` field to `eds.contextual_matcher` assign patterns to only match if the required field has been found, and an `include` parameter (similar to `exclude`) to search for required patterns without assigning them to the entity
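
As a rough illustration of the last entry, an assign pattern marked `required` together with an `include` rule might be written as below. This is a sketch of plain Python data only. The keys `name`, `regex`, `window` and `required` appear in the documentation example changed by this commit; the shape of the `include` entry (a `regex` plus a `window`) is an assumption based on its stated similarity to `exclude`, and the source label, regexes and window sizes are made up.

```python
# Hypothetical pattern dictionary for eds.contextual_matcher.
# Only plain data is built here; nothing is imported from edsnlp.
pattern = dict(
    source="cancer",  # illustrative label
    regex=[r"cancer", r"tumeur"],  # illustrative regexes
    # Assumed to mirror `exclude`: a pattern that must be found in the
    # context, without being assigned to the entity.
    include=dict(regex=r"sein", window=(-5, 5)),
    assign=[
        dict(
            name="stage",
            regex=r"stade (\d)",  # a single capturing group
            window=7,
            # keep the extraction only if a stage is found
            required=True,
        ),
    ],
)
```

The intent, as described in the changelog entry, is that `required` lives on an individual assign pattern while `include` sits next to `exclude` at the pattern level.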

## v0.11.2

17 changes: 17 additions & 0 deletions docs/assets/stylesheets/extra.css
@@ -166,3 +166,20 @@ body, input {
.md-typeset code a:not(.md-annotation__index) {
border-bottom: 1px dashed var(--md-typeset-a-color);
}

.doc-param-details .subdoc {
padding: 0;
box-shadow: none;
border-color: var(--md-typeset-table-color);
}

.doc-param-details .subdoc > div > div > div> table {
padding: 0;
box-shadow: none;
border: none;
}

.doc-param-details .subdoc > summary {
margin: 0;
font-weight: normal;
}
70 changes: 2 additions & 68 deletions docs/pipes/core/contextual-matcher.md
@@ -206,74 +206,6 @@ Let us see what we can get from this pipeline with a few examples

However, most of the configuration is provided in the `patterns` key, as a **pattern dictionary** or a **list of pattern dictionaries**

## The pattern dictionary

### Description

A pattern is a nested dictionary with the following keys:

=== "`source`"

A label describing the pattern

=== "`regex`"

A single Regex or a list of Regexes

=== "`regex_attr`"

An attribute that overrides the given `attr` when matching with Regexes.

=== "`terms`"

A single term or a list of terms (for exact matches)

=== "`exclude`"

A dictionary (or list of dictionaries) to define exclusion rules. Exclusion rules are given as Regexes, and if a
match is found in the surrounding context of an extraction, the extraction is removed. Each dictionary should have the following keys:

=== "`window`"

Size of the context to use (in number of words). You can provide the window as:

- A positive integer, in this case the used context will be taken **after** the extraction
- A negative integer, in this case the used context will be taken **before** the extraction
- A tuple of integers `(start, end)`, in this case the used context will be the snippet from `start` tokens before the extraction to `end` tokens after the extraction

=== "`regex`"

A single Regex or a list of Regexes.

=== "`assign`"

A dictionary to refine the extraction. Similarly to the `exclude` key, you can provide a dictionary to
use on the context **before** and **after** the extraction.

=== "`name`"

A name (string)

=== "`window`"

Size of the context to use (in number of words). You can provide the window as:

- A positive integer, in this case the used context will be taken **after** the extraction
- A negative integer, in this case the used context will be taken **before** the extraction
- A tuple of integers `(start, end)`, in this case the used context will be the snippet from `start` tokens before the extraction to `end` tokens after the extraction

=== "`regex`"

A dictionary where keys are labels and values are **Regexes with a single capturing group**

=== "`replace_entity`"

If set to `True`, the match from the corresponding assign key will be used as entity, instead of the main match. See [this paragraph][the-replace_entity-parameter]

=== "`reduce_mode`"

Set how multiple assign matches are handled. See the documentation of the [`reduce_mode` parameter][the-reduce_mode-parameter]
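
The three forms of the `window` key described above can be sketched as a small helper that turns a window into token offsets. This is not EDS-NLP's implementation: `context_bounds` is a hypothetical function, whether the extraction itself belongs to the context is an assumption (here it is excluded for integer windows and included for tuples), and since the sign convention of the tuple is not spelled out, the sketch takes absolute values of both members.

```python
from typing import Tuple, Union

Window = Union[int, Tuple[int, int]]


def context_bounds(
    start: int, end: int, window: Window, n_tokens: int
) -> Tuple[int, int]:
    """Return token offsets (lo, hi) of the context around the
    extraction spanning tokens [start, end). Hypothetical helper."""
    if isinstance(window, int):
        if window >= 0:
            # Positive integer: context taken after the extraction
            lo, hi = end, end + window
        else:
            # Negative integer: context taken before the extraction
            lo, hi = start + window, start
    else:
        # Tuple: from `before` tokens before to `after` tokens after
        before, after = window
        lo, hi = start - abs(before), end + abs(after)
    # Clamp to the document boundaries
    return max(lo, 0), min(hi, n_tokens)
```

For an extraction covering tokens 10 and 11 in a 20-token document, `window=3` gives the offsets `(12, 15)` and `window=-3` gives `(7, 10)` under these assumptions.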

### A full pattern dictionary example

```python
@@ -300,6 +232,8 @@ dict(
regex=r"(neonatal)",
expand_entity=True,
window=3,
# keep the extraction only if neonatal is found
required=True,
),
dict(
name="trans",
4 changes: 2 additions & 2 deletions edsnlp/matchers/regex.py
@@ -1,6 +1,6 @@
import re
from bisect import bisect_left, bisect_right
from typing import Any, Dict, List, Optional, Tuple, Union
from typing import Any, Dict, Iterator, List, Optional, Tuple, Union

from loguru import logger
from spacy.tokens import Doc, Span
@@ -465,7 +465,7 @@ def __call__(
doclike: Union[Doc, Span],
as_spans=False,
return_groupdict=False,
) -> Union[Span, Tuple[Span, Dict[str, Any]]]:
) -> Iterator[Union[Span, Tuple[Span, Dict[str, Any]]]]:
"""
Performs matching. Yields matches.
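
The annotation change in this hunk reflects that `__call__` is a generator: it yields matches one at a time, so its return type is an `Iterator` of items rather than a single item. A minimal standalone sketch of the same typing pattern follows, using a hypothetical `find_matches` function over plain strings instead of the matcher in `edsnlp/matchers/regex.py`.

```python
import re
from typing import Any, Dict, Iterator, Tuple, Union


def find_matches(
    pattern: str,
    text: str,
    return_groupdict: bool = False,
) -> Iterator[Union[str, Tuple[str, Dict[str, Any]]]]:
    """Yield each match, optionally paired with its named groups."""
    for match in re.finditer(pattern, text):
        if return_groupdict:
            yield match.group(0), match.groupdict()
        else:
            yield match.group(0)
```

Because the function body contains `yield`, calling it returns a lazy iterator, and callers that need all matches at once would wrap the call in `list(...)`.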
