RemoveAll() function #201

Open
nicksunderland opened this issue Apr 29, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@nicksunderland

Is your feature request related to a problem? Please describe.
I would like to clean up annotations that are only used as building blocks for defining more specific annotations. Complex patterns with variable, repeating sections may contain a varying number of annotation types, so specifying each name/result/match index is not always possible. Once the whole section of text has been annotated, removing all or some of these sub-annotations would be useful, to avoid inadvertent matches later.

In Java I was using something like:

FeatureMap fm_double = Factory.newFeatureMap();
fm_double.put("type", "double");
AnnotationSet doubleWithin = inputAS.get("Numeric", fm_double).getContained(
			splitMetricAnnot.firstNode().getOffset(),
			splitMetricAnnot.lastNode().getOffset());
inputAS.removeAll(doubleWithin);

Describe the solution you'd like
A RemoveAll() function where any number of annotation types or names can be specified.

This seems to work and may be of interest to others:

from typing import List, Optional, Union


class RemoveAnnAll:
    """
    Action for removing annotations.
    """

    def __init__(self,
                 name: Optional[Union[str, List[str]]] = None,
                 type: Optional[Union[str, List[str]]] = None,
                 annset_name: Optional[str] = None,
                 silent_fail: bool = True):
        """
        Create a remove all annotation action.
        Args:
            name: the name, or list of names, of the match(es) from which to get the annotations to remove
            type: the annotation type, or list of types, of the annotations within the whole matched pattern to remove
            annset_name: the name of the annotation set to remove the annotations from. If this is the same set
                as used for matching it may influence the matching result if an annotation is removed before
                the remaining matching is done.
                If this is not specified, the annotation set of the (first) input annotation is used.
            silent_fail: if True, silently do nothing when no annotation to remove is found
        """
        assert any([name, type]), \
            f"name and/or type must be provided [name: {name}, type: {type}]"

        # Normalize a single string to a one-element list, then validate the elements.
        if isinstance(name, str):
            name = [name]
        assert name is None or (isinstance(name, list) and all(isinstance(c, str) for c in name)), \
            f"name must be a string or list of strings but is {name}"
        self.name = name

        if isinstance(type, str):
            type = [type]
        assert type is None or (isinstance(type, list) and all(isinstance(c, str) for c in type)), \
            f"type must be a string or list of strings but is {type}"
        self.type = type

        assert annset_name is None or isinstance(annset_name, str), \
            f"annset_name must be a string or None but is {annset_name}"
        self.annset_name = annset_name
        self.silent_fail = silent_fail

    def __call__(self, succ, context=None, location=None, annset=None):

        anns_to_remove = []

        for r in succ._results:

            if self.type is not None:
                for ann in r.anns4matches():
                    if ann.type in self.type:
                        anns_to_remove.append(ann)

            if self.name is not None:
                for name in self.name:
                    for match in r.matches4name(name):
                        ann = match.get("ann")
                        anns_to_remove.append(ann)

        if not anns_to_remove:
            if self.silent_fail:
                return
            else:
                raise Exception(
                    f"Could not find annotations of type: {self.type} and / or of name: {self.name}"
                )

        if self.annset_name is not None:
            annset = context.doc.annset(self.annset_name)

        # Use a plain loop rather than a list comprehension run only for its side effects.
        for ann in anns_to_remove:
            if ann is not None:
                annset.remove(ann)
@johann-petrak
Collaborator

I guess this could be useful more generally.
Just a few notes:

  • I would probably make anns_to_remove a set in case the same annotation could get matched in more than one pattern
  • I am not sure what the best way to combine match names and annotation types would be: my intuition would have been to restrict removal to the given selection of names and types if names or types are specified
    • so an annotation gets removed if it is matched by a specified name match (or no name has been specified) AND it is of a type in the list (or no list of types is given).
    • so an ann from a match not in the list, or of a type not in the list, would not get removed
    • if no name is specified, no restriction is placed on names and anns from all matches get removed (if they match any given types)
    • if no types are specified, annotations of all types get removed (if they come from a match with a listed name)
    • if neither names nor types are specified, ALL annotations get removed.
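
The combination rule in the notes above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: `Ann` and `select_anns_to_remove` are hypothetical names, and matches are modelled as simple (match name, annotation) pairs rather than real match objects.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Ann:
    """Hypothetical stand-in for an annotation: just an id and a type."""
    id: int
    type: str


def select_anns_to_remove(
    matches: Iterable[Tuple[Optional[str], Ann]],
    names: Optional[List[str]] = None,
    types: Optional[List[str]] = None,
) -> Set[Ann]:
    """Select annotations whose match name is in `names` (or `names` is None)
    AND whose type is in `types` (or `types` is None)."""
    selected: Set[Ann] = set()  # a set, so an annotation matched twice is kept once
    for match_name, ann in matches:
        name_ok = names is None or match_name in names
        type_ok = types is None or ann.type in types
        if name_ok and type_ok:
            selected.add(ann)
    return selected


num = Ann(1, "Numeric")
unit = Ann(2, "Unit")
# `num` appears in two differently named matches.
matches = [("value", num), ("unit", unit), ("total", num)]

by_type = select_anns_to_remove(matches, types=["Numeric"])
by_name = select_anns_to_remove(matches, names=["unit"])
both = select_anns_to_remove(matches, names=["unit"], types=["Numeric"])
everything = select_anns_to_remove(matches)
```

With both filters given, only annotations that satisfy the name restriction and the type restriction are selected; with neither given, every matched annotation is selected.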

Would these changed semantics also be useful to you if they were added as a predefined action?

I think we probably also should provide more documentation for users on how to implement their own actions.
The predefined actions are only meant to serve the most common situations anyway, as basically any Python code can be run on the result matches.
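
As a rough sketch of what a user-defined action can look like, the example below relies only on the calling convention visible in the `RemoveAnnAll` class earlier in this thread: a callable taking `succ`, `context`, `location` and `annset`, which iterates over `succ._results` and calls `anns4matches()` on each result. `CountAnnTypes` and `FakeResult` are hypothetical names; the stand-in objects exist only so the sketch runs without the library, and what the library does with an action's return value is not assumed here.

```python
from collections import Counter
from types import SimpleNamespace


class CountAnnTypes:
    """Custom action: count annotation types over all results of a match."""

    def __call__(self, succ, context=None, location=None, annset=None):
        counts = Counter()
        for result in succ._results:
            for ann in result.anns4matches():
                counts[ann.type] += 1
        return counts


class FakeResult:
    """Hypothetical stand-in for a result object exposing anns4matches()."""

    def __init__(self, anns):
        self._anns = anns

    def anns4matches(self):
        return iter(self._anns)


# Hypothetical stand-in for the success object passed to an action.
succ = SimpleNamespace(_results=[
    FakeResult([SimpleNamespace(type="Numeric"), SimpleNamespace(type="Unit")]),
    FakeResult([SimpleNamespace(type="Numeric")]),
])

counts = CountAnnTypes()(succ)
```

Any other Python logic could be placed inside `__call__` in the same way, for example collecting annotations and removing them from `annset`.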

@nicksunderland
Author

Thanks for the feedback, your suggestions sound like the best approach.

Yes, extra documentation would be nice, but I guess it's not too hard to work out and not meant to cover all niche uses.

N

johann-petrak added a commit that referenced this issue May 1, 2023