Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

changing predefined recognizers to use the config file #1393

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

RoeyBC
Copy link
Collaborator

@RoeyBC RoeyBC commented May 30, 2024

Change Description

This PR is going to accomplish the following:

  1. Refactor load_predefined_recognizers to load the settings from the default yml file
  2. Add warning in case recognizer is listed with a language not supported by the registry
  3. raise an exception in case there's discrepancy between the supported language in the registry and the one listed in the analyzer
  4. Edit the docs accordingly I guess :D

Checklist

  • I have reviewed the contribution guidelines
  • I have signed the CLA (if required)
  • My code includes unit tests
  • All unit tests and lint checks pass locally
  • My PR contains documentation updates / additions if required

@RoeyBC
Copy link
Collaborator Author

RoeyBC commented May 31, 2024

image
when refactored the codebase, added documentation & testing and you've got less text, you know it's a good change :D

if languages is not None:
registry_configuration["supported_languages"] = languages

recognizers = RecognizerRegistryProvider(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is working, but it feels like a weakspot. Is it complex to extract this to an external module that both this class and RecognizerRegistryProvoder uses?

registry_configuration=registry_configuration
).create_recognizer_registry().recognizers

self.recognizers.extend(recognizers)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth checking for duplications. Isn't the nlp provider added twice?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recognizer_registry doesn't add the nlp recognizer, so there's no duplication. In fact when the analyzer get initialized without a registry we do the following:

        if not registry:
            logger.info("registry not provided, creating default.")
            provider = RecognizerRegistryProvider(
                registry_configuration={"supported_languages": self.supported_languages}
            )
            registry = provider.create_recognizer_registry()
            registry.add_nlp_recognizer(nlp_engine=self.nlp_engine)

Which is basically this function now :D

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it add it if it's in the yaml?

presidio-analyzer/tests/test_analyzer_engine.py Outdated Show resolved Hide resolved
docs/analyzer/analyzer_engine_provider.md Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants