Skip to content

Latest commit

 

History

History
127 lines (98 loc) · 3.87 KB

presidio.md

File metadata and controls

127 lines (98 loc) · 3.87 KB

Presidio Integration

NeMo Guardrails supports detecting sensitive data out-of-the-box using Presidio, which provides fast identification and anonymization modules for private entities in text such as credit card numbers, names, locations, social security numbers, bitcoin wallets, US phone numbers, financial data and more. You can detect sensitive data on user input, bot output, or the relevant chunks retrieved from the knowledge base.

Setup

To use the built-in sensitive data detection rails, you must install Presidio and download the en_core_web_lg model for spacy.

pip install presidio-analyzer presidio-anonymizer spacy
python -m spacy download en_core_web_lg

As an alternative, you can also use the sdd extra.

pip install nemoguardrails[sdd]
python -m spacy download en_core_web_lg

Usage

You can activate sensitive data detection in three ways: input rail, output rail, and retrieval rail.

Input Rail

To activate a sensitive data detection input rail, you have to configure the entities that you want to detect:

rails:
  config:
    sensitive_data_detection:
      input:
        entities:
          - PERSON
          - EMAIL_ADDRESS
          - ...

For the complete list of supported entities, please refer to Presidio - Supported Entities page.

Also, you have to add the detect sensitive data on input or mask sensitive data on input flows to the list of input rails:

rails:
  input:
    flows:
      - ...
      - mask sensitive data on input     # or 'detect sensitive data on input'
      - ...

When using detect sensitive data on input, if sensitive data is detected, the bot will refuse to respond to the user's input. When using mask sensitive data on input the bot will mask the sensitive parts in the user's input and continue the processing.

Output Rail

The configuration for the output rail is very similar to the input rail:

rails:
  config:
    sensitive_data_detection:
      output:
        entities:
          - PERSON
          - EMAIL_ADDRESS
          - ...

  output:
    flows:
      - ...
      - mask sensitive data on output     # or 'detect sensitive data on output'
      - ...

Retrieval Rail

The configuration for the retrieval rail is very similar to the input/output rail:

rails:
  config:
    sensitive_data_detection:
      retrieval:
        entities:
          - PERSON
          - EMAIL_ADDRESS
          - ...

  retrieval:
    flows:
      - ...
      - mask sensitive data on retrieval     # or 'detect sensitive data on retrieval'
      - ...

Custom Recognizers

If you have custom entities that you want to detect, you can define custom recognizers. For more details, check out this tutorial and this example.

Below is an example of configuring a TITLE entity and detecting it inside the input rail.

rails:
  config:
    sensitive_data_detection:
      recognizers:
        - name: "Titles recognizer"
          supported_language: "en"
          supported_entity: "TITLE"
          deny_list:
            - Mr.
            - Mrs.
            - Ms.
            - Miss
            - Dr.
            - Prof.
      input:
        entities:
          - PERSON
          - TITLE

Custom Detection

If you want to implement a completely different sensitive data detection mechanism, you can override the default actions detect_sensitive_data and mask_sensitive_data.