70 changes: 70 additions & 0 deletions examples/configs/huggingface_detector/README.md
@@ -0,0 +1,70 @@
# Huggingface Detector

This example showcases how to use inline Huggingface text classification models for content detection in NeMo Guardrails.

## Overview

The Huggingface detector allows you to use any text classification model from Huggingface to detect
and block specific content categories in user inputs or bot outputs. All models are run locally.

## Prerequisites

Install the `transformers` library:

```bash
pip install transformers
```

For GPU acceleration (recommended):

```bash
pip install transformers torch
```

## Configuration

The example `config.yml` demonstrates:

1. **Multiple Model Configuration**: Configure multiple Huggingface models with different purposes
2. **Device Specification**: Set which device (CPU/GPU) each model should use
3. **Input and Output Checking**: Apply different models to user inputs and bot outputs
4. **Flexible Class Blocking**: Use either class labels or indices to specify which classes trigger blocking

### Key Configuration Options

- **model_repo**: Huggingface model repository ID (e.g., `"ibm-granite/granite-guardian-hap-38m"`)
- **descriptor**: Human-readable description of what the model detects
- **blocked_classes**: List of class labels or indices that should trigger blocking
- **device**: Torch device to load the model onto (`"cuda"`, `"cpu"`, `"cuda:0"`, etc.)
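
A minimal single-model entry using these options, mirroring the bundled `config.yml` (the values shown are illustrative):

```yaml
rails:
  config:
    huggingface_detector:
      models:
        - model_repo: "ibm-granite/granite-guardian-hap-38m"
          descriptor: "Harmful and abusive language detector"
          blocked_classes: [0]
          device: "cuda"
```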

### Device Configuration

The `device` field allows you to optimize performance:
- Use `"cuda"` for GPU acceleration (faster inference, requires GPU)
- Use `"cpu"` for CPU inference (slower but works on any machine)
- Use `"cuda:0"`, `"cuda:1"`, etc. for specific GPU devices in multi-GPU setups
- Omit the field to use the `transformers` default device

See the `torch.device` documentation for full usage. Different models can use different devices
based on your requirements.

## Running the Example

```bash
nemoguardrails chat --config=examples/configs/huggingface_detector
```

## Provided flows

1. `huggingface detector check input $hf_model=$HF_ORG/$HF_MODEL`
2. `huggingface detector check output $hf_model=$HF_ORG/$HF_MODEL`
3. `huggingface detector check tool input $hf_model=$HF_ORG/$HF_MODEL`
4. `huggingface detector check tool output $hf_model=$HF_ORG/$HF_MODEL`

Replace `$HF_ORG/$HF_MODEL` with the repository ID of a model configured under `rails.config.huggingface_detector.models`.

## Performance Tips

- Use smaller models for faster inference
- Put frequently-used models on GPU, less-used models on CPU
- Models are cached after first load to avoid reloading
- Only models activated in your flows will be loaded
28 changes: 28 additions & 0 deletions examples/configs/huggingface_detector/config.yml
@@ -0,0 +1,28 @@
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo

rails:
  config:
    huggingface_detector:
      models:
        # Example 1: Harmful content detection on GPU
        - model_repo: "ibm-granite/granite-guardian-hap-38m"
          descriptor: "Harmful and abusive language detector"
          blocked_classes: [0]
          device: "cuda"  # Load on GPU for faster inference

        # Example 2: Prompt injection detection on CPU
        - model_repo: "protectai/deberta-v3-base-prompt-injection-v2"
          descriptor: "Prompt injection detector"
          blocked_classes: ["INJECTION"]
          device: "cpu"  # Load on CPU
  input:
    flows:
      # Check user input for prompt injection attempts
      - huggingface detector check input $hf_model="protectai/deberta-v3-base-prompt-injection-v2"
  output:
    flows:
      # Check bot output for harmful content before sending to user
      - huggingface detector check output $hf_model="ibm-granite/granite-guardian-hap-38m"
273 changes: 273 additions & 0 deletions nemoguardrails/library/huggingface_detector/README.md
@@ -0,0 +1,273 @@
# Huggingface Detector

The Huggingface detector allows you to use any text classification model from Huggingface Hub to detect and block specific content categories in user inputs or bot outputs.

## Features

- Configure multiple Huggingface text classification models
- Selectively activate specific models per flow
- Configure which classes should trigger blocking for each model
- Support for both input and output checking
- Model caching for efficient inference

## Installation

The Huggingface detector requires the `transformers` library:

```bash
pip install transformers
```

For GPU acceleration (recommended):

```bash
pip install transformers torch
```

## Configuration

Add the following to your `config.yml`:

```yaml
rails:
  input:
    flows:
      - huggingface detector check input $hf_model="some/hf-model"
  config:
    huggingface_detector:
      models:
        - model_repo: "some/hf-model"
          descriptor: "hate speech detector"
          blocked_classes:
            - "classA"
```

### Configuration Options

- **models** (required): List of model configurations. Each model has:
  - **model_repo** (required): Huggingface model repository ID (e.g., `"ibm-granite/granite-guardian-hap-38m"`)
  - **descriptor** (optional): Human-readable description of what this model detects (e.g., `"Harmful language detector"`). This is useful for providing more informative detection messages if the model repo name is not particularly descriptive.
  - **blocked_classes** (required): List of class labels (strings) or class indices (integers) that should trigger blocking
    - Can use class labels: `["harmful", "violence", "hate_speech"]`
    - Can use class indices: `[0, 1, 2]`
    - Must be either all strings OR all integers per model, not mixed
    - Use indices if your model doesn't provide label mappings (id2label)
  - **device** (optional): Device to load the model onto (e.g., `"cuda"`, `"cpu"`, `"cuda:0"`). If not specified, PyTorch will use its default device selection.

**Important:** Configure all models you want to use here, then specify which model to use in your flows with the `$hf_model` parameter.

## Usage

Configure multiple models and selectively activate specific ones by specifying the model repository parameter.

The flows accept a `$hf_model` parameter to specify which configured model to use:
- `huggingface detector check input $hf_model="<model-repo>"`
- `huggingface detector check output $hf_model="<model-repo>"`

## Example Configuration

### Single Model - Input Checking

Check user inputs with a single configured model (the repository IDs below are placeholders):

```yaml
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo

rails:
  input:
    flows:
      - huggingface detector check input $hf_model="some/hf-model"
  config:
    huggingface_detector:
      models:
        - model_repo: "some/hf-model"
          descriptor: "hate speech detector"
          blocked_classes:
            - "classA"
```

### Multiple Models - Input Checking

Check inputs against multiple models sequentially:

```yaml
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo

rails:
  input:
    flows:
      - huggingface detector check input $hf_model="some/hf-model"
      - huggingface detector check input $hf_model="different/hf-model"
  config:
    huggingface_detector:
      models:
        - model_repo: "some/hf-model"
          descriptor: "hate speech detector"
          blocked_classes:
            - "classA"
        - model_repo: "different/hf-model"
          descriptor: "jailbreak detector"
          blocked_classes:
            - "classB"
            - "classD"
```

### Output Checking

Check bot outputs before sending to the user:

```yaml
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo

rails:
  output:
    flows:
      - huggingface detector check output $hf_model="some/hf-model"
  config:
    huggingface_detector:
      models:
        - model_repo: "some/hf-model"
          descriptor: "hate speech detector"
          blocked_classes:
            - "classA"
```

### Using Class Indices

For models without label mappings, use class indices:

```yaml
rails:
  input:
    flows:
      - huggingface detector check input $hf_model="some-model/text-classifier"
  config:
    huggingface_detector:
      models:
        - model_repo: "some-model/text-classifier"
          blocked_classes: [0, 1]  # Block classes at indices 0 and 1
```
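
To check whether a model ships a usable label mapping before configuring it, you can inspect its config with the standard `transformers` API (the repository ID below is a placeholder):

```python
from transformers import AutoConfig

# Downloads only the model's config.json and prints its label mapping.
config = AutoConfig.from_pretrained("some-model/text-classifier")
print(config.id2label)  # e.g. {0: "LABEL_0", 1: "LABEL_1"} when no meaningful labels are defined
```

If the mapping only contains generic `LABEL_0`/`LABEL_1` entries, prefer class indices in `blocked_classes`.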

### Specifying Device

To load models on a specific device (CPU or GPU):

```yaml
rails:
  input:
    flows:
      - huggingface detector check input $hf_model="some/hf-model"
  config:
    huggingface_detector:
      models:
        - model_repo: "some/hf-model"
          descriptor: "hate speech detector"
          blocked_classes: ["classA"]
          device: "cuda"  # Load on GPU
        - model_repo: "different/hf-model"
          descriptor: "jailbreak detector"
          blocked_classes: ["classB"]
          device: "cpu"  # Load on CPU
```
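
To see which devices are available before filling in the `device` field, a quick check with PyTorch (assumes `torch` is installed):

```python
import torch

# True if a CUDA-capable GPU can back "cuda" / "cuda:N" devices.
print(torch.cuda.is_available())
# Number of visible GPUs, useful when choosing "cuda:0", "cuda:1", ...
print(torch.cuda.device_count())
```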

## Supported Models

Any Huggingface text classification model should work. Some popular options:

- **HAP Detection**: `ibm-granite/granite-guardian-hap-38m`
- **Prompt Injection**: `protectai/deberta-v3-base-prompt-injection-v2`
- **Content Moderation**: Various models available on Huggingface Hub

## How It Works

1. You configure multiple models in `rails.config.huggingface_detector.models`
2. In your flows, you activate specific models using the `$hf_model` parameter
3. The specified model and tokenizer are loaded (cached after first load)
4. Input/output text is tokenized and passed through the model
5. The model returns classification scores for each class
6. If the top predicted class is in the model's `blocked_classes` list, the content is blocked
7. The result includes the detected class, confidence score, and all class scores
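
Steps 3-6 can be illustrated with plain `transformers` and `torch` calls. This is a simplified sketch of the mechanism, not the library's actual implementation; the model repository and `blocked_classes` value are taken from the examples above:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_repo = "ibm-granite/granite-guardian-hap-38m"  # from the examples above
blocked_classes = [0]                                # illustrative

# Step 3: load the model and tokenizer (the library caches these after first load).
tokenizer = AutoTokenizer.from_pretrained(model_repo)
model = AutoModelForSequenceClassification.from_pretrained(model_repo)
model.eval()

# Step 4: tokenize the text and run it through the model.
text = "some user input to check"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Step 5: classification scores for each class.
scores = torch.softmax(logits, dim=-1)[0]
top_index = int(torch.argmax(scores))
top_label = model.config.id2label.get(top_index, str(top_index))

# Step 6: block if the top class (by index or label) is in blocked_classes.
blocked = top_index in blocked_classes or top_label in blocked_classes
result = {
    "allowed": not blocked,
    "detected_class": top_label,
    "score": float(scores[top_index]),
    "all_scores": {model.config.id2label.get(i, str(i)): float(s) for i, s in enumerate(scores)},
}
```

The `result` dictionary mirrors the return structure described in the next section.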

## Return Structure

The detector returns a dictionary with the following structure:

- `allowed`: Boolean indicating if the text is allowed
- `detected_class`: The predicted class label
- `score`: The confidence score for the prediction
- `all_scores`: Dictionary of scores for all classes

**Example return value:**
```python
{
"allowed": False,
"detected_class": "harmful",
"score": 0.95,
"all_scores": {
"safe": 0.05,
"harmful": 0.95,
"violence": 0.00
}
}
```

These values are available as flow variables after calling the detector.

## Exception Handling

When `enable_rails_exceptions` is enabled in your config, the detector will raise:
- `HuggingfaceDetectorInputException` for blocked inputs
- `HuggingfaceDetectorOutputException` for blocked outputs
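
If you want this behavior, set the top-level option in your `config.yml` (a minimal sketch; the rest of the configuration stays unchanged):

```yaml
enable_rails_exceptions: True
```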

## Performance Considerations

- Models are cached after first load to avoid reloading on each call
- If you activate multiple models in your flows, they run sequentially and will increase latency
- Consider using smaller models for faster inference
- GPU acceleration is recommended for production use (configure with `device: "cuda"`)
- You can specify different devices for different models (e.g., put smaller models on CPU and larger ones on GPU)
- All models run locally, so no external API calls are needed
- Only the models you explicitly activate in your flows will be loaded and run

## Troubleshooting

**ImportError: transformers not found**
- Install transformers: `pip install transformers`

**Model loading fails**
- Ensure the model repository ID is correct
- Check your internet connection OR ensure models are pre-cached in the Huggingface Hub cache directory
- Models are downloaded from Huggingface Hub on first use. For offline/air-gapped environments, you need to:
  - Pre-download models using `huggingface-cli download <model-repo>` on a machine with internet access
  - Copy the Huggingface cache directory (usually `~/.cache/huggingface/`) to your offline machine
  - Set the `HF_HOME` or `TRANSFORMERS_CACHE` environment variable to point to the cache directory if needed
- Verify you have enough disk space for the model
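
For an air-gapped setup, the steps above look roughly like this (paths are illustrative):

```bash
# On a machine with internet access: pre-download the model repo into the Huggingface cache.
huggingface-cli download ibm-granite/granite-guardian-hap-38m

# Copy ~/.cache/huggingface/ to the offline machine, then point the cache variable at it.
export HF_HOME=/path/to/huggingface
```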

**Out of memory errors**
- Try using a smaller model
- Reduce the max_length parameter in tokenization
- Use CPU inference if GPU memory is limited by setting `device: "cpu"` in the model configuration

**ValueError: Model does not provide label mappings**
- Your model doesn't have `id2label` or `label2id` configuration
- Use class indices instead of labels in your `blocked_classes` configuration
- Example: `blocked_classes: [0, 1, 2]` instead of `blocked_classes: ["harmful", "violence"]`

**ValueError: Class label 'X' not found in model's label mapping**
- The label you specified is not recognized by the model
- Check the model's documentation for available labels
- Use class indices instead if you're unsure about label names
- The error message will show all available labels

**ValidationError: blocked_classes validation failed**
- Don't mix class labels and indices in the same model's blocked_classes
- Use either `["label1", "label2"]` OR `[0, 1]`, not `["label1", 1]`
- Each model can independently use labels or indices, but not mixed within one model