
Feat: Add support for local Hugging Face text classifiers #1612

Open
RobGeada wants to merge 3 commits into NVIDIA-NeMo:develop from RobGeada:HuggingfaceBuiltIn

Conversation

@RobGeada (Contributor) commented Feb 2, 2026

Description

Adds support for running local Hugging Face text classifiers as rails. The models run inside the NeMo Guardrails process, so this feature is best suited for smaller predictive text models in the sub-100M parameter range.

Features:

  1. Models are loaded from the local HF cache or downloaded when first called
  2. Configuration of which predicted classes constitute a guardrail violation
  3. Access to the wide range of open-access guardrail models on the Hugging Face Hub

Example config:

rails:
  config:
    huggingface_detector:
      models:
        # Example 1: Harmful content detection on GPU
        - model_repo: "ibm-granite/granite-guardian-hap-38m"
          descriptor: "Harmful and abusive language detector"
          blocked_classes: [0]
          device: "cuda"  # Load on GPU for faster inference

        # Example 2: Prompt injection detection on CPU
        - model_repo: "protectai/deberta-v3-base-prompt-injection-v2"
          descriptor: "Prompt injection detector"
          blocked_classes: ["INJECTION"]
          device: "cpu"  # Load on CPU
  input:
    flows:
      # Check user input for prompt injection attempts
      - huggingface detector check input $hf_model="protectai/deberta-v3-base-prompt-injection-v2"
  output:
    flows:
      # Check bot output for harmful content before sending to user
      - huggingface detector check output $hf_model="ibm-granite/granite-guardian-hap-38m"
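
To make the blocked_classes semantics concrete, here is a minimal sketch of the classification check using plain transformers calls. It is illustrative only, under the assumption that blocked classes may be given by index or by label as in the config above; the function and variable names are not the PR's actual actions.py code.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

def classify_text(text, model_repo, blocked_classes, device="cpu"):
    # Load the classifier and tokenizer from the local HF cache
    # (downloaded from the Hub on first use).
    tokenizer = AutoTokenizer.from_pretrained(model_repo)
    model = AutoModelForSequenceClassification.from_pretrained(model_repo).to(device)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt", truncation=True).to(device)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]

    predicted_idx = int(torch.argmax(probs))
    predicted_label = model.config.id2label[predicted_idx]

    # blocked_classes may mix integer indices (e.g. [0]) and string labels
    # (e.g. ["INJECTION"]), matching the example config above.
    blocked = predicted_idx in blocked_classes or predicted_label in blocked_classes
    return {
        "allowed": not blocked,
        "detected_class": predicted_label,
        "score": float(probs[predicted_idx]),
    }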

Checklist

  • I've read the CONTRIBUTING guidelines.
  • I've updated the documentation if applicable.
  • I've added tests if applicable.
  • @mentions of the person or team responsible for reviewing proposed changes.

greptile-apps bot commented Feb 2, 2026

Greptile Overview

Greptile Summary

Adds integration for local HuggingFace text classification models as guardrails. The implementation allows users to run any HuggingFace text classifier locally to detect harmful content, prompt injections, or other policy violations in user inputs, bot outputs, and tool messages.

Key Features

  • Local model execution: Models run entirely locally, no external API calls required
  • Flexible configuration: Support for multiple models with different purposes, configurable blocked classes (by label or index), and device placement (CPU/GPU)
  • Model caching: Models are cached after first load to optimize performance (a minimal caching sketch follows this list)
  • Comprehensive testing: 867 lines of unit tests covering configuration, classification logic, device management, and error handling
  • Documentation: Detailed READMEs with examples, troubleshooting, and configuration guides
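
The caching behavior described above can be sketched as a module-level cache keyed by repo and device, so each classifier is only materialized once per process. This is a hedged illustration, not the PR's _load_model_and_tokenizer implementation; _MODEL_CACHE and load_cached_model are hypothetical names.

from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical module-level cache: one (model, tokenizer) pair per repo/device.
_MODEL_CACHE = {}

def load_cached_model(model_repo, device="cpu"):
    key = (model_repo, device)
    if key not in _MODEL_CACHE:
        tokenizer = AutoTokenizer.from_pretrained(model_repo)
        model = AutoModelForSequenceClassification.from_pretrained(model_repo)
        model.to(device)
        model.eval()
        _MODEL_CACHE[key] = (model, tokenizer)
    return _MODEL_CACHE[key]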

Critical Issues

  • Wrong dependency in pyproject.toml: The code imports transformers, but the dependency file specifies sentence-transformers. The feature will still work when sentence-transformers is installed (it pulls in transformers transitively), but this creates version-mismatch risks and adds unnecessary dependencies. The extras name should also be huggingface rather than sentence-transformers for clarity.

Minor Issues

Architecture

The implementation follows NeMo Guardrails patterns with configuration schemas in config.py, action handlers in actions.py, and both Colang v1 and v2 flow definitions. The detector integrates into the rails pipeline and can block content by raising exceptions or refusing to respond based on configuration.
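
As an illustration of that pattern, a custom action registered through NeMo Guardrails' @action decorator could look roughly like the sketch below. The decorator and the $user_message context variable are the standard custom-action mechanism, but the function body, the _PIPELINES cache, and the blocked_labels default are assumptions for illustration, not the PR's actual actions.py.

from transformers import pipeline
from nemoguardrails.actions import action

# Hypothetical module-level cache: one text-classification pipeline per repo.
_PIPELINES = {}

@action()
async def huggingface_detector_check(context: dict = None, model_repo: str = "", blocked_labels=("INJECTION",)):
    # Illustrative only: read the message from the flow context, classify it,
    # and report whether the top predicted label is one of the blocked classes.
    text = (context or {}).get("user_message", "")
    if model_repo not in _PIPELINES:
        _PIPELINES[model_repo] = pipeline("text-classification", model=model_repo)
    prediction = _PIPELINES[model_repo](text)[0]  # e.g. {"label": "INJECTION", "score": 0.98}
    return {
        "allowed": prediction["label"] not in blocked_labels,
        "detected_class": prediction["label"],
        "score": prediction["score"],
    }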

Confidence Score: 3/5

  • Safe to merge after fixing the dependency issue in pyproject.toml
  • The implementation is well-designed with comprehensive testing and documentation. However, there's a critical dependency mismatch in pyproject.toml that specifies sentence-transformers instead of transformers, which could cause installation issues and version conflicts. Once this is corrected, the PR adds valuable functionality with proper error handling and device management.
  • pyproject.toml must be corrected before merge - wrong package specified (sentence-transformers vs transformers)

Important Files Changed

Filename | Overview
pyproject.toml | Added dependency for HuggingFace detector, but wrong package specified (sentence-transformers instead of transformers)
nemoguardrails/library/huggingface_detector/actions.py | Core implementation for HuggingFace text classifier integration with proper error handling, caching, and device management
examples/configs/huggingface_detector/README.md | Usage documentation with examples and configuration guide (minor typo on line 62)

Sequence Diagram

sequenceDiagram
    participant User
    participant NeMoGuardrails
    participant HFDetectorFlow as HuggingFace Detector Flow
    participant HFAction as huggingface_detector_check
    participant ModelLoader as _load_model_and_tokenizer
    participant HFHub as HuggingFace Hub
    participant Model as Classification Model
    
    User->>NeMoGuardrails: Send message
    NeMoGuardrails->>HFDetectorFlow: Execute input flow with $hf_model param
    HFDetectorFlow->>HFAction: Call HuggingfaceDetectorCheckAction(context_key, model_repo)
    
    HFAction->>HFAction: Extract text from context
    HFAction->>HFAction: Find model config by model_repo
    HFAction->>ModelLoader: Load model and tokenizer
    
    alt Model not in cache
        ModelLoader->>HFHub: Download model and tokenizer
        HFHub-->>ModelLoader: Return model files
        ModelLoader->>ModelLoader: Load AutoModelForSequenceClassification
        ModelLoader->>ModelLoader: Move to device (if specified)
        ModelLoader->>ModelLoader: Cache model
    else Model in cache
        ModelLoader->>ModelLoader: Return cached model
    end
    
    ModelLoader-->>HFAction: Return (model, tokenizer)
    
    HFAction->>HFAction: Convert blocked_classes to indices
    HFAction->>Model: Tokenize and classify text
    Model-->>HFAction: Return logits and predictions
    HFAction->>HFAction: Calculate probabilities (softmax)
    HFAction->>HFAction: Check if predicted class in blocked_classes
    
    HFAction-->>HFDetectorFlow: Return {allowed, detected_class, score, all_scores}
    
    alt Content is blocked (not allowed)
        HFDetectorFlow->>NeMoGuardrails: Send exception or refuse to respond
        NeMoGuardrails->>User: Block message
    else Content is allowed
        HFDetectorFlow-->>NeMoGuardrails: Continue processing
        NeMoGuardrails->>User: Process message normally
    end

@greptile-apps bot left a comment

3 files reviewed, 4 comments

github-actions bot commented Feb 2, 2026

Documentation preview

https://nvidia-nemo.github.io/Guardrails/review/pr-1612
