Commit 28826ec

docs(examples): add nemoguards cache configuration example (#1459)
1 parent 6676990 commit 28826ec

3 files changed (+400 additions, -0 deletions)

Lines changed: 247 additions & 0 deletions
@@ -0,0 +1,247 @@

# NeMoGuard Safety Rails with Caching

This example demonstrates how to configure NeMo Guardrails with caching support for multiple NVIDIA NeMoGuard NIMs, including content safety, topic control, and jailbreak detection.

## Features

- **Content Safety Checks**: Validates content against 23 safety categories (input and output)
- **Topic Control**: Ensures conversations stay within allowed topics (input)
- **Jailbreak Detection**: Detects and prevents jailbreak attempts (input)
- **Per-Model Caching**: Each safety model has its own dedicated cache instance
- **Thread Safety**: Fully thread-safe for use in multi-threaded web servers
- **Cache Statistics**: Optional performance monitoring for each model

## Folder Structure

- `config.yml` - Main configuration file with model definitions, rails configuration, and cache settings
- `prompts.yml` - Prompt templates for content safety and topic control checks

## Configuration Overview

### Basic Configuration with Caching

```yaml
models:
  - type: main
    engine: nim
    model: meta/llama-3.3-70b-instruct

  - type: content_safety
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety
    cache:
      enabled: true
      maxsize: 10000
      stats:
        enabled: true

  - type: topic_control
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-topic-control
    cache:
      enabled: true
      maxsize: 10000
      stats:
        enabled: true

  - type: jailbreak_detection
    engine: nim
    model: jailbreak_detect
    cache:
      enabled: true
      maxsize: 10000
      stats:
        enabled: true

rails:
  input:
    flows:
      - jailbreak detection model
      - content safety check input $model=content_safety
      - topic safety check input $model=topic_control

  output:
    flows:
      - content safety check output $model=content_safety

  config:
    jailbreak_detection:
      nim_base_url: "https://ai.api.nvidia.com"
      nim_server_endpoint: "/v1/security/nvidia/nemoguard-jailbreak-detect"
      api_key_env_var: NVIDIA_API_KEY
```
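
Each rail flow in this configuration selects its NeMoGuard model with a `$model=<type>` argument, which must match a `type` under `models`. The minimal, stdlib-only Python sketch below mirrors the configuration as plain data and reports flows whose `$model=` reference has no matching model. `check_model_references` is a hypothetical helper and is not part of the NeMo Guardrails API. A real setup would load the YAML file instead of typing it in.

```python
import re

# Hypothetical mirror of config.yml as plain Python data (normally loaded from YAML).
MODELS = [
    {"type": "main", "engine": "nim", "model": "meta/llama-3.3-70b-instruct"},
    {"type": "content_safety", "engine": "nim",
     "model": "nvidia/llama-3.1-nemoguard-8b-content-safety"},
    {"type": "topic_control", "engine": "nim",
     "model": "nvidia/llama-3.1-nemoguard-8b-topic-control"},
    {"type": "jailbreak_detection", "engine": "nim", "model": "jailbreak_detect"},
]

FLOWS = [
    "jailbreak detection model",
    "content safety check input $model=content_safety",
    "topic safety check input $model=topic_control",
    "content safety check output $model=content_safety",
]

MODEL_ARG = re.compile(r"\$model=(\S+)")


def check_model_references(models, flows):
    """Return the flows whose $model=<type> argument names an undefined model type."""
    defined = {m["type"] for m in models}
    missing = []
    for flow in flows:
        match = MODEL_ARG.search(flow)
        if match and match.group(1) not in defined:
            missing.append(flow)
    return missing


print(check_model_references(MODELS, FLOWS))  # []
```

A typo such as `$model=content-safety` would appear in the returned list. Flows without a `$model=` argument, such as `jailbreak detection model`, are skipped.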

## NeMoGuard NIMs Used

### 1. Content Safety (`nvidia/llama-3.1-nemoguard-8b-content-safety`)

Checks for unsafe content across 23 safety categories including violence, hate speech, sexual content, and more.

**Cache Configuration:**

```yaml
- type: content_safety
  engine: nim
  model: nvidia/llama-3.1-nemoguard-8b-content-safety
  cache:
    enabled: true
    maxsize: 10000
    stats:
      enabled: true
```

### 2. Topic Control (`nvidia/llama-3.1-nemoguard-8b-topic-control`)

Ensures conversations stay within allowed topics and prevents topic drift.

**Cache Configuration:**

```yaml
- type: topic_control
  engine: nim
  model: nvidia/llama-3.1-nemoguard-8b-topic-control
  cache:
    enabled: true
    maxsize: 10000
    stats:
      enabled: true
```

### 3. Jailbreak Detection (`jailbreak_detect`)

Detects and prevents jailbreak attempts that try to bypass safety measures.

**IMPORTANT**: For jailbreak detection caching to work, the `type` and `model` **MUST** be set to these exact values:

- `type: jailbreak_detection`
- `model: jailbreak_detect`

**Cache Configuration:**

```yaml
- type: jailbreak_detection
  engine: nim
  model: jailbreak_detect
  cache:
    enabled: true
    maxsize: 10000
    stats:
      enabled: true
```
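
Jailbreak caching depends on both fields matching exactly, so a pre-flight check can catch near misses such as `model: jailbreak-detect`. The Python sketch below is hypothetical. It encodes only the rule stated above and treats a missing or disabled `cache` section as "no caching". It does not reproduce how NeMo Guardrails looks up the cache internally.

```python
REQUIRED_TYPE = "jailbreak_detection"
REQUIRED_MODEL = "jailbreak_detect"


def jailbreak_cache_settings(models):
    """Return the jailbreak entry's cache settings, or None if caching will not apply.

    Hypothetical check: the entry must use exactly type=jailbreak_detection and
    model=jailbreak_detect, and its cache section must set enabled: true.
    """
    for entry in models:
        if entry.get("type") == REQUIRED_TYPE and entry.get("model") == REQUIRED_MODEL:
            cache = entry.get("cache", {})
            # A missing or disabled cache section means no caching (the default).
            return cache if cache.get("enabled", False) else None
    return None


good = [{"type": "jailbreak_detection", "engine": "nim", "model": "jailbreak_detect",
         "cache": {"enabled": True, "maxsize": 10000, "stats": {"enabled": True}}}]
typo = [{"type": "jailbreak_detection", "engine": "nim", "model": "jailbreak-detect",
         "cache": {"enabled": True, "maxsize": 10000}}]

print(jailbreak_cache_settings(good)["maxsize"])  # 10000
print(jailbreak_cache_settings(typo))             # None
```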

The actual NIM endpoint is configured separately in the `rails.config` section:

```yaml
rails:
  config:
    jailbreak_detection:
      nim_base_url: "https://ai.api.nvidia.com"
      nim_server_endpoint: "/v1/security/nvidia/nemoguard-jailbreak-detect"
      api_key_env_var: NVIDIA_API_KEY
```
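
Together, the three `rails.config.jailbreak_detection` values describe one HTTP endpoint and its credentials. The sketch below makes two assumptions. First, it assumes the request URL is formed by appending `nim_server_endpoint` to `nim_base_url`. Second, it assumes the key is read from the environment variable *named* by `api_key_env_var`. `jailbreak_endpoint` is a hypothetical helper, and NeMo Guardrails may join the URL or pass the key differently.

```python
import os


def jailbreak_endpoint(settings, environ=os.environ):
    """Return (url, api_key) for the jailbreak detection NIM (hypothetical helper)."""
    # Assumption: the endpoint path is appended to the base URL.
    base = settings["nim_base_url"].rstrip("/")
    path = settings["nim_server_endpoint"].lstrip("/")
    url = f"{base}/{path}"
    # api_key_env_var holds the *name* of the variable, not the key itself.
    var_name = settings["api_key_env_var"]
    api_key = environ.get(var_name)
    if api_key is None:
        raise RuntimeError(f"Set {var_name} before starting the server")
    return url, api_key


settings = {
    "nim_base_url": "https://ai.api.nvidia.com",
    "nim_server_endpoint": "/v1/security/nvidia/nemoguard-jailbreak-detect",
    "api_key_env_var": "NVIDIA_API_KEY",
}
url, _ = jailbreak_endpoint(settings, environ={"NVIDIA_API_KEY": "example-key"})
print(url)  # https://ai.api.nvidia.com/v1/security/nvidia/nemoguard-jailbreak-detect
```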

## How It Works

1. **User Input**: When a user sends a message, it passes through several input checks:
   - Jailbreak detection looks for manipulation attempts
   - Content safety checks for unsafe content
   - Topic control verifies that the message stays on allowed topics

2. **Caching**: Each model has its own cache:
   - First check of an input: the NeMoGuard NIM is called and the result is cached
   - Subsequent identical inputs: cache hit, no API call needed

3. **Response Generation**: If all input checks pass, the main model generates a response.

4. **Output Check**: Content safety checks the response before it is returned to the user.
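
These steps can be pictured as a short-circuiting pipeline in which each safety model sits behind its own cache. The Python sketch below is illustrative only. The three check functions are made-up keyword stand-ins for the NeMoGuard NIM calls, and `CachedCheck` and `handle` are hypothetical names. Keying the cache on a SHA-256 hash of the exact text is this sketch's assumption, not a description of how NeMo Guardrails builds its cache keys.

```python
import hashlib


class CachedCheck:
    """One safety check with its own dedicated cache (stand-in for a NIM call)."""

    def __init__(self, name, check_fn):
        self.name = name
        self.check_fn = check_fn
        self.cache = {}
        self.api_calls = 0

    def __call__(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.cache:
            # First time this exact text is seen: "call the NIM" and cache the result.
            self.api_calls += 1
            self.cache[key] = self.check_fn(text)
        # Identical input later: served from the cache, no API call.
        return self.cache[key]


# Hypothetical keyword-based stand-ins for the NeMoGuard checks (True means "allowed").
jailbreak = CachedCheck("jailbreak_detection", lambda t: "ignore previous" not in t.lower())
content = CachedCheck("content_safety", lambda t: "forbidden" not in t.lower())
topic = CachedCheck("topic_control", lambda t: "weather" not in t.lower())


def handle(message, generate):
    # Step 1: input checks in the order of rails.input.flows; stop at the first failure.
    for check in (jailbreak, content, topic):
        if not check(message):
            return f"Blocked by {check.name}"
    # Step 3: the main model runs only if every input check passed.
    response = generate(message)
    # Step 4: output check with the same content safety model (and its cache).
    if not content(response):
        return "Blocked by content_safety (output)"
    return response


def echo(message):
    return f"You said: {message}"


print(handle("Hello there", echo))  # You said: Hello there
print(handle("Hello there", echo))  # You said: Hello there (all checks hit the cache)
print(jailbreak.api_calls)          # 1
```

The content safety cache is shared between the input and output checks because both flows use `$model=content_safety`. In this sketch, that shared cache therefore holds entries for both messages and responses.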

## Cache Configuration Options

### Default Behavior (No Caching)

By default, caching is **disabled**. Models without a `cache` section are not cached.

### Enabling Cache

Add a `cache` section to any model definition:

```yaml
cache:
  enabled: true          # Enable caching
  maxsize: 10000         # Cache capacity (number of entries)
  stats:
    enabled: true        # Enable statistics tracking
    log_interval: 300.0  # Log stats every 5 minutes (optional)
```
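
The `cache` block maps naturally onto a small typed structure. The sketch below uses Python dataclasses to parse a model's `cache` section from an already-loaded YAML dictionary. A missing section yields a disabled cache, matching the default described above. The class names and the fallback `maxsize` of 10,000 are assumptions for illustration, not the NeMo Guardrails schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CacheStatsSettings:
    enabled: bool = False
    log_interval: Optional[float] = None  # seconds; None means no periodic logging


@dataclass
class CacheSettings:
    enabled: bool = False                 # caching is off unless requested
    maxsize: int = 10000                  # assumed fallback capacity (entries)
    stats: CacheStatsSettings = field(default_factory=CacheStatsSettings)


def parse_cache(model_entry: dict) -> CacheSettings:
    """Build CacheSettings from one model definition (hypothetical helper)."""
    raw = model_entry.get("cache")
    if raw is None:
        return CacheSettings()            # no cache section: caching disabled
    stats_raw = raw.get("stats", {})
    settings = CacheSettings(
        enabled=bool(raw.get("enabled", False)),
        maxsize=int(raw.get("maxsize", CacheSettings.maxsize)),
        stats=CacheStatsSettings(
            enabled=bool(stats_raw.get("enabled", False)),
            log_interval=stats_raw.get("log_interval"),
        ),
    )
    if settings.maxsize <= 0:
        raise ValueError("cache.maxsize must be a positive number of entries")
    return settings


entry = {"type": "content_safety",
         "cache": {"enabled": True, "maxsize": 10000,
                   "stats": {"enabled": True, "log_interval": 300.0}}}
print(parse_cache(entry).stats.log_interval)   # 300.0
print(parse_cache({"type": "main"}).enabled)   # False
```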

### Cache Configuration Parameters

- **enabled**: `true` to enable caching, `false` to disable it
- **maxsize**: Maximum number of entries in the cache; when the cache is full, the least recently used (LRU) entry is evicted
- **stats.enabled**: Track cache hit/miss rates and performance metrics
- **stats.log_interval**: How often to log statistics, in seconds (optional)
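
The combination of `maxsize`, least-recently-used eviction, and hit/miss statistics can be illustrated with a small bounded cache. This is a generic LRU sketch built on `collections.OrderedDict`, not the NeMo Guardrails implementation. `LRUCacheWithStats` and its method names are hypothetical.

```python
from collections import OrderedDict


class LRUCacheWithStats:
    """Bounded cache that evicts the least recently used entry when full."""

    def __init__(self, maxsize):
        if maxsize <= 0:
            raise ValueError("maxsize must be positive")
        self.maxsize = maxsize
        self._data = OrderedDict()
        self.hits = 0
        self.misses = 0
        self.evictions = 0

    def get(self, key, default=None):
        if key in self._data:
            self._data.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1
        return default

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)   # drop the least recently used entry
            self.evictions += 1

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def __len__(self):
        return len(self._data)


cache = LRUCacheWithStats(maxsize=2)
cache.put("a", "safe")
cache.put("b", "unsafe")
cache.get("a")            # hit; "a" becomes the most recently used entry
cache.put("c", "safe")    # over capacity: evicts "b", the least recently used
print(cache.get("b"))     # None
print(cache.hits, cache.misses, cache.evictions)  # 1 1 1
```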

## Architecture

Each NeMoGuard model gets its own dedicated cache instance, providing:

- **Isolated cache management** per model
- **Different cache capacities** for different models
- **Model-specific performance tuning**
- **Thread-safe concurrent access**

This architecture allows you to:

- Set larger caches for frequently used models
- Disable caching for specific models
- Monitor performance per model
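
The one-cache-per-model design amounts to a registry keyed by model type. Only models with `cache.enabled: true` get a cache, and each cache has its own capacity and counters. The sketch below is hypothetical and self-contained. `ModelCache` and `build_caches` are illustrative names, and the capacities 20,000 and 5,000 and the 10,000 fallback are made-up values.

```python
from collections import OrderedDict


class ModelCache:
    """Minimal per-model LRU cache with its own capacity and hit/miss counters."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, prompt, compute):
        if prompt in self.entries:
            self.entries.move_to_end(prompt)
            self.hits += 1
            return self.entries[prompt]
        self.misses += 1
        result = compute(prompt)
        self.entries[prompt] = result
        if len(self.entries) > self.maxsize:
            self.entries.popitem(last=False)  # evict least recently used
        return result


def build_caches(models):
    """Create one isolated cache per model type that sets cache.enabled: true."""
    caches = {}
    for entry in models:
        settings = entry.get("cache", {})
        if settings.get("enabled", False):
            caches[entry["type"]] = ModelCache(settings.get("maxsize", 10000))
    return caches


models = [
    {"type": "main", "model": "meta/llama-3.3-70b-instruct"},          # no cache section
    {"type": "content_safety", "cache": {"enabled": True, "maxsize": 20000}},
    {"type": "topic_control", "cache": {"enabled": False}},             # caching disabled
    {"type": "jailbreak_detection", "cache": {"enabled": True, "maxsize": 5000}},
]
caches = build_caches(models)
print(sorted(caches))                    # ['content_safety', 'jailbreak_detection']
print(caches["content_safety"].maxsize)  # 20000
```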

## Thread Safety

The implementation is fully thread-safe:

- **Concurrent Requests**: Safely handles multiple simultaneous safety checks
- **Efficient Locking**: Uses a reentrant lock (`RLock`) to keep locking overhead low
- **Atomic Operations**: Prevents duplicate LLM calls for the same content

This makes it suitable for:

- Multi-threaded web servers (FastAPI, Flask, Django)
- Concurrent request processing
- High-traffic applications
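
The two locking properties described above can be sketched with the standard `threading` module: an `RLock` guarding the cache, and no duplicate LLM calls for the same content. This is a hypothetical illustration of one way to achieve both, not the library's code. A per-key lock makes concurrent requests for the same text wait for a single in-flight call.

```python
import threading
import time


class ThreadSafeCache:
    """Cache where concurrent requests for the same key trigger a single call."""

    def __init__(self):
        self._lock = threading.RLock()  # guards _data and _key_locks
        self._data = {}
        self._key_locks = {}

    def get_or_compute(self, key, compute):
        with self._lock:
            if key in self._data:
                return self._data[key]
            # Every thread asking for this key shares one per-key lock.
            key_lock = self._key_locks.setdefault(key, threading.Lock())
        with key_lock:
            with self._lock:
                if key in self._data:  # another thread filled it while we waited
                    return self._data[key]
            value = compute(key)  # slow call runs outside the global lock
            with self._lock:
                self._data[key] = value
                self._key_locks.pop(key, None)
            return value


call_count = [0]
count_lock = threading.Lock()


def slow_check(text):
    with count_lock:
        call_count[0] += 1
    time.sleep(0.05)  # simulate NIM latency
    return f"safe:{text}"


cache = ThreadSafeCache()
threads = [threading.Thread(target=cache.get_or_compute, args=("hello", slow_check))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(call_count[0])  # 1
```

Using `RLock` here mirrors the README's description. This sketch never re-acquires the global lock from the same thread, so a plain `Lock` would also work. Running `compute` outside the global lock keeps requests for different texts from blocking each other.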

## Running the Example

```bash
export NVIDIA_API_KEY=your_api_key_here

nemoguardrails server --config examples/configs/nemoguards_cache/
```

## Benefits

1. **Performance**: Avoid redundant NeMoGuard API calls for repeated inputs
2. **Cost Savings**: Reduce API usage in proportion to the cache hit rate
3. **Flexibility**: Enable caching per model based on usage patterns
4. **Clean Architecture**: Each model has its own dedicated cache
5. **Scalability**: Easy to add new models with different caching strategies
6. **Observability**: Cache statistics help monitor effectiveness
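
The cost benefit is simple arithmetic. With a hit rate `h`, only a fraction `1 - h` of checks reach a NIM. The sketch below estimates calls per safety model. The request counts and hit rates are made-up inputs, not measurements; in practice they would come from the cache statistics. `estimate_nim_calls` is a hypothetical helper.

```python
def estimate_nim_calls(checks_per_model, hit_rates):
    """Return {model: (calls_without_cache, calls_with_cache)} for each model."""
    estimates = {}
    for model, checks in checks_per_model.items():
        hit_rate = hit_rates.get(model, 0.0)  # models without a cache: 0% hits
        if not 0.0 <= hit_rate <= 1.0:
            raise ValueError(f"hit rate for {model} must be between 0 and 1")
        estimates[model] = (checks, round(checks * (1.0 - hit_rate)))
    return estimates


# Illustrative numbers only: 10,000 messages, all passing the input checks, so
# content safety runs once on input and once on output per message.
checks = {"jailbreak_detection": 10000, "topic_control": 10000, "content_safety": 20000}
rates = {"jailbreak_detection": 0.40, "topic_control": 0.25, "content_safety": 0.20}
for model, (before, after) in estimate_nim_calls(checks, rates).items():
    print(f"{model}: {before} -> {after} calls")
```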

## Tips

- Start with moderate cache sizes (5,000-10,000 entries) and adjust based on usage
- Enable stats logging to monitor cache effectiveness
- Jailbreak detection typically has high cache hit rates
- Content safety caching is most effective for chatbots with common queries
- Topic control benefits from caching when topics are well-defined
- Adjust cache sizes independently for each model based on their usage patterns
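
Stats logging, as configured by `stats.log_interval`, can be sketched as a counter that emits a log line at most once per interval. `StatsLogger` below is hypothetical and uses the standard `logging` module. The clock is injectable so the example runs instantly; by default it uses `time.monotonic`.

```python
import logging
import time


class StatsLogger:
    """Count hits/misses and log a summary at most once per log_interval seconds."""

    def __init__(self, name, log_interval, clock=time.monotonic):
        self.name = name
        self.log_interval = log_interval
        self.clock = clock
        self.hits = 0
        self.misses = 0
        self._last_log = clock()
        self._logger = logging.getLogger(f"cache.{name}")

    def record(self, hit):
        """Record one lookup; return True if a stats line was logged."""
        if hit:
            self.hits += 1
        else:
            self.misses += 1
        now = self.clock()
        if now - self._last_log < self.log_interval:
            return False
        total = self.hits + self.misses
        self._logger.info("%s cache: %d hits, %d misses, hit rate %.1f%%",
                          self.name, self.hits, self.misses, 100.0 * self.hits / total)
        self._last_log = now
        return True


logging.basicConfig(level=logging.INFO)
fake_now = [0.0]
stats = StatsLogger("jailbreak_detection", log_interval=300.0, clock=lambda: fake_now[0])
stats.record(hit=False)        # t=0s: interval not elapsed, nothing logged
fake_now[0] = 120.0
stats.record(hit=True)         # t=120s: still inside the interval
fake_now[0] = 301.0
print(stats.record(hit=True))  # True: logs 2 hits, 1 miss, hit rate 66.7%
```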

## Documentation

For more details about NeMoGuard NIMs and deployment options, see:

- [NeMo Guardrails Documentation](https://docs.nvidia.com/nemo/guardrails/index.html)
- [Llama 3.1 NemoGuard 8B ContentSafety NIM](https://docs.nvidia.com/nim/llama-3-1-nemoguard-8b-contentsafety/latest/)
- [Llama 3.1 NemoGuard 8B TopicControl NIM](https://docs.nvidia.com/nim/llama-3-1-nemoguard-8b-topiccontrol/latest/)
- [NemoGuard JailbreakDetect NIM](https://docs.nvidia.com/nim/nemoguard-jailbreakdetect/latest/)
- [NeMoGuard Models on NVIDIA API Catalog](https://build.nvidia.com/search?q=nemoguard)

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
```yaml
models:
  - type: main
    engine: nim
    model: meta/llama-3.3-70b-instruct

  - type: content_safety
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety
    cache:
      enabled: true
      maxsize: 10000
      stats:
        enabled: true

  - type: topic_control
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-topic-control
    cache:
      enabled: true
      maxsize: 10000
      stats:
        enabled: true

  - type: jailbreak_detection
    engine: nim
    model: jailbreak_detect
    cache:
      enabled: true
      maxsize: 10000
      stats:
        enabled: true

rails:
  input:
    flows:
      - jailbreak detection model
      - content safety check input $model=content_safety
      - topic safety check input $model=topic_control

  output:
    flows:
      - content safety check output $model=content_safety

  config:
    jailbreak_detection:
      nim_base_url: "https://ai.api.nvidia.com"
      nim_server_endpoint: "/v1/security/nvidia/nemoguard-jailbreak-detect"
      api_key_env_var: NVIDIA_API_KEY
```
