
Commit 9495808

docs(examples): add nemoguards cache configuration example

1 parent 6a7474e

3 files changed (+177, −0 lines)
README (24 additions, 0 deletions):

# NeMoGuard Safety Rails Example

This example showcases the use of NVIDIA's NeMoGuard NIMs for comprehensive AI safety, including content moderation, topic control, and jailbreak detection.

## Configuration Files

- `config.yml` - Defines the model configuration, including the main LLM and the three NeMoGuard NIMs used for safety checks
- `prompts.yml` - Contains the prompt templates for the content safety and topic control checks

## NeMoGuard NIMs Used

1. **Content Safety** (`nvidia/llama-3.1-nemoguard-8b-content-safety`) - Checks for unsafe content across 23 safety categories
2. **Topic Control** (`nvidia/llama-3.1-nemoguard-8b-topic-control`) - Ensures conversations stay within the allowed topics
3. **Jailbreak Detection** - Detects and blocks jailbreak attempts (configured via `nim_server_endpoint`)

## Documentation

For more details about NeMoGuard NIMs and deployment options, see:

- [NeMo Guardrails Documentation](https://docs.nvidia.com/nemo/guardrails/index.html)
- [Llama 3.1 NemoGuard 8B ContentSafety NIM](https://docs.nvidia.com/nim/llama-3-1-nemoguard-8b-contentsafety/latest/)
- [Llama 3.1 NemoGuard 8B TopicControl NIM](https://docs.nvidia.com/nim/llama-3-1-nemoguard-8b-topiccontrol/latest/)
- [NemoGuard JailbreakDetect NIM](https://docs.nvidia.com/nim/nemoguard-jailbreakdetect/latest/)
- [NeMoGuard Models on NVIDIA API Catalog](https://build.nvidia.com/search?q=nemoguard)
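
To make the example concrete, here is a minimal sketch of how a configuration like this is typically loaded and exercised with the `nemoguardrails` Python API. The config directory path is an assumption (the commit does not show where the example lives), and the NIM endpoints expect `NVIDIA_API_KEY` to be set:

```python
# Minimal usage sketch. The config path below is hypothetical; point it at
# the directory containing this example's config.yml and prompts.yml.
import os

from nemoguardrails import LLMRails, RailsConfig

assert os.environ.get("NVIDIA_API_KEY"), "export NVIDIA_API_KEY first"

config = RailsConfig.from_path("examples/configs/nemoguards")  # hypothetical path
rails = LLMRails(config)

# Input rails (jailbreak detection, content safety, topic control) run before
# the main LLM; the content safety output rail then screens its answer.
response = rails.generate(
    messages=[{"role": "user", "content": "How do I track my order?"}]
)
print(response["content"])
```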
`config.yml` (48 additions, 0 deletions):

```yaml
models:
  - type: main
    engine: nim
    model: meta/llama-3.3-70b-instruct

  - type: content_safety
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety
    cache:
      enabled: true
      maxsize: 10000
      stats:
        enabled: true

  - type: topic_control
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-topic-control
    cache:
      enabled: true
      maxsize: 10000
      stats:
        enabled: true

  - type: jailbreak_detection
    engine: nim
    model: jailbreak_detect
    cache:
      enabled: true
      maxsize: 10000
      stats:
        enabled: true

rails:
  input:
    flows:
      - jailbreak detection model
      - content safety check input $model=content_safety
      - topic safety check input $model=topic_control

  output:
    flows:
      - content safety check output $model=content_safety

  config:
    jailbreak_detection:
      nim_base_url: "https://ai.api.nvidia.com"
      nim_server_endpoint: "/v1/security/nvidia/nemoguard-jailbreak-detect"
      api_key_env_var: NVIDIA_API_KEY
```
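
The `cache` blocks are the subject of this commit: each guard model opts into result caching (`enabled: true`) with up to 10,000 entries (`maxsize`) and hit/miss statistics (`stats.enabled: true`), so repeated identical safety checks need not re-hit the NIM endpoints. A rough way to observe the effect, under the same assumptions as the earlier sketch (this timing harness is illustrative, not part of the example):

```python
# Illustrative timing harness: send the identical message twice. With
# cache.enabled: true on the guard models, the second run's safety checks
# should be served from the in-memory caches, leaving mainly the main-LLM
# round trip.
import time

from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("examples/configs/nemoguards")  # hypothetical path
rails = LLMRails(config)

messages = [{"role": "user", "content": "What is your return policy?"}]

for label in ("cold", "warm"):
    start = time.perf_counter()
    rails.generate(messages=messages)
    print(f"{label} call: {time.perf_counter() - start:.2f}s")
```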
`prompts.yml` (105 additions, 0 deletions):

```yaml
# These are the default prompts released by Meta, except for policy O7, which was added to address direct insults.
prompts:
  - task: content_safety_check_input $model=content_safety
    content: |
      Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.

      <BEGIN UNSAFE CONTENT CATEGORIES>
      S1: Violence.
      S2: Sexual.
      S3: Criminal Planning/Confessions.
      S4: Guns and Illegal Weapons.
      S5: Controlled/Regulated Substances.
      S6: Suicide and Self Harm.
      S7: Sexual (minor).
      S8: Hate/Identity Hate.
      S9: PII/Privacy.
      S10: Harassment.
      S11: Threat.
      S12: Profanity.
      S13: Needs Caution.
      S14: Other.
      S15: Manipulation.
      S16: Fraud/Deception.
      S17: Malware.
      S18: High Risk Gov Decision Making.
      S19: Political/Misinformation/Conspiracy.
      S20: Copyright/Trademark/Plagiarism.
      S21: Unauthorized Advice.
      S22: Illegal Activity.
      S23: Immoral/Unethical.
      <END UNSAFE CONTENT CATEGORIES>

      <BEGIN CONVERSATION>

      user: {{ user_input }}

      <END CONVERSATION>

      Provide your safety assessment for the above conversation in the below output JSON format:
      {"User Safety": safe or unsafe rating for the user message, "Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present. "Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}

      Do not include anything other than the output JSON in your response.
      Output JSON:
    output_parser: nemoguard_parse_prompt_safety
    max_tokens: 50

  - task: content_safety_check_output $model=content_safety
    content: |
      Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.

      <BEGIN UNSAFE CONTENT CATEGORIES>
      S1: Violence.
      S2: Sexual.
      S3: Criminal Planning/Confessions.
      S4: Guns and Illegal Weapons.
      S5: Controlled/Regulated Substances.
      S6: Suicide and Self Harm.
      S7: Sexual (minor).
      S8: Hate/Identity Hate.
      S9: PII/Privacy.
      S10: Harassment.
      S11: Threat.
      S12: Profanity.
      S13: Needs Caution.
      S14: Other.
      S15: Manipulation.
      S16: Fraud/Deception.
      S17: Malware.
      S18: High Risk Gov Decision Making.
      S19: Political/Misinformation/Conspiracy.
      S20: Copyright/Trademark/Plagiarism.
      S21: Unauthorized Advice.
      S22: Illegal Activity.
      S23: Immoral/Unethical.
      <END UNSAFE CONTENT CATEGORIES>

      <BEGIN CONVERSATION>

      user: {{ user_input }}

      response: agent: {{ bot_response }}

      <END CONVERSATION>

      Provide your safety assessment for the above conversation in the below output JSON format:
      {"User Safety": safe or unsafe rating for the user message, "Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present. "Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}

      Do not include anything other than the output JSON in your response.
      Output JSON:
    output_parser: nemoguard_parse_response_safety
    max_tokens: 50

  - task: topic_safety_check_input $model=topic_control
    content: |
      You are to act as a customer service agent, providing users with factual information in accordance to the knowledge base. Your role is to ensure that you respond only to relevant queries and adhere to the following guidelines

      Guidelines for the user messages:
      - Do not answer questions related to personal opinions or advice on user's order, future recommendations
      - Do not provide any information on non-company products or services.
      - Do not answer enquiries unrelated to the company policies.
      - Do not answer questions asking for personal details about the agent or its creators.
      - Do not answer questions about sensitive topics related to politics, religion, or other sensitive subjects.
      - If a user asks topics irrelevant to the company's customer service relations, politely redirect the conversation or end the interaction.
      - Your responses should be professional, accurate, and compliant with customer relations guidelines, focusing solely on providing transparent, up-to-date information about the company that is already publicly available.
      - allow user comments that are related to small talk and chit-chat.
```
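
Each content safety prompt ends by demanding bare JSON, and the `output_parser` entries (`nemoguard_parse_prompt_safety`, `nemoguard_parse_response_safety`) name the library's built-in parsers for that verdict. As a rough illustration of the contract the prompts establish, here is a toy stand-in (hypothetical, not the library's implementation):

```python
import json


def parse_prompt_safety(llm_output: str) -> dict:
    """Toy stand-in for a NemoGuard verdict parser (hypothetical helper).

    Reads the model's JSON verdict and reduces it to an allow/block decision
    plus any violated category codes.
    """
    verdict = json.loads(llm_output.strip())
    return {
        "allowed": verdict.get("User Safety", "unsafe").lower() == "safe",
        "violated_categories": [
            c.strip()
            for c in verdict.get("Safety Categories", "").split(",")
            if c.strip()
        ],
    }


print(parse_prompt_safety('{"User Safety": "unsafe", "Safety Categories": "S10, S11"}'))
# -> {'allowed': False, 'violated_categories': ['S10', 'S11']}
```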
