Organizations implementing Retrieval Augmented Generation (RAG) applications face security challenges when handling sensitive information such as personally identifiable information (PII), protected health information (PHI), and confidential business data. Without proper security controls, organizations risk exposing sensitive information to unauthorized users, potentially resulting in regulatory violations, breach of customer trust, and reputational damage. This Guidance helps organizations implement a threat model for generative AI applications while maintaining the utility and effectiveness of RAG workflows.
This Guidance for Securing Sensitive Data in RAG Applications using Amazon Bedrock addresses these challenges by providing security architecture patterns that protect sensitive data throughout the RAG workflow, from ingestion to retrieval. The Guidance presents two security architecture patterns implemented with AWS services:
- Data Redaction at Storage Level: A zero-trust approach that identifies and redacts sensitive data before storing it in vector databases, ensuring sensitive information is never exposed during retrieval and generation.
- Role-Based Access Control: A permission-based approach that enables selective access to sensitive information based on user roles during retrieval, appropriate for environments where sensitive data needs to be accessible to authorized personnel while being protected from unauthorized access.
Both patterns are implemented using Amazon Bedrock Knowledge Bases as the foundation, complemented by security services such as Amazon Comprehend, Amazon Macie, Amazon Cognito, and Amazon Bedrock Guardrails to create a defense-in-depth security strategy.
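As one illustration of that layering, a Bedrock guardrail can be invoked directly against text with the ApplyGuardrail API, independent of a model invocation. The following is a minimal sketch, not code from this Guidance: the guardrail ID and version are placeholders, and the Region is assumed to be US East (N. Virginia).

```python
import boto3

# Minimal sketch: run a piece of model output through an existing Bedrock guardrail.
# "GUARDRAIL_ID" and the version are placeholders for values from your own deployment.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.apply_guardrail(
    guardrailIdentifier="GUARDRAIL_ID",
    guardrailVersion="1",
    source="OUTPUT",  # screen model output; use "INPUT" to screen user prompts
    content=[{"text": {"text": "Patient SSN is 123-45-6789."}}],
)

if response["action"] == "GUARDRAIL_INTERVENED":
    # The guardrail masked or blocked the content; use its sanitized output instead.
    print(response["outputs"][0]["text"])
```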
This section provides reference architecture diagrams for the components deployed with this Guidance.
This Guidance presents two distinct architectural patterns for securing sensitive data in RAG applications using Amazon Bedrock. Each pattern addresses different security requirements and organizational needs. The first focuses on proactive redaction of sensitive data before storage, while the second implements role-based access controls for situations where sensitive data must be preserved but accessed selectively.
Figures 1a and 1b show the reference architecture for Scenario 1 (data redaction at storage level), illustrating how customers can safely ingest sensitive documents through automated redaction and verification processes while enabling secure, guardrail-protected access to their knowledge base. The architecture demonstrates a complete security workflow that begins with document upload and PII detection, continues through multi-layered redaction and verification, and culminates in authenticated retrieval protected by input and output guardrails. This approach keeps sensitive information secured throughout the document lifecycle while maintaining functional access for authorized users.
Figure 1a: Architecture for Scenario 1: Data redaction at storage level - Part 1
Figure 1b: Architecture for Scenario 1: Data redaction at storage level - Part 2
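The redaction step in this flow is driven by Amazon Comprehend PII detection. The snippet below is a simplified sketch of that idea, not the deployed Lambda code: it detects PII entities in a document chunk and masks each span with its entity type before the text would be written to the knowledge base data source. In the deployed workflow, Amazon Macie provides an additional verification pass over the stored objects.

```python
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

text = "Patient John Doe, DOB 01/02/1980, lives at 123 Main St, Springfield."
entities = comprehend.detect_pii_entities(Text=text, LanguageCode="en")["Entities"]

# Replace detected spans from the end of the string backwards so that earlier
# character offsets remain valid while the text is edited.
redacted = text
for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
    redacted = (
        redacted[: entity["BeginOffset"]]
        + f"[{entity['Type']}]"
        + redacted[entity["EndOffset"] :]
    )

print(redacted)  # e.g. "Patient [NAME], DOB [DATE_TIME], lives at [ADDRESS], ..."
```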
Figure 2 shows the reference architecture for Scenario 2 (role-based access control), illustrating how security controls are applied dynamically based on user roles when accessing sensitive information. Unlike Scenario 1, this approach keeps sensitive data in the knowledge base but applies controls that restrict access based on user permissions and identity attributes.
The diagram illustrates the technical implementation that enables fine-grained access control. It features dual guardrail configurations—one for administrators and another for non-administrators—that are automatically applied based on user authentication claims. When users submit queries, the Lambda orchestrator analyzes their role and applies the appropriate guardrail to the request.
This approach ensures Amazon Bedrock Knowledge Bases retrieves only documents with metadata attributes matching the user's permission level. The result is a seamless but secure experience where users receive only the information appropriate to their role, allowing organizations to maintain a single knowledge base while enforcing different levels of information access.
Figure 2: Architecture for role-based access to sensitive data in RAG applications
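To make that flow concrete, the sketch below shows roughly what the orchestrator's retrieval call could look like. It is illustrative only: the knowledge base ID, model ARN, guardrail IDs, and the access_level metadata key are placeholder assumptions, not the Guidance's actual identifiers.

```python
import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

def query_knowledge_base(query: str, is_admin: bool) -> str:
    # Pick the guardrail that matches the caller's role (IDs are placeholders).
    guardrail_id = "ADMIN_GUARDRAIL_ID" if is_admin else "NON_ADMIN_GUARDRAIL_ID"
    # Restrict retrieval to documents whose metadata matches the caller's
    # permission level ("access_level" is an assumed metadata key).
    metadata_filter = {
        "equals": {"key": "access_level", "value": "admin" if is_admin else "general"}
    }

    response = bedrock_agent_runtime.retrieve_and_generate(
        input={"text": query},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": "KB_ID",  # placeholder
                "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {"filter": metadata_filter}
                },
                "generationConfiguration": {
                    "guardrailConfiguration": {
                        "guardrailId": guardrail_id,
                        "guardrailVersion": "1",
                    }
                },
            },
        },
    )
    return response["output"]["text"]
```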
You are responsible for the cost of the AWS services used while running this Guidance. As of April 2025, the cost for running this Guidance with an estimated 10GB of document processing in the US East (N. Virginia) Region is approximately $572.30 per month.
We recommend creating a budget through AWS Cost Explorer to help manage costs. Prices are subject to change. For full details, refer to the pricing webpage for each AWS service used in this Guidance.
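A budget can also be created programmatically. The sketch below is optional and illustrative: the $600 limit, 80% alert threshold, and notification address are placeholder assumptions, not values prescribed by this Guidance.

```python
import boto3

account_id = boto3.client("sts").get_caller_identity()["Account"]

# Optional sketch: a monthly cost budget with an email alert at 80% of the limit.
boto3.client("budgets").create_budget(
    AccountId=account_id,
    Budget={
        "BudgetName": "rag-guidance-monthly",
        "BudgetLimit": {"Amount": "600", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "alerts@example.com"}],
        }
    ],
)
```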
The following table provides a sample cost breakdown for deploying this Guidance with the default parameters in the US East (N. Virginia) Region for one month.
| AWS service | Dimensions | Cost [USD] |
| --- | --- | --- |
| Amazon S3 | 100GB Standard Storage, various operations and data transfers | $4.30/month |
| AWS Lambda | 20,000 total invocations across functions | $2.00/month |
| Amazon DynamoDB | On-demand capacity for job tracking | $5.00/month |
| Amazon Comprehend | PII Detection & Redaction for 5,000 documents | $20.00/month |
| Amazon Macie | Sensitive Data Discovery for 100GB | $100.00/month |
| Amazon Bedrock | 10,000 Knowledge Base queries and Guardrails processing | $81.00/month |
| Amazon OpenSearch | 10GB Serverless capacity | $350.00/month |
| Amazon API Gateway | 10,000 API calls | $3.50/month |
| Amazon Cognito | 100 monthly active users | $5.50/month |
| Amazon EventBridge | 10,000 events | $1.00/month |
| Total | | $572.30/month |
Before you deploy this Guidance, ensure that you have the following:
- An AWS account with administrator permissions
- Python version 3.10.16 or later installed on your local machine
- AWS Cloud Development Kit (CDK) CLI version 2.1005.0 or later installed
- Docker Desktop installed and running (required for custom CDK constructs)
- Amazon Macie enabled in your AWS account
- Model access enabled in Amazon Bedrock for:
  - Anthropic Claude (text and text-and-vision generation models)
  - Amazon Titan Text Embeddings V2 (embedding model)
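If you want to confirm the last two prerequisites before deploying, a quick boto3 check is one option. This is a hedged sketch, assuming the US East (N. Virginia) Region and example model IDs; note that listing foundation models only shows availability in the Region, while model access itself is still granted through the Amazon Bedrock console.

```python
import boto3
from botocore.exceptions import ClientError

session = boto3.Session(region_name="us-east-1")

# Check whether Amazon Macie is enabled in this account/Region.
try:
    status = session.client("macie2").get_macie_session()["status"]
    print(f"Macie status: {status}")  # expect ENABLED
except ClientError:
    print("Macie does not appear to be enabled in this account/Region.")

# Check that the needed models are listed in the Region (example model IDs).
models = {m["modelId"] for m in session.client("bedrock").list_foundation_models()["modelSummaries"]}
for model_id in ("anthropic.claude-3-sonnet-20240229-v1:0", "amazon.titan-embed-text-v2:0"):
    print(model_id, "listed" if model_id in models else "NOT listed")
```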
Before you launch the Guidance, review the cost, architecture, security, and other considerations discussed in this guide. Follow the step-by-step instructions in this section to configure and deploy the Guidance into your account.
Time to deploy: Approximately 45-60 minutes
- Open a terminal window.
- Clone the Guidance repository:
  git clone https://github.com/aws-solutions-library-samples/guidance-for-securing-sensitive-data-in-rag-applications-using-amazon-bedrock.git
- Navigate to the cloned repository directory:
  cd guidance-for-securing-sensitive-data-in-rag-applications-using-amazon-bedrock
- Create and activate a Python virtual environment:
  python -m venv .venv
  source .venv/bin/activate
- Upgrade pip and install the required dependencies:
  pip install -U pip
  pip install -r requirements.txt
- Generate sample data containing synthetic PII for testing purposes:
  python synthetic_data.py --seed 123 generate -n 10
- Verify that the data files are created in the data/ directory.
Choose either Scenario 1 or Scenario 2 based on your security requirements.
To deploy Scenario 1 (data redaction at storage level):
- Navigate to the scenario_1 directory:
  cd scenario_1
- Make the deployment script executable and run it:
  chmod +x run_app.sh
  ./run_app.sh
- When prompted, set a password for the Cognito user [email protected].
- Wait for the deployment to complete (approximately 30-45 minutes). The script will:
  - Deploy the CDK stack
  - Trigger the Lambda functions
  - Monitor Amazon Comprehend and Amazon Macie job completions
  - Launch the Streamlit application
- After deployment completes, the Streamlit app automatically launches at http://localhost:8501/.
- Log in with [email protected] and the password you set earlier.
- From the sidebar, select a model and optionally adjust parameters such as temperature and top_p.
- Test the application with sample queries such as the following (an optional verification sketch follows this list):
  - "What medications were recommended for Chronic migraines"
  - "What is the home address of Nikhil Jayashankar"
  - "List all patients under Institution Flores Group Medical Center"
To deploy Scenario 2 (role-based access control):
- Navigate to the scenario_2 directory:
  cd scenario_2
- Make the deployment script executable and run it:
  chmod +x run_app.sh
  ./run_app.sh
- Wait for the deployment to complete. The script will deploy the CDK stack and launch the Streamlit application.
- After deployment completes, the Streamlit app automatically launches at http://localhost:8501/.
- Log in with either:
  - [email protected] for Admin access
  - [email protected] for Non-Admin access
- From the sidebar, select a model and optionally adjust parameters.
- Test the application with sample queries such as the following (a sketch of the underlying role check follows this list):
  - "List all patients with Obesity as Symptom and the recommended medications"
  - "Generate a list of all patient names and a summary of their symptoms grouped by symptoms. Output in markdown table format."
  - "Tell me more about Mr. Smith and the reason PMD is needed" (works for Admins only)
You can uninstall the Guidance for Securing Sensitive Data in RAG Applications by following these steps:
- Navigate to the appropriate scenario directory (scenario_1 or scenario_2).
- Change to the cdk directory:
  cd cdk
- Run the CDK destroy command:
  cdk destroy
- When prompted, confirm the deletion of the stack.
The cdk destroy command will delete all deployed resources, including S3 buckets. Ensure you've saved any important data before proceeding with the cleanup.
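If you need to keep any of the ingested or redacted documents, one option is to copy them out of the Guidance's S3 bucket before destroying the stack. A small sketch, assuming you substitute the bucket name from the stack outputs:

```python
import pathlib

import boto3

s3 = boto3.client("s3")
bucket = "BUCKET_NAME"  # placeholder: the Guidance's document bucket
backup_dir = pathlib.Path("backup")

# Download every object in the bucket into a local backup directory.
for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
    for obj in page.get("Contents", []):
        target = backup_dir / obj["Key"]
        target.parent.mkdir(parents=True, exist_ok=True)
        s3.download_file(bucket, obj["Key"], str(target))
```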
- Amazon Bedrock Documentation - Provides detailed information about Amazon Bedrock services and capabilities.
- Amazon Bedrock Knowledge Bases - Explains how to create and manage knowledge bases for RAG applications.
- Amazon Bedrock Guardrails - Describes how to implement safeguards in your generative AI applications.
- Bedrock Guardrails sensitive information filters - Information about configuring sensitive data filters in Bedrock Guardrails.
- OWASP Top 10 for Large Language Model Applications - Provides information about security risks associated with generative AI applications.
Customers are responsible for making their own independent assessment of the information in this Guidance. This Guidance: (a) is for informational purposes only, (b) represents AWS current product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided “as is” without warranties, representations, or conditions of any kind, whether express or implied. AWS responsibilities and liabilities to its customers are controlled by AWS agreements, and this Guidance is not part of, nor does it modify, any agreement between AWS and its customers.
- Praveen Chamarti - Sr AI/ML Specialist
- Srikanth Reddy - Sr AI/ML Specialist