Secure RAG Demo with MongoDB, Permit.io & LangChain

This project demonstrates how to set up a Secure Retrieval-Augmented Generation (RAG) system using MongoDB, Permit.io, and LangChain. It provides a secure AI agent that retrieves and generates responses based on user identity and permissions, ensuring agents can only access the data authorized for the users that instruct them.

Quickstart

Requirements:

MongoDB Atlas cluster (Free Tier is fine)
Permit.io account (Free Tier is fine)
OpenAI API Key
Docker & Docker Compose installed on your machine
Python 3.11+ (recommended for running manual scripts, e.g., setup_users.py)

You can check your version with python3 --version, or install Python from the official Python website.

1. Set Up Vector Search in MongoDB Atlas

Go to MongoDB Atlas and log in or sign up.
Create a new project, then create a new cluster (Free M0 tier works).
Once your cluster is ready: Click "Browse Collections" → Create a new database named secure_rag and a collection named documents.
Go to the "Search Indexes" tab and click "Create Search Index".
Select Vector Search as the search type.
Choose your secure_rag database and documents collection.
Select JSON Editor under "Configuration Method" and paste the following:

{
  "fields": [
    {
      "numDimensions": 1536,
      "path": "vector_embedding",
      "similarity": "cosine",
      "type": "vector"
    },
    {
      "path": "metadata.department",
      "type": "filter"
    },
    {
      "path": "document_id",
      "type": "filter"
    }
  ]
}

Click Save. Let the index finish building before continuing.
You're done with MongoDB Vector Search setup!

Learn more about vector search indexing

2. Clone the Project Repository & Add Your .env File

Clone the repository:

git clone https://github.com/permitio/permit-mongodb-secure-rag.git
cd permit-mongodb-secure-rag

Create a .env file in the root of the project and add your credentials:

MONGODB_URI=<your-mongodb-uri>
OPENAI_API_KEY=<your-openai-api-key>
PERMIT_API_KEY=<your-permit-api-key>
PERMIT_PDP_URL=http://permit-pdp:7000

3. Run It All

docker-compose up --build

This command will:

Start Permit PDP locally
Sync markdown docs in the docs/ folder to MongoDB
Generate embeddings for each document
Sync metadata, users, departments & ReBAC policies to Permit.io
Start the LangChain FastAPI server so you can begin querying securely

Once all services are healthy, the app will be fully running and query-ready!

Test the RAG Query API

Query endpoint:

curl -X POST http://localhost:8000/query \
 -H "Content-Type: application/json" \
 -d '{"query": "Tell me about 2024 budget", "user_id": "user_marketing_1"}'

Try querying with:

carol → user_marketing_1 → viewer in marketing
alice → user_engineering_1 → viewer in engineering
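
The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; it assumes the service is reachable at http://localhost:8000 and that the endpoint returns a JSON body (the response shape is not documented here, so `ask` simply returns whatever JSON comes back). The function names are illustrative and not part of the project.

```python
import json
import urllib.request


def build_query_request(query: str, user_id: str,
                        base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the same POST request that the curl command sends to /query."""
    payload = json.dumps({"query": query, "user_id": user_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(query: str, user_id: str) -> dict:
    """Send the request and decode the JSON reply (needs the stack to be running)."""
    request = build_query_request(query, user_id)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `ask("Tell me about 2024 budget", "user_marketing_1")` and then repeating the call with `user_engineering_1` is a quick way to compare what each user is allowed to see.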

Project Structure

docs/ → Markdown documents grouped by department
watcher/ → Syncs docs to MongoDB + triggers embeddings generation for each document
scripts/ → Permit ReBAC setup (policies, departments, users)
app/ → LangChain API logic (FastAPI powered)
Dockerfile* → Docker setup for each service

How Permissions Work

Each markdown file contains frontmatter metadata (e.g., department, confidential)
This metadata is synced to Permit.io as resource instances
When a user makes a query, LangChain asks Permit.io which documents they can access
MongoDB vector search filters based on permitted IDs only
Final response is generated using OpenAI from only allowed docs
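
To make the first step concrete, here is a rough sketch of how frontmatter could be separated from a Markdown body. It assumes the common layout of flat `key: value` lines between two `---` markers; the project's own parser may differ, and `split_frontmatter` is a hypothetical helper name.

```python
def split_frontmatter(text: str) -> tuple[dict, str]:
    """Split a Markdown document into (metadata, body).

    Handles only flat `key: value` pairs between two `---` lines; lists or
    nested values would need a real YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1)
                   if line.strip() == "---")
    except StopIteration:
        return {}, text  # unterminated block: treat as no frontmatter

    metadata = {}
    for line in lines[1:end]:
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip().strip("\"'")
        if value.lower() in ("true", "false"):
            metadata[key.strip()] = value.lower() == "true"  # booleans such as confidential
        else:
            metadata[key.strip()] = value
    return metadata, "\n".join(lines[end + 1:])
```

The returned metadata dictionary (for example `department` and `confidential`) is the kind of data that would be attached to the document's resource instance in Permit.io.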

In-Depth Project Overview

Secure RAG combines Retrieval-Augmented Generation with Relationship-Based Access Control (ReBAC) to ensure that an AI agent only retrieves and generates responses from data a user is authorized to access. This project integrates:

MongoDB Atlas: Stores documents and their embeddings for RAG, with vector search and database indexing for efficient retrieval.
Permit.io: Manages access control using a ReBAC model to enforce permissions based on user identity and department.
LangChain: Provides the framework for building the RAG pipeline, connecting MongoDB for retrieval and OpenAI for generation.
Docker Compose: Orchestrates the local services (Permit PDP, LangChain app, and file-watcher). MongoDB Atlas runs as a hosted service outside Compose.

Key Features

Secure Retrieval: Retrieves only documents a user is permitted to access, based on their identity and department.
File Syncing: A file-watcher service monitors the docs directory for changes (additions, updates, deletions) and syncs Markdown files to MongoDB.
Policy Tagging: Syncs document metadata to Permit.io as resource instances (e.g., document:api_design_3d08a90b) with attributes like department and confidential.
Immediate RAG Availability: Newly added files are synced to MongoDB and immediately available for RAG queries.
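
The file-syncing feature depends on telling additions, updates, and deletions apart. One simple way to do that, shown here as an illustrative sketch rather than the watcher's actual implementation, is to compare content hashes between two snapshots of the docs directory. Both function names are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot(docs_dir: str) -> dict[str, str]:
    """Map each Markdown file (path relative to docs_dir) to a SHA-256 digest."""
    root = Path(docs_dir)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*.md"))
    }


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths as added, updated, or deleted between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "updated": sorted(p for p in new.keys() & old.keys() if new[p] != old[p]),
        "deleted": sorted(old.keys() - new.keys()),
    }
```

A polling loop would take a snapshot, sleep, take another, and then upsert or delete MongoDB documents according to the diff.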

Architecture Diagram

The following diagram illustrates how the components interact:

Architecture Components

File-Watcher: Monitors the docs directory and syncs files to MongoDB.
MongoDB Atlas: Stores documents and embeddings, enabling vector search for RAG.
Permit.io: Enforces ReBAC policies to filter documents based on user permissions.
LangChain App: Queries MongoDB for permitted documents and uses OpenAI to generate answers.
OpenAI API: Provides the LLM for answer generation.

Prerequisites

Before setting up the project, ensure you have the following:

Docker and Docker Compose installed.
MongoDB Atlas account (free tier is sufficient).
Permit.io account (free tier available).
OpenAI API Key for embeddings and generation.
Python 3.11 or higher (for running manual scripts).
Git to clone the repository.

Setup Instructions

MongoDB Atlas Setup

Create a MongoDB Atlas Cluster:
- Sign up/login to MongoDB Atlas.
- Create a new cluster (e.g., Cluster0) in your preferred region (e.g., AWS Singapore).
- Use the free tier (M0 Sandbox) for this demo.
Create the Database and Collection:
- In your cluster, go to the "Collections" tab.
- Create a database named secure_rag.
- Add a collection named documents.
Set Up Vector Search Index:
- Go to the "Atlas Search" tab in your cluster.
- Click "Create Search Index" and select "Vector Search".
- Use the JSON editor to create an index named vector_index with the following configuration:

{
  "fields": [
    {
      "type": "vector",
      "path": "vector_embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    },
    {
      "type": "filter",
      "path": "metadata.department"
    },
    {
      "type": "filter",
      "path": "document_id"
    }
  ]
}

Explanation:

vector_embedding: Field storing document embeddings (1536 dimensions for OpenAI embeddings).
cosine: Similarity metric for vector search.
metadata.department: Enables pre-filtering by department.

Click "Next" and "Create Index". Wait for the index status to show as "READY".
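
For readers unfamiliar with the cosine metric named in the index definition, the sketch below computes it in plain Python and adds a length check matching the 1536 dimensions configured above. It is only an illustration of the math; Atlas performs the real search server-side, and the helper names are not part of the project.

```python
import math

EXPECTED_DIMENSIONS = 1536  # must match numDimensions in the index definition


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def check_embedding(vector: list[float]) -> None:
    """Reject embeddings whose length does not match the index configuration."""
    if len(vector) != EXPECTED_DIMENSIONS:
        raise ValueError(
            f"expected {EXPECTED_DIMENSIONS} dimensions, got {len(vector)}"
        )
```

An embedding with the wrong length would not match the index configuration, so a check like `check_embedding` before writing to `vector_embedding` can catch model mismatches early.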
Set Up Database Index on document_id:
- Go to the "Indexes" tab in the secure_rag.documents collection.
- Click "Create Index" (not "Create Search Index").
- Use the following configuration:
  - Field: document_id
  - Type: 1 (asc)
  - Options:
    - Check "Create unique index".
    - Index name: document_id_index.
    - Leave "Create TTL" unchecked.
- Click "Create Index".

Explanation:

This index improves performance for lookups and updates on document_id, which is used as a unique identifier.
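
The same index can be created from code instead of the Atlas UI. The sketch below assumes `collection` is a PyMongo `Collection` for secure_rag.documents and uses its `create_index` and `update_one` methods; the `content` field name is an assumption made for illustration, and both helper names are hypothetical.

```python
from typing import Any


def ensure_document_id_index(collection: Any) -> str:
    """Create the unique ascending index on document_id and return its name."""
    return collection.create_index(
        [("document_id", 1)],  # 1 = ascending
        unique=True,
        name="document_id_index",
    )


def upsert_document(collection: Any, document_id: str,
                    content: str, metadata: dict) -> None:
    """Insert the document, or overwrite it when document_id already exists."""
    collection.update_one(
        {"document_id": document_id},
        {"$set": {"content": content, "metadata": metadata}},  # "content" is assumed
        upsert=True,
    )
```

With the unique index in place, an upsert keyed on document_id updates the existing record on re-sync rather than creating a duplicate.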

Get MongoDB Connection URI:
- In the Atlas UI, click "Connect" on your cluster.
- Choose "Connect your application".
- Copy the connection string (e.g., mongodb+srv://<username>:<password>@cluster0.mongodb.net/secure_rag?retryWrites=true&w=majority).
- Replace <username> and <password> with your Atlas credentials.
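
Credentials that contain characters such as `@`, `:`, or `/` need to be percent-encoded before they are placed in the connection string. A small sketch of that substitution follows; the host value is a placeholder, and `build_mongodb_uri` is a hypothetical helper.

```python
from urllib.parse import quote_plus


def build_mongodb_uri(username: str, password: str, host: str,
                      database: str = "secure_rag") -> str:
    """Assemble an Atlas SRV connection string with percent-encoded credentials."""
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb+srv://{user}:{pwd}@{host}/{database}?retryWrites=true&w=majority"


print(build_mongodb_uri("alice", "p@ss:w/rd", "cluster0.mongodb.net"))
# mongodb+srv://alice:p%40ss%3Aw%2Frd@cluster0.mongodb.net/secure_rag?retryWrites=true&w=majority
```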

Permit.io Setup

To enable authorization in the RAG pipeline, configure policy rules in Permit.io. This tutorial sets up a Relationship-Based Access Control (ReBAC) model that includes implicit permissions granted through role derivations.

Sign Up/Login to Permit.io:
- Go to Permit.io and sign up/login.
- Create a new project (e.g., SecureRAGDemo).
Define Resources and Roles:
- Go to the "Policy Editor" in the Permit.io UI.
- Define the following resources and roles:
  - Resource: department
    - Actions: None needed for this demo.
  - Resource: document
    - Actions: read
  - Role: viewer
    - Permissions: document:read
Create Department Instances:
- Go to "Resources" > "department".
- Add the following department instances:
  - department:engineering (Attributes: name: "Engineering Department")
  - department:marketing (Attributes: name: "Marketing Department")
  - department:finance (Attributes: name: "Finance Department")
Steps:
- Click "Create Resource Instance".
- Set the key to engineering, the tenant to default, and add the attribute name: "Engineering Department".
- Repeat for marketing and finance.
Create Users:
- Go to "Users" in the Permit.io UI.
- Add the following users:
  - User: alice (email: [email protected])
  - User: bob (email: [email protected])
Steps:
- Click "Create User".
- Set the key to alice, and the email to [email protected].
- Repeat for bob.
Assign Roles to Users:
- Go to "Role Assignments".
- Assign roles:
  - alice → viewer in department:engineering
  - bob → viewer in department:marketing
Steps:
- Click "Assign Role".
- Select user alice, role viewer, and resource department:engineering.
- Repeat for bob with department:marketing.
Define Role Derivations (Policy):
- Go to "Policy Editor".
- Add a role derivation to inherit permissions:
  - If a user has the viewer role in department:X, they inherit document:read for documents where the department is the parent of the document.
Steps:
- In the "Policy Editor", add a derivation rule:
  - Resource: document
  - Role: viewer
  - Condition: User has role viewer in department AND department is parent of document.
Get Permit API Key and PDP URL:
- Go to "Settings" > "API Keys" in Permit.io.
- Generate an API key and copy it.
- The PDP URL is typically http://permit-pdp:7000 (as defined in docker-compose.yml).

Automating Your Permit.io Setup with Scripts

If you prefer to automate the setup of resources, roles, departments, users, and relationships in Permit.io, you can use the provided scripts instead of configuring everything manually via the UI. After running these scripts, you’ll only need to define the role derivation manually in the Permit.io UI.

Install Script Dependencies:
- The scripts require the permit package. Install it using:
```
pip install permit
```
Set Environment Variables for Scripts:
- Ensure PERMIT_API_KEY and PERMIT_PDP_URL are set in your .env file:
```
PERMIT_API_KEY=<your-permit-api-key>
PERMIT_PDP_URL=http://permit-pdp:7000
```
- You can get the PERMIT_API_KEY from the Permit.io UI under "Settings" > "API Keys".
Run the Setup Scripts:
- Set Up ReBAC Model:
  - Run setup_rebac.py to define resources (department, document), roles (viewer), and relationships:
```
python setup_rebac.py
```
- Create Departments:
  - Run setup_departments.py to create department instances (department:engineering, department:marketing, department:finance):
```
python setup_departments.py
```
- Create Users and Assign Roles:
  - Run setup_users.py to create users (alice, bob) and assign roles (e.g., alice as viewer in department:engineering):
```
python setup_users.py
```
Manually Define Role Derivation in Permit.io UI:
- The scripts set up resources, roles, and relationships, but you need to define the role derivation manually.
- Go to the "Policy Editor" in the Permit.io UI.
- Add a role derivation:
  - Resource: document
  - Role: viewer
  - Condition: User has role viewer in department AND department is parent of document.
- Explanation:
  - This derivation ensures that a user with the viewer role in a department (e.g., department:engineering) can read documents where that department is the parent.
Verify Setup:
- In the Permit.io UI, check:
  - "Resources" > "department" for department instances.
  - "Resources" > "document" for document instances (after syncing documents).
  - "Users" for alice and bob with their role assignments.
  - "Policy Editor" for role derivation.
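
As a reading aid, the facts that the setup creates can be written down as plain data. The sketch below does not call Permit.io at all. The dictionary field names (`user`, `role`, `resource_instance`, `tenant`, `subject`, `relation`, `object`) are assumptions modeled on Permit's role-assignment and relationship-tuple payloads, so check them against the provided scripts before relying on them.

```python
DEPARTMENTS = ("engineering", "marketing", "finance")
VIEWERS = {"alice": "engineering", "bob": "marketing"}


def role_assignment(user: str, department: str, tenant: str = "default") -> dict:
    """Describe 'user is viewer in department:<name>' as a payload-like dict."""
    return {
        "user": user,
        "role": "viewer",
        "resource_instance": f"department:{department}",
        "tenant": tenant,
    }


def parent_relationship(department: str, document_key: str,
                        tenant: str = "default") -> dict:
    """Describe 'department:<name> is parent of document:<key>'."""
    return {
        "subject": f"department:{department}",
        "relation": "parent",
        "object": f"document:{document_key}",
        "tenant": tenant,
    }


assignments = [role_assignment(user, dept) for user, dept in VIEWERS.items()]
```

Comparing a listing like `assignments` with what appears under "Users" in the Permit.io UI is one way to run the verification step above.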

Environment Variables

Create a .env File:

In the project root, create a .env file.
Add the following variables:

MONGODB_URI=<your-mongodb-atlas-uri>
OPENAI_API_KEY=<your-openai-api-key>
PERMIT_API_KEY=<your-permit-api-key>
PERMIT_PDP_URL=http://permit-pdp:7000
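
A quick pre-flight check can catch a missing or unedited value before the containers start. This is a minimal sketch, not part of the repository; it treats values still wrapped in angle brackets as unfilled placeholders.

```python
import os

REQUIRED_VARS = ("MONGODB_URI", "OPENAI_API_KEY", "PERMIT_API_KEY", "PERMIT_PDP_URL")


def missing_settings(env=None) -> list[str]:
    """Return required variable names that are unset, empty, or still placeholders."""
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(name)
    return missing
```

Calling `missing_settings()` with no arguments inspects the current process environment; passing a dictionary makes it easy to test values parsed from a .env file.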

Running the Project

Clone the Repository:

git clone https://github.com/<your-username>/secure-rag-demo.git
cd secure-rag-demo

Install Dependencies:
- For Dockerized Services (LangChain App and File-Watcher):
  - Dependencies are automatically installed when you build the Docker containers using docker-compose up --build. The Dockerfile and Dockerfile.watcher handle installing requirements.txt and requirements.watcher.txt, respectively.
- For Manual Scripts (Embedding Generation and Permit.io Syncing):
  - To run any of the scripts in the scripts folder, install their dependencies manually:
```
pip install -r requirements.embeddings.txt
pip install -r requirements.txt
```
Spin Up Services with Docker Compose:
- Run the following command to start all services:
```
docker-compose up --build
```
- This will start:
  - permit-pdp: Permit.io Policy Decision Point.
  - file-watcher: Monitors the docs directory and syncs files to MongoDB.
  - langchain-app: The Secure RAG API, accessible at http://localhost:8000.
Sync Documents to Permit.io:
- The file-watcher will sync documents in the docs directory to MongoDB.
- Manually sync documents to Permit.io using the sync_documents.py script:
```
python sync_documents.py
```
- Note: Update the file_path in sync_documents.py to the document you want to sync (e.g., ./docs/engineering/api_design.md).
Generate Embeddings for Documents:
- Run the generate_embeddings.py script to generate embeddings for documents in MongoDB:
```
python generate_embeddings.py --all
```
- This generates embeddings for all documents and stores them in the vector_embedding field.
Test the Secure RAG API:
- Query the API with a user ID to test Secure RAG:
```
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{"query": "What is API design?", "user_id": "alice"}'
```
- Expected Behavior:
  - alice (in engineering) can access documents in the engineering department.
  - bob (in marketing) cannot access engineering documents.

ReBAC Policy Demo in Permit.io

Here's a simple ReBAC policy configured in Permit.io:

Resources:

department: Represents departments (e.g., department:engineering).
document: Represents documents (e.g., document:api_design_3d08a90b).

Roles:

viewer: Has document:read permission.

Relationships:

department:engineering is the parent of document:api_design_3d08a90b.

Role Derivation:

A user with the viewer role in department:engineering inherits document:read for all documents where department:engineering is the parent.

Example

Document: document:api_design_3d08a90b (attributes: department:"engineering")
User: alice (role: viewer in department:engineering)

Policy Result:

alice can read document:api_design_3d08a90b because she has the viewer role in department:engineering, and department:engineering is the parent of the document.
bob (role: viewer in department:marketing) cannot read this document.
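
The policy result can be reproduced with a toy in-memory model. This sketch only mirrors the logic of the derivation described above (viewer on the parent department implies read on the document); it is not how Permit.io evaluates policies internally, and the class name is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RebacModel:
    # (user, role, resource) triples, e.g. ("alice", "viewer", "department:engineering")
    role_assignments: set = field(default_factory=set)
    # child resource -> parent resource
    parents: dict = field(default_factory=dict)

    def can_read(self, user: str, document: str) -> bool:
        """True if the user is a viewer of the document or of its parent department."""
        if (user, "viewer", document) in self.role_assignments:
            return True  # direct assignment on the document itself
        parent = self.parents.get(document)
        # Derived permission: viewer on the parent department grants document:read
        return parent is not None and (user, "viewer", parent) in self.role_assignments


model = RebacModel(
    role_assignments={
        ("alice", "viewer", "department:engineering"),
        ("bob", "viewer", "department:marketing"),
    },
    parents={"document:api_design_3d08a90b": "department:engineering"},
)

print(model.can_read("alice", "document:api_design_3d08a90b"))  # True
print(model.can_read("bob", "document:api_design_3d08a90b"))    # False
```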

How Permissions Enable Secure RAG

User Identity:

The API receives a user_id (e.g., alice) in the query request.

Permission Check:

The LangChain app queries Permit.io to determine which documents alice can read.
Permit.io evaluates the ReBAC policy and returns a list of accessible document IDs.

Filtered Retrieval:

LangChain queries MongoDB Atlas, filtering by the permitted document IDs and the user's query (e.g., "What is API design?").
MongoDB performs a vector search on the vector_embedding field to retrieve relevant documents.
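
The filtered retrieval step could be expressed as an aggregation pipeline along the following lines. This is a sketch under several assumptions: the index is named vector_index as in the setup instructions, document text lives in a field called `content` (hypothetical), and the permitted IDs are in the same form as the stored document_id values. The app itself may build its query differently through LangChain.

```python
def build_vector_search_pipeline(query_vector, permitted_ids,
                                 limit: int = 5, num_candidates: int = 100) -> list[dict]:
    """Build a $vectorSearch pipeline restricted to the permitted document IDs."""
    permitted = list(permitted_ids)
    if not permitted:
        # No accessible documents: the caller should skip the search entirely
        # instead of running an unfiltered query.
        raise ValueError("permitted_ids is empty; nothing may be retrieved")
    return [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "path": "vector_embedding",
                "queryVector": list(query_vector),
                "numCandidates": max(num_candidates, limit),
                "limit": limit,
                # Pre-filter on the field indexed with type "filter"
                "filter": {"document_id": {"$in": permitted}},
            }
        },
        {
            "$project": {
                "_id": 0,
                "document_id": 1,
                "content": 1,  # assumed field name
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

The resulting list can be passed to PyMongo's `collection.aggregate(...)`. Because the filter is applied inside the `$vectorSearch` stage, documents outside the permitted set are never candidates for the similarity ranking.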

Answer Generation:

The retrieved documents are passed to OpenAI to generate a context-aware answer.
Only content from permitted documents is used in the response.

This ensures that sensitive data (e.g., engineering documents) is only accessible to authorized users (e.g., alice), making the RAG system secure.
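
As a final illustration, the answer-generation step can re-check the retrieved documents against the permitted IDs before any text reaches the model. The sketch below is an assumption about how such a prompt might be assembled, with a hypothetical `content` field and prompt wording; it does not call OpenAI.

```python
def build_prompt(question: str, retrieved: list[dict], permitted_ids: set[str]) -> str:
    """Assemble an LLM prompt from retrieved documents, dropping any that are not permitted."""
    allowed = [doc for doc in retrieved if doc.get("document_id") in permitted_ids]
    if allowed:
        context = "\n\n".join(
            f"[{doc['document_id']}]\n{doc['content']}" for doc in allowed
        )
    else:
        context = "(no accessible documents matched this question)"
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Filtering again at this stage is redundant when the vector search is already restricted, but it keeps the guarantee intact if the retrieval query is ever changed.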

Contributing

Contributions are welcome! To contribute:

Fork the repository.
Create a new branch (git checkout -b feature/your-feature).
Make your changes and commit (git commit -m "Add your feature").
Push to your branch (git push origin feature/your-feature).
Open a pull request.
