This repository represents a basic, more-or-less functional prototype-grade implementation of retrieval augmented generation (RAG) on Amazon Bedrock.
The project uses TypeScript and, I hope, a structure clear enough to make each step easy to follow, without LangChain or other magic (apart from a small text-splitting library!).
While building this, I found it harder than expected to find clear, first-principles examples and guidance, especially for anyone not primarily interested in Python as the language. Thanks mainly to Janakiram MSV's videos on RAG and a set of examples shared by David Boyne on LinkedIn (unable to find the link, though...) I've been able to get something working that should demonstrate how to think about constructing these capabilities.
I hope this repo will make this technique easier for you to understand and implement, even if you use a different tech stack.
- Recent Node.js (ideally 20+) installed.
- Amazon Web Services (AWS) account with sufficient permissions so that you can deploy infrastructure. A naive but simple policy would be full rights for CloudWatch, Lambda, API Gateway, and S3.
- Ideally some experience with Serverless Framework as that's what we will use to deploy the service and infrastructure.
In the AWS console:
- Go to the Bedrock page and enable any models you want to use. We will use Jurassic-2 Ultra and Amazon Titan Text Embeddings.
- Create an SQS queue named `document-chat-demo-embeddings` with standard settings.
- Create an OpenSearch Serverless collection using the "Vector search" and "Easy create" options. This will take a few minutes to start up.
- Under `Collections`, select your collection, go into the `Indexes` tab and create a vector index.
  - Select the "JSON" option and paste the contents from `opensearch-index.json` into the text field.
  - Give the index the name `documents`.
- In the `Collections` view, make note of the OpenSearch URL; you will update the infrastructure configuration in the next step.
In `serverless.yml`:
- Update the values at `custom.awsAccountNumber`, `custom.documentsBucketName` (your choice of random name), and `custom.openSearchUrl` to your values.
In your IDE/CLI:
- Deploy the stack with `npm run deploy`.
In the AWS console:
- In the console for OpenSearch, under `Serverless > Security > Data access policies`, open the pre-baked policy and add the Lambda functions' roles (`Ask` and `GenerateEmbeddings`) to the selected principals.
There is also a file, `src/config/config.ts`, that you may wish to modify if you want a different region or similar.
Clone, fork, or download the repo as you normally would. Run `npm install`.
- `npm start`: Run application locally
- `npm run build`: Package application with Serverless Framework
- `npm run deploy`: Deploy application to AWS with Serverless Framework
- `npm run teardown`: Remove stack from AWS
You will need documents for this to use your "own data".
In the current implementation, the infrastructure allows S3 events to be emitted when either PDF or TXT files are added to a `documents` folder in your bucket (create this if you haven't already).
However, the actual chunking function currently only does anything with `TXT` files. Feel free to extend this with PDF parsing and whatever else you might need. It's not too complicated, and this repo is about showing the principles in a minimal working way, so I've not felt any need to over-invest here and now.
To start the process of creating embeddings from document data, simply upload one of the provided documents (or any other such document) to your bucket's `documents` folder. There is a TXT and a PDF file, with essentially the same content, located in the `data` directory.
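To make the ingestion step a little more concrete, here is a minimal sketch of the two decisions described above: which uploaded object keys get processed, and how text is cut into chunks before embedding. The function names (`isSupportedDocument`, `chunkText`) and the default sizes are hypothetical and not taken from the repo, and the naive fixed-size splitter only stands in for the text-splitting library the project actually uses.

```typescript
// Hypothetical helpers for illustration; the repo's own functions may differ.

/** True for object keys the chunker handles today: TXT files in `documents/`. */
function isSupportedDocument(key: string): boolean {
  return key.startsWith("documents/") && key.toLowerCase().endsWith(".txt");
}

/**
 * Naive fixed-size splitter with overlap between consecutive chunks.
 * A stand-in for a proper text-splitting library.
 */
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("chunkSize must be positive and overlap smaller than chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current chunk reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval. Extending `isSupportedDocument` to accept `.pdf` would only make sense once PDF parsing is added.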
The endpoint takes a GET request with a URL-encoded string. If you don't know how to do this by heart, then there are simple online tools that can help you.
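If you would rather build the URL in code than use an online tool, the standard `encodeURIComponent` function does the encoding. The sketch below is only an illustration: `buildAskUrl` is a hypothetical helper, and the base URL is the same placeholder used in this README, not a real endpoint.

```typescript
/** Builds the GET URL for the endpoint, URL-encoding the question. */
function buildAskUrl(baseUrl: string, question: string): string {
  // Drop trailing slashes so the result does not contain a double slash.
  const trimmed = baseUrl.replace(/\/+$/, "");
  return `${trimmed}/?ask=${encodeURIComponent(question)}`;
}

const askUrl = buildAskUrl(
  "https://RANDOM_ID.execute-api.REGION.amazonaws.com",
  "What does Mikael say about dumb, predictable code?"
);
console.log(askUrl);
```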
For the question "What does Mikael say about dumb, predictable code?", the call would be:
curl https://RANDOM_ID.execute-api.REGION.amazonaws.com/\?ask\=What%20does%20Mikael%20say%20about%20dumb%2C%20predictable%20code%3F
The endpoint responds with the LLM's answer in a few seconds.
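Conceptually, answering a question in RAG means ranking stored chunks by how similar their embeddings are to the question's embedding, then placing the best matches in the prompt. The toy sketch below shows that idea in memory with cosine similarity. It is not the repo's code: in this project the similarity search is delegated to OpenSearch, and the names (`DocumentChunk`, `topK`, `buildPrompt`) and the prompt wording are made up for illustration.

```typescript
interface DocumentChunk {
  text: string;
  embedding: number[];
}

/** Cosine similarity between two equal-length vectors. */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Returns the k chunks most similar to the query embedding, best first. */
function topK(queryEmbedding: number[], chunks: DocumentChunk[], k: number): DocumentChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

/** Places the retrieved chunks in front of the question (illustrative wording). */
function buildPrompt(question: string, context: DocumentChunk[]): string {
  const contextText = context.map((c) => c.text).join("\n---\n");
  return `Answer the question using only the context below.\n\nContext:\n${contextText}\n\nQuestion: ${question}\nAnswer:`;
}
```

The resulting prompt string is what would be sent to the text model; the embeddings for both the chunks and the question would come from the embeddings model.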
- Titan Text Large: `amazon.titan-tg1-large`
- Titan Text Embeddings: `amazon.titan-e1t-medium`
- Titan Text Embeddings v2: `amazon.titan-embed-g1-text-02`
- Titan Text G1 - Express: `amazon.titan-text-express-v1`
- Titan Embeddings G1 - Text: `amazon.titan-embed-text-v1`
- Stable Diffusion XL: `stability.stable-diffusion-xl`
- Stable Diffusion XL: `stability.stable-diffusion-xl-v0`
- J2 Grande Instruct: `ai21.j2-grande-instruct`
- J2 Jumbo Instruct: `ai21.j2-jumbo-instruct`
- Jurassic-2 Mid: `ai21.j2-mid`
- Jurassic-2 Mid: `ai21.j2-mid-1`
- Jurassic-2 Ultra: `ai21.j2-ultra`
- Jurassic-2 Ultra: `ai21.j2-ultra-v1`
- Claude Instant: `anthropic.claude-instant-v1`
- Claude: `anthropic.claude-v1`
- Claude: `anthropic.claude-v2`
- Cohere Command: `cohere.command-text-v14`
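Each model ID above goes together with a provider-specific JSON request body. As a rough sketch, the two functions below build inputs in the shape that `InvokeModelCommand` from `@aws-sdk/client-bedrock-runtime` accepts (`modelId`, `contentType`, `accept`, `body`), one for Titan embeddings and one for Jurassic-2 Ultra. The body fields (`inputText`, `prompt`, `maxTokens`, `temperature`) reflect my understanding of the provider examples, and the function names and default values are hypothetical, so verify the payloads against the API request examples in the Bedrock console before relying on them.

```typescript
// Mirrors the fields InvokeModelCommand expects; declared locally so the
// sketch stays self-contained without the AWS SDK installed.
interface InvokeModelInput {
  modelId: string;
  contentType: string;
  accept: string;
  body: string;
}

/** Request for an embedding of `text` (assumed Titan body shape: { inputText }). */
function titanEmbeddingRequest(text: string): InvokeModelInput {
  return {
    modelId: "amazon.titan-embed-text-v1",
    contentType: "application/json",
    accept: "application/json",
    body: JSON.stringify({ inputText: text }),
  };
}

/** Request for a completion (assumed Jurassic-2 body shape: prompt, maxTokens, temperature). */
function jurassicCompletionRequest(prompt: string, maxTokens = 300, temperature = 0.5): InvokeModelInput {
  return {
    modelId: "ai21.j2-ultra-v1",
    contentType: "application/json",
    accept: "application/json",
    body: JSON.stringify({ prompt, maxTokens, temperature }),
  };
}
```

With the SDK installed, such an object would be passed along the lines of `client.send(new InvokeModelCommand(titanEmbeddingRequest("some text")))`, and the response body decoded from bytes to JSON.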
- Don't forget to add the Lambda IAM profile to OpenSearch's allowed users for the data access policy.
- Don't use the name `vector` for fields in OpenSearch - it won't work :)
- Check in the AWS web console/GUI, on the Bedrock Providers tab, for API request examples.
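For reference, an approximate k-NN search body in OpenSearch nests the query vector under the name of the vector field. The sketch below builds such a body; the field name `embedding` is purely an assumption (use whatever `opensearch-index.json` defines), `buildKnnQuery` is a hypothetical helper, and the guard simply encodes the tip above about avoiding `vector` as a field name.

```typescript
interface KnnSearchBody {
  size: number;
  query: {
    knn: Record<string, { vector: number[]; k: number }>;
  };
}

/** Builds an approximate k-NN search body for the given vector field. */
function buildKnnQuery(field: string, queryVector: number[], k = 3): KnnSearchBody {
  // Per the tip above, a field literally named "vector" is rejected.
  if (field === "vector") {
    throw new Error('Pick a field name other than "vector"');
  }
  return {
    size: k,
    query: {
      knn: {
        [field]: { vector: queryVector, k },
      },
    },
  };
}

// Example with a made-up two-dimensional vector; real embeddings are much longer.
const knnBody = buildKnnQuery("embedding", [0.1, 0.2], 2);
console.log(JSON.stringify(knnBody));
```

A body like this would be sent to the `documents` index through the OpenSearch JavaScript client's search call.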
- AWS: Supported OpenSearch API operations and permissions
- AWS: Identity and Access Management for Amazon OpenSearch Serverless
- AWS: Ingesting data into Amazon OpenSearch Serverless collections
- OpenSearch: JavaScript client
- OpenSearch: k-NN index
- OpenSearch: Approximate k-NN search
- OpenSearch: Semantic search
- Janakiram MSV: Implementing RAG with Amazon Bedrock and Amazon Titan - Part 2