Commit 8d55b2f

adding semantic vector caching section at the end of the video qa tutorial

1 parent 30b49f0 commit 8d55b2f

1 file changed: +180 −105 lines changed

docs/howtos/solutions/vector/video-qa/index-video-qa.mdx
@@ -45,7 +45,7 @@ Our application leverages these technologies to create a unique Q&A platform bas

Here's how our application uses AI and semantic vector search to answer user questions based on video content:

-1. **Uploading videos**: Users can upload YouTube videos either via links (e.g. `https://www.youtube.com/watch?v=LaiQFZ5bXaM`) or video IDs (e.g. `LaiQFZ5bXaM`). The application processes these inputs to retrieve necessary video information. For the purposes of this tutorial, the app is pre-seeded with a collection of videos from the [Redis YouTube channel](https://www.youtube.com/@Redisinc).
+1. **Uploading videos**: Users can upload YouTube videos either via links (e.g. `https://www.youtube.com/watch?v=LaiQFZ5bXaM`) or video IDs (e.g. `LaiQFZ5bXaM`). The application processes these inputs to retrieve necessary video information. For the purposes of this tutorial, the app is pre-seeded with a collection of videos from the [Redis YouTube channel](https://www.youtube.com/@Redisinc). However, when you run the application, you can adjust it to cover your own set of videos.

![Upload videos screenshot](./images/upload-videos.png)

@@ -531,96 +531,13 @@ One of the key features of our application is the ability to search through vide

When a user submits a question through the frontend, the backend performs the following steps to obtain the answer to the question as well as supporting videos:

-1. An `answerVectorStore` is used to check if a similar question has already been answered. If so, we can skip the LLM and return the answer. This is called `semantic vector caching`.
-   - This step is optional, the user can choose to generate a unique answer every time.
-1. Assuming we need to generate a unique answer, we generate a semantically similar question to the one being asked. This helps to find the most relevant videos.
+1. We generate a semantically similar question to the one being asked. This helps to find the most relevant videos.
1. We then use the `vectorStore` to search for the most relevant videos based on the semantic question.
1. If we don't find any relevant videos, we search with the original question.
1. Once we find videos, we call the LLM to answer the question.
-1. We cache the answer and videos in Redis by generating a vector embedding for the original question and storing it along with the answer and videos.
1. Finally, we return the answer and supporting videos to the user.

-The `answerVectorStore` looks nearly identical to the `vectorStore` we defined earlier, but it uses a different [algorithm and distance metric](https://redis.io/docs/interact/search-and-query/advanced-concepts/vectors/).
-
-```typescript title="services/video-search/src/api/store.ts" {6-7}
-const answerVectorStore = new RedisVectorStore(embeddings, {
-  redisClient: client,
-  indexName: `${prefix}-${config.redis.ANSWER_INDEX_NAME}`,
-  keyPrefix: `${prefix}-${config.redis.ANSWER_PREFIX}`,
-  indexOptions: {
-    ALGORITHM: VectorAlgorithms.FLAT,
-    DISTANCE_METRIC: 'L2',
-  },
-});
-```
-
-The following code demonstrates how to use the `answerVectorStore` to check if a similar question has already been answered.
-
-```typescript title="services/video-search/src/api/search.ts" {16-19}
-async function checkAnswerCache(question: string) {
-  const haveAnswers = await answerVectorStore.checkIndexExists();
-
-  if (!(haveAnswers && config.searches.answerCache)) {
-    return;
-  }
-
-  log.debug(`Searching for closest answer to question: ${question}`, {
-    location: `${prefix}.search.getAnswer`,
-    question,
-  });
-
-  /**
-   * Scores will be between 0 and 1, where 0 is most accurate and 1 is least accurate
-   */
-  let results = (await answerVectorStore.similaritySearchWithScore(
-    question,
-    config.searches.KNN,
-  )) as Array<[AnswerDocument, number]>;
-
-  if (Array.isArray(results) && results.length > 0) {
-    // Filter out results with too high similarity score
-    results = results.filter(
-      (result) => result[1] <= config.searches.maxSimilarityScore,
-    );
-
-    const inaccurateResults = results.filter(
-      (result) => result[1] > config.searches.maxSimilarityScore,
-    );
-
-    if (Array.isArray(inaccurateResults) && inaccurateResults.length > 0) {
-      log.debug(
-        `Rejected ${inaccurateResults.length} similar answers that have a score > ${config.searches.maxSimilarityScore}`,
-        {
-          location: `${prefix}.search.getAnswer`,
-          scores: inaccurateResults.map((result) => result[1]),
-        },
-      );
-    }
-  }
-
-  if (Array.isArray(results) && results.length > 0) {
-    log.debug(
-      `Accepted ${results.length} similar answers that have a score <= ${config.searches.maxSimilarityScore}`,
-      {
-        location: `${prefix}.search.getAnswer`,
-        scores: results.map((result) => result[1]),
-      },
-    );
-
-    return results.map((result) => {
-      return {
-        ...result[0].metadata,
-        question: result[0].pageContent,
-        isOriginal: false,
-      };
-    });
-  }
-}
-```
-
-The `similaritySearchWithScore` will find similar questions to the one being asked. It ranks them from `0` to `1`, where `0` is most similar or "closest". We then filter out any results that are too similar, as defined by the `maxSimilarityScore` environment variable. If we find any results, we return them to the user. Using a max score is crucial here, because we don't want to return inaccurate results.
-
-If we don't find answers in the `semantic vector cache` then we need to generate a unique answer. This is done by first generating a semantically similar question to the one being asked. This is done using the `QUESTION_PROMPT` defined below:
+To answer a question, we first generate a semantically similar question to the one being asked. This is done using the `QUESTION_PROMPT` defined below:

```typescript title="services/video-search/src/api/templates/questions.ts"
import { PromptTemplate } from 'langchain/prompts';
@@ -643,7 +560,7 @@ export const QUESTION_PROMPT = PromptTemplate.fromTemplate(questionTemplate);

Using this prompt, we generate the `semantic question` and use it to search for videos. We may also need to search using the original `question` if we don't find any videos with the `semantic question`. This is done using the `ORIGINAL_QUESTION_PROMPT` defined below:

-```typescript title="services/video-search/src/api/search.ts" {12-14,33,38,48,55,58,61-67}
+```typescript title="services/video-search/src/api/search.ts" {12-14,22,27,37,44,46-52}
async function getVideos(question: string) {
  log.debug(
    `Performing similarity search for videos that answer: ${question}`,
@@ -660,22 +577,11 @@ async function getVideos(question: string) {
  >);
}

-async function searchVideos(
-  question: string,
-  { useCache = config.searches.answerCache }: VideoSearchOptions = {},
-) {
+async function searchVideos(question: string) {
  log.debug(`Original question: ${question}`, {
    location: `${prefix}.search.search`,
  });

-  if (useCache) {
-    const existingAnswer = await checkAnswerCache(question);
-
-    if (typeof existingAnswer !== 'undefined') {
-      return existingAnswer;
-    }
-  }
-
  const semanticQuestion = await prompt.getSemanticQuestion(question);

  log.debug(`Semantic question: ${semanticQuestion}`, {
@@ -700,10 +606,6 @@ async function searchVideos(

  const answerDocument = await prompt.answerQuestion(question, videos);

-  if (config.searches.answerCache) {
-    await answerVectorStore.addDocuments([answerDocument]);
-  }
-
  return [
    {
      ...answerDocument.metadata,
@@ -714,7 +616,7 @@
}
```

-The code above shows the whole process, from checking the `semantic vector cache` for existing answers, all the way to getting answers from the LLM and caching them in Redis for future potential questions. Once relevant videos are identified, the backend uses either Google Gemini or OpenAI's ChatGPT to generate answers. These answers are formulated based on the video transcripts stored in Redis, ensuring they are contextually relevant to the user's query. The `ANSWER_PROMPT` used to ask the LLM for answers is as follows:
+The code above shows the whole process for getting answers from the LLM and returning them to the user. Once relevant videos are identified, the backend uses either Google Gemini or OpenAI's ChatGPT to generate answers. These answers are formulated based on the video transcripts stored in Redis, ensuring they are contextually relevant to the user's query. The `ANSWER_PROMPT` used to ask the LLM for answers is as follows:

```typescript title="services/video-search/src/api/templates/answers.ts"
import { PromptTemplate } from 'langchain/prompts';
@@ -743,7 +645,178 @@ ANSWER:
export const ANSWER_PROMPT = PromptTemplate.fromTemplate(answerTemplate);
```

-That's it! The backend will now return the answer and supporting videos to the user. Not included in this tutorial is an overview of the frontend `Next.js` app. However, you can find the code in the [GitHub repository](https://github.com/wjohnsto/genai-qa-videos) in the `app` directory.
+That's it! The backend will now return the answer and supporting videos to the user.
+
+## Going further with semantic answer caching
+
+The application we've built in this tutorial is a great starting point for exploring the possibilities of AI-powered video Q&A. However, there are many ways to improve it and make it more efficient. One such improvement is to use Redis as a semantic vector cache.
+
+Note that in the previous section we call the LLM to answer every question. This step is a performance bottleneck: LLM response times vary, and can take several seconds. What if there was a way to avoid unnecessary calls to the LLM? This is where `semantic vector caching` comes in.
+
+### What is semantic vector caching?
+
+Semantic vector caching means taking the results of a call to an LLM and caching them alongside the vector embedding for the prompt. In the case of our application, we can generate vector embeddings for the questions and store them in Redis together with the answer from the LLM. This allows us to avoid calling the LLM for similar questions that have already been answered.
+
+You might ask why we store the question as a vector rather than as a string. The answer is that storing the question as a vector allows us to perform semantic vector similarity searches. Rather than relying on someone asking the exact same question, we can define an acceptable similarity score and return answers for similar questions.
+
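+To make the difference concrete, here is a minimal, hypothetical sketch (not part of the tutorial's codebase) showing why embeddings beat exact string matching: two differently worded questions produce nearby vectors, so a similarity threshold can treat them as the same question. It assumes LangChain's `OpenAIEmbeddings` and an `OPENAI_API_KEY` in the environment.
+
+```typescript
+import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
+
+// Cosine similarity between two embedding vectors (1 = same direction).
+function cosineSimilarity(a: number[], b: number[]): number {
+  let dot = 0;
+  let normA = 0;
+  let normB = 0;
+
+  for (let i = 0; i < a.length; i += 1) {
+    dot += a[i] * b[i];
+    normA += a[i] * a[i];
+    normB += b[i] * b[i];
+  }
+
+  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
+}
+
+async function demo() {
+  const embeddings = new OpenAIEmbeddings();
+
+  // Two different strings with one meaning: exact string matching fails here,
+  // but the embeddings land close together in vector space.
+  const [a, b] = await Promise.all([
+    embeddings.embedQuery('How do I scale Redis?'),
+    embeddings.embedQuery('What is the best way to make Redis handle more load?'),
+  ]);
+
+  console.log(cosineSimilarity(a, b)); // close to 1 for similar questions
+}
+
+void demo();
+```
+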
+### How to implement semantic vector caching in Redis
+
+If you're already familiar with storing vectors in Redis, which we have covered in this tutorial, semantic vector caching is an extension of that and operates in essentially the same way. The only difference is that we store the question as a vector, rather than the video summary. We also use the [cache-aside](https://www.youtube.com/watch?v=AJhTduDOVCs) pattern. The process is as follows:
+
+1. When a user asks a question, we perform a vector similarity search for existing answers to it.
+1. If we find an answer, we return it to the user, avoiding a call to the LLM.
+1. If we don't find an answer, we call the LLM to generate one.
+1. We then store the question as a vector in Redis, along with the answer from the LLM.
+
+In order to store the question vectors, we need to create a new vector store, which creates an index specifically for the question-and-answer vectors. The code looks like this:
+
+```typescript title="services/video-search/src/api/store.ts" {6-7}
+const answerVectorStore = new RedisVectorStore(embeddings, {
+  redisClient: client,
+  indexName: `${prefix}-${config.redis.ANSWER_INDEX_NAME}`,
+  keyPrefix: `${prefix}-${config.redis.ANSWER_PREFIX}`,
+  indexOptions: {
+    ALGORITHM: VectorAlgorithms.FLAT,
+    DISTANCE_METRIC: 'L2',
+  },
+});
+```
+
+The `answerVectorStore` looks nearly identical to the `vectorStore` we defined earlier, but it uses a different [algorithm and distance metric](https://redis.io/docs/interact/search-and-query/advanced-concepts/vectors/). This algorithm is better suited to similarity searches over our cached questions.
+
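+For reference, `RedisVectorStore` builds the underlying RediSearch index for us. Below is a rough, hypothetical sketch of the equivalent index creation with plain `node-redis`; the index name, key prefix, field names, and the `DIM` of 1536 (the size of OpenAI's `text-embedding-ada-002` embeddings) are assumptions for illustration, not values taken from the tutorial's code.
+
+```typescript
+import { createClient, SchemaFieldTypes, VectorAlgorithms } from 'redis';
+
+const client = createClient({ url: 'redis://localhost:6379' });
+await client.connect();
+
+// A FLAT (brute-force) vector field compared with L2 (Euclidean) distance,
+// mirroring the indexOptions passed to the answerVectorStore above.
+await client.ft.create(
+  'video-qa-answers', // hypothetical index name
+  {
+    content: { type: SchemaFieldTypes.TEXT },
+    content_vector: {
+      type: SchemaFieldTypes.VECTOR,
+      ALGORITHM: VectorAlgorithms.FLAT,
+      TYPE: 'FLOAT32',
+      DIM: 1536, // assumed embedding dimension
+      DISTANCE_METRIC: 'L2',
+    },
+  },
+  {
+    ON: 'HASH',
+    PREFIX: 'video-qa-answers:', // hypothetical key prefix
+  },
+);
+```
+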
+The following code demonstrates how to use the `answerVectorStore` to check if a similar question has already been answered.
+
+```typescript title="services/video-search/src/api/search.ts" {16-19}
+async function checkAnswerCache(question: string) {
+  const haveAnswers = await answerVectorStore.checkIndexExists();
+
+  if (!(haveAnswers && config.searches.answerCache)) {
+    return;
+  }
+
+  log.debug(`Searching for closest answer to question: ${question}`, {
+    location: `${prefix}.search.getAnswer`,
+    question,
+  });
+
+  /**
+   * Scores will be between 0 and 1, where 0 is most accurate and 1 is least accurate
+   */
+  let results = (await answerVectorStore.similaritySearchWithScore(
+    question,
+    config.searches.KNN,
+  )) as Array<[AnswerDocument, number]>;
+
+  if (Array.isArray(results) && results.length > 0) {
+    // Capture the results that score above the cutoff, then keep only the
+    // results that are similar enough (i.e. have a low enough distance score)
+    const inaccurateResults = results.filter(
+      (result) => result[1] > config.searches.maxSimilarityScore,
+    );
+
+    results = results.filter(
+      (result) => result[1] <= config.searches.maxSimilarityScore,
+    );
+
+    if (Array.isArray(inaccurateResults) && inaccurateResults.length > 0) {
+      log.debug(
+        `Rejected ${inaccurateResults.length} similar answers that have a score > ${config.searches.maxSimilarityScore}`,
+        {
+          location: `${prefix}.search.getAnswer`,
+          scores: inaccurateResults.map((result) => result[1]),
+        },
+      );
+    }
+  }
+
+  if (Array.isArray(results) && results.length > 0) {
+    log.debug(
+      `Accepted ${results.length} similar answers that have a score <= ${config.searches.maxSimilarityScore}`,
+      {
+        location: `${prefix}.search.getAnswer`,
+        scores: results.map((result) => result[1]),
+      },
+    );
+
+    return results.map((result) => {
+      return {
+        ...result[0].metadata,
+        question: result[0].pageContent,
+        isOriginal: false,
+      };
+    });
+  }
+}
+```
+
+The `similaritySearchWithScore` call finds questions similar to the one being asked. It ranks them from `0` to `1`, where `0` is most similar, or "closest". We then filter out any results whose score is above the `maxSimilarityScore` environment variable, i.e. results that are not similar enough to the question being asked. If any results remain, we return them to the user. Using a max score is crucial here, because we don't want to return inaccurate results.
+
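+The exact values are tunable per deployment. As an illustration only (these numbers are assumptions, not the tutorial's defaults), the search-related configuration referenced above might look like this:
+
+```typescript
+// Hypothetical values for the config.searches object used by checkAnswerCache.
+const searches = {
+  // Toggles the semantic answer cache on or off.
+  answerCache: true,
+  // How many nearest neighbors to pull back from the answer index.
+  KNN: 3,
+  // Maximum acceptable distance score (0 = identical question).
+  // Anything above this is treated as not similar enough and rejected.
+  maxSimilarityScore: 0.25,
+};
+```
+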
+To complete the process, we need to apply the `cache aside` pattern and store the question as a vector in Redis. This is done as follows:
+
+```typescript title="services/video-search/src/api/search.ts" {3,9-15,23-29,50-52}
+async function searchVideos(
+  question: string,
+  { useCache = config.searches.answerCache }: VideoSearchOptions = {},
+) {
+  log.debug(`Original question: ${question}`, {
+    location: `${prefix}.search.search`,
+  });
+
+  if (useCache) {
+    const existingAnswer = await checkAnswerCache(question);
+
+    if (typeof existingAnswer !== 'undefined') {
+      return existingAnswer;
+    }
+  }
+
+  const semanticQuestion = await prompt.getSemanticQuestion(question);
+
+  log.debug(`Semantic question: ${semanticQuestion}`, {
+    location: `${prefix}.search.search`,
+  });
+
+  if (useCache) {
+    const existingAnswer = await checkAnswerCache(semanticQuestion);
+
+    if (typeof existingAnswer !== 'undefined') {
+      return existingAnswer;
+    }
+  }
+
+  let videos = await getVideos(semanticQuestion);
+
+  if (videos.length === 0) {
+    log.debug(
+      'No videos found for semantic question, trying with original question',
+      {
+        location: `${prefix}.search.search`,
+      },
+    );
+
+    videos = await getVideos(question);
+  }
+
+  log.debug(`Found ${videos.length} videos`, {
+    location: `${prefix}.search.search`,
+  });
+
+  const answerDocument = await prompt.answerQuestion(question, videos);
+
+  if (config.searches.answerCache) {
+    await answerVectorStore.addDocuments([answerDocument]);
+  }
+
+  return [
+    {
+      ...answerDocument.metadata,
+      question: answerDocument.pageContent,
+      isOriginal: true,
+    },
+  ];
+}
+```
+
+When a question is asked, we first check the answer cache, using both the original question and the generated semantic question. If we find an answer, we return it to the user. If we don't, we call the LLM to generate one, then store the question as a vector in Redis along with the LLM's answer. It may look like we're doing more work here than we were without the cache, but keep in mind that the LLM is the bottleneck: by doing this, we avoid unnecessary calls to it.
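+
+To see the cache working end to end, a hypothetical call sequence might look like this, assuming the cache is enabled (the exact wording of the second question doesn't matter, only that its score lands within `maxSimilarityScore` of the first):
+
+```typescript
+// First ask: cache miss, so the LLM is called and the answer is cached.
+const first = await searchVideos('How do I scale Redis?');
+console.log(first[0].isOriginal); // true - freshly generated by the LLM
+
+// A semantically similar rephrasing: cache hit, no LLM call this time.
+const second = await searchVideos('What is the best way to scale Redis?');
+console.log(second[0].isOriginal); // false - served from the semantic cache
+```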

Below are a couple of screenshots from the application, showing what it looks like when you find an existing answer to a question:

@@ -755,6 +828,8 @@ Below are a couple screenshots from the application to see what it looks like wh

In this tutorial, we've explored how to build an AI-powered video Q&A application using Redis, LangChain, and various other technologies. We've covered setting up the environment, processing video uploads, and implementing search functionality. You also saw how to use Redis as a `vector store` and `semantic vector cache`.

+> NOTE: Not included in this tutorial is an overview of the frontend `Next.js` app. However, you can find the code in the [GitHub repository](https://github.com/wjohnsto/genai-qa-videos) in the `app` directory.
+
### Key takeaways

- Generative AI can be leveraged to create powerful applications without writing a ton of code.
