Here's how our application uses AI and semantic vector search to answer user questions based on video content:
1. **Uploading videos**: Users can upload YouTube videos either via links (e.g. `https://www.youtube.com/watch?v=LaiQFZ5bXaM`) or video IDs (e.g. `LaiQFZ5bXaM`). The application processes these inputs to retrieve the necessary video information. For the purposes of this tutorial, the app is pre-seeded with a collection of videos from the [Redis YouTube channel](https://www.youtube.com/@Redisinc). However, when you run the application, you can adjust it to cover your own set of videos.
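For illustration, normalizing either form of input down to a bare video ID might look like the sketch below. The helper name and parsing logic here are illustrative, not taken from the repository:

```typescript
// Sketch: turn a YouTube link or a bare video ID into a video ID.
function getVideoId(input: string): string {
  try {
    const url = new URL(input);
    // Handles https://www.youtube.com/watch?v=LaiQFZ5bXaM
    const fromQuery = url.searchParams.get('v');
    if (fromQuery) {
      return fromQuery;
    }
    // Handles short links such as https://youtu.be/LaiQFZ5bXaM
    return url.pathname.split('/').filter(Boolean).pop() ?? input;
  } catch {
    // Not a URL, so assume the input is already a video ID like "LaiQFZ5bXaM"
    return input;
  }
}
```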
One of the key features of our application is the ability to search through videos.
When a user submits a question through the frontend, the backend performs the following steps to obtain the answer to the question as well as supporting videos:
1. We generate a semantically similar question to the one being asked. This helps to find the most relevant videos.
1. We then use the `vectorStore` to search for the most relevant videos based on the semantic question.
1. If we don't find any relevant videos, we search with the original question.
1. Once we find videos, we call the LLM to answer the question.
1. Finally, we return the answer and supporting videos to the user.
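As a rough sketch, those steps could be orchestrated like this. The helper functions named here are placeholders for the code discussed in the rest of this section, not the repository's actual API:

```typescript
type VideoDocument = { id: string; title: string };

// Placeholder signatures for the pieces covered in the rest of this section.
declare function generateSemanticQuestion(question: string): Promise<string>;
declare function searchVideos(question: string): Promise<VideoDocument[]>;
declare function answerWithLlm(question: string, videos: VideoDocument[]): Promise<string>;

// Sketch of the question-answering flow described above.
async function answerQuestion(question: string) {
  // 1. Rephrase the question into a semantically similar "search" question
  const semanticQuestion = await generateSemanticQuestion(question);

  // 2. Search the vector store for the most relevant videos
  let videos = await searchVideos(semanticQuestion);

  // 3. Fall back to the original question if nothing relevant was found
  if (videos.length === 0) {
    videos = await searchVideos(question);
  }

  // 4. Ask the LLM to answer the question using the matched videos
  const answer = await answerWithLlm(question, videos);

  // 5. Return the answer together with the supporting videos
  return { answer, videos };
}
```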
To answer a question, we first generate a semantically similar question to the one being asked. This is done using the `QUESTION_PROMPT` defined below:
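A minimal sketch of the prompt's shape, using LangChain's `PromptTemplate` (the template wording and import path here are illustrative, not the repository's exact prompt):

```typescript
import { PromptTemplate } from '@langchain/core/prompts';

// Illustrative only: the real QUESTION_PROMPT text is defined in the repository.
const QUESTION_PROMPT = PromptTemplate.fromTemplate(`
Rephrase the question below so that it captures its core semantic meaning,
removing filler words. Respond with only the rephrased question.

Question: {question}

Rephrased question:`);
```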
Using this prompt, we generate the `semantic question` and use it to search for videos. We may also need to search using the original `question` if we don't find any videos with the `semantic question`. This is done using the `ORIGINAL_QUESTION_PROMPT` defined below:
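A sketch of that search-then-fallback step, assuming the `vectorStore` created earlier in the tutorial (the function name and `k` value are illustrative, and the prompt text itself is not reproduced in this sketch):

```typescript
import type { VectorStore } from '@langchain/core/vectorstores';

// vectorStore is the RedisVectorStore created earlier in the tutorial.
declare const vectorStore: VectorStore;

// Search with the semantic question first, then fall back to the
// user's original question if nothing relevant is found.
async function searchVideos(question: string, semanticQuestion: string) {
  let videos = await vectorStore.similaritySearch(semanticQuestion, 5);

  if (videos.length === 0) {
    // Nothing matched the rephrased question, so retry with the original one
    videos = await vectorStore.similaritySearch(question, 5);
  }

  return videos;
}
```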
The code above shows the whole process for getting answers from the LLM and returning them to the user. Once relevant videos are identified, the backend uses either Google Gemini or OpenAI's ChatGPT to generate answers. These answers are formulated based on the video transcripts stored in Redis, ensuring they are contextually relevant to the user's query. The `ANSWER_PROMPT` used to ask the LLM for answers is as follows:
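A sketch of the prompt's shape, with illustrative wording rather than the repository's exact text:

```typescript
import { PromptTemplate } from '@langchain/core/prompts';

// Illustrative only: the real ANSWER_PROMPT text is defined in the repository.
const ANSWER_PROMPT = PromptTemplate.fromTemplate(`
Use the following video context to answer the question at the end.
If the context does not contain the answer, say that you don't know
rather than making something up.

Video context:
{context}

Question: {question}

Answer:`);
```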
That's it! The backend will now return the answer and supporting videos to the user.
## Going further with semantic answer caching
The application we've built in this tutorial is a great starting point for exploring the possibilities of AI-powered video Q&A. However, there are many ways to improve the application and make it more efficient. One such improvement is to use Redis as a semantic vector cache.
Note that in the previous section we made a call to the LLM to answer every question. This step is a performance bottleneck: LLM response times vary and can take several seconds. What if we could prevent unnecessary calls to the LLM? This is where `semantic vector caching` comes in.
### What is semantic vector caching?
Semantic vector caching happens when you take the results of a call to an LLM and cache them alongside the vector embedding for the prompt. In the case of our application, we could generate vector embeddings for the questions and store them in Redis with the answer from the LLM. This would allow us to avoid calling the LLM for similar questions that have already been answered.
You might ask: why store the question as a vector rather than as a plain string? Storing the question as a vector allows us to perform semantic vector similarity searches. Rather than relying on someone asking the exact same question, we can define an acceptable similarity score and return answers for similar questions.
### How to implement semantic vector caching in Redis
If you're already familiar with storing vectors in Redis, which we have covered in this tutorial, semantic vector caching is an extension of that and operates in essentially the same way. The only difference is that we are storing the question as a vector, rather than the video summary. We also use the [cache aside](https://www.youtube.com/watch?v=AJhTduDOVCs) pattern. The process is as follows:
1. When a user asks a question, we perform a vector similarity search for existing answers to the question.
1. If we find an answer, we return it to the user, avoiding a call to the LLM.
1. If we don't find an answer, we call the LLM to generate an answer.
1. We then store the question as a vector in Redis, along with the answer from the LLM.
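Put together, the cache-aside flow above can be sketched like this. The helpers `checkAnswerCache`, `answerWithLlm`, and `cacheAnswer` are placeholder names for the cache-lookup, LLM, and cache-write code covered in the rest of this section:

```typescript
type CachedAnswer = { answer: string; videos: string[] };

// Placeholder signatures: the cache lookup and cache write are shown later in this section.
declare function checkAnswerCache(question: string): Promise<CachedAnswer | undefined>;
declare function answerWithLlm(question: string): Promise<CachedAnswer>;
declare function cacheAnswer(question: string, answer: CachedAnswer): Promise<void>;

// Sketch of the cache-aside flow for answering a question.
async function getAnswer(question: string) {
  // 1. Look for an existing, semantically similar answer
  const cached = await checkAnswerCache(question);

  // 2. Cache hit: return it and skip the LLM entirely
  if (cached) {
    return cached;
  }

  // 3. Cache miss: ask the LLM to generate an answer
  const answer = await answerWithLlm(question);

  // 4. Store the question embedding alongside the answer for next time
  await cacheAnswer(question, answer);

  return answer;
}
```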
In order to store the question vectors, we need to create a new vector store. This creates an index specifically for the question-and-answer vectors. The code looks like this:
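The sketch below assumes LangChain's `@langchain/redis` package with node-redis; the index name, key prefix, and index options shown are illustrative rather than the repository's exact configuration (import paths and options vary by LangChain version):

```typescript
import { createClient, VectorAlgorithms } from 'redis';
import { RedisVectorStore } from '@langchain/redis';
import { OpenAIEmbeddings } from '@langchain/openai';

const client = createClient({ url: process.env.REDIS_URL });
await client.connect();

const embeddings = new OpenAIEmbeddings();

// A separate index just for cached question/answer vectors
const answerVectorStore = new RedisVectorStore(embeddings, {
  redisClient: client,
  indexName: 'video-qa-answers',
  keyPrefix: 'video-qa-answers:',
  indexOptions: {
    ALGORITHM: VectorAlgorithms.FLAT,
    DISTANCE_METRIC: 'L2',
  },
});
```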
The `answerVectorStore` looks nearly identical to the `vectorStore` we defined earlier, but it uses a different [algorithm and distance metric](https://redis.io/docs/interact/search-and-query/advanced-concepts/vectors/). This algorithm is better suited for similarity searches on our questions.
The following code demonstrates how to use the `answerVectorStore` to check if a similar question has already been answered.
```typescript
// Note: the start of this function is reconstructed for readability;
// see the repository for the full implementation.
async function checkAnswerCache(question: string) {
  // Similarity scores range from 0 (most similar) to 1 (least similar)
  let results = await answerVectorStore.similaritySearchWithScore(question);

  // Keep only answers that are close enough to the question being asked
  results = results.filter(
    (result) => result[1] <= config.searches.maxSimilarityScore,
  );

  if (results.length === 0) {
    return;
  }

  log.debug(
    `Accepted ${results.length} similar answers that have a score <= ${config.searches.maxSimilarityScore}`,
    {
      location: `${prefix}.search.getAnswer`,
      scores: results.map((result) => result[1]),
    },
  );

  return results.map((result) => {
    return {
      ...result[0].metadata,
      question: result[0].pageContent,
      isOriginal: false,
    };
  });
}
```
The `similaritySearchWithScore` method finds questions similar to the one being asked. It scores them from `0` to `1`, where `0` is most similar or "closest". We then filter out any results whose score is above the `maxSimilarityScore` environment variable, i.e. results that are not similar enough. If any results remain, we return them to the user. Using a max score is crucial here, because we don't want to return inaccurate results.
To complete this process, we need to apply the `cache aside` pattern and store the question as a vector in Redis. This is done as follows:
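A sketch of the cache write; the function name and metadata shape are illustrative, not the repository's exact code:

```typescript
import { Document } from '@langchain/core/documents';
import type { RedisVectorStore } from '@langchain/redis';

// answerVectorStore is the store created in the previous snippet.
declare const answerVectorStore: RedisVectorStore;

// Store the question (as the embedded text) together with the generated answer
// and its supporting video IDs, so future similar questions can hit the cache.
async function cacheAnswer(question: string, answer: string, videos: string[]) {
  await answerVectorStore.addDocuments([
    new Document({
      pageContent: question, // this is the text that gets embedded
      metadata: {
        answer,
        videos,
      },
    }),
  ]);
}
```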
When a question is asked, we first check the answer cache. We check both the question and the generated semantic question. If we find an answer, we return it to the user. If we don't find an answer, we call the LLM to generate an answer. We then store the question as a vector in Redis, along with the answer from the LLM. It may look like we're doing more work here than we were without the cache, but keep in mind the LLM is the bottleneck. By doing this, we are avoiding unnecessary calls to the LLM.
Below are a couple of screenshots from the application showing what it looks like when you find an existing answer to a question:
In this tutorial, we've explored how to build an AI-powered video Q&A application using Redis, LangChain, and various other technologies. We've covered setting up the environment, processing video uploads, and implementing search functionality. You also saw how to use Redis as a `vector store` and `semantic vector cache`.
> NOTE: Not included in this tutorial is an overview of the frontend `Next.js` app. However, you can find the code in the [GitHub repository](https://github.com/wjohnsto/genai-qa-videos) in the `app` directory.
### Key takeaways
- Generative AI can be leveraged to create powerful applications without writing a ton of code.