Merge pull request amd#80 from pooja/dev
updated rag readme.md
uday610 authored and GitHub Enterprise committed Jul 29, 2024
2 parents 9f67f49 + 0602b8f commit d078781
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions example/transformers/models/rag/README.md
@@ -2,7 +2,7 @@

Running a RAG LLM application on the AMD Ryzen AI NPU requires a correctly configured environment. The following steps will guide you through preparing your environment, quantizing the model for efficient execution on the NPU, and integrating it into the RAG framework.
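The retrieve-then-prompt loop that the RAG framework performs can be sketched minimally as follows. This is a toy, self-contained illustration only: the retriever is a naive keyword-overlap ranker, not the LlamaIndex API, and none of the names below come from the actual example.

```python
# Toy RAG sketch: retrieve the most relevant documents, then build an
# augmented prompt for the LLM. Illustrative only; the real example uses
# LlamaIndex with the Ryzen AI LLM flow.

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query: str, context: list[str]) -> str:
    """Prepend the retrieved context to the user query."""
    ctx = "\n".join(context)
    return f"Context:\n{ctx}\n\nQuestion: {query}\nAnswer:"

docs = [
    "Ryzen AI NPUs accelerate quantized LLM inference.",
    "RAG augments prompts with retrieved documents.",
    "Bananas are yellow.",
]
query = "How does RAG help LLM prompts?"
prompt = build_prompt(query, retrieve(query, docs))
print(prompt)
```

In a real deployment the retriever would be a vector index over embedded documents and the prompt template would come from the RAG framework; the data flow, however, is the same.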

- **Note:** This example is intended solely for demonstrating the integration of the Transformers flow with LlamaIndex for the Retrieval-Augmented Generation (RAG) application. The context passed with RAG could have a prompt length greater than 2048 tokens, for which it is not tuned for performance optimization yet.
+ **Note:** This example is intended solely for demonstrating the integration of the Ryzen-AI LLM flow with LlamaIndex for the Retrieval-Augmented Generation (RAG) application. The context passed with RAG could have a prompt length greater than 2048 tokens, for which it is not tuned for performance optimization yet.

### 1. Clone Ryzen AI Transformers Repository

@@ -40,4 +40,4 @@ Configure the RAG Application to use the quantized model, enabling the optional

```
python run.py --model_name llama-2-7b-chat --target aie --no-direct_llm --quantized --assisted_generation
```

- *Note: `fast_attention` optimization is currently only supported for input prompt/token length <=2048, and is turned off in this RAG example for 1.2 release.
+ *Note:* `fast_attention` optimization is currently only supported for input prompt/token length <=2048, and is turned off in this RAG example for 1.2 release.
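Since `fast_attention` is only valid for prompts of at most 2048 tokens, the kind of guard the flow implies can be sketched as below. The whitespace split is a rough stand-in for the model's real tokenizer (an assumption made here for a self-contained example), and `fast_attention_allowed` is a hypothetical helper, not an API from the repository.

```python
MAX_FAST_ATTENTION_TOKENS = 2048  # limit stated for the 1.2 release

def fast_attention_allowed(prompt: str) -> bool:
    # Whitespace splitting is a rough proxy for token count;
    # the real flow would count tokens with the model's tokenizer.
    return len(prompt.split()) <= MAX_FAST_ATTENTION_TOKENS

short_prompt = "What is RAG?"
long_prompt = "word " * 3000  # a RAG context can easily exceed the limit
print(fast_attention_allowed(short_prompt))  # True
print(fast_attention_allowed(long_prompt))   # False
```

This is why the RAG example runs with the optimization disabled: retrieved context routinely pushes the prompt past the 2048-token threshold.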
