Commit 9371dad

Changes to the table of contents for examples/demos

- update ReadMe to remove any reference to date

cyndwith committed Jul 26, 2024
1 parent 0e58f1a commit 9371dad

Showing 4 changed files with 11 additions and 12 deletions.
12 changes: 7 additions & 5 deletions README.md
@@ -30,15 +30,18 @@ git lfs pull
 - [Run multiple concurrent AI applications with ONNXRuntime](example/multi-model)
 - [Run Ryzen AI Library example](example/Ryzen-AI-Library)
 - [Run ONNX end-to-end examples with custom pre/post-processing nodes running on IPU](https://github.com/amd/RyzenAI-SW/tree/main/example/onnx-e2e)
-- Generative AI Examples
-  - [Run LLM OPT-1.3B model with ONNXRuntime](example/transformers/)
-  - [Run LLM OPT-1.3B model with PyTorch](example/transformers/)
-  - [Run LLM Llama 2 model with PyTorch](example/transformers/)
+- LLM Examples
+  - [LLMs on RyzenAI with Pytorch](./models/llm/docs/README.md)
+  - [Speculative Decoding of LLMs in Pytorch](./models/llm_assisted_generation/README.md)
+  - [LLMs on RyzenAI with ONNX](./models/llm_onnx/docs/README.md)
+  - [LLMs on RyzenAI with llama.cpp](./models/llm_gguf/docs/README.md)
+  - [RAG LLM application](./models/rag/README.md)
 
 ## Demos
 
 - [Cloud-to-Client demo on Ryzen AI](demo/cloud-to-client)
 - [Multiple model concurrency demo on Ryzen AI](demo/multi-model-exec)
+- [NPU-GPU Pipeline on RyzenAI](demo/NPU-GPU-Pipeline)
 
 ## Tutorials
 
@@ -52,7 +55,6 @@ git lfs pull
 - [ONNX Benchmarking utilities](onnx-benchmark)
 
 
-
 ## Getting Started
 
 To run the demos and examples in this repository, please follow the instructions of README.md in each directory.
7 changes: 2 additions & 5 deletions example/transformers/models/llm/docs/README.md
@@ -39,7 +39,7 @@ The above list is just a representative collection of models supported using the
 ## Performance on PHX, HPT and STX
 
-The following table provides the best token-time (tokens/sec) observed on PHX, STX and HPT boards (4/2024)
+The following table provides the best token-time (tokens/sec) observed on PHX, STX and HPT boards
 
 | Model Name                                                | Quantization | PHX   | HPT   | STX   | STX GGUF |
 |----------------------------------------------------------|--------------|-------|-------|-------|----------|
@@ -62,11 +62,9 @@ The following table provides the best token-time (tokens/sec) observed on PHX, S
 | [state-spaces/mamba-1.4b-hf](./mamba-1.4b.md) | PerGrp | 3.0 | 4.4 | 5.6 | |
 | [state-spaces/mamba-2.8b-hf](./mamba-2.8b.md) | PerGrp | 1.5 | 2.3 | 3.0 | |
 
-For STX board, OS: Microsoft Windows 11 Pro 10.0.26085 Build 26085 and Driver: 11.201.8.138 3/29/2024 Versions are used.
-
 [Smoothquant + w8a8 models](./smoothquant_latency.md)
 
-The following table provides the best token-time (tokens/sec) observed on PHX, STX and HPT boards (4/2024) for w8a8, w8a16
+The following table provides the best token-time (tokens/sec) observed on PHX, STX and HPT boards for w8a8, w8a16
 
 | Model Name | Quantization | PHX | HPT | STX |
 |----------------------------------------------------------|--------------|-------|-------|-------|
@@ -82,7 +80,6 @@ cd <transformers>
 conda env create --file=env.yaml
 conda activate ryzenai-transformers
 build_dependencies.bat
-# build_dependencies.ps1
 ```
 
 AWQ Model zoo has precomputed scales, clips and zeros for various LLMs including OPT, Llama. Get the precomputed results:
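The download step itself is collapsed in this view; as a rough sketch, fetching the precomputed AWQ cache might look like the following, where the Hugging Face dataset path and the awq_cache directory name are assumptions rather than values shown in this diff:

```bash
# Hypothetical sketch: clone the AWQ model zoo's precomputed
# scales/clips/zeros into a local cache directory. The dataset URL and
# target directory name are assumptions, not taken from this commit.
git lfs install
git clone https://huggingface.co/datasets/mit-han-lab/awq-model-zoo awq_cache
```
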
2 changes: 1 addition & 1 deletion example/transformers/models/llm/docs/llama2.md
@@ -34,7 +34,7 @@ w4abf16 (AWQ, 3-bit, g:128) + FA + lm_head(g:32) | NPU | 6.951
 w8a8 (SmoothQuant + PTDQ) + FA | NPU | 19.745 | na | 26.67
 w8a16 (SmoothQuant + PTDQ) + FA | NPU | 6.990 | na | na
 
-# Support modes on AIE/IPU - 2024.01
+# Support modes on AIE/IPU
 
 | Precision | PHX | STX | HPT
 :--------------|----------|---------|--------
2 changes: 1 addition & 1 deletion example/transformers/models/llm_gguf/docs/README.md
@@ -103,7 +103,7 @@ Launch your website and promote it through social media,

 ## Performance on PHX, HPT and STX
 
-The following table provides the best token-time (tokens/sec) observed on PHX, HPT, and STX boards (4/2024)
+The following table provides the best token-time (tokens/sec) observed on PHX, HPT, and STX boards
 
 | Model Name | Quantization | PHX | HPT | STX |
 |----------------------------------------------------------|--------------|-------|-------|-------|
