Commit 9371dad

Changes to the table of contents for examples/demos

- update ReadMe to remove any reference to date

cyndwith committed Jul 26, 2024
1 parent 0e58f1a commit 9371dad

Showing 4 changed files with 11 additions and 12 deletions.
12 changes: 7 additions & 5 deletions README.md
@@ -30,15 +30,18 @@ git lfs pull
 - [Run multiple concurrent AI applications with ONNXRuntime](example/multi-model)
 - [Run Ryzen AI Library example](example/Ryzen-AI-Library)
 - [Run ONNX end-to-end examples with custom pre/post-processing nodes running on IPU](https://github.com/amd/RyzenAI-SW/tree/main/example/onnx-e2e)
-- Generative AI Examples
-  - [Run LLM OPT-1.3B model with ONNXRuntime](example/transformers/)
-  - [Run LLM OPT-1.3B model with PyTorch](example/transformers/)
-  - [Run LLM Llama 2 model with PyTorch](example/transformers/)
+- LLM Examples
+  - [LLMs on RyzenAI with Pytorch](./models/llm/docs/README.md)
+  - [Speculative Decoding of LLMs in Pytorch](./models/llm_assisted_generation/README.md)
+  - [LLMs on RyzenAI with ONNX](./models/llm_onnx/docs/README.md)
+  - [LLMs on RyzenAI with llama.cpp](./models/llm_gguf/docs/README.md)
+  - [RAG LLM application](./models/rag/README.md)
 
 ## Demos
 
 - [Cloud-to-Client demo on Ryzen AI](demo/cloud-to-client)
 - [Multiple model concurrency demo on Ryzen AI](demo/multi-model-exec)
+- [NPU-GPU Pipeline on RyzenAI](demo/NPU-GPU-Pipeline)
 
 ## Tutorials
 
@@ -52,7 +55,6 @@ git lfs pull
 - [ONNX Benchmarking utilities](onnx-benchmark)
 
 
-
 ## Getting Started
 
 To run the demos and examples in this repository, please follow the instructions of README.md in each directory.
7 changes: 2 additions & 5 deletions example/transformers/models/llm/docs/README.md
@@ -39,7 +39,7 @@ The above list is just a representative collection of models supported using the
 ## Performance on PHX, HPT and STX
 
-The following table provides the best token-time (tokens/sec) observed on PHX, STX and HPT boards (4/2024)
+The following table provides the best token-time (tokens/sec) observed on PHX, STX and HPT boards
 
 | Model Name                                                | Quantization | PHX   | HPT   | STX   | STX GGUF |
 |----------------------------------------------------------|--------------|-------|-------|-------|----------|
@@ -62,11 +62,9 @@ The following table provides the best token-time (tokens/sec) observed on PHX, S
 | [state-spaces/mamba-1.4b-hf](./mamba-1.4b.md) | PerGrp | 3.0 | 4.4 | 5.6 | |
 | [state-spaces/mamba-2.8b-hf](./mamba-2.8b.md) | PerGrp | 1.5 | 2.3 | 3.0 | |
 
-For STX board, OS: Microsoft Windows 11 Pro 10.0.26085 Build 26085 and Driver: 11.201.8.138 3/29/2024 Versions are used.
-
 [Smoothquant + w8a8 models](./smoothquant_latency.md)
 
-The following table provides the best token-time (tokens/sec) observed on PHX, STX and HPT boards (4/2024) for w8a8, w8a16
+The following table provides the best token-time (tokens/sec) observed on PHX, STX and HPT boards for w8a8, w8a16
 
 | Model Name | Quantization | PHX | HPT | STX |
 |----------------------------------------------------------|--------------|-------|-------|-------|
@@ -82,7 +80,6 @@ cd <transformers>
 conda env create --file=env.yaml
 conda activate ryzenai-transformers
 build_dependencies.bat
-# build_dependencies.ps1
 ```
 
 AWQ Model zoo has precomputed scales, clips and zeros for various LLMs including OPT, Llama. Get the precomputed results:
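The download step itself is collapsed in this view; as a rough sketch, fetching the precomputed AWQ cache might look like the following, where the Hugging Face dataset path and the awq_cache directory name are assumptions rather than values shown in this diff:

```bash
# Hypothetical sketch: clone the AWQ model zoo's precomputed
# scales/clips/zeros into a local cache directory. The dataset URL and
# target directory name are assumptions, not taken from this commit.
git lfs install
git clone https://huggingface.co/datasets/mit-han-lab/awq-model-zoo awq_cache
```
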
2 changes: 1 addition & 1 deletion example/transformers/models/llm/docs/llama2.md
@@ -34,7 +34,7 @@ w4abf16 (AWQ, 3-bit, g:128) + FA + lm_head(g:32) | NPU | 6.951
 w8a8 (SmoothQuant + PTDQ) + FA | NPU | 19.745 | na | 26.67
 w8a16 (SmoothQuant + PTDQ) + FA | NPU | 6.990 | na | na
 
-# Support modes on AIE/IPU - 2024.01
+# Support modes on AIE/IPU
 
 | Precision | PHX | STX | HPT
 :--------------|----------|---------|--------
2 changes: 1 addition & 1 deletion example/transformers/models/llm_gguf/docs/README.md
@@ -103,7 +103,7 @@ Launch your website and promote it through social media,

 ## Performance on PHX, HPT and STX
 
-The following table provides the best token-time (tokens/sec) observed on PHX, HPT, and STX boards (4/2024)
+The following table provides the best token-time (tokens/sec) observed on PHX, HPT, and STX boards
 
 | Model Name | Quantization | PHX | HPT | STX |
 |----------------------------------------------------------|--------------|-------|-------|-------|
