LLM Inference Optimization References

This repo shows how to apply LLM inference optimization techniques, such as quantization and speculative decoding, using the ModelBuilder class from the SageMaker Python SDK.

Achieve up to ~2x higher throughput while reducing costs by up to ~50% for generative AI inference on Amazon SageMaker with the new inference optimization toolkit – Part 2
https://aws.amazon.com/blogs/machine-learning/achieve-up-to-2x-higher-throughput-while-reducing-costs-by-up-to-50-for-generative-ai-inference-on-amazon-sagemaker-with-the-new-inference-optimization-toolkit-part-2/
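As a rough illustration of the workflow described above, the sketch below builds the arguments for an optimization job and passes them to `ModelBuilder.optimize()`. It follows the pattern from the blog post linked above, but it is not taken from this repo's notebooks. The JumpStart model ID, the instance type, the S3 path, and the helper names `build_optimize_kwargs` and `run_optimization` are assumptions made for the example. The SageMaker SDK is imported lazily so that the argument-building helper can be checked without AWS credentials.

```python
from typing import Any, Dict, Optional


def build_optimize_kwargs(
    technique: str,
    instance_type: str,
    output_path: Optional[str] = None,
    quantize_method: str = "awq",
) -> Dict[str, Any]:
    """Build keyword arguments for ModelBuilder.optimize() (hypothetical helper)."""
    kwargs: Dict[str, Any] = {"instance_type": instance_type, "accept_eula": True}
    if technique == "quantization":
        # Quantization is requested through an environment override.
        kwargs["quantization_config"] = {
            "OverrideEnvironment": {"OPTION_QUANTIZE": quantize_method}
        }
    elif technique == "speculative_decoding":
        # Use the draft model provided by SageMaker.
        kwargs["speculative_decoding_config"] = {"ModelProvider": "SAGEMAKER"}
    else:
        raise ValueError(f"Unknown technique: {technique!r}")
    if output_path is not None:
        # S3 prefix where the optimized artifacts are written.
        kwargs["output_path"] = output_path
    return kwargs


def run_optimization(
    role_arn: str,
    technique: str,
    instance_type: str,
    output_path: Optional[str] = None,
    model_id: str = "meta-textgeneration-llama-3-70b",  # assumed JumpStart model ID
):
    """Start an optimization job; requires the sagemaker SDK and AWS credentials."""
    # Imported here so the helper above works without the SDK installed.
    from sagemaker.serve.builder.model_builder import ModelBuilder
    from sagemaker.serve.builder.schema_builder import SchemaBuilder

    # Sample payloads let SchemaBuilder infer the request and response format.
    sample_input = {
        "inputs": "What is Amazon SageMaker?",
        "parameters": {"max_new_tokens": 128},
    }
    sample_output = [{"generated_text": "Amazon SageMaker is a managed ML service."}]

    builder = ModelBuilder(
        model=model_id,
        schema_builder=SchemaBuilder(sample_input, sample_output),
        role_arn=role_arn,
    )
    return builder.optimize(
        **build_optimize_kwargs(technique, instance_type, output_path)
    )
```

Calling `run_optimization(role_arn, "quantization", "ml.g5.12xlarge", "s3://<your-bucket>/awq/")` would start an AWQ quantization job under these assumptions. The returned model can then be deployed with its `deploy()` method. The bucket name and instance type here are placeholders to replace with your own values.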
