schetlur-nv / TensorRT-LLM Public

forked from NVIDIA/TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

nvidia.github.io/TensorRT-LLM

Apache-2.0 license

0 stars, 1.5k forks

Releases

No releases published

Packages

No packages published

Languages

C++ 99.5%
Python 0.4%
Cuda 0.1%
Groovy 0.0%
CMake 0.0%
Shell 0.0%