erenup / TensorRT-LLM Public

forked from NVIDIA/TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

nvidia.github.io/TensorRT-LLM

Apache-2.0 license

2 stars, 1.5k forks

Languages

C++ 99.2%
Python 0.6%
Cuda 0.2%
CMake 0.0%
Smarty 0.0%
Shell 0.0%