# CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

Here are 6,814 public repositories matching this topic...

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Updated Jan 26, 2026
Python

hashcat / hashcat

World's fastest and most advanced password recovery utility

c opencl cuda password gpgpu hashes cracking hashcat

Updated Nov 20, 2025
C

sgl-project / sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

reinforcement-learning cuda inference transformer moe attention llama glm minimax wan diffusion vlm blackwell llm qwen deepseek gpt-oss qwen-image

Updated Jan 26, 2026
Python

NVIDIA / nvidia-docker

Build and run Docker containers leveraging NVIDIA GPUs

docker gpu cuda nvidia-docker

Updated Dec 6, 2023

NVlabs / instant-ngp

Instant neural graphics primitives: lightning fast NeRF and more

machine-learning real-time computer-vision neural-network computer-graphics realtime cuda signed-distance-functions nerf 3d-reconstruction function-approximation real-time-rendering

Updated Dec 14, 2025
Cuda

kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.

shell c-plus-plus cuda speech speech-recognition speech-to-text kaldi speaker-verification speaker-id

Updated Sep 22, 2025
Shell

tracel-ai / burn

Burn is a next generation tensor library and Deep Learning Framework that doesn't compromise on flexibility, efficiency and portability.

rust machine-learning deep-learning metal cross-platform neural-network vulkan cuda wasm pytorch scientific-computing ndarray tensor webgpu rocm autodiff onnx kernel-fusion

Updated Jan 24, 2026
Rust

vosen / ZLUDA

CUDA on non-NVIDIA GPUs

rust cuda

Updated Jan 21, 2026
Rust

isl-org / Open3D

Open3D: A Modern Library for 3D Data Processing

Updated Jan 23, 2026
C++

NVIDIA / TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

cuda pytorch moe blackwell llm-serving

Updated Jan 26, 2026
Python

srush / GPU-Puzzles

Solve puzzles. Learn CUDA.

machine-learning cuda puzzles

Updated Sep 1, 2024
Jupyter Notebook

numba / numba

NumPy aware dynamic Python compiler using LLVM

python compiler numpy llvm parallel cuda numba

Updated Jan 26, 2026
Python

cupy / cupy

NumPy & SciPy for GPU

python gpu numpy cuda cublas scipy tensor cudnn rocm cupy cusolver nccl curand cusparse nvrtc cutensor nvtx cusparselt

Updated Jan 25, 2026
Python

xlite-dev / LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

cuda cuda-kernels cuda-demo cuda-toolkit cuda-library cuda-kernel learn-cuda cuda-cpp hgemm flash-attention leet-cuda cuda-12

Updated Jan 18, 2026
Cuda

rapidsai / cudf

cuDF - GPU DataFrame Library

python data-science cpp gpu arrow pydata cuda pandas data-analysis dask dataframe rapids cudf

Updated Jan 25, 2026
C++

Oneflow-Inc / oneflow

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

machine-learning deep-neural-networks deep-learning neural-network cuda ml distributed

Updated Dec 4, 2025
C++

replicate / cog

Containers for machine learning

docker machine-learning ai deep-learning containers tensorflow cuda pytorch

Updated Jan 24, 2026
Go

NVIDIA / cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra

python deep-learning cpp gpu cuda nvidia deep-learning-library

Updated Jan 24, 2026
C++

catboost / catboost

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

python data-science machine-learning data-mining tutorial r big-data gpu cuda kaggle gbdt gbm gpu-computing decision-trees gradient-boosting coreml catboost categorical-features

Updated Jan 25, 2026
C++

NVIDIA / cuda-samples

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

cuda cuda-kernels cuda-driver-api cuda-opengl

Updated Jan 6, 2026
C

Created by NVIDIA

Released June 23, 2007

Followers: 283
Website: github.com/topics/cuda

Related topics

nvcc