Official implementation of "AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising"
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
On-device LLM Inference Powered by X-Bit Quantization
"LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS", Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
A list of papers on neural network quantization from recent AI conferences and journals.
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
[ICLR 2022] Code for Graph-less Neural Networks: Teaching Old MLPs New Tricks via Distillation (GLNN)
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
Semiparametric efficient rank-based estimation of copula parameters
[NeurIPS'23] Speculative Decoding with Big Little Decoder
[ICLR24] AutoVP: An Automated Visual Prompting Framework and Benchmark
A concise guide to ten current hot topics in Large Language Models (LLMs), covering bias mitigation, efficient training, multimodal models, and more.
Explorations into some recent techniques surrounding speculative decoding (see the sketch after this list)
[ICML 2023] Linkless Link Prediction via Relational Distillation
Graph-based image processing for segmenting images and detecting free spots in crowded scenes.
Official PyTorch training code for Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity (ICCV 2023 RCV)
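Several entries above (the Big Little Decoder paper and the speculative-decoding explorations) share one core loop: a cheap draft model proposes a few tokens and an expensive target model verifies them. Below is a minimal sketch of the greedy variant; `draft_next` and `target_next` are hypothetical stand-ins for the two models' next-token functions, not any listed repository's actual API.

```python
# Minimal sketch of greedy speculative decoding. `draft_next` and
# `target_next` are hypothetical callables mapping a token list to the
# next token; a real implementation would use actual model forward
# passes and verify all draft positions in one batched target pass.

def speculative_decode(prompt, draft_next, target_next, k=4, max_new=32):
    tokens = list(prompt)
    target_len = len(prompt) + max_new
    while len(tokens) < target_len:
        # 1) The cheap draft model speculates k tokens autoregressively.
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) The target model checks each draft position; keep the
        #    longest prefix on which the two models agree.
        accepted = 0
        for i, t in enumerate(draft):
            if target_next(tokens + draft[:i]) != t:
                break
            accepted += 1
        tokens.extend(draft[:accepted])
        # 3) On a mismatch, take the target model's own token so every
        #    iteration is guaranteed to make progress.
        if accepted < k:
            tokens.append(target_next(tokens))
    return tokens[:target_len]

# Toy check: with identical draft and target "models" every proposal is
# accepted, so the loop needs only ceil(max_new / k) verification rounds.
next_tok = lambda ctx: (ctx[-1] + 1) % 10
print(speculative_decode([0], next_tok, next_tok, k=4, max_new=12))
```

The speedup comes from step 2: in a real system the target model scores all k draft positions in a single batched forward pass, so the number of sequential large-model calls drops roughly in proportion to how many draft tokens are accepted per round.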