Official implementation of "AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising"
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
On-device LLM Inference Powered by X-Bit Quantization
"LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS", Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
A list of papers on neural network quantization from recent AI conferences and journals.
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
[ICLR 2022] Code for Graph-less Neural Networks: Teaching Old MLPs New Tricks via Distillation (GLNN)
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
Semiparametric efficient rank-based estimation of copula parameters
[NeurIPS'23] Speculative Decoding with Big Little Decoder
[ICLR24] AutoVP: An Automated Visual Prompting Framework and Benchmark
A concise guide to ten current hot topics in Large Language Models (LLMs), covering bias mitigation, efficient training, multimodal models, and more.
Explorations into some recent techniques surrounding speculative decoding (see the sketch after this list)
[ICML 2023] Linkless Link Prediction via Relational Distillation
Graph-based image processing for segmenting images and detecting free spots in crowded scenes.
Official PyTorch training code for Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity (ICCV 2023 RCV)
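Several entries above (the Big Little Decoder paper and the speculative-decoding explorations) share one core loop: a cheap draft model proposes a few tokens and an expensive target model verifies them. Below is a minimal sketch of the greedy variant; `draft_next` and `target_next` are hypothetical stand-ins for the two models' next-token functions, not any listed repository's actual API.

```python
# Minimal sketch of greedy speculative decoding. `draft_next` and
# `target_next` are hypothetical callables mapping a token list to the
# next token; a real implementation would use actual model forward
# passes and verify all draft positions in one batched target pass.

def speculative_decode(prompt, draft_next, target_next, k=4, max_new=32):
    tokens = list(prompt)
    target_len = len(prompt) + max_new
    while len(tokens) < target_len:
        # 1) The cheap draft model speculates k tokens autoregressively.
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) The target model checks each draft position; keep the
        #    longest prefix on which the two models agree.
        accepted = 0
        for i, t in enumerate(draft):
            if target_next(tokens + draft[:i]) != t:
                break
            accepted += 1
        tokens.extend(draft[:accepted])
        # 3) On a mismatch, take the target model's own token so every
        #    iteration is guaranteed to make progress.
        if accepted < k:
            tokens.append(target_next(tokens))
    return tokens[:target_len]

# Toy check: with identical draft and target "models" every proposal is
# accepted, so the loop needs only ceil(max_new / k) verification rounds.
next_tok = lambda ctx: (ctx[-1] + 1) % 10
print(speculative_decode([0], next_tok, next_tok, k=4, max_new=12))
```

The speedup comes from step 2: in a real system the target model scores all k draft positions in a single batched forward pass, so the number of sequential large-model calls drops roughly in proportion to how many draft tokens are accepted per round.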