Inferless
Popular repositories
- triton-co-pilot Public
Generate glue code in seconds to simplify your NVIDIA Triton Inference Server deployments.
- qwq-32b-preview Public template
A 32B experimental reasoning model for advanced text generation and robust instruction following. <metadata> gpu: A100 | collections: ["vLLM"] </metadata>
- whisper-large-v3 Public template
State-of-the-art speech recognition model for English, delivering high transcription accuracy across diverse audio scenarios (a minimal transcription sketch follows this list). <metadata> gpu: T4 | collections: ["CTranslate2"] </metadata>
- deepseek-r1-distill-qwen-32b Public template
A distilled DeepSeek-R1 variant built on Qwen2.5-32B, fine-tuned with curated data for enhanced performance and efficiency. <metadata> gpu: A100 | collections: ["vLLM"] </metadata>
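The whisper-large-v3 template above lists CTranslate2 as its collection. Below is a minimal transcription sketch using the faster-whisper package (a common CTranslate2 wrapper for Whisper); the model name, device, and audio path are illustrative assumptions, not taken from the template's own handler code.

```python
# Minimal sketch: transcribing audio with Whisper large-v3 via CTranslate2,
# using the faster-whisper package. Model name, device, and file path are
# illustrative assumptions rather than the template's exact configuration.
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")

# transcribe() returns a generator of segments plus metadata about the audio
segments, info = model.transcribe("sample.wav", beam_size=5)

print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```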
Repositories
- pyannote-speaker-diarization-3.1 Public template
A state-of-the-art model that segments and labels audio recordings by accurately distinguishing different speakers. <metadata> gpu: T4 | collections: ["HF Transformers"] </metadata>
- facebook-bart-cnn Public template
A variant of the BART model designed specifically for natural language summarization. It was pre-trained on a large corpus of English text and later fine-tuned on the CNN/Daily Mail dataset (a minimal summarization sketch follows this list). <metadata> gpu: T4 | collections: ["HF Transformers"] </metadata>
- qwen-image Public
- Qwen3-30B-A3B-Instruct-2507 Public
- qwen3-coder-30B-a3B-instruct Public
A 30.5B-parameter MoE model purpose-tuned for code generation and agentic tool use. <metadata> gpu: A100 | collections: ["HF Transformers"] </metadata>
- gpt-oss-20b Public template
A 21B open‑weight language model (with ~3.6 billion active parameters per token) developed by OpenAI for reasoning, tool integration, and low‑latency usage. <metadata> gpu: A100 | collections: ["HF Transformers"] </metadata>
- voxtral-mini-3b Public template
A 3B-parameter audio-language model with speech transcription, translation, and audio understanding capabilities. <metadata> gpu: A10 | collections: ["HF Transformers"] </metadata>
- kyutai-tts-1.6b Public template
A 1.6B-parameter text-to-speech model that supports real-time streaming text input with ultra-low latency and voice conditioning capabilities. <metadata> gpu: A10 | collections: ["HF Transformers"] </metadata>
- llama-3.1-8b-instruct-gguf Public template
An 8B-parameter, instruction-tuned variant of Meta's Llama-3.1 model, optimized in GGUF format for efficient inference. <metadata> gpu: A100 | collections: ["llama.cpp"] </metadata>
- stable-diffusion-3-5-large-turbo Public template
A fast, optimized diffusion model that generates high-quality images from text prompts, ideal for creative visual content. <metadata> gpu: A100 | collections: ["Diffusers"] </metadata>
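The facebook-bart-cnn template above lists HF Transformers as its collection. Below is a minimal summarization sketch using the transformers pipeline API; the facebook/bart-large-cnn model ID, the sample article, and the generation parameters are illustrative assumptions rather than the template's exact configuration.

```python
# Minimal sketch: summarization with a BART CNN/Daily Mail checkpoint via the
# Hugging Face transformers pipeline. Model ID and generation parameters are
# illustrative assumptions, not taken from the template itself.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = (
    "Inference servers let teams deploy machine learning models behind an API. "
    "Templates bundle the model weights, runtime, and handler code so a new "
    "endpoint can be stood up without writing the serving layer from scratch."
)

# max_length/min_length bound the token length of the generated summary
summary = summarizer(article, max_length=60, min_length=15, do_sample=False)
print(summary[0]["summary_text"])
```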