feat: add sharding support for mlx-lm models #5

@andthattoo

Description

Integrate all MLX-LM model architectures with proper sharding augmentations for distributed inference in dnet.

Models are prioritized by production deployments, Hugging Face downloads, and benchmark performance:

  • gpt_oss
  • deepseek_v2
  • deepseek_v3
  • llama
  • llama4
  • qwen3
  • qwen3_moe
  • qwen3_next
  • qwen2
  • qwen2_moe
  • internlm3
  • gemma3
  • gemma3_text
  • gemma3n
  • glm4
  • glm4_moe
  • olmo2
  • olmo3
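The issue does not say how dnet splits a model across devices. As a rough illustration only, the sketch below assumes a pipeline-style scheme in which each shard owns a contiguous range of transformer layers. `LayerRange` and `partition_layers` are hypothetical names, not part of dnet or MLX-LM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerRange:
    """Half-open range of layer indices owned by one shard (hypothetical type)."""
    start: int  # inclusive
    end: int    # exclusive


def partition_layers(num_layers: int, num_shards: int) -> list[LayerRange]:
    """Split num_layers into num_shards contiguous, near-equal ranges.

    Assumption: layers are assigned in order and earlier shards absorb
    the remainder when the split is uneven.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    if num_layers < num_shards:
        raise ValueError("need at least one layer per shard")

    base, extra = divmod(num_layers, num_shards)
    ranges = []
    start = 0
    for shard_index in range(num_shards):
        # The first `extra` shards each take one additional layer.
        size = base + (1 if shard_index < extra else 0)
        ranges.append(LayerRange(start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # Example: a 32-layer model split across 3 devices.
    for shard_index, layer_range in enumerate(partition_layers(32, 3)):
        print(shard_index, layer_range.start, layer_range.end)
```

Each architecture in the list above would still need its own handling for embeddings, the final norm, the output head, and (for the MoE variants) expert routing. An even layer split covers only the repeated decoder blocks.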

Labels

enhancement (New feature or request)
