Releases: VectorInstitute/vector-inference

v0.2.1

06 Jul 15:58
2c43a25
  • Add CodeLlama
  • Update model variant names for Llama 2 in README

v0.2.0

04 Jul 14:29
635e13f
  • Update default environment to use a Singularity container and add the associated Dockerfile
  • Update vLLM to 0.5.0, add VLM support (LLaVA-1.5 and LLaVA-NeXT), and update example scripts
  • Refactor repo structure for a simpler model onboarding and update process

v0.1.1

23 May 20:32
  • Update vLLM to 0.4.2, which resolves the flash-attention package-not-found issue
  • Update instructions for using the default environment to prevent/resolve the NCCL-not-found error

v0.1.0

24 Apr 20:21
0784588

Easy-to-use high-throughput LLM inference on Slurm clusters using vLLM

Supported models and variants:

  • Command R+
  • DBRX: Instruct
  • Llama 2: 7b, 7b-chat, 13b, 13b-chat, 70b, 70b-chat
  • Llama 3: 8B, 8B-Instruct, 70B, 70B-Instruct
  • Mixtral: 8x7B-Instruct-v0.1, 8x22B-v0.1, 8x22B-Instruct-v0.1

Supported functionalities:

  • Completions and chat completions
  • Logits generation
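
The launched servers expose an OpenAI-compatible HTTP API, so the two listed functionalities map onto the `/v1/completions` and `/v1/chat/completions` endpoints, with per-token log-probabilities requested via the `logprobs` field. The sketch below is illustrative only: the base URL, port, and model name are assumptions (vector-inference reports the actual host and port when the Slurm job starts), not values fixed by this release.

```python
# Minimal sketch of querying a launched vLLM server through its
# OpenAI-compatible API. BASE_URL and the model name are hypothetical.
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # assumption: replace with the host/port your job reports


def completion_payload(model, prompt, max_tokens=64, logprobs=None):
    """Build a /v1/completions request body; pass `logprobs=N` to also
    receive log-probabilities for the top-N tokens (the logits use case)."""
    body = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    if logprobs is not None:
        body["logprobs"] = logprobs
    return body


def chat_payload(model, user_message):
    """Build a /v1/chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }


def post(path, body):
    """Send a JSON POST to the server and decode the JSON response."""
    req = urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example usage (requires a running server; model name is hypothetical):
# out = post("/completions", completion_payload("Meta-Llama-3-8B", "Hello,", logprobs=5))
# print(out["choices"][0]["text"])
```

The same payloads work with any OpenAI-compatible client library by pointing its base URL at the launched server.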