
Would this work on consumer hardware and integrated in frameworks like llama.cpp or others? #5

Open
Mayorc1978 opened this issue May 11, 2024 · 4 comments

Comments


Mayorc1978 commented May 11, 2024

As per title.
Example: with GPUs like 3060 12GB or 3090 24GB.

@Mayorc1978 Mayorc1978 changed the title Would this work on consumer hardware and with frameworks like llama.cpp or others? Would this work on consumer hardware and integrated in frameworks like llama.cpp or others? May 11, 2024
ys-2020 (Contributor) commented May 14, 2024

Hi @Mayorc1978, thank you very much for your interest in QServe! Although it targets large-scale LLM serving, QServe can also run on consumer GPUs like the RTX 4090 and 3090. On an RTX 4090 you can expect a speedup over TensorRT-LLM similar to what we report on the L40S. We have not run many experiments on the 3060 or 3090, but we believe the same principles still hold.
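
For a rough sense of whether a 4-bit-weight (W4) quantized model fits in 12 GB or 24 GB of VRAM, here is a minimal back-of-envelope sketch; the parameter counts and the headroom note are illustrative assumptions, not QServe measurements:

```python
# Back-of-envelope: memory needed for 4-bit (W4) quantized weights.
# Parameter counts are illustrative; the KV cache, activations, and CUDA
# context need additional headroom on top of the weights.

def w4_weight_gib(params_billion: float) -> float:
    """Approximate GiB needed to hold 4-bit weights of a model."""
    total_bytes = params_billion * 1e9 * 0.5  # 4 bits = 0.5 bytes per weight
    return total_bytes / 2**30

for name, params in [("7B model", 7.0), ("13B model", 13.0)]:
    print(f"{name}: ~{w4_weight_gib(params):.1f} GiB of 4-bit weights "
          f"(plus KV cache and runtime overhead)")
```

Under this rough estimate, the 4-bit weights of a 7B model take about 3.3 GiB and a 13B model about 6 GiB, leaving the rest of a 12 GB or 24 GB card for the KV cache and runtime.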


tp-nan commented May 17, 2024

Hi, how about the Tesla T4 and RTX 2080 Ti?

ys-2020 (Contributor) commented May 17, 2024

Hi @tp-nan, the Tesla T4 and RTX 2080 are not supported in QServe right now. Some of our kernels currently use instructions that can only be compiled for Ampere and newer architectures. We will consider supporting older GPUs after cleaning up the CUDA code. Thank you!
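
A minimal way to check the Ampere requirement locally, assuming a PyTorch environment (this snippet is an illustrative sketch, not part of QServe):

```python
# Check whether the local GPU is Ampere (compute capability 8.0) or newer,
# which the current QServe kernels require. The T4 and RTX 2080 Ti are sm_75.
import torch

if not torch.cuda.is_available():
    raise SystemExit("No CUDA device detected.")

major, minor = torch.cuda.get_device_capability(0)
name = torch.cuda.get_device_name(0)
print(f"{name}: compute capability {major}.{minor}")

if major >= 8:
    print("Ampere or newer: the current QServe kernels should compile and run.")
else:
    print("Pre-Ampere GPU: not supported by QServe yet.")
```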

anaivebird commented

@ys-2020 will QServe outperform TensorRT-LLM with W4A8 on Llama 3 13B?
