
ggml : Implementations of Q4_0_8_8 quantization-based functions - RISC-V vector version #10029

Open · wants to merge 6 commits into base: master

Conversation

@xctan commented Oct 24, 2024

I've tested Mistral 7B in QEMU, and it just worked. I'm still choosing a suitable 3B model for my dev board, which has only 4 GB of RAM (Banana Pi BPI-F3), so I can't give a performance evaluation yet; any help is welcome! BTW, Mistral 7B could run on the BPI-F3 with an additional 4 GB of swap enabled, but it was much slower than even QEMU.

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Oct 24, 2024
@xctan (Author) commented Oct 24, 2024

Model: https://huggingface.co/CobraMamba/mamba-gpt-3b-v4
Compiler: GCC 13.2.0

| model | size | params | backend | threads | test | t/s | speedup | commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 3B Q4_0_8_8 | 1.84 GiB | 3.43 B | CPU | 8 | pp512 | 1.82 ± 0.00 | 271% | 78c78e2 |
| llama 3B Q4_0_8_8 | 1.84 GiB | 3.43 B | CPU | 8 | pp512 | 0.49 ± 0.00 | | 66c2c93 |
| llama 3B Q4_0_8_8 | 1.84 GiB | 3.43 B | CPU | 8 | tg128 | 2.25 ± 0.10 | 350% | 78c78e2 |
| llama 3B Q4_0_8_8 | 1.84 GiB | 3.43 B | CPU | 8 | tg128 | 0.50 ± 0.01 | | 66c2c93 |
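For clarity, the speedup column appears to be the percentage increase in tokens per second of the vectorized commit (78c78e2) over the baseline commit (66c2c93). A small sketch of that arithmetic (the `speedup_pct` helper is illustrative, not part of the PR):

```python
# Hypothetical helper: reproduce the speedup column from the t/s figures.
def speedup_pct(new_tps: float, baseline_tps: float) -> int:
    """Percentage increase in tokens/s over the baseline commit."""
    return round((new_tps / baseline_tps - 1) * 100)

print(speedup_pct(1.82, 0.49))  # pp512 → 271
print(speedup_pct(2.25, 0.50))  # tg128 → 350
```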
