
ggml : Implementations of Q4_0_8_8 quantization-based functions - RISC-V vector version #10029

Open · wants to merge 6 commits into base: master

Conversation

@xctan commented Oct 24, 2024

I've tested Mistral 7B in QEMU, and it just worked. I'm still choosing a suitable 3B model for my dev board, which has only 4 GB of RAM (Banana Pi BPI-F3), so I can't give a performance evaluation yet; any help is welcome! BTW, Mistral 7B could run on the BPI-F3 with an additional 4 GB of swap enabled, but it was much slower than even QEMU.

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Oct 24, 2024
@xctan (Author) commented Oct 24, 2024

Model: https://huggingface.co/CobraMamba/mamba-gpt-3b-v4
Compiler: GCC 13.2.0

| model | size | params | backend | threads | test | t/s | speedup | commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 3B Q4_0_8_8 | 1.84 GiB | 3.43 B | CPU | 8 | pp512 | 1.82 ± 0.00 | 271% | 78c78e2 |
| llama 3B Q4_0_8_8 | 1.84 GiB | 3.43 B | CPU | 8 | pp512 | 0.49 ± 0.00 | | 66c2c93 |
| llama 3B Q4_0_8_8 | 1.84 GiB | 3.43 B | CPU | 8 | tg128 | 2.25 ± 0.10 | 350% | 78c78e2 |
| llama 3B Q4_0_8_8 | 1.84 GiB | 3.43 B | CPU | 8 | tg128 | 0.50 ± 0.01 | | 66c2c93 |
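For clarity, the speedup column appears to be the percentage increase in tokens per second of the vectorized commit (78c78e2) over the baseline commit (66c2c93). A small sketch of that arithmetic (the `speedup_pct` helper is illustrative, not part of the PR):

```python
# Hypothetical helper: reproduce the speedup column from the t/s figures.
def speedup_pct(new_tps: float, baseline_tps: float) -> int:
    """Percentage increase in tokens/s over the baseline commit."""
    return round((new_tps / baseline_tps - 1) * 100)

print(speedup_pct(1.82, 0.49))  # pp512 → 271
print(speedup_pct(2.25, 0.50))  # tg128 → 350
```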
