Introduction of gemm4xN and gemmMx4 for Q4_0 and Q8_0 for better performance results #8908
GCC Linux:
Meta Llama2 7B model:
Q4_0 Model:
Q8_0 Model:
Mistral-7B-Instruct-v0.3 model:
Q4_0 Model:
Q8_0 Model:
GCC Version = 12.3
The PR was tested on an AMD Raphael 7600X, which supports the following flags by default:
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
Original Unquantized Models:
Llama2 7B: https://huggingface.co/meta-llama/Llama-2-7b
Mistral 7B Instruct v0.3: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3