performance vs GMP? #1

timotheecour · 2020-12-21T05:38:48Z

Curious how it compares to GMP, and, less importantly, to these:

tiny-bignum-c (as used in V IIRC /cc @xflywind)
performance vs GMP? negative number support? ilia3101/Big-Integer-C#1


mratsim · 2020-12-22T14:32:26Z

The library is still vaporware at the moment.

That said, I plan to bring over all the lessons learned from https://github.com/mratsim/constantine on writing high-performance big-integer code, in particular the inline assembler I developed, for example for multiplication using MULX/ADCX/ADOX: https://github.com/mratsim/constantine/blob/e89429e/constantine/arithmetic/assembly/limbs_asm_mul_x86_adx_bmi2.nim#L72-L110

Constantine is significantly faster than GMP even though it's constant-time, but that is mostly due to specialization:

Sizes are known at compile-time, which allows full unrolling of loops
Sizes are known at compile-time, so there is no need to worry about resizing the buffer
Big integers are unsigned
Operands have the same size (in the general case, proc sub(a: var BigInt, b: BigInt) is a drag to implement when a < b)
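
To illustrate why the last three points simplify things, here is a minimal C sketch of subtraction on fixed-size, unsigned, same-size operands. The names `sub256` and `LIMBS` are hypothetical and are not taken from Constantine or GMP. With these restrictions there is no sign handling, no operand swap when a < b, and no buffer resize: the result simply wraps modulo 2^256 and the borrow is reported to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LIMBS 4 /* hypothetical fixed size: 4 x 64-bit limbs = 256 bits */

/* a -= b on fixed-size unsigned operands, least significant limb first.
   Returns the final borrow (0 or 1). When a < b the result wraps
   modulo 2^256 and the returned borrow is 1. */
uint64_t sub256(uint64_t a[LIMBS], const uint64_t b[LIMBS]) {
  assert(a != NULL && b != NULL);
  uint64_t borrow = 0;
  for (int i = 0; i < LIMBS; ++i) {
    uint64_t d = a[i] - b[i];                  /* wraps if a[i] < b[i] */
    uint64_t borrow_out = (a[i] < b[i]) | (d < borrow);
    a[i] = d - borrow;                         /* apply incoming borrow */
    borrow = borrow_out;
  }
  return borrow;
}
```

A signed, variable-size library would instead have to compare magnitudes, possibly swap the operands, fix up the sign, and resize the destination, which is the part described above as a drag to implement.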

In particular, no pure-C library can reach GMP or Constantine speed, simply because the compiler is the biggest optimization barrier for big-integer code, even with intrinsics: https://gcc.godbolt.org/z/2h768y

#include <stdint.h>
#include <x86intrin.h>

void add256(uint64_t a[4], uint64_t b[4]){
  uint8_t carry = 0;
  for (int i = 0; i < 4; ++i)
    carry = _addcarry_u64(carry, a[i], b[i], &a[i]);
}

GCC:

add256:
        movq    (%rsi), %rax
        addq    (%rdi), %rax
        setc    %dl
        movq    %rax, (%rdi)
        movq    8(%rdi), %rax
        addb    $-1, %dl
        adcq    8(%rsi), %rax
        setc    %dl
        movq    %rax, 8(%rdi)
        movq    16(%rdi), %rax
        addb    $-1, %dl
        adcq    16(%rsi), %rax
        setc    %dl
        movq    %rax, 16(%rdi)
        movq    24(%rsi), %rax
        addb    $-1, %dl
        adcq    %rax, 24(%rdi)
        ret

Clang:

add256:
        movq    (%rsi), %rax
        addq    %rax, (%rdi)
        movq    8(%rsi), %rax
        adcq    %rax, 8(%rdi)
        movq    16(%rsi), %rax
        adcq    %rax, 16(%rdi)
        movq    24(%rsi), %rax
        adcq    %rax, 24(%rdi)
        retq
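
For comparison with the two listings above, here is a hedged sketch of the same 256-bit addition in plain C without intrinsics. The name `add256_ref` is hypothetical. C has no way to name the CPU carry flag, so the carry has to be reconstructed from comparisons, and it is then up to the compiler to recognize the pattern and turn it back into an `adc` chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable reference: a += b over 4 limbs,
   least significant limb first. Returns the carry out (0 or 1). */
uint64_t add256_ref(uint64_t a[4], const uint64_t b[4]) {
  assert(a != NULL && b != NULL);
  uint64_t carry = 0;
  for (int i = 0; i < 4; ++i) {
    uint64_t s = a[i] + b[i];
    uint64_t c1 = s < b[i];     /* a[i] + b[i] wrapped around */
    a[i] = s + carry;
    uint64_t c2 = a[i] < carry; /* adding the incoming carry wrapped */
    carry = c1 | c2;            /* at most one of c1, c2 can be set */
  }
  return carry;
}
```

Such a reference can also serve as a correctness check when experimenting with intrinsic or inline-assembly variants, for example by comparing outputs on inputs that force a carry through every limb.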
