Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

performance vs GMP? #1

Open
timotheecour opened this issue Dec 21, 2020 · 1 comment
Open

performance vs GMP? #1

timotheecour opened this issue Dec 21, 2020 · 1 comment

Comments

@timotheecour
Copy link

Curious how it compares to GMP, and, less importantly, to these:

links

@mratsim
Copy link
Collaborator

mratsim commented Dec 22, 2020

The library is still vaporware at the moment.

That said I plan to bring all the lessons learned from https://github.com/mratsim/constantine on writing high performance big int code. In particular the inline assembler I developed for example for multiplication using MULX/ADCX/ADOX: https://github.com/mratsim/constantine/blob/e89429e/constantine/arithmetic/assembly/limbs_asm_mul_x86_adx_bmi2.nim#L72-L110

Constantine is significantly faster than GMP even though its constant-time but it's mostly because of specialization:

  • sizes are known at compile-time which allow full unrolling of loops
  • sizes are known at compile-time so no worry about having to resize the buffer
  • Big Integers are unsigned
  • Operand have the same size proc sub(a: var BigInt, b: Bigint) is a drag to implement when a < b

In particular, no library that uses pure C can reach GMP or Constantine speed simply due to the compiler being the biggest optimization barrier for big integer code, even with intrinsics: https://gcc.godbolt.org/z/2h768y

#include <stdint.h>
#include <x86intrin.h>

void add256(uint64_t a[4], uint64_t b[4]){
  uint8_t carry = 0;
  for (int i = 0; i < 4; ++i)
    carry = _addcarry_u64(carry, a[i], b[i], &a[i]);
}

GCC:

add256:
        movq    (%rsi), %rax
        addq    (%rdi), %rax
        setc    %dl
        movq    %rax, (%rdi)
        movq    8(%rdi), %rax
        addb    $-1, %dl
        adcq    8(%rsi), %rax
        setc    %dl
        movq    %rax, 8(%rdi)
        movq    16(%rdi), %rax
        addb    $-1, %dl
        adcq    16(%rsi), %rax
        setc    %dl
        movq    %rax, 16(%rdi)
        movq    24(%rsi), %rax
        addb    $-1, %dl
        adcq    %rax, 24(%rdi)
        ret

Clang:

add256:
        movq    (%rsi), %rax
        addq    %rax, (%rdi)
        movq    8(%rsi), %rax
        adcq    %rax, 8(%rdi)
        movq    16(%rsi), %rax
        adcq    %rax, 16(%rdi)
        movq    24(%rsi), %rax
        adcq    %rax, 24(%rdi)
        retq

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants