rvint

Integer-based mathematical subroutines which implement RISC-V M and B extension functionality on I, and Zmmul instruction sets. There is partial support for the reduced functionality E instruction set where this is easy to do and doesn't involve a performance tradeoff.

These routines are designed to be both concise and efficient, although where those two goals clash, I went with concise.

Division

On 32-bit processors

32-bit by 32-bit signed and unsigned division with 32-bit result and remainder. (unsigned division is RV32E compatible)
Fast 32-bit unsigned division by 3
Fast 32-bit unsigned division by 5
Fast 32-bit unsigned division by 6
Fast 32-bit unsigned division by 10

On 64-bit processors

64-bit by 64-bit signed and unsigned division with 64-bit result and remainder. (unsigned division is RV64E compatible)
Fast 64-bit unsigned division by 3
Fast 64-bit unsigned division by 5
Fast 64-bit unsigned division by 6
Fast 64-bit unsigned division by 10

Multiplication

On 32-bit processors:

32-bit by 32-bit signed and unsigned multiplication with 32-bit result.
32-bit by 32-bit signed and unsigned multiplication with 64-bit result.

on 64-bit processors:

32-bit by 32-bit signed and unsigned multiplication with 64-bit result.
64-bit by 64-bit signed and unsigned multiplication with 64-bit result.
64-bit by 64-bit signed and unsigned multiplication with 128-bit result.

Base Conversions & I/O Operations

These operations support 32-bit numbers on 32-bit architectures and 64-bit numbers on 64-bit architectures.

ASCII binary to binary. (RV32E compatible)
ASCII unsigned decimal to binary. (RV32E compatible)
ASCII signed decimal to two's complement binary. (RV32E compatible)
ASCII hexadecimal to binary. (RV32E compatible)
binary to ASCII binary. (RV32E compatible)
two's complement binary to ASCII signed decimal. (RV32E compatible)
unsigned binary to unsigned ASCII decimal. (RV32E compatible)
binary to ASCII hexadecimal. (RV32E compatible)

Square Root

32-bit integer square root on 32-bit processors.
64-bit integer square root on 64-bit processors.

Greatest Common Divisor

32-bit GCD of two unsigned 32-bit numbers on 32-bit processors.
64-bit GCD of two unsigned 64-bit numbers on 64-bit processors.

Least Common Multiple

32-bit LCM of two unsigned 32-bit numbers on 32-bit processors.
64-bit LCM of two unsigned 64-bit numbers on 64-bit processors.

Bit Operations

Count leading zeroes in 32-bit number on 32-bit processors. (RV32E compatible)
Count leading zeroes in 64-bit number on 64-bit processors. (RV64E compatible)
Count trailing zeroes in 32-bit number on 32-bit processors. (RV32E compatible)
Count trailing zeroes in 64-bit number on 64-bit processors. (RV64E compatible)

Hash Table

Double hashing collision resolution
Linear probing with tombstone deletion
Configurable table size and entry format
O(1) average case insert, retrieve, remove operations
Rehashing to optimize probe sequences
Iterator interface for traversing entries

Building

clang, lld, and make are assumed. I'm currently using clang 18. The TARGET, ARCH, and ABI should be edited in the Makefile, and the CPU_BITS constant should be set to match in config.s

A static library, librvint.a, will be created to link against.

Only certain routines support RV32E and RV64E. The routines that do not currently give compile errors; this will be fixed in the future. RV64E is not a focus as I am unaware of any real world implementations. The Makefile has a CH32V003 target that is an RV32EC_zicsr target.

Tests

The test programs assume Linux syscalls are available. In particular syscall 64 is write and 93 is exit.

Running

I use this emulator to run RV32I code: https://riscv-programming.org/ale/
I'm running RV64 code on a scaleway elastic metal cloud instance: https://labs.scaleway.com/en/em-rv1/

API

Bit manipulation - bits.s:


################################################################################
# routine: bits_ctz
#
# Count the number of trailing zeroes in a number via binary search - O(log n).
# This is useful for processors with no B extension. This routine provides the
# functionality of the ctz instruction (on 32-bit processors) and ctzw (on
# 64-bit processors).
# RV32E compatible.
#
# input registers:
# a0 = number
#
# output registers:
# a0 = result containing the number of trailing zeroes
################################################################################

################################################################################
# routine: bits_clz
#
# Count the number of leading zeroes in a number via binary search - O(log n).
# This is useful for processors with no B extension. This routine provides the
# functionality of the clz instruction (on 32-bit processors) and clzw (on
# 64-bit processors).
# RV32E compatible.
#
# input registers:
# a0 = number
#
# output registers:
# a0 = result containing the number of leading zeroes in the input
################################################################################

Division - div.s:

################################################################################
# routine: divremu
#
# Unsigned integer division without using M extension.
# RV32E compatible.
# This division is 64-bit on 64-bit CPUs and 32-bit on 32-bit CPUs.
# It uses the restoring division algorithm. It can be used to emulate
# the RISC-V M extension divu, remu, divuw, and remuw instructions.
#
# input registers:
# a0 = dividend
# a1 = divisor
#
# output registers:
# a0 = quotient
# a1 = remainder
################################################################################

################################################################################
# routine: divrem
#
# Signed integer division - rounds towards zero.
# This division is 64-bit on 64-bit CPUs and 32-bit on 32-bit CPUs.
# It uses the restoring division algorithm. It can be used to emulate
# the RISC-V M extension div, rem, divw, and remw instructions.
#
# input registers:
# a0 = dividend (N)
# a1 = divisor (D)
#
# output registers:
# a0 = quotient (Q)
# a1 = remainder (R)
################################################################################

################################################################################
# routine: div3u
#
# Unsigned fast division by 3 without using M extension.
# This routine is 64-bit on 64-bit CPUs and 32-bit on 32-bit CPUs.
# It uses a fast multiply/shift/add/correct algorithm.
# Also suitable for use on RV32E architectures.
#
# input registers:
# a0 = unsigned dividend (32 or 64 bits)
#
# output registers:
# a0 = quotient (unsigned)
################################################################################

################################################################################
# routine: div5u
#
# Unsigned fast division by 5 without using M extension.
# This routine is 64-bit on 64-bit CPUs and 32-bit on 32-bit CPUs.
# It uses a fast multiply/shift/add/correct algorithm.
# Suitable for use on RV32E architectures.
#
# input registers:
# a0 = unsigned dividend (32 or 64 bits)
#
# output registers:
# a0 = quotient (unsigned)
################################################################################

################################################################################
# routine: div6u
#
# Unsigned fast division by 6 without using M extension.
# This routine is 64-bit on 64-bit CPUs and 32-bit on 32-bit CPUs.
# It uses a fast multiply/shift/add/correct algorithm.
# Suitable for use on RV32E architectures.
#
# input registers:
# a0 = unsigned dividend (32 or 64 bits)
#
# output registers:
# a0 = quotient (unsigned)
################################################################################

################################################################################
# routine: div10u
#
# Unsigned fast division by 10 without using M extension.
# This routine is 64-bit on 64-bit CPUs and 32-bit on 32-bit CPUs.
# It uses a fast multiply/shift/add/correct algorithm.
# Suitable for use on RV32E architectures.
#
# input registers:
# a0 = unsigned dividend (32 or 64 bits)
#
# output registers:
# a0 = quotient (unsigned)
################################################################################

Algorithms: gcd.s

################################################################################
# routine: gcd
#
# Compute the greatest common divisor (gcd) of two unsigned numbers.
# 64 bit algorithm on 64-bit CPUs, 32-bit algorithm on 32-bit CPUs.
#
# input registers:
# a0 = first number (u)
# a1 = second number (v)
#
# output registers:
# a0 = gcd(u, v)
################################################################################

################################################################################
# routine: lcm
#
# Compute the least common multiple (lcm) of two unsigned numbers.
# 64 bit algorithm on 64-bit CPUs, 32-bit algorithm on 32-bit CPUs.
#
# input registers:
# a0 = u
# a1 = v
#
# output registers:
# a0 = lcm(u,v)
################################################################################

I/O and Base Conversion io.s

################################################################################
# routine: to_hex
#
# Convert a value in a register to an ASCII hexadecimal string.
# RV32E compatible
#
# input registers:
# a0 = number to convert to ascii hex
# a1 = number of bytes to convert (eg, 1, 2, 4, 8)
# a2 = 0 do not insert leading 0x, 1 insert leading 0x
#
# output registers:
# a0 = address of nul (\0)-terminated buffer with output
# a1 = length of string
################################################################################

################################################################################
# routine: to_bin
#
# Convert a value in a register to an ASCII binary string.
# RV32E compatible.
#
# input registers:
# a0 = number to convert to ascii binary
# a1 = number of bytes to convert (eg 1, 2, 4, 8)
# a2 = 0 do not insert spaces every 8 bits, 1 insert spaces every 8 bits
#
# output registers:
# a0 = address of nul (\0)-terminated buffer with output
# a1 = length of string
################################################################################

################################################################################
# routine: to_decu
#
# Convert a value in a register to an unsigned ASCII decimal string.
# RV32E compatible
#
# input registers:
# a0 = unsigned number to convert to ascii unsigned decimal
#
# output registers:
# a0 = address of nul-terminated (\0) buffer with output
# a1 = length of string
################################################################################

################################################################################
# routine: to_dec
#
# Convert a value in a register to a signed ASCII decimal string.
#
# input registers:
# a0 = signed number to convert to ascii signed decimal
#
# output registers:
# a0 = address of nul-terminated (\0) buffer with output
# a1 = length of string
################################################################################

################################################################################
# routine: from_hex
#
# Read an ASCII hexidecimal string into a register. The parsing of the value
# stops when we read the first non-hex character.
# RV32E compatible.
#
# input registers:
# a0 = pointer to number to convert from hex, terminated with non-hex char.
#
# output registers:
# a0 = pointer (advanced to point to non-hex char)
# a1 = number
# a2 = error check: 0 if no digits found, otherwise 1
################################################################################

################################################################################
# routine: from_bin
#
# Read an ASCII binary string into a register. The parsing of the value
# stops when we read the first non-binary character.
# RV32E compatible.
#
# input registers:
# a0 = pointer to number to convert from binary, terminated with non-binary char.
#
# output registers:
# a0 = pointer (advanced to point to non-binary char)
# a1 = number
# a2 = error check: 0 if no digits found, otherwise 1
################################################################################

################################################################################
# routine: from_decu
#
# Read an ASCII unsigned decimal string into a register. The parsing of the value
# stops when we read the first non-decimal character.
# RV32E compatible.
#
# input registers:
# a0 = pointer to number to convert from decimal, terminated with non-decimal char.
#
# output registers:
# a0 = pointer (advanced to point to non-decimal char)
# a1 = number
# a2 = error check: 0 if no digits found, otherwise 1
################################################################################

################################################################################
# routine: from_dec
#
# Read an ASCII signed decimal string into a register. The parsing of the value
# stops when we read the first non-decimal character.
# RV32E compatible.
#
# input registers:
# a0 = pointer to number to convert from decimal, terminated with non-decimal char.
#
# output registers:
# a0 = pointer (advanced to point to non-decimal char)
# a1 = number
# a2 = error check: 0 if no digits found, otherwise 1
################################################################################

Multiplication mul.s

################################################################################
# routine: nmul
#
# Native word length (64 on 64-bit processors, 32 on 32-bit processors)
# multiplication via shift-and-add technique.
#
# - RV32I: 32x32 -> 32 bit result (signed or unsigned)
# - RV64I: 64x64 -> 64 bit result (signed or unsigned)
# Implemented using only RV32I / RV64I base instructions (No 'M' Extension).
# This provides the functionality of the M extension mul/mulu/mulw instructions
#
# input registers:
# a0 = CPU_BITS-bit multiplicand
# a1 = CPU_BITS-bit multiplier
#
# output registers:
# a0 = CPU_BITS-bit product (lower bits)
################################################################################

################################################################################
# routine: mul32
#
# Unified (RV64I and RV32I) Unsigned/Signed 32x32-bit to 64-bit multiply. This
# provides the functionality of the M extension mul/mulh instructions
#
# input registers:
# a0 = op1
# a1 = op2
# a2 = signed_flag: 0=unsigned, 1=signed
#
# output registers:
# a0 = product low word
# a1 = product high word
################################################################################

################################################################################
# routine: m128
#
# 64x64-bit to 128-bit Multiplication (Signed/Unsigned) on 64-bit processors.
# This provides the functionality of the mulhu/mulhsu instructions on RV64.
#
# input registers:
# a0 = 64-bit Operand 1 (multiplicand)
# a1 = 64-bit Operand 2 (multiplier)
# a2 = Signedness flag (0 for unsigned, non-zero for signed)
#
# output registers:
# a0 = Lower 64 bits of the 128-bit product
# a1 = Upper 64 bits of the 128-bit product
################################################################################

Square Root sqrt.s

################################################################################
# routine: isqrt
#
# Compute the integer square root of an unsigned number - floor(sqrt(N)).
# Algorithm: Non-restoring binary square root. On 64-bit processors this is
# a 64-bit algorithm, on 32-bit, it is 32-bits.
#
# input registers:
# a0 = n
# output registers:
# a0 = root - isqrt(n)
################################################################################

Hash Table hash.s

NOTE: Hash table is currently only partly implemented

################################################################################
# Hash Table Implementation
#
# A doubly-hashed open addressing hash table optimized for RISC-V.
# Features:
# - Configurable table size (default prime number 103)
# - Double hashing for collision resolution
# - Tombstone-based deletion for maintaining probe sequences
# - In-place rehashing without extra memory allocation
# - Support for both 32-bit and 64-bit architectures
# - Optimized probe sequence using conditional subtraction
# - No M-extension dependency (no hardware multiply/divide)
#
# Key functions:
# - Insertion, retrieval, and removal of key-value pairs
# - Iteration over entries
# - Memory-efficient rehashing
################################################################################

################################################################################
# routine: hash_insert
#
# Insert a key-value pair into the hash table using double hashing.
# Probes until an empty slot or tombstone is found. Uses double hashing
# for collision resolution with h1 for initial position and h2 for step size.
#
# The probing sequence is optimized to avoid expensive division operations:
# 1. Initial position = h1(key) * ELEMENTLEN
# 2. Step size = h2(key) (pre-scaled by ELEMENTLEN)
# 3. Each probe: new_pos = current_pos + step_size
# 4. Wrap using conditional subtraction since new_pos < 2 * table_size
#
# input registers:
# a0 = pointer to key string
# a1 = value to insert
#
# output registers:
# a0 = value if successful, 0 if table is full
################################################################################

################################################################################
# routine: hash_retrieve
#
# Retrieve a value from the hash table by key.
# Uses double hashing to probe until key is found or empty slot is reached.
# Handles both in-use and tombstone slots during probing.
#
# Uses the same optimized probing sequence as hash_insert.
#
# input registers:
# a0 = pointer to key string to look up
#
# output registers:
# a0 = value if found, 0 if not found
################################################################################

################################################################################
# routine: hash_remove
#
# Remove an entry from the hash table by key.
# Uses double hashing to find the entry, then marks it as a tombstone
# rather than completely clearing it to maintain probe sequences.
#
# Uses the same optimized probing sequence as hash_insert.
#
# input registers:
# a0 = pointer to key string to remove
#
# output registers:
# a0 = value that was removed if found, 0 if not found
################################################################################

################################################################################
# routine: hash_rehash
#
# Rebuild table to remove tombstones and optimize probe sequences.
# Uses in-place algorithm to avoid extra memory allocation.
#
# Uses the same optimized probing sequence as hash_insert.
#
# input registers:
# none
#
# output registers:
# a0 = number of entries rehashed
################################################################################

The hash table implementation provides an efficient key-value store optimized for RISC-V architectures. It uses double hashing for collision resolution, which provides better clustering properties than linear or quadratic probing. The implementation is designed to work efficiently in bare metal environments by avoiding dynamic memory allocation and expensive operations.

Key features:

O(1) average case for insertions, retrievals and deletions
Configurable table size (default 103)
Support for both 32-bit and 64-bit architectures
Memory-efficient in-place rehashing
Maintains probe chain integrity using tombstone deletion
Provides iteration capability over entries
Optimized probe sequence without expensive division operations
No dependency on M-extension (hardware multiply/divide)

The implementation uses two hash functions:

h1(key) - Primary hash for initial probe position
h2(key) - Secondary hash for probe step size (pre-scaled by ELEMENTLEN)

The probing sequence is optimized to avoid expensive division operations:

Initial position = h1(key) * ELEMENTLEN (using shift-and-add)
Step size = h2(key) (pre-scaled by ELEMENTLEN)
Each probe: new_pos = current_pos + step_size
Wrap using conditional subtraction since new_pos < 2 * table_size

Deletion is handled using tombstones to maintain probe sequences, with periodic rehashing available to reclaim space and optimize probe lengths. The rehashing algorithm is designed to work in-place without requiring additional memory allocation, making it suitable for bare metal environments.

Name		Name	Last commit message	Last commit date
Latest commit History 199 Commits
src		src
LICENSE		LICENSE
README.md		README.md
rvint.png		rvint.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

rvint

Division

On 32-bit processors

On 64-bit processors

Multiplication

On 32-bit processors:

on 64-bit processors:

Base Conversions & I/O Operations

Square Root

Greatest Common Divisor

Least Common Multiple

Bit Operations

Hash Table

Building

Tests

Running

API

Bit manipulation - bits.s:

Division - div.s:

Algorithms: gcd.s

I/O and Base Conversion io.s

Multiplication mul.s

Square Root sqrt.s

Hash Table hash.s

About

Uh oh!

Releases

Packages

Languages

License

benmesander/rvint

Folders and files

Latest commit

History

Repository files navigation

rvint

Division

On 32-bit processors

On 64-bit processors

Multiplication

On 32-bit processors:

on 64-bit processors:

Base Conversions & I/O Operations

Square Root

Greatest Common Divisor

Least Common Multiple

Bit Operations

Hash Table

Building

Tests

Running

API

Bit manipulation - bits.s:

Division - div.s:

Algorithms: gcd.s

I/O and Base Conversion io.s

Multiplication mul.s

Square Root sqrt.s

Hash Table hash.s

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages