Skip to content

Latest commit

 

History

History
142 lines (104 loc) · 4.28 KB

README.md

File metadata and controls

142 lines (104 loc) · 4.28 KB

unroll

Unroll for loops with integer literal bounds using an attribute-like macro.

On crates.io On docs.rs Build status

This crate provides a attribute-like macro unroll_for_loops that can be applied to functions. This macro looks for loops to unroll and unrolls them at compile time.

Why a procedural macro?

There is already a crate called Crunchy that unrolls loops with a macro_rules! macro. This crate provides an alternative solution for unrolling loops using an attribute-like macro instead. Some benefits of an item based attribute-like macro for unrolling loops are:

  • Macro invocation doesn't pollute the algorithm with annotations. Only one annotation is needed for each function.
  • Can use variables defined outside of the macro invocation.
  • Can be extended to do partial loop unrolling

Usage

Just add #[unroll_for_loops] above the function whose for loops you would like to unroll. Currently all for loops with integer literal bounds will be unrolled, although this macro currently can't see inside complex code (e.g. for loops within closures).

Examples

The following example demonstrates the kinds of code that can be unrolled automatically.

#[unroll_for_loops]
fn main() {
    println!("matrix = ");
    for i in 0..10 {
        for j in 0..10 {
            print!("({:?}, {:?})", i, j);
        }
        println!("");
    }
}

Good Uses

Of course not all code benefits from unrolling. For instance the code above is bound by I/O, and hence likely wont benefit from unrolling.

On the other hand, computational tasks, such as matrix multiplication, can benefit significantly as demonstrated by this simple example of a matrix-vector product:

#[unroll_for_loops]
fn mtx_vec_mul(mtx: &[[f64; 5]; 5], vec: &[f64; 5]) -> [f64; 5] {
    let mut out = [0.0; 5];
    for col in 0..5 {
        for row in 0..5 {
            out[row] += mtx[col][row] * vec[col];
        }
    }
    out
}

A quick benchmark of this code generated by criterion shows a 3x performance increase with loop unrolling:

Matrix-Vector Product/Ordinary
time:   [2.7254 ns 2.7657 ns 2.8153 ns]

Matrix-Vector Product/Unrolled
time:   [878.71 ps 882.96 ps 888.01 ps]

The relative performance difference can be visualized in the violin plot:

Violin plot for Matrix-Vector product benchmark

Bad Uses

There are also places where this macro has no benefit where one might be expected. For instance, when computing a sum of elements in a large array, one might break the computation into independent blocks to allow the CPU to perform out of order execution optimizations. For example the following code:

fn sum(data: &[f64]) -> f64 {
    let mut sum = 0.0;
    for i in 0..data.len() {
        sum += data[i];
    }
    sum
}

can get a substantial boost in performance when computed in chunks as:

fn sum(data: &[f64]) -> f64 {
    let mut sum = [0.0; 32];
    for i in (0..data.len()).step_by(32) {
        for j in 0..32 {
            sum[j] += data[i + j];
        }
    }
    sum.iter().sum()
}

However this example is unlikely to benefit from the unroll_for_loops macro. A benchmark for unrolling sums is included for verification.

Having said this, a macro that can partially unroll loops as shown above can be very useful and is left for future work.

Contributions are welcome!

Acknowledgements

I would like to thank the author of Crunchy for the initial great solution to this problem.

License

This repository is licensed under either of

at your option.