[FEA] Add support for `stcs` and `ldcs` intrinsics #42

gmarkall · 2024-08-09T15:33:58Z

This would satisfy use cases that need streaming loads and stores, i.e. the PTX `.cs` cache operator for data that is likely to be accessed only once.

gmarkall · 2024-08-09T15:35:52Z

A quick prototype / proof-of-concept:

from llvmlite import ir
from numba import config, cuda, types
from numba.core import cgutils
from numba.core.extending import intrinsic
from numba.core.errors import NumbaTypeError

import numpy as np

# Dump the generated assembly so the inline asm can be inspected
config.DUMP_ASSEMBLY = True


@intrinsic
def ldcs(typingctx, base):
    if not isinstance(base, types.Array) or base.dtype != types.float16:
        msg = f"ldcs operates on float16 arrays. Got type {base}"
        raise NumbaTypeError(msg)
    signature = types.float16(base)

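    def codegen(context, builder, sig, args):
        # The asm works on raw 16-bit values, so use i16 for the result
        # and for the pointer type
        int16 = ir.IntType(16)
        int16_ptr = int16.as_pointer()
        ldcs_type = ir.FunctionType(int16, [int16_ptr])
        # "=h" is a 16-bit output register; "l" is the 64-bit address input
        ldcs = ir.InlineAsm(ldcs_type, "ld.global.cs.b16 $0, [$1];", "=h, l")

        # Pull the data pointer out of the array struct
        base = cgutils.create_struct_proxy(sig.args[0])(context, builder,
                                                        value=args[0]).data
        return builder.call(ldcs, [base])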

    return signature, codegen


@cuda.jit
def f(r, x):
    r[0] = ldcs(x)


x = cuda.device_array(1, np.float16)
r = cuda.device_array(1, np.float16)
f[1, 1](r, x)

which produces the following PTX:

{
        ...
	cvta.to.global.u64 	%rd3, %rd2;
	// begin inline asm
	ld.global.cs.b16 %rs1, [%rd1];
	// end inline asm
	st.global.u16 	[%rd3], %rs1;
	ret;

}

The API needs to accept an index into the array, as the atomics do, rather than always accessing the first element of the passed array.
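The prototype above only covers a 16-bit load. As a rough sketch of how the asm template and constraint string might generalise to the store and to other widths, here is a small pure-Python helper. `cs_asm` is a hypothetical name for illustration, not part of Numba, and the address constraint assumes 64-bit pointers.

```python
# Hypothetical helper, not part of Numba: builds the PTX inline-asm template
# and LLVM constraint string for a streaming (.cs) global load or store.

# NVPTX inline-asm register constraints by operand width in bits:
# "h" = 16-bit, "r" = 32-bit, "l" = 64-bit integer registers.
_REG_CONSTRAINT = {16: "h", 32: "r", 64: "l"}


def cs_asm(op, bitwidth):
    """Return (asm_template, constraints) for op in {"ld", "st"}."""
    if bitwidth not in _REG_CONSTRAINT:
        raise ValueError(f"unsupported bit width: {bitwidth}")
    reg = _REG_CONSTRAINT[bitwidth]
    if op == "ld":
        # $0 is the loaded value (output); $1 is the 64-bit address.
        return f"ld.global.cs.b{bitwidth} $0, [$1];", f"={reg},l"
    if op == "st":
        # No outputs: $0 is the 64-bit address; $1 is the value to store.
        return f"st.global.cs.b{bitwidth} [$0], $1;", f"l,{reg}"
    raise ValueError(f"unknown op: {op!r}")


print(cs_asm("ld", 16))
print(cs_asm("st", 32))
```

The two returned strings would be passed as the second and third arguments to `ir.InlineAsm`. For the store, the inline asm would probably also need `side_effect=True` so that it is not optimised away, since it has no output; that part is untested here.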

gmarkall added the `feature request` label (New feature or request) on Aug 9, 2024

gmarkall added this to the v0.0.19 milestone Oct 21, 2024

gmarkall modified the milestones: v0.0.20, v0.0.21, v0.0.22 Dec 4, 2024
