Conversation
miscco
left a comment
This is not using SIMD on the host, is there any reason for that?
First, because this is the first PR. Second, because we care more about GPU than CPU. Third, the feature is also experimental in other standard libraries.
mhoemmen
left a comment
Per offline discussion, Federico will first update to the latest draft N5032 ( https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/n5032.pdf ).
```cpp
#include <cuda/std/__cccl/prologue.h>

namespace cuda::experimental::datapar
```
WG21 adopted P3691R1 at the June 2025 Sofia meeting. This renamed the namespace to `simd` and renamed the data types `basic_mask`/`mask` and `basic_vec`/`vec`.
```cpp
namespace cuda::experimental::datapar
{
namespace simd_abi
```
[simd] does not declare any public namespaces other than std::simd.
```cpp
template <int _Np>
using fixed_size = __fixed_size<_Np>;

template <typename>
using compatible = fixed_size<1>;

template <typename>
using native = fixed_size<1>;
```
native-abi is exposition-only (see [simd.expos.abi]). The others don't appear to be named at all in [simd].
```cpp
template <typename _Tp, typename _Abi>
class basic_simd;

template <typename _Tp, int _Np>
using simd = basic_simd<_Tp, simd_abi::fixed_size<_Np>>;
```
The synopsis declares `basic_vec` like this, where @_X_@ indicates italic code-font X (meaning that it's an exposition-only name):

```cpp
template<class T, class Abi = @_native-abi_@<T>> class basic_vec;
template<class T, @_simd-size-type_@ N = @_simd-size-v_@<T, @_native-abi_@<T>>>
  using vec = basic_vec<T, @_deduce-abi-t_@<T, N>>;
```
😬 CI Workflow Results
🟥 Finished in 16m 54s: Pass: 12%/48 | Total: 5h 17m | Max: 16m 27s | Hits: 99%/1356
Motivations
Modern GPU architectures increasingly expose fine-grained, single-thread SIMD capabilities to maximize throughput within individual CUDA threads. While the GPU programming model strongly focuses on the SIMT model, newer hardware relies on specialized SIMD operations to saturate execution units. Some examples include:

- `int16_t` SIMD instructions DPX.
- `FADDx2`, `FMULx2`, `FMAx2`, `Bfloat16x2` and `Halfx2` intrinsics.
- `IADD3`.
- `__dp4a`.
- `vabsdiff4`.

C++26 `std::simd` provides a standardized abstraction to write vectorized code. This is a great opportunity to unify the customized code that handles all these variants and to reduce CUDA software fragmentation. By adopting a `std::simd`-like API, developers can write a single vectorized kernel that compiles to the optimal instructions for any GPU architecture.

PR Goals and Non-Goals
The PR aims to provide a basic implementation of `std::simd` and the foundation for future optimizations and extensions.

Advanced math and bit operations, e.g. `std::abs`, `std::pow`, `std::popcount`, etc., as well as `std::complex` binding, are outside the scope of the first PR.

Non-Goals:

- `std::simd`.

Implementation Notes
The implementation is based on the LLVM code in experimental/__simd, extended to support the related C++ proposals:

Some optimizations are already exploited in the CCCL code, for example thread_simd.h and thread_reduce.h. They will be gradually added to the implementation.
Partially addresses #30