
Experimental std::simd #6732

Open: fbusato wants to merge 35 commits into NVIDIA:main from fbusato:experimental-simd

fbusato (Contributor) commented Nov 22, 2025

Motivations

Modern GPU architectures increasingly expose fine-grained, single-thread SIMD capabilities to maximize throughput within individual CUDA threads. While the GPU programming model focuses strongly on SIMT, newer hardware relies on specialized SIMD operations to saturate its execution units. Some examples include:

C++26 std::simd provides a standardized abstraction for writing vectorized code. This is an opportunity to unify the custom code paths that handle each hardware variant and to reduce CUDA software fragmentation. By adopting a std::simd-like API, developers can write a single vectorized kernel that compiles to the optimal instructions for each GPU architecture.

PR Goals and Non-Goals

This PR aims to provide a basic implementation of std::simd and to lay the foundation for future optimizations and extensions.

Advanced math and bit operations (e.g., std::abs, std::pow, std::popcount), as well as std::complex bindings, are outside the scope of this first PR.

Non-Goals:

  • Fully implement std::simd.
  • Implement custom ABIs to target host vector instructions.

Implementation Notes

The implementation is based on the LLVM libc++ code in experimental/__simd, extended to support the related C++ proposals:

Some optimizations are already used elsewhere in CCCL, for example in thread_simd.h and thread_reduce.h. They will gradually be added to this implementation.

Partially addresses #30

fbusato self-assigned this Nov 22, 2025
fbusato requested a review from a team as a code owner November 22, 2025 01:47
fbusato added the 3.2.0 label Nov 22, 2025
fbusato added this to CCCL Nov 22, 2025
fbusato requested a review from ericniebler November 22, 2025 01:47
github-project-automation bot moved this to Todo in CCCL Nov 22, 2025
cccl-authenticator-app bot moved this from Todo to In Review in CCCL Nov 22, 2025

miscco left a comment

This is not using SIMD on the host. Is there any reason for that?

fbusato (Contributor, Author) commented Nov 24, 2025

First, because this is the first PR. Second, because we care more about GPU than CPU. Third, the feature is also experimental in other standard library implementations.

mhoemmen left a comment

Per offline discussion, Federico will first update the implementation to the latest draft, N5032 (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/n5032.pdf).


#include <cuda/std/__cccl/prologue.h>

namespace cuda::experimental::datapar

WG21 adopted P3691R1 at the June 2025 Sofia meeting. This renamed the namespace to simd and renamed the data types to basic_mask/mask and basic_vec/vec.


namespace cuda::experimental::datapar
{
namespace simd_abi

[simd] does not declare any public namespaces other than std::simd.

Comment on lines +37 to +44
template <int _Np>
using fixed_size = __fixed_size<_Np>;

template <typename>
using compatible = fixed_size<1>;

template <typename>
using native = fixed_size<1>;

native-abi is exposition-only (see [simd.expos.abi]). The others don't appear to be named at all in [simd].

Comment on lines +59 to +63
template <typename _Tp, typename _Abi>
class basic_simd;

template <typename _Tp, int _Np>
using simd = basic_simd<_Tp, simd_abi::fixed_size<_Np>>;

The synopsis declares basic_vec like this, where @_X_@ indicates italic code-font X (meaning that it's an exposition-only name).

template<class T, class Abi = @_native-abi_@<T>> class basic_vec;
  template<class T, @_simd-size-type_@ N = @_simd-size-v_@<T, @_native-abi_@<T>>>
    using vec = basic_vec<T, @_deduce-abi-t_@<T, N>>;

github-project-automation bot moved this from In Review to In Progress in CCCL Dec 18, 2025
alliepiper removed the 3.2.0 label Jan 15, 2026
fbusato moved this from In Progress to Paused in CCCL Jan 27, 2026
fbusato moved this from Paused to In Progress in CCCL Mar 31, 2026
github-actions (Contributor)

😬 CI Workflow Results

🟥 Finished in 16m 54s: Pass: 12%/48 | Total: 5h 17m | Max: 16m 27s | Hits: 99%/1356

See results here.

fbusato (Contributor, Author) commented Mar 31, 2026

Labels

None yet

Projects

Status: In Progress

4 participants