Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify requirements for types in device code #733

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

Pennycook
Copy link
Contributor

The goal of this PR is to clarify:

  • Which types are guaranteed to be available in device code; and
  • What guarantees exist for consistency of types across host and device.

As part of this change I removed the table of fundamental types, which duplicated a lot of information defined in C++. Just saying that we support the types with the same meaning as C++ seems simpler. I kept the list of types, but long-term I'd prefer to be able to say that SYCL supports all the C++ fundamental types and leave it at that.

Closes #372 and closes #373.


I can't assign this to you, @tahonermann, but since you originally opened these issues I'd appreciate it if you could take a look at this proposed fix.

I've opened it as a draft because it currently doesn't build, due to a broken reference to the table I removed. That reference will ve removed if we merge #730, though, so I wanted to open a PR and get some feedback.

@Pennycook Pennycook added the editorial Some purely editorial problem label Feb 18, 2025
@Pennycook Pennycook requested review from nliber and gmlueck February 18, 2025 15:48
@nliber
Copy link
Collaborator

nliber commented Feb 18, 2025

Still exhausted from the WG21 meeting, so hopefully this is coherent:

If a type is an extended integer type, is that guaranteed to be supported on both the host and device? Same question goes for extended floating point types.

Copy link

@tahonermann tahonermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @Pennycook. Please see inline comments.

Comment on lines +235 to +236
All types which are available in device code must have the same size, alignment
requirements, and representation on both host and device.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think enumeration types should be addressed in this section as well. There is an implementation requirement as well as a user requirement. For enumerations with a fixed underlying type (e.g., enum X : char), there should be a requirement that the same fixed type be specified for host and device code. Similarly, for enumerations without a fixed underlying type, there should be a requirement that the implementation select the same underlying type for host and device code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me. I've added wording very similar to what you wrote here in f4373e2.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updates for enumeration types looks good to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unresolving this conversation. It seems confusing to me to say that applications are required to choose the same underlying type for enumerations. Why state this specifically when there are other things an application could do that cause types to be different on host and device? For example, if an application defines members of a struct to be different on host and device, you would have a similar problem.

If we feel we need to say something about this, perhaps we could say it more generally like:

Applications that define types to be have different size, alignment requirement, or representation on host vs. device via the __SYCL_DEVICE_ONLY__ macro have undefined behavior.

Or maybe it should be "... implementation defined behavior"?

I think it is useful to specifically list __SYCL_DEVICE_ONLY__ here because that is the only way that an application possibly could define types differently. At least I can't think of any other way. You might point out that there are other macros like this in DPC++, but those are vendor extensions and are therefore out of scope of this spec.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is reasonable to relax the requirement that (user specified) fixed underlying types must match so long as device copyability rules retain a requirement for the same (or at least a compatible) fixed underlying type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a general note like the one @gmlueck suggests would be useful for application developers, but I didn't think that was what @tahonermann was talking about here.

The wording I added in f4373e2 is in the "Device compiler" section, and I didn't intend it to be advice for application developers. The intent was to clarify that the implementation must guarantee that enumerations have matching types. The first sentence just says that enumerations with different fixed underlying types are invalid -- that means a device compiler (and more broadly, the implementation) shouldn't introduce any of those. The second sentence is probably the more important one, because it means that a device compiler may require specific logic to ensure that it selects the same underlying type as the host compiler if an enumeration has no fixed underlying type.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With regard to explicit fixed types, there is both an implementation and a requirement on application developers.

I expect device code restrictions to be relaxed over time to allow more use of the C and C++ standard libraries in device code. Device code can already link with device-only TUs via SYCL_EXTERNAL. Maintaining a single C and C++ standard library that works for both host and device code isn't particularly realistic, especially when the host implementations are provided by, e.g., GNU C, libstdc++, Microsoft, etc... This increases the likelihood that enumeration types defined in these libraries will be differently specified for the host and the device. This imposes a requirement on the SYCL implementation to ensure such enumeration types are consistently defined if objects of those types might transition the host/device divide. We can either require all such types to be consistently defined, or we can require specific ones to be consistently defined so that those specific types satisfy device copyability rules, or we can say its all implementation-defined and code that uses such types is non-portable.

Likewise, user-defined enumeration types must be consistently defined to be copied between the host and device. For those cases, we can require all such types to be consistently defined, or we can place the requirement in the device copyable restrictions (with or without a compatible size/alignment allowance and corresponding std::underlying_type observability).

@Pennycook
Copy link
Contributor Author

If a type is an extended integer type, is that guaranteed to be supported on both the host and device? Same question goes for extended floating point types.

I'm not very familiar with these extended types.

Wouldn't this be very difficult for us to guarantee? It looks like these types are implementation defined, so there might be types that are defined by "a C++ implementation" (i.e., a standard library or host compiler) but that can't be made compatible with "a SYCL implementation" (i.e., a device compiler) or a specific device.

Could we say something to the effect of: "Whether extended integer and/or extended floating-point types are available in device code is implementation-defined. However, if these types are available, the size, alignment, etc must match the host."

@tahonermann
Copy link

Could we say something to the effect of: "Whether extended integer and/or extended floating-point types are available in device code is implementation-defined. However, if these types are available, the size, alignment, etc must match the host."

That makes sense to me.

A list of types that cannot be used in device code is clearer than what we had
before, and will help to ensure that we give due consideration to additional
types introduced in future versions of ISO C++.
These are implementation-defined, so whether they are available in device code
should also be implementation-defined.
This gives us a place to talk about things like:

- Which C++ standard library features (e.g., type aliases) are guaranteed to
  work on the device; and

- Whether there are any additional restrictions on C++ standard library
  behavior.
There is only one long double type.

Co-authored-by: Tom Honermann <[email protected]>
@TApplencourt
Copy link
Contributor

Just to add a useless grain of salt, a few of our users are already using long-double on GPU. They just use -mlong-double-64 and hope for the best...

So I like the

Could we say something to the effect of: "Whether extended integer and/or extended floating-point types are available in device code is implementation-defined. However, if these types are available, the size, alignment, etc must match the host."

@Pennycook
Copy link
Contributor Author

Just to add a useless grain of salt, a few of our users are already using long-double on GPU. They just use -mlong-double-64 and hope for the best...

This is entirely valid under "undefined behavior". If an implementation wants to say that it supports long double, it can. If an implementation wants to say nothing but it happens to work, that's fine.

The reason I'm suggesting we don't go with "implementation-defined" is that all existing implementations would then have to update their documentation to say something about long double. We'd then merge a PR (hopefully soon) which adds something like aspect::fp128 which officially adds a way to query for long double support, and implementations would have to update their documentation again.

@TApplencourt
Copy link
Contributor

TApplencourt commented Feb 24, 2025

Oh yes, sorry, It was not clear. It was 100% comment to say that I liked your clarification of new wording to use "undefined behavior"!

This is nitpicking, but as you know, IEEE doesn't require long double to be 128 bits (but users can always check their long double size on the host and then check to correct aspect on the GPU, and if the device has it be happy because that mean (by some SYCL rules that I'm sure exist but I don't know -- triviality copyable?) they can use long-double on the GPU.

@nliber
Copy link
Collaborator

nliber commented Feb 24, 2025

It would be nice in the future if we could narrow down the possible behaviors. I can think of three possibilities:

  • Works
  • Doesn't compile
  • Random results

Is random results really a possibility?

Copy link

@tahonermann tahonermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, so I'm giving it my stamp of approval for whatever that is worth. Thanks @Pennycook!

@Pennycook
Copy link
Contributor Author

It would be nice in the future if we could narrow down the possible behaviors. I can think of three possibilities:

  • Works
  • Doesn't compile
  • Random results

Is random results really a possibility?

I might be wrong about this, but I convinced myself that random results could really be a possibility with existing SYCL 2020 implementations.

Prior to this change, we didn't say anything about long double at all, which means that there was no requirement for implementations to diagnose use of long double or to ensure that the host and device compilers agreed on what long double means. Some compilers provide options to control what long double means (e.g., clang provides -mlong-double-128, -mlong-double-64 and -mlong-double-80), so I'm worried it would be very easy to get in a situation where the host and device disagree about sizeof(long double) and things fail spectacularly.

I agree that we don't want this to be the behavior forever, but I think the right way to fix things will be via a KHR or SYCL-Next feature. There are a few things to work out (e.g., whether we'd need separate aspects for fp80 and fp128, how to query which is used if a device supports both...)

Comment on lines +235 to +236
All types which are available in device code must have the same size, alignment
requirements, and representation on both host and device.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unresolving this conversation. It seems confusing to me to say that applications are required to choose the same underlying type for enumerations. Why state this specifically when there are other things an application could do that cause types to be different on host and device? For example, if an application defines members of a struct to be different on host and device, you would have a similar problem.

If we feel we need to say something about this, perhaps we could say it more generally like:

Applications that define types to be have different size, alignment requirement, or representation on host vs. device via the __SYCL_DEVICE_ONLY__ macro have undefined behavior.

Or maybe it should be "... implementation defined behavior"?

I think it is useful to specifically list __SYCL_DEVICE_ONLY__ here because that is the only way that an application possibly could define types differently. At least I can't think of any other way. You might point out that there are other macros like this in DPC++, but those are vendor extensions and are therefore out of scope of this spec.

@@ -204,6 +204,7 @@ Amongst other things, this restriction makes it illegal for a
However, a function may be defined in another translation unit if the
implementation defines the [code]#SYCL_EXTERNAL# macro as described in
<<subsec:syclexternal>>.
* Use of the [code]#long double# type results in undefined behavior.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I'd say:

Device code must not use the long double type.

Statements in this list are inconsistent. Some say that using a prohibited feature causes undefined behavior, while others say that application should not use the feature. It seems to me that we should consistently say that device code must not use these features.

You may argue that an extension might lift some of these restrictions. That's fine. An extension is allowed to lift any prohibition in the spec and assign some defined meaning. There is no need to use the "undefined behavior" wording to allow an extension to do this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had previously suggested that we address long double via #371. I think undefined behavior is the status quo.

Does "device code must not use these features" mean that such code is ill-formed? I think ISO standards reserve "must" for use in implication (A must be B because of C). I understand that other specifications use the term to specify requirements (e.g., RFC 2119), but I think its use can be confusing. I think "shall" communicates the intent better.

It wouldn't be a bad thing for the SYCL specification to explicitly state its requirement terminology; perhaps via reference to RFC 2119 or similar.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had previously suggested that we address long double via #371. I think undefined behavior is the status quo.

I still like the direction proposed in #371 -- that is that we make long double an optional device feature that is tied to some new aspect. However, that is probably something that needs to wait for SYCL-Next.

I think the status quo in DPC++ is that use of long double in device code is ill-formed. I think the compiler diagnoses a compile-time error for this. (Sorry, haven't checked lately, but this is my memory.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the status quo in DPC++ is that use of long double in device code is ill-formed. I think the compiler diagnoses a compile-time error for this. (Sorry, haven't checked lately, but this is my memory.)

I was curious about this so I just went to check.

By default, DPC++ throws an error because long double defaults to 128-bit, which isn't supported in SPIR-V:

test.cpp:10:51: error: 'max' requires 128 bit size 'long double' type support, but target 'spir64-unknown-unknown' does not support it
   10 |         ld[0] = std::numeric_limits<long double>::max();

You can circumvent that check by adding the -mlong-double-64 flag. In other words, DPC++ doesn't check for long double, it just fails if you try to use 128- or 80-bit data types.

SimSYCL also doesn't do any checks for use of long double, and because it's a header-only implementation I can't immediately think of a way they could add these checks. It's also worth noting that device code using long double always works in SimSYCL, because the code is running on the host CPU.

So I think the status quo is undefined behavior. Sometimes it's ill-formed, sometimes you can make it work (by accident), and sometimes it works.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can circumvent that check by adding the -mlong-double-64 flag. In other words, DPC++ doesn't check for long double, it just fails if you try to use 128- or 80-bit data types.

Yes, and some people are Argonne is using this flag in production

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused. What SPIR-V does DPC++ generate if you compile with -mlong-double-64?

It just treats every instance of long double as if it were double, so everything is generated using OpTypeFloat 64.

I'm not convinced it works properly, because all of the tests I've tried so far have returned nan. But the code compiles without producing any errors or warnings, and runs without producing any errors or warnings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see.

Yes, and some people are Argonne is using this flag in production

Why? Is it just because you have existing code that uses long double and it's easier to pass the flag than to rewrite the code to use double?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. It begs the question why it even works, and why they long double in the first place, but 🤷🏽

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps Argonne needs it because they are using user-defined literals (UDLs). A request for UDL support is what lead me to file #371 in the first place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, there may be a cleaner solution now that we have changed the spec to allow long double in manifestly constant-evaluated expressions. As I noted in #379, I think an application can define the the UDL operator"" as consteval, and this will cause the evaluation of UDL's to happen at compile time as a constant expression. SYCL will allow this even in device code because we lifted the restriction about long double (and all other "forbidden" C++ features) when they are used in manifestly constant-evaluated expressions.

Of course, consteval requires the application to compile in C++20 mode.

@@ -216,132 +217,25 @@ The restriction waiver in <<discarded-statement>> or
<<device-function>>.
====

[[subsec:scalartypes]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the latest version of the "main" branch has text that refers to this section identifier. (This was part of the recent vec clarification.) You will need to reword that new text somehow to avoid the reference.

@gmlueck gmlueck removed the editorial Some purely editorial problem label Feb 27, 2025

For enumerations with a fixed underlying type, the same fixed type must be
specified for host and device.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment here might have been confusing. I was referring to this sentence "For enumerations with a fixed underlying type, the same fixed type must be specified for host and device."

This sentence seems to be directed to application developers. If an enumeration has a fixed underlying type, it means the source code explicitly specifies the type. Therefore, there's no possibility that the device compiler might choose some other type -- it has to use the type specified in the application source code.

Therefore, I think this sentence must be telling application developers not to use __SYCL_DEVICE_ONLY__ to specify a different underlying type in host vs. device code. This seems weirdly specific because there are other ways an application can use that macro to cause types to have different representations on host vs. device.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see where you're coming from.

I think the issue I'm having is that I'm not sure where the device compiler ends and where the implementation begins. There is a lot of source code in headers that is not part of an application, and the device compiler may be responsible for providing the device-compatible parts of the SYCL headers. It was also recommended above that we should add a new "C++ standard library support" subsection to the "Device compiler" section, suggesting that it's the device compiler that handles any required mapping of C++ standard library features from host-compatible ones to device-compatible ones.

My logic was that if the device compiler might be (re)defining types (as part of the SYCL headers or additions to standard C++ headers), then unless we say something here an implementation may assume that it is safe to define device enums with a different underlying fixed type to the equivalent host enum.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should remove the chapter "SYCL device compiler" and merge it into "SYCL programming interface". The spec should not mandate that there is a distinct "device compiler". Some implementations (using multi pass compiler) might have a "device compiler", but other implementations might not.

Therefore, I think the spec should not try to draw a distinction between "device compiler" and "implementation". If an implementation has a device compiler, it's just part of the implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree 100%! I can never figure out where to introduce new sections in the specification, because I don't understand where the boundaries are drawn. Merging all the implementation stuff into one section would be great.

But without a boundary between the device compiler and implementation, I think what I said holds: the implementation includes some headers, and those headers might not (probably won't) be completely identical for the host and device. Do you agree that the specification needs to point out which parts of the headers are required to be identical? If so, why wouldn't we extend this to the underlying types of enums?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give an example of some specific situation you want to prevent by adding this wording to the spec? That would help me understand the purpose of this text.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps my recent response here would have been more appropriate as a reply to this comment.

The C++ standard doesn't specify many enumeration types for the standard library. Where the C++ standard does specify an enumeration and doesn't explicitly specify an underlying type, I think implementors are free to specify a fixed underlying type if they desire. That is one way that type definitions might diverge.

SYCL 2020 has exactly one match for the term "standard library" and that is a reference to the C++ standard library in a note about differences from SYCL 1.2.1. I guess it is currently undefined which portions of the C++ standard library may be used in device code. Some portions can be inferred as required to be available in device code based on usage in SYCL interfaces (e.g., std::vector). It seems like a lot of work would be required to specify what is actually required to be usable in device code.

A strong +1 for excising the term "device compiler" from the SYCL specification :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants