Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define new tag for atomic ABI #385

Merged
merged 2 commits into from
Sep 6, 2023
Merged

Define new tag for atomic ABI #385

merged 2 commits into from
Sep 6, 2023

Conversation

kito-cheng
Copy link
Collaborator

@kito-cheng kito-cheng commented May 24, 2023

Alternative version of #383

Defined 4 atomic abi variant.

  • unknown, nonatomic, A6C and A6S

Unknown is compatilbe with exising object file.
Nonatomic is used when A-extension is absent
A6C is the table A.6 mapping defined in ISA spec.
And the A6S is the mapping defined in psABI spec.

riscv-elf.adoc Outdated Show resolved Hide resolved
Copy link

@enh-google enh-google left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need a definition of "6c" and "6s" to go with the "tso" definition?

@jrtc27
Copy link
Collaborator

jrtc27 commented May 24, 2023

Do we want this in e_flags or do we want this in .riscv.attributes? TSO is there, but that's because it predates .riscv.attributes (and the existence of the psABI TG). I don't see a good reason why it needs to be in e_flags.

@kito-cheng
Copy link
Collaborator Author

Do we want this in e_flags or do we want this in .riscv.attributes? TSO is there, but that's because it predates .riscv.attributes (and the existence of the psABI TG). I don't see a good reason why it needs to be in e_flags.

Good question, the e_flags should be allocated if we want to check that easier/faster during the dynamic linker, but atomic ABI should be checked at the static linker stage, so yes, that could be using .riscv.attributes to handle that, I guess I intuitively used e_flags because Palmer used that, let me update that.

@kito-cheng
Copy link
Collaborator Author

@enh-google

do we need a definition of "6c" and "6s" to go with the "tso" definition?

I would like defer this to @hboehm, but I think we might need also include the TSO mapping into the psABI spec.

@jrtc27
Copy link
Collaborator

jrtc27 commented Jun 9, 2023

Why are you defining e_flags bits and an attribute? The whole point of the attribute is to not use up flags bits for this minor thing.

@kito-cheng kito-cheng changed the title Define atomic ABI field in e_flags Define new tag for atomic ABI Jun 9, 2023
@kito-cheng
Copy link
Collaborator Author

Changes:

  • Define new attribute rather than new flag in e_flags
  • Include one more atomic ABI: nonatomic, should be only used when A-extension is absent.

@kito-cheng
Copy link
Collaborator Author

@jrtc27 my bad, I am not intend to define both, but just git operation error, will fix soon

@kito-cheng
Copy link
Collaborator Author

Changes:

  • Remove e_flags part, which should remove when introduce attribute.

@hboehm
Copy link
Contributor

hboehm commented Jun 9, 2023

I'm not sure what NON_ATOMIC_ABI means. Possible interpretations:

  1. The source code uses no atomic operations. I would prefer to just claim compatibility in that case.
  2. Atomics are somehow implemented with kernel support, Linux armv5 style. I don't see how that could be made compatible with anything else. And we would have to define that ABI to make it compatible with itself.
  3. Atomics are implemented in terms of locks, which are implemented with just loads and stores an fences. Same comment as (2).
  4. The code only works in a single-threaded executable. I don't think it makes sense to define such an ABI anymore.

My preference would be just to have two bits, defined as follows, using roughly kito-cheng's terminology:

  1. This object is compatible with the A6 classical atomic mappings, as defined by table A.6 in <>.
    Uses mappings from that table, or mappings from <<Mappings from C/C++ primitives to RISC-V primitives>> that do not specify Note 3.
  2. This object is compatible with all mappings defined in <<Mappings from C/C++ primitives to RISC-V primitives>>. May use any mapping from that table.

UNKNOWN_ATOMIC_ABI would correspond to both bits clear. We would probably make that compatible with itself, though that's not guaranteed to be valid.

Any implementation that uses the recommended, non-Note-3 mappings would have both bits set, and would be compatible with anything other than UNKNOWN. An A.6 implementation would only set bit 1. A future A.7 implementation would set only bit 2.

In an ideal world, the linker check would be to and all bit pairs together, and check for a nonzero value, indicating that all objects are compatible with one of the two conventions. As a concession to practice, we would also accept all 0 "UNKNOWN" values, on the optimistic assumption that they all use the same "UNKNOWN" convention.

Interpretation (1) of NON_ATOMIC_ABI above would map to both bits (vacuously) set. The others would map to UNKNOWN (0).

I cannot think of a kind of system that would (a) really care about ABI compatibility, but (b) not support the A extension. Getting a conforming and practical C or C++ atomics implementation on a multicore processor without the A extension seems somewhere between very difficult and impossible.

Copy link
Contributor

@ilovepi ilovepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @kito-cheng . I've left a couple suggestions and a question about compatibility.

riscv-elf.adoc Outdated Show resolved Hide resolved
riscv-elf.adoc Outdated Show resolved Hide resolved
riscv-elf.adoc Outdated
|===
| Value | Symbolic Name | Compatibility | Description

| 0 | UNKNOWN_ATOMIC_ABI | Compatible with all other value. | This object use unknown atomic ABI.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be compatible w/ all other values? Anything that predates support for this tag will fall into this category, correct? So that would be the current LLVM behavior(A6C), the old GCC behavior(some kind of broken), and the new GCC behavior(A6S), right?

These compatibility concerns have been brought up a few times, so maybe it would be good to outline the choices we've arrived at in a note with our rationale?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your understanding is right, That's good point, we should have some note to mention that!

@kito-cheng
Copy link
Collaborator Author

@hboehm let me explain my thought behind the design, and start from the 0/UNKNOWN_ATOMIC_ABI:

The 0 value is kind of special value in RISC-V attribute, object without define this tag will treat as define this tag as 0, which is little bit implementation detail, but it's de facto on both LLVM and GNU toolchain, and mostly this spec has rely on this de facto, I think we should have more word to describe this on the spec.

Then what's NON_ATOMIC_ABI? the 4 is my target for this value, and why I define that is because RISC-V is a highly configurable ISA, so we could have a configuration without atomic instruction (e.g. -march=rv32imc), and that's the target user of this value/atomic ABI, and this configuration is existing in embedded world, but it's not fit in either A6C or A6S, and using UNKNOWN_ATOMIC_ABI might also misleading - since this could mixed with other situations like @ilovepi listed, so I think we need a NON_ATOMIC_ABI to present this; I agree a system without A-ext won't care about the atomic ABI, but this could help let linker has some guidance to prevent those stuffs beginning mixed, and that's the issue I face during handle multi-lib issue for bare-metal GNU toolchain.

So why not just using two bits? NON_ATOMIC_ABI and UNKNOWN_ATOMIC_ABI will fail into same category in two bits scheme, and that's what I want to prevent, also I could imagine two bit scheme would be very cheap on checking compatibly - but this mostly happened on static link time, and might be happened on dynamic link time, but it will check once at start time, so I would like to keep the extensibility here, and my understanding is this scheme should have the expressiveness of the two-bit solution and can be extended even further, that's the reason why I didn't use bit vector scheme.


And I would like to prevent mention A7 mapping at this moment in spec, it's stable for a while, but not ratified, so...I would be conservatively on this, since everything could be changed before ratified :P

@hboehm
Copy link
Contributor

hboehm commented Jun 12, 2023

Thanks for the clarification on Tag 0. That now makes sense to me.

On the number of different attribute values:
Concretely, I think that we really need to be able to distinguish three flavors of A6C and A6S compliance, in addition to the UNKNOWN/none option we have:

a) A6C compliant, but incompatible with A6S executable that uses the Note 3 mappings (e.g. that use the A.7 mappings).
b) A6C and A6S compliant (and hence also A.7 compatible)
c) A6S compatible, but not A6C compatible (i.e. uses the Note 3 / A.7 mappings).

The current gcc mapping should be described as (b), so that it can be linked against either A.6 classic (a) or A.7 (c) object files. Future A6S executables that use the Note 3 mappings should be described as (c), to prevent linkage against A.6 classic files. Currently, I don't think we have a way to make that necessary distinction.

Not all of the Note 3 mappings require new hardware. Thus this distinction is likely to be relevant soon for code compiled on platforms that never supported A6C.

NON_ATOMIC_ABI:
To what extent do embedded processors without the A extension rely on a standard ABI? I always thought of micro-controller code as being sufficiently self-contained that standard ABIs are much less of a concern.
My concern is that currently I have no idea what the NON_ATOMIC ABI looks like. Which one of the 4 options at the top of my first message are used? How is x.fetch_add(1) or flag.test_and_set() implemented? I don't think that implementing fetch_add as simply load; add; store is neither thread- not interrupt-/signal-safe. So it doesn't strike me as a useful implementation of atomics.

Thus I still feel that we need another value to identify "both A6C and A6S compatible", but I'm not yet convinced that NON_ATOMIC is useful. If it is, I would really like to see a more precise definition.

@jrtc27
Copy link
Collaborator

jrtc27 commented Jun 12, 2023

What's the point of 0? Can that not just be encoded as "not present", given that is how every binary to date is?

@jrtc27
Copy link
Collaborator

jrtc27 commented Jun 12, 2023

Thanks for the clarification on Tag 0. That now makes sense to me.

On the number of different attribute values: Concretely, I think that we really need to be able to distinguish three flavors of A6C and A6S compliance, in addition to the UNKNOWN/none option we have:

a) A6C compliant, but incompatible with A6S executable that uses the Note 3 mappings (e.g. that use the A.7 mappings). b) A6C and A6S compliant (and hence also A.7 compatible) c) A6S compatible, but not A6C compatible (i.e. uses the Note 3 / A.7 mappings).

The current gcc mapping should be described as (b), so that it can be linked against either A.6 classic (a) or A.7 (c) object files. Future A6S executables that use the Note 3 mappings should be described as (c), to prevent linkage against A.6 classic files. Currently, I don't think we have a way to make that necessary distinction.

Not all of the Note 3 mappings require new hardware. Thus this distinction is likely to be relevant soon for code compiled on platforms that never supported A6C.

NON_ATOMIC_ABI: To what extent do embedded processors without the A extension rely on a standard ABI? I always thought of micro-controller code as being sufficiently self-contained that standard ABIs are much less of a concern. My concern is that currently I have no idea what the NON_ATOMIC ABI looks like. Which one of the 4 options at the top of my first message are used? How is x.fetch_add(1) or flag.test_and_set() implemented? I don't think that implementing fetch_add as simply load; add; store is neither thread- not interrupt-/signal-safe. So it doesn't strike me as a useful implementation of atomics.

Thus I still feel that we need another value to identify "both A6C and A6S compatible", but I'm not yet convinced that NON_ATOMIC is useful. If it is, I would really like to see a more precise definition.

Anything that's RMW will be a libcall, so there's no memory model baked into the binary there. The only issue is that I think atomic loads and stores that just need fences will not use a libcall and just emit the fences directly, given the fences are RVI (which is a bit weird... but they do have a purpose outside of atomics). So non-A code still needs to encode what it's doing.

@hboehm
Copy link
Contributor

hboehm commented Jun 12, 2023

It seems to me that we would need a specification for those library calls someplace?

The assumption is that those library calls internally use some combination of locks/kernel support/vendor-specific hardware magic? (Or presumably just the last two where lock-freedom is required by language standards.)

I still don't see how to make that work on a somewhat vanilla multiprocessor while using ordinary stores as atomic stores. How do you ensure atomicity between a fetch_add library call and a store instruction, making sure that the concurrent store isn't seen between the load and the store of the fetch_add? Certainly that doesn't work for a user-space lock-based implementation.

E.g. with (x initially zero), how do you preclude

Thread 1: x.store(5);
Thread 2: r = x.fetch_add(1);

with final outcome r == 1 && x == 1, thus "losing" the store?

Can you point to a working example for this sort of thing?

Normal lock-based C++ atomics implementations also lock for load/store. Aside from atomic_flag/atomic_signed_lock_free/atomic_unsigned_lock_free, that's fine. But I think you also need more magic of some sort to support those three without the A extension.

@kito-cheng
Copy link
Collaborator Author

@hboehm

It seems to me that we would need a specification for those library calls someplace?

The assumption is that those library calls internally use some combination of locks/kernel support/vendor-specific hardware magic? (Or presumably just the last two where lock-freedom is required by language standards.)

I still don't see how to make that work on a somewhat vanilla multiprocessor while using ordinary stores as atomic stores. How do you ensure atomicity between a fetch_add library call and a store instruction, making sure that the concurrent store isn't seen between the load and the store of the fetch_add? Certainly that doesn't work for a user-space lock-based implementation.

Yes, I agree with that, we should add a spec for those library calls IF we going to define that as NON_ATOMIC_ABI, but I realized that is kind of broken for existing non-A code gen, because we are kind mixing two flavor now - expanding atomic load/store to ordinary load/store, and expanding other atomic operation to library calls, so here is two options for this case IMO: 1) expand all atomic operations to ordinary operation, including fetch_add, expand it to load, add and store, 2) expand all atomic operations to library call, including atomic_load/store.

First option means NO atomic at all, which is used for single processor code gen, that should be the scenario of non-A.
And the second one could be used to those system which used system has provide some syscall doing that, but honestly I don't think it's useful RISC-V world since RISC-V has A-extension.

So IMO we should define NON_ATOMIC_ABI as expand all atomic operation as ordinary operation, but this is an incompatible change again, so we need more inputs for this.

cc. @asb @patrick-rivos

@asb
Copy link
Collaborator

asb commented Jul 7, 2023

Lowering atomic loads/stores to load/store+fences while lowering other atomic operations to libcalls is a gcc issue. We don't do that in LLVM (see llvm/test/CodeGen/RISCV/atomic-load-store.ll). We do as of LLVM 16 support a forced-atomics option which will lower atomic load/store to load/store+fence and lower other atomic operations to __sync_* libcalls.

@kito-cheng
Copy link
Collaborator Author

@asb Thanks for the info, seems LLVM is going to expand all atomic operations to library call, then I think might be reasonable to rename that into SOFT_ATOMIC_ABI, fix gcc land and document that down the library functions into ABI.

@patrick-rivos
Copy link
Contributor

Fixing GCC to only emit library calls or ordinary ops & __sync_* libcalls makes sense.

Afterwards, we'll have 2 cases:

  1. Only library calls (LLVM binaries)
  2. Ordinary operations & __sync_* libcalls (LLVM's forced-atomics binaries)

@asb @kito-cheng These two cases are incompatible, so how do we represent this in the ABI?

@hboehm
Copy link
Contributor

hboehm commented Jul 7, 2023

I found the last section in https://llvm.org/docs/Atomics.html. They give two example uses of --forced-atomics, the first one of which doesn't apply to RISC-V. The second one applies only to uniprocessors, and has the _sync primitives implemented as user-level functions known to the kernel, so that they can be restarted from the beginning on a context switch. That's a well-known technique (dating back to at least a 1992 paper by Bershad et al). I can't think of another context in which library functions could be implemented like this while maintaining atomicity with respect to load and store instructions. The question in my mind is whether it's worth supporting a uniprocessor-only convention in 2023. In my context, if you want to be able to link code to something else, you want to be able to link it to code runs on multiple cores. Android has been removing old uniprocessor optimizations since about 2015.

I don't understand why "only library calls" would require a separate tag here. Such code is compatible with any other atomics ABI under consideration (so long as the library implementing the _sync primitives also is). It's essentially code that just doesn't use atomics, and is thus ABI-independent in this respect.

Expanding all atomic operations to non-atomic instruction sequences takes us from a single HART world to one in which there is only a single thread and there are no asynchronous signals or interrupts. I don't think any of those are required by the C standard, so you could probably have a conforming implementation under those conditions. But that seems even less interesting to me than the uniprocessor case. Certainly atomics support in this scenario would be useful only if the code is also used on a platform that actually supports concurrency.

It seems to me that if we really want NON_ATOMIC_ABI at all, it should target the single HART, restartable code sequences implementation. I'm not at all convinced that's warranted, but on very low-end embedded devices, I think it makes more sense than the others.

@patrick-rivos
Copy link
Contributor

I don't understand why "only library calls" would require a separate tag here. Such code is compatible with any other atomics ABI under consideration (so long as the library implementing the _sync primitives also is). It's essentially code that just doesn't use atomics, and is thus ABI-independent in this respect.

IIUC, "only library calls" isn't guaranteed to be compatible unless it only uses __sync_* calls (guaranteed to be lock-free). Otherwise there could be a lock-based implementation that gets mixed with regular loads/stores (from the load/store & __sync_* case).

It's mentioned here: https://llvm.org/docs/Atomics.html

Code using +forced-atomics is ABI-incompatible with code not using the feature, if atomic variables cross the ABI boundary.

If it doesn't make sense to support this case in the psABI, then my point is moot :)

@jrtc27
Copy link
Collaborator

jrtc27 commented Jul 7, 2023

https://reviews.llvm.org/D130621 is the context for why this was added to LLVM. If it does not make sense then please take it up with the author of the patch.

@asb
Copy link
Collaborator

asb commented Jul 7, 2023

The question in my mind is whether it's worth supporting a uniprocessor-only convention in 2023. In my context, if you want to be able to link code to something else, you want to be able to link it to code runs on multiple cores. Android has been removing old uniprocessor optimizations since about 2015

forced-atomics is a niche feature that you'd definitely never use on a modern application class core, and yes, only makes any sense on an embedded uniprocessor system as far as I can see.

In terms of whether this needs representing in ELF attribute in some way...I've got no objection, but I'm also not sure it's a worthwhile endeavour to try to protect people from every conceivable case of linking together code that shouldn't be linked. Handling cases people are likely to run into on Linux-capable systems and getting the low-hanging fruit for common cases on embedded targets (mismatched ABIs or -march) seems sensible, but perhaps it's diminishing returns beyond that.

@hboehm
Copy link
Contributor

hboehm commented Jul 7, 2023

I don't understand why "only library calls" would require a separate tag here. Such code is compatible with any other atomics ABI under consideration (so long as the library implementing the _sync primitives also is). It's essentially code that just doesn't use atomics, and is thus ABI-independent in this respect.

IIUC, "only library calls" isn't guaranteed to be compatible unless it only uses __sync_* calls (guaranteed to be lock-free). Otherwise there could be a lock-based implementation that gets mixed with regular loads/stores (from the load/store & __sync_* case).

That seems to depend on how you read this? I read it as "the compiler calls _sync functions, which are implemented by a user-provided library". With that reasoning, compatibility depends on whether the _sync library is compatible with the other ABI. Which presumably should be indicated by the tag associated with the _sync library. I don't really see the difference between this and the user code explicitly calling into a library that implements atomics.

It's mentioned here: https://llvm.org/docs/Atomics.html

Code using +forced-atomics is ABI-incompatible with code not using the feature, if atomic variables cross the ABI boundary.

If it doesn't make sense to support this case in the psABI, then my point is moot :)

@hboehm
Copy link
Contributor

hboehm commented Jul 7, 2023

https://reviews.llvm.org/D130621 is the context for why this was added to LLVM. If it does not make sense then please take it up with the author of the patch.

I think it makes sense, but only if the _sync functions are atomic with respect to ordinary load and store instructions. And I know of only one plausible implementation that satisfies that constraint. And that implementations is only correct on uniprocessors.

So it sounds like we're kind of, generally, in agreement that the question here is "Is that niche large enough to support?"

@jrtc27
Copy link
Collaborator

jrtc27 commented Jul 7, 2023

https://reviews.llvm.org/D130621 is the context for why this was added to LLVM. If it does not make sense then please take it up with the author of the patch.

I think it makes sense, but only if the _sync functions are atomic with respect to ordinary load and store instructions. And I know of only one plausible implementation that satisfies that constraint. And that implementations is only correct on uniprocessors.

So it sounds like we're kind of, generally, in agreement that the question here is "Is that niche large enough to support?"

@nikic?

@patrick-rivos
Copy link
Contributor

That seems to depend on how you read this? I read it as "the compiler calls _sync functions, which are implemented by a user-provided library". With that reasoning, compatibility depends on whether the _sync library is compatible with the other ABI. Which presumably should be indicated by the tag associated with the _sync library. I don't really see the difference between this and the user code explicitly calling into a library that implements atomics.

Loads and stores would hopefully be implemented as specified by our ABI, i.e. with fences on both sides.

Ah. I had it in my mind that __atomic_* library calls could depend on a globally shared lock or other shenanigans. I forgot that the library will have its own tag that could be checked against other library's tags. Since we can perform those checks with the existing bits, I don't have concerns about the __atomic_* vs __sync_* cases anymore. Sorry for the noise :)

@kito-cheng
Copy link
Collaborator Author

Changes:

  • Drop NON_ATOMIC_ABI

@kito-cheng
Copy link
Collaborator Author

I think we do need atomic ABI for non-A configuration, but their is no spec yet, so let drop that and define that once we have spec for that.

riscv-elf.adoc Outdated Show resolved Hide resolved
@kito-cheng
Copy link
Collaborator Author

Changes:

  • Adjust merge rule, made it more clearly on the merge result
  • Add note for the program without A ext.

@kito-cheng
Copy link
Collaborator Author

Changes:

  • Added A7_ATOMIC_ABI

riscv-elf.adoc Outdated Show resolved Hide resolved
riscv-elf.adoc Outdated Show resolved Hide resolved
@kito-cheng
Copy link
Collaborator Author

Changes:

  • Suffix _ATOMIC_ABI dropped.
  • Applied several suggestion from @hboehm

@hboehm
Copy link
Contributor

hboehm commented Aug 7, 2023

Looks good to me. Thanks!

riscv-elf.adoc Outdated Show resolved Hide resolved
riscv-elf.adoc Outdated Show resolved Hide resolved
@kito-cheng
Copy link
Collaborator Author

binutils PoC: kito-cheng/riscv-binutils-gdb@3020782

@kito-cheng
Copy link
Collaborator Author

Changes:

  • Rebase and squash patches.

Alternative version of #383

Defined 4 atomic abi variant.

- unknown, nonatomic, A6C and A6S

Unknown is compatilbe with exising object file.
Nonatomic is used when A-extension is absent
A6C is the table A.6 mapping defined in ISA spec.
And the A6S is the mapping defined in psABI spec.
@kito-cheng
Copy link
Collaborator Author

Gonna merge because we reached a consensus and have a PoC now.

@kito-cheng kito-cheng merged commit 2d77081 into master Sep 6, 2023
3 checks passed
@kito-cheng kito-cheng deleted the atomic-abi branch September 6, 2023 03:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants