-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define new tag for atomic ABI #385
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need a definition of "6c" and "6s" to go with the "tso" definition?
Do we want this in e_flags or do we want this in .riscv.attributes? TSO is there, but that's because it predates .riscv.attributes (and the existence of the psABI TG). I don't see a good reason why it needs to be in e_flags. |
Good question, the e_flags should be allocated if we want to check that easier/faster during the dynamic linker, but atomic ABI should be checked at the static linker stage, so yes, that could be using .riscv.attributes to handle that, I guess I intuitively used e_flags because Palmer used that, let me update that. |
I would like defer this to @hboehm, but I think we might need also include the TSO mapping into the psABI spec. |
Why are you defining e_flags bits and an attribute? The whole point of the attribute is to not use up flags bits for this minor thing. |
Changes:
|
@jrtc27 my bad, I am not intend to define both, but just git operation error, will fix soon |
Changes:
|
I'm not sure what NON_ATOMIC_ABI means. Possible interpretations:
My preference would be just to have two bits, defined as follows, using roughly kito-cheng's terminology:
UNKNOWN_ATOMIC_ABI would correspond to both bits clear. We would probably make that compatible with itself, though that's not guaranteed to be valid. Any implementation that uses the recommended, non-Note-3 mappings would have both bits set, and would be compatible with anything other than UNKNOWN. An A.6 implementation would only set bit 1. A future A.7 implementation would set only bit 2. In an ideal world, the linker check would be to and all bit pairs together, and check for a nonzero value, indicating that all objects are compatible with one of the two conventions. As a concession to practice, we would also accept all 0 "UNKNOWN" values, on the optimistic assumption that they all use the same "UNKNOWN" convention. Interpretation (1) of NON_ATOMIC_ABI above would map to both bits (vacuously) set. The others would map to UNKNOWN (0). I cannot think of a kind of system that would (a) really care about ABI compatibility, but (b) not support the A extension. Getting a conforming and practical C or C++ atomics implementation on a multicore processor without the A extension seems somewhere between very difficult and impossible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @kito-cheng . I've left a couple suggestions and a question about compatibility.
riscv-elf.adoc
Outdated
|=== | ||
| Value | Symbolic Name | Compatibility | Description | ||
|
||
| 0 | UNKNOWN_ATOMIC_ABI | Compatible with all other value. | This object use unknown atomic ABI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be compatible w/ all other values? Anything that predates support for this tag will fall into this category, correct? So that would be the current LLVM behavior(A6C), the old GCC behavior(some kind of broken), and the new GCC behavior(A6S), right?
These compatibility concerns have been brought up a few times, so maybe it would be good to outline the choices we've arrived at in a note with our rationale?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your understanding is right, That's good point, we should have some note to mention that!
@hboehm let me explain my thought behind the design, and start from the 0/ The 0 value is kind of special value in RISC-V attribute, object without define this tag will treat as define this tag as 0, which is little bit implementation detail, but it's de facto on both LLVM and GNU toolchain, and mostly this spec has rely on this de facto, I think we should have more word to describe this on the spec. Then what's NON_ATOMIC_ABI? the 4 is my target for this value, and why I define that is because RISC-V is a highly configurable ISA, so we could have a configuration without atomic instruction (e.g. -march=rv32imc), and that's the target user of this value/atomic ABI, and this configuration is existing in embedded world, but it's not fit in either A6C or A6S, and using So why not just using two bits? And I would like to prevent mention A7 mapping at this moment in spec, it's stable for a while, but not ratified, so...I would be conservatively on this, since everything could be changed before ratified :P |
Thanks for the clarification on Tag 0. That now makes sense to me. On the number of different attribute values: a) A6C compliant, but incompatible with A6S executable that uses the Note 3 mappings (e.g. that use the A.7 mappings). The current gcc mapping should be described as (b), so that it can be linked against either A.6 classic (a) or A.7 (c) object files. Future A6S executables that use the Note 3 mappings should be described as (c), to prevent linkage against A.6 classic files. Currently, I don't think we have a way to make that necessary distinction. Not all of the Note 3 mappings require new hardware. Thus this distinction is likely to be relevant soon for code compiled on platforms that never supported A6C. NON_ATOMIC_ABI: Thus I still feel that we need another value to identify "both A6C and A6S compatible", but I'm not yet convinced that NON_ATOMIC is useful. If it is, I would really like to see a more precise definition. |
What's the point of 0? Can that not just be encoded as "not present", given that is how every binary to date is? |
Anything that's RMW will be a libcall, so there's no memory model baked into the binary there. The only issue is that I think atomic loads and stores that just need fences will not use a libcall and just emit the fences directly, given the fences are RVI (which is a bit weird... but they do have a purpose outside of atomics). So non-A code still needs to encode what it's doing. |
It seems to me that we would need a specification for those library calls someplace? The assumption is that those library calls internally use some combination of locks/kernel support/vendor-specific hardware magic? (Or presumably just the last two where lock-freedom is required by language standards.) I still don't see how to make that work on a somewhat vanilla multiprocessor while using ordinary stores as atomic stores. How do you ensure atomicity between a fetch_add library call and a store instruction, making sure that the concurrent store isn't seen between the load and the store of the fetch_add? Certainly that doesn't work for a user-space lock-based implementation. E.g. with (x initially zero), how do you preclude Thread 1: x.store(5); with final outcome r == 1 && x == 1, thus "losing" the store? Can you point to a working example for this sort of thing? Normal lock-based C++ atomics implementations also lock for load/store. Aside from atomic_flag/atomic_signed_lock_free/atomic_unsigned_lock_free, that's fine. But I think you also need more magic of some sort to support those three without the A extension. |
Yes, I agree with that, we should add a spec for those library calls IF we going to define that as First option means NO atomic at all, which is used for single processor code gen, that should be the scenario of non-A. So IMO we should define cc. @asb @patrick-rivos |
Lowering atomic loads/stores to load/store+fences while lowering other atomic operations to libcalls is a gcc issue. We don't do that in LLVM (see llvm/test/CodeGen/RISCV/atomic-load-store.ll). We do as of LLVM 16 support a |
@asb Thanks for the info, seems LLVM is going to |
Fixing GCC to only emit library calls or ordinary ops & Afterwards, we'll have 2 cases:
@asb @kito-cheng These two cases are incompatible, so how do we represent this in the ABI? |
I found the last section in https://llvm.org/docs/Atomics.html. They give two example uses of --forced-atomics, the first one of which doesn't apply to RISC-V. The second one applies only to uniprocessors, and has the _sync primitives implemented as user-level functions known to the kernel, so that they can be restarted from the beginning on a context switch. That's a well-known technique (dating back to at least a 1992 paper by Bershad et al). I can't think of another context in which library functions could be implemented like this while maintaining atomicity with respect to load and store instructions. The question in my mind is whether it's worth supporting a uniprocessor-only convention in 2023. In my context, if you want to be able to link code to something else, you want to be able to link it to code runs on multiple cores. Android has been removing old uniprocessor optimizations since about 2015. I don't understand why "only library calls" would require a separate tag here. Such code is compatible with any other atomics ABI under consideration (so long as the library implementing the _sync primitives also is). It's essentially code that just doesn't use atomics, and is thus ABI-independent in this respect. Expanding all atomic operations to non-atomic instruction sequences takes us from a single HART world to one in which there is only a single thread and there are no asynchronous signals or interrupts. I don't think any of those are required by the C standard, so you could probably have a conforming implementation under those conditions. But that seems even less interesting to me than the uniprocessor case. Certainly atomics support in this scenario would be useful only if the code is also used on a platform that actually supports concurrency. It seems to me that if we really want NON_ATOMIC_ABI at all, it should target the single HART, restartable code sequences implementation. I'm not at all convinced that's warranted, but on very low-end embedded devices, I think it makes more sense than the others. |
IIUC, "only library calls" isn't guaranteed to be compatible unless it only uses It's mentioned here: https://llvm.org/docs/Atomics.html
If it doesn't make sense to support this case in the psABI, then my point is moot :) |
https://reviews.llvm.org/D130621 is the context for why this was added to LLVM. If it does not make sense then please take it up with the author of the patch. |
forced-atomics is a niche feature that you'd definitely never use on a modern application class core, and yes, only makes any sense on an embedded uniprocessor system as far as I can see. In terms of whether this needs representing in ELF attribute in some way...I've got no objection, but I'm also not sure it's a worthwhile endeavour to try to protect people from every conceivable case of linking together code that shouldn't be linked. Handling cases people are likely to run into on Linux-capable systems and getting the low-hanging fruit for common cases on embedded targets (mismatched ABIs or -march) seems sensible, but perhaps it's diminishing returns beyond that. |
That seems to depend on how you read this? I read it as "the compiler calls _sync functions, which are implemented by a user-provided library". With that reasoning, compatibility depends on whether the _sync library is compatible with the other ABI. Which presumably should be indicated by the tag associated with the _sync library. I don't really see the difference between this and the user code explicitly calling into a library that implements atomics.
|
I think it makes sense, but only if the _sync functions are atomic with respect to ordinary load and store instructions. And I know of only one plausible implementation that satisfies that constraint. And that implementations is only correct on uniprocessors. So it sounds like we're kind of, generally, in agreement that the question here is "Is that niche large enough to support?" |
|
Ah. I had it in my mind that |
Changes:
|
I think we do need atomic ABI for non-A configuration, but their is no spec yet, so let drop that and define that once we have spec for that. |
Changes:
|
Changes:
|
Changes:
|
Looks good to me. Thanks! |
binutils PoC: kito-cheng/riscv-binutils-gdb@3020782 |
Changes:
|
Alternative version of #383 Defined 4 atomic abi variant. - unknown, nonatomic, A6C and A6S Unknown is compatilbe with exising object file. Nonatomic is used when A-extension is absent A6C is the table A.6 mapping defined in ISA spec. And the A6S is the mapping defined in psABI spec.
5668541
to
8a4b743
Compare
Gonna merge because we reached a consensus and have a PoC now. |
Alternative version of #383
Defined 4 atomic abi variant.
Unknown is compatilbe with exising object file.
Nonatomic is used when A-extension is absent
A6C is the table A.6 mapping defined in ISA spec.
And the A6S is the mapping defined in psABI spec.