-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define contrainsts for ULEB128 #403
Define contrainsts for ULEB128 #403
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, although I don't really like that constraint, but that's constraint for current implementation, so I think it's good to just document down.
Just add few more linker expert to view, but will commit this Friday IF no further comments. |
I don't think they have to be consecutive pair. We probably should say that two relocations that refer to the same location are processed in-order. That means if a SET_ULEB128 appears before a SUB_ULEB128 in the relocation table, we'll get the desired outcome. |
For |
Can anybody shed light on why GNU ld and lld require them to appear as a pair? In mold, we don't have such restrictions, and it seems they are working just fine; SET_ULEB128 writes a value, and SUB_ULEB128 reads the value, subtracts another value from it and then writes back the result. |
It looks like that's how lld handles SET6 and SUB6 relocations; lld only requires SET6 precedes SUB6 if they refer to the same location, and they don't have to be consecutive. I wonder why lld can't do the same for SET_ULEB128 and SUB_ULEB128. |
There’s only enough space guaranteed for the final result, but the SET might need more bytes. |
I don't believe the wording I have proposed specifies that the SET must come immediately before.
This wording I used was meant to be interpreted to mean that the SET can come anywhere before the SUB in the list of produced relocations. Should the wording be updated to enforce that they must be consecutive? |
That's not limited to SET_ULEB128. We have {SET,SUB}_{6,8,16,32,64,ULEB128}. What's special about ULEB128? |
I would say that's kind of implementation limitation, and this limitation is make implementation easier, since we must handle two relocation together, we can write more complicated logic to handle that right but seems not gain to much rather than give it few more constraint. But why no similar issue for |
Thank you @kito-cheng, so here is my understanding. Can you confirm if we're on the same page?
If that's the case, I think mandating them to appear as a pair makes sense. (I wasn't against the idea; I was just trying to understand the rationale behind it.) |
@rui314 I believe your understanding is correct |
Thanks for the clarification! Then, perhaps we should specify that the two relocations must appear consecutively? By doing so, we can essentially treat the two relocations as one, which I think would greatly simplify the implementation. |
That is fine with me, I can change the wording if other people are okay with that @kito-cheng |
Sorry for the late reply, I was out of office after attending LLVM dev conf, and back now :) I think it's OK to constrain that more stricter: 1) Current assembler implementations are satisfied that constraint, 2) Current and proposed linker implementations are work well with that, 3) easier handling. @rui314 thanks for the suggestion! |
Binutils [1] and a currently in-flight LLD patch [2] define ULEB128 such that a SUB_ULEB128 must be preceded by a matching SET_ULEB128. This means that if the first SET_ULEB128 that comes before a given SUB_ULEB128 does not have the same relocation address, the linkers will fail. A SET_ULEB128 also cannot appear alone. Define this behavior in psABI such that libraries that read relocations are able to make this same assumption. This change enforces that SET_ULEB128 must always immediately preceed the associated SUB_ULEB128, and vice versa. [1] https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elfnn-riscv.c;h=09aa7be225ef3cba23fefd2ee977b3556a51368c;hb=HEAD#l2506 [2] https://reviews.llvm.org/D142880 Signed-off-by: Charlie Jenkins <[email protected]>
2384bd8
to
e7bd2a3
Compare
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, too.
This pairwise usage is IMO what the two relocations were intended for.
The constraints gives leeway for linkers to implement the feature possibly in a more efficient way.
Related question: since this is unsigned LEB, the result of a SET minus a SUB should be positive, but how does the compiler guarantee it if it doesn't know the final layout? What exactly is the use case of this relocation? |
DWARF call sites are encoded as an address or unsigned offset, typically ULEB128 (unless not present, in which case they're an absolute 0 address). Toolchains previously had to turn off using ULEB128 for RISC-V and use less efficient encodings instead, but this allows them to make RISC-V look more normal and use ULEB128 again. |
If a pair of SET and SUB relocations are used to calculate an unsigned offset, wouldn't the compiler already know the maximum possible offset? I'm wondering if we can trigger a relocation overflow error for the pair without it being a compiler bug. |
Yes, hence how it's possible to know how much to pad the .uleb128 by to ensure there are enough bytes to avoid overflow. Otherwise it just wouldn't work. |
Ah, if that's the case, we don't need to check for overflow in the linker specifically for ULEB128 relocations, do we? While detecting a compiler bug early on is beneficial, the psABI spec shouldn't require the linker to do that. Instead, it should be an optional measure the linker can adopt for all SET/SUB relocation pairs, not just ULEB but other types as well, if they chose to verify the input rigorously. I believe we should revert this change. |
s/should/can/? This patch is correct and adds more constraints to producers: "Must be placed immediately before a SUB_ULEB128 with the same offset. Local label assignment <<uleb128-note,*note>>" |
Then why stop here and impose this restriction only for ULEB? If it's recommended to find a compiler bug, it looks like it is hard to justify doing it only for ULEB and not for {SET,SUB}{8,16,32,64} relocations. |
More context for the demand of ULEB128: dwarf5 has add few new records are only available with ULEB128 encoding, unlike other record can using alternative encoding.
Both LLD (proposed patch) and ld.bfd has implement the check, will report error and abort.
Yeah, we may also implement overflow check for |
Yeah, checking for overflow in the linker if SET and SUB are consecutive in the relocation table is perhaps a good practice to detect compiler bugs. However, isn't it odd that the psABI mandates the linker to do it in this specific case? I believe the confusion arises from the misconception that only {SET, SUB}_ULEB128 can overflow in a valid use case. This isn't true unless the compiler is buggy. |
The reason I was interested in this was that I noticed that LLVM and binutils both implemented this constrained version and I am writing a library that consumes relocations that would like to use the same assumption. The library is module loading in the Linux kernel: https://lore.kernel.org/linux-mm/[email protected]/T/. It is preferable to have the same behavior of the module linker as a user would expect from using binutils or LLVM. However, without this psABI requirement, it would not be valid to implement this restriction and the behavior could be different. |
@charlie-rivos It turned out that the relocation overflow should never occur because the compiler knows the maximum length of a ULEB value when they create object files. So the exact behavior would be the same if you handle For the mold linker, I'll keep handling @MaskRay Maybe you want to simplify the logic for lld too? These relocations can be handled individually just like other SET/SUB relocations. You might want to keep the overflow check as an assertion that works only when SET and SUB happen to be consecutive, but if you do that, you probably should do for all SET/SUB relocation paris (or don't do it at all for all SET/SUB). |
Agree.
Agree. My mindset is: unless the wording says that no-checking is performed (like AArch64
lld hasn't implement |
LD also doesn't check overflows for ADD/SET/SUB. Currently we write the value back to the data filed immediately when handing each ADD/SET/SUB. I tried to add the overflow checks for each of them, but met troubles since gcc may generate s ADD32/SET32/SUB32 under medany, which means the symbols may be 64-bit and will break the overflow checking every time, but the final values will fit into 32-bits so these should be legal. A solution is that we should have an internal structures, including linked list or hash tables, to record the changes of values for each data field, and then write back the final values and check overflows until reviewed every ADD/SET/SUB relocations.
Agreed. Since we haven't supported the overflow checks for ADD/SET/SUB, so the easier and fastest way is to add the limitations to make SET_ULEB128/SUB_ULEB128 relocations work. It's fine to remove those limitations from gnu toolchain, and maybe it's time to also add the overflow checks for ADD/SET/SUB/ULEB128 since they are the same issue from the implementation aspect. |
Binutils [1] and a currently in-flight LLD patch [2] define ULEB128 such that a SUB_ULEB128 must be preceded by a matching SET_ULEB128. This means that if the first SET_ULEB128 that comes before a given SUB_ULEB128 does not have the same relocation address, the linkers will fail. A SET_ULEB128 also cannot appear alone.
Define this behavior in psABI such that libraries that read relocations are able to make this same assumption.
[1] https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elfnn-riscv.c;h=09aa7be225ef3cba23fefd2ee977b3556a51368c;hb=HEAD#l2506
[2] https://reviews.llvm.org/D142880