Proposal for Vector Calling Convention #389
Is it necessary to allocate the registers in order? For example, with `func foo(vint32m8 a, vint32m1 b, vint32m8 c)`, if we allocate in order, we can just allocate `a` to `v8-v15`, `b` to `v16`, and `c` by reference. If we would like to maximize register usage, we have to allocate `a` to `v8-v15`, `c` to `v16-v23`, and `b` by reference.
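The two outcomes in this example can be reproduced with a small simulation. This is only a sketch under stated assumptions, not the proposal's normative algorithm: it assumes the argument registers are `v8`-`v23` (consistent with the allocations quoted above), that an argument with LMUL `n` needs `n` consecutive registers starting at a multiple of `n`, and that anything that does not fit is passed by reference. The names `allocate` and `allocate_largest_first` are hypothetical.

```python
# Assumption: vector argument registers are v8..v23.
ARG_REGS = range(8, 24)


def allocate(args):
    """First-fit allocation in the given order.

    args is a list of (name, lmul) pairs. Each argument needs `lmul`
    consecutive free registers starting at a multiple of `lmul`;
    otherwise it is recorded as passed by reference.
    """
    free = set(ARG_REGS)
    result = {}
    for name, lmul in args:
        placed = "by reference"
        for start in ARG_REGS:
            regs = range(start, start + lmul)
            if start % lmul == 0 and all(r in free for r in regs):
                free -= set(regs)
                placed = f"v{start}" if lmul == 1 else f"v{start}-v{start + lmul - 1}"
                break
        result[name] = placed
    return result


def allocate_largest_first(args):
    """Hypothetical alternative: place the largest arguments first."""
    # sorted() is stable, so arguments with equal LMUL keep source order.
    return allocate(sorted(args, key=lambda arg: -arg[1]))


# foo(vint32m8 a, vint32m1 b, vint32m8 c) from the comment above.
args = [("a", 8), ("b", 1), ("c", 8)]
in_order = allocate(args)
largest_first = allocate_largest_first(args)
print(in_order)
print(largest_first)
```

In this model the in-order pass leaves `v17`-`v23` unused because `c` cannot start at an 8-aligned register, while the largest-first pass fills all sixteen registers and pushes `b` to memory instead.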
I had thought about this before, and I concluded that it would make the allocation rules very complicated (the allocation algorithm would need to choose the optimal allocation while remaining deterministic) without much benefit. Even the current rules are already a little complicated. Would actual users really write function interfaces with arguments like these? More discussion on this is welcome.
If we didn't sort the arguments, it would potentially violate the tuple type's consecutive-register constraint, as in the example shown in line 401: since `c` is split into two `vint32m1_t` in the backend, the normal ordered allocation would assign the first `vint32m1_t` to `v9` and the second one to `v12`.
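A minimal sketch of the concern: when a two-element tuple is handed to a first-fit allocator as two independent `vint32m1_t` parts, the parts can land in non-adjacent registers. The occupied set below is hypothetical, chosen only so that the result matches the `v9`/`v12` outcome mentioned in the comment; it is not the example from line 401 of the proposal. The `v8`-`v23` argument range and the helper `first_fit` are also assumptions.

```python
# Assumption: vector argument registers are v8..v23.
ARG_REGS = range(8, 24)


def first_fit(free, count):
    """Return the first run of `count` consecutive free registers, or None."""
    for start in ARG_REGS:
        regs = list(range(start, start + count))
        if all(r in free for r in regs):
            return regs
    return None


# Hypothetical state: earlier arguments already occupy v8, v10 and v11.
occupied = {8, 10, 11}

# Case 1: the tuple is split into two independent vint32m1_t parts.
free = set(ARG_REGS) - occupied
split_parts = []
for _ in range(2):
    regs = first_fit(free, 1)
    free -= set(regs)
    split_parts += regs

# Case 2: the tuple is one argument needing two consecutive registers
# (with no alignment requirement beyond that).
free = set(ARG_REGS) - occupied
whole_tuple = first_fit(free, 2)

print(split_parts)   # parts are not adjacent
print(whole_tuple)   # consecutive pair
```

Treating the tuple as a single argument keeps its registers consecutive, at the cost of skipping the isolated free register.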
Which backend? We treat `c` as a whole argument (not split), just like a `vint32m2_t` type, but without needing to be aligned to 2.
For example, `void foo(vint32m1x2_t a) {}` is lowered to `define void @foo(<vscale x 2 x i32> %a.coerce0, <vscale x 2 x i32> %a.coerce1) { entry: ret void }` in LLVM.
So LLVM can't allocate a tuple argument as a whole argument? If that is the case, this proposal is currently unimplementable in LLVM. The current specification allows a certain amount of reordering, but not full reordering; the second NOTE gives an example.
Without attempting to read all the details here:
LLVM can be made to implement whatever you want it to do (though some things are easier than others). Talking about whatever IR Clang happens to emit today and how the backend deals with that today isn't particularly meaningful.
@kito-cheng what do you think? Do you think we should implement the tuple type as a single type in LLVM IR instead of splitting it into multiple single types?
Add a note stating that functions/symbols that use the standard vector calling convention variant must be marked with STO_RISCV_VARIANT_CC. This should address the concern about the glibc resolver trampoline and resolver, since marked symbols must not use lazy binding.
e.g.
Also one more note about the setjmp/longjmp:
This should not have been done like this. STO_RISCV_VARIANT_CC is an ELF-specific thing (a flag in ElfXX_Sym's st_other) detailed in riscv-elf.adoc, but riscv-cc.adoc is a separate document that describes the calling convention independently from the underlying file format (and could be used by Windows or macOS if they choose to adopt RISC-V, which use PE/COFF and Mach-O respectively).
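To make the ELF-specific nature of the flag concrete, here is a small sketch of how a tool might test it. It assumes the little-endian `Elf64_Sym` layout and the value `0x80` that the RISC-V ELF psABI assigns to `STO_RISCV_VARIANT_CC`; the helper name `uses_variant_cc` and the two sample symbols are made up for illustration.

```python
import struct

# Value assigned to STO_RISCV_VARIANT_CC in the RISC-V ELF psABI.
STO_RISCV_VARIANT_CC = 0x80

# Little-endian Elf64_Sym: st_name (u32), st_info (u8), st_other (u8),
# st_shndx (u16), st_value (u64), st_size (u64).
ELF64_SYM = struct.Struct("<IBBHQQ")


def uses_variant_cc(sym_bytes):
    """Return True if the symbol's st_other has STO_RISCV_VARIANT_CC set."""
    _name, _info, st_other, _shndx, _value, _size = ELF64_SYM.unpack(sym_bytes)
    return bool(st_other & STO_RISCV_VARIANT_CC)


# Two made-up symbol table entries: one marked, one not.
vector_sym = ELF64_SYM.pack(1, 0x12, STO_RISCV_VARIANT_CC, 1, 0x1000, 64)
plain_sym = ELF64_SYM.pack(2, 0x12, 0, 1, 0x2000, 64)

print(uses_variant_cc(vector_sym))
print(uses_variant_cc(plain_sym))
```

The check is a single bit test on an ELF symbol-table field, which is why it has no direct counterpart in a file-format-independent calling convention document.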