Calling convention for vector arguments #38

Closed
PkmX opened this issue Sep 1, 2020 · 4 comments

@PkmX

PkmX commented Sep 1, 2020

Currently the vector spec only defines all vector registers/CSRs as caller saved, but it does not specify how to pass vectors as arguments.

We propose a calling convention in which named vector arguments are passed in v1 through v31. A vector type with LMUL > 1 must be allocated to the next vector register that is aligned to its LMUL. Vector types with fractional LMUL and vector mask types (vbool*_t) are treated as occupying one register. Segment vector types are passed in consecutive vector registers aligned to the base vector's LMUL. Vector values are returned in the same manner as the first vector argument. Once the argument-passing vector registers are exhausted, the remaining vector arguments are passed on the stack as whole vector registers, by pointer.

Some examples (the argument name corresponds to the vector register it uses):

// Vector arguments are passed from v1, v2, ..., v31
void f(vint8m1_t v1, vint8m1_t v2);

// For LMUL=8 types, they are passed in v8, v16, v24, and the rest on stack
void f(vint8m8_t v8_v15, vint8m8_t v16_v23);
void f(vint8m8_t v8_v15, vint8m8_t v16_v23, vint8m8_t v24_v31, vint8m8_t on_stack);

// For arguments with mixed LMUL, the vector register number is aligned to LMUL
void f(vint16m2_t v2_v3, vint8m1_t v4, vint64m8_t v8_v15);

// Returning grouped vector should be aligned to its LMUL (v8~v15 in this case)
vint32m8_t f(vint16m4_t v4_v7);

// fractional LMUL or vbool types are treated as LMUL=1
void f(vint8mf2_t v1, vbool8_t v2, vint8mf8_t v3);

// Segment types are aligned to the base LMUL
void f(vint8m1_t v1, vint16m2x3_t v2_v7);
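
The allocation rule proposed above can be sketched as a small C helper. This is only an illustration, not part of any ABI document: the names `vreg_cursor`, `assign_vreg` and `ON_STACK` are hypothetical, and the sketch assumes that an argument which does not fit is passed on the stack without consuming any registers (the proposal does not say whether later, smaller arguments may still use registers).

```c
#include <assert.h>

#define FIRST_ARG_VREG 1    /* v0 is left alone: it is the mask register */
#define NUM_VREGS      32
#define ON_STACK       (-1) /* hypothetical marker for "passed on the stack" */

/* Hypothetical allocator state: number of the next unallocated register. */
typedef struct {
    int next;
} vreg_cursor;

static void vreg_cursor_init(vreg_cursor *c)
{
    c->next = FIRST_ARG_VREG;
}

/*
 * lmul: registers per group (1, 2, 4 or 8). Fractional-LMUL and mask
 *       types are treated as 1, as in the proposal.
 * nf:   number of fields of a segment type, 1 for ordinary vectors.
 * Returns the first register of the assigned range, or ON_STACK.
 */
static int assign_vreg(vreg_cursor *c, int lmul, int nf)
{
    assert(lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
    assert(nf >= 1);

    /* Round the cursor up to a multiple of lmul (a power of two). */
    int start = (c->next + lmul - 1) & ~(lmul - 1);
    int count = lmul * nf;

    if (start + count > NUM_VREGS)
        return ON_STACK; /* assumption: cursor is left unchanged */

    c->next = start + count;
    return start;
}
```

Calling `assign_vreg` once per argument, in declaration order, reproduces the register names used in the examples above; a return value would use a fresh cursor, matching "returned in the same manner as the first vector argument".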

We avoid allocating v0 in the calling convention because of its ubiquitous use as the mask register, so that the callee does not have to move the first argument out of v0 if it needs to use masked instructions.

The spec already defines all vector registers as caller-saved, so any of them may serve either as argument-passing registers or as temporaries. The proposal currently uses all of them (except v0) for argument passing, which makes it possible to pass up to three m8 arguments in registers, but this is open to debate.

There is also a small optimization opportunity where smaller LMUL arguments can fill holes left by alignments due to previous larger LMUL arguments, for example:

void f(vint8m1_t v1, vint8m8_t v8_v15, vint8m1_t v2_or_v16);

Since v2 to v7 remain unused in the example, the m1 argument following m8 may be packed into v2 instead of using the next v16 register. This uses the registers more efficiently at the cost of slightly more complexity in the calling convention.
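
A minimal sketch of this hole-filling variant, assuming a bitmap of used registers and a lowest-numbered-first search. The names `vreg_set` and `assign_vreg_first_fit` are hypothetical, and segment types are left out to keep the sketch short.

```c
#include <assert.h>
#include <stdint.h>

#define ON_STACK (-1) /* hypothetical marker for "passed on the stack" */

/* Bit i set means v<i> is taken. */
typedef struct {
    uint32_t used;
} vreg_set;

static void vreg_set_init(vreg_set *s)
{
    s->used = 1u; /* v0 starts out taken: it is kept for masks */
}

/* lmul: registers per group (1, 2, 4 or 8). */
static int assign_vreg_first_fit(vreg_set *s, int lmul)
{
    assert(lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);

    uint32_t group = (1u << lmul) - 1u; /* lmul consecutive bits */

    /* Visit every lmul-aligned group, lowest register number first. */
    for (int start = 0; start + lmul <= 32; start += lmul) {
        if ((s->used & (group << start)) == 0) {
            s->used |= group << start;
            return start;
        }
    }
    return ON_STACK;
}
```

With this search, the trailing m1 argument in the example lands in v2 rather than v16, because v2 is the lowest free register once v1 and v8 to v15 are taken.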

Any thoughts?

@rofirrim
Collaborator

rofirrim commented Sep 8, 2020

Hi @PkmX, thanks for the proposal. It seems reasonable.

In the EPI project we implemented the following interim calling convention:

Registers v0 and v16...v23 can be used to pass vector values.

The process is as follows (parameters would be processed in the order of the C declaration):

  • mark registers v0 and v16...v23 as free.
  • if the parameter is a mask type and v0 is free, then assign the parameter to v0 (mark the register as used)
  • if the parameter does not have mask type or it does have mask type but v0 is not free then, use the following algorithm
    • determine the register group size (RGS) of the type
    • find the lowest numbered register vX in the set v16...v23 such that the register is free and the RGS divides X. If there are RGS consecutive free registers (starting from and including vX) then assign all of them to that parameter (mark all the registers as used)
    • if there are no free registers eligible under the case above, then the vector is passed via the stack.
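
The steps above can be sketched in C as follows. This is not the EPI implementation, only an illustration of the listed rules; the names `epi_state` and `epi_assign` are hypothetical. One assumption is labeled in the code: the sketch keeps searching the remaining aligned positions when the lowest candidate does not have RGS consecutive free registers, which the list does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ON_STACK  (-1) /* hypothetical marker for "passed via the stack" */
#define EPI_FIRST 16
#define EPI_LAST  23

/* Bit i set means v<i> is used. */
typedef struct {
    uint32_t used;
} epi_state;

static void epi_init(epi_state *s)
{
    s->used = 0; /* v0 and v16...v23 all start out free */
}

/*
 * is_mask: the parameter has a mask type.
 * rgs:     register group size (1, 2, 4 or 8); pass 1 for mask types.
 */
static int epi_assign(epi_state *s, bool is_mask, int rgs)
{
    assert(rgs == 1 || rgs == 2 || rgs == 4 || rgs == 8);

    /* A mask goes to v0 when v0 is still free. */
    if (is_mask && (s->used & 1u) == 0) {
        s->used |= 1u;
        return 0;
    }

    uint32_t group = (1u << rgs) - 1u; /* rgs consecutive bits */

    /* 16 is a multiple of every rgs, so stepping by rgs keeps x aligned.
     * Assumption: every aligned x is tried, not only the lowest one. */
    for (int x = EPI_FIRST; x + rgs - 1 <= EPI_LAST; x += rgs) {
        if ((s->used & (group << x)) == 0) {
            s->used |= group << x;
            return x;
        }
    }
    return ON_STACK;
}
```

Because the search always starts from v16, a small parameter declared after a large one can still fill a gap left by alignment.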

We initially believed it made sense to have callee-saved registers, hence the very limited range from v16 to v23. However, our experience so far suggests that callee-saved registers may not be very valuable (we could have them in an alternative, more specialized calling convention, such as one used by a vectorizer). The algorithm also prioritizes placing the mask (if any) in v0.

If we take the above algorithm and use v0, v1...v31 (instead of v0, v16...v23), then I think the result would be similar to your proposal. It also addresses the question about holes that you raise at the end.

I wouldn't be very worried about the 3 m8. In general, if there is a mask around, 3 is also the maximum number of useable m8 register groups (because the mask in v0 prevents us from using the register group v0...v7).

One thing missing from the algorithm above is segment vectors. In that case I understand that RGS still needs to divide X, but we need RGS * nf consecutive registers available. Also, as you mentioned, fractional LMULs imply RGS=1.
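
A hedged sketch of how the search could be extended to segment types: the starting register stays aligned to RGS, but the free check covers RGS * nf registers. The function name `find_segment_slot` and its parameters are hypothetical. The sketch relies on the vector spec's constraint that a segment occupies at most 8 registers (EMUL * NFIELDS <= 8).

```c
#include <assert.h>
#include <stdint.h>

#define ON_STACK (-1) /* hypothetical marker for "passed via the stack" */

/*
 * used:        bit i set means v<i> is already used.
 * first, last: inclusive range of registers usable for arguments.
 * rgs:         register group size of the base type (1, 2, 4 or 8).
 * nf:          number of fields in the segment type.
 * Returns the lowest rgs-aligned x such that v<x> .. v<x + rgs*nf - 1>
 * are all free and inside the range, or ON_STACK. Does not modify state.
 */
static int find_segment_slot(uint32_t used, int first, int last,
                             int rgs, int nf)
{
    assert(rgs == 1 || rgs == 2 || rgs == 4 || rgs == 8);
    assert(nf >= 1);
    assert(0 <= first && first <= last && last <= 31);

    int count = rgs * nf;
    assert(count <= 8); /* EMUL * NFIELDS <= 8 in the vector spec */

    uint32_t span = (1u << count) - 1u; /* count consecutive bits */

    /* Start at the first multiple of rgs that is >= first. */
    for (int x = (first + rgs - 1) / rgs * rgs; x + count - 1 <= last; x += rgs) {
        if ((used & (span << x)) == 0)
            return x;
    }
    return ON_STACK;
}
```

With `first = 1`, `last = 31` and v1 already used, a segment with RGS=2 and nf=3 is placed at v2 and covers v2 to v7, matching the segment example in the original proposal.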

@ebahapo

ebahapo commented Sep 8, 2020

I think that the merger of the proposals, resulting in v0 and v1..v31, is very interesting, including filling earlier registers after an argument with LMUL > 1. However, IMO, 3 arguments is the practical minimum, and anything more enters "nice to have" territory. That leaves just one m8 register group available for consideration as callee-saved, seemingly too few to make much of a difference.

Therefore, methinks that it's rather premature to nail down the calling convention. I'd prefer to have a working implementation upstream so that we can then model the best calling convention, especially on the issue of callee saved registers.

@zakk0610
Collaborator

@PkmX @rofirrim please take a look at the calling convention PR: riscv-non-isa/riscv-elf-psabi-doc#171

@eopXD
Collaborator

eopXD commented Jul 29, 2022

This discussion is ongoing in the psABI working group and is out of scope for this repository. Closing the issue.
