From 4b4b8c51fd0825ed21b3849ce29dd24ebd731e1d Mon Sep 17 00:00:00 2001
From: Lehua Ding <lehua.ding@rivai.ai>
Date: Fri, 16 Jun 2023 21:37:53 +0800
Subject: [PATCH 01/21] add vector abi proposal

---
 riscv-cc.adoc | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/riscv-cc.adoc b/riscv-cc.adoc
index 25a3b3ab..f74ea23e 100644
--- a/riscv-cc.adoc
+++ b/riscv-cc.adoc
@@ -104,17 +104,29 @@ duration in accordance with C11 section 7.6 "Floating-point environment
 |===
 | Name    | ABI Mnemonic | Meaning                      | Preserved across calls?
 
-| v0-v31  |              | Temporary registers          | No
+| v0      |              | First vector mask argument or mask return value | No
+| v1-v31  |              | Other vector mask and data Argument registers   | No
 | vl      |              | Vector length                | No
 | vtype   |              | Vector data type register    | No
 | vxrm    |              | Vector fixed-point rounding mode register    | No
 | vxsat   |              | Vector fixed-point saturation flag register  | No
 |===
 
+The first vector mask type scalar argument and mask return value use v0 to pass.
+Other vectors mask type scalar arguments pass like vector data scalar arguments,
+See the fellowing description about vector data type scalar arguments.
 
-Vector registers are not used for passing arguments or return values; we
-intend to define a new calling convention variant to allow that as a future
-software optimization.
+vector data type scalar arguments has property LMUL. If it's LMUL greater then 1,
+the allocated first register number must multipe of LMUL. If it is possible to
+find unused registers segment starting from v1 and the first register in segment
+is LMUL-aligned, use these registers to pass the argument. Otherwise, pass 
+through the function stack.
+
+Aggregates cannot pass with vector registers.
+
+Variadic vector arguments are passed by reference.
+
+NOTE: vector mask type and data type defined https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#type-system[here]. Data type include tuple type.
 
 The `vxrm` and `vxsat` fields of `vcsr` are not preserved across calls and their
 values are unspecified upon entry.

From 27deeb2c6665f5e09f5c02c7a645594661d8412d Mon Sep 17 00:00:00 2001
From: Lehua Ding <lehua.ding@rivai.ai>
Date: Fri, 16 Jun 2023 23:02:57 +0800
Subject: [PATCH 02/21] fix typo

---
 riscv-cc.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/riscv-cc.adoc b/riscv-cc.adoc
index f74ea23e..440e3e28 100644
--- a/riscv-cc.adoc
+++ b/riscv-cc.adoc
@@ -114,7 +114,7 @@ duration in accordance with C11 section 7.6 "Floating-point environment
 
 The first vector mask type scalar argument and mask return value use v0 to pass.
 Other vectors mask type scalar arguments pass like vector data scalar arguments,
-See the fellowing description about vector data type scalar arguments.
+See the following description about vector data type scalar arguments.
 
 vector data type scalar arguments has property LMUL. If it's LMUL greater then 1,
 the allocated first register number must multipe of LMUL. If it is possible to

From 6c0fda8be742f6650c5d1affb86a0a0fabb80c43 Mon Sep 17 00:00:00 2001
From: Lehua Ding <lehua.ding@rivai.ai>
Date: Sat, 17 Jun 2023 12:12:43 +0800
Subject: [PATCH 03/21] improve more about vector calling convention

---
 riscv-cc.adoc | 80 ++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 60 insertions(+), 20 deletions(-)

diff --git a/riscv-cc.adoc b/riscv-cc.adoc
index 440e3e28..8696fe6a 100644
--- a/riscv-cc.adoc
+++ b/riscv-cc.adoc
@@ -112,22 +112,6 @@ duration in accordance with C11 section 7.6 "Floating-point environment
 | vxsat   |              | Vector fixed-point saturation flag register  | No
 |===
 
-The first vector mask type scalar argument and mask return value use v0 to pass.
-Other vectors mask type scalar arguments pass like vector data scalar arguments,
-See the following description about vector data type scalar arguments.
-
-vector data type scalar arguments has property LMUL. If it's LMUL greater then 1,
-the allocated first register number must multipe of LMUL. If it is possible to
-find unused registers segment starting from v1 and the first register in segment
-is LMUL-aligned, use these registers to pass the argument. Otherwise, pass 
-through the function stack.
-
-Aggregates cannot pass with vector registers.
-
-Variadic vector arguments are passed by reference.
-
-NOTE: vector mask type and data type defined https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#type-system[here]. Data type include tuple type.
-
 The `vxrm` and `vxsat` fields of `vcsr` are not preserved across calls and their
 values are unspecified upon entry.
 
@@ -341,6 +325,30 @@ type would be passed.
 Floating-point registers fs0-fs11 shall be preserved across procedure calls,
 provided they hold values no more than ABI_FLEN bits wide.
 
+=== Hardware Vector Calling Convention
+
+The first named scalar argument of vector mask type use v0 to pass. Other named
+scalar arguments of the same type pass like the named scalar arguments of
+vector data type. See the following description about named scalar arguments
+of vector data type.
+
+Named scalar arguments of vector data type has property LMUL. If LMUL is less
+than 1, treat it as 1. If it is possible to find unused registers segment
+starting from v1 and the first register in segment is LMUL-aligned, use these
+registers to pass the argument. Otherwise, pass through the function stack.
+
+Aggregates cannot pass with vector registers.
+
+Variadic vector arguments are passed by reference.
+
+Vector mask value is rturned in the same manner as a first named argument of the
+same type would be passed.
+
+Vector data value is returnd in the same manner as a first named argument of the
+same type would be passed.
+
+NOTE: vector mask type and data type defined https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#type-system[here]. Data type include tuple type.
+
 === ILP32E Calling Convention
 
 IMPORTANT: RV32E is not a ratified base ISA and so we cannot guarantee the
@@ -386,6 +394,22 @@ ILP32E:: <<ILP32E Calling Convention,ILP32E calling-convention>> only,
 hardware floating-point calling convention is not used (i.e. <<ELFCLASS32,ELFCLASS32>>,
 <<EF_RISCV_FLOAT_ABI_SOFT,EF_RISCV_FLOAT_ABI_SOFT>>, and <<EF_RISCV_RVE,EF_RISCV_RVE>>).
 
+[[abi-ilp32v]]
+ILP32V: ILP32 with hardware vector calling convention (i.e.
+<<ELFCLASS32,ELFCLASS32>> and <<EF_RISCV_VECTOR_ABI,EF_RISCV_VECTOR_ABI>>).
+
+[[abi-ilp32fv]]
+ILP32FV: ILP32 with hardware floating-point for ABI_FLEN=32 and hardware vector
+calling convention (i.e. <<ELFCLASS32,ELFCLASS32>>,
+<<EF_RISCV_FLOAT_ABI_SINGLE,EF_RISCV_FLOAT_ABI_SINGLE>> and
+<<EF_RISCV_VECTOR_ABI,EF_RISCV_VECTOR_ABI>>).
+
+[[abi-ilp32dv]]
+ILP32DV: ILP32 with hardware floating-point for ABI_FLEN=64 and hardware vector
+calling convention (i.e. <<ELFCLASS32,ELFCLASS32>>,
+<<EF_RISCV_FLOAT_ABI_DOUBLE,EF_RISCV_FLOAT_ABI_DOUBLE>> and
+<<EF_RISCV_VECTOR_ABI,EF_RISCV_VECTOR_ABI>>).
+
 [[abi-lp64]]
 LP64:: Integer calling-convention only, hardware
 floating-point calling convention is not used (i.e. <<ELFCLASS64,ELFCLASS64>> and
@@ -406,6 +430,22 @@ LP64Q:: LP64 with hardware floating-point calling
 convention for ABI_FLEN=128 (i.e. <<ELFCLASS64,ELFCLASS64>> and
 <<EF_RISCV_FLOAT_ABI_QUAD,EF_RISCV_FLOAT_ABI_QUAD>>).
 
+[[abi-lp64v]]
+LP64V: LP64 with hardware vector calling convention (i.e.
+<<ELFCLASS64,ELFCLASS64>> and <<EF_RISCV_VECTOR_ABI,EF_RISCV_VECTOR_ABI>>).
+
+[[abi-lp64fv]]
+LP64FV: LP64 with hardware floating-point for ABI_FLEN=32 and hardware vector
+calling convention (i.e. <<ELFCLASS64,ELFCLASS64>>,
+<<EF_RISCV_FLOAT_ABI_SINGLE,EF_RISCV_FLOAT_ABI_SINGLE>> and
+<<EF_RISCV_VECTOR_ABI,EF_RISCV_VECTOR_ABI>>).
+
+[[abi-lp64dv]]
+LP64DV: LP64 with hardware floating-point for ABI_FLEN=64 and hardware vector
+calling convention (i.e. <<ELFCLASS64,ELFCLASS64>>,
+<<EF_RISCV_FLOAT_ABI_DOUBLE,EF_RISCV_FLOAT_ABI_DOUBLE>> and
+<<EF_RISCV_VECTOR_ABI,EF_RISCV_VECTOR_ABI>>).
+
 The ILP32* ABIs are only compatible with RV32* ISAs, and the LP64* ABIs are
 only compatible with RV64* ISAs. A future version of this specification may
 define an ILP32 ABI for the RV64 ISA, but currently this is not a supported
@@ -445,8 +485,8 @@ Please refer to the documentation of the RISC-V execution environment interface
 
 There are two conventions for C/{Cpp} type sizes and alignments.
 
-ILP32, ILP32F, ILP32D, and ILP32E:: Use the following type sizes and
-alignments (based on the ILP32 convention):
+ILP32, ILP32F, ILP32D, ILP32E, ILP32V, ILP32FV, ILP32DV:: Use the following type
+sizes and alignments (based on the ILP32 convention):
 +
 .C/{Cpp} type sizes and alignments for RV32
 [cols="4,>2,>3"]
@@ -471,8 +511,8 @@ alignments (based on the ILP32 convention):
 | long double _Complex | 32            | 16
 |===
 
-LP64, LP64F, LP64D, and LP64Q:: Use the following type sizes and
-alignments (based on the LP64 convention):
+LP64, LP64F, LP64D, LP64Q, LP64V, LP64FV, LP64DV:: Use the following type sizes
+and alignments (based on the LP64 convention):
 +
 .C/{Cpp} type sizes and alignments for RV64
 [cols="4,>2,>3"]

From 4b1cab132053460b92e5354caa21215bf2a577f6 Mon Sep 17 00:00:00 2001
From: Lehua Ding <lehua.ding@rivai.ai>
Date: Tue, 20 Jun 2023 14:23:23 +0800
Subject: [PATCH 04/21] add struct pass rule and update some description
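
For illustration only, a minimal C sketch of the register search that the
new text describes: find NREGS unused contiguous vector registers starting
from v1 whose first register is LMUL-aligned, otherwise pass the argument
by reference. The helper name `alloc_vregs`, the `used` array, and the -1
"pass by reference" result are hypothetical and not part of the proposal;
LMUL values below 1 are assumed to have been rounded up to 1 by the caller,
and v0 handling for the first mask argument is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_VREGS 32

/* Hypothetical helper, not part of the ABI text.
 * used[i] is true when v<i> is already taken by an earlier argument.
 * lmul  : LMUL of the argument, already rounded up to at least 1.
 * nregs : registers needed (LMUL for data types, NF * LMUL for tuples).
 * Returns the first register of the chosen run, or -1 when no suitable
 * run exists and the argument would be passed by reference. */
static int alloc_vregs(bool used[NUM_VREGS], int lmul, int nregs)
{
    assert(lmul >= 1 && nregs >= 1);

    /* Starting at lmul skips v0 and keeps every candidate LMUL-aligned. */
    for (int first = lmul; first + nregs <= NUM_VREGS; first += lmul) {
        bool all_free = true;
        for (int i = 0; i < nregs; i++) {
            if (used[first + i]) {
                all_free = false;
                break;
            }
        }
        if (all_free) {
            /* Claim the run so later arguments cannot reuse it. */
            for (int i = 0; i < nregs; i++)
                used[first + i] = true;
            return first;
        }
    }
    return -1;
}
```

Under these assumptions, an LMUL=1 argument followed by an LMUL=2 argument
would land in v1 and v2-v3 respectively, since v2 is the first free
2-aligned register.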

---
 riscv-cc.adoc | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/riscv-cc.adoc b/riscv-cc.adoc
index 8696fe6a..8e65d288 100644
--- a/riscv-cc.adoc
+++ b/riscv-cc.adoc
@@ -327,27 +327,31 @@ provided they hold values no more than ABI_FLEN bits wide.
 
 === Hardware Vector Calling Convention
 
-The first named scalar argument of vector mask type use v0 to pass. Other named
-scalar arguments of the same type pass like the named scalar arguments of
-vector data type. See the following description about named scalar arguments
-of vector data type.
+This section applies only to named arguments. Variadic arguments of vector type are passed by reference.
 
-Named scalar arguments of vector data type has property LMUL. If LMUL is less
-than 1, treat it as 1. If it is possible to find unused registers segment
-starting from v1 and the first register in segment is LMUL-aligned, use these
-registers to pass the argument. Otherwise, pass through the function stack.
+The hardware vector calling convention adds 1 argument register for vector mask type argument and 31 argument registers for vector data and tuple type argument which are v0 and v1-v31, respectively. v0 is used for the first vector mask type argument and the vector mask type return value, the rest of the mask type arguments are treated as vector data type arguments. v1-v31 are also used for the vector data and tuple type return value.
 
-Aggregates cannot pass with vector registers.
+Vector data type arguments have properties LMUL and NREGS, the current LMUL can be 1/8, 1/4, 1/2, 1, 2, 4, 8, the current NREGS can be 1, 2, 4, 8. For arguments with LMUL less than 1, their LMUL is treated as 1. The LMUL of the vector mask type argument is treated as 1. The NREGS property means the number of registers needed for this argument. For vector data type, NREGS is 1 when LMUL is less than 1, otherwise NREGS is equal to LMUL. If it is possible to find NREGS unused continuous vector register set starting from v1 and its first register is LMUL-aligned, use these registers to pass the argument. Otherwise, the argument is passed by reference.
 
-Variadic vector arguments are passed by reference.
+vector tuple type arguments have the same LMUL and NREGS properties as the vector data type, but also have the NF property. NREGS equals NF multiplied by LMUL, but cannot exceed 8. The process of finding the argument registers is the same as for the vector data type.
 
-Vector mask value is rturned in the same manner as a first named argument of the
-same type would be passed.
+NOTE: The vector mask type, data type and tuple type are defined https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#type-system[here].
 
-Vector data value is returnd in the same manner as a first named argument of the
-same type would be passed.
+A struct containing just one vector type value is passed as though it were a standalone vector type value.
 
-NOTE: vector mask type and data type defined https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#type-system[here]. Data type include tuple type.
+A struct containing two vector type values is passed in vector registers, if there are suitable unused continuous vector register sets for them.
+
+A struct containing one vector type value and one integer (or bitfield), in either order, is passed in vector registers and an integer register, provided the integer is no more than XLEN bits wide, and at least one unused continuous vector register set and at least one integer argument register are available.
+
+A struct containing one vector type value and one floating-point real, in either order, is passed in vector registers and an floating-point register, provided the floating-point real is no more than ABI_FLEN bits wide, and at least one unused continuous vector register set and at least one floating-point argument register are available.
+
+A struct is not passed in the above manner, then it is passed according to the integer calling convention.
+
+NOTE: See the Hardware Floating-point Calling Convention section for the definition of struct.
+
+Unions are never flattened and are always passed according to the integer calling convention.
+
+Values are returned in the same manner as the first named argument of the same type would be passed.
 
 === ILP32E Calling Convention
 

From 37be8ddbd8824850d0817ed33d1db5a321002a9c Mon Sep 17 00:00:00 2001
From: Lehua Ding <lehua.ding@rivai.ai>
Date: Thu, 22 Jun 2023 14:51:39 +0800
Subject: [PATCH 05/21] remove new abi names and add STO_RISCV_VARIANT_CC info

---
 riscv-cc.adoc | 34 ++--------------------------------
 1 file changed, 2 insertions(+), 32 deletions(-)

diff --git a/riscv-cc.adoc b/riscv-cc.adoc
index 8e65d288..92f35f2e 100644
--- a/riscv-cc.adoc
+++ b/riscv-cc.adoc
@@ -353,6 +353,8 @@ Unions are never flattened and are always passed according to the integer callin
 
 Values are returned in the same manner as the first named argument of the same type would be passed.
 
+If a function uses vector registers to pass arguments or return value, the function needs to be annotated with `STO_RISCV_VARIANT_CC`. See <<Dynamic Linking>> for more details.
+
 === ILP32E Calling Convention
 
 IMPORTANT: RV32E is not a ratified base ISA and so we cannot guarantee the
@@ -398,22 +400,6 @@ ILP32E:: <<ILP32E Calling Convention,ILP32E calling-convention>> only,
 hardware floating-point calling convention is not used (i.e. <<ELFCLASS32,ELFCLASS32>>,
 <<EF_RISCV_FLOAT_ABI_SOFT,EF_RISCV_FLOAT_ABI_SOFT>>, and <<EF_RISCV_RVE,EF_RISCV_RVE>>).
 
-[[abi-ilp32v]]
-ILP32V: ILP32 with hardware vector calling convention (i.e.
-<<ELFCLASS32,ELFCLASS32>> and <<EF_RISCV_VECTOR_ABI,EF_RISCV_VECTOR_ABI>>).
-
-[[abi-ilp32fv]]
-ILP32FV: ILP32 with hardware floating-point for ABI_FLEN=32 and hardware vector
-calling convention (i.e. <<ELFCLASS32,ELFCLASS32>>,
-<<EF_RISCV_FLOAT_ABI_SINGLE,EF_RISCV_FLOAT_ABI_SINGLE>> and
-<<EF_RISCV_VECTOR_ABI,EF_RISCV_VECTOR_ABI>>).
-
-[[abi-ilp32dv]]
-ILP32DV: ILP32 with hardware floating-point for ABI_FLEN=64 and hardware vector
-calling convention (i.e. <<ELFCLASS32,ELFCLASS32>>,
-<<EF_RISCV_FLOAT_ABI_DOUBLE,EF_RISCV_FLOAT_ABI_DOUBLE>> and
-<<EF_RISCV_VECTOR_ABI,EF_RISCV_VECTOR_ABI>>).
-
 [[abi-lp64]]
 LP64:: Integer calling-convention only, hardware
 floating-point calling convention is not used (i.e. <<ELFCLASS64,ELFCLASS64>> and
@@ -434,22 +420,6 @@ LP64Q:: LP64 with hardware floating-point calling
 convention for ABI_FLEN=128 (i.e. <<ELFCLASS64,ELFCLASS64>> and
 <<EF_RISCV_FLOAT_ABI_QUAD,EF_RISCV_FLOAT_ABI_QUAD>>).
 
-[[abi-lp64v]]
-LP64V: LP64 with hardware vector calling convention (i.e.
-<<ELFCLASS64,ELFCLASS64>> and <<EF_RISCV_VECTOR_ABI,EF_RISCV_VECTOR_ABI>>).
-
-[[abi-lp64fv]]
-LP64FV: LP64 with hardware floating-point for ABI_FLEN=32 and hardware vector
-calling convention (i.e. <<ELFCLASS64,ELFCLASS64>>,
-<<EF_RISCV_FLOAT_ABI_SINGLE,EF_RISCV_FLOAT_ABI_SINGLE>> and
-<<EF_RISCV_VECTOR_ABI,EF_RISCV_VECTOR_ABI>>).
-
-[[abi-lp64dv]]
-LP64DV: LP64 with hardware floating-point for ABI_FLEN=64 and hardware vector
-calling convention (i.e. <<ELFCLASS64,ELFCLASS64>>,
-<<EF_RISCV_FLOAT_ABI_DOUBLE,EF_RISCV_FLOAT_ABI_DOUBLE>> and
-<<EF_RISCV_VECTOR_ABI,EF_RISCV_VECTOR_ABI>>).
-
 The ILP32* ABIs are only compatible with RV32* ISAs, and the LP64* ABIs are
 only compatible with RV64* ISAs. A future version of this specification may
 define an ILP32 ABI for the RV64 ISA, but currently this is not a supported

From e50e46978d322b96eb0ea910905fab37c70fa27b Mon Sep 17 00:00:00 2001
From: Lehua Ding <lehua.ding@rivai.ai>
Date: Thu, 22 Jun 2023 15:57:24 +0800
Subject: [PATCH 06/21] add rule of struct with vector type field

---
 riscv-cc.adoc | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/riscv-cc.adoc b/riscv-cc.adoc
index 92f35f2e..754a6f88 100644
--- a/riscv-cc.adoc
+++ b/riscv-cc.adoc
@@ -105,7 +105,7 @@ duration in accordance with C11 section 7.6 "Floating-point environment
 | Name    | ABI Mnemonic | Meaning                      | Preserved across calls?
 
 | v0      |              | First vector mask argument or mask return value | No
-| v1-v31  |              | Other vector mask and data Argument registers   | No
+| v1-v31  |              | Other vector mask and data argument registers   | No
 | vl      |              | Vector length                | No
 | vtype   |              | Vector data type register    | No
 | vxrm    |              | Vector fixed-point rounding mode register    | No
@@ -172,6 +172,10 @@ available, the aggregate is passed on the stack. Bits unused due to
 padding, and bits past the end of an aggregate whose size in bits is not
 divisible by XLEN, are undefined.
 
+The size of the vector type is considered to be unknown. So if structs contain a field of vector type, they always are passed by reference.
+
+NOTE: The vector type mentioned here refers to the type defined https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#type-system[here]
+
 Aggregates or scalars passed on the stack are aligned to the greater of the
 type alignment and XLEN bits, but never more than the stack alignment.
 
@@ -459,7 +463,7 @@ Please refer to the documentation of the RISC-V execution environment interface
 
 There are two conventions for C/{Cpp} type sizes and alignments.
 
-ILP32, ILP32F, ILP32D, ILP32E, ILP32V, ILP32FV, ILP32DV:: Use the following type
+ILP32, ILP32F, ILP32D, ILP32E:: Use the following type
 sizes and alignments (based on the ILP32 convention):
 +
 .C/{Cpp} type sizes and alignments for RV32
@@ -485,7 +489,7 @@ sizes and alignments (based on the ILP32 convention):
 | long double _Complex | 32            | 16
 |===
 
-LP64, LP64F, LP64D, LP64Q, LP64V, LP64FV, LP64DV:: Use the following type sizes
+LP64, LP64F, LP64D, LP64Q:: Use the following type sizes
 and alignments (based on the LP64 convention):
 +
 .C/{Cpp} type sizes and alignments for RV64

From a55384b9f20abaae758e2bdb946cf207261559ed Mon Sep 17 00:00:00 2001
From: Lehua Ding <lehua.ding@rivai.ai>
Date: Thu, 22 Jun 2023 22:22:44 +0800
Subject: [PATCH 07/21] add rule for vector type scalar in integer calling
 convention

---
 riscv-cc.adoc | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/riscv-cc.adoc b/riscv-cc.adoc
index 754a6f88..05e76310 100644
--- a/riscv-cc.adoc
+++ b/riscv-cc.adoc
@@ -162,6 +162,10 @@ the high-order XLEN bits are passed on the stack.
 Scalars wider than 2×XLEN bits are passed by reference and are replaced in the
 argument list with the address.
 
+The size of the vector type is considered to be unknown. So vector type scalar always is passed by reference.
+
+NOTE: The vector type mentioned here refers to the type defined https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#type-system[here]
+
 Aggregates whose total size is no more than XLEN bits are passed in
 a register, with the fields laid out as though they were passed in memory. If
 no register is available, the aggregate is passed on the stack.
@@ -170,11 +174,7 @@ of registers; if only one register is available, the first XLEN bits are passed
 in a register and the remaining bits are passed on the stack. If no registers are
 available, the aggregate is passed on the stack. Bits unused due to
 padding, and bits past the end of an aggregate whose size in bits is not
-divisible by XLEN, are undefined.
-
-The size of the vector type is considered to be unknown. So if structs contain a field of vector type, they always are passed by reference.
-
-NOTE: The vector type mentioned here refers to the type defined https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#type-system[here]
+divisible by XLEN, are undefined. If the aggregate contains a field of vector type, it always is passed by reference.
 
 Aggregates or scalars passed on the stack are aligned to the greater of the
 type alignment and XLEN bits, but never more than the stack alignment.

From a0add096ff85ec36c9bc81d227b69120a5fed359 Mon Sep 17 00:00:00 2001
From: Lehua Ding <lehua.ding@rivai.ai>
Date: Sat, 24 Jun 2023 14:35:14 +0800
Subject: [PATCH 08/21] Rollback unrelated changes

---
 riscv-cc.adoc | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/riscv-cc.adoc b/riscv-cc.adoc
index 05e76310..59fb30ee 100644
--- a/riscv-cc.adoc
+++ b/riscv-cc.adoc
@@ -463,8 +463,8 @@ Please refer to the documentation of the RISC-V execution environment interface
 
 There are two conventions for C/{Cpp} type sizes and alignments.
 
-ILP32, ILP32F, ILP32D, ILP32E:: Use the following type
-sizes and alignments (based on the ILP32 convention):
+ILP32, ILP32F, ILP32D, and ILP32E:: Use the following type sizes and
+alignments (based on the ILP32 convention):
 +
 .C/{Cpp} type sizes and alignments for RV32
 [cols="4,>2,>3"]
@@ -489,8 +489,8 @@ sizes and alignments (based on the ILP32 convention):
 | long double _Complex | 32            | 16
 |===
 
-LP64, LP64F, LP64D, LP64Q:: Use the following type sizes
-and alignments (based on the LP64 convention):
+LP64, LP64F, LP64D, and LP64Q:: Use the following type sizes and
+alignments (based on the LP64 convention):
 +
 .C/{Cpp} type sizes and alignments for RV64
 [cols="4,>2,>3"]

From 87764172d5eec7ba6645e01596bc8cdd2dcf5b8e Mon Sep 17 00:00:00 2001
From: Lehua Ding <lehua.ding@rivai.ai>
Date: Tue, 27 Jun 2023 15:18:23 +0800
Subject: [PATCH 09/21] Fix according to kito's comment

---
 riscv-cc.adoc | 25 ++++++-------------------
 1 file changed, 6 insertions(+), 19 deletions(-)

diff --git a/riscv-cc.adoc b/riscv-cc.adoc
index 59fb30ee..deda4ed4 100644
--- a/riscv-cc.adoc
+++ b/riscv-cc.adoc
@@ -104,14 +104,15 @@ duration in accordance with C11 section 7.6 "Floating-point environment
 |===
 | Name    | ABI Mnemonic | Meaning                      | Preserved across calls?
 
-| v0      |              | First vector mask argument or mask return value | No
-| v1-v31  |              | Other vector mask and data argument registers   | No
+| v0-v31  |              | Temporary registers          | No
 | vl      |              | Vector length                | No
 | vtype   |              | Vector data type register    | No
 | vxrm    |              | Vector fixed-point rounding mode register    | No
 | vxsat   |              | Vector fixed-point saturation flag register  | No
 |===
 
+Vector registers are not used for passing arguments or return values; use <<Standard Vector Calling Convention Variant>> if want to passing arguments or return values in vector register.
+
 The `vxrm` and `vxsat` fields of `vcsr` are not preserved across calls and their
 values are unspecified upon entry.
 
@@ -162,7 +163,7 @@ the high-order XLEN bits are passed on the stack.
 Scalars wider than 2×XLEN bits are passed by reference and are replaced in the
 argument list with the address.
 
-The size of the vector type is considered to be unknown. So vector type scalar always is passed by reference.
+The size of scalars with vector type is considered to be unknown, so these scalars always is passed by reference.
 
 NOTE: The vector type mentioned here refers to the type defined https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#type-system[here]
 
@@ -329,7 +330,7 @@ type would be passed.
 Floating-point registers fs0-fs11 shall be preserved across procedure calls,
 provided they hold values no more than ABI_FLEN bits wide.
 
-=== Hardware Vector Calling Convention
+=== Standard Vector Calling Convention Variant
 
 This section applies only to named arguments. Variadic arguments of vector type are passed by reference.
 
@@ -341,23 +342,9 @@ vector tuple type arguments have the same LMUL and NREGS properties as the vecto
 
 NOTE: The vector mask type, data type and tuple type are defined https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#type-system[here].
 
-A struct containing just one vector type value is passed as though it were a standalone vector type value.
-
-A struct containing two vector type values is passed in vector registers, if there are suitable unused continuous vector register sets for them.
-
-A struct containing one vector type value and one integer (or bitfield), in either order, is passed in vector registers and an integer register, provided the integer is no more than XLEN bits wide, and at least one unused continuous vector register set and at least one integer argument register are available.
-
-A struct containing one vector type value and one floating-point real, in either order, is passed in vector registers and an floating-point register, provided the floating-point real is no more than ABI_FLEN bits wide, and at least one unused continuous vector register set and at least one floating-point argument register are available.
-
-A struct is not passed in the above manner, then it is passed according to the integer calling convention.
-
-NOTE: See the Hardware Floating-point Calling Convention section for the definition of struct.
-
-Unions are never flattened and are always passed according to the integer calling convention.
-
 Values are returned in the same manner as the first named argument of the same type would be passed.
 
-If a function uses vector registers to pass arguments or return value, the function needs to be annotated with `STO_RISCV_VARIANT_CC`. See <<Dynamic Linking>> for more details.
+If a function uses standard vector calling convention variant, the function needs to be annotated with `STO_RISCV_VARIANT_CC`. See xref:riscv-elf.adoc#Dynamic Linking[Dynamic Linking] for more details.
 
 === ILP32E Calling Convention
 

From d7b1e510a2453159fb2c6357e948cfe0bac478b6 Mon Sep 17 00:00:00 2001
From: Lehua Ding <lehua.ding@rivai.ai>
Date: Tue, 27 Jun 2023 15:20:13 +0800
Subject: [PATCH 10/21] Fix cross document link

---
 riscv-cc.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/riscv-cc.adoc b/riscv-cc.adoc
index deda4ed4..47651b5c 100644
--- a/riscv-cc.adoc
+++ b/riscv-cc.adoc
@@ -344,7 +344,7 @@ NOTE: The vector mask type, data type and tuple type are defined https://github.
 
 Values are returned in the same manner as the first named argument of the same type would be passed.
 
-If a function uses standard vector calling convention variant, the function needs to be annotated with `STO_RISCV_VARIANT_CC`. See xref:riscv-elf.adoc#Dynamic Linking[Dynamic Linking] for more details.
+If a function uses standard vector calling convention variant, the function needs to be annotated with `STO_RISCV_VARIANT_CC`. See xref:riscv-elf.adoc#dynamic-linking[Dynamic Linking] for more details.
 
 === ILP32E Calling Convention
 

From 78c9230fa274069db072dacff7a200bb7d75840d Mon Sep 17 00:00:00 2001
From: Lehua Ding <lehua.ding@rivai.ai>
Date: Thu, 29 Jun 2023 14:57:42 +0800
Subject: [PATCH 11/21] address some comments by sorear

---
 riscv-cc.adoc | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/riscv-cc.adoc b/riscv-cc.adoc
index 47651b5c..6187a288 100644
--- a/riscv-cc.adoc
+++ b/riscv-cc.adoc
@@ -163,8 +163,6 @@ the high-order XLEN bits are passed on the stack.
 Scalars wider than 2×XLEN bits are passed by reference and are replaced in the
 argument list with the address.
 
-The size of scalars with vector type is considered to be unknown, so these scalars always is passed by reference.
-
 NOTE: The vector type mentioned here refers to the type defined https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#type-system[here]
 
 Aggregates whose total size is no more than XLEN bits are passed in
@@ -175,7 +173,7 @@ of registers; if only one register is available, the first XLEN bits are passed
 in a register and the remaining bits are passed on the stack. If no registers are
 available, the aggregate is passed on the stack. Bits unused due to
 padding, and bits past the end of an aggregate whose size in bits is not
-divisible by XLEN, are undefined. If the aggregate contains a field of vector type, it always is passed by reference.
+divisible by XLEN, are undefined.
 
 Aggregates or scalars passed on the stack are aligned to the greater of the
 type alignment and XLEN bits, but never more than the stack alignment.
@@ -334,7 +332,7 @@ provided they hold values no more than ABI_FLEN bits wide.
 
 This section applies only to named arguments. Variadic arguments of vector type are passed by reference.
 
-The hardware vector calling convention adds 1 argument register for vector mask type argument and 31 argument registers for vector data and tuple type argument which are v0 and v1-v31, respectively. v0 is used for the first vector mask type argument and the vector mask type return value, the rest of the mask type arguments are treated as vector data type arguments. v1-v31 are also used for the vector data and tuple type return value.
+The hardware vector calling convention adds 1 argument register for argument of vector mask type and 31 argument registers for arguments of vector data type and vector tuple type which are v0 and v1-v31, respectively. v0 is used for the first argument of vector mask type and return values of vector mask type, the rest of the arguments of mask type are treated as arguments of vector data type. v1-v31 are also used for return values of vector data type and vector tuple type.
 
 Vector data type arguments have properties LMUL and NREGS, the current LMUL can be 1/8, 1/4, 1/2, 1, 2, 4, 8, the current NREGS can be 1, 2, 4, 8. For arguments with LMUL less than 1, their LMUL is treated as 1. The LMUL of the vector mask type argument is treated as 1. The NREGS property means the number of registers needed for this argument. For vector data type, NREGS is 1 when LMUL is less than 1, otherwise NREGS is equal to LMUL. If it is possible to find NREGS unused continuous vector register set starting from v1 and its first register is LMUL-aligned, use these registers to pass the argument. Otherwise, the argument is passed by reference.
 
@@ -342,6 +340,8 @@ vector tuple type arguments have the same LMUL and NREGS properties as the vecto
 
 NOTE: The vector mask type, data type and tuple type are defined https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#type-system[here].
 
+Vector type are not allowed in struct or union.
+
 Values are returned in the same manner as the first named argument of the same type would be passed.
 
 If a function uses standard vector calling convention variant, the function needs to be annotated with `STO_RISCV_VARIANT_CC`. See xref:riscv-elf.adoc#dynamic-linking[Dynamic Linking] for more details.

From c1c6635a3685e19d3d03f7d37306861725a9878c Mon Sep 17 00:00:00 2001
From: Lehua Ding <lehua.ding@rivai.ai>
Date: Thu, 29 Jun 2023 18:27:53 +0800
Subject: [PATCH 12/21] update to a new proposal by Kito

---
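Reviewer note (below the `---`, so not part of the commit message): a rough
Python sketch of the allocation rules this patch adds, in case it helps when
checking the wording. The function name `allocate`, the
`(name, kind, lmul, nfields)` tuples and the "by reference" marker are made up
for illustration and are not taken from any compiler.

```python
# Illustrative model only; names and data layout are assumptions.
FIRST_ARG_REG, LAST_ARG_REG = 8, 23  # v8-v23, per the register table in this patch


def allocate(args):
    """Assign each (name, kind, lmul, nfields) argument a location.

    kind is "mask", "data" or "tuple". The result maps each name to "v0",
    to an inclusive (first, last) register range, or to "by reference".
    """
    used = set()       # register numbers already handed out
    v0_taken = False
    result = {}
    for name, kind, lmul, nfields in args:
        # Rule 1: the first scalable mask argument goes in v0.
        if kind == "mask" and not v0_taken:
            v0_taken = True
            result[name] = "v0"
            continue
        # Masks and fractional LMUL values are treated as LMUL = 1.
        group = 1 if kind == "mask" else max(1, int(lmul))
        # Rule 3: a tuple needs NFIELDS consecutive register groups.
        nregs = group * (nfields if kind == "tuple" else 1)
        result[name] = "by reference"
        # Rules 2 and 3: the search restarts at v8 for every argument.
        for start in range(FIRST_ARG_REG, LAST_ARG_REG + 1):
            if start % group:
                continue  # first register must be a multiple of LMUL
            regs = range(start, start + nregs)
            if regs[-1] <= LAST_ARG_REG and used.isdisjoint(regs):
                used.update(regs)
                result[name] = (start, regs[-1])
                break
    return result
```

For `void foo (vint32m1_t a, vint32m2_t b, vint32m1_t c)` this model gives v8
for `a`, v10-v11 for `b` and v9 for `c`, which matches the example in the NOTE
added by this patch.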
 riscv-cc.adoc | 35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/riscv-cc.adoc b/riscv-cc.adoc
index 6187a288..d09a3751 100644
--- a/riscv-cc.adoc
+++ b/riscv-cc.adoc
@@ -104,14 +104,19 @@ duration in accordance with C11 section 7.6 "Floating-point environment
 |===
 | Name    | ABI Mnemonic | Meaning                      | Preserved across calls?
 
-| v0-v31  |              | Temporary registers          | No
+| v0      |              | Argument register            | No
+| v1-v7   |              | Callee-saved registers       | Yes*
+| v8-v23  |              | Argument registers           | No
+| v24-v31 |              | Callee-saved registers       | Yes*
 | vl      |              | Vector length                | No
 | vtype   |              | Vector data type register    | No
 | vxrm    |              | Vector fixed-point rounding mode register    | No
 | vxsat   |              | Vector fixed-point saturation flag register  | No
 |===
 
-Vector registers are not used for passing arguments or return values; use <<Standard Vector Calling Convention Variant>> if want to passing arguments or return values in vector register.
+*: Only functions with arguments or return values of scalable vector types need to preserve these registers, other functions treat these registers as temporary registers.
+
+NOTE: The scalable vector types are defined https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#type-system[here] which includes scalable mask type, scalable data type, and scalable tuple type.
 
 The `vxrm` and `vxsat` fields of `vcsr` are not preserved across calls and their
 values are unspecified upon entry.
@@ -163,8 +168,6 @@ the high-order XLEN bits are passed on the stack.
 Scalars wider than 2×XLEN bits are passed by reference and are replaced in the
 argument list with the address.
 
-NOTE: The vector type mentioned here refers to the type defined https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#type-system[here]
-
 Aggregates whose total size is no more than XLEN bits are passed in
 a register, with the fields laid out as though they were passed in memory. If
 no register is available, the aggregate is passed on the stack.
@@ -330,21 +333,29 @@ provided they hold values no more than ABI_FLEN bits wide.
 
 === Standard Vector Calling Convention Variant
 
-This section applies only to named arguments. Variadic arguments of vector type are passed by reference.
+This section applies only to named arguments. Variadic scalable vector arguments are passed by reference.
 
-The hardware vector calling convention adds 1 argument register for argument of vector mask type and 31 argument registers for arguments of vector data type and vector tuple type which are v0 and v1-v31, respectively. v0 is used for the first argument of vector mask type and return values of vector mask type, the rest of the arguments of mask type are treated as arguments of vector data type. v1-v31 are also used for return values of vector data type and vector tuple type.
+RISC-V V extension defines an set of thirty-two scalable vector registers, v0-v31. v0 is used to pass the first scalable mask argument to a function, and to return scalable mask result from a function. v8-v23 are used to pass scalable data arguments, scalable tuple arguments and the rest scalable mask arguments to a function, and to return scalable data and scalable tuple results from a function.
 
-Vector data type arguments have properties LMUL and NREGS, the current LMUL can be 1/8, 1/4, 1/2, 1, 2, 4, 8, the current NREGS can be 1, 2, 4, 8. For arguments with LMUL less than 1, their LMUL is treated as 1. The LMUL of the vector mask type argument is treated as 1. The NREGS property means the number of registers needed for this argument. For vector data type, NREGS is 1 when LMUL is less than 1, otherwise NREGS is equal to LMUL. If it is possible to find NREGS unused continuous vector register set starting from v1 and its first register is LMUL-aligned, use these registers to pass the argument. Otherwise, the argument is passed by reference.
+If a function takes at least one argument in scalable vector registers, or if it returns results in such registers, it must ensure that the entire contents of v1-v7 and v24-v31 are preserved across the call. In other cases it no need to preserve.
 
-vector tuple type arguments have the same LMUL and NREGS properties as the vector data type, but also have the NF property. NREGS equals NF multiplied by LMUL, but cannot exceed 8. The process of finding the argument registers is the same as for the vector data type.
+Each scalable data and tuple type has an LMUL attribute that indicates a vector register group. The value of LMUL indicates the number of vector registers in the vector register group and requires the first vector register number in the vector register group must be a multiple of it. For example, the LMUL of `vint64m8_t` is 8, so v8-v15 vector register group can be allocated to this type, but v9-v16 can not because the v9 register number is not a multiple of 8. If LMUL is less than 1, it is treated as 1. If it is a scalable mask type, its LMUL is 1.
 
-NOTE: The vector mask type, data type and tuple type are defined https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#type-system[here].
+Each scalable tuple type also has a NFIELDS attribute, which indicates how many vector register groups the type contains. Thus a scalable tuple type takes up LMUL×NFIELDS registers.
 
-Vector type are not allowed in struct or union.
+The rules for passing the scalable vector argument are as follows:
 
-Values are returned in the same manner as the first named argument of the same type would be passed.
+1. For the first scalable mask parameter, use v0 to pass it. The argument has now been allocated.
+
+2. For scalable data argument or rest scalable mask argument, if a vector register group can be found from v8 to v23 that has not yet been allocated and the first register number is a multiple of LMUL, then allocate the vector register group to this argument and mark these registers as allocated. Otherwise, pass it by reference. The argument has now been allocated.
 
-If a function uses standard vector calling convention variant, the function needs to be annotated with `STO_RISCV_VARIANT_CC`. See xref:riscv-elf.adoc#dynamic-linking[Dynamic Linking] for more details.
+3. For scalable tuple argument, if NFIELDS vector register groups can be found from v8 to v23 that have not yet been allocated, and the first register number is a multiple of LMUL, then allocate these vector register groups to this argument and mark these registers as allocated. Otherwise, pass it by reference. The argument has now been allocated.
+
+NOTE: It should be stressed that the search for the appropriate vector register groups starts at v8 each time and does not start at the next register after the registers are allocated for the previous scalable vector argument. Therefore, it is possible that the vector register number allocated to a scalable vector argument can be less than the vector register number allocated to previous scalable vector arguments. For example, for the function `void foo (vint32m1_t a, vint32m2_t b, vint32m1_t c)`, according to the rules of allocation, v8 will be allocated to `a`, v10-v11 will be allocated to `b` and v9 will be allocated to `c`.
+
+For now, vector types are not allowed in struct or union.
+
+Values are returned in the same manner as the first named argument of the same type would be passed.
 
 === ILP32E Calling Convention
 

From 21b9592e8aff769f57bd321b8d53e720d8582ba5 Mon Sep 17 00:00:00 2001
From: Lehua Ding <lehua.ding@rivai.ai>
Date: Thu, 29 Jun 2023 18:44:14 +0800
Subject: [PATCH 13/21] rule for unprototyped functions

---
 riscv-cc.adoc | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/riscv-cc.adoc b/riscv-cc.adoc
index d09a3751..fa130a5e 100644
--- a/riscv-cc.adoc
+++ b/riscv-cc.adoc
@@ -335,6 +335,8 @@ provided they hold values no more than ABI_FLEN bits wide.
 
 This section applies only to named arguments. Variadic scalable vector arguments are passed by reference.
 
+For now, vector types are not allowed in struct or union.
+
 RISC-V V extension defines an set of thirty-two scalable vector registers, v0-v31. v0 is used to pass the first scalable mask argument to a function, and to return scalable mask result from a function. v8-v23 are used to pass scalable data arguments, scalable tuple arguments and the rest scalable mask arguments to a function, and to return scalable data and scalable tuple results from a function.
 
 If a function takes at least one argument in scalable vector registers, or if it returns results in such registers, it must ensure that the entire contents of v1-v7 and v24-v31 are preserved across the call. In other cases it no need to preserve.
@@ -353,10 +355,10 @@ The rules for passing the scalable vector argument are as follows:
 
 NOTE: It should be stressed that the search for the appropriate vector register groups starts at v8 each time and does not start at the next register after the registers are allocated for the previous scalable vector argument. Therefore, it is possible that the vector register number allocated to a scalable vector argument can be less than the vector register number allocated to previous scalable vector arguments. For example, for the function `void foo (vint32m1_t a, vint32m2_t b, vint32m1_t c)`, according to the rules of allocation, v8 will be allocated to `a`, v10-v11 will be allocated to `b` and v9 will be allocated to `c`.
 
-For now, vector types are not allowed in struct or union.
-
 Values are returned in the same manner as the first named argument of the same type would be passed.
 
+Scalable vector arguments and return values cannot be passed to an unprototyped function.
+
 === ILP32E Calling Convention
 
 IMPORTANT: RV32E is not a ratified base ISA and so we cannot guarantee the

From 0437e69d516a37868a3ce6aa030ef020ab9ad6af Mon Sep 17 00:00:00 2001
From: Lehua Ding <lehua.ding@rivai.ai>
Date: Thu, 29 Jun 2023 18:53:54 +0800
Subject: [PATCH 14/21] add some note

---
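Reviewer note (not part of the commit message): a small Python comparison that
may help justify the sentence added here. It counts how many data arguments
land in v8-v23 when the search restarts at v8 for each argument, versus a
hypothetical alternative that continues after the previously allocated
register. The function name and the `restart_at_v8` flag are assumptions made
for this sketch, and only integer LMUL values are modelled.

```python
def count_in_registers(lmuls, restart_at_v8):
    """Count how many data arguments with the given LMULs land in v8-v23."""
    used, cursor, hits = set(), 8, 0
    for lmul in lmuls:
        # Either search from v8 again (the rule in this patch) or continue
        # after the last allocated register (hypothetical alternative).
        begin = 8 if restart_at_v8 else cursor
        for start in range(begin, 24):
            regs = range(start, start + lmul)
            if start % lmul == 0 and regs[-1] <= 23 and used.isdisjoint(regs):
                used.update(regs)
                cursor = regs[-1] + 1
                hits += 1
                break
    return hits


# Arguments with LMUL 1, 8 and 1, in that order.
print(count_in_registers([1, 8, 1], restart_at_v8=True))
print(count_in_registers([1, 8, 1], restart_at_v8=False))
```

With the restart rule the third argument falls back into v9; in the
alternative the LMUL=8 argument pushes the cursor past v23, so the third
argument would have to be passed by reference.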
 riscv-cc.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/riscv-cc.adoc b/riscv-cc.adoc
index fa130a5e..0540d1fd 100644
--- a/riscv-cc.adoc
+++ b/riscv-cc.adoc
@@ -353,7 +353,7 @@ The rules for passing the scalable vector argument are as follows:
 
 3. For scalable tuple argument, if NFIELDS vector register groups can be found from v8 to v23 that have not yet been allocated, and the first register number is a multiple of LMUL, then allocate these vector register groups to this argument and mark these registers as allocated. Otherwise, pass it by reference. The argument has now been allocated.
 
-NOTE: It should be stressed that the search for the appropriate vector register groups starts at v8 each time and does not start at the next register after the registers are allocated for the previous scalable vector argument. Therefore, it is possible that the vector register number allocated to a scalable vector argument can be less than the vector register number allocated to previous scalable vector arguments. For example, for the function `void foo (vint32m1_t a, vint32m2_t b, vint32m1_t c)`, according to the rules of allocation, v8 will be allocated to `a`, v10-v11 will be allocated to `b` and v9 will be allocated to `c`.
+NOTE: It should be stressed that the search for the appropriate vector register groups starts at v8 each time and does not start at the next register after the registers are allocated for the previous scalable vector argument. Therefore, it is possible that the vector register number allocated to a scalable vector argument can be less than the vector register number allocated to previous scalable vector arguments. For example, for the function `void foo (vint32m1_t a, vint32m2_t b, vint32m1_t c)`, according to the rules of allocation, v8 will be allocated to `a`, v10-v11 will be allocated to `b` and v9 will be allocated to `c`. This approach allows more vector registers to be allocated to arguments in some cases.
 
 Values are returned in the same manner as the first named argument of the same type would be passed.
 

From 2c7a11f6c0f1f8c4a6b63cdebf74d5b77bf4a3c2 Mon Sep 17 00:00:00 2001
From: Lehua Ding <lehua.ding@rivai.ai>
Date: Thu, 29 Jun 2023 23:09:39 +0800
Subject: [PATCH 15/21] adjust some description

---
 riscv-cc.adoc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/riscv-cc.adoc b/riscv-cc.adoc
index 0540d1fd..14432bc8 100644
--- a/riscv-cc.adoc
+++ b/riscv-cc.adoc
@@ -345,13 +345,13 @@ Each scalable data and tuple type has an LMUL attribute that indicates a vector
 
 Each scalable tuple type also has a NFIELDS attribute, which indicates how many vector register groups the type contains. Thus a scalable tuple type takes up LMUL×NFIELDS registers.
 
-The rules for passing the scalable vector argument are as follows:
+The rules for passing scalable vector arguments are as follows:
 
 1. For the first scalable mask parameter, use v0 to pass it. The argument has now been allocated.
 
-2. For scalable data argument or rest scalable mask argument, if a vector register group can be found from v8 to v23 that has not yet been allocated and the first register number is a multiple of LMUL, then allocate the vector register group to this argument and mark these registers as allocated. Otherwise, pass it by reference. The argument has now been allocated.
+2. For scalable data arguments or rest scalable mask arguments, starting from the v8 register, if a vector register group between v8-v23 that has not been allocated can be found and the first register number is a multiple of LMUL, then allocate this vector register group to this argument and mark these registers as allocated. Otherwise, pass it by reference. The argument has now been allocated.
 
-3. For scalable tuple argument, if NFIELDS vector register groups can be found from v8 to v23 that have not yet been allocated, and the first register number is a multiple of LMUL, then allocate these vector register groups to this argument and mark these registers as allocated. Otherwise, pass it by reference. The argument has now been allocated.
+3. For scalable tuple arguments, starting from the v8 register, if NFIELDS consecutive vector register groups between v8-v23 that have not been allocated can be found and the first register number is a multiple of LMUL, then allocate these vector register groups to this argument and mark these registers as allocated. Otherwise, pass it by reference. The argument has now been allocated.
 
 NOTE: It should be stressed that the search for the appropriate vector register groups starts at v8 each time and does not start at the next register after the registers are allocated for the previous scalable vector argument. Therefore, it is possible that the vector register number allocated to a scalable vector argument can be less than the vector register number allocated to previous scalable vector arguments. For example, for the function `void foo (vint32m1_t a, vint32m2_t b, vint32m1_t c)`, according to the rules of allocation, v8 will be allocated to `a`, v10-v11 will be allocated to `b` and v9 will be allocated to `c`. This approach allows more vector registers to be allocated to arguments in some cases.
 

From b89beab3e010492c1bbae82519a5cc9e97684197 Mon Sep 17 00:00:00 2001
From: Lehua Ding <lehua.ding@rivai.ai>
Date: Fri, 30 Jun 2023 14:43:01 +0800
Subject: [PATCH 16/21] adjust some description

---
 riscv-cc.adoc | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/riscv-cc.adoc b/riscv-cc.adoc
index 14432bc8..e4c45430 100644
--- a/riscv-cc.adoc
+++ b/riscv-cc.adoc
@@ -114,9 +114,7 @@ duration in accordance with C11 section 7.6 "Floating-point environment
 | vxsat   |              | Vector fixed-point saturation flag register  | No
 |===
 
-*: Only functions with arguments or return values of scalable vector types need to preserve these registers, other functions treat these registers as temporary registers.
-
-NOTE: The scalable vector types are defined https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#type-system[here] which includes scalable mask type, scalable data type, and scalable tuple type.
+*: Only functions using vector registers to pass arguments or return values need to preserve these registers, other functions treat these registers as temporary registers.
 
 The `vxrm` and `vxsat` fields of `vcsr` are not preserved across calls and their
 values are unspecified upon entry.
@@ -333,25 +331,27 @@ provided they hold values no more than ABI_FLEN bits wide.
 
 === Standard Vector Calling Convention Variant
 
-This section applies only to named arguments. Variadic scalable vector arguments are passed by reference.
+The https://github.com/riscv/riscv-v-spec[RISC-V V Vector Extension] defines a set of thirty-two scalable vector registers, v0-v31. The https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#type-system[RISC-V V Vector Extension Intrinsics] defines scalable vector types which include scalable mask types, scalable data types, and scalable tuple types. A scalable vector value can be stored in vector register groups.
+
+The remainder of this section applies only to named arguments. Variadic arguments of scalable vector type are passed by reference.
 
-For now, vector types are not allowed in struct or union.
+For now, scalable vector types are not allowed in struct or union.
 
-RISC-V V extension defines an set of thirty-two scalable vector registers, v0-v31. v0 is used to pass the first scalable mask argument to a function, and to return scalable mask result from a function. v8-v23 are used to pass scalable data arguments, scalable tuple arguments and the rest scalable mask arguments to a function, and to return scalable data and scalable tuple results from a function.
+v0 is used to pass the first scalable mask argument to a function, and to return scalable mask result from a function. v8-v23 are used to pass scalable data arguments, scalable tuple arguments and the rest scalable mask arguments to a function, and to return scalable data and scalable tuple results from a function.
 
 If a function takes at least one argument in scalable vector registers, or if it returns results in such registers, it must ensure that the entire contents of v1-v7 and v24-v31 are preserved across the call. In other cases it no need to preserve.
 
 Each scalable data and tuple type has an LMUL attribute that indicates a vector register group. The value of LMUL indicates the number of vector registers in the vector register group and requires the first vector register number in the vector register group must be a multiple of it. For example, the LMUL of `vint64m8_t` is 8, so v8-v15 vector register group can be allocated to this type, but v9-v16 can not because the v9 register number is not a multiple of 8. If LMUL is less than 1, it is treated as 1. If it is a scalable mask type, its LMUL is 1.
 
-Each scalable tuple type also has a NFIELDS attribute, which indicates how many vector register groups the type contains. Thus a scalable tuple type takes up LMUL×NFIELDS registers.
+Each scalable tuple type also has an NFIELDS attribute that indicates how many vector register groups the type contains. Thus a scalable tuple type needs to take up LMUL×NFIELDS registers.
 
 The rules for passing scalable vector arguments are as follows:
 
-1. For the first scalable mask parameter, use v0 to pass it. The argument has now been allocated.
+1. For the first scalable mask argument, use v0 to pass it. The argument has now been allocated.
 
-2. For scalable data arguments or rest scalable mask arguments, starting from the v8 register, if a vector register group between v8-v23 that has not been allocated can be found and the first register number is a multiple of LMUL, then allocate this vector register group to this argument and mark these registers as allocated. Otherwise, pass it by reference. The argument has now been allocated.
+2. For scalable data arguments or rest scalable mask arguments, starting from the v8 register, if a vector register group between v8-v23 that has not been allocated can be found and the first register number is a multiple of LMUL, then allocate this vector register group to the argument and mark these registers as allocated. Otherwise, pass it by reference. The argument has now been allocated.
 
-3. For scalable tuple arguments, starting from the v8 register, if NFIELDS consecutive vector register groups between v8-v23 that have not been allocated can be found and the first register number is a multiple of LMUL, then allocate these vector register groups to this argument and mark these registers as allocated. Otherwise, pass it by reference. The argument has now been allocated.
+3. For scalable tuple arguments, starting from the v8 register, if NFIELDS consecutive vector register groups between v8-v23 that have not been allocated can be found and the first register number is a multiple of LMUL, then allocate these vector register groups to the argument and mark these registers as allocated. Otherwise, pass it by reference. The argument has now been allocated.
 
 NOTE: It should be stressed that the search for the appropriate vector register groups starts at v8 each time and does not start at the next register after the registers are allocated for the previous scalable vector argument. Therefore, it is possible that the vector register number allocated to a scalable vector argument can be less than the vector register number allocated to previous scalable vector arguments. For example, for the function `void foo (vint32m1_t a, vint32m2_t b, vint32m1_t c)`, according to the rules of allocation, v8 will be allocated to `a`, v10-v11 will be allocated to `b` and v9 will be allocated to `c`. This approach allows more vector registers to be allocated to arguments in some cases.
 

From 4724a1c7b599809a8222b92a4b5fe38b06e2c7fd Mon Sep 17 00:00:00 2001
From: Lehua Ding <lehua.ding@rivai.ai>
Date: Tue, 4 Jul 2023 11:30:22 +0800
Subject: [PATCH 17/21] address some comments

---
 riscv-cc.adoc | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/riscv-cc.adoc b/riscv-cc.adoc
index e4c45430..4fe3f7ef 100644
--- a/riscv-cc.adoc
+++ b/riscv-cc.adoc
@@ -99,22 +99,34 @@ duration in accordance with C11 section 7.6 "Floating-point environment
 
 === Vector Register Convention
 
-.Vector register convention
+.Vector register convention for standard calling convention
+[%autowidth]
+|===
+| Name    | ABI Mnemonic | Meaning                      | Preserved across calls?
+
+| v0-v31  |              | Temporary registers          | No
+| vl      |              | Vector length                | No
+| vtype   |              | Vector data type register    | No
+| vxrm    |              | Vector fixed-point rounding mode register    | No
+| vxsat   |              | Vector fixed-point saturation flag register  | No
+|===
+
+.Vector register convention for standard vector calling convention variant
 [%autowidth]
 |===
 | Name    | ABI Mnemonic | Meaning                      | Preserved across calls?
 
 | v0      |              | Argument register            | No
-| v1-v7   |              | Callee-saved registers       | Yes*
+| v1-v7   |              | Callee-saved registers       | Yes
 | v8-v23  |              | Argument registers           | No
-| v24-v31 |              | Callee-saved registers       | Yes*
+| v24-v31 |              | Callee-saved registers       | Yes
 | vl      |              | Vector length                | No
 | vtype   |              | Vector data type register    | No
 | vxrm    |              | Vector fixed-point rounding mode register    | No
 | vxsat   |              | Vector fixed-point saturation flag register  | No
 |===
 
-*: Only functions using vector registers to pass arguments or return values need to preserve these registers, other functions treat these registers as temporary registers.
+Please refer to the <<Standard Vector Calling Convention Variant>> section for more details about standard vector calling convention variant.
 
 The `vxrm` and `vxsat` fields of `vcsr` are not preserved across calls and their
 values are unspecified upon entry.
@@ -128,8 +140,7 @@ Any procedure that does explicitly write `vstart` to a nonzero value must zero
 
 == Procedure Calling Convention
 
-This chapter defines standard calling conventions, and describes how to pass
-parameters and return values.
+This chapter defines standard calling conventions and standard calling convention variants and describes how to pass arguments and return values.
 
 Functions must follow the register convention defined in calling convention: the
 contents of any register without specifying it as an argument register
@@ -333,7 +344,7 @@ provided they hold values no more than ABI_FLEN bits wide.
 
 The https://github.com/riscv/riscv-v-spec[RISC-V V Vector Extension] defines a set of thirty-two scalable vector registers, v0-v31. The https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#type-system[RISC-V V Vector Extension Intrinsics] defines scalable vector types which include scalable mask types, scalable data types, and scalable tuple types. A scalable vector value can be stored in vector register groups.
 
-The remainder of this section applies only to named arguments. Variadic arguments of scalable vector type are passed by reference.
+The remainder of this section applies only to named scalable vector arguments, other named arguments and return values follow the standard calling convention. Variadic scalable vector arguments are passed by reference.
 
 For now, scalable vector types are not allowed in struct or union.
 
@@ -341,7 +352,7 @@ v0 is used to pass the first scalable mask argument to a function, and to return
 
 If a function takes at least one argument in scalable vector registers, or if it returns results in such registers, it must ensure that the entire contents of v1-v7 and v24-v31 are preserved across the call. In other cases it no need to preserve.
 
-Each scalable data and tuple type has an LMUL attribute that indicates a vector register group. The value of LMUL indicates the number of vector registers in the vector register group and requires the first vector register number in the vector register group must be a multiple of it. For example, the LMUL of `vint64m8_t` is 8, so v8-v15 vector register group can be allocated to this type, but v9-v16 can not because the v9 register number is not a multiple of 8. If LMUL is less than 1, it is treated as 1. If it is a scalable mask type, its LMUL is 1.
+Each scalable data type and scalable tuple type has an LMUL attribute that indicates a vector register group. The value of LMUL indicates the number of vector registers in the vector register group and requires the first vector register number in the vector register group must be a multiple of it. For example, the LMUL of `vint64m8_t` is 8, so v8-v15 vector register group can be allocated to this type, but v9-v16 can not because the v9 register number is not a multiple of 8. If LMUL is less than 1, it is treated as 1. If it is a scalable mask type, its LMUL is 1.
 
 Each scalable tuple type also has an NFIELDS attribute that indicates how many vector register groups the type contains. Thus a scalable tuple type needs to take up LMUL×NFIELDS registers.
 

From 814a846f6894587702fc8fbb84bce3b4fd8eecd1 Mon Sep 17 00:00:00 2001
From: Lehua Ding <lehua.ding@rivai.ai>
Date: Wed, 5 Jul 2023 17:29:56 +0800
Subject: [PATCH 18/21] address some comments by Kito

---
 riscv-cc.adoc | 110 +++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 81 insertions(+), 29 deletions(-)

diff --git a/riscv-cc.adoc b/riscv-cc.adoc
index 4fe3f7ef..a6a07e5e 100644
--- a/riscv-cc.adoc
+++ b/riscv-cc.adoc
@@ -126,7 +126,8 @@ duration in accordance with C11 section 7.6 "Floating-point environment
 | vxsat   |              | Vector fixed-point saturation flag register  | No
 |===
 
-Please refer to the <<Standard Vector Calling Convention Variant>> section for more details about standard vector calling convention variant.
+Please refer to the <<Standard Vector Calling Convention Variant>> section for
+more details about standard vector calling convention variant.
 
 The `vxrm` and `vxsat` fields of `vcsr` are not preserved across calls and their
 values are unspecified upon entry.
@@ -140,7 +141,8 @@ Any procedure that does explicitly write `vstart` to a nonzero value must zero
 
 == Procedure Calling Convention
 
-This chapter defines standard calling conventions and standard calling convention variants and describes how to pass arguments and return values.
+This chapter defines standard calling conventions and standard calling
+convention variants, and describes how to pass arguments and return values.
 
 Functions must follow the register convention defined in calling convention: the
 contents of any register without specifying it as an argument register
@@ -342,33 +344,73 @@ provided they hold values no more than ABI_FLEN bits wide.
 
 === Standard Vector Calling Convention Variant
 
-The https://github.com/riscv/riscv-v-spec[RISC-V V Vector Extension] defines a set of thirty-two scalable vector registers, v0-v31. The https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#type-system[RISC-V V Vector Extension Intrinsics] defines scalable vector types which include scalable mask types, scalable data types, and scalable tuple types. A scalable vector value can be stored in vector register groups.
-
-The remainder of this section applies only to named scalable vector arguments, other named arguments and return values follow the standard calling convention. Variadic scalable vector arguments are passed by reference.
-
-For now, scalable vector types are not allowed in struct or union.
-
-v0 is used to pass the first scalable mask argument to a function, and to return scalable mask result from a function. v8-v23 are used to pass scalable data arguments, scalable tuple arguments and the rest scalable mask arguments to a function, and to return scalable data and scalable tuple results from a function.
-
-If a function takes at least one argument in scalable vector registers, or if it returns results in such registers, it must ensure that the entire contents of v1-v7 and v24-v31 are preserved across the call. In other cases it no need to preserve.
-
-Each scalable data type and scalable tuple type has an LMUL attribute that indicates a vector register group. The value of LMUL indicates the number of vector registers in the vector register group and requires the first vector register number in the vector register group must be a multiple of it. For example, the LMUL of `vint64m8_t` is 8, so v8-v15 vector register group can be allocated to this type, but v9-v16 can not because the v9 register number is not a multiple of 8. If LMUL is less than 1, it is treated as 1. If it is a scalable mask type, its LMUL is 1.
-
-Each scalable tuple type also has an NFIELDS attribute that indicates how many vector register groups the type contains. Thus a scalable tuple type needs to take up LMUL×NFIELDS registers.
-
-The rules for passing scalable vector arguments are as follows:
-
-1. For the first scalable mask argument, use v0 to pass it. The argument has now been allocated.
-
-2. For scalable data arguments or rest scalable mask arguments, starting from the v8 register, if a vector register group between v8-v23 that has not been allocated can be found and the first register number is a multiple of LMUL, then allocate this vector register group to the argument and mark these registers as allocated. Otherwise, pass it by reference. The argument has now been allocated.
-
-3. For scalable tuple arguments, starting from the v8 register, if NFIELDS consecutive vector register groups between v8-v23 that have not been allocated can be found and the first register number is a multiple of LMUL, then allocate these vector register groups to the argument and mark these registers as allocated. Otherwise, pass it by reference. The argument has now been allocated.
-
-NOTE: It should be stressed that the search for the appropriate vector register groups starts at v8 each time and does not start at the next register after the registers are allocated for the previous scalable vector argument. Therefore, it is possible that the vector register number allocated to a scalable vector argument can be less than the vector register number allocated to previous scalable vector arguments. For example, for the function `void foo (vint32m1_t a, vint32m2_t b, vint32m1_t c)`, according to the rules of allocation, v8 will be allocated to `a`, v10-v11 will be allocated to `b` and v9 will be allocated to `c`. This approach allows more vector registers to be allocated to arguments in some cases.
-
-Values are returned in the same manner as the first named argument of the same type would be passed.
-
-Scalable vector arguments and return values cannot be passed to an unprototyped function.
+The _RISC-V V Vector Extension_<<riscv-v-extension>> defines a set of thirty-two
+vector registers, v0-v31. The _RISC-V Vector Extension Intrinsic
+Document_<<rvv-intrinsic-doc>> defines vector types which include vector mask
+types, vector data types, and vector tuple types. A value of vector type can be
+stored in vector register groups.
+
+The remainder of this section applies only to named vector arguments, other
+named arguments and return values follow the standard calling convention.
+Variadic vector arguments are passed by reference.
+
+v0 is used to pass the first vector mask argument to a function, and to return
+vector mask result from a function. v8-v23 are used to pass vector data
+arguments, vector tuple arguments and the rest vector mask arguments to a
+function, and to return vector data and vector tuple results from a function.
+
+It must ensure that the entire contents of v1-v7 and v24-v31 are preserved
+across the call.
+
+Each vector data type and vector tuple type has an LMUL attribute that
+indicates a vector register group. The value of LMUL is the number of vector
+registers in the vector register group, and the number of the first vector
+register in the group must be a multiple of LMUL. For example, the LMUL of
+`vint64m8_t` is 8, so the v8-v15 vector register group can be allocated to
+this type, but v9-v16 cannot because the register number 9 is not a multiple
+of 8. If LMUL is less than 1, it is treated as 1. If the type is a vector
+mask type, its LMUL is 1.
+
+Each vector tuple type also has an NFIELDS attribute that indicates how many
+vector register groups the type contains. Thus a vector tuple type needs to
+take up LMUL×NFIELDS registers.
+
+The rules for passing vector arguments are as follows:
+
+1. For the first vector mask argument, use v0 to pass it. The argument has now
+been allocated.
+
+2. For vector data arguments or rest vector mask arguments, starting from the
+v8 register, if a vector register group between v8-v23 that has not been
+allocated can be found and the first register number is a multiple of LMUL,
+then allocate this vector register group to the argument and mark these
+registers as allocated. Otherwise, pass it by reference. The argument has now
+been allocated.
+
+3. For vector tuple arguments, starting from the v8 register, if NFIELDS
+consecutive vector register groups between v8-v23 that have not been allocated
+can be found and the first register number is a multiple of LMUL, then allocate
+these vector register groups to the argument and mark these registers as
+allocated. Otherwise, pass it by reference. The argument has now been allocated.
+
+NOTE: It should be stressed that the search for the appropriate vector register
+groups starts at v8 each time and does not start at the next register after the
+registers are allocated for the previous vector argument. Therefore, it is
+possible that the vector register number allocated to a vector argument can be
+less than the vector register number allocated to previous vector arguments.
+For example, for the function
+`void foo (vint32m1_t a, vint32m2_t b, vint32m1_t c)`, according to the rules
+of allocation, v8 will be allocated to `a`, v10-v11 will be allocated to `b`
+and v9 will be allocated to `c`. This approach allows more vector registers to
+be allocated to arguments in some cases.
+
+Vector values are returned in the same manner as the first named argument of
+the same type would be passed.
+
+Vector types are disallowed in struct or union.
+
+Vector arguments and return values are disallowed to pass to an unprototyped
+function.
 
 === ILP32E Calling Convention
 
@@ -596,3 +638,13 @@ The following definitions apply for all ABIs defined in this document. Here
 there is no differentiation between ILP32 and LP64 ABIs.
 
 `wchar_t` is signed.  `wint_t` is unsigned.
+
+[bibliography]
+== References
+
+* [[[riscv-v-extension]]] "RISC-V V vector extension specification"
+https://github.com/riscv/riscv-v-spec
+
+* [[[rvv-intrinsic-doc]]] "RISC-V Vector Extension Intrinsic Document"
+https://github.com/riscv-non-isa/rvv-intrinsic-doc
+

From 07b4a5a99f9127654a4de32de001dfb43402e7ca Mon Sep 17 00:00:00 2001
From: Lehua Ding <lehua.ding@rivai.ai>
Date: Mon, 10 Jul 2023 09:54:29 +0800
Subject: [PATCH 19/21] add the scope of application of vector cc

---
 riscv-cc.adoc | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/riscv-cc.adoc b/riscv-cc.adoc
index a6a07e5e..73551184 100644
--- a/riscv-cc.adoc
+++ b/riscv-cc.adoc
@@ -111,7 +111,7 @@ duration in accordance with C11 section 7.6 "Floating-point environment
 | vxsat   |              | Vector fixed-point saturation flag register  | No
 |===
 
-.Vector register convention for standard vector calling convention variant
+.Vector register convention for standard vector calling convention variant*
 [%autowidth]
 |===
 | Name    | ABI Mnemonic | Meaning                      | Preserved across calls?
@@ -126,6 +126,11 @@ duration in accordance with C11 section 7.6 "Floating-point environment
 | vxsat   |              | Vector fixed-point saturation flag register  | No
 |===
 
+*: Functions that use vector registers to pass arguments and return values must
+follow this calling convention. Some programming languages may require
+additional functions to follow this calling convention (e.g. C/C++ functions
+with the attribute `riscv_vector_cc`).
+
 Please refer to the <<Standard Vector Calling Convention Variant>> section for
 more details about standard vector calling convention variant.
 

From 0283e389ac4d7b0934dac31ba1cac108b65f6b0b Mon Sep 17 00:00:00 2001
From: Lehua Ding <lehua.ding@rivai.ai>
Date: Wed, 6 Sep 2023 10:23:10 +0800
Subject: [PATCH 20/21] Rename vector tuple argument to tuple vector data
 argument and add a note

---
 riscv-cc.adoc | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/riscv-cc.adoc b/riscv-cc.adoc
index 73551184..c959efae 100644
--- a/riscv-cc.adoc
+++ b/riscv-cc.adoc
@@ -352,8 +352,8 @@ provided they hold values no more than ABI_FLEN bits wide.
 The _RISC-V V Vector Extension_<<riscv-v-extension>> defines a set of thirty-two
 vector registers, v0-v31. The _RISC-V Vector Extension Intrinsic
 Document_<<rvv-intrinsic-doc>> defines vector types which include vector mask
-types, vector data types, and vector tuple types. A value of vector type can be
-stored in vector register groups.
+types, vector data types, and tuple vector data types. A value of vector type can
+be stored in vector register groups.
 
 The remainder of this section applies only to named vector arguments, other
 named arguments and return values follow the standard calling convention.
@@ -361,7 +361,7 @@ Variadic vector arguments are passed by reference.
 
 v0 is used to pass the first vector mask argument to a function, and to return
 vector mask result from a function. v8-v23 are used to pass vector data
-arguments, vector tuple arguments and the rest vector mask arguments to a
+arguments, tuple vector data arguments and remaining vector mask arguments to a
 function, and to return vector data and vector tuple results from a function.
 
 It must ensure that the entire contents of v1-v7 and v24-v31 are preserved
@@ -392,12 +392,18 @@ then allocate this vector register group to the argument and mark these
 registers as allocated. Otherwise, pass it by reference. The argument has now
 been allocated.
 
-3. For vector tuple arguments, starting from the v8 register, if NFIELDS
+3. For tuple vector data arguments, starting from the v8 register, if NFIELDS
 consecutive vector register groups between v8-v23 that have not been allocated
 can be found and the first register number is a multiple of LMUL, then allocate
 these vector register groups to the argument and mark these registers as
 allocated. Otherwise, pass it by reference. The argument has now been allocated.
 
+NOTE: The registers assigned to the tuple vector data argument must be
+consecutive. For example, for the function
+`void foo(vint32m1_t a, vint32m2_t b, vint32m1x2_t c)`, v8 will be allocated
+to `a`, v10-v11 will be allocated to `b`, and v12-v13 (rather than v9 and v12)
+will be allocated to `c`.
+
 NOTE: It should be stressed that the search for the appropriate vector register
 groups starts at v8 each time and does not start at the next register after the
 registers are allocated for the previous vector argument. Therefore, it is

From 0fc69ed227b7250ca78e84f3b1886a25a8b91c4e Mon Sep 17 00:00:00 2001
From: Lehua Ding <lehua.ding@rivai.ai>
Date: Fri, 22 Dec 2023 16:26:44 +0800
Subject: [PATCH 21/21] Address Kito's Comments

---
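Reviewer note (not part of the commit): the argument allocation rules in the
Standard Vector Calling Convention Variant section can be sketched as below.
This is a minimal, non-normative Python model. The function name
`allocate_vector_args` and the `(kind, lmul, nfields)` encoding of an argument
are assumptions made for illustration only; they are not defined by the psABI
or by any toolchain. Return values and non-vector arguments are not modelled.

```python
# Non-normative sketch of the vector argument allocation rules.
# Each argument is a hypothetical (kind, lmul, nfields) tuple where kind is
# "mask", "data" or "tuple"; nfields is 1 for non-tuple types.

ARG_FIRST, ARG_LAST = 8, 23  # v8-v23 are the vector argument registers


def allocate_vector_args(args):
    """Return, per argument, a list of register numbers or None (by reference)."""
    used = set()            # registers in v8-v23 already allocated
    v0_taken = False        # v0 is reserved for the first mask argument
    result = []
    for kind, lmul, nfields in args:
        if kind == "mask" and not v0_taken:
            v0_taken = True
            result.append([0])
            continue
        # Fractional LMUL is treated as 1; mask types have LMUL 1.
        lmul = 1 if kind == "mask" else int(max(1, lmul))
        count = lmul * nfields  # a tuple needs LMUL x NFIELDS registers
        regs = None
        # The search restarts at v8 for every argument.
        for start in range(ARG_FIRST, ARG_LAST + 1):
            if start % lmul != 0:
                continue  # first register must be a multiple of LMUL
            group = range(start, start + count)
            if group[-1] <= ARG_LAST and used.isdisjoint(group):
                regs = list(group)
                break
        if regs is not None:
            used.update(regs)
        result.append(regs)  # None means "passed by reference"
    return result


# void foo(vint32m1_t a, vint32m2_t b, vint32m1_t c)
print(allocate_vector_args([("data", 1, 1), ("data", 2, 1), ("data", 1, 1)]))
# void foo(vint32m1_t a, vint32m2_t b, vint32m1x2_t c)
print(allocate_vector_args([("data", 1, 1), ("data", 2, 1), ("tuple", 1, 2)]))
```

The two calls correspond to the two examples given in the NOTE paragraphs of
the section, so the model can be checked against the register assignments
stated there.
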
 riscv-cc.adoc | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/riscv-cc.adoc b/riscv-cc.adoc
index c959efae..efe03e2a 100644
--- a/riscv-cc.adoc
+++ b/riscv-cc.adoc
@@ -382,21 +382,21 @@ take up LMUL×NFIELDS registers.
 
 The rules for passing vector arguments are as follows:
 
-1. For the first vector mask argument, use v0 to pass it. The argument has now
-been allocated.
+1. For the first vector mask argument, use v0 to pass it.
 
 2. For vector data arguments or rest vector mask arguments, starting from the
 v8 register, if a vector register group between v8-v23 that has not been
 allocated can be found and the first register number is a multiple of LMUL,
 then allocate this vector register group to the argument and mark these
-registers as allocated. Otherwise, pass it by reference. The argument has now
-been allocated.
+registers as allocated. Otherwise, it is passed by reference and is replaced in
+the argument list with the address.
 
 3. For tuple vector data arguments, starting from the v8 register, if NFIELDS
 consecutive vector register groups between v8-v23 that have not been allocated
 can be found and the first register number is a multiple of LMUL, then allocate
 these vector register groups to the argument and mark these registers as
-allocated. Otherwise, pass it by reference. The argument has now been allocated.
+allocated. Otherwise, it is passed by reference and is replaced in the argument
+list with the address.
 
 NOTE: The registers assigned to the tuple vector data argument must be
 consecutive. For example, for the function
@@ -423,6 +423,14 @@ Vector types are disallowed in struct or union.
 Vector arguments and return values are disallowed to pass to an unprototyped
 function.
 
+NOTE: Functions that use the standard vector calling convention variant must be
+marked with `STO_RISCV_VARIANT_CC`; see <<Dynamic Linking>> for the meaning of
+`STO_RISCV_VARIANT_CC`.
+
+NOTE: `setjmp`/`longjmp` follow the standard calling convention, which clobbers
+all vector registers. Hence, the standard vector calling convention variant
+won't disrupt the `jmp_buf` ABI.
+
 === ILP32E Calling Convention
 
 IMPORTANT: RV32E is not a ratified base ISA and so we cannot guarantee the