
Commit 3b1f7af

Add FP8alt, low and mixed-precision SDOTP with stochastic rounding support, and compressed vector cmp (#3)
Added support for:
- FP8alt (1, 4, 3)
- low and mixed-precision SDOTP with stochastic rounding support
- compressed vector compare results (one bit per comparison in the LSBs)

---------

Co-authored-by: Gianna Paulin <[email protected]>
1 parent 16c1d2f commit 3b1f7af

18 files changed: +2464 -114 lines changed

Bender.yml

Lines changed: 3 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -29,9 +29,12 @@ sources:
2929
- src/fpnew_divsqrt_multi.sv
3030
- src/fpnew_fma.sv
3131
- src/fpnew_fma_multi.sv
32+
- src/fpnew_sdotp_multi.sv
33+
- src/fpnew_sdotp_multi_wrapper.sv
3234
- src/fpnew_noncomp.sv
3335
- src/fpnew_opgroup_block.sv
3436
- src/fpnew_opgroup_fmt_slice.sv
3537
- src/fpnew_opgroup_multifmt_slice.sv
3638
- src/fpnew_rounding.sv
39+
- src/lfsr_sr.sv
3740
- src/fpnew_top.sv

README.md

Lines changed: 17 additions & 0 deletions
@@ -165,6 +165,23 @@ If you use FPnew in your work, you can cite us:
165165
}
166166
```
167167

168+
If you use FPnew SDOTP in your work, you can cite us:
169+
170+
<details>
171+
<summary>SDOTP Publication</summary>
172+
<p>
173+
174+
```
175+
@inproceedings{bertaccini2022minifloat,
176+
title={MiniFloat-NN and ExSdotp: An ISA Extension and a Modular Open Hardware Unit for Low-Precision Training on RISC-V Cores},
177+
author={Bertaccini, Luca and Paulin, Gianna and Fischer, Tim and Mach, Stefan and Benini, Luca},
178+
booktitle={2022 IEEE 29th Symposium on Computer Arithmetic (ARITH)},
179+
pages={1--8},
180+
year={2022},
181+
organization={IEEE}
182+
}
183+
```
184+
168185
</p>
169186
</details>
170187

docs/CHANGELOG-PULP.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
# Changelog
2+
3+
All notable changes to this project will be documented in this file.
4+
5+
The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
6+
7+
In this sense, we interpret the "Public API" of a hardware module as its port/parameter list.
8+
Versions of the IP in the same major relase are "pin-compatible" with each other. Minor relases are permitted to add new parameters as long as their default bindings ensure backwards compatibility.
9+
10+
## [0.1.0] - 2023-05-04
11+
12+
### Added
13+
- Add low and mixed-precision SDOTP with support for stochastic rounding
14+
- Add `FP8alt (1,4,3)` format
15+
- Add support for compressed vector compare results (one bit per comparison in the LSBs)

docs/CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -10,6 +10,11 @@ Versions of the IP in the same major relase are "pin-compatible" with each other
1010

1111
## [Unreleased]
1212

13+
### Added
14+
- Add support for alternative FP32-only DivSqrt unit
15+
16+
## [0.7.0] - 2023-03-20
17+
1318
### Added
1419
- Citation file `CITATION.cff`
1520
- Add support for RISC-V compliant classify in vectorial mode when the vector element width is at least 10 bits

docs/README.md

Lines changed: 53 additions & 18 deletions
@@ -40,6 +40,8 @@ For more in-depth explanations on how to configure the unit and the layout of th
4040
| `TagType` | The SystemVerilog data type of the operation tag |
4141
| `TrueSIMDClass` | If enabled, the result of a classify operation in vectorial mode will be RISC-V compliant if each output has at least 10 bits|
4242
| `EnableSIMDMask` | Enable the RISC-V floating-point status flags masking of inactive vectorial lanes. When disabled, `simd_mask_i` is inactive |
43+
| `StochasticRndImplementation` | Enable stochastic rounding support for SDOTP, define LFSR bitwidth and number of trailing bits considered for the SR decision |
44+
| `CompressedVecCmpResult` | Compress the result of a vector compare in the LSBs, conceived for RV32FD cores |
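
Editorial note: the `CompressedVecCmpResult` row describes packing one comparison bit per vector lane into the LSBs of the result. The following is a minimal stand-alone sketch of that packing idea only. The module name, the `NumLanes` parameter, and the `lane_cmp_i` port are hypothetical and are not taken from the FPnew sources; the actual implementation in this commit may differ.

```SystemVerilog
// Hypothetical illustration of "one bit per comparison in the LSBs".
// Not the FPnew implementation; names and ports are assumptions.
module vec_cmp_compress_sketch #(
  parameter int unsigned Width    = 64, // result width (assumed)
  parameter int unsigned NumLanes = 4   // number of vector lanes (assumed, must be <= Width)
) (
  input  logic [NumLanes-1:0] lane_cmp_i, // one comparison outcome per lane
  output logic [Width-1:0]    result_o    // compressed result
);

  always_comb begin
    // Upper bits are cleared; lane i's outcome lands in bit i.
    result_o               = '0;
    result_o[NumLanes-1:0] = lane_cmp_i;
  end

endmodule
```

With this layout a core can test all lane outcomes with a single integer mask operation on the low `NumLanes` bits.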
4345

4446
### Ports
4547

@@ -50,6 +52,7 @@ As the width of some input/output signals is defined by the configuration, it is
5052
|------------------|-----------|----------------------|----------------------------------------------------------------|
5153
| `clk_i` | in | `logic` | Clock, synchronous, rising-edge triggered |
5254
| `rst_ni` | in | `logic` | Asynchronous reset, active low |
55+
| `hart_id_i` | in | `logic [31:0]` | Core ID, used only when stochastic rounding is enabled |
5356
| `operands_i` | in | `logic [2:0][W-1:0]` | Operands, henceforth referred to as `op[`*i*`]` |
5457
| `rnd_mode_i` | in | `roundmode_e` | Floating-point rounding mode |
5558
| `op_i` | in | `operation_e` | Operation select |
@@ -79,15 +82,16 @@ Default values from the package are listed.
7982

8083
Enumeration of type `logic [2:0]` holding available rounding modes, encoded for use in RISC-V cores:
8184

82-
| Enumerator | Value | Rounding Mode |
83-
|------------|----------|------------------------------------------------------|
84-
| `RNE` | `3'b000` | To nearest, tie to even (default) |
85-
| `RTZ` | `3'b001` | Toward zero |
86-
| `RDN` | `3'b010` | Toward negative infinity |
87-
| `RUP` | `3'b011` | Toward positive infinity |
88-
| `RMM` | `3'b100` | To nearest, tie away from zero |
89-
| `ROD` | `3'b101` | To odd |
90-
| `DYN` | `3'b111` | *RISC-V Dynamic RM, invalid if passed to operations* |
85+
| Enumerator | Value | Rounding Mode |
86+
|------------|----------|----------------------------------------------------------|
87+
| `RNE` | `3'b000` | To nearest, tie to even (default) |
88+
| `RTZ` | `3'b001` | Toward zero |
89+
| `RDN` | `3'b010` | Toward negative infinity |
90+
| `RUP` | `3'b011` | Toward positive infinity |
91+
| `RMM` | `3'b100` | To nearest, tie away from zero |
92+
| `ROD` | `3'b101` | To odd |
93+
| `RSR` | `3'b110` | Stochastic Rounding (available only on SDOTP operations) |
94+
| `DYN` | `3'b111` | *RISC-V Dynamic RM, invalid if passed to operations* |
9195

9296
##### `operation_e` - FP Operation
9397

@@ -104,6 +108,8 @@ Unless noted otherwise, the first operand `op[0]` is used for the operation.
104108
| `ADD` | `0` | Addition (`op[1] + op[2]`) *note the operand indices* |
105109
| `ADD` | `1` | Subtraction (`op[1] - op[2]`) *note the operand indices* |
106110
| `MUL` | `0` | Multiplication (`op[0] * op[1]`) |
111+
| `SDOTP` | `0` | Sum of dot product ) |
112+
| `VSUM` | `0` | Vector Inner Sum ) |
107113
| `DIV` | `0` | Division (`op[0] / op[1]`) |
108114
| `SQRT` | `0` | Square root |
109115
| `SGNJ` | `0` | Sign injection, operation encoded in rounding mode<br>`RNE`: `op[0]` with `sign(op[1])`<br>`RTZ`: `op[0]` with `~sign(op[1])`<br>`RDN`: `op[0]` with `sign(op[0]) ^ sign(op[1])`<br>`RUP`: `op[0]` (passthrough) |
@@ -132,10 +138,11 @@ Enumeration of type `logic [2:0]` holding the supported FP formats.
132138
| `FP16` | IEEE binary16 | 16 bit | 5 | 10 |
133139
| `FP8` | binary8 | 8 bit | 5 | 2 |
134140
| `FP16ALT` | binary16alt | 16 bit | 8 | 7 |
141+
| `FP8ALT` | binary8alt | 8 bit | 4 | 3 |
135142

136143
The following global parameters associated with FP formats are set in `fpnew_pkg`:
137144
```SystemVerilog
138-
localparam int unsigned NUM_FP_FORMATS = 5;
145+
localparam int unsigned NUM_FP_FORMATS = 6;
139146
localparam int unsigned FP_FORMAT_BITS = $clog2(NUM_FP_FORMATS);
140147
```
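
Editorial note: raising `NUM_FP_FORMATS` from 5 to 6 does not change `FP_FORMAT_BITS`, because `$clog2` rounds up and both values need 3 bits. The sketch below works through that arithmetic and the FP8ALT bit budget from the table above. The package and the `_OLD`/`_NEW` names are hypothetical and exist only for this illustration.

```SystemVerilog
// Stand-alone arithmetic sketch; names are illustrative, not from fpnew_pkg.
package fmt_bits_sketch_pkg;

  localparam int unsigned NUM_FP_FORMATS_OLD = 5; // before this commit
  localparam int unsigned NUM_FP_FORMATS_NEW = 6; // after adding FP8ALT

  // $clog2(5) and $clog2(6) both evaluate to 3, so the format-select
  // width is the same before and after the change.
  localparam int unsigned FP_FORMAT_BITS_OLD = $clog2(NUM_FP_FORMATS_OLD);
  localparam int unsigned FP_FORMAT_BITS_NEW = $clog2(NUM_FP_FORMATS_NEW);

  // FP8ALT (1, 4, 3): 1 sign bit + 4 exponent bits + 3 mantissa bits = 8 bits.
  localparam int unsigned FP8ALT_WIDTH = 1 + 4 + 3;

endpackage
```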
141148

@@ -230,7 +237,7 @@ typedef struct packed {
230237
```
231238
The fields of this struct behave as follows:
232239

233-
##### `Width` - Datapath Wdith
240+
##### `Width` - Datapath Width
234241

235242
Specifies the width of the FPU datapath and of the input and output data ports (`operands_i`/`result_o`).
236243
It must be larger or equal to the width of the widest enabled FP and integer format.
@@ -278,7 +285,7 @@ Otherwise, synthesis tools can optimize away any logic associated with this form
278285

279286
#### `Implementation` - Implementation Options
280287

281-
The FPU is divided into four operation groups, `ADDMUL`, `DIVSQRT`, `NONDOMP`, and `CONV` (see [Architecture: Top-Level](#top-level)).
288+
The FPU is divided into five operation groups, `ADDMUL`, `DIVSQRT`, `NONDOMP`, `CONV`, and `DOTP` (see [Architecture: Top-Level](#top-level)).
282289
The `Implementation` parameter controls the implementation of these operation groups.
283290
It is of type `fpu_implementation_t` which is defined as:
284291
```SystemVerilog
@@ -320,17 +327,18 @@ The unit type `unit_type_t` is an enumeration of type `logic [1:0]` holding the
320327
The `UnitTypes` parameter allows to control resources used for the FPU by either removing operation units for certain formats and operations, or merging multiple formats into one.
321328
Currently, the follwoing unit types are available for the FPU operation groups:
322329

323-
| | `ADDMUL` | `DIVSQRT` | `NONCOMP` | `CONV` |
324-
|------------|--------------------|--------------------|--------------------|--------------------|
325-
| `PARALLEL` | :heavy_check_mark: | | :heavy_check_mark: | |
326-
| `MERGED` | :heavy_check_mark: | :heavy_check_mark: | | :heavy_check_mark: |
330+
| | `ADDMUL` | `DIVSQRT` | `NONCOMP` | `CONV` | `DOTP` |
331+
|------------|--------------------|--------------------|--------------------|--------------------|--------------------|
332+
| `PARALLEL` | :heavy_check_mark: | | :heavy_check_mark: | | |
333+
| `MERGED` | :heavy_check_mark: | :heavy_check_mark: | | :heavy_check_mark: | :heavy_check_mark: |
327334

328335
*Default*:
329336
```SystemVerilog
330337
'{'{default: PARALLEL}, // ADDMUL
331338
'{default: MERGED}, // DIVSQRT
332339
'{default: PARALLEL}, // NONCOMP
333-
'{default: MERGED}} // CONV`
340+
'{default: MERGED}, // CONV`
341+
'{default: DISABLED}} // DOTP`
334342
```
335343
(all formats within operation group use same type)
336344

@@ -350,7 +358,33 @@ The configuration `pipe_config_t` is an enumeration of type `logic [1:0]` holdi
350358
| `INSIDE` | All registers are inserted at roughly the middle of the operational unit (if not possible, `BEFORE`) |
351359
| `DISTRIBUTED` | Registers are evenly distributed to `INSIDE`, `BEFORE`, and `AFTER` (if no `INSIDE`, all `BEFORE`) |
352360

361+
### `Stochastic Rounding Implementation`
353362

363+
The `StochasticRndImplementation` parameter is used to configure the RSR support.
364+
It is of type `rsr_impl_t` which is defined as:
365+
```SystemVerilog
366+
typedef struct packed {
367+
logic EnableRSR;
368+
int unsigned RsrPrecision;
369+
int unsigned LfsrInternalPrecision;
370+
} rsr_impl_t;
371+
```
372+
The fields of this struct behave as follows:
373+
374+
##### `EnableRSR` - Enable RSR support
375+
Enables stochastic rounding support in the `DOTP` operation group block. It instantiates an `LFSR` in the rounding module.
376+
377+
*Default*: `1'b0`
378+
379+
##### `RsrPrecision`
380+
Specifies the number of trailing bits considered for the stochastic rounding decision.
381+
382+
*Default*: `12`
383+
384+
##### `LfsrInternalPrecision`
385+
Specifies the LFSR internal bitwidth, thus controlling the pseudorandom number periodicity.
386+
387+
*Default*: `32`
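
Editorial note: a minimal sketch of a `rsr_impl_t` value that turns stochastic rounding on while keeping the documented default precisions. To stay self-contained, the snippet redeclares the struct in a hypothetical package (`rsr_sketch_pkg`); in a real design the type would come from `fpnew_pkg`, and the constant name `EXAMPLE_RSR` is an assumption made for this example.

```SystemVerilog
// Stand-alone sketch: mirrors the rsr_impl_t struct documented above so the
// snippet compiles without FPnew. Package and constant names are hypothetical.
package rsr_sketch_pkg;

  typedef struct packed {
    logic        EnableRSR;             // enable RSR support in the DOTP group
    int unsigned RsrPrecision;          // trailing bits used for the SR decision
    int unsigned LfsrInternalPrecision; // LFSR internal bitwidth
  } rsr_impl_t;

  // RSR enabled, with the default values listed in the documentation.
  localparam rsr_impl_t EXAMPLE_RSR = '{
    EnableRSR:             1'b1,
    RsrPrecision:          12,
    LfsrInternalPrecision: 32
  };

endpackage
```

A value like this would be bound to the `StochasticRndImplementation` parameter. The diff to `fpnew_opgroup_block.sv` shows that parameter defaulting to `fpnew_pkg::DEFAULT_NO_RSR`.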
354388

355389
### Adding Custom Formats
356390

@@ -391,14 +425,15 @@ The *operation group* is the highest level of grouping within FPnew and signifie
391425

392426
![FPnew](fig/top_block.png)
393427

394-
There are currently four operation groups in FPnew which are enumerated in `opgroup_e` as outlined in the following table:
428+
There are currently five operation groups in FPnew which are enumerated in `opgroup_e` as outlined in the following table:
395429

396430
| Enumerator | Description | Associated Operations |
397431
|------------|-----------------------------------------------|---------------------------------------|
398432
| `ADDMUL` | Addition and Multiplication | `FMADD`, `FNMSUB`, `ADD`, `MUL` |
399433
| `DIVSQRT` | Division and Square Root | `DIV`, `SQRT` |
400434
| `NONCOMP` | Non-Computational Operations like Comparisons | `SGNJ`, `MINMAX`, `CMP`, `CLASS` |
401435
| `CONV` | Conversions | `F2I`, `I2F`, `F2F`, `CPKAB`, `CPKCD` |
436+
| `DOTP` | Dot Products | `SDOTP`, `EXVSUM`, `VSUM` |
402437

403438
Most architectural decisions for FPnew are made at very fine granularity.
404439
The big exception to this is the generation of vectorial hardware which is decided at top level through the `EnableVectors` parameter.

src/fpnew_cast_multi.sv

Lines changed: 7 additions & 1 deletion
@@ -544,11 +544,17 @@ module fpnew_cast_multi #(
544544
assign pre_round_abs = dst_is_int_q ? ifmt_pre_round_abs[int_fmt_q2] : fmt_pre_round_abs[dst_fmt_q2];
545545

546546
fpnew_rounding #(
547-
.AbsWidth ( WIDTH )
547+
.AbsWidth ( WIDTH ),
548+
.EnableRSR ( 0 )
548549
) i_fpnew_rounding (
550+
.clk_i,
551+
.rst_ni,
552+
.id_i ( '0 ),
553+
.en_rsr_i ( 1'b0 ),
549554
.abs_value_i ( pre_round_abs ),
550555
.sign_i ( input_sign_q ), // source format
551556
.round_sticky_bits_i ( round_sticky_bits ),
557+
.stochastic_rounding_bits_i ( '0 ),
552558
.rnd_mode_i ( rnd_mode_q ),
553559
.effective_subtraction_i ( 1'b0 ), // no operation happened
554560
.abs_rounded_o ( rounded_abs ),

src/fpnew_fma.sv

Lines changed: 7 additions & 1 deletion
@@ -597,11 +597,17 @@ module fpnew_fma #(
597597

598598
// Perform the rounding
599599
fpnew_rounding #(
600-
.AbsWidth ( EXP_BITS + MAN_BITS )
600+
.AbsWidth ( EXP_BITS + MAN_BITS ),
601+
.EnableRSR ( 0 )
601602
) i_fpnew_rounding (
603+
.clk_i,
604+
.rst_ni,
605+
.id_i ( '0 ),
606+
.en_rsr_i ( 1'b0 ),
602607
.abs_value_i ( pre_round_abs ),
603608
.sign_i ( pre_round_sign ),
604609
.round_sticky_bits_i ( round_sticky_bits ),
610+
.stochastic_rounding_bits_i ( '0 ),
605611
.rnd_mode_i ( rnd_mode_q ),
606612
.effective_subtraction_i ( effective_subtraction_q ),
607613
.abs_rounded_o ( rounded_abs ),

src/fpnew_fma_multi.sv

Lines changed: 7 additions & 1 deletion
@@ -720,11 +720,17 @@ module fpnew_fma_multi #(
720720

721721
// Perform the rounding
722722
fpnew_rounding #(
723-
.AbsWidth ( SUPER_EXP_BITS + SUPER_MAN_BITS )
723+
.AbsWidth ( SUPER_EXP_BITS + SUPER_MAN_BITS ),
724+
.EnableRSR ( 0 )
724725
) i_fpnew_rounding (
726+
.clk_i,
727+
.rst_ni,
728+
.id_i ( '0 ),
729+
.en_rsr_i ( 1'b0 ),
725730
.abs_value_i ( pre_round_abs ),
726731
.sign_i ( pre_round_sign ),
727732
.round_sticky_bits_i ( round_sticky_bits ),
733+
.stochastic_rounding_bits_i ( '0 ),
728734
.rnd_mode_i ( rnd_mode_q ),
729735
.effective_subtraction_i ( effective_subtraction_q ),
730736
.abs_rounded_o ( rounded_abs ),

src/fpnew_opgroup_block.sv

Lines changed: 8 additions & 2 deletions
@@ -26,6 +26,8 @@ module fpnew_opgroup_block #(
2626
parameter fpnew_pkg::pipe_config_t PipeConfig = fpnew_pkg::BEFORE,
2727
parameter type TagType = logic,
2828
parameter int unsigned TrueSIMDClass = 0,
29+
parameter logic CompressedVecCmpResult = 0,
30+
parameter fpnew_pkg::rsr_impl_t StochasticRndImplementation = fpnew_pkg::DEFAULT_NO_RSR,
2931
// Do not change
3032
localparam int unsigned NUM_FORMATS = fpnew_pkg::NUM_FP_FORMATS,
3133
localparam int unsigned NUM_OPERANDS = fpnew_pkg::num_operands(OpGroup),
@@ -34,6 +36,7 @@ module fpnew_opgroup_block #(
3436
) (
3537
input logic clk_i,
3638
input logic rst_ni,
39+
input logic [31:0] hart_id_i,
3740
// Input signals
3841
input logic [NUM_OPERANDS-1:0][Width-1:0] operands_i,
3942
input logic [NUM_FORMATS-1:0][NUM_OPERANDS-1:0] is_boxed_i,
@@ -110,7 +113,8 @@ module fpnew_opgroup_block #(
110113
.NumPipeRegs ( FmtPipeRegs[fmt] ),
111114
.PipeConfig ( PipeConfig ),
112115
.TagType ( TagType ),
113-
.TrueSIMDClass ( TrueSIMDClass )
116+
.TrueSIMDClass ( TrueSIMDClass ),
117+
.CompressedVecCmpResult ( CompressedVecCmpResult )
114118
) i_fmt_slice (
115119
.clk_i,
116120
.rst_ni,
@@ -182,10 +186,12 @@ module fpnew_opgroup_block #(
182186
.PulpDivsqrt ( PulpDivsqrt ),
183187
.NumPipeRegs ( REG ),
184188
.PipeConfig ( PipeConfig ),
185-
.TagType ( TagType )
189+
.TagType ( TagType ),
190+
.StochasticRndImplementation ( StochasticRndImplementation )
186191
) i_multifmt_slice (
187192
.clk_i,
188193
.rst_ni,
194+
.hart_id_i,
189195
.operands_i,
190196
.is_boxed_i,
191197
.rnd_mode_i,
