rv_tracer

SHL-0.51 license

rv_tracer is developed as part of the PULP project, a joint effort between ETH Zurich and the University of Bologna.

License

Unless specified otherwise in the respective file headers, all code checked into this repository is made available under a permissive license. All hardware sources and tool scripts are licensed under the Solderpad Hardware License 0.51 (see LICENSE) or compatible licenses, except for files contained in the img directory, which are licensed under the Creative Commons Attribution-NoDerivatives 4.0 International license (CC BY-ND 4.0). The file generate_do.py is licensed under the Apache 2.0 license.

Short summary

Supported features

Feature Implemented
Branch Trace
Delta address mode
Full address mode
Implicit exception mode
Sequentially inferable jump mode
Implicit return mode
Branch prediction mode
Jump target cache mode
Instruction Trace Interface
Single-retirement
Multiple-retirement
Trigger unit
Data Trace
Instruction Trace Output Packets
Time
Context

Testing progress

Module Tested
rv_tracer
te_branch_map
te_filter
te_packet_emitter ✅ (no format 0 packets)
te_priority
te_reg
te_resync_counter

How to run it

After downloading the repository, move into its directory:

cd rv_tracer

Then, run the simulation:

make run

Between two consecutive runs, clean the generated files with:

make clean

Design

The modules defined are the following:

  • te_reg: stores configuration data and produces the gated clock signal for the other modules;
  • te_resync_counter: counts packets emitted or clock cycles and requests a resynchronization packet;
  • te_branch_map: counts the branches and records whether each one was taken or not;
  • te_filter: declares input blocks as "qualified" - they can be processed by the next modules - based on user-defined values read from te_reg;
  • te_priority: determines which packet needs to be emitted and performs a simple compression on the address that is put inside the payload;
  • te_packet_emitter: populates the payload and can reset both the te_resync_counter and te_branch_map.

Figure 1: high level rv_tracer architecture

te_reg

This module stores both the user-definable and the non-user-definable parameters.

The user-definable parameters are the following:

  • trace_activated: determines if the encoder is waiting for a first block to start tracing;
  • nocontext: determines if the optional context input is used or not;
  • notime: determines if the optional time input is used or not;
  • encoder_mode: determines if the encoder is doing branch tracing or another type of tracing technique (currently only branch trace mode is supported);
  • configuration: determines in which of the 7 modes available in the E-Trace specification v2.0.3 the encoder is operating. Currently, only delta_address (the address in the payload is expressed as a difference between the current address and the last one) and full_address (the address in the payload is the current one) modes are implemented;
  • lossless_trace: determines if the TE stalls the hart if there is back-pressure from the transport interface;
  • shallow_trace: determines if the branch map content has to be flushed after each packet emitted or only after a packet containing the branch map;
  • All the parameters necessary for the te_filter module to perform filtering.
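
The difference between the two implemented configuration modes can be illustrated with a small behavioral model in Python. This is only a sketch, not the RTL: the function names and the 32-bit address width are assumptions made for the example.

```python
ADDR_WIDTH = 32                  # assumed address width for this illustration
MASK = (1 << ADDR_WIDTH) - 1


def encode_address(current, last, delta_mode):
    """Return the address field that would be placed in the payload."""
    if delta_mode:
        # delta_address: difference between current and last address,
        # wrapped to ADDR_WIDTH bits (two's complement).
        return (current - last) & MASK
    # full_address: the current address itself.
    return current & MASK


def decode_address(field, last, delta_mode):
    """Recover the current address on the decoder side."""
    if delta_mode:
        return (last + field) & MASK
    return field & MASK
```

In delta mode, nearby addresses produce small (positive or negative) fields, which is what makes the address compression described for te_priority effective.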

The clock_gated signal depends on the trace_activated signal. It is connected to all the other modules to have a different clock domain and to switch off the parts of the encoder when they are not necessary.

Figure 2: te_reg internal architecture

APB Protocol

To access the registers, an APB interface is required. This protocol was selected for its ease of use and the minimal amount of data involved in reading or writing operations.

The protocol involves two actors:

  • Requester: The one that requires an APB transfer, usually an APB Bridge between the CPU and the peripherals.
  • Completer: The one that receives a request for an APB transfer.

The following table shows which signals are necessary to perform an access via APB.

Signal Name | Source | Width | Description
PCLK | Clock | 1 | Clock. PCLK is a clock signal. All APB signals are timed against the rising edge of PCLK.
PADDR | Requester | ADDR_WIDTH | Address. PADDR is the APB address bus. PADDR can be up to 32 bits wide.
PSELx | Requester | 1 | Select. The Requester generates a PSELx signal for each Completer. PSELx indicates that the Completer is selected and that a data transfer is required.
PENABLE | Requester | 1 | Enable. PENABLE indicates the second and subsequent cycles of an APB transfer.
PWRITE | Requester | 1 | Direction. PWRITE indicates an APB write access when HIGH and an APB read access when LOW.
PWDATA | Requester | DATA_WIDTH | Write data. The PWDATA write data bus is driven by the Requester during write cycles when PWRITE is HIGH. PWDATA can be 8 bits, 16 bits, or 32 bits wide.
PREADY | Completer | 1 | Ready. PREADY is used to extend an APB transfer by the Completer.
PRDATA | Completer | DATA_WIDTH | Read data. The PRDATA read data bus is driven by the selected Completer during read cycles when PWRITE is LOW. PRDATA can be 8 bits, 16 bits, or 32 bits wide.

Table 1: Mandatory signals for the APB protocol

The process involves two essential parts:

  1. Setup phase: The requester selects the completer by asserting PSEL while keeping PENABLE at 0, and drives PADDR and PWRITE (plus PWDATA for a write). This signals to the completer that it will be accessed and specifies the type of transfer (read or write).
  2. Access phase: The requester sets PENABLE to 1. The transfer completes on the rising edge of PCLK at which the completer asserts PREADY; until then, the completer can keep PREADY low to extend the transfer.
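
The two phases can be sketched with a toy cycle-level model in Python. It is a simplified illustration rather than the actual te_reg interface: the class and function names are hypothetical, the completer is assumed to be always ready, and the data bus is assumed to be 32 bits wide.

```python
class ApbCompleter:
    """Toy completer: a small register file that never inserts wait states."""

    def __init__(self):
        self.regs = {}

    def cycle(self, psel, penable, pwrite, paddr, pwdata):
        """Evaluate one rising edge of PCLK; returns (pready, prdata)."""
        if not (psel and penable):
            return False, 0              # idle or setup phase: no transfer yet
        if pwrite:
            self.regs[paddr] = pwdata & 0xFFFFFFFF
            return True, 0
        return True, self.regs.get(paddr, 0)


def apb_transfer(completer, paddr, pwrite, pwdata=0):
    """Requester side: one setup cycle, then access cycles until PREADY."""
    # Setup phase: PSEL = 1, PENABLE = 0, address and direction are driven.
    completer.cycle(True, False, pwrite, paddr, pwdata)
    # Access phase: PENABLE = 1, signals held stable until PREADY is seen.
    while True:
        pready, prdata = completer.cycle(True, True, pwrite, paddr, pwdata)
        if pready:
            return prdata
```

A write followed by a read of the same address, for example `apb_transfer(c, 0x10, True, 0xDEADBEEF)` and then `apb_transfer(c, 0x10, False)`, returns the value that was written.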

Figure 3: APB read transfer

Figure 4: APB write transfer

Programming a single register requires selecting it through the PADDR signal driven by the requester. Each register is identified by an 8-bit offset, which the requester must add to the base address where the completer is mapped.

32 and 64 Bits Registers

The number of available addresses varies depending on whether a 32-bit or 64-bit implementation is chosen. The APB data bus is at most 32 bits wide, so accessing a full 64-bit register requires two transactions. A 64-bit register is therefore split into two 32-bit registers: one for the most significant 32 bits and another for the least significant 32 bits.
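
As a rough illustration of the split, the following Python sketch breaks a 64-bit value into its two 32-bit halves and writes them with two accesses. The helper names are hypothetical, and placing the low half at the lower offset (with the high half 4 bytes above) is an assumption of this example, not a statement about the actual register map.

```python
def split64(value):
    """Split a 64-bit value into (most significant, least significant) words."""
    lo = value & 0xFFFFFFFF
    hi = (value >> 32) & 0xFFFFFFFF
    return hi, lo


def join64(hi, lo):
    """Rebuild the 64-bit value from its two 32-bit halves."""
    return ((hi & 0xFFFFFFFF) << 32) | (lo & 0xFFFFFFFF)


def write_reg64(regs, offset, value):
    """Two 32-bit writes; the offset layout is assumed for illustration."""
    hi, lo = split64(value)
    regs[offset] = lo          # first transaction: least significant word
    regs[offset + 4] = hi      # second transaction: most significant word


def read_reg64(regs, offset):
    """Two 32-bit reads combined back into one 64-bit value."""
    return join64(regs.get(offset + 4, 0), regs.get(offset, 0))
```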

te_branch_map

The te_branch_map module keeps track of branches using a counter and an array called the branch map. The counter counts the received branches, and the branch map records whether each branch was taken or not. According to the E-Trace specification, the counter counts up to 31 branches (so it is 5 bits long), and the branch map is a 31-bit array that stores 1 when the branch is not taken and 0 when it is taken.

When the counter reaches the value of 31 (branch map is full), the module asserts the is_full_o signal to request the generation of the associated packet (format 1). It remains asserted until it receives the flush_i signal from the te_packet_emitter module that communicates the associated packet has been generated.

The update of both the branch counter and branch map is done in a combinatorial network. The first step loads the number of branches left from the previous cycle, then processes the branches received in the current cycle. If the branch map is full, the remaining branches are stored for the next cycle.
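
The behavior described above can be approximated with the following Python model. It is a behavioral sketch, not the RTL: the class and attribute names are hypothetical, and it assumes that the oldest branch is stored in bit 0 of the map.

```python
MAX_BRANCHES = 31   # capacity of the branch map


class BranchMap:
    def __init__(self):
        self.count = 0       # number of branches currently recorded
        self.bits = 0        # 31-bit map: 1 = not taken, 0 = taken
        self.pending = []    # branches that did not fit in a full map

    @property
    def is_full(self):
        """Mirrors is_full_o: asserted once 31 branches are recorded."""
        return self.count == MAX_BRANCHES

    def update(self, taken_flags):
        """Process one cycle: leftovers first, then the new branches."""
        queue = self.pending + list(taken_flags)
        while queue and self.count < MAX_BRANCHES:
            taken = queue.pop(0)
            if not taken:
                self.bits |= 1 << self.count   # assumed: oldest branch in bit 0
            self.count += 1
        self.pending = queue                   # kept for the next cycle

    def flush(self):
        """Mirrors flush_i: the associated packet has been generated."""
        self.count = 0
        self.bits = 0
```

After a flush, the next call to `update` first consumes the branches left over from the previous cycle, matching the two-step update described in the text.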

Figure 5: te_branch_map internal architecture

te_filter

The te_filter module filters input blocks so that only specific portions of code or specific discontinuities are traced. The E-Trace specification suggests using a pair of comparators arranged to select specific ranges for different parameters. This implementation uses one comparator per input, which allows for more flexibility in filtering.

Filtering is enabled by activating the associated filter signal and setting the operative mode (range or match) for each filter. Filtering values are taken from the te_reg module.
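
A minimal Python sketch of this kind of filtering is shown below. The exact semantics are assumptions made for illustration: ranges are treated as inclusive, a block is qualified only if every enabled filter hits, and the dictionary keys (`enabled`, `field`, `mode`, `low`, `high`) are hypothetical names, not te_reg register names.

```python
def comparator_hit(value, mode, low, high=None):
    """Evaluate one filter in 'match' or 'range' mode."""
    if mode == "match":
        return value == low
    if mode == "range":
        return low <= value <= high      # inclusive bounds (assumed)
    raise ValueError(f"unknown mode: {mode}")


def is_qualified(block, filters):
    """A block is qualified when all enabled filters hit (assumed policy)."""
    for f in filters:
        if not f["enabled"]:
            continue                     # disabled filters never reject
        value = block[f["field"]]
        if not comparator_hit(value, f["mode"], f["low"], f.get("high")):
            return False
    return True
```

For example, with a single enabled range filter on `iaddr` from 0x1000 to 0x1FFF, a block with `iaddr = 0x1800` is qualified and one with `iaddr = 0x2000` is not.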

Figure 6: te_filter internal architecture

te_priority

This module implements the flow chart in the E-Trace specification as a combinatorial network to determine the packet type in one clock cycle.

Figure 7: flow chart determining packet type

Packets are determined based on:

  • Previous instruction
  • Current instruction
  • Next instruction

Delays are implemented using flip-flops that store input information for up to 2 cycles.
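
The delay logic can be pictured as a two-stage shift register. The Python sketch below is a behavioral illustration only; the class name is hypothetical, and it assumes that the newest input plays the role of the "next" instruction while the two stored stages hold the "current" and "previous" ones.

```python
class InstructionWindow:
    """Two flip-flop stages giving access to previous/current/next inputs."""

    def __init__(self):
        self.curr = None    # input delayed by one cycle
        self.prev = None    # input delayed by two cycles

    def clock(self, nxt):
        """Advance one cycle and return the (previous, current, next) view."""
        window = (self.prev, self.curr, nxt)
        # Shift the stages, as the flip-flops would on the clock edge.
        self.prev, self.curr = self.curr, nxt
        return window
```

Feeding the inputs `a`, `b`, `c` on consecutive cycles yields the window `(a, b, c)` on the third cycle, which is the first one where all three instructions are available.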

Figure 8: logic to delay inputs

Address compression removes the most significant bits that are all 0s or all 1s, keeping only the last of them as the sign bit. For example, compressing 0000001100010 results in 01100010, which can be sign-extended to retrieve the original address.
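
The compression and its inverse can be sketched in Python as follows. This is a behavioral illustration of the idea, not the hardware implementation; the function names are hypothetical.

```python
def compress(addr, width):
    """Return (value, nbits): the shortest field that sign-extends to addr."""
    msb = (addr >> (width - 1)) & 1
    nbits = width
    # Drop the top bit as long as the bit just below it is an identical copy.
    while nbits > 1 and ((addr >> (nbits - 2)) & 1) == msb:
        nbits -= 1
    return addr & ((1 << nbits) - 1), nbits


def sign_extend(value, nbits, width):
    """Recover the original width-bit address from the compressed field."""
    sign = (value >> (nbits - 1)) & 1
    if sign:
        # Fill every bit above the field with copies of the sign bit.
        value |= ((1 << width) - 1) & ~((1 << nbits) - 1)
    return value
```

With the 13-bit example from the text, `compress(0b0000001100010, 13)` gives the 8-bit field `0b01100010`, and `sign_extend(0b01100010, 8, 13)` restores the original value.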

Figure 9: address compression logic internal architecture

Figure 10: te_priority internal architecture

te_packet_emitter

The te_packet_emitter module populates the payload according to inputs from te_priority. Dynamic array operations are approximated via chunking, rounding the number of bits to the nearest multiple of 8. Similar operations compress the branch map.
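
The chunking can be illustrated with the Python sketch below. It assumes that the bit count is rounded up to the next multiple of 8 (so that no significant bit is lost) and that the extra bits are filled by sign extension; both points are assumptions of this example, and the function names are hypothetical.

```python
def round_up_to_byte(nbits):
    """Round a bit count up to the next multiple of 8 (assumed policy)."""
    return ((nbits + 7) // 8) * 8


def pad_field(value, nbits):
    """Sign-extend an nbits-wide field (nbits >= 1) to a whole number of bytes."""
    width = round_up_to_byte(nbits)
    sign = (value >> (nbits - 1)) & 1
    if sign:
        # Fill the padding bits with copies of the sign bit.
        value |= ((1 << width) - 1) & ~((1 << nbits) - 1)
    return value, width
```

With this policy, a 9-bit field occupies 16 bits in the payload, while an 8-bit field is emitted unchanged.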

Figure 11: te_packet_emitter internal architecture

te_resync_counter

This module counts emitted packets or elapsed clock cycles and signals when a resynchronization packet is required. Configurable parameters include:

  • N: max packets emitted in one cycle;
  • MODE: counts packets or clock cycles;
  • MAX_VALUE: asserts resync signals when reached.

The module can ignore packets or cycles when a resynchronization is pending.
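
A behavioral Python sketch of the counter is given below. The names are hypothetical, the comparison against MAX_VALUE is assumed to be "greater than or equal", and the reset is assumed to come from te_packet_emitter once the resynchronization packet has been emitted.

```python
class ResyncCounter:
    def __init__(self, mode, max_value):
        self.mode = mode              # "packets" or "cycles" (MODE parameter)
        self.max_value = max_value    # MAX_VALUE parameter
        self.count = 0
        self.resync_pending = False

    def clock(self, packets_emitted=0):
        """Advance one cycle; returns True while a resync is requested."""
        if self.resync_pending:
            return True               # packets/cycles are ignored while pending
        self.count += packets_emitted if self.mode == "packets" else 1
        if self.count >= self.max_value:
            self.resync_pending = True
        return self.resync_pending

    def reset(self):
        """Clear the counter once the resynchronization packet is out."""
        self.count = 0
        self.resync_pending = False
```

In packet mode, `packets_emitted` can be any value from 0 to N in a given cycle; in cycle mode the argument is ignored and the counter advances by one per call.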

Figure 12: te_resync_counter internal architecture

Multiple Retirement Support

Many RISC-V cores can retire up to N instructions per cycle. To handle multiple instructions, the following inputs are replicated:

  • itype
  • iaddr
  • iretire
  • ilastsize

Multiple Retirement: Branches Only

This design replicates the te_filter module and adapts the branch map to handle up to N branches per cycle.

Figure 13: rv_tracer architecture for up to N branches in the same cycle

Multiple Retirement: Not Only Branches

This design replicates the te_filter, te_priority, and te_packet_emitter modules to handle up to N uninferable discontinuities.

Figure 14: rv_tracer architecture for up to N discontinuities in the same cycle

About

RISC-V E-trace specification RTL implementation
