Overall latency estimation #273

Jvafaei · 2024-07-10T17:18:34Z

I have a question regarding how Timeloop calculates the overall execution latency of a layer. In the tutorial videos, it is mentioned that the latency is calculated based on a pipelined concept, though this is not very clear to me.

I would appreciate it if you could explain the latency estimation process for a simple architecture (DRAM + Global Buffer + (1 PE + RF)).

So far, I have noticed that we have latencies in data movements between different memory hierarchies and latencies in computing MACs in each cycle, but I have no clear idea how the overall latency is estimated based on these different latency values and why this estimation is rational.

Thanks.

angshuman-parashar · 2024-07-11T15:04:08Z

Timeloop first computes the amount of data (or compute) that must move through each level of the hierarchy for that specific mapping.

As a cartoon example, let's say that for a given mapping we need to move 100 bytes from DRAM->GB, 1000 bytes from GB->RF and perform 10000 computations.

Now let's say our cartoon architecture's DRAM->GB bandwidth is 5 bytes per clock, GB->RF is 20 bytes per clock, and we have 1000 parallel MACs in the PE.

Timeloop will compute the DRAM->GB link as needing 100/5 = 20 clock cycles total to move all the data, 1000/20 = 50 clock cycles for the GB->RF link, and 10000/1000 = 10 clock cycles for the MACs to do their work.

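This means the GB->RF link is the bottleneck. The levels are modeled as operating concurrently (pipelined), so the slowest one determines the total, and the overall execution time will be reported as 50 clock cycles.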
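The cartoon arithmetic above can be sketched in a few lines of Python. This is only an illustration of the bottleneck idea, not Timeloop's actual code: the names `work`, `throughput`, and `estimate_cycles` are made up for this sketch, the numbers are the cartoon values from this comment, and rounding partial cycles up is an assumption.

```python
import math

# Cartoon numbers from the example above (not real Timeloop output).
# Amount of work per stage: bytes for the two links, operations for the MACs.
work = {"DRAM->GB": 100, "GB->RF": 1000, "MAC": 10000}
# Per-clock capacity of each stage: bytes/clock for links, parallel MACs for compute.
throughput = {"DRAM->GB": 5, "GB->RF": 20, "MAC": 1000}


def estimate_cycles(work, throughput):
    """Return per-stage cycle counts, the bottleneck stage, and its cycle count.

    Assumes every stage overlaps perfectly with the others, so the slowest
    stage alone sets the total. Rounding up is an assumption of this sketch.
    """
    cycles = {name: math.ceil(work[name] / throughput[name]) for name in work}
    bottleneck = max(cycles, key=cycles.get)
    return cycles, bottleneck, cycles[bottleneck]


cycles, bottleneck, total = estimate_cycles(work, throughput)
print(cycles)             # cycles needed by each stage on its own
print(bottleneck, total)  # slowest stage and the reported overall latency
```

With the cartoon inputs, the GB->RF link needs the most cycles, so it is reported as the bottleneck. Changing the mapping changes the entries in `work`, which is why different mappings of the same layer can yield different latencies on the same architecture.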
