Overall latency estimation #273

Open
Jvafaei opened this issue Jul 10, 2024 · 1 comment

Comments

@Jvafaei

Jvafaei commented Jul 10, 2024

I have a question regarding how Timeloop calculates the overall execution latency of a layer. In the tutorial videos, it is mentioned that the latency is calculated based on a pipelined concept, though this is not very clear to me.

I would appreciate it if you could explain the latency estimation process for a simple architecture (DRAM + Global Buffer + (1 PE + RF)).

So far, I have noticed that there are latencies for data movement between the levels of the memory hierarchy and latencies for computing MACs in each cycle, but it is not clear to me how the overall latency is estimated from these different values, or why this estimation is reasonable.

Thanks.

@angshuman-parashar
Collaborator

Timeloop first computes the amount of data (or compute) that must move through each level of the hierarchy for that specific mapping.

As a cartoon example, let's say that for a given mapping we need to move 100 bytes from DRAM->GB, 1000 bytes from GB->RF and perform 10000 computations.

Now let's say our cartoon architecture's DRAM->GB bandwidth is 5 bytes per clock, GB->RF is 20 bytes per clock, and we have 1000 parallel MACs in the PE.

Timeloop will compute the DRAM->GB link as needing 100/5 = 20 clock cycles total to move all the data, 1000/20 = 50 clock cycles for the GB->RF link, and 10000/1000 = 10 clock cycles for the MACs to do their work.

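Because the levels work as a pipeline, the DRAM->GB transfers, the GB->RF transfers, and the MACs overlap in time, so the overall latency is set by the slowest of them rather than by their sum (20 + 50 + 10 = 80 cycles). Here the GB->RF link is the bottleneck, so the overall execution time will be reported as 50 clock cycles.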
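A minimal Python sketch of the bottleneck calculation above, assuming every level overlaps perfectly with the others. The names `LEVELS`, `per_level_cycles`, and `pipelined_latency` are hypothetical and are not part of Timeloop's API; this only reproduces the cartoon arithmetic, not Timeloop's actual implementation.

```python
import math

# Cartoon numbers from the example above (hypothetical, not Timeloop defaults).
# Each entry: (name, amount to move or compute, throughput per clock cycle).
LEVELS = [
    ("DRAM->GB", 100, 5),     # bytes, bytes per cycle
    ("GB->RF", 1000, 20),     # bytes, bytes per cycle
    ("MACs", 10000, 1000),    # MAC operations, MACs per cycle
]


def per_level_cycles(levels):
    """Cycles each level needs on its own, rounded up to whole cycles."""
    return {name: math.ceil(amount / rate) for name, amount, rate in levels}


def pipelined_latency(levels):
    """Latency assuming all levels overlap perfectly: the slowest level dominates."""
    cycles = per_level_cycles(levels)
    bottleneck = max(cycles, key=cycles.get)
    return cycles[bottleneck], bottleneck


if __name__ == "__main__":
    cycles = per_level_cycles(LEVELS)
    print(cycles)                     # {'DRAM->GB': 20, 'GB->RF': 50, 'MACs': 10}
    print(pipelined_latency(LEVELS))  # (50, 'GB->RF')
    # Fully serialized model (no overlap), shown only for contrast.
    print(sum(cycles.values()))       # 80
```

Taking the maximum instead of the sum is what makes the estimate "pipelined": while the MACs work on one tile, the links are already moving the next one, so only the slowest stage limits throughput.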
