Where does the code use "immediate eviction" and "fetched from L2 cache"? #8

Open
ziyuhuang123 opened this issue Jan 24, 2024 · 2 comments

Comments

@ziyuhuang123

Hi! I find your repo very interesting and starred it without hesitation! I have also been learning about the L2 cache recently, so I wonder where the code uses "immediate eviction" and "fetched from L2 cache". I guess it is related to discard_memory or the L2 persistence API?

Thank you!!

By the way, you mentioned that you used ncu to profile and analyze it; I am also interested in how that is done. Maybe you could publish a top conference paper!

@efrantar
Member

Hi, the L2 cache is used implicitly whenever global memory is fetched; the immediate eviction cache policy for weight loads is defined here. The key is that we want activations (which we need to load many times) to stay in the L2 cache for reuse, but we don't care about keeping weights there, as each weight is accessed exactly once.
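To illustrate the idea of an evict-first policy on loads, here is a minimal sketch, not the repository's actual code. It assumes an sm_80 or newer GPU and uses the PTX `createpolicy` and `cp.async` instructions; the helper name `cp_async16_evict_first` and the kernel `stream_copy` are hypothetical names chosen for this example.

```cuda
#include <cstdint>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: asynchronously copy 16 bytes from global to shared
// memory, hinting that the L2 line should be evicted first (requires sm_80+).
__device__ inline void cp_async16_evict_first(void* smem_ptr, const void* glob_ptr) {
  uint32_t smem = static_cast<uint32_t>(__cvta_generic_to_shared(smem_ptr));
  size_t glob = __cvta_generic_to_global(glob_ptr);
  asm volatile(
      "{\n"
      "  .reg .b64 p;\n"
      "  createpolicy.fractional.L2::evict_first.b64 p, 1.0;\n"
      "  cp.async.cg.shared.global.L2::cache_hint [%0], [%1], 16, p;\n"
      "}\n" ::"r"(smem), "l"(glob) : "memory");
}

// Each thread streams one int4 through shared memory; the data is read once,
// so there is no benefit in keeping it resident in L2.
__global__ void stream_copy(const int4* in, int4* out, int n) {
  __shared__ int4 tile[128];  // assumes blockDim.x <= 128
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) cp_async16_evict_first(&tile[threadIdx.x], &in[i]);
  asm volatile("cp.async.commit_group;\n" ::: "memory");
  asm volatile("cp.async.wait_group 0;\n" ::: "memory");
  if (i < n) out[i] = tile[threadIdx.x];  // each thread reads only its own slot
}

int main() {
  const int n = 1024;
  std::vector<int4> h_in(n), h_out(n);
  for (int i = 0; i < n; ++i) h_in[i] = make_int4(i, i + 1, i + 2, i + 3);

  int4 *d_in, *d_out;
  cudaMalloc(&d_in, n * sizeof(int4));
  cudaMalloc(&d_out, n * sizeof(int4));
  cudaMemcpy(d_in, h_in.data(), n * sizeof(int4), cudaMemcpyHostToDevice);

  stream_copy<<<n / 128, 128>>>(d_in, d_out, n);
  cudaMemcpy(h_out.data(), d_out, n * sizeof(int4), cudaMemcpyDeviceToHost);

  bool ok = true;
  for (int i = 0; i < n; ++i)
    ok = ok && h_out[i].x == h_in[i].x && h_out[i].w == h_in[i].w;
  printf("%s\n", ok ? "copy matches" : "copy mismatch");

  cudaFree(d_in);
  cudaFree(d_out);
  return ok ? 0 : 1;
}
```

Compile with something like `nvcc -arch=sm_80`. The policy is only a hint to the hardware: data loaded this way is marked as the preferred eviction candidate, which leaves more L2 capacity for data loaded with the default policy.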

We are considering a write-up of this work; however, I am currently very busy, so this may take quite a while.

@ziyuhuang123
Author

I see! Is it possible to make better use of the L2 cache? I know there is an API mentioned here, but I cannot find a way to use it well. I mean, couldn't some random accesses evict the useful data from L2? What do you think? Thanks!!!
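For reference, assuming the API in question is CUDA's L2 persistence control (the link target is not preserved here), a minimal sketch of marking one buffer as persisting for a stream could look like the following. The buffer, its size, and the kernel `touch` are hypothetical; the runtime calls are from the CUDA runtime API and need a device of compute capability 8.0 or higher to have any effect.

```cuda
#include <algorithm>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that repeatedly reads a buffer we would like kept in L2.
__global__ void touch(const float* reused, float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = reused[i] * 2.0f;
}

int main() {
  cudaDeviceProp prop;
  cudaGetDeviceProperties(&prop, 0);
  if (prop.persistingL2CacheMaxSize == 0) {
    printf("L2 persistence is not supported on this device\n");
    return 0;
  }

  const int n = 1 << 18;                 // illustrative size: 1 MiB of floats
  size_t bytes = n * sizeof(float);
  float *reused, *out;
  cudaMalloc(&reused, bytes);
  cudaMalloc(&out, bytes);
  cudaMemset(reused, 0, bytes);

  // Set aside part of L2 for persisting accesses (capped by the device limit).
  size_t carve = std::min(bytes, (size_t)prop.persistingL2CacheMaxSize);
  cudaDeviceSetLimit(cudaLimitPersistingL2CacheSize, carve);

  cudaStream_t stream;
  cudaStreamCreate(&stream);

  // Accesses inside the window are treated as persisting; misses as streaming.
  cudaStreamAttrValue attr = {};
  attr.accessPolicyWindow.base_ptr = reused;
  attr.accessPolicyWindow.num_bytes =
      std::min(bytes, (size_t)prop.accessPolicyMaxWindowSize);
  attr.accessPolicyWindow.hitRatio = 1.0f;
  attr.accessPolicyWindow.hitProp = cudaAccessPropertyPersisting;
  attr.accessPolicyWindow.missProp = cudaAccessPropertyStreaming;
  cudaStreamSetAttribute(stream, cudaStreamAttributeAccessPolicyWindow, &attr);

  touch<<<(n + 255) / 256, 256, 0, stream>>>(reused, out, n);
  cudaStreamSynchronize(stream);

  // Release the window and return persisting lines to normal status.
  attr.accessPolicyWindow.num_bytes = 0;
  cudaStreamSetAttribute(stream, cudaStreamAttributeAccessPolicyWindow, &attr);
  cudaCtxResetPersistingL2Cache();

  cudaStreamDestroy(stream);
  cudaFree(reused);
  cudaFree(out);
  printf("done: %s\n", cudaGetErrorString(cudaGetLastError()));
  return 0;
}
```

Whether this helps depends on the access pattern: if the window is larger than the set-aside region, a `hitRatio` below 1.0 may be needed to avoid persisting lines thrashing each other.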
