-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add disk cache infrastructure back with tests #351
Conversation
Codecov ReportPatch coverage has no change and project coverage change:
Additional details and impacted files@@ Coverage Diff @@
## master #351 +/- ##
==========================================
- Coverage 85.85% 0.00% -85.86%
==========================================
Files 24 24
Lines 2871 2680 -191
==========================================
- Hits 2465 0 -2465
- Misses 406 2680 +2274
... and 22 files with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
cb25a34
to
49f3f67
Compare
Without caching:
First run:
Second run:
|
In discussion with @williamfgc, maybe we shouldn't make the cache_key static, so that an application can set it at startup? I would most likely put in the git-hash of the application. |
What causes the 5s regression going from 'without cache' to 'first run'? |
We were discussing with @jpsamaroo... |
I think that would be rather hard to do. This is still a stop-gap towards proper precompilation caching support. |
Apply suggestions from code review
On Julia 1.9 and current CUDA#master with no disk-cache first compilation got a lot faster.
|
Now first run with caching:
Second run hitting the cache:
So
|
On an Oceananigans test case from time spent in setup went from 150s spent in |
|
||
!!! warning | ||
The disk cache is not automatically invalidated. It is sharded upon | ||
`cache_key` (see [`set_cache_key``](@ref)), the GPUCompiler version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`cache_key` (see [`set_cache_key``](@ref)), the GPUCompiler version | |
`cache_key` (see [`set_cache_key`](@ref)), the GPUCompiler version |
end | ||
|
||
key(ver::VersionNumber) = "$(ver.major)_$(ver.minor)_$(ver.patch)" | ||
cache_path() = @get_scratch!(cache_key() * "-kernels-" * key(VERSION)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe include "cache" in the directory name? Or make this a subdirectory of the existing compile_cache
scratch directory? That way the cache would also get wiped on reset_runtime
, which is done when recompiling CUDA.jl. Or is that unwanted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might also be confusing to have compiled
in the scratch dir, containing the runtime bitcode, and cache
for compiled kernels :-) maybe cache/{runtime,jobs}
?
I know we're bikeshedding here :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That way the cache would also get wiped on reset_runtime, which is done when recompiling CUDA.jl. Or is that unwanted?
I was trying to add a dependency on the version of GPUCompiler/CUDA. Cache invalidation is a big potential footgun here.
@@ -173,7 +206,18 @@ end | |||
job = CompilerJob(src, cfg) | |||
|
|||
asm = nothing | |||
# TODO: consider loading the assembly from an on-disk cache here | |||
# can we load from the disk cache? | |||
if disk_cache() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use @static
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was frustrated by the need to recompile GPUCompiler to turn caching on and off.
I originally had it be a compile time preference which is what we would need for it to be @static
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't realize we had non compile-time preferences...
@@ -182,6 +226,10 @@ end | |||
end | |||
|
|||
asm = compiler(job) | |||
|
|||
if disk_cache() && !isfile(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
Found by @simone-silvestri when running with a large number of nodes and a shared filesystem. |
That |
Replaced by #557 |
Using Preferences.jl instead of environment variables and split the cache on a user defined key, GPUCompiler version, and Julia version.