Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to PrecompileTools.jl. #601

Merged
merged 1 commit into from
Jul 18, 2024
Merged

Switch to PrecompileTools.jl. #601

merged 1 commit into from
Jul 18, 2024

Conversation

maleadt
Copy link
Member

@maleadt maleadt commented Jul 17, 2024

Testing examples/kernel.jl without a precompile workload at all:

❯ hyperfine -w 1 " julia --project examples/kernel.jl"
Benchmark 1:  julia --project examples/kernel.jl
  Time (mean ± σ):      8.131 s ±  0.241 s    [User: 8.139 s, System: 1.156 s]
  Range (min … max):    7.873 s …  8.375 s    10 runs

Current master:

Benchmark 1:  julia --project examples/kernel.jl
  Time (mean ± σ):      7.732 s ±  0.085 s    [User: 7.723 s, System: 1.173 s]
  Range (min … max):    7.560 s …  7.818 s    10 runs

Before #600:

Benchmark 1:  julia --project examples/kernel.jl
  Time (mean ± σ):      1.782 s ±  0.022 s    [User: 1.825 s, System: 1.130 s]
  Range (min … max):    1.744 s …  1.815 s    10 runs

This PR:

Benchmark 1:  julia --project examples/kernel.jl
  Time (mean ± σ):     991.3 ms ±  14.8 ms    [User: 1045.8 ms, System: 1120.8 ms]
  Range (min … max):   968.7 ms … 1013.1 ms    10 runs

So still an improvement. Probably doesn't matter too much for downstream users, as the whole @nospecialize(job::CompilerJob) doesn't really work and I think we end up respecializing the GPUCompiler code anyway (i.e., downstream packages require their own precompile workload).

@maleadt maleadt force-pushed the tb/precompiletools branch 2 times, most recently from 66952ae to 75c2112 Compare July 17, 2024 13:10
@maleadt
Copy link
Member Author

maleadt commented Jul 17, 2024

@vchuravy GPU methods leaking into the package image on Windows/1.11?

lld: error: undefined symbol: gpu_malloc
>>> referenced by D:\a\GPUCompiler.jl\GPUCompiler.jl\src\runtime.jl:88
>>>               jl_B252.tmp(text#0.o):(julia_box_uint8_16861)
>>> referenced by D:\a\GPUCompiler.jl\GPUCompiler.jl\src\runtime.jl:88
>>>               jl_B252.tmp(text#0.o):(julia_box_int8_16844)
>>> referenced by D:\a\GPUCompiler.jl\GPUCompiler.jl\src\runtime.jl:88
>>>               jl_B252.tmp(text#0.o):(julia_box_float64_16886)
>>> referenced 9 more times

@vchuravy
Copy link
Member

Yes but no... This is a tricky design issue/interaction.

I believe this is the same issue as why I want to use the custom method table for all GPU intrinsics and it should be equivalent to adding precompile()

Essentially precompilation tracking discards the CI and then uses the MI alone.

@vchuravy
Copy link
Member

So it would be good to confirm if this is the same as:

JuliaLang/julia#54155

Copy link

codecov bot commented Jul 17, 2024

Codecov Report

Attention: Patch coverage is 73.91304% with 6 lines in your changes missing coverage. Please review.

Project coverage is 66.27%. Comparing base (28d96c1) to head (75c2112).

Files Patch % Lines
src/precompile.jl 70.00% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #601      +/-   ##
==========================================
+ Coverage   65.77%   66.27%   +0.50%     
==========================================
  Files          24       24              
  Lines        3348     3339       -9     
==========================================
+ Hits         2202     2213      +11     
+ Misses       1146     1126      -20     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@maleadt
Copy link
Member Author

maleadt commented Jul 17, 2024

So it would be good to confirm if this is the same as:

JuliaLang/julia#54155

Yep, just tested this out, and I can confirm it fixes the precompilation issue.

@vchuravy
Copy link
Member

Okay... Then we need to decide if this is the right fix...
I wasn't convinced at the time.

Also just reminded me of JuliaLang/julia#55072 (comment)

@maleadt
Copy link
Member Author

maleadt commented Jul 17, 2024

It's also surprising that this only triggers on Windows, no?

@vchuravy
Copy link
Member

It's also surprising that this only triggers on Windows, no?

A little bit...

@maleadt maleadt force-pushed the tb/precompiletools branch from 75c2112 to b9bee50 Compare July 18, 2024 14:12
@maleadt
Copy link
Member Author

maleadt commented Jul 18, 2024

Workaround: don't build the runtime, so that the simple kernel compiled as part of the precompilation workload doesn't generate GPU-specific code. Seems to keep most of the latency advantage:

❯ hyperfine -w 1 " julia --project examples/kernel.jl"
Benchmark 1:  julia --project examples/kernel.jl
  Time (mean ± σ):     977.3 ms ±  12.8 ms    [User: 1042.0 ms, System: 1110.2 ms]
  Range (min … max):   966.3 ms … 1010.9 ms    10 runs

@maleadt maleadt merged commit 1e49e09 into master Jul 18, 2024
14 of 15 checks passed
@maleadt maleadt deleted the tb/precompiletools branch July 18, 2024 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants