Switch to PrecompileTools.jl. #601

maleadt · 2024-07-17T12:52:44Z

Testing examples/kernel.jl without a precompile workload at all:

❯ hyperfine -w 1 " julia --project examples/kernel.jl"
Benchmark 1:  julia --project examples/kernel.jl
  Time (mean ± σ):      8.131 s ±  0.241 s    [User: 8.139 s, System: 1.156 s]
  Range (min … max):    7.873 s …  8.375 s    10 runs

Current master:

Benchmark 1:  julia --project examples/kernel.jl
  Time (mean ± σ):      7.732 s ±  0.085 s    [User: 7.723 s, System: 1.173 s]
  Range (min … max):    7.560 s …  7.818 s    10 runs

Before #600:

Benchmark 1:  julia --project examples/kernel.jl
  Time (mean ± σ):      1.782 s ±  0.022 s    [User: 1.825 s, System: 1.130 s]
  Range (min … max):    1.744 s …  1.815 s    10 runs

This PR:

Benchmark 1:  julia --project examples/kernel.jl
  Time (mean ± σ):     991.3 ms ±  14.8 ms    [User: 1045.8 ms, System: 1120.8 ms]
  Range (min … max):   968.7 ms … 1013.1 ms    10 runs

So still an improvement. Probably doesn't matter too much for downstream users, as the whole @nospecialize(job::CompilerJob) doesn't really work and I think we end up respecializing the GPUCompiler code anyway (i.e., downstream packages require their own precompile workload).

maleadt · 2024-07-17T13:15:30Z

@vchuravy GPU methods leaking into the package image on Windows/1.11?

lld: error: undefined symbol: gpu_malloc
>>> referenced by D:\a\GPUCompiler.jl\GPUCompiler.jl\src\runtime.jl:88
>>>               jl_B252.tmp(text#0.o):(julia_box_uint8_16861)
>>> referenced by D:\a\GPUCompiler.jl\GPUCompiler.jl\src\runtime.jl:88
>>>               jl_B252.tmp(text#0.o):(julia_box_int8_16844)
>>> referenced by D:\a\GPUCompiler.jl\GPUCompiler.jl\src\runtime.jl:88
>>>               jl_B252.tmp(text#0.o):(julia_box_float64_16886)
>>> referenced 9 more times

vchuravy · 2024-07-17T13:47:24Z

Yes but no... This is a tricky design issue/interaction.

I believe this is the same issue as why I want to use the custom method table for all GPU intrinsics and it should be equivalent to adding precompile()

Essentially precompilation tracking discards the CI and then uses the MI alone.

vchuravy · 2024-07-17T13:48:42Z

So it would be good to confirm if this is the same as:

JuliaLang/julia#54155

codecov · 2024-07-17T13:53:38Z

Codecov Report

Attention: Patch coverage is 73.91304% with 6 lines in your changes missing coverage. Please review.

Project coverage is 66.27%. Comparing base (28d96c1) to head (75c2112).

Files	Patch %	Lines
src/precompile.jl	70.00%	6 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #601      +/-   ##
==========================================
+ Coverage   65.77%   66.27%   +0.50%     
==========================================
  Files          24       24              
  Lines        3348     3339       -9     
==========================================
+ Hits         2202     2213      +11     
+ Misses       1146     1126      -20

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

maleadt · 2024-07-17T15:10:16Z

So it would be good to confirm if this is the same as:

JuliaLang/julia#54155

Yep, just tested this out, and I can confirm it fixes the precompilation issue.

vchuravy · 2024-07-17T16:05:33Z

Okay... Then we need to decide if this is the right fix...
I wasn't convinced at the time.

Also just reminded me of JuliaLang/julia#55072 (comment)

maleadt · 2024-07-17T17:57:06Z

It's also surprising that this only triggers on Windows, no?

vchuravy · 2024-07-17T18:41:17Z

It's also surprising that this only triggers on Windows, no?

A little bit...

maleadt · 2024-07-18T14:19:16Z

Workaround: don't build the runtime, so that the simple kernel compiled as part of the precompilation workload doesn't generate GPU-specific code. Seems to keep most of the latency advantage:

❯ hyperfine -w 1 " julia --project examples/kernel.jl"
Benchmark 1:  julia --project examples/kernel.jl
  Time (mean ± σ):     977.3 ms ±  12.8 ms    [User: 1042.0 ms, System: 1110.2 ms]
  Range (min … max):   966.3 ms … 1010.9 ms    10 runs

maleadt force-pushed the tb/precompiletools branch 2 times, most recently from 66952ae to 75c2112 Compare July 17, 2024 13:10

Switch to PrecompileTools.jl.

b9bee50

maleadt force-pushed the tb/precompiletools branch from 75c2112 to b9bee50 Compare July 18, 2024 14:12

maleadt merged commit 1e49e09 into master Jul 18, 2024
14 of 15 checks passed

maleadt deleted the tb/precompiletools branch July 18, 2024 14:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Switch to PrecompileTools.jl. #601

Switch to PrecompileTools.jl. #601

maleadt commented Jul 17, 2024

maleadt commented Jul 17, 2024

vchuravy commented Jul 17, 2024

vchuravy commented Jul 17, 2024

codecov bot commented Jul 17, 2024

maleadt commented Jul 17, 2024 •

edited

Loading

vchuravy commented Jul 17, 2024

maleadt commented Jul 17, 2024

vchuravy commented Jul 17, 2024

maleadt commented Jul 18, 2024

Switch to PrecompileTools.jl. #601

Switch to PrecompileTools.jl. #601

Conversation

maleadt commented Jul 17, 2024

maleadt commented Jul 17, 2024

vchuravy commented Jul 17, 2024

vchuravy commented Jul 17, 2024

codecov bot commented Jul 17, 2024

Codecov Report

maleadt commented Jul 17, 2024 • edited Loading

vchuravy commented Jul 17, 2024

maleadt commented Jul 17, 2024

vchuravy commented Jul 17, 2024

maleadt commented Jul 18, 2024

maleadt commented Jul 17, 2024 •

edited

Loading