@KristofferC commented Nov 5, 2025

The version map is compressed here by exploiting two observations:

  • most stdlibs keep the same dependency list over long stretches of Julia versions
  • often the only change a stdlib makes between Julia versions is a bump of its own version

For each stdlib, we therefore map each dependency list to the range of Julia versions over which it stays constant.
Within that range, we also record the subranges over which the version of the stdlib itself stays constant.
Finally, UUIDs are given names, which makes the file a bit nicer to look at.
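
To make the layout concrete, here is a minimal sketch of the idea. It is not the actual format used in `version_map_compressed.jl`: the names `UUID_BY_NAME`, `CompressedEntry`, `COMPRESSED`, `in_range`, and `lookup`, the stdlib names `Foo` and `Bar`, and all UUIDs and version numbers are made up for illustration.

```julia
using UUIDs

# Hypothetical named UUIDs (placeholder values, not real stdlib UUIDs).
const UUID_BY_NAME = Dict(
    :Foo => UUID("00000000-0000-0000-0000-000000000001"),
    :Bar => UUID("00000000-0000-0000-0000-000000000002"),
)

# One entry: over `julia_range` the stdlib's dependency list is constant;
# `versions` lists the Julia subranges over which the stdlib's own version
# is constant. All bounds are inclusive.
struct CompressedEntry
    julia_range::Tuple{VersionNumber,VersionNumber}
    deps::Vector{Symbol}
    versions::Vector{Pair{Tuple{VersionNumber,VersionNumber},VersionNumber}}
end

# Made-up data: `Foo` depends on `Bar` for Julia 1.9.0 through 1.11.2 and
# bumps its own version once inside that range.
const COMPRESSED = Dict(
    :Foo => [
        CompressedEntry((v"1.9.0", v"1.11.2"), [:Bar], [
            (v"1.9.0", v"1.9.4") => v"1.9.0",
            (v"1.10.0", v"1.11.2") => v"1.10.0",
        ]),
    ],
)

in_range(v, (lo, hi)) = lo <= v <= hi

# Recover (stdlib version, dependency UUIDs) for one stdlib at one Julia version.
# Returns `nothing` if no range covers `julia_version`.
function lookup(name::Symbol, julia_version::VersionNumber)
    for entry in COMPRESSED[name]
        in_range(julia_version, entry.julia_range) || continue
        for (subrange, stdlib_version) in entry.versions
            if in_range(julia_version, subrange)
                deps = [UUID_BY_NAME[d] for d in entry.deps]
                return (version = stdlib_version, deps = deps)
            end
        end
    end
    return nothing
end
```

With this layout, a new Julia patch release that changes nothing only widens the upper bound of an existing range, and a pure version bump only adds one subrange line.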

The compression and decompression code was written by an LLM, following my instructions on how the data should be compressed.

The code below verifies that the uncompressed data is identical to the original:

Verification code
# Load original
module Original
    include("src/StdlibInfo.jl")
    include("src/version_map.jl")
end

# Load compressed
module Compressed
    include("src/StdlibInfo.jl")
    include("src/version_map_compressed.jl")
    include("src/uncompress.jl")
end

# Field-by-field comparison
function compare_fields(o, c)
    o.name == c.name &&
    o.uuid == c.uuid &&
    o.version == c.version &&
    o.deps == c.deps &&
    o.weakdeps == c.weakdeps
end

orig = Original.STDLIBS_BY_VERSION
comp = Compressed.STDLIBS_BY_VERSION

if length(orig) != length(comp)
    println("✗ Length mismatch: orig=$(length(orig)), comp=$(length(comp))")
    exit(1)
end

let all_match = true
    for (i, (ov, cv)) in enumerate(zip(orig, comp))
        if ov[1] != cv[1]
            println("✗ Version mismatch at index $i")
            println("  Original: ", ov[1])
            println("  Compressed: ", cv[1])
            all_match = false
            break
        end

        orig_keys = sort(collect(keys(ov[2])))
        comp_keys = sort(collect(keys(cv[2])))

        if orig_keys != comp_keys
            println("✗ UUID keys mismatch at version $(ov[1])")
            all_match = false
            break
        end

        for uuid in orig_keys
            oinfo = ov[2][uuid]
            cinfo = cv[2][uuid]
            if !compare_fields(oinfo, cinfo)
                println("✗ Field mismatch at version $(ov[1]), stdlib $(oinfo.name)")
                println("  Original: name=$(oinfo.name), uuid=$(oinfo.uuid), ver=$(oinfo.version), deps=$(oinfo.deps), weakdeps=$(oinfo.weakdeps)")
                println("  Compressed: name=$(cinfo.name), uuid=$(cinfo.uuid), ver=$(cinfo.version), deps=$(cinfo.deps), weakdeps=$(cinfo.weakdeps)")
                all_match = false
                break
            end
        end

        if !all_match
            break
        end
    end

    if all_match
        println("✓ Data content verified - all fields match exactly")
    end
end

The reason I did this is that I am thinking about copy-pasting the version map into Pkg, but it was a bit too big for that to feel OK.

We use a compression scheme in which each stdlib with a given set of dependencies is associated with a range of Julia versions. Within that range, we also store how the version of the stdlib itself has evolved.
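
As a rough illustration of the compression direction, the sketch below collapses consecutive Julia versions that share the same value into inclusive ranges. This is an assumption about how such grouping could be done, not the code in this PR; `collapse_runs` and the sample `history` are hypothetical.

```julia
# Collapse consecutive Julia versions that map to the same value into
# (first, last) => value runs. `history` must be sorted by Julia version.
function collapse_runs(history::Vector{Pair{VersionNumber,T}}) where {T}
    out = Pair{Tuple{VersionNumber,VersionNumber},T}[]
    for (julia_version, val) in history
        if !isempty(out) && last(out).second == val
            (lo, _), v = last(out)
            out[end] = (lo, julia_version) => v              # extend the current run
        else
            push!(out, (julia_version, julia_version) => val) # start a new run
        end
    end
    return out
end

# Made-up history of one stdlib's own version across Julia releases.
history = [
    v"1.9.0" => v"1.9.0",
    v"1.9.1" => v"1.9.0",
    v"1.10.0" => v"1.10.0",
    v"1.10.1" => v"1.10.0",
    v"1.11.0" => v"1.11.0",
]

compressed = collapse_runs(history)
```

The same helper could be applied first to the dependency lists (to get the outer Julia ranges) and then to the stdlib versions within each range (to get the subranges).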
