RFE: stop oc image mirror creating duplicate files when mirroring to disk for an airgap install #1388
Labels: lifecycle/frozen
When running a command like:
oc image mirror -f images-mapping-to-filesystem.txt --filter-by-os '.*' --skip-multiple-scopes --max-per-registry=1
some manifest and blob files are duplicated into different folders. For example, if I run this command from inside the root v2 folder after the mirror is complete I see:
This shows that out of ~438 MB downloaded, ~394 MB are duplicates. Obviously this is an extreme case, but over a whole airgap mirror I'm seeing on average that about 1/3 of the size is taken up by duplicate files, and in some cases I see over 100 GB of duplicates for large mirrors.
If the command below is run from the root of a mirrored folder on disk (inside the v2 folder), it lists all the duplicate files, each preceded by a count of how many times it is duplicated and followed by its size in bytes:
The example above shows there are 9 copies of the first blob, starting sha256:5d9ff..., and each one is 39235316 bytes in size.
Whereas the command below counts all the duplicates and totals the space lost to them, so you can see how big the problem is on different mirrors:
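As a minimal sketch of the same kind of check (not the commands referred to above), the Python script below walks a mirrored folder, groups files with identical content, and reports a count per duplicate plus the total space taken by the extra copies. The function names are hypothetical, and it compares file contents by SHA-256 rather than relying on any particular file naming inside the v2 folder.

```python
import hashlib
import os
import sys
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Map (digest, size) -> sorted paths for content stored more than once."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so a link to a file is not counted as a copy.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_size[os.path.getsize(path)].append(path)

    groups = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for path in paths:
            groups[(file_digest(path), size)].append(path)
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}


def wasted_bytes(duplicates):
    """Bytes used by every copy beyond the first in each group."""
    return sum(size * (len(paths) - 1) for (_digest, size), paths in duplicates.items())


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    dups = find_duplicates(root)
    # Most-duplicated content first: count, digest, size in bytes.
    for (digest, size), paths in sorted(dups.items(), key=lambda kv: -len(kv[1])):
        print(f"{len(paths):4d} sha256:{digest} {size}")
    print(f"total bytes in duplicate copies: {wasted_bytes(dups)}")
```

Run it with the mirrored v2 folder as the only argument, or from inside that folder with no argument. Hard-linked files are counted as separate copies here, so on a filesystem that already uses hard links the total would overstate the space actually used.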
Given that the main purpose of oc image mirror is to mirror a registry to prepare for an airgap install, this is a lot of wasted space and time when mirroring large repositories. Therefore, it would be really helpful to eliminate the duplicates, perhaps by using the link file mechanism that some registries use internally, such as the manifestTagIndexEntryLinkPathSpec and the layerLinkPathSpec from distribution.
Happy to provide more information if required.
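To illustrate the link file idea, here is a rough Python sketch, assuming a layout loosely modeled on how distribution's filesystem storage keeps blob data once and records a small per-repository link file containing only the digest. The directory names and all function names are assumptions for illustration. This is not how oc image mirror behaves today, and it is not a proposal for its exact on-disk format.

```python
import hashlib
import os


def blob_data_path(root, digest):
    """Shared location for blob content, stored once per digest (assumed layout)."""
    algorithm, hex_digest = digest.split(":", 1)
    return os.path.join(root, "blobs", algorithm, hex_digest[:2], hex_digest, "data")


def layer_link_path(root, repository, digest):
    """Per-repository link file that points at the shared blob (assumed layout)."""
    algorithm, hex_digest = digest.split(":", 1)
    return os.path.join(
        root, "repositories", repository, "_layers", algorithm, hex_digest, "link"
    )


def store_layer(root, repository, content):
    """Write the content once, then record a small link file for the repository."""
    digest = "sha256:" + hashlib.sha256(content).hexdigest()
    data_path = blob_data_path(root, digest)
    if not os.path.exists(data_path):
        os.makedirs(os.path.dirname(data_path), exist_ok=True)
        with open(data_path, "wb") as f:
            f.write(content)
    link_path = layer_link_path(root, repository, digest)
    os.makedirs(os.path.dirname(link_path), exist_ok=True)
    with open(link_path, "w") as f:
        f.write(digest)  # the link file holds only the digest text
    return digest


def read_layer(root, repository, digest):
    """Resolve a repository's link file and return the shared blob content."""
    with open(layer_link_path(root, repository, digest)) as f:
        target = f.read().strip()
    with open(blob_data_path(root, target), "rb") as f:
        return f.read()
```

With a layout like this, a layer shared by many repositories costs one full copy plus one short link file per repository, instead of one full copy per repository.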
MGK