Use more threads for zstd compression #365

Closed
wants to merge 1 commit into from

tonymanou

Despite the -T0 option used in the current zstd command telling it to auto-detect the number of threads to use for compression, the current configuration is limited to single-thread compression due to the "ultra" (i.e. > 19) compression level.

Here is the current behavior, compressing the Nvidia module with dkms (it takes forever and 100% of a single thread):

[tonymanou@dell ~]$ time zstd -v -f -T0 -20 --ultra /tmp/545.29.06/nvidia.ko
*** Zstandard CLI (64-bit) v1.5.5, by Yann Collet ***
Note: 6 physical core(s) detected
/tmp/545.29.06/nvidia.ko : 50.42%   (  84.9 MiB =>   42.8 MiB, /tmp/545.29.06/nvidia.ko.zst)

real	0m21,043s
user	0m20,997s
sys	0m0,076s

By lowering the compression level to the highest non-ultra compression level, we become multi-threaded (it takes 100% of three threads in my case):

[tonymanou@dell ~]$ time zstd -v -f -T0 -19 /tmp/545.29.06/nvidia.ko
*** Zstandard CLI (64-bit) v1.5.5, by Yann Collet ***
Note: 6 physical core(s) detected
/tmp/545.29.06/nvidia.ko : 50.59%   (  84.9 MiB =>   43.0 MiB, /tmp/545.29.06/nvidia.ko.zst)

real	0m9,430s
user	0m23,665s
sys	0m0,074s
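
As a rough cross-check of the threading claim, the ratio of user CPU time to wall-clock time approximates how many cores were busy on average. The sketch below only reuses the figures from the two runs above (with the decimal comma converted to a dot); `cores_busy` is a helper name made up for this illustration, and the result is an estimate, not a measured thread count.

```shell
# Estimate the average number of busy cores as: user CPU seconds / real seconds.
# cores_busy is a hypothetical helper, not part of zstd or dkms.
cores_busy() {
    # $1 = user seconds, $2 = real (wall-clock) seconds
    LC_ALL=C awk -v user="$1" -v real="$2" 'BEGIN { printf "%.2f\n", user / real }'
}

cores_busy 20.997 21.043   # -20 --ultra run, prints 1.00
cores_busy 23.665 9.430    # -19 run, prints 2.51
```

A ratio close to 1 is consistent with a single busy thread, and a ratio of about 2.5 is consistent with the roughly three threads observed for level 19.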

Lowering the compression level a bit more saves a lot more time for a negligible increase in size:

[tonymanou@dell ~]$ time zstd -v -f -T0 -15 /tmp/545.29.06/nvidia.ko
*** Zstandard CLI (64-bit) v1.5.5, by Yann Collet ***
Note: 6 physical core(s) detected
/tmp/545.29.06/nvidia.ko : 50.64%   (  84.9 MiB =>   43.0 MiB, /tmp/545.29.06/nvidia.ko.zst)

real	0m1,950s
user	0m5,810s
sys	0m0,145s

I feel this setting is a good compromise between compression time (and therefore power usage) and compression ratio.
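
To put numbers on that compromise, the wall-clock times and output sizes reported above for `-20 --ultra` and `-15` can be compared directly. This is a small sketch using only those reported figures; `tradeoff` is a hypothetical helper name, and since the sizes are rounded to one decimal in the zstd output, the size percentage is approximate.

```shell
# Compare a baseline run against a faster run.
# tradeoff is a hypothetical helper used only for this illustration.
tradeoff() {
    # $1 = baseline real seconds, $2 = new real seconds
    # $3 = baseline size in MiB,  $4 = new size in MiB
    LC_ALL=C awk -v t0="$1" -v t1="$2" -v s0="$3" -v s1="$4" \
        'BEGIN { printf "%.1fx faster, +%.2f%% larger\n", t0 / t1, (s1 - s0) / s0 * 100 }'
}

# -20 --ultra (21.043 s, 42.8 MiB) versus -15 (1.950 s, 43.0 MiB)
tradeoff 21.043 1.950 42.8 43.0   # prints: 10.8x faster, +0.47% larger
```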

We could even lower the compression level to 9, equivalent to gzip's "best" level, but I suppose the compression ratio might suffer depending on the file being compressed (although IMHO, since dkms only deals with binary modules, this should not vary that much...):

[tonymanou@dell ~]$ time zstd -v -f -T0 -9 /tmp/545.29.06/nvidia.ko
*** Zstandard CLI (64-bit) v1.5.5, by Yann Collet ***
Note: 6 physical core(s) detected
/tmp/545.29.06/nvidia.ko : 50.99%   (  84.9 MiB =>   43.3 MiB, /tmp/545.29.06/nvidia.ko.zst)

real	0m0,372s
user	0m1,233s
sys	0m0,090s

If we use the default compression level (3), like mkinitcpio does, compression is blazingly fast but the impact on size becomes noticeable:

[tonymanou@dell ~]$ time zstd -v -f -T0 /tmp/545.29.06/nvidia.ko
*** Zstandard CLI (64-bit) v1.5.5, by Yann Collet ***
Note: 6 physical core(s) detected
/tmp/545.29.06/nvidia.ko : 52.20%   (  84.9 MiB =>   44.3 MiB, /tmp/545.29.06/nvidia.ko.zst)

real	0m0,124s
user	0m0,271s
sys	0m0,091s
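
The individual runs above could be reproduced in one pass with a loop along these lines. It is a sketch under a few assumptions: the module path is the one used in the runs above, `zstd_flags` is a hypothetical helper, and the shell provides `time` (a keyword in bash; other shells may need `/usr/bin/time`). Only flags already shown above plus the standard `-q` and `-o` options are used.

```shell
# Path taken from the runs above; override by exporting MODULE.
MODULE=${MODULE:-/tmp/545.29.06/nvidia.ko}

# Hypothetical helper: build the zstd flags for a level.
# --ultra is what unlocks levels 20 and above in the zstd CLI.
zstd_flags() {
    if [ "$1" -gt 19 ]; then
        printf '%s\n' "-T0 -$1 --ultra"
    else
        printf '%s\n' "-T0 -$1"
    fi
}

if command -v zstd >/dev/null 2>&1 && [ -f "$MODULE" ]; then
    out=$(mktemp)
    for level in 3 9 15 19 20; do
        echo "== level $level =="
        # The command substitution is unquoted on purpose so it splits into separate flags.
        time zstd -q -f $(zstd_flags "$level") -o "$out" "$MODULE"
        echo "compressed bytes: $(wc -c < "$out")"
    done
    rm -f "$out"
else
    echo "zstd or $MODULE not available, nothing measured" >&2
fi
```

Writing to a temporary file keeps the original module directory untouched, so the loop can be rerun without cleaning up `.zst` files.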

Compromise between single-threaded "ultra" compression level and multi-threaded "best" compression level.

@TriMoon TriMoon left a comment

(it takes forever and 100% of a single thread)

You do understand that better compression needs more time, right?
The data you provided only shows elapsed times, NOT the number of threads used.

Your change only lowers the compression ratio and thus speeds up the process. The title implies that you improve thread usage, which is NOT what your change does...

@evelikov
Collaborator

evelikov commented Nov 27, 2023

Despite the -T0 option used in the current zstd command telling it to auto-detect the number of threads to use for compression, the current configuration is limited to single-thread compression due to the "ultra" (i.e. > 19) compression level.

Reading through man zstd on my system, I see no mention of or hint at any of the above. Given that, I'm wondering whether that's a zstd code bug or a documentation issue. Please open a zstd bug, and based on their input we can reconsider the best option for us. Thanks

For the mid/long run: one thing I have been wondering about, assuming we don't nuke our custom code (see #319), is exposing overrides that the end user can set, e.g. see COMPRESSION_OPTIONS in https://man.archlinux.org/man/core/mkinitcpio/mkinitcpio.conf.5.en
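
Purely as an illustration of what such an override might look like, here is a minimal sketch. The variable name `DKMS_ZSTD_OPTIONS` and the helper `zstd_command_for` are hypothetical; nothing in this thread says dkms reads such a setting, and the default simply mirrors the `-T0 -19` combination discussed here.

```shell
# HYPOTHETICAL variable name, invented for this sketch.
# Use the user's value if set, otherwise fall back to -T0 -19.
: "${DKMS_ZSTD_OPTIONS:=-T0 -19}"

# Hypothetical helper: print the command that would be run for one module,
# so an override can be inspected without compressing anything.
zstd_command_for() {
    printf 'zstd -q -f %s %s\n' "$DKMS_ZSTD_OPTIONS" "$1"
}

zstd_command_for /tmp/545.29.06/nvidia.ko
```

A user who prefers speed could then set something like `DKMS_ZSTD_OPTIONS="-T0 -15"` in a configuration file sourced before this point.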

@evelikov
Collaborator

evelikov commented Dec 6, 2023

Closing given the above feedback.

Please let us know what the zstd developers say about this - link to their issue/thread would be great. Then we can reopen and re-evaluate.

@evelikov evelikov closed this Dec 6, 2023
@evelikov
Collaborator

Aside: I've opted for reducing the -20 to -19 with #390, which should reduce the compression times as indicated above.

The reason is that we recently gained in-kernel module decompression, which does not support the --ultra levels.
