Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: inline variants by buffer lengths #17

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

CPunisher
Copy link
Contributor

@CPunisher CPunisher commented Sep 7, 2024

  1. Make six versions of Repr::Inline, and increase INLINE_CAP to 32
    This is inspired by Enum field align cause performance degradation about 10x rust-lang/rust#119247 (comment). However, I just choose six versions with buffers of fixed length 1,2,4,8,16,32 (maybe there is a better partition). The purpose is:
    1. Decrease the variants counts. Because I find that too much variants lead to extra instructions in asm (but I don't know the root cause).
    2. Copying data of these lengths requires at most 2 moves on the registers. For lengths in [17, 32], it only takes two trips from memory to the %xmm0 (16 byte) register and back to memory to copy the buffer.
  2. Inline the sizes of variants.
    This is inspired by One more byte for inlined rust-analyzer/smol_str#53, which makes the compiler know the length range of each variant.

From my benchmarking, the performance improves on both x86-64 and aarch64. I need to figure out why cloning empty is also faster. The results of the benchmark look weird. I write a simple for-loop to test the performance, which shows an expected result that the performance of cloning inline string is greatly improved at the cost of some extra instructions for the discriminant judgment.

There are also cons:

  1. This makes code somehow cumbersome (I don't think it's better to use macro).
  2. This increases binary size from 141kb to 205kb on aarch64
  3. I'm not sure if there are any bugs, since the test coverage is not too high.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

1 participant