---------------------------------^^
/* Actual capacity data. */ |||
/* Is it owned? */ ||
/* Is it large? */ |

Now this allows us to do some very nice optimizations:

  1. Copy constructors are practically free, for both small and large cases
  2. Substring operations are practically free as well

Substring trickery

Alas, there's one big caveat with this approach: shared substrings of larger strings may not be null-terminated! For example, if I have this string:

Life before death. Strength before weakness. Journey before destination.

This is 72 bytes, so definitely a large string. But what if I take a substring of just the second sentence:

Strength before weakness.

Since we're sharing memory with the larger string, there is no null-terminator there. This matters because, if we call c_str() or data() on our string, the returned C string pointer won't "stop" until it hits a null, so our substring will appear to be this:

Strength before weakness. Journey before destination.

So, since I don't care about 100% standard compliance, but I still want the API to make sense, I compromised: data() doesn't need to return a null-terminated string, but c_str() always will. I handle this by checking whether the string is shared and, if so, lazily switching it to owned by allocating and copying the data. In the vast majority of cases within jank's code, data() is sufficient, since we also have size(); we don't need to go looking for nulls. However, compatibility with the C string world is important, so we meet in the middle.

The results

I implemented my string from scratch, using benchmark-driven development, taking the best of each of the strings I studied along the way. The entire string is constexpr, which required a change to the Boehm GC's C++ allocator. Ultimately, I'm quite pleased with the results. Between folly's string and libstdc++'s string alone, one is generally the clear winner in each benchmark, with folly's more complex encoding (and smaller size) making it slower for some large string operations. With jank's string, however, we get the best of both worlds. It's just as small as folly's string, but it either ties or outperforms the fastest string in every benchmark. On top of that, it packs another word for the cached hash! Finally, the data sharing for copy construction and substrings leaves the other strings in the dust. Take a look!

jank constructs small strings the fastest and ties with std::string for large strings.

jank ties with folly for copying small strings and seriously beats both when copying large strings.

jank ties with std::string for large and small string searches.

jank ties with folly for small substrings and seriously beats both when creating large substrings.

The benchmark source, which uses nanobench, can be found here.

Is sharing large strings a big deal?

This is easy to quantify. When simply compiling clojure.core with jank, we end up sharing 3,112 large strings. That's 3,112 large string deep copies, and just as many allocations, which we can completely elide. Across a larger application, we're talking about millions of elided allocations and deep string copies. It's fantastic!

Wrapping up

jank now has a persistent string which is tailored for how Clojure programs work. It shares data, reduces allocations for strings all the way up to 23 bytes (which fits most keywords, I'd bet), and supports fast, memoized hashing. Going forward, I'll be exploring whether keeping that hash around is worth the 8 bytes, but I'm thinking it is and I'd rather bite the bullet for it now than have to add it later. When string building is needed, I've aliased a very capable transient string type called std::string, which you can get to/from a persistent string easily.

There's a lot more detail I could go into about how I made these improvements: how I took folly's string design and made it as fast as, or faster than, libstdc++'s string in every benchmark. I optimized for data locality and write ordering, eliminated branches, and used tricks to enable constexpr even for complex code (like reinterpret_cast). If you're interested in even more detail in these areas, let me know!

You can see the final source of jank's string here.

Would you like to join in?

  1. Join the community on Slack
  2. Join the design discussions or pick up a ticket on GitHub
  3. Consider becoming a Sponsor