Automated publish
Github Actions committed Dec 30, 2023
1 parent 1f38e97 commit c45fde1
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion blog/2023-12-30-fast-string/index.html
@@ -53,4 +53,4 @@
<span class="line"><span style="color:#f92672">---------------------------------^^</span></span>
<span class="line"><span style="color:#88846f"> /* Actual capacity data. */</span><span style="color:#f8f8f2"> </span><span style="color:#f92672">|||</span></span>
<span class="line"><span style="color:#88846f"> /* Is it owned? */</span><span style="color:#f8f8f2"> </span><span style="color:#f92672">||</span></span>
<span class="line"><span style="color:#88846f"> /* Is it large? */</span><span style="color:#f8f8f2"> </span><span style="color:#f92672">|</span></span></code></pre><p><p>Now this allows us to do some very nice optimizations:<ol><li>Copy constructors are practically free, for both small and large cases<li>Substring operations are practically free as well</ol><h3>Substring trickery</h3><p>Alas, there's one big caveat with this approach: shared substrings of larger strings may not be null-terminated! For example, if I have this string:<pre class="shiki"style="background-color:#272822"><code><span class="line"><span style="color:#f8f8f2">Life before death. Strength before weakness. Journey before destination.</span></span></code></pre><p>This is 72 bytes, so definitely a large string. But what if I take a substring of just the second sentence:<pre class="shiki"style="background-color:#272822"><code><span class="line"><span style="color:#f8f8f2">Strength before weakness.</span></span></code></pre><p>Since we're sharing the memory with the larger string, there is no null-terminator there. This is important, since if we use <code>c_str()</code> or <code>data()</code> on our string, the returned C string pointer won't &quot;stop&quot; until the null, which will mean our string will appear to be this:<pre class="shiki"style="background-color:#272822"><code><span class="line"><span style="color:#f8f8f2">Strength before weakness. Journey before destination.</span></span></code></pre><p>So, since I don't care about 100% standard compliance, but I still want the API to make sense, I compromised here that <code>data()</code> doesn't need to return a null-terminated string, but that <code>c_str()</code> always will. I handle this by checking if it's shared and then lazily switching to being owned by allocating and copying the data. In the vast majority of the cases, within jank's code, <code>data()</code> is sufficient, since we also have the <code>size()</code>. 
We don't need to go looking for nulls. However, compatibility with the C string world is important, so we meet in the middle.<h2>The results</h2><p>I implemented my string from scratch, using benchmark-driven development, taking the best of each of the strings I studied along the way. The entire string is <code>constexpr</code>, which required a <a href="https://github.com/ivmai/bdwgc/pull/603">change</a> to the Boehm GC's C++ allocator. Ultimately, I'm quite pleased with the results. Between just folly's string and libstdc++'s string, one of them is generally the clear winner in each benchmark, with folly's more complex encoding (and smaller size) making it slower for some large string operations. However, with jank's string, we have the best of both worlds. It's just as small as folly's string, but it either ties or outperforms the fastest in every benchmark. On top of that, it packs another word for the cached hash! Finally, the data sharing for copy construction and substrings leaves the other strings in the dust. 
Take a look!<p><figure><object data="/img/blog/2023-12-30-fast-string/allocations.plot.svg"type="image/svg+xml"width="50%"><img src="/img/blog/2023-12-30-fast-string/allocations.plot.svg"width="50%"></object><figcaption>jank constructs small strings the fastest and ties with std::string for large strings.</figcaption></figure><p><p><figure><object data="/img/blog/2023-12-30-fast-string/copy.plot.svg"type="image/svg+xml"width="50%"><img src="/img/blog/2023-12-30-fast-string/copy.plot.svg"width="50%"></object><figcaption>jank ties with folly for copying small strings and seriously beats both when copying large strings.</figcaption></figure><p><p><figure><object data="/img/blog/2023-12-30-fast-string/find.plot.svg"type="image/svg+xml"width="50%"><img src="/img/blog/2023-12-30-fast-string/find.plot.svg"width="50%"></object><figcaption>jank ties with std::string for large and small string searches.</figcaption></figure><p><p><figure><object data="/img/blog/2023-12-30-fast-string/substr.plot.svg"type="image/svg+xml"width="50%"><img src="/img/blog/2023-12-30-fast-string/substr.plot.svg"width="50%"></object><figcaption>jank ties with folly for small substrings and seriously beats both when creating large substrings.</figcaption></figure><p><p>The benchmark source, which uses nanobench, can be found <a href="https://gist.github.com/jeaye/306d6aefd7ed6c29fdec6eef2cafbb1f">here</a>.<h2>Is sharing large strings a big deal?</h2><p>This is easy to quantify. When simply compiling <code>clojure.core</code>, with jank, we end up sharing 3,112 large strings. That's 3,112 large string deep copies, and just as many allocations, which we can completely elide. In the span of a larger application, we'll be talking about millions of allocations and deep string copies elided. It's fantastic!<h2>Wrapping up</h2><p>jank now has a persistent string which is tailored for how Clojure programs work. 
It shares data, reduces allocations for strings all the way up to 23 bytes (which fits most keywords, I'd bet), and supports fast, memoized hashing. Going forward, I'll be exploring whether keeping that hash around is worth the 8 bytes, but I'm thinking it is and I'd rather bite the bullet for it now than have to add it later. When string building is needed, I've aliased a very capable transient string type called <code>std::string</code>, which you can get to/from a persistent string easily.<p>There's a lot more detail I could go into about how I made these improvements, to take folly's string design and make it as fast as, or faster than, libstdc++'s string in every benchmark. I optimized aspects of data locality, write ordering, branch elimination, tricks to enable <code>constexpr</code> even for complex code (like <code>reinterpret_cast</code>), etc. If you're interested in even more detail in these areas, let me know!<h2>Would you like to join in?</h2><ol><li>Join the community on <a href="https://clojurians.slack.com/archives/C03SRH97FDK">Slack</a><li>Join the design discussions or pick up a ticket on <a href="https://github.com/jank-lang/jank">GitHub</a><li>Consider becoming a <a href="https://github.com/sponsors/jeaye">Sponsor</a></ol></div></div></section></div><footer class="footer"><div class="container"><div class="columns has-text-centered"><div class="column has-text-centered"><aside class="menu"><p class="menu-label">Resources<ul class="menu-list"><li><a href="https://clojurians.slack.com/archives/C03SRH97FDK">Slack</a><li><a href="https://github.com/jank-lang/jank">GitHub</a><li><a href="https://jank-lang.org/blog/feed.xml">RSS</a></ul></aside></div></div><div class="container has-text-centered"><div class="content is-small"><p>© 2022 Jeaye Wilkerson | All rights reserved.</div></div></div></footer><noscript><p><img src="//matomo.jeaye.com/matomo.php?idsite=1&amp;rec=1"style="border:0"alt=""></p></noscript><script>for(var 
coll=document.getElementsByClassName("collapsible"),i=0;i<coll.length;i++)coll[i].addEventListener("click",function(){this.classList.toggle("active");var l=this.nextElementSibling;"block"===l.style.display?l.style.display="none":l.style.display="block"})</script>
<span class="line"><span style="color:#88846f"> /* Is it large? */</span><span style="color:#f8f8f2"> </span><span style="color:#f92672">|</span></span></code></pre><p><p>Now this allows us to do some very nice optimizations:<ol><li>Copy constructors are practically free, for both small and large cases<li>Substring operations are practically free as well</ol><h3>Substring trickery</h3><p>Alas, there's one big caveat with this approach: shared substrings of larger strings may not be null-terminated! For example, if I have this string:<pre class="shiki"style="background-color:#272822"><code><span class="line"><span style="color:#f8f8f2">Life before death. Strength before weakness. Journey before destination.</span></span></code></pre><p>This is 72 bytes, so definitely a large string. But what if I take a substring of just the second sentence:<pre class="shiki"style="background-color:#272822"><code><span class="line"><span style="color:#f8f8f2">Strength before weakness.</span></span></code></pre><p>Since we're sharing the memory with the larger string, there is no null-terminator there. This is important, since if we use <code>c_str()</code> or <code>data()</code> on our string, the returned C string pointer won't &quot;stop&quot; until the null, which will mean our string will appear to be this:<pre class="shiki"style="background-color:#272822"><code><span class="line"><span style="color:#f8f8f2">Strength before weakness. Journey before destination.</span></span></code></pre><p>So, since I don't care about 100% standard compliance, but I still want the API to make sense, I compromised here that <code>data()</code> doesn't need to return a null-terminated string, but that <code>c_str()</code> always will. I handle this by checking if it's shared and then lazily switching to being owned by allocating and copying the data. In the vast majority of the cases, within jank's code, <code>data()</code> is sufficient, since we also have the <code>size()</code>. 
We don't need to go looking for nulls. However, compatibility with the C string world is important, so we meet in the middle.<h2>The results</h2><p>I implemented my string from scratch, using benchmark-driven development, taking the best of each of the strings I studied along the way. The entire string is <code>constexpr</code>, which required a <a href="https://github.com/ivmai/bdwgc/pull/603">change</a> to the Boehm GC's C++ allocator. Ultimately, I'm quite pleased with the results. Between just folly's string and libstdc++'s string, one of them is generally the clear winner in each benchmark, with folly's more complex encoding (and smaller size) making it slower for some large string operations. However, with jank's string, we have the best of both worlds. It's just as small as folly's string, but it either ties or outperforms the fastest in every benchmark. On top of that, it packs another word for the cached hash! Finally, the data sharing for copy construction and substrings leaves the other strings in the dust. 
Take a look!<p><figure><object data="/img/blog/2023-12-30-fast-string/allocations.plot.svg"type="image/svg+xml"width="50%"><img src="/img/blog/2023-12-30-fast-string/allocations.plot.svg"width="50%"></object><figcaption>jank constructs small strings the fastest and ties with std::string for large strings.</figcaption></figure><p><p><figure><object data="/img/blog/2023-12-30-fast-string/copy.plot.svg"type="image/svg+xml"width="50%"><img src="/img/blog/2023-12-30-fast-string/copy.plot.svg"width="50%"></object><figcaption>jank ties with folly for copying small strings and seriously beats both when copying large strings.</figcaption></figure><p><p><figure><object data="/img/blog/2023-12-30-fast-string/find.plot.svg"type="image/svg+xml"width="50%"><img src="/img/blog/2023-12-30-fast-string/find.plot.svg"width="50%"></object><figcaption>jank ties with std::string for large and small string searches.</figcaption></figure><p><p><figure><object data="/img/blog/2023-12-30-fast-string/substr.plot.svg"type="image/svg+xml"width="50%"><img src="/img/blog/2023-12-30-fast-string/substr.plot.svg"width="50%"></object><figcaption>jank ties with folly for small substrings and seriously beats both when creating large substrings.</figcaption></figure><p><p>The benchmark source, which uses nanobench, can be found <a href="https://gist.github.com/jeaye/306d6aefd7ed6c29fdec6eef2cafbb1f">here</a>.<h2>Is sharing large strings a big deal?</h2><p>This is easy to quantify. When simply compiling <code>clojure.core</code>, with jank, we end up sharing 3,112 large strings. That's 3,112 large string deep copies, and just as many allocations, which we can completely elide. In the span of a larger application, we'll be talking about millions of allocations and deep string copies elided. It's fantastic!<h2>Wrapping up</h2><p>jank now has a persistent string which is tailored for how Clojure programs work. 
It shares data, reduces allocations for strings all the way up to 23 bytes (which fits most keywords, I'd bet), and supports fast, memoized hashing. Going forward, I'll be exploring whether keeping that hash around is worth the 8 bytes, but I'm thinking it is and I'd rather bite the bullet for it now than have to add it later. When string building is needed, I've aliased a very capable transient string type called <code>std::string</code>, which you can get to/from a persistent string easily.<p>There's a lot more detail I could go into about how I made these improvements, to take folly's string design and make it as fast as, or faster than, libstdc++'s string in every benchmark. I optimized aspects of data locality, write ordering, branch elimination, tricks to enable <code>constexpr</code> even for complex code (like <code>reinterpret_cast</code>), etc. If you're interested in even more detail in these areas, let me know!<p>You can see the final source of jank's string <a href="https://github.com/jank-lang/jank/blob/main/include/cpp/jank/native_persistent_string.hpp">here</a>.<h2>Would you like to join in?</h2><ol><li>Join the community on <a href="https://clojurians.slack.com/archives/C03SRH97FDK">Slack</a><li>Join the design discussions or pick up a ticket on <a href="https://github.com/jank-lang/jank">GitHub</a><li>Consider becoming a <a href="https://github.com/sponsors/jeaye">Sponsor</a></ol></div></div></section></div><footer class="footer"><div class="container"><div class="columns has-text-centered"><div class="column has-text-centered"><aside class="menu"><p class="menu-label">Resources<ul class="menu-list"><li><a href="https://clojurians.slack.com/archives/C03SRH97FDK">Slack</a><li><a href="https://github.com/jank-lang/jank">GitHub</a><li><a href="https://jank-lang.org/blog/feed.xml">RSS</a></ul></aside></div></div><div class="container has-text-centered"><div class="content is-small"><p>© 2022 Jeaye Wilkerson | All rights 
reserved.</div></div></div></footer><noscript><p><img src="//matomo.jeaye.com/matomo.php?idsite=1&amp;rec=1"style="border:0"alt=""></p></noscript><script>for(var coll=document.getElementsByClassName("collapsible"),i=0;i<coll.length;i++)coll[i].addEventListener("click",function(){this.classList.toggle("active");var l=this.nextElementSibling;"block"===l.style.display?l.style.display="none":l.style.display="block"})</script>
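
The "Substring trickery" passage in the changed line describes a compromise where `data()` may return unterminated shared memory while `c_str()` lazily switches to an owned, null-terminated copy. The following is a minimal sketch of that idea only, not jank's implementation: the `shared_string` type, its members, and the use of `std::shared_ptr<std::vector<char>>` in place of GC-managed memory are all assumptions made for illustration, and `c_str()` is non-const here purely to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical illustration type; NOT jank's native_persistent_string.
class shared_string
{
public:
  // Copies the C string into a freshly allocated, null-terminated buffer.
  explicit shared_string(char const *s)
    : buffer_{ make_terminated(s, std::strlen(s)) }
    , offset_{ 0 }
    , size_{ std::strlen(s) }
    , owned_{ true }
  {
  }

  // O(1): shares the parent's buffer, so the result is treated as
  // not null-terminated at its own end.
  shared_string substr(std::size_t const pos, std::size_t const count) const
  {
    assert(pos <= size_ && count <= size_ - pos);
    return shared_string{ buffer_, offset_ + pos, count, false };
  }

  // May NOT be null-terminated at size(); always pair it with size().
  char const *data() const
  {
    return buffer_->data() + offset_;
  }

  std::size_t size() const
  {
    return size_;
  }

  bool is_owned() const
  {
    return owned_;
  }

  // Always null-terminated: a shared string lazily allocates and copies
  // its bytes into an owned buffer the first time this is called.
  char const *c_str()
  {
    if(!owned_)
    {
      // The copy completes before the old shared buffer is released.
      buffer_ = make_terminated(data(), size_);
      offset_ = 0;
      owned_ = true;
    }
    return data();
  }

private:
  using buffer_ptr = std::shared_ptr<std::vector<char>>;

  shared_string(buffer_ptr buffer, std::size_t offset, std::size_t size, bool owned)
    : buffer_{ std::move(buffer) }
    , offset_{ offset }
    , size_{ size }
    , owned_{ owned }
  {
  }

  // Allocates n + 1 bytes and appends the terminator.
  static buffer_ptr make_terminated(char const *s, std::size_t const n)
  {
    auto buf = std::make_shared<std::vector<char>>(s, s + n);
    buf->push_back('\0');
    return buf;
  }

  buffer_ptr buffer_;
  std::size_t offset_;
  std::size_t size_;
  bool owned_;
};
```

With the 72-byte example string from the post, `substr(19, 25)` yields a view over "Strength before weakness." that shares the original buffer; reading it through `data()` as a C string would run on into the next sentence, while `c_str()` pays for one allocation and copy to return exactly the 25 bytes plus a terminator.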