<p>While the pipelining model is parallelizable, it can fail to scale when <em>data skew</em> occurs. For instance, in a large social media graph, some influential nodes may have hundreds of times more followers than others. When such a graph serves as the outer relation in a multi-relation CQ, the thread processing an influential node must handle significantly more work than the threads processing less-connected nodes. This imbalance leaves threads idle.</p>
<p>This imbalance in k-way joins makes it difficult to scale pipelined join operations on massively parallel hardware and hinders adaptation to SIMD-based architectures, such as GPUs and AVX-supported CPUs.</p>
<p>A natural way to reason about how each relation and column contributes to the CQ is to construct a <em>query graph</em>: nodes represent logical variables (i.e., column names), and edges represent the relations. In this model, worst-case join planning can be viewed as an <em>edge cover</em> problem: find a minimal set of edges (relations) that touches every vertex (variable). For the two example query graphs shown earlier in this section, the worst case of the first query is dominated by the variable set $\{A, D, C\}$, while the set $\{A, B\}$ suffices for the second query, indicating that its worst-case output size is governed by fewer variables.</p>
<h2 id="5-worst-case-optimal-join">5. Worst Case Optimal Join</h2>
<p>Inspired by the AGM bound, Hung Q. Ngo, Christopher Ré, and Atri Rudra proposed a generic framework (missing reference) for designing worst-case optimal joins. Their algorithm can be described with the following pseudocode:</p>
<p>At each recursion level, the algorithm selects a join variable—typically chosen by heuristics such as frequency of occurrence across relations. It then <strong>projects</strong> the selected variable from all participating relations and intersects these value sets to determine all possible assignments to that logical variable. For each value in the intersection, the algorithm grounds the query accordingly and recursively applies the same process to the partially grounded query. The recursion continues until every variable in the query is bound, yielding a complete join result. This project-intersect-join pattern matches the construction suggested in the original AGM paper.</p>
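<p>As a concrete sketch of this project-intersect-recurse loop (a hypothetical Python rendering, not the authors' original pseudocode), each relation can be represented as a (schema, tuple-set) pair; the sketch assumes every variable appears in at least one relation:</p>

```python
def generic_join(variables, relations):
    """Bind one variable at a time, NPRR-style.

    variables: ordered list of variable names still to bind.
    relations: list of (schema, tuples), where schema is a tuple of
    variable names and tuples is a set of same-arity value tuples.
    """
    if not variables:
        return [()]                      # all variables bound: one empty tail
    x, rest = variables[0], variables[1:]

    # Project x from every relation mentioning it, then intersect.
    candidates = None
    for schema, tuples in relations:
        if x in schema:
            i = schema.index(x)
            vals = {t[i] for t in tuples}
            candidates = vals if candidates is None else candidates & vals

    out = []
    for v in sorted(candidates):
        # Ground x := v: filter matching tuples and drop the x column.
        grounded = []
        for schema, tuples in relations:
            if x in schema:
                i = schema.index(x)
                grounded.append((schema[:i] + schema[i + 1:],
                                 {t[:i] + t[i + 1:] for t in tuples if t[i] == v}))
            else:
                grounded.append((schema, tuples))
        for tail in generic_join(rest, grounded):
            out.append((v,) + tail)
    return out
```

<p>Running it on the triangle query <code>Q(a,b,c) :- R(a,b), S(b,c), T(c,a)</code> with one triangle in the data yields the single binding <code>(1, 2, 3)</code>.</p>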
<p>One notable aspect of the generic WCOJ algorithm is that it requires allocating temporary buffers at each level of the recursion (or nested for-loop) to store intermediate results: the partially grounded tuples. This introduces more memory overhead than traditional left-deep binary joins, where intermediate results are pipelined or materialized in a global buffer. If we use the same data structures for both storage and join processing (as is common in left-deep plans), this memory pressure can severely impact performance. A common solution is to store each relation as a prefix trie.</p>
<p>For example, with the relation A above stored as a trie, the operation <code class="language-plaintext highlighter-rouge">A[1]</code> can be implemented as finding the pointer to the subtrie rooted at the value 1; instead of a temporary buffer, we only need to store a single pointer.</p>
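<p>A minimal sketch of this idea in Python (the relation contents here are made up for illustration) represents the trie as nested dicts, one level per column:</p>

```python
# A binary relation A(x, y) stored as a prefix trie: one dict level per column.
A = {(1, 4), (1, 5), (2, 6)}

trie = {}
for x, y in A:
    trie.setdefault(x, {}).setdefault(y, {})

# "A[1]" is just a pointer to the subtrie rooted at x = 1; no matching
# tuples are copied into a temporary buffer.
sub = trie[1]          # the y-values reachable under x = 1
```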
<p>An implementation of this idea is <em>Leapfrog Triejoin (LFTJ)</em> <a class="citation" href="#veldhuizen2014leapfrog">(Veldhuizen)</a>, introduced by Todd L. Veldhuizen and used in the commercial system LogicBlox. LFTJ is specifically designed for scenarios where all column values are integers and each relation is indexed by a sorted trie. In such tries, each level corresponds to a join variable, and the children (subtries) of every node are kept in sorted order. The pseudocode below describes LFTJ in iterator-model style:</p>
<p>Below is a concrete example of the algorithm’s operation. Initially, the algorithm positions an iterator over each input relation’s join column; in this example, the iterators for relations A, B, and C start at 0, 0, and 2, respectively. It then takes the maximum of these values as a lower bound on the next joined value, which is 2, currently held by relation C. Next, it uses the probing function (leapfrog-seek) to search for the candidate value 2 in another relation; here, relation A is chosen arbitrarily. The search finds that 2 is not present in A, so A’s iterator advances to the smallest value greater than 2, namely 3. With 3 as the new lower bound, the algorithm repeats the search in relation B. This advancing of iterators continues until a candidate value (in the example, 8) is present in all relations. Such a value lies in the intersection of all join columns, and the algorithm proceeds with the inner loop of the generic join.</p>
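<p>The single-variable leapfrog loop above can be sketched in Python over plain sorted lists (a simplification of LFTJ, which runs this loop per trie level; the example data matches the walkthrough):</p>

```python
from bisect import bisect_left

def leapfrog_join(rels):
    """Intersect k sorted, duplicate-free integer lists leapfrog-style."""
    if any(not r for r in rels):
        return []
    pos = [0] * len(rels)
    out = []
    hi = max(r[0] for r in rels)          # initial candidate lower bound
    while True:
        for i, r in enumerate(rels):
            # leapfrog-seek: advance iterator i to the first value >= hi
            pos[i] = bisect_left(r, hi, pos[i])
            if pos[i] == len(r):
                return out                # one relation exhausted: done
            if r[pos[i]] > hi:
                hi = r[pos[i]]            # overshoot: new candidate, restart
                break
        else:
            out.append(hi)                # hi found in every relation
            hi += 1                       # move past it (integer keys)
```

<p>On A = [0, 1, 3, 4, 5, 6, 7, 8, 9, 11], B = [0, 2, 6, 7, 8, 9], and C = [2, 4, 5, 8, 10], the iterators leapfrog exactly as described and the only joined value is 8.</p>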
<p>Although LFTJ is a compelling algorithm for pipelining worst-case optimal joins, its reliance on sorted tries for relation storage can be limiting. Sorted tries support efficient sequential iteration, but the ordering constraint means individual lookups are not constant-time. For systems that require truly constant-time indexed access, a hash-trie-based algorithm is more appealing.</p>
<p>To interpret a free join plan, simply treat each bracketed group [] as a loop level. Within each level, iterate over the first subatom of the group, and use the scanned value to ground (or filter) and look up the remaining subatoms. For example, the triangle query with WCOJ optimizations can be represented in pseudocode as follows:</p>
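<p>For the triangle query <code>Q(a,b,c) :- R(a,b), S(b,c), T(c,a)</code>, the loop-level reading can be sketched in Python as follows (the hash-map layout and the specific grouping of subatoms are illustrative, not the paper's notation):</p>

```python
from collections import defaultdict

def triangle(R, S, T):
    """Q(a,b,c) :- R(a,b), S(b,c), T(c,a), one loop level per variable."""
    Rab, Sbc, Tca = defaultdict(set), defaultdict(set), defaultdict(set)
    for a, b in R: Rab[a].add(b)
    for b, c in S: Sbc[b].add(c)
    for c, a in T: Tca[a].add(c)          # keyed by a for the level-3 lookup

    out = []
    for a in Rab.keys() & Tca.keys():     # level 1: scan R(a), filter with T(a)
        for b in Rab[a] & Sbc.keys():     # level 2: scan R(a,b), filter with S(b)
            for c in Sbc[b] & Tca[a]:     # level 3: scan S(b,c), filter with T(c,a)
                out.append((a, b, c))
    return out
```

<p>Each loop level scans one subatom and uses the bound value to filter and look up the others, which is exactly the bracketed-group interpretation above.</p>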
<p>One bonus contribution of the original Free Join paper is that it also presents an algorithm for implementing the join plan using a novel data structure called the Lazy Generalized Hash Trie (LGHT). Similar to how the sorted trie enables the pipelining of worst-case optimal joins in LFTJ, LGHT makes it possible to fully pipeline hash-based WCOJ.</p>
<h2 id="whats-next-">What’s Next ?</h2>
<p>In this article, we have explored a wide range of processing algorithms for conjunctive queries, but most of them run only on single-CPU systems—especially the worst-case optimal join techniques. Scaling these methods to parallel processing environments remains a complex and open research question. Recent work on adapting worst-case optimal joins to parallel hardware (see <a class="citation" href="#wu2025honeycomb">(Wu and Suciu)</a> <a class="citation" href="#lai2022accelerating">(Lai et al.)</a>) shows promising directions, though none of these approaches has matured for use in real-world databases. I look forward to discussing these developments and sharing my thoughts on parallelizing conjunctive query processing in future blog articles.</p>
<h2 id="reference">Reference</h2>
<ol class="bibliography"><li><span id="graefe1993volcano">Graefe, Goetz, and William J. McKenna. “The Volcano Optimizer Generator: Extensibility and Efficient Search.” <i>Proceedings of IEEE 9th International Conference on Data Engineering</i>, IEEE, 1993, pp. 209–18.</span></li>
<li><span id="palvo2024multiway">Pavlo, Andy. <i>Lecture Note of Andy Pavlo on Multi-Way Join Algorithm</i>. 2024, https://15721.courses.cs.cmu.edu/spring2024/notes/10-multiwayjoins.pdf.</span></li>
<li><span id="lai2022accelerating">Lai, Zhuohang, et al. “Accelerating Multi-Way Joins on the GPU.” <i>The VLDB Journal</i>, 2022, pp. 1–25.</span></li>
<li><span id="atserias2013size">Atserias, Albert, et al. “Size Bounds and Query Plans for Relational Joins.” <i>SIAM Journal on Computing</i>, vol. 42, no. 4, 2013, pp. 1737–67.</span></li>