Perform immut/{set, sparse_array} #2272

FlyCloudC · 2025-06-14T18:45:58Z

This PR is the first part of the split of #2205, with a clearer git history.

clean up docs in sparse_array
use Path to avoid passing depth
replace Collision(Bucket[A]) with Leaf(A, @list.T[A])
rename T to Node, remove Node::empty, and define T as Node? (an optional Node)
replace nested Branch({ data:[Branch(...)], .. }) with Flat(key, path)

Parts from #2205 not included in this PR:

adopt CPS style in add/remove to reuse old nodes in special cases (maybe we don't need this)
optimize set operations: union, intersection, difference
improve test coverage by using MyInt, which only uses the lower 8 bits for its hash
apply the same improvements to immut/map

Overview

-priv enum Bucket[T] {
-  JustOne(T) // must be non-empty
-  More(T, Bucket[T])
-}

-enum T[A] {
+priv enum Node[A] {
-  Empty

-  Leaf(A)
-  Collision(Bucket[A])
+  Leaf(A, @list.T[A])

+  Flat(A, Path)

-  Branch(@sparse_array.SparseArray[T[A]])
+  Branch(@sparse_array.SparseArray[Node[A]])
}

+ type T[A] Node[A]?
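
To make the roles of the three variants concrete, here is a minimal lookup sketch. It is written in Rust rather than the PR's MoonBit so that it compiles stand-alone, and everything in it is an assumption made for illustration: a plain `Vec<Option<...>>` stands in for `@sparse_array.SparseArray`, a bare `u32` stands in for `Path`, the head tag is left out, and the `contains` function is hypothetical rather than the PR's code.

```rust
use std::rc::Rc;

const SEGMENT_LENGTH: u32 = 5;
const INDEX_MASK: u32 = (1 << SEGMENT_LENGTH) - 1;

/// Hypothetical stand-in for the PR's `Node[A]`.
enum Node<A> {
    /// A key plus any further keys stored in the same leaf.
    Leaf(A, Vec<A>),
    /// A single key stored together with the unconsumed rest of its path.
    Flat(A, u32),
    /// Up to 32 children, indexed by the next 5-bit path segment.
    /// (A dense vector here; the PR uses a sparse array.)
    Branch(Vec<Option<Rc<Node<A>>>>),
}

/// Walks from `root` towards `key`, consuming one path segment per branch.
fn contains<A: PartialEq>(root: &Node<A>, key: &A, mut path: u32) -> bool {
    let mut node = root;
    loop {
        match node {
            Node::Leaf(k, bucket) => return k == key || bucket.contains(key),
            // A flat node matches only if both the remaining path and the key agree.
            Node::Flat(k, p) => return *p == path && k == key,
            Node::Branch(children) => {
                let idx = (path & INDEX_MASK) as usize;
                match children.get(idx).and_then(|c| c.as_ref()) {
                    Some(child) => {
                        node = &**child;
                        path >>= SEGMENT_LENGTH;
                    }
                    None => return false,
                }
            }
        }
    }
}
```

In this sketch a `Flat` node replaces a chain of single-child branches: the lookup compares the whole remaining path in one step instead of descending level by level.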

peter-jerry-ye-code-review · 2025-06-14T18:46:19Z

Path comparison in Node::contains might cause infinite loop

Category
Correctness
Code Snippet
fn[A : Eq] Node::contains(self : Node[A], key : A, path : Path) -> Bool {
  loop (self, path) {
    (Leaf(key1, bucket), _) => key == key1 || bucket.contains(key)
    (Flat(key1, path1), path) => path == path1 && key == key1
    (Branch(children), path) => {
      let idx = path.idx()
      if children[idx] is Some(child) {
        continue (child, path.next())
      }
      false
    }
  }
}
Recommendation
Add a path length check so that the loop is guaranteed to terminate:

(Branch(children), path) => {
  if path.is_last() { return false }
  let idx = path.idx()
  if children[idx] is Some(child) {
    continue (child, path.next())
  }
  false
}

Reasoning
The loop in Node::contains could potentially continue indefinitely if the path structure is malformed. Adding a check for path.is_last() ensures termination.

Unnecessary allocations in union operation

Category
Performance
Code Snippet
fn[K : Eq] T::union(self : T[K], other : T[K]) -> T[K] {
  fn go(node1, node2) {
    match (node1, node2) {
      (node, Flat(key, path)) | (Flat(key, path), node) =>
        node.add_with_path(key, path)
Recommendation
Consider reusing nodes when possible:

fn go(node1, node2) {
  match (node1, node2) {
    (node, Flat(key, path)) | (Flat(key, path), node) =>
      if node.contains(key, path) { node } else { node.add_with_path(key, path) }

Reasoning
The current implementation always creates a new node even when the key already exists in the target node. Checking containment first could avoid unnecessary allocations.
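
The idea of checking containment before adding can be sketched in isolation. The example below is Rust, not the PR's MoonBit, and it is only an illustration under stated assumptions: `Bucket` and `add_reusing` are hypothetical names, and a shared `Rc<Vec<u32>>` stands in for a persistent node.

```rust
use std::rc::Rc;

/// Hypothetical persistent bucket: an immutable, shared list of keys.
type Bucket = Rc<Vec<u32>>;

/// Returns a bucket that contains `key`, reusing `bucket` itself
/// when the key is already present.
fn add_reusing(bucket: &Bucket, key: u32) -> Bucket {
    if bucket.contains(&key) {
        // Key already present: share the existing allocation.
        Rc::clone(bucket)
    } else {
        // Key absent: copy the old keys and append the new one.
        let mut keys = Vec::with_capacity(bucket.len() + 1);
        keys.extend_from_slice(bucket);
        keys.push(key);
        Rc::new(keys)
    }
}
```

The trade-off is an extra lookup on every call, so whether this pays off likely depends on how often the two sets overlap.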

Path implementation details are not well documented

Category
Maintainability
Code Snippet
pub(all) type Path UInt derive(Eq)

const SEGMENT_LENGTH : Int = 5
const INDEX_MASK : UInt = (1 << SEGMENT_LENGTH) - 1
const SEGMENT_NUM : Int = 32 / SEGMENT_LENGTH
const HEAD_TAG : UInt = 0xffffffffU << (SEGMENT_LENGTH * SEGMENT_NUM)
Recommendation
Add detailed documentation explaining the Path type and its bit layout:

///| A Path represents a sequence of 5-bit segments in a 32-bit integer.
/// The format is: HEAD_TAG | segment_n | ... | segment_1 | segment_0
/// where each segment is 5 bits and HEAD_TAG ensures the upper bits are set.
pub(all) type Path UInt derive(Eq)

Reasoning
The Path type is central to the new implementation but its internal representation and constraints are not clearly documented. Better documentation would help maintainers understand the design.

coveralls · 2025-06-14T18:48:48Z

Pull Request Test Coverage Report for Build 94

Details

189 of 220 (85.91%) changed or added relevant lines in 5 files are covered.
2 unchanged lines in 2 files lost coverage.
Overall coverage decreased (-0.1%) to 89.034%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
immut/internal/sparse_array/sparse_array.mbt	49	56	87.5%
immut/hashset/HAMT.mbt	117	141	82.98%

Files with Coverage Reduction	New Missed Lines	%
immut/hashset/HAMT.mbt	1	85.03%
immut/internal/sparse_array/sparse_array.mbt	1	89.04%

Totals
Change from base Build 91:	-0.1%
Covered Lines:	3475
Relevant Lines:	3903

💛 - Coveralls

peter-jerry-ye

I'll leave it to the experts for the implementation of the algorithm, but here's something I've observed.

immut/internal/sparse_array/bitset.mbt

immut/internal/sparse_array/sparse_array_test.mbt

Guest0x0 · 2025-06-17T02:52:45Z

btw, the optimized set operations from #2205 seem quite independent. I think we should include that part in this PR too, so that there is no performance regression in those operations

Guest0x0 · 2025-06-17T02:21:39Z

immut/internal/path/path.mbt

+const SEGMENT_NUM : Int = 32 / SEGMENT_LENGTH
+
+///|
+const HEAD_TAG : UInt = 0xffffffffU << (SEGMENT_LENGTH * SEGMENT_NUM)


what's the purpose of this?

So if I understand correctly, the new version only utilizes the lower 30 bits of the hash value, and the highest 2 bits are used to identify the end of the hash value?
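
The arithmetic behind this reading can be checked with a small sketch. It is written in Rust rather than MoonBit, the constants mirror the ones in the diff, and the helpers `tag` and `segments` are hypothetical: they assume the tag is OR-ed onto the hash and that each step shifts the path right by one segment, which the diff above does not confirm.

```rust
const SEGMENT_LENGTH: u32 = 5;
const INDEX_MASK: u32 = (1 << SEGMENT_LENGTH) - 1; // 0b11111
const SEGMENT_NUM: u32 = 32 / SEGMENT_LENGTH; // 6 full segments fit in 32 bits
const HEAD_TAG: u32 = 0xffff_ffff << (SEGMENT_LENGTH * SEGMENT_NUM); // 0xC000_0000
/// What is left of a tagged path once all 30 payload bits are shifted out.
const END: u32 = HEAD_TAG >> (SEGMENT_LENGTH * SEGMENT_NUM); // 0b11

/// Hypothetical: force the top 2 bits to 1 so they can mark the end of the path.
fn tag(hash: u32) -> u32 {
    hash | HEAD_TAG
}

/// Hypothetical: split a tagged path into its 5-bit segments, lowest first.
fn segments(mut path: u32) -> Vec<u32> {
    let mut out = Vec::new();
    // Only the two tag bits remain after SEGMENT_NUM shifts.
    while path != END {
        out.push(path & INDEX_MASK);
        path >>= SEGMENT_LENGTH;
    }
    out
}
```

Under these assumptions every tagged path yields exactly 6 segments (30 bits), and two hashes that differ only in their top 2 bits produce the same path.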

Guest0x0 · 2025-06-17T02:34:02Z

immut/hashset/HAMT.mbt

  }
+}


just a note: maybe it's possible to do path compression for Branch too (i.e. add an extra path segment parameter, Path + length)?

immut/internal/path/path.mbt

Guest0x0 · 2025-06-17T02:49:16Z

immut/hashset/HAMT.mbt

-    (Collision(xs), Collision(ys)) =>
-      xs.size() == ys.size() && xs.iter().all(fn(x) { ys.find(x) })
+    (Leaf(x, xs), Leaf(y, ys)) =>
+      xs.length() == ys.length() && xs.add(x).iter().all(ys.add(y).contains(_))


ys.add(y) will be run multiple times here
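
One way to avoid the repeated work is to build the combined bucket once, before iterating. The sketch below shows the pattern in Rust rather than MoonBit; `leaf_eq` is a hypothetical helper, slices stand in for `@list.T`, and it assumes each leaf holds distinct keys (so equal lengths plus inclusion imply equality).

```rust
/// Compares two leaves, each given as a head key plus the remaining keys.
fn leaf_eq<A: PartialEq>(x: &A, xs: &[A], y: &A, ys: &[A]) -> bool {
    if xs.len() != ys.len() {
        return false;
    }
    // Build the right-hand bucket once, outside the closure,
    // instead of rebuilding it for every element on the left.
    let all_ys: Vec<&A> = std::iter::once(y).chain(ys.iter()).collect();
    std::iter::once(x)
        .chain(xs.iter())
        .all(|e| all_ys.contains(&e))
}
```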

list/list.mbt

FlyCloudC · 2025-06-27T03:46:04Z

(using arrow function)

bobzhang force-pushed the perf-immut-set branch from 40a441c to 44df2f7 · June 15, 2025 01:14

bobzhang requested a review from Guest0x0 · June 15, 2025 01:14

peter-jerry-ye reviewed Jun 16, 2025

immut/internal/sparse_array/bitset.mbt (outdated, resolved)

immut/internal/sparse_array/sparse_array_test.mbt (outdated, resolved)

FlyCloudC force-pushed the perf-immut-set branch from 44df2f7 to ad77ba3 · June 16, 2025 08:52

FlyCloudC force-pushed the perf-immut-set branch 2 times, most recently from 3d148b8 to 1c87007 · June 21, 2025 01:33

Guest0x0 reviewed Jun 25, 2025

Guest0x0 reviewed Jun 26, 2025

list/list.mbt (outdated, resolved)

FlyCloudC force-pushed the perf-immut-set branch from 04df5e1 to 558deab · June 26, 2025 04:54

FlyCloudC added 8 commits June 27, 2025 11:40

removing redundant function signatures in doc

70be627

refactor(sparse_array): remove has and simplify tests

321d9c6

refactor(HAMT): using Path

489173b

refactor(HAMT): using @list and remove single-element Leaf optimization

684a2a7

refactor(HAMT): separate representation of empty set

4a57dfa

perf(HAMT): flatten nested Branch structure

6a4df5e

perf(HAMT): union, difference, intersection

631afd0

fix(HAMT): duplicate execution of add operation

ca818e3

FlyCloudC force-pushed the perf-immut-set branch from 558deab to ca818e3 · June 27, 2025 03:44
