feat(arrow/compute): sort support #749
hamilton-earthscope wants to merge 18 commits into apache:main from
Conversation
```go
func (c *physicalSortInt8Column) isNullLikeAt(uint64) bool { return false }

func (c *physicalSortInt8Column) columnHasValidityNulls() bool {
	return c.base.columnHasValidityNulls()
}
```
Can't we use Go generics here and just embed the base? i.e.

```go
type physicalSortColumn[T arrow.ValueTypes] struct{ physicalColumnBase }

...

func (c *physicalSortColumn[T]) compareRowsForKey(i, j uint64, key SortKey) int {
	ai, aj, li, lj := c.pair(i, j)
	a := ai.(arrow.TypedArray[T])
	b := aj.(arrow.TypedArray[T])
	if c.validityNulls {
		if v, stop := compareKeyedNulls(a.IsNull(li), b.IsNull(lj), key); stop {
			return v
		}
	}
	return compareOrdered(key.Order, a.Value(li), b.Value(lj))
}

func (c *physicalSortColumn[T]) isNullAt(row uint64) bool { return c.isNullAtGlobal(row) }
```

etc.
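For context, the sketch above leans on `compareKeyedNulls` and `compareOrdered` from the PR. Minimal stand-ins could look roughly like the following; note that `SortKey`'s fields and the null-placement semantics here are assumptions for illustration, not the PR's actual definitions:

```go
package main

import (
	"cmp"
	"fmt"
)

// Minimal stand-ins for the helpers referenced above. The real PR
// helpers likely differ; SortKey's fields here are assumed.

type SortOrder int

const (
	Ascending SortOrder = iota
	Descending
)

type SortKey struct {
	Order      SortOrder
	NullsFirst bool // assumed field; Arrow lets callers choose null placement
}

// compareKeyedNulls resolves ordering when at least one side is null.
// stop == false means both sides are non-null and values must be compared.
func compareKeyedNulls(aNull, bNull bool, key SortKey) (v int, stop bool) {
	switch {
	case aNull && bNull:
		return 0, true
	case aNull:
		if key.NullsFirst {
			return -1, true
		}
		return 1, true
	case bNull:
		if key.NullsFirst {
			return 1, true
		}
		return -1, true
	}
	return 0, false
}

// compareOrdered compares two non-null values under the key's direction.
func compareOrdered[T cmp.Ordered](order SortOrder, a, b T) int {
	c := cmp.Compare(a, b)
	if order == Descending {
		return -c
	}
	return c
}

func main() {
	fmt.Println(compareOrdered(Ascending, int64(3), int64(7)))  // -1
	fmt.Println(compareOrdered(Descending, int64(3), int64(7))) // 1
	v, stop := compareKeyedNulls(true, false, SortKey{NullsFirst: true})
	fmt.Println(v, stop) // -1 true
}
```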
We certainly can, and I made several attempts at a generic solution to avoid the code duplication. Correctness was straightforward given the test suite, but I couldn't match the performance of these verbose single-use types with Go's generics; the closest I got was ~25% slower. I opted for performance over maintainability, on the thinking that this will be in the hot path for some critical operations (sorted Iceberg writes and maintenance activities).
By performance I mean the following benchmark suite. It is limited in that it only tests int64 and string, but the results were consistent across types.
```
go test ./arrow/compute/internal/kernels -bench=BenchmarkSortIndices -benchmem
```
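The PR's actual benchmarks live in `vector_sort_bench_test.go`; purely to illustrate the kind of direct-vs-indirect comparison being measured, here is a standalone toy harness (all names hypothetical) that stably sorts an index permutation once with a direct closure over the values and once through an interface comparator, which mimics the dispatch cost a generic/interface-based column comparator adds on every compare:

```go
package main

import (
	"fmt"
	"math/rand"
	"slices"
	"time"
)

// rowComparator mimics the indirection of an interface-based column
// comparator: every compare goes through a dynamic dispatch.
type rowComparator interface {
	compare(i, j int) int
}

type int64Column struct{ vals []int64 }

func (c *int64Column) compare(i, j int) int {
	switch {
	case c.vals[i] < c.vals[j]:
		return -1
	case c.vals[i] > c.vals[j]:
		return 1
	}
	return 0
}

// sortDirect stably sorts logical row indices with a closure that
// touches the value slice directly (the "verbose single-use" shape).
func sortDirect(vals []int64) []int {
	idx := make([]int, len(vals))
	for i := range idx {
		idx[i] = i
	}
	slices.SortStableFunc(idx, func(a, b int) int {
		switch {
		case vals[a] < vals[b]:
			return -1
		case vals[a] > vals[b]:
			return 1
		}
		return 0
	})
	return idx
}

// sortViaInterface does the same through the interface comparator.
func sortViaInterface(c rowComparator, n int) []int {
	idx := make([]int, n)
	for i := range idx {
		idx[i] = i
	}
	slices.SortStableFunc(idx, func(a, b int) int { return c.compare(a, b) })
	return idx
}

func main() {
	const n = 1 << 20
	r := rand.New(rand.NewSource(42))
	vals := make([]int64, n)
	for i := range vals {
		vals[i] = r.Int63()
	}

	start := time.Now()
	_ = sortDirect(vals)
	direct := time.Since(start)

	start = time.Now()
	_ = sortViaInterface(&int64Column{vals: vals}, n)
	iface := time.Since(start)

	fmt.Printf("direct: %v  interface: %v\n", direct, iface)
}
```

A wall-clock harness like this is noisier than `go test -bench`, which handles warm-up and iteration counts; it only sketches the shape of the measurement.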
Interesting, I'm surprised there was that much of a slowdown with generics. I'll take a look at this, but for now I agree that performance is the higher priority. Maybe we set this up with codegen, as we do elsewhere? That way we can prevent future bugs if we change this logic.
```go
sortColumns = make([]*arrow.Chunked, len(inputSortKeys))
needsRelease = make([]bool, len(inputSortKeys))
for i, key := range inputSortKeys {
```
If we're pre-ordering by the sort keys' ColumnIndex fields, we should at least document in the internal kernel implementation that this is assumed to have happened already.
Done. Let me know if the docstring on the kernel method is insufficient.
Summary

Implements stable `sort_indices` (and `sort` via `take`) for arrays, chunked arrays, record batches, and tables using logical row indices over `Chunked` data, without concatenating chunks. The control flow and ordering rules are modeled on Apache Arrow C++ `vector_sort.cc` / `vector_sort_internal.h`, with a few Go- and performance-driven differences called out below.

Parity with Arrow C++ (`vector_sort.cc` / `vector_sort_internal.h`)

Same overall structure

- Single sort key, one column: special-cased when `null_count == 0` and there are no null-likes (`vector_sort_internal.go`).
- Multiple sort keys:
  - `len(keys) <= kMaxRadixSortKeys` (8): MSD radix path per record-batch range (`radixRecordBatchSortRange` ↔ `ConcreteRecordBatchColumnSorter::SortRange`).
  - Otherwise: comparator-based path (`multipleKeyRecordBatchSortRange`).

Same ordering semantics (intended match to C++)

- `slices.SortStableFunc` is used so tie-breaking matches the C++ "left before right" stable merge behavior, where documented in code.

Same "column comparator" role

- `columnComparator` interface ↔ C++ `ColumnComparator`: `compareRowsForKey`, null / null-like metadata, `columnHasValidityNulls` (skip PartitionNullsOnly when there are no validity nulls).

Physical types

- `vector_sort_physical.go`, analogous to C++ `ConcreteColumnComparator<T>` (concrete `*array.T` + direct `Value` / `Cmp` / special cases for bool and intervals).

Intentional differences and rationale

- `logicalRowMap`: one `rowMapCell{chunk, local}` per logical row when `len(chunks) > 1`; `pair(i, j)` resolves two rows in one shot. Why: random compares during sort/merge need O(1) resolution; a flat table + co-located fields beats repeated resolver work and improves locality vs. separate `chunk`/`local` slices.
- Pointer receivers on the `physicalColumnBase` methods `pair` / `isNullAtGlobal` / `cell`. Why: value receivers would copy slice headers (and map state) on every compare.
- `std::stable_sort` → `slices.SortStableFunc` (Go 1.21+). Why: library primitive; semantics aligned with the stable weak ordering used elsewhere in the port.
- `columnComparator` interface for "which column" in multi-key and merge loops. Why: idiomatic Go; per-type work stays in concrete `compareRowsForKey` implementations.
- `less` over full row order after per-chunk partitioning/sort. Why: simpler merge while preserving order as long as per-chunk phases match C++; documented in `vector_sort.go` comments.

File Layout

- `arrow/compute/vector_sort.go` — `sort_indices`/`sort` registration and datum dispatch.
- `arrow/compute/vector_sort_test.go` — functional tests.
- `arrow/compute/internal/kernels/vector_sort.go` — orchestration, merge, `SortIndices` kernel.
- `arrow/compute/internal/kernels/vector_sort_internal.go` — null partitions, radix / multi-key batch sort.
- `arrow/compute/internal/kernels/vector_sort_support.go` — `logicalRowMap` and ordering helpers.
- `arrow/compute/internal/kernels/vector_sort_physical.go` — per-type column comparators.
- `arrow/compute/internal/kernels/vector_sort_bench_test.go` — benchmarks.

Testing

- `go test ./arrow/compute -run TestSort -count=1`
- `go test ./arrow/compute/internal/kernels -bench=BenchmarkSortIndices -benchmem`

References

- `cpp/src/arrow/compute/kernels/vector_sort.cc` and `vector_sort_internal.h` (and related comparators).

Related Issues