Vortex long-term roadmap #7792
joseph-isaacs started this conversation in Feature Requests
Vortex Roadmap
The long-term Vortex roadmap. It should contain every significant work item Vortex intends to undertake. The items are in no particular order.
GPU
Support direct-to-GPU data loading and decompression. #7712
Vortex types
Ext VTable
Users should be able to easily extend the Vortex type system with their own type constructs that nicely fit their application model and use case. #7683
Variant
Add a Variant logical type with a canonical array, zero-copy interop with Arrow
Parquet-Variant, and a VariantGet expression for projection. #7717
Other Arrow types (Union, …)
Add the final Arrow (logical) data types to Vortex:
- Union to DType: #7705 (Add Union to DType)
- Map DType
Vector support
Build on the existing Vector extension type and similarity scan to add vector
indexes, ANN, and top-k search so Vortex is a viable backbone for vector
workloads.
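For reference, exact (brute-force) top-k search by cosine similarity is the baseline that an ANN index approximates while touching far fewer vectors. The Rust below is a minimal sketch under stated assumptions: `cosine` and `top_k` are hypothetical helpers, not part of the Vortex API, and rows are assumed to be equal-length `f32` vectors held in memory.

```rust
use std::cmp::Ordering;

/// Cosine similarity of two equal-length vectors (0.0 if either is all zeros).
/// Hypothetical helper, not a Vortex API.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Exact top-k: score every row against the query, sort by score
/// (descending), and keep the k best as (row index, score) pairs.
fn top_k(rows: &[Vec<f32>], query: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = rows
        .iter()
        .enumerate()
        .map(|(i, row)| (i, cosine(row, query)))
        .collect();
    // NaN scores compare as equal rather than panicking.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(Ordering::Equal));
    scored.truncate(k);
    scored
}
```

An ANN index trades this full O(n) scan for an approximate candidate set; the exact version remains useful as a correctness oracle when evaluating recall.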
Integrations
DataFusion integration
Close the remaining DataFusion gaps: dynamic filter expressions, partition
reporting, casting compatibility, and richer scan metrics.
DuckDB integration
Reach Parquet performance parity in DuckDB and ship the missing pushdown /
partitioning / object-store features.
External Language API
PyVortex
Improvements to the Python Vortex API
Java Vortex
Improvements to the Java Vortex API, Iceberg, etc.
C Vortex
Improve and stabilise the C Vortex API
WASM extension (forward compat)
Allow WASM-backed extensible Vortex components (e.g. array, layout, ...)
Layout API design
Change the layout API to allow better compute and I/O behaviour for Vortex operations
Sub-segment reads
Allow readers to fetch arbitrary byte ranges of a flat layout instead of always
reading whole segments, cutting read amplification. This requires a change to Layouts.
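To make the read-amplification argument concrete, the Rust sketch below maps a row range to a byte range and compares it with a whole-segment read. It assumes a hypothetical `FlatSegment` of uncompressed, fixed-width values stored contiguously; real encoded segments generally do not map rows to bytes this linearly, which is one reason the layout itself has to change.

```rust
use std::ops::Range;

/// Hypothetical flat segment of uncompressed fixed-width values
/// (an assumption for this sketch, not the Vortex layout format).
struct FlatSegment {
    /// Byte offset of the segment within the file.
    file_offset: u64,
    /// Width in bytes of each value (e.g. 8 for i64).
    value_width: u64,
    /// Number of values in the segment.
    len: u64,
}

impl FlatSegment {
    /// Bytes fetched by a whole-segment read.
    fn full_range(&self) -> Range<u64> {
        self.file_offset..self.file_offset + self.len * self.value_width
    }

    /// Bytes needed for rows `rows.start..rows.end` only.
    fn byte_range(&self, rows: Range<u64>) -> Range<u64> {
        assert!(
            rows.start <= rows.end && rows.end <= self.len,
            "row range out of bounds"
        );
        let start = self.file_offset + rows.start * self.value_width;
        let end = self.file_offset + rows.end * self.value_width;
        start..end
    }
}

/// Read amplification: bytes fetched divided by bytes actually needed.
fn amplification(fetched: &Range<u64>, needed: &Range<u64>) -> f64 {
    (fetched.end - fetched.start) as f64 / (needed.end - needed.start) as f64
}
```

For a segment of one million 8-byte values, reading 10 rows through `full_range` fetches 8,000,000 bytes for 80 useful ones (an amplification of 100,000), whereas `byte_range` brings it down to 1.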
Finish lazy compute migration
Complete the move to the deferred-iterative execute model, deprecate the
canonicalize path, and add common subexpression elimination (CSE).
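One common way to implement CSE over an expression tree is hash-consing: every structurally identical subexpression is interned to a single node id, so it is computed once however often it appears. The Rust below is a minimal sketch of that technique; `Expr`, `Node` and `Cse` are hypothetical types for illustration, not the Vortex expression API, and nothing here reflects how Vortex will actually implement CSE.

```rust
use std::collections::HashMap;

/// Hypothetical expression tree (not the Vortex expression API).
#[derive(Clone, Debug)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// A node in the deduplicated DAG; children are ids of earlier nodes.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Node {
    Column(String),
    Literal(i64),
    Add(usize, usize),
    Mul(usize, usize),
}

/// Hash-consing table: identical nodes share one id.
#[derive(Default)]
struct Cse {
    nodes: Vec<Node>,
    ids: HashMap<Node, usize>,
}

impl Cse {
    /// Return the existing id for `node`, or append it and assign a new one.
    fn intern(&mut self, node: Node) -> usize {
        if let Some(&id) = self.ids.get(&node) {
            return id;
        }
        let id = self.nodes.len();
        self.nodes.push(node.clone());
        self.ids.insert(node, id);
        id
    }

    /// Convert an expression bottom-up, interning each subexpression.
    fn add(&mut self, e: &Expr) -> usize {
        let node = match e {
            Expr::Column(name) => Node::Column(name.clone()),
            Expr::Literal(v) => Node::Literal(*v),
            Expr::Add(l, r) => {
                let (l, r) = (self.add(l), self.add(r));
                Node::Add(l, r)
            }
            Expr::Mul(l, r) => {
                let (l, r) = (self.add(l), self.add(r));
                Node::Mul(l, r)
            }
        };
        self.intern(node)
    }
}
```

For `(a + 1) * (a + 1)` the table holds four nodes (`a`, `1`, `a + 1`, and the product), so `a + 1` is evaluated once instead of twice.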
Improve list handling in Vortex
Improve the performance and expressiveness of lists
Fix up compute for List and ListView
List and ListView compute kernels need performance optimisations.
Higher-order functions (HoF)
Support higher-order functions for list operations (#7334) and add expressions that use them.
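To illustrate why higher-order list functions map well onto columnar lists, the Rust sketch below applies a lambda over an offsets-based list array. It assumes a hypothetical Arrow-style `ListArray` (list `i` is `values[offsets[i]..offsets[i + 1]]`); it is not the Vortex List or ListView API, and `transform`/`filter` are illustrative names only.

```rust
/// Hypothetical offsets-based list array (Arrow-style), not the Vortex API:
/// list `i` holds `values[offsets[i]..offsets[i + 1]]`.
#[derive(Debug, PartialEq)]
struct ListArray<T> {
    offsets: Vec<usize>,
    values: Vec<T>,
}

impl<T: Copy> ListArray<T> {
    /// list_transform(xs, f): `f` is element-wise, so it runs once over the
    /// flat child array and the offsets are reused unchanged.
    fn transform<U>(&self, f: impl Fn(T) -> U) -> ListArray<U> {
        ListArray {
            offsets: self.offsets.clone(),
            values: self.values.iter().map(|&v| f(v)).collect(),
        }
    }

    /// list_filter(xs, pred): list lengths change, so offsets are rebuilt
    /// while walking each list's slice of the child array.
    fn filter(&self, pred: impl Fn(T) -> bool) -> ListArray<T> {
        let mut offsets = vec![0];
        let mut values = Vec::new();
        for w in self.offsets.windows(2) {
            values.extend(self.values[w[0]..w[1]].iter().copied().filter(|&v| pred(v)));
            offsets.push(values.len());
        }
        ListArray { offsets, values }
    }
}
```

The contrast between the two is the design point: length-preserving lambdas can be pushed straight down to the child array, while length-changing ones must also produce new offsets.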
List Layout
Add a list layout to defer IO for list-typed reads
Push-based writer
Add a simpler, explicit-flush, push-based writer API. This will allow predictable streaming compression. The current pull-based model will be kept.
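A push-based writer with explicit flushing might look roughly like the Rust sketch below. Everything here is hypothetical (`PushWriter`, `push`, `flush`, `finish` are not the Vortex API), and the chunk "encoding" is a placeholder (a `u32` length prefix plus little-endian values) standing in for real compression. The point is that the caller decides where chunks are cut, so chunk size and buffered memory are predictable.

```rust
use std::io::{self, Write};

/// Hypothetical push-based writer (not the Vortex API). Batches are buffered
/// until the caller calls `flush`, which cuts exactly one chunk.
struct PushWriter<W: Write> {
    sink: W,
    buffer: Vec<i64>,
    chunks_written: usize,
}

impl<W: Write> PushWriter<W> {
    fn new(sink: W) -> Self {
        PushWriter { sink, buffer: Vec::new(), chunks_written: 0 }
    }

    /// Buffer a batch; nothing reaches the sink until `flush`.
    fn push(&mut self, batch: &[i64]) {
        self.buffer.extend_from_slice(batch);
    }

    /// Write buffered values as one chunk. Placeholder encoding: a u32
    /// value count followed by little-endian i64 values (not compression).
    fn flush(&mut self) -> io::Result<()> {
        if self.buffer.is_empty() {
            return Ok(());
        }
        self.sink.write_all(&(self.buffer.len() as u32).to_le_bytes())?;
        for v in &self.buffer {
            self.sink.write_all(&v.to_le_bytes())?;
        }
        self.buffer.clear();
        self.chunks_written += 1;
        Ok(())
    }

    /// Flush any remaining values and hand back the sink.
    fn finish(mut self) -> io::Result<W> {
        self.flush()?;
        Ok(self.sink)
    }
}
```

With a pull-based writer the library drives the input stream and picks chunk boundaries itself; here the same decision sits with the caller, which is what makes streaming compression predictable.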
TODO - WIP below this point
File format stability
Improve our existing systems to ensure better backward compatibility with newer versions of Vortex.
Add checksums #3083 and encryption #1884
Encodings & compression
Expand on array encodings and compression schemes
Delta & RLE
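For readers unfamiliar with the two schemes named above, the Rust below shows textbook delta and run-length encoding. It is a minimal sketch for illustration only and makes no claim about how Vortex's own Delta or RLE encodings are laid out (those may, for example, use vectorized or bit-packed variants).

```rust
/// Delta encoding: first value relative to 0, then successive differences.
/// Sorted or slowly changing data yields small deltas that bit-pack well.
fn delta_encode(values: &[i64]) -> Vec<i64> {
    let mut prev = 0i64;
    values
        .iter()
        .map(|&v| {
            let d = v.wrapping_sub(prev);
            prev = v;
            d
        })
        .collect()
}

/// Inverse of `delta_encode`: a running (wrapping) prefix sum.
fn delta_decode(deltas: &[i64]) -> Vec<i64> {
    let mut acc = 0i64;
    deltas
        .iter()
        .map(|&d| {
            acc = acc.wrapping_add(d);
            acc
        })
        .collect()
}

/// Run-length encoding: collapse runs of equal values into (value, length).
fn rle_encode(values: &[i64]) -> Vec<(i64, usize)> {
    let mut runs: Vec<(i64, usize)> = Vec::new();
    for &v in values {
        if let Some((last, n)) = runs.last_mut() {
            if *last == v {
                *n += 1;
                continue;
            }
        }
        runs.push((v, 1));
    }
    runs
}

/// Inverse of `rle_encode`: expand each run back into repeated values.
fn rle_decode(runs: &[(i64, usize)]) -> Vec<i64> {
    runs.iter()
        .flat_map(|&(v, n)| std::iter::repeat(v).take(n))
        .collect()
}
```

For example, `[100, 101, 103, 103, 110]` delta-encodes to `[100, 1, 2, 0, 7]`, and `[7, 7, 7, 1, 1, 9]` run-length encodes to `[(7, 3), (1, 2), (9, 1)]`.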
PCodec outline
Outline the PCodec element so the compressor can use it in combination with other schemes, so that PCodec is no longer only a block-based scheme.
I/O subsystem
Improve the handling of I/O in Vortex.
Correctness / fuzzer
Improve the fuzzer.
Documentation update
Improve documentation
New array model
Document the new array VTable, reduce/execute rules and execution system.