
Round-trip extension dtypes through Arrow #7726

Open
palaska wants to merge 18 commits into develop from bp/vectt

Conversation

@palaska
Contributor

@palaska palaska commented Apr 30, 2026

Vortex extension dtypes registered on a session now survive the Arrow boundary in both directions: schema-level (Schema / RecordBatch / nested fields) and leaf-array-level (pa.ExtensionArray over the C ABI).
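The schema-level direction (including nested fields) can be pictured with a small std-only model of the import side: walk an Arrow field, recurse into struct children, and re-wrap any field whose extension name is registered on the session. Everything here is hypothetical (`ArrowField`, `ArrowType`, `DType`, `Session`, `dtype_from_field` are simplified stand-ins, not arrow-rs or Vortex APIs); the one grounded detail is the `ARROW:extension:name` key, which the Arrow format reserves for extension type names. Falling back to the plain storage dtype for unregistered names is an assumption; the PR text does not say what happens in that case.

```rust
use std::collections::{HashMap, HashSet};

/// Reserved Arrow field-metadata key naming an extension type.
const EXT_NAME_KEY: &str = "ARROW:extension:name";

/// Simplified Arrow field (hypothetical): storage type plus string metadata.
struct ArrowField {
    name: String,
    data_type: ArrowType,
    metadata: HashMap<String, String>,
}

/// Simplified Arrow storage types: one leaf type and structs.
enum ArrowType {
    Int64,
    Struct(Vec<ArrowField>),
}

/// Simplified Vortex-style dtype (hypothetical, not the real `DType`).
#[derive(Debug, PartialEq)]
enum DType {
    I64,
    Struct(Vec<(String, DType)>),
    Extension { id: String, storage: Box<DType> },
}

/// Hypothetical session: the extension ids registered on it.
struct Session {
    registered: HashSet<String>,
}

/// Rebuild a dtype from an Arrow field, recursing into struct children so
/// that extension fields nested inside structs are restored as well.
fn dtype_from_field(field: &ArrowField, session: &Session) -> DType {
    let storage = match &field.data_type {
        ArrowType::Int64 => DType::I64,
        ArrowType::Struct(children) => DType::Struct(
            children
                .iter()
                .map(|child| (child.name.clone(), dtype_from_field(child, session)))
                .collect(),
        ),
    };
    match field.metadata.get(EXT_NAME_KEY) {
        // Registered on this session: wrap the storage back into the extension.
        Some(id) if session.registered.contains(id) => DType::Extension {
            id: id.clone(),
            storage: Box::new(storage),
        },
        // Absent or unregistered: keep only the storage dtype (an assumption).
        _ => storage,
    }
}
```

With a session that registers `example.duration`, a struct field `s` holding an `Int64` child tagged with that name comes back as `Struct([("d", Extension { .. })])`; with an empty session the same schema degrades to `Struct([("d", I64)])`.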

palaska added 5 commits April 29, 2026 17:43
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
@codspeed-hq

codspeed-hq Bot commented Apr 30, 2026

Merging this PR will degrade performance by 17.48%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments, which may affect the accuracy of the results.


⚡ 5 improved benchmarks
❌ 5 regressed benchmarks
✅ 1188 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| WallTime | `mix[50%_in/50%_out]` | 338.9 µs | 391.2 µs | -13.35% |
| Simulation | `decompress_rd[f32, (100000, 0.01)]` | 495.1 µs | 582.9 µs | -15.07% |
| Simulation | `decompress_rd[f32, (100000, 0.0)]` | 583.5 µs | 495.9 µs | +17.65% |
| Simulation | `decompress_rd[f32, (100000, 0.1)]` | 495.1 µs | 582.9 µs | -15.07% |
| Simulation | `decompress_rd[f64, (10000, 0.01)]` | 138.6 µs | 122.3 µs | +13.41% |
| Simulation | `decompress_rd[f64, (10000, 0.1)]` | 138.7 µs | 122.4 µs | +13.33% |
| Simulation | `decompress_rd[f64, (10000, 0.0)]` | 138.5 µs | 122.4 µs | +13.17% |
| Simulation | `decompress_rd[f64, (100000, 0.01)]` | 842.6 µs | 1,020.8 µs | -17.46% |
| Simulation | `decompress_rd[f64, (100000, 0.1)]` | 842.5 µs | 1,021 µs | -17.48% |
| Simulation | `bitwise_not_vortex_buffer_mut[128]` | 275.3 ns | 246.1 ns | +11.85% |
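The Efficiency column is not defined on this page, but every row of the table is consistent, to within about 0.1 percentage points (which the rounding of the displayed timings can account for), with Efficiency ≈ (BASE / HEAD - 1) × 100%, positive when HEAD is faster. This is an inference from the numbers, not CodSpeed's documented formula. The sketch below recomputes it from the table; `efficiency_pct` and `ROWS` are illustrative names only.

```rust
/// Relative change implied by two timings, in percent.
/// Positive means HEAD is faster than BASE (assumed definition, inferred from the table).
fn efficiency_pct(base: f64, head: f64) -> f64 {
    (base / head - 1.0) * 100.0
}

/// (benchmark, BASE, HEAD, reported Efficiency %) copied from the table.
/// Timings are in µs except the last row (ns); the ratio is unit-free either way.
const ROWS: [(&str, f64, f64, f64); 10] = [
    ("mix[50%_in/50%_out]", 338.9, 391.2, -13.35),
    ("decompress_rd[f32, (100000, 0.01)]", 495.1, 582.9, -15.07),
    ("decompress_rd[f32, (100000, 0.0)]", 583.5, 495.9, 17.65),
    ("decompress_rd[f32, (100000, 0.1)]", 495.1, 582.9, -15.07),
    ("decompress_rd[f64, (10000, 0.01)]", 138.6, 122.3, 13.41),
    ("decompress_rd[f64, (10000, 0.1)]", 138.7, 122.4, 13.33),
    ("decompress_rd[f64, (10000, 0.0)]", 138.5, 122.4, 13.17),
    ("decompress_rd[f64, (100000, 0.01)]", 842.6, 1020.8, -17.46),
    ("decompress_rd[f64, (100000, 0.1)]", 842.5, 1021.0, -17.48),
    ("bitwise_not_vortex_buffer_mut[128]", 275.3, 246.1, 11.85),
];
```

Under that reading, the headline "degrade performance by 17.48%" is simply the worst single row, `decompress_rd[f64, (100000, 0.1)]`.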

Comparing bp/vectt (7cc57ab) with develop (c4feed7)


@palaska palaska marked this pull request as draft April 30, 2026 10:46
palaska added 4 commits April 30, 2026 11:53
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
@palaska palaska added the changelog/fix (A bug fix) label Apr 30, 2026
@palaska palaska marked this pull request as ready for review April 30, 2026 14:15
@palaska palaska requested a review from connortsui20 April 30, 2026 14:15

```rust
impl FromArrowArray<&ArrowStructArray> for ArrayRef {
    fn from_arrow(value: &ArrowStructArray, nullable: bool) -> VortexResult<Self> {
        Self::from_arrow_with_session(value, nullable, &LEGACY_SESSION)
```
Contributor

Can we deprecate this method?

Contributor Author

Do you mean the ones without the session argument? I can do that, but the surface area is huge; it should probably be a separate mechanical PR.

Contributor

Yep. Could we do one PR that only does that, and another (stacked/following) one that makes this semantic change?
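The thread is about deprecating session-less entry points such as `from_arrow`, which delegate to `from_arrow_with_session` with `LEGACY_SESSION`. A minimal std-only sketch of what that mechanical change could look like, using Rust's built-in `#[deprecated]` attribute: the attribute sits on the trait's method declaration (rustc rejects `#[deprecated]` on trait impl items as having no effect), and the default body keeps delegating, so existing callers still compile but get a warning. `Session`, `FromSource`, `Resolved`, and the empty legacy session are hypothetical stand-ins, not Vortex's types; in particular, it is an assumption that the legacy session carries no user-registered extensions.

```rust
/// Hypothetical session holding the extension ids registered on it.
pub struct Session {
    extensions: Vec<&'static str>,
}

impl Session {
    pub const fn empty() -> Self {
        Session { extensions: Vec::new() }
    }

    pub fn with_extensions(extensions: Vec<&'static str>) -> Self {
        Session { extensions }
    }

    fn knows(&self, id: &str) -> bool {
        self.extensions.iter().any(|e| *e == id)
    }
}

/// Hypothetical process-wide fallback standing in for `LEGACY_SESSION`
/// (assumed here to have nothing registered).
static LEGACY_SESSION: Session = Session::empty();

/// Hypothetical conversion trait mirroring the shape of the API in the diff.
pub trait FromSource: Sized {
    /// Session-aware entry point that callers should migrate to.
    fn from_source_with_session(id: &str, session: &Session) -> Self;

    /// Session-less entry point: still compiles, but every call site now warns.
    #[deprecated(note = "pass a session explicitly via `from_source_with_session`")]
    fn from_source(id: &str) -> Self {
        Self::from_source_with_session(id, &LEGACY_SESSION)
    }
}

#[derive(Debug, PartialEq)]
pub enum Resolved {
    /// The session recognised the extension id.
    Extension(String),
    /// Unknown id: only the storage survives (assumed behaviour).
    StorageOnly,
}

impl FromSource for Resolved {
    fn from_source_with_session(id: &str, session: &Session) -> Self {
        if session.knows(id) {
            Resolved::Extension(id.to_string())
        } else {
            Resolved::StorageOnly
        }
    }
}
```

Migrating a call site is then a matter of replacing `Resolved::from_source(id)` with `Resolved::from_source_with_session(id, &session)`; callers that cannot migrate yet can silence the warning locally with `#[allow(deprecated)]`. Keeping this purely mechanical change in its own PR, as suggested, leaves the semantic change easy to review on its own.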

palaska added 4 commits May 1, 2026 11:54
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
# Conflicts:
#	vortex-array/public-api.lock
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Comment on lines +102 to +108
```rust
// Temporal extensions stay wrapped: `to_arrow_temporal` reads their metadata.
// Other extensions unwrap to storage; their identity lives on the Field.
if let DType::Extension(ext) = self.dtype()
    && ext.metadata_opt::<AnyTemporal>().is_none()
{
    let ext = self.execute::<ExtensionArray>(ctx)?;
    return ext.storage_array().clone().execute_arrow(data_type, ctx);
```
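The branch in this excerpt routes each dtype to one of two export paths. A simplified std-only model of that routing, for illustration only: the `temporal` flag stands in for `ext.metadata_opt::<AnyTemporal>().is_some()`, and `DType`, `ExportPath`, and `export_path` are hypothetical names, not the Vortex API. Recursing into the storage mirrors how the excerpt calls `execute_arrow` again on the storage array.

```rust
/// Simplified stand-in for a dtype (hypothetical, not Vortex's `DType`).
enum DType {
    Primitive(&'static str),
    Extension {
        id: &'static str,
        /// Stands in for `metadata_opt::<AnyTemporal>().is_some()`.
        temporal: bool,
        storage: Box<DType>,
    },
}

/// Which Arrow conversion a dtype is routed to.
#[derive(Debug, PartialEq)]
enum ExportPath {
    /// Kept wrapped so a temporal converter can read the extension metadata.
    Temporal(&'static str),
    /// Exported as a plain storage type; extension identity goes on the Field.
    Storage(&'static str),
}

fn export_path(dtype: &DType) -> ExportPath {
    match dtype {
        // Temporal extensions stay wrapped.
        DType::Extension { id, temporal: true, .. } => ExportPath::Temporal(*id),
        // Any other extension unwraps to its storage and is routed again.
        DType::Extension { storage, .. } => export_path(storage),
        DType::Primitive(name) => ExportPath::Storage(*name),
    }
}
```

So a timestamp-like extension is handed to the temporal converter intact, while something like a UUID extension over `fixed_size_binary(16)` is exported as that storage type alone.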
Contributor

Does this imply that all Arrow extension types have the same layout as the Vortex ones?

Contributor Author

An extension type is just a wrapper around a storage type, so yes, they have the same layout. One difference is that Arrow has no extension `DataType`: extension types are carried purely as `Field` metadata. When converting a Vortex extension type to Arrow, we convert the storage type and attach the right metadata to the field.
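To make the "storage type plus field metadata" point concrete, here is a std-only sketch of exporting an extension type to a field and reading it back. The keys `ARROW:extension:name` and `ARROW:extension:metadata` are the ones the Arrow columnar format reserves for extension types; `Field`, `ExtType`, `to_arrow_field`, and `from_arrow_field` are hypothetical simplifications (storage types are plain strings), not arrow-rs or Vortex APIs.

```rust
use std::collections::HashMap;

/// Reserved keys from the Arrow columnar format for extension types.
const EXT_NAME_KEY: &str = "ARROW:extension:name";
const EXT_METADATA_KEY: &str = "ARROW:extension:metadata";

/// Simplified Arrow field: the storage type is a plain string here.
#[derive(Debug, PartialEq)]
struct Field {
    name: String,
    storage_type: String,
    metadata: HashMap<String, String>,
}

/// Simplified extension type: identity, serialized metadata, storage.
#[derive(Debug, Clone, PartialEq)]
struct ExtType {
    id: String,
    metadata: String,
    storage_type: String,
}

/// Export: the Arrow type is just the storage type; identity goes on the field.
fn to_arrow_field(name: &str, ext: &ExtType) -> Field {
    let metadata = HashMap::from([
        (EXT_NAME_KEY.to_string(), ext.id.clone()),
        (EXT_METADATA_KEY.to_string(), ext.metadata.clone()),
    ]);
    Field {
        name: name.to_string(),
        storage_type: ext.storage_type.clone(),
        metadata,
    }
}

/// Import: a field without the name key is a plain, non-extension field.
/// A missing metadata key is treated as empty (a simplification).
fn from_arrow_field(field: &Field) -> Option<ExtType> {
    let id = field.metadata.get(EXT_NAME_KEY)?.clone();
    let metadata = field.metadata.get(EXT_METADATA_KEY).cloned().unwrap_or_default();
    Some(ExtType {
        id,
        metadata,
        storage_type: field.storage_type.clone(),
    })
}
```

Because the exported field's type is only the storage type, any Arrow consumer that ignores the metadata still reads the data correctly; only consumers that recognise the name rebuild the extension.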

@palaska palaska mentioned this pull request May 5, 2026

Labels

changelog/fix (A bug fix)

Projects

None yet


2 participants