Skip to content

Implement AnyRee#9959

Open
Rich-T-kid wants to merge 6 commits into
apache:mainfrom
Rich-T-kid:rich-T-kid/Ree-AnyArray
Open

Implement AnyRee#9959
Rich-T-kid wants to merge 6 commits into
apache:mainfrom
Rich-T-kid:rich-T-kid/Ree-AnyArray

Conversation

@Rich-T-kid
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

closes #9909.

Rationale for this change

makes the API simpler to work with & less code duplication

What changes are included in this PR?

Replace the per-key-type RunEndEncoded match arms in length/bit_length (arrow-string) and date_part (arrow-arith) with a single dispatch through the new AsArray::as_any_ree_opt/as_any_ree returning &dyn AnyRunEndArray, mirroring the existing dictionary handling. This removes the
now-unused ree_map! macro, leaving one trait-object code path for all Int16/Int32/Int64 run-end types.

Are these changes tested?

yes

Are there any user-facing changes?

no

@github-actions github-actions Bot added the arrow Changes to the arrow crate label May 11, 2026
Comment thread arrow-array/src/array/run_array.rs Outdated
Comment thread arrow-array/src/array/run_array.rs
Comment thread arrow-array/src/array/run_array.rs
@Rich-T-kid Rich-T-kid force-pushed the rich-T-kid/Ree-AnyArray branch from f398606 to d626ef8 Compare May 11, 2026 15:04
@Rich-T-kid Rich-T-kid force-pushed the rich-T-kid/Ree-AnyArray branch from d626ef8 to b7e7b73 Compare May 11, 2026 15:04
Comment thread arrow-arith/src/temporal.rs Outdated
Comment thread arrow-string/src/length.rs Outdated
Comment thread arrow-string/src/length.rs Outdated
Comment thread arrow-array/src/array/run_array.rs Outdated
&self.values
}

fn with_values(&self, values: ArrayRef) -> ArrayRef {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I just realized we could have delegated to the existing with_values() 🤦

pub fn with_values(&self, values: ArrayRef) -> Self {
assert_eq!(values.len(), self.values().len());
let (run_ends_field, values_field) = match &self.data_type {
DataType::RunEndEncoded(r, v) => (r, v),
_ => unreachable!("RunArray should have type RunEndEncoded"),
};
let data_type =
DataType::RunEndEncoded(Arc::clone(run_ends_field), Arc::clone(values_field));
Self {
data_type,
run_ends: self.run_ends.clone(),
values,
}
}

Though it looks like it also doesn't verify the incoming values has the correct datatype 😬

Copy link
Copy Markdown
Contributor Author

@Rich-T-kid Rich-T-kid May 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmmm, I'll replace the current with_values() my implementation of with_values(). This way we dont duplicate existing methods and get datatype validation

@Rich-T-kid Rich-T-kid force-pushed the rich-T-kid/Ree-AnyArray branch from c352224 to f601fc0 Compare May 12, 2026 13:36
Comment thread arrow-array/src/array/run_array.rs Outdated
DataType::RunEndEncoded(r, v) => (r, v),
DataType::RunEndEncoded(r, v) => (
r,
Arc::new(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a bug fix? It seems like it is now (correctly) using the values.data_type rather than the the pre-exsisting data type

I think the new code looks correct but should we add a test?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added test

Comment thread arrow-array/src/array/run_array.rs Outdated
};
let data_type =
DataType::RunEndEncoded(Arc::clone(run_ends_field), Arc::clone(values_field));
let dt = DataType::RunEndEncoded(Arc::clone(run_ends_field), Arc::clone(&values_field));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could just do this (no need to clone the new value)

        let dt = DataType::RunEndEncoded(Arc::clone(run_ends_field), values_field);

Also if you kept the data_type name it would keep the line below cleaner too

        let data_type = ...

Comment thread arrow-array/src/array/run_array.rs Outdated
.buffers(vec![self.run_ends.inner().inner().clone()])
.build_unchecked()
};
Arc::new(PrimitiveArray::<R>::from(data))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be faster to make the PrimitiveArray directly rather than ArrayDat a-- among other things it doesn't require allocating a vec and it doesn't need unsafe:

    fn run_ends(&self) -> ArrayRef {
        let values  = self.run_ends.inner().clone();
        let nulls = None;
        Arc::new(PrimitiveArray::<R>::new(values, nulls))
    }

@Rich-T-kid Rich-T-kid force-pushed the rich-T-kid/Ree-AnyArray branch from bbe594c to 931003b Compare May 15, 2026 04:52
Copy link
Copy Markdown
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this is dragging out so long, came back with a fresh set of eyes. Just wanna ensure we're getting this correct from the start 😅

Comment on lines +797 to +806
/// Returns the run ends of this array as a primitive array.
fn run_ends(&self) -> ArrayRef;

/// Returns the values of this array.
fn values(&self) -> &Arc<dyn Array>;

/// Returns a new run-end encoded array with the given values, preserving the
/// existing run ends.
fn with_values(&self, values: ArrayRef) -> ArrayRef;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think of it, how do these APIs handle when the original RunArray is sliced? I think they return unsliced original data, and especially for run_ends it erases the logical slicing that may have been applied?

};
let data_type =
DataType::RunEndEncoded(Arc::clone(run_ends_field), Arc::clone(values_field));
DataType::RunEndEncoded(Arc::clone(run_ends_field), Arc::clone(&values_field));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DataType::RunEndEncoded(Arc::clone(run_ends_field), Arc::clone(&values_field));
DataType::RunEndEncoded(Arc::clone(run_ends_field), values_field);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

AnyRunArray trait

3 participants