Implement AnyRee#9959
Conversation
f398606 to
d626ef8
Compare
d626ef8 to
b7e7b73
Compare
| &self.values | ||
| } | ||
|
|
||
| fn with_values(&self, values: ArrayRef) -> ArrayRef { |
There was a problem hiding this comment.
Oh I just realized we could have delegated to the existing with_values() 🤦
arrow-rs/arrow-array/src/array/run_array.rs
Lines 247 to 260 in 7abb225
Though it looks like it also doesn't verify the incoming values has the correct datatype 😬
There was a problem hiding this comment.
hmmmm, I'll replace the current with_values() my implementation of with_values(). This way we dont duplicate existing methods and get datatype validation
c352224 to
f601fc0
Compare
| DataType::RunEndEncoded(r, v) => (r, v), | ||
| DataType::RunEndEncoded(r, v) => ( | ||
| r, | ||
| Arc::new( |
There was a problem hiding this comment.
is this a bug fix? It seems like it is now (correctly) using the values.data_type rather than the the pre-exsisting data type
I think the new code looks correct but should we add a test?
| }; | ||
| let data_type = | ||
| DataType::RunEndEncoded(Arc::clone(run_ends_field), Arc::clone(values_field)); | ||
| let dt = DataType::RunEndEncoded(Arc::clone(run_ends_field), Arc::clone(&values_field)); |
There was a problem hiding this comment.
I think you could just do this (no need to clone the new value)
let dt = DataType::RunEndEncoded(Arc::clone(run_ends_field), values_field);Also if you kept the data_type name it would keep the line below cleaner too
let data_type = ...| .buffers(vec![self.run_ends.inner().inner().clone()]) | ||
| .build_unchecked() | ||
| }; | ||
| Arc::new(PrimitiveArray::<R>::from(data)) |
There was a problem hiding this comment.
It would be faster to make the PrimitiveArray directly rather than ArrayDat a-- among other things it doesn't require allocating a vec and it doesn't need unsafe:
fn run_ends(&self) -> ArrayRef {
let values = self.run_ends.inner().clone();
let nulls = None;
Arc::new(PrimitiveArray::<R>::new(values, nulls))
}bbe594c to
931003b
Compare
Jefffrey
left a comment
There was a problem hiding this comment.
Sorry this is dragging out so long, came back with a fresh set of eyes. Just wanna ensure we're getting this correct from the start 😅
| /// Returns the run ends of this array as a primitive array. | ||
| fn run_ends(&self) -> ArrayRef; | ||
|
|
||
| /// Returns the values of this array. | ||
| fn values(&self) -> &Arc<dyn Array>; | ||
|
|
||
| /// Returns a new run-end encoded array with the given values, preserving the | ||
| /// existing run ends. | ||
| fn with_values(&self, values: ArrayRef) -> ArrayRef; | ||
| } |
There was a problem hiding this comment.
Now that I think of it, how do these APIs handle when the original RunArray is sliced? I think they return unsliced original data, and especially for run_ends it erases the logical slicing that may have been applied?
| }; | ||
| let data_type = | ||
| DataType::RunEndEncoded(Arc::clone(run_ends_field), Arc::clone(values_field)); | ||
| DataType::RunEndEncoded(Arc::clone(run_ends_field), Arc::clone(&values_field)); |
There was a problem hiding this comment.
| DataType::RunEndEncoded(Arc::clone(run_ends_field), Arc::clone(&values_field)); | |
| DataType::RunEndEncoded(Arc::clone(run_ends_field), values_field); |
Which issue does this PR close?
closes #9909.
AnyRunArraytrait #9909.Rationale for this change
makes the API simpler to work with & less code duplication
What changes are included in this PR?
Replace the per-key-type RunEndEncoded match arms in length/bit_length (arrow-string) and date_part (arrow-arith) with a single dispatch through the new
AsArray::as_any_ree_opt/as_any_reereturning &dyn AnyRunEndArray, mirroring the existing dictionary handling. This removes thenow-unused
ree_map!macro, leaving one trait-object code path for all Int16/Int32/Int64 run-end types.Are these changes tested?
yes
Are there any user-facing changes?
no