-
Notifications
You must be signed in to change notification settings - Fork 153
[18344690269] Add batch APIs for configurable strings #2763
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
135ca7d to
c14b146
Compare
| arrow_string_format_default=default_2, | ||
| arrow_string_format_per_column={ | ||
| "col_2": ArrowOutputStringFormat.CATEGORICAL, | ||
| "col_3": ArrowOutputStringFormat.LARGE_STRING, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit weird to specify this in individual ReadRequests. read_batch_and_join is kind of a hybrid of read and read_batch, but when it comes to the output format, it should look more like read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed as discussed offline to instead happen via:
lazy_frames = lib.read_batch(syms, arrow_string_format_default=something arrow_string_format_per_column=something)
concat(lazy_frames).collect() # Here we will use the string configurations for the entire batch to be applied to the joined frame.
6d1e128 to
e4e198a
Compare
| [[nodiscard]] ReadOptions at(size_t idx) const { | ||
| return util::variant_match( | ||
| data_->read_options_per_symbol_, | ||
| [&](const std::vector<ReadOptions>& read_options) { return read_options.at(idx); }, | ||
| [&](ReadOptions read_options) { return read_options; } | ||
| ); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is ReadOptions cheap to copy because this will copy them? I'd change the signature to const ReadOptions&. In that case we should capture read_options by const ref as well in the visitor
nit: vector.at will always perform bounds check. Is that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, ReadOptions is a shared_ptr<ReadOptionsData>, so better to pass by value.
Yeah I don't have a strong opinion on the at. It is not on the hot path and thought I could leave the bounds checking to more easily debug while developing if construct the read_options incorectly, So I'm thinking to leave as is.
|
|
||
| struct BatchReadOptionsData { | ||
| ReadOptionsPerSymbol read_options_per_symbol_; | ||
| std::optional<bool> batch_throw_on_error_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see that this was moved from ReadOptionsData but optional doesn't make much sense to me in general. No need to change it though I just wanted to complain about the use of it in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I agree it doesn't make sense to be an optional. Will update.
- Creates a new `BatchReadOptions` which separates batch level options and per symbol level options. - Exposes APIs both on V1 and V2 for `read_batch`, `read_batch_and_join` and tests for them
to be used as an override if the read request doesn't specify it. It is also used to set up per column formatting for `read_batch_and_join`
Instead make it an argument to the constructor of `BatchReadOptions`
ccd25c4 to
acb1887
Compare
Reference Issues/PRs
Monday ref: 18344690269
What does this implement or fix?
BatchReadOptionswhich separates batch level options and per symbol level options.read_batch,read_batch_and_joinand tests for themAny other comments?
Checklist
Checklist for code changes...