Skip to content

Conversation

@IvoDD
Copy link
Collaborator

@IvoDD IvoDD commented Nov 18, 2025

Reference Issues/PRs

Monday ref: 18344690269

What does this implement or fix?

  • Creates a new BatchReadOptions which separates batch level options and per symbol level options.
  • Exposes APIs both on V1 and V2 for read_batch, read_batch_and_join and tests for them

Any other comments?

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

@IvoDD IvoDD force-pushed the batch-configurable-strings branch from 135ca7d to c14b146 Compare November 18, 2025 12:07
@IvoDD IvoDD added patch Small change, should increase patch version no-release-notes This PR shouldn't be added to release notes. labels Nov 18, 2025
arrow_string_format_default=default_2,
arrow_string_format_per_column={
"col_2": ArrowOutputStringFormat.CATEGORICAL,
"col_3": ArrowOutputStringFormat.LARGE_STRING,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit weird to specify this in individual ReadRequests. read_batch_and_join is kind of a hybrid of read and read_batch, but when it comes to the output format, it should look more like read

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as discussed offline to instead happen via:

lazy_frames = lib.read_batch(syms, arrow_string_format_default=something arrow_string_format_per_column=something)
concat(lazy_frames).collect() # Here we will use the string configurations for the entire batch to be applied to the joined frame.

@IvoDD IvoDD force-pushed the batch-configurable-strings branch 2 times, most recently from 6d1e128 to e4e198a Compare November 19, 2025 11:02
Comment on lines +116 to +128
[[nodiscard]] ReadOptions at(size_t idx) const {
return util::variant_match(
data_->read_options_per_symbol_,
[&](const std::vector<ReadOptions>& read_options) { return read_options.at(idx); },
[&](ReadOptions read_options) { return read_options; }
);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ReadOptions cheap to copy because this will copy them? I'd change the signature to const ReadOptions&. In that case we should capture read_options by const ref as well in the visitor

nit: vector.at will always perform bounds check. Is that intentional?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, ReadOptions is a shared_ptr<ReadOptionsData>, so better to pass by value.

Yeah I don't have a strong opinion on the at. It is not on the hot path and thought I could leave the bounds checking to more easily debug while developing if construct the read_options incorectly, So I'm thinking to leave as is.


struct BatchReadOptionsData {
ReadOptionsPerSymbol read_options_per_symbol_;
std::optional<bool> batch_throw_on_error_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that this was moved from ReadOptionsData but optional doesn't make much sense to me in general. No need to change it though I just wanted to complain about the use of it in general.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I agree it doesn't make sense to be an optional. Will update.

- Creates a new `BatchReadOptions` which separates batch level options
  and per symbol level options.
- Exposes APIs both on V1 and V2 for `read_batch`, `read_batch_and_join`
  and tests for them
to be used as an override if the read request doesn't specify it.

It is also used to set up per column formatting for `read_batch_and_join`
Instead make it an argument to the constructor of `BatchReadOptions`
@IvoDD IvoDD force-pushed the batch-configurable-strings branch from ccd25c4 to acb1887 Compare November 20, 2025 15:19
@IvoDD IvoDD merged commit cbde82d into master Nov 21, 2025
191 of 193 checks passed
@IvoDD IvoDD deleted the batch-configurable-strings branch November 21, 2025 08:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

no-release-notes This PR shouldn't be added to release notes. patch Small change, should increase patch version

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants