Skip to content

Conversation

@phoebusm
Copy link
Collaborator

@phoebusm phoebusm commented Oct 28, 2025

Reference Issues/PRs

https://man312219.monday.com/boards/7852509418/pulses/18298965201

What does this implement or fix?

  1. Makes def write and def write_pickle in v2 API support recursive normalizer
  1. Minor changes on formatter to support formatting individual files

Per discussed, in v2 API, recursive normalizer setting in LibraryOption will be respected. Default library option of recursive normalizer will be False. Existing libraries are unaffected.

Any other comments?

def batch_write in v1 API doesn't support recursive normalizer. Ticket: https://man312219.monday.com/boards/7852509418/pulses/7855436309
Therefore the support of recursive normalizer in corresponding v2 API will not be covered in this PR.

Pickling

pickling is a bit of a mess in V1 API.
For arrow and pandas data, if the normalization fails, it can fallback to msgpack and pickling, depending on whether pickle_on_failure is True.
However, for other kinds of data, it almost certainly fallback to msgpack and pickling. The only option to prevent pickling is a library config strict_mode. But I don't see there is any API to enable this option, in V1/V2 nor internally.

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

@phoebusm phoebusm added the minor Feature change, should increase minor version label Oct 28, 2025
@phoebusm phoebusm marked this pull request as ready for review October 29, 2025 15:24
@phoebusm phoebusm changed the title Support recursive normalizer in write in v2 API Support recursive normalizer in v2 API write Oct 29, 2025
@phoebusm phoebusm force-pushed the feature/recursive_normalizer_write_v2 branch from d773062 to b7e0ad0 Compare October 31, 2025 16:09
@phoebusm phoebusm force-pushed the feature/recursive_normalizer_write_v2 branch from b7e0ad0 to c5ecb03 Compare November 7, 2025 16:44
metadata=metadata,
prune_previous_version=prune_previous_versions,
pickle_on_failure=True,
parallel=staged,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

V2 API allows staging non-natively normalized data in write_pickle by passing staged=True. Maybe this combination should be blocked in future major release.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I think there is a ticket to remove the staged argument from write_pickle as it doesn't make sense. Technically an API break though so needs to wait for a better reason to do 7.0.0

@phoebusm phoebusm force-pushed the feature/recursive_normalizer_write_v2 branch from b206cdb to 8d2c350 Compare November 19, 2025 12:50
index_column: Optional[str], default=None
Optional specification of timeseries index column if data is an Arrow table. Ignored if data is not an Arrow
table.
recursive_normalizers: bool, default None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be a bit more descriptive in the docstring. Even though it's working in V1 almost no one from the outside world knows about it. IMO we should treat it as a new feature. We should also improve the description of the PR as it will go in the release notes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have given a bit more details in the PR desc.
And I have also created a notebook for the feature. In docstring, users will be referred to it for more details

if is_recursive_normalizers_enabled:
if staged:
raise ArcticUnsupportedDataTypeException(
"Staged data must be of a type that can be natively normalized"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be a bit more explicit here and tell the user that they're trying to use recursive normalizers for staged data and that it's not allowed. It will be more useful for the user and for us when a support request comes in.

See documentation on `write`.
recursive_normalizers: bool, default None
See documentation on `write`.
If the leaf nodes cannot be natively normalized, they will be pickled,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting did I miss a discussion on this. I'd expect write pickle to pickle everything all the time regardless of normalizers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes good spot. If both are enabled, recursive normalizer has priority over pickling. I have amended to docstring to explain the priority.
It follows V1 API on this.

@phoebusm phoebusm force-pushed the feature/recursive_normalizer_write_v2 branch from 804d2a5 to 2b29c99 Compare November 21, 2025 18:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

minor Feature change, should increase minor version

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants