Support recursive normalizer in v2 API write #2736
base: master
Conversation
    metadata=metadata,
    prune_previous_version=prune_previous_versions,
    pickle_on_failure=True,
    parallel=staged,
The V2 API allows staging non-natively normalized data in write_pickle by passing staged=True. Maybe this combination should be blocked in a future major release.
Agreed. I think there is a ticket to remove the staged argument from write_pickle, as it doesn't make sense. Technically it's an API break though, so it needs to wait for a better reason to do 7.0.0.
    index_column: Optional[str], default=None
        Optional specification of the timeseries index column if data is an Arrow table. Ignored if data is not
        an Arrow table.
    recursive_normalizers: bool, default None
I think we should be a bit more descriptive in the docstring. Even though it works in the V1 API, almost no one in the outside world knows about it. IMO we should treat it as a new feature. We should also improve the description of the PR, as it will go in the release notes.
I have given a bit more detail in the PR description.
I have also created a notebook for the feature. The docstring will refer users to it for more details.
    if is_recursive_normalizers_enabled:
        if staged:
            raise ArcticUnsupportedDataTypeException(
                "Staged data must be of a type that can be natively normalized"
I'd be a bit more explicit here and tell the user that they're trying to use recursive normalizers for staged data and that it's not allowed. It will be more useful for the user and for us when a support request comes in.
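To illustrate the suggestion, here is a hedged sketch of what a more explicit guard message could look like. The exception class is stubbed locally so the snippet is self-contained, and the message wording is illustrative, not the PR's actual text:

```python
# Stub of the exception so this sketch runs standalone; in ArcticDB the
# real class comes from the library's exception hierarchy.
class ArcticUnsupportedDataTypeException(Exception):
    pass


def check_staged_with_recursive(is_recursive_normalizers_enabled: bool, staged: bool) -> None:
    """Reject the staged + recursive-normalizer combination with an explicit message.

    Illustrative only: names and wording are assumptions, not ArcticDB code.
    """
    if is_recursive_normalizers_enabled and staged:
        raise ArcticUnsupportedDataTypeException(
            "Recursive normalizers cannot be used with staged data (staged=True): "
            "staged data must be of a type that can be natively normalized. "
            "Either write the data unstaged, or disable recursive normalization."
        )


# Allowed combinations pass silently:
check_staged_with_recursive(True, False)
check_staged_with_recursive(False, True)
```

Naming the rejected combination in the message makes both user debugging and support triage easier, as the comment above suggests.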
        See documentation on `write`.
    recursive_normalizers: bool, default None
        See documentation on `write`.
        If the leaf nodes cannot be natively normalized, they will be pickled,
Interesting, did I miss a discussion on this? I'd expect write_pickle to pickle everything all the time, regardless of normalizers.
Yes, good spot. If both are enabled, the recursive normalizer takes priority over pickling. I have amended the docstring to explain the priority.
This follows the V1 API.
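The priority described here (recursive normalization first, per-leaf pickling only for leaves that fail native normalization) can be shown with a toy sketch. This is not ArcticDB code: `normalize_leaf`, the tuple tags, and the dict-only recursion are all invented for illustration.

```python
import pickle


def normalize_leaf(obj):
    """Toy stand-in for native normalization: only lists of numbers succeed."""
    if isinstance(obj, list) and all(isinstance(x, (int, float)) for x in obj):
        return ("native", obj)
    raise TypeError("cannot natively normalize")


def write_pickle_sketch(data, recursive=False):
    """Illustrate the priority: with recursion on, descend into containers
    (dicts only, for brevity) and pickle only the leaves that fail native
    normalization; with recursion off, a failed leaf is pickled wholesale."""
    if recursive and isinstance(data, dict):
        return {k: write_pickle_sketch(v, recursive) for k, v in data.items()}
    try:
        return normalize_leaf(data)
    except TypeError:
        return ("pickled", pickle.dumps(data))


out = write_pickle_sketch({"a": [1, 2], "b": object()}, recursive=True)
# "a" is natively normalized; only "b" falls back to pickling.
```

With `recursive=False`, the whole dict fails `normalize_leaf` and is pickled as one blob, which is the behaviour the reviewer above expected from write_pickle.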
Reference Issues/PRs
https://man312219.monday.com/boards/7852509418/pulses/18298965201
What does this implement or fix?
`write` and `write_pickle` in the v2 API now support recursive normalizers, allowing nested structures (`dict`, `list`, `tuple`) of dataframes and arrays to be written without pickling the entire structure. As discussed, in the v2 API the recursive normalizer setting in `LibraryOptions` will be respected. The default library option for the recursive normalizer is `False`. Existing libraries are unaffected.
Any other comments?
`batch_write` in the v1 API doesn't support recursive normalizers. Ticket: https://man312219.monday.com/boards/7852509418/pulses/7855436309. Therefore, support for recursive normalizers in the corresponding v2 API is not covered in this PR.
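For readers new to the feature, the core idea of a recursive normalizer can be sketched in plain Python: decompose a nested structure into natively writable leaves keyed by path, plus a structure descriptor used to rebuild it on read. All names here are invented for illustration and do not reflect ArcticDB's actual storage format.

```python
def flatten(data, prefix="sym"):
    """Decompose a nested dict/list/tuple into leaf payloads keyed by path,
    plus a structure descriptor for reconstruction. Toy illustration only."""
    leaves = {}

    def walk(node, path):
        if isinstance(node, dict):
            return {"type": "dict", "keys": {k: walk(v, f"{path}__{k}") for k, v in node.items()}}
        if isinstance(node, (list, tuple)):
            t = "list" if isinstance(node, list) else "tuple"
            return {"type": t, "items": [walk(v, f"{path}__{i}") for i, v in enumerate(node)]}
        leaves[path] = node  # a leaf: would be natively normalized and written
        return {"type": "leaf", "key": path}

    return walk(data, prefix), leaves


def rebuild(struct, leaves):
    """Reassemble the original nested structure from descriptor + leaves."""
    if struct["type"] == "dict":
        return {k: rebuild(s, leaves) for k, s in struct["keys"].items()}
    if struct["type"] in ("list", "tuple"):
        items = [rebuild(s, leaves) for s in struct["items"]]
        return items if struct["type"] == "list" else tuple(items)
    return leaves[struct["key"]]
```

The payoff is that each leaf (e.g. a dataframe) can be stored and normalized independently, so the container itself never has to be pickled.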
Pickling
Pickling is a bit of a mess in the V1 API. For `arrow` and `pandas` data, if normalization fails, it can fall back to msgpack and pickling, depending on whether `pickle_on_failure` is `True`. For other kinds of data, however, it almost certainly falls back to msgpack and pickling. The only option that prevents pickling is the library config `strict_mode`, but I don't see any API to enable it, either in V1/V2 or internally.
Checklist
Checklist for code changes...