Skip to content

Conversation

@edan-bainglass
Copy link
Member

@edan-bainglass edan-bainglass commented Aug 24, 2025

Some parts of the ORM module do not pass a serialization round trip. This PR addresses this, but further discussion is needed regarding what should actually be (de)serialized. If interested, good to read aiidateam/AEP#40 and discussion on #6255. Some discussion regarding mismatch between an object's constructor args and its Model fields can be found here.

Good to also discuss if the use of pydantic should be extended to ORM instance creation in general, not only by way of from_serialized. This is not implemented in this PR (and is out of scope) but can be addressed in a follow-up PR if deemed "correct".

Open questions

  • There is still the matter of repository_content. The current implementation uses base64 encoding/decoding, but this is clearly not ideal for large files. Some discussion was had w.r.t switching to links to some online storage of files. TBD
  • File serialization is TBD in general, for arrays also

Updates

The PR now also introduces the following:

  • Escape hatch for Data plugins via attributes
  • A guard against unhandled attributes in Data plugins (not allowed in the backend node)
  • A dedicated InputModel - a derived view of the defined entity Model suitable for Entity creation

Note for PR reviewers

There are a few changes that are likely out of scope. These will move to dedicated PRs prior to merge of this PR.

@edan-bainglass edan-bainglass marked this pull request as draft August 24, 2025 08:03
@codecov
Copy link

codecov bot commented Aug 24, 2025

Codecov Report

❌ Patch coverage is 84.35754% with 56 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.54%. Comparing base (27b52da) to head (60b34b0).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/aiida/orm/nodes/data/array/kpoints.py 48.72% 20 Missing ⚠️
src/aiida/orm/nodes/repository.py 23.08% 10 Missing ⚠️
src/aiida/orm/nodes/node.py 80.44% 9 Missing ⚠️
src/aiida/orm/nodes/data/array/trajectory.py 77.28% 5 Missing ⚠️
src/aiida/orm/nodes/data/code/installed.py 63.64% 4 Missing ⚠️
src/aiida/orm/nodes/data/singlefile.py 76.48% 4 Missing ⚠️
src/aiida/orm/entities.py 96.43% 2 Missing ⚠️
src/aiida/orm/fields.py 50.00% 1 Missing ⚠️
src/aiida/orm/nodes/data/array/array.py 83.34% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6990      +/-   ##
==========================================
- Coverage   77.60%   77.54%   -0.06%     
==========================================
  Files         566      566              
  Lines       43549    43721     +172     
==========================================
+ Hits        33793    33899     +106     
- Misses       9756     9822      +66     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@edan-bainglass edan-bainglass force-pushed the fix-node-serialization branch 4 times, most recently from 9b7cfbd to d83ced6 Compare August 24, 2025 11:33
@edan-bainglass
Copy link
Member Author

@khsrali regarding one of the failed tests...

tests/orm/nodes/data/test_remote_stash.py::test_constructor_invalid_folder[stash_mode-copy] - ValueError: `RemoteStashFolderData` can only be used with `stash_mode == StashMode.COPY`

In this PR, I lift some validation to the top of the constructor (of stash data classes - see 91e4ad5), to avoid any operations if the object is bound to fail. However, this seems to introduce failures in testing them. It suggests that perhaps there is some order to the operations, though I don't see it. Can you please comment?

@edan-bainglass
Copy link
Member Author

@khsrali regarding one of the failed tests...

tests/orm/nodes/data/test_remote_stash.py::test_constructor_invalid_folder[stash_mode-copy] - ValueError: `RemoteStashFolderData` can only be used with `stash_mode == StashMode.COPY`

In this PR, I lift some validation to the top of the constructor (of stash data classes - see 91e4ad5), to avoid any operations if the object is bound to fail. However, this seems to introduce failures in testing them. It suggests that perhaps there is some order to the operations, though I don't see it. Can you please comment?

It was checking for TypeError, which is what happens when you delay the stash_mode check and provide a non-Enum type for it. But there is no need for it, as the stash_mode check covers that issue as well. I updated the test to parameterize also over error, checking for ValueError in the case of the invalid stash_mode.

@edan-bainglass
Copy link
Member Author

@GeigerJ2 okay, this is ready for others to inspect. I did my best to isolate the commits and provided comments on each. Happy to discuss further.

Pinging also @agoscinski @superstar54 if interested.

Pinging @sphuber for input/feedback, if he has time.

@edan-bainglass
Copy link
Member Author

It may be possible to rely on the post models of aiida-restapi as a reference for defining ORM constructor parameters, as the post models are intended to represent serialized objects passed to the REST API for object construction. Looking into this.

@edan-bainglass
Copy link
Member Author

edan-bainglass commented Oct 8, 2025

@danielhollas what is this about?

Critical: potential `verdi` speed problem: `aiida.orm.fields` module is imported which is not in: ('aiida.brokers', 'aiida.cmdline', 'aiida.common', 'aiida.manage', 'aiida.plugins', 'aiida.restapi')

Nevermind. I see that importing orm in the cmdline raises alarm bells. Removed...

@edan-bainglass edan-bainglass force-pushed the fix-node-serialization branch from 516a6df to 2c3e5df Compare October 8, 2025 16:54
@danielhollas
Copy link
Collaborator

Nevermind. I see that importing orm in the cmdline raises alarm bells. Removed...

Nice, the system works. :-) Feel free to improve the error message here to make it more obvious that this is about the aiida.cmdline module.

@edan-bainglass edan-bainglass force-pushed the fix-node-serialization branch from d6e44ff to 0b37ef6 Compare October 8, 2025 18:17
@edan-bainglass
Copy link
Member Author

Nevermind. I see that importing orm in the cmdline raises alarm bells. Removed...

Nice, the system works. :-) Feel free to improve the error message here to make it more obvious that this is about the aiida.cmdline module.

#7059

@edan-bainglass
Copy link
Member Author

@danielhollas what do you think about ignoring Ruff N806 - "~variable should be lowercase"? See case below. Model is not an instance, but a class. Naming it model would be misleading, hence I went with Model.

Model = cls.Model.as_input_model()
^^^^^ N806

@danielhollas
Copy link
Collaborator

@danielhollas what do you think about ignoring Ruff N806 - "~variable should be lowercase"?

Do you mean ignoring locally (fine) or globally?

@edan-bainglass
Copy link
Member Author

edan-bainglass commented Oct 8, 2025

@danielhollas what do you think about ignoring Ruff N806 - "~variable should be lowercase"?

Do you mean ignoring locally (fine) or globally?

Global. There are many cases when the variable is a class, not an instance. I've pushed this in my last commit just to verify that it works. Okay with removing it in favor of local handling, but would like to hear the reason against a global N806 rule.

Update

Nice. Ignoring N806 globally raised a whole lot of RUF100 due to the codebase being littered with local N806 rules. I'd say that supports my case 🙂

@edan-bainglass
Copy link
Member Author

@danielhollas done for tonight. Will revisit this in the morning 😴

@danielhollas
Copy link
Collaborator

Nice. Ignoring N806 globally raised a whole lot of RUF100 due to the codebase being littered with local N806 rules. I'd say that supports my case 🙂

Yeah, running git grep N806 indeed does get a lot of hits (although most of them are in tests/).

Seems fine to remove it, or alternatively, use the https://docs.astral.sh/ruff/settings/#lint_pep8-naming_extend-ignore-names configuration to automatically ignore some common patterns (e.g. the one you have here, and common class names produced by factory functions (SinglefileData = DataFactory('core.singlefile')

e.g. something like this in pyproject.toml

[tool.ruff.lint.pep8-naming]
ignore-names = ["[A-Z]*Data", "Model"]

In any case please open a separate PR for that so we don't polute this one with bikeshedding discussion and unrelated changes.

@edan-bainglass
Copy link
Member Author

In any case please open a separate PR for that so we don't polute this one with bikeshedding discussion and unrelated changes.

Thanks @danielhollas. Then for this one, since there are only a few cases in my PR, I will locally ignore them. Will open a PR for the pattern handling shortly after.

@edan-bainglass edan-bainglass force-pushed the fix-node-serialization branch 4 times, most recently from 5faafb0 to a4a5b3e Compare October 9, 2025 06:04
@edan-bainglass
Copy link
Member Author

TODO

  • Check if exclude_from_cli is redundant - can we use exclude_to_orm instead? Check attributes and extras
  • Node.Model.computer should not be storing the label (not unique) but rather the uuid
  • Reduce hardcoding w.r.t orm_to_model definition

"""Create a new `Code` instance."""
try:
instance = cls._from_model(cls.Model(**kwargs))
model = cls.InputModel(**kwargs)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More is said about InputModel in later comments. For now, understand that this is intended as a derived model from an ORM class's model, stripping away all read-only fields (marked with exclude_to_orm=True) and is used by the CLI and REST API for node creation.

key='exclude_from_cli',
default=False,
):
if get_metadata(field_info, key='exclude_from_cli'):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The None default serves the same purpose

Comment on lines +248 to 250
# FIXME resolve this hardcoded special case properly
if field_name == 'filepath_files':
continue
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer here we explicitly hardcode 'filepath_files', as it is the intended exclusion here. is_attribute is used for other purposes, not for marking exclusion from the CLI show command. Reminding @GeigerJ2 to open an issue (optionally reply here with a link to the issue).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants