Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Aliases now also work for nested fields; Only retrieve data required for constructing a response from the database. #1304
base: main
Are you sure you want to change the base?
Aliases now also work for nested fields; Only retrieve data required for constructing a response from the database. #1304
Changes from 22 commits
b949945
7adbf4a
8c05579
6c8294b
0ebc724
4388c9d
c29cdbb
368792a
38fcde7
5cec6d6
cbba008
67f2dea
775009d
1358b9d
bb228ee
8245884
7b2320b
d87bc1b
6b99822
841520d
c22089f
1146273
4c46f55
3a908bc
1e49fb8
596bb73
703a9df
f174850
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to incur a significant performance overhead, but I guess you want to do it so that you don't have to pull e.g., entire trajectories from the database each time, yet you still want to deserialize the JSON into your classes? I think I would suggest we instead have a per-collection deserialization flag, as presumably you only want to deserialize trajectories once (on database insertion) anyway. Does that make sense?
If you want to retain this approach, it might be cleaner to do it at the pydantic level, e.g., a
root_validator
that explicitly sets all missing fields toNone
(see also https://pydantic-docs.helpmanual.io/usage/models/#creating-models-without-validation as an option).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think this is particularly heavy compared to all the other things we do in the code.
It is what we previously did in handle_response_fields. I only moved it here, so we can do it before the deserialization and added code to handle nested fields.
For biomolecular data, a structure can easily have 10,000 atoms, so retrieving them from the database and putting them in the model would take some time. This way we can avoid that if the species_at_sites and cartesian_site_positions are not in the response_fields. (I also made a patch in the code for Barcelona that allowed them to specify the default response fields, so they can choose to not have these fields in the response by default.)
I did not want to make the change even bigger by bypassing the rest of the validator (as in your second suggestion).
But from a performance viewpoint, bypassing the validation would be good.
Do you want me to add this to this PR or to a future PR ?
I tried the root validator idea, but it seems I already get an error before the root_validator is executed, so I do not think this solution will work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, fair enough, just looks a bit scarier as a double for loop alongside the recursive descent into dictionaries to get the nested aliases. It's quite hard to reason about this, so I might set up a separate repo for measuring performance in the extreme limits (1 structure of 10000 atoms vs 10000 structures of a few atoms -- i.e., what we have now, ignoring pagination of course).
Did you use
@root_validator(pre=True)
to make sure it gets run before anything else? Perhaps bypassing the validation altogether can wait for another PR, as you suggest, but I'd like to make the performance tests first at least (doesn't necessarily hold up this PR but it might hold up the next release).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed I made a mistake in my test script.
So it may be possible to do this with a root_validator after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a root_validator to the attributes class that, if a flag is set, checks whether all required fields are present and if not adds them and sets them to 0.
I'll try to put the handling of the other include fields back to the place where it happened originally so the code changes less.