-
-
Notifications
You must be signed in to change notification settings - Fork 395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pydantic_model_creator refactorization #1745
Conversation
Hi! Thank you for looking into it Although before I would be able to give any constructive feedback - you would still need to fix testes (and rebase to actual develop), as in most cases tests were written with holding some edgecase in mind, so I wouldn't want to risk just throwing them all away. I am also okay with changes in API of model creator if there is need in that, as long as it won't be confusing for people updating lib |
should be ready for review
tortoise/fields/base.py
Outdated
@@ -441,3 +463,21 @@ def default_name(default: Any) -> Optional[Union[int, float, str, bool]]: | |||
desc["db_field_types"] = self.get_db_field_types() | |||
|
|||
return desc | |||
|
|||
def describe_by_dataclass(self) -> FieldDescriptionBase: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disclaimer: I'm quite new to the project but I looked into pydantic_model_creator
a bit and it's obvious that it's convoluted, it has issues and begs for refactoring. So thank you for working on this!
I'm not sure though that extending the core classes like Field
, RelationalField
and others with functionality required for one particular extension is optimal. Basically contrib.pydantic
is not a part of ORM, it's an extension that only some users use. So it seems to me that it would make more sense to have the describe_by_dataclass
logic inside of contrib.pydantic
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point. Originally I planned to change the describe() method to use the describe_by_dataclass() method, with something like asdict(self.describe_by_dataclass()). That is because describe() only returns a dict, so typing is not working with describe(). But this change would become a bit more larger and out of focus of the original intention of this PR. with other words: I think describe_by_dataclass() could be helpfull in many points, but I can see why it should be moved to a dedicated place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to tortoise.contrib.pydantic.dataclasses
It is now fully tested, as far as I can test it, since I fail to set up the dev environment via With other words: I think it is ready for a review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, seems like you haven't gotten a review for a while, so I decided to leave a few comments. Thank you for looking into it! pydantic_model_creator
begs for refactoring!
I would like to complain a bit by saying that it took me a while to do a review, even though I haven't really dug deep because it is a lot of changes. Looks like the PR does a few things:
- Introduces dataclasses as a conversion step from tortoise to pydantic
- refactors a lot of the code
- changes the pydantic model naming
It would definitely make a review much easier if the PR was smaller! Anyway, please have a look at my comments and let me know what you think.
@@ -1335,44 +1336,7 @@ def test_schema(self): | |||
"title": "Employee", | |||
"type": "object", | |||
}, | |||
"Employee_obdn4z": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this has to be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it is the exact same def as "Employee_dkvdqq_leaf". Here, _MODEL_INDEX kicks in and returns the same model. This is because of the different way how the names are generated now. Before, the current stack was also hashed. But for a model-name to be unique, the stack is not needed. What is needed is the information about what fields are included in the model.
tests/contrib/test_pydantic.py
Outdated
"title": "Event_list", | ||
"type": "array", | ||
}, | ||
) | ||
|
||
def test_address_schema(self): | ||
# print(json.dumps(self.Address_Pydantic.model_json_schema(), indent=2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, will remove the other print statements as well :)
tortoise/contrib/pydantic/creator.py
Outdated
_MODEL_INDEX[self._name] = model | ||
return model | ||
|
||
def process_field( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this could be a private method as well as construct_field_map
and others. Having private methods would help to provide a clear interface of the class and make it easier to read the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did as you said, but I left get_name
public, since it might be useful for others
tortoise/contrib/pydantic/creator.py
Outdated
return self.given_name, self.given_name | ||
hashval = ( | ||
f"{self._fqname};{self.meta.exclude};{self.meta.include};{self.meta.computed};" | ||
f"{self.meta.sort_alphabetically}:{self.meta.allow_cycles}:{self._exclude_read_only}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since all names are changing anyway, maybe we can decide on having either ; or : as separators :)
|
||
:param _stack: Internal parameter to track recursion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like all the documentation got removed.
) | ||
|
||
|
||
def describe_field_by_dataclass(field: Field) -> FieldDescriptionBase: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please help me to understand what is the benefit of converting the model and field metadata to a dataclass, so then we can convert it to a pydantic model? Not a criticism! Just trying to understand the logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
before using dataclasses, everything was inside a simple dict. A dict does not provide any typing information, and you also do not know what is inside a dict, with other words: you have no control about what keys are in a dict and also have no clue about of what type the values are. Since in this case it is very predictable what is needed for pydantic_model_creator, I found it much more consistent to rely on dataclasses, that do exactly what dicts don't. In my opinion, it is a good compromise between light weight and structural integrity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another approach would be to not use dataclasses or a dict and directly read the needed values from the Field Instances
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another approach would be to not use dataclasses or a dict and directly read the needed values from the Field Instances
that's not possible because Field.constraints is modified during the process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's not possible because Field.constraints is modified during the process.
This is kinda weird. I would assume that the "description" should stay constant and not be modified. Should it be refactored instead of adding dataclasses?
It also looks like when instantiating dataclasses, we just pass into the constructor the field values:
return FieldDescription(
name=field.model_field_name,
field_type=field.__class__,
python_type=field.field_type,
nullable=field.null,
default=field.default,
description=field.description,
docstring=field.docstring,
constraints=field.constraints,
)
Can we just refer to them in the code?
it got lost during refactoring
this is now handled by PydanticModelCreator
Thanks a lot for reviewing!
that is exactly what it is doing, the main part being the refactoring.
That is true, but in my eyes there were many thinks that could be improved, but I did not see the point to split this into multiple PRs because in the end everything would be rewritten, like it is now. Now, I can at least try to justify my changes ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is true, but in my eyes there were many thinks that could be improved, but I did not see the point to split this into multiple PRs because in the end everything would be rewritten, like it is now. Now, I can at least try to justify my changes ;)
The issue is that it is hard to understand what exactly has been changed and if the change is positive. I'm not familiar with this code close enough and it hard for me to say whether the code quality has improved. I see that the amount of code increased by a few hundreds lines which is not necessarily good, I see that there is more typing which is good. But I also see that tests keep failing and it makes approving it quite a leap of faith. Since it is a massive change, it should be tested thoroughly.
tortoise/contrib/pydantic/creator.py
Outdated
return None | ||
|
||
def _process_json_field_description(self) -> Any: | ||
return Any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the same as it is right now, also claimed here: #1702
I made it to an actual function so that in the future someone could add the functionality described there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed this method
tortoise/contrib/pydantic/creator.py
Outdated
) -> Type[PydanticModel]: | ||
""" | ||
Function to build `Pydantic Model <https://pydantic-docs.helpmanual.io/usage/models/>`__ off Tortoise Model. | ||
@dataclasses.dataclass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be this in dataclasses.py?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to dataclasses.py
# ForeignKeyFieldInstance -> RelationalField | ||
if isinstance(field, OneToOneFieldInstance): | ||
# OneToOneFieldInstance -> ForeignKeyFieldInstance -> RelationalField | ||
return OneToOneFieldInstanceDescription( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like all contructors are almost the same. Can it be changed to be a class method in FieldDescriptionBase
?
class FieldDescriptionBase:
@classmethod
def from_tortoise_field(field):
return cls(...)
This would make this method shorter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did as you said
@@ -0,0 +1,191 @@ | |||
import dataclasses |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that dataclasses
is a great name for a module. It's like calling a file classes.py or functions.py, it does not really convey information about what is exactly in the file. Maybe, descriptions.py
? I know naming is hard.
) | ||
|
||
|
||
def describe_field_by_dataclass(field: Field) -> FieldDescriptionBase: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's not possible because Field.constraints is modified during the process.
This is kinda weird. I would assume that the "description" should stay constant and not be modified. Should it be refactored instead of adding dataclasses?
It also looks like when instantiating dataclasses, we just pass into the constructor the field values:
return FieldDescription(
name=field.model_field_name,
field_type=field.__class__,
python_type=field.field_type,
nullable=field.null,
default=field.default,
description=field.description,
docstring=field.docstring,
constraints=field.constraints,
)
Can we just refer to them in the code?
Could anyone review how I changed the creation of the hash? The hash is needed to guarantee the uniqunes of the created model. So I thought about what makes a model unique:
This should be covering the current state. maybe it is also important to include the optional fields? This is not done right now. The following is producing the same model: Employee_Pydantic = pydantic_model_creator(Employee, exclude=())
Employee_Pydantic_optional = pydantic_model_creator(Employee, optional=("manager",))
print(Employee_Pydantic.__name__, Employee_Pydantic_optional.__name__)
print(Employee_Pydantic is Employee_Pydantic_optional) # True |
I have removed the dataclasses that describe the fields and am now working with the fields directly. Should I push or revert the changes? The tests are passing. Also, I have noticed that a line that should have been removed still exists: #1465 (comment). Therefore, there are also a lot of tests that should be updated if the line is removed. |
pydantic_model_creator now accesses the fields directly
The pydantic_json_schema warning is because of the naming conflict between tortoise.Field and pydantic.Field that is covered by The I was able to set up a test environment with docker, the test does not fail here with python 3.12 or 3.11 and mssql as database. Another problem however I simply want to point out here is that during the test DockerfileFROM python:3.11-slim
WORKDIR /app
RUN apt-get update && apt-get install -y --no-install-recommends \
curl \
apt-transport-https \
gnupg \
build-essential \
unixodbc-dev \
libpq-dev \
lsb-release\
&& rm -rf /var/lib/apt/lists/*
RUN curl -fsSL https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor -o /usr/share/keyrings/microsoft-prod.gpg
RUN curl https://packages.microsoft.com/config/debian/$(lsb_release -rs)/prod.list \
-o /etc/apt/sources.list.d/mssql-release.list && \
apt-get update && ACCEPT_EULA=Y apt-get install -y msodbcsql18
COPY pyproject.toml poetry.lock ./
RUN pip install -U pip poetry && poetry config virtualenvs.create false
RUN poetry install --no-dev
COPY . .
ENV TORTOISE_TEST_MODULES=tests.testmodels \
TORTOISE_MYSQL_PASS=123456 \
TORTOISE_POSTGRES_PASS=123456 \
TORTOISE_MSSQL_PASS=Abcd12345678 \
TORTOISE_MSSQL_DRIVER="ODBC Driver 18 for SQL Server" \
PYTHONDEVMODE=1 \
PYTEST_ARGS="-n auto --cov=tortoise --cov-append --tb=native -q"
RUN apt-get update && apt-get install -y make && rm -rf /var/lib/apt/lists/*
CMD ["make", "style"]
CMD ["make", "ci"] docker-compose.ymlservices:
app:
build: .
depends_on:
- postgres
- mysql
- mssql
environment:
TORTOISE_TEST_MODULES: tests.testmodels
TORTOISE_MYSQL_PASS: 123456
TORTOISE_POSTGRES_PASS: 123456
TORTOISE_MSSQL_PASS: Abcd12345678
TORTOISE_MSSQL_DRIVER: "ODBC Driver 18 for SQL Server"
postgres:
image: postgres
ports:
- "5432:5432"
environment:
POSTGRES_PASSWORD: 123456
POSTGRES_USER: postgres
mysql:
image: mysql
ports:
- "3306:3306"
environment:
MYSQL_ROOT_PASSWORD: 123456
mssql:
image: mcr.microsoft.com/mssql/server:2022-CU15-ubuntu-22.04
ports:
- "1433:1433"
environment:
ACCEPT_EULA: Y
SA_PASSWORD: Abcd12345678 (And changed the database-URLs in the Makefile accordingly) |
Pull Request Test Coverage Report for Build 11880457757Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
Seems like it was some flaky test, restarting it helped Please bear with me before I will be able to review it properly, probably on weekend, as it's quite hard to review massive refactoring like that |
tortoise/contrib/pydantic/creator.py
Outdated
model = create_model( | ||
lname, | ||
__base__=PydanticListModel, | ||
root=(List[submodel], Field(default_factory=list)), # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this has to be PydanticField
, not Field
. Will be fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall - I am happy with the changes
Thank you for working on it
Although model creator is still hard to get through, but at least now it is more readable and typed
tortoise/contrib/pydantic/creator.py
Outdated
return pconfig | ||
|
||
def _initialize_field_map(self) -> FieldMap: | ||
return FieldMap(self.meta) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please reformat to
return (
FieldMap(self.meta)
if self._exclude_read_only
else FieldMap(self.meta, pk_field=self._model_description.pk_field)
)
Not sure why black didn't reformat it itself 🤔
tortoise/contrib/pydantic/creator.py
Outdated
exclude=tuple( | ||
str(v[prefix_len:]) for v in self.meta.exclude if v.startswith(field_name + ".") | ||
), | ||
include=tuple( | ||
str(v[prefix_len:]) for v in self.meta.include if v.startswith(field_name + ".") | ||
), | ||
computed=tuple( | ||
str(v[prefix_len:]) for v in self.meta.computed if v.startswith(field_name + ".") | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be we can make this transformation into a function? Inline function would be good enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not want to use a lambda expression here (pycharm yelled at me "PEP 8: E731 do not assign a lambda expression, use a def"), so I put it in an inner function
tortoise/contrib/pydantic/creator.py
Outdated
self.meta = meta_from_class.finalize_meta( | ||
exclude, include, computed, allow_cycles, sort_alphabetically, model_config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please - use kwargs here, it looks like too many arguments for using args here
The method
pydantic_model_creator
was very difficult to understand and therefore also very hard to maintainDescription
I have refactored to use a class-based approche. I also added some dataclasses to describe the Models and Fields so that typing is much more consistent.
Motivation and Context
I encountered many issues using
pydantic_model_creator
. The results are very unpredictable. For example: Why is MinRelation in tests.testmodels never referenced if you callpydantic_model_creator
with any of Event, Tournament, Team, Address? Why should someone want a model to be namedtortoise__contrib__pydantic__creator__tests__testmodels__Address__leaf
?How Has This Been Tested?
since nearly every current written test fails because of the changes how names are generated and which fields are included, it is hard to say that everything is working as intended.
Edit: It is now tested against the existing test, that needed to be updated to match the new naming scheme.
I have created some pydantic models and reviewed the changes with Meld:
here is the comparison old vs new
produced with the following code:
the code...
Checklist: