forked from metabrainz/sir
Pull upstream changes #1
Open

danielunderwood wants to merge 49 commits into Lidarr:master from metabrainz:master
The build_entity_query method of all entities is mocked in the test_index_by_fk_* tests. These mocks are never reset after the test is over, so the mock remains visible in place of the actual method in other tests, causing those to fail. Fix the issue by using mock.patch and cleaning up after the test run is over.
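A minimal sketch of the pattern, using a hypothetical stand-in object rather than SIR's real schema entities:

```python
# Hypothetical stand-in, not SIR's actual test code: patching via mock.patch
# so the real method is restored once the test is over.
import unittest

import mock  # the standalone mock package used with Python 2.7


class Entity(object):
    def build_entity_query(self):
        return "real query"


ENTITY = Entity()


class IndexByFkTest(unittest.TestCase):
    def test_index_by_fk(self):
        # mock.patch.object undoes the patch when the with-block exits,
        # so later tests see the real build_entity_query again.
        with mock.patch.object(ENTITY, "build_entity_query",
                               return_value="mocked query") as query_mock:
            self.assertEqual(ENTITY.build_entity_query(), "mocked query")
            query_mock.assert_called_once_with()
        # Outside the block the real method is back.
        self.assertEqual(ENTITY.build_entity_query(), "real query")
```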
The test config should be copied inside the Dockerfile itself. Copying config.test.ini to config.ini in a local or Jenkins script before building the Docker image will not work, because config.ini is listed in .dockerignore and would therefore be ignored and never copied to the image. Co-authored-by: yvanzo <[email protected]>
This mount erases all the code from the image built in CI, so remove it. The downside is that we need to build the image for tests each time. An alternative could be to have separate Docker Compose files for CI and local tests, perhaps even using one as an overlay over the other.
We want to test these indexing functions with real database tests, so accept an optional session parameter in these functions. In tests, we create a session, start a transaction, insert data from SQL files, query the data in the same transaction and, at the end of the test, roll back the transaction. As a result, we are able to get away without resetting the database between tests. If we don't pass the same session here, then it may not be possible to use this method, because changes in a transaction are not visible outside it. This code used to be inside a `with` block, but the queries executed here are only SELECTs, so we can do without the `with` here as well.
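A rough sketch of the pattern, with illustrative names (index_entity, the test database URL) rather than SIR's actual indexing API:

```python
# Illustrative only: the indexing function accepts an optional session so a
# test can run it inside a transaction it controls and roll back afterwards.
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

Session = sessionmaker()


def index_entity(session=None):
    """Run the read-only indexing queries, reusing the caller's session if given."""
    session = session or Session()
    # Only SELECTs run here, so no `with` block / commit is required.
    return session.execute(text("SELECT 1")).fetchall()


def test_index_entity():
    engine = create_engine("postgresql:///musicbrainz_test")  # assumed test DB
    connection = engine.connect()
    transaction = connection.begin()
    session = Session(bind=connection)
    try:
        # Insert fixture rows from the SQL files here; they are only visible
        # inside this transaction, which is why the very same session has to
        # be passed down to index_entity().
        rows = index_entity(session=session)
        assert rows
    finally:
        transaction.rollback()  # nothing persists, so no DB reset between tests
        connection.close()
```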
Following the model of `docker/push.sh`:
- Document expected behavior and usage with heading comments
- Exit on error, for example if the first Docker Compose command failed
- Support running from any working directory (using `cd`)
- Support any Docker Compose setup and version:
  * Run `docker-compose` by default, assuming v1 and the `docker` group
  * Support `docker compose` for Docker Compose v2
  * Support `sudo ...` if needed
See https://docs.sqlalchemy.org/en/11/changelog/migration_11.html

FIXME: LinkAreaArea.area0 is not an instance of InstrumentedAttribute
The column can now be a hybrid_property proxy object, which we don't necessarily want to skip. But the previous behavior of skipping strings and mbdata models has been preserved by checking for those directly.
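Roughly the kind of check meant here (illustrative only, not SIR's exact code):

```python
# Keep skipping plain strings and classes from mbdata.models, while letting
# other objects, such as hybrid_property proxies, through.
import inspect


def is_mbdata_model(obj):
    """True for classes defined in mbdata.models (e.g. Area, ArtistTag)."""
    return inspect.isclass(obj) and obj.__module__ == "mbdata.models"


def should_skip(column):
    return isinstance(column, str) or is_mbdata_model(column)
```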
This brings us back to the same issue as in 1.1 with _wildcard_token, so we might as well try and do the jump in one go.
The point of defer_everything_but was to effectively do what load_only does, probably from back when load_only was not yet an option. This just gets rid of it and uses load_only instead. Since load_only cannot work on relationships, just columns, we filter relationships out of the column set before using it.
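A small sketch of the idea (the helper name is illustrative, not the exact SIR function):

```python
# load_only() only accepts column attributes, so any relationship names are
# dropped from the requested set before building the option.
from sqlalchemy.orm import class_mapper, load_only


def apply_load_only(query, model, names):
    column_names = set(column.key for column in class_mapper(model).columns)
    wanted = [name for name in names if name in column_names]
    return query.options(load_only(*wanted))
```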
I could not find docs on the `table` attribute, but https://docs.sqlalchemy.org/en/14/orm/internals.html#sqlalchemy.orm.RelationshipProperty.mapper documents a `mapper` attribute which seems to do the job. FWIW, there is also a `target` attribute, but that is not documented in SQLAlchemy. I am not sure whether the two are different or not.
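For example (illustrative helper, not SIR's code), the model on the other side of a relationship can be resolved through `mapper`:

```python
# Resolve the target model of a relationship via the documented `mapper`
# attribute instead of the undocumented `table`/`target`.
from sqlalchemy import inspect


def remote_model(model, relationship_name):
    prop = inspect(model).relationships[relationship_name]
    return prop.mapper.class_
```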
__tablename__ is not a column anyway. Also, the checks in `defer_everything_but` (before we replaced it with `load_only`) filtered it out.
The path artist_links.artist.gid is not being processed correctly by iterate_path_values: instead of returning the gid, it returns the artist. LinkArtistUrl.artist is a @hybrid_property, and the check isinstance(LinkArtistUrl.artist, InstrumentedAttribute) returns False. To identify hybrid attributes in iterate_path_values, we could use https://docs.sqlalchemy.org/en/14/orm/internals.html#sqlalchemy.orm.InspectionAttr.extension_type . I tried this, but currently it does not seem to work. Hence, it is better that we do not use hybrid attributes in paths for now.
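For reference only, the kind of check referred to above; the commit found it did not work as hoped, so hybrid attributes are simply avoided in paths instead:

```python
# Look the attribute up in the mapper's ORM descriptors and compare its
# extension_type against HYBRID_PROPERTY.
from sqlalchemy import inspect
from sqlalchemy.ext.hybrid import HYBRID_PROPERTY


def is_hybrid_attribute(model, name):
    descriptor = inspect(model).all_orm_descriptors[name]
    return descriptor.extension_type is HYBRID_PROPERTY
```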
In tests, we need to pass the session manually to control transactions. Refactor the code so that tests and production code execute the same code path.
Starting with SQLAlchemy 1.4, SIR crashes if the materialized (or denormalized) tables for the MusicBrainz database are not available. Just document it for now and consider SEARCH-687 for better checks.
Upgrade SQLAlchemy from 1.0 to 1.4
Provide information for basic maintenance and implementation details. Follow guidelines for using the RabbitMQ service:
- https://github.com/metabrainz/guidelines/blob/60c708538bc84e11a79d2cdad38274f5b04276ad/services/README.md
- https://github.com/metabrainz/guidelines/blob/f2dcf19bac2d76d7348758c021f1e5ada4451cf8/services/RabbitMQ.md
* Fix deprecation warnings in SIR

After upgrading SIR to SQLAlchemy 1.4, some deprecation warnings came up.

1. SADeprecationWarning: Use .persist_selectable (deprecated since: 1.3)
   mapped_table = class_mapper(entity.model).mapped_table.name
   The fix is straightforward: use .persist_selectable instead of .mapped_table; judging by the following docs, it has only been renamed.
   .mapped_table in SQLAlchemy 1.2: https://docs.sqlalchemy.org/en/12/orm/mapping_api.html?highlight=mapped_table#sqlalchemy.orm.Mapper.mapped_table
   .persist_selectable in SQLAlchemy 1.4: https://docs.sqlalchemy.org/en/14/orm/mapping_api.html?highlight=mapped_table#sqlalchemy.orm.Mapper.persist_selectable

2. SADeprecationWarning: Calling URL() directly is deprecated and will be disabled in a future release. The public constructor for URL is now the URL.create() method.
   As the warning says, replace URL() with URL.create().

3. /usr/local/lib/python2.7/site-packages/pysqlite2/dbapi2.py:81: DeprecationWarning: Converters and adapters are deprecated. Please use only supported SQLite types. Any type mapping should happen in layer above this module.
   This warning comes from pysqlite. Python 2.7 does have a sqlite module, so get rid of pysqlite entirely. We only used it for a few tests in any case.

4. Finally, there is also an SAWarning about overlapping relationships in the B and C models, which are only used in tests. Add a backref to fix the warning.
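For instance, fixes 1 and 2 look roughly like this (sketch only; `entity.model` and the connection parameters are placeholders):

```python
from sqlalchemy.engine.url import URL

# 1. Mapper.mapped_table was renamed to persist_selectable:
#    class_mapper(entity.model).mapped_table.name        -> deprecated
#    class_mapper(entity.model).persist_selectable.name  -> SQLAlchemy 1.4

# 2. URL() must now be built through URL.create():
db_url = URL.create(
    drivername="postgresql",
    username="musicbrainz",
    host="localhost",
    database="musicbrainz_db",
)
```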
The test logs are filled with warnings like:

SAWarning: relationship 'AreaTag.area' will copy column area.id to column area_tag.area, which conflicts with relationship(s): 'CustomArea.tags' (copies area.id to area_tag.area). If this is not the intention, consider if these relationships should be linked with back_populates, or if viewonly=True should be applied to one or more if they are read-only. For the less common case that foreign key constraints are partially overlapping, the orm.foreign() annotation can be used to isolate the columns that should be written towards. To silence this warning, add the parameter 'overlaps="tags"' to the 'AreaTag.area' relationship.

This particular warning arises from `tags = relationship("AreaTag")` in CustomArea. The reason (AFAIU) is that the `tags` relationship links `Area` to `AreaTag`, and `area` in `AreaTag` links `AreaTag` back to `Area`. However, these relationships aren't linked with a backref, so SQLAlchemy does not know that both `relationship` properties represent the same relation.

Possible fixes include using `backref` to make it known to SQLAlchemy that both relationships are the same. That approach, however, needs changes in mbdata. In many cases, like that of the `Tag` and `Alias` relationships, it seems like a good idea to update mbdata and, say, add `tags` and `aliases` attributes directly to the `Area` model and other entities. But there are a few cases, like `artist_links` and `recording_links` in `CustomWork`, which probably don't make sense to add to mbdata and should continue to live in custom sir models. Another alternative to get rid of the warning, as the message suggests, is setting `viewonly=True`; since we only ever read the data in sir and never modify it, this option works fine for us.

Updating mbdata likely needs to consider implications for other users of the library, so currently I am not inclined to pursue that approach. Hence, going with the `viewonly=True` approach.
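A self-contained sketch of the chosen fix; the models here are simplified stand-ins for mbdata's Area/AreaTag and SIR's CustomArea, not their real definitions:

```python
from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()


class Area(Base):
    __tablename__ = "area"
    id = Column(Integer, primary_key=True)


class AreaTag(Base):
    __tablename__ = "area_tag"
    id = Column(Integer, primary_key=True)
    area_id = Column("area", Integer, ForeignKey("area.id"))
    # The mbdata-style relationship: it copies area.id into area_tag.area.
    area = relationship("Area")


class CustomArea(Area):
    # The extra relationship SIR adds; viewonly=True marks it read-only, so
    # SQLAlchemy no longer warns that it overlaps AreaTag.area.
    tags = relationship("AreaTag", viewonly=True)
```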
The test logs are filled with warnings like:

SAWarning: implicitly coercing SELECT object to scalar subquery; please use the .scalar_subquery() method to produce a scalar subquery.

These warnings come from code which calculates various counts, like:

`label_count = column_property(select([func.count(Label.id)]).where(Label.area_id == Area.id))`

The fix is simple: append .scalar_subquery() to the queries.
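A self-contained version of the fix, following the example above but with simplified stand-in models rather than the real mbdata/SIR classes:

```python
from sqlalchemy import Column, ForeignKey, Integer, func, select
from sqlalchemy.orm import column_property, declarative_base

Base = declarative_base()


class Area(Base):
    __tablename__ = "area"
    id = Column(Integer, primary_key=True)


class Label(Base):
    __tablename__ = "label"
    id = Column(Integer, primary_key=True)
    area_id = Column(Integer, ForeignKey("area.id"))


class CustomArea(Area):
    # Appending .scalar_subquery() makes the coercion explicit and silences
    # the SAWarning under SQLAlchemy 1.4.
    label_count = column_property(
        select([func.count(Label.id)])
        .where(Label.area_id == Area.id)
        .scalar_subquery()
    )
```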
This reverts commit 8b16c86. The SQL queries generated using load_only are less optimal compared to those using defer_everything_but. The reason likely stems from a difference in the way columns are eagerly loaded. For now, just revert to the old way to avoid performance regressions.
Trying to specify the load type for a composite property is an error, so filter those properties out.
We access the begin_date, end_date and ended fields of area in convert_area_inner, which is called for all area types. This function is only called by the wscompat converter, so we only need to add these to extrapaths and not to the main fields of the entity.
convert_event, used as the wscompat converter for the event core, also calls convert_artist_simple, which accesses sort_name, so eagerly load it.
Logging SQL queries shows that using a hybrid property like artist of the LinkArtistPlace model, instead of the actual attribute like entity0, does not trigger eager loading of the db column. Therefore, use entity0 instead of artist in artist_links, entity0 instead of area in area_links and entity1 instead of place in place_links.
Logging SQL queries shows that using a hybrid property like artist of the LinkArtistUrl model, instead of the actual attribute like entity0, does not trigger eager loading of the db column. Therefore, use entity0 instead of artist in artist_links and entity0 instead of release in release_links.
convert_release_group, used as the wscompat converter for the release group core, also calls convert_alias, which accesses alias.gid, so eagerly load it.
convert_release, used as the wscompat converter for the release core, also calls convert_artist_simple, which accesses comment, so eagerly load it.
convert_release, used as the wscompat converter for the release core, also calls convert_release_packaging, which accesses comment, so eagerly load it.
Logging SQL queries shows that using a hybrid property like area0 of the LinkAreaArea model, instead of the actual attribute like entity0, does not trigger eager loading of the db column. Therefore, use entity0 instead of area0 in area_links.
convert_area calls convert_area_relation_list, which in turn calls convert_area_inner on the related area, which accesses these attributes, so eagerly load them for performance.
convert_place, used as the wscompat converter for the place core, also calls convert_area_inner, which accesses area.type.gid, area.type.name, area.begin_date, area.end_date and area.ended, so eagerly load them.
The change amends the implementation from SEARCH-319. The CustomReleaseGroup model has 2 relationships to the ReleaseGroupMeta table, which stores first_release_date.

1. `meta` - this relationship is established on the `ReleaseGroup` model through a [backref](https://github.com/acoustid/mbdata/blob/bbe303865e4cec3f83a65ce29f0d3468c729173e/mbdata/models.py#L8153) in the `ReleaseGroupMeta` table.
2. `first_release_date` - we add this relationship to `ReleaseGroup` in `CustomReleaseGroup` in [sir](https://github.com/metabrainz/sir/blob/80954d6953da4b74028b3b5a73fada5d81f910e0/sir/schema/modelext.py#L122).

Ideally, we would have only one relationship: the one in mbdata. However, using the `meta` relationship leads to an error. Investigating the error, I found that the `meta` relationship is created with the `uselist=False` parameter. The logic in [iterate_path_values](https://github.com/metabrainz/sir/blob/80954d6953da4b74028b3b5a73fada5d81f910e0/sir/querying.py#L78-L85) does not support the [`ONETOONE`](https://docs.sqlalchemy.org/en/14/orm/basic_relationships.html#one-to-one) relationships that `uselist=False` creates. Until `iterate_path_values` is updated, we need to keep using the `first_release_date` relationship. Accordingly, update the logic in the convert_release_group converter. This should likely speed up indexing as well.
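For context, a minimal stand-in (not mbdata's actual definitions) showing what `uselist=False` changes:

```python
# With uselist=False the attribute holds a single related object (or None),
# not a list, so path-walking code that iterates one-to-many values as
# collections needs a separate branch to handle it.
from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()


class ReleaseGroup(Base):
    __tablename__ = "release_group"
    id = Column(Integer, primary_key=True)
    # One-to-one: release_group.meta is a ReleaseGroupMeta instance, not a list.
    meta = relationship("ReleaseGroupMeta", uselist=False)


class ReleaseGroupMeta(Base):
    __tablename__ = "release_group_meta"
    id = Column(Integer, ForeignKey("release_group.id"), primary_key=True)
```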
The change amends the implementation from SEARCH-218. The CustomRecording model has 2 relationships to the RecordingFirstReleaseDate table, which stores first_release_date.

1. `first_release` - this relationship is established on the `Recording` model through a [backref](https://github.com/acoustid/mbdata/blob/bbe303865e4cec3f83a65ce29f0d3468c729173e/mbdata/models.py#L1616) in the `RecordingFirstReleaseDate` table.
2. `first_release_date` - we add this relationship to `Recording` in `CustomRecording` in [sir](https://github.com/metabrainz/sir/blob/ade0073ddd450e641599d56af1a1ca4762aef83c/sir/schema/modelext.py#L116).

Ideally, we would have only one relationship: the one in mbdata. However, using the `first_release` relationship leads to an error. Investigating the error, I found that the `first_release` relationship is created with the `uselist=False` parameter. The logic in [iterate_path_values](https://github.com/metabrainz/sir/blob/80954d6953da4b74028b3b5a73fada5d81f910e0/sir/querying.py#L78-L85) does not support the [`ONETOONE`](https://docs.sqlalchemy.org/en/14/orm/basic_relationships.html#one-to-one) relationships that `uselist=False` creates. Until `iterate_path_values` is updated, we need to keep using the `first_release_date` relationship. Accordingly, update the logic in the convert_recording converter. This should likely speed up indexing as well.
The change amends the implementation from SEARCH-494. The CustomRelease model has 2 relationships to the ReleaseMeta table, which stores the Amazon ASIN.

1. `meta` - this relationship is established on the `Release` model through a [backref](https://github.com/acoustid/mbdata/blob/bbe303865e4cec3f83a65ce29f0d3468c729173e/mbdata/models.py#L7872) in the `ReleaseMeta` table.
2. `asin` - we add this relationship to `Release` in `CustomRelease` in [sir](https://github.com/metabrainz/sir/blob/ade0073ddd450e641599d56af1a1ca4762aef83c/sir/schema/modelext.py#L134).

Ideally, we would have only one relationship: the one in mbdata. However, using the `meta` relationship leads to an error. Investigating the error, I found that the `meta` relationship is created with the `uselist=False` parameter. The logic in [iterate_path_values](https://github.com/metabrainz/sir/blob/80954d6953da4b74028b3b5a73fada5d81f910e0/sir/querying.py#L78-L85) does not support the [`ONETOONE`](https://docs.sqlalchemy.org/en/14/orm/basic_relationships.html#one-to-one) relationships that `uselist=False` creates. Until `iterate_path_values` is updated, we need to keep using the `asin` relationship. Accordingly, update the logic in the convert_release converter. This should likely speed up indexing as well.
convert_artist calls convert_area_inner on the area, which accesses the area.type.name and area.type.gid attributes, so eagerly load them for performance.
convert_label, used as the wscompat converter for the label core, also calls convert_area_inner, which accesses area.begin_date, area.end_date and area.ended, so eagerly load them. Also, add one more test case where these fields aren't NULL.
Add sort_name and comment for eager loading because those are accessed by convert_artist_simple. Change recording to entity0 because hybrid properties are not eagerly loaded correctly.
We load recording links anyway, so a `len` in Python is simpler.
When moving to another instance of RabbitMQ (mostly for master not mirror), the exchanges and queues have to be declared using SIR’s `amqp_setup` command, otherwise messages are just dropped by the new RabbitMQ instance. This step was missing in the documentation.