
Use relational database #2311

Open · 4 of 27 tasks
jsangmeister opened this issue Mar 18, 2024 · 3 comments

@jsangmeister (Contributor) commented Mar 18, 2024

I would make the following suggestions regarding the change to the relational DB in the backend:

  • Do the work on a feature branch feature/relational-db
  • Revert some changes to the models.yml: The plan was to leave the relation handling in the backend untouched in the first step. With the current changes to the models.yml, this is no longer possible, as many "to" fields have been removed. I think we should keep the models file backwards-compatible, since this will not only keep the relation handling intact, but also simplify the update of the generate-models script and the general handling of the changes. This means re-introducing all removed relation metadata.
    • From this, it also follows that we do not want the n:m tables in the models.py, as the backend code is not equipped to handle them. The write adapter must therefore get the required information from the additional fields in the models.yml as well as from the helper functions supplied by the meta repo.
    • Document the schema of the models.yml.
    • Afterwards, the generate-models script can be updated. In the first step, it should suffice to allow the new meta attributes and simply write them to the models.py as-is. If we later figure out a smarter way to handle them, we can still change it.
  • Port the existing datastore code to the backend.
  • Create exactly one transaction per backend request so that all queries (read and write) run inside this transaction (see the transaction sketch below the list)
    • I don't think we need to implement a new postgres adapter; it should suffice to simply use the one from the datastore. The adapter should probably be added as a new service to the service declaration container.
  • Adjust the read methods to not use the datastore reader, but to access the database directly via the adapter (see the read sketch below the list). Since the views, including the connection tables and calculated fields, always have the same name as the collection, this should be rather easy. For all filter-related methods, the datastore already provides helper methods to translate the Filter classes to SQL queries, but they have to be updated to the new database definition. We should copy all required methods to the backend (anticipating that more may follow, so a smart directory placement should be chosen), since the datastore will be removed later anyway.
  • The above point includes the following methods:
    • get
    • get_many
    • get_all
    • filter
    • exists
    • count
    • min
    • max
  • Adjust the write methods to write directly to the database. This includes the following methods:
    • write: The new procedure will be the following:
      • Extract all relevant data from the write request (i.e., all fields which are physically saved in the DB and not represented by a calculated view field) and drop the rest
      • Translate the extracted data into the relational format by filling the n:m tables (and making other necessary changes); see the write sketch below the list
      • Write the data (still inside the request's transaction) into the appropriate tables
      • Create a new position in the old positions table to save the history information in. The events are irrelevant now and can be dropped.
    • reserve_ids: Use the mechanisms supplied by postgres to mimic the old behaviour of reserving ids (@r-peschke probably has a better idea of how to do that); see the sequence sketch below the list
    • truncate_db: This is already implemented as a helper method in the meta repository; dropping the database and rebuilding it from a template seems to be the fastest way to do this. The method is only required for development/testing purposes.
  • Adjust the migrations (EDIT: see comment below - these points may be partly obsolete):
    • Implement a new migration type RelationalMigration which should work on the new relational database schema
    • Implement the option to migrate from the models-based schema to the relational one - hopefully, there is a smart way to combine this with the previous bullet point
    • Implement the actual migration to move all data to a relational model. This needs to be tested extensively, both automated and with actual user data, to ensure that we don't lose data or break anything here.
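
For the transaction handling above, here is a minimal sketch of how a per-request transaction could be wrapped around the adapter's connection. It assumes a psycopg2-style connection object; the function name and the adapter.connection attribute are hypothetical:

```python
from contextlib import contextmanager


@contextmanager
def request_transaction(connection):
    """Run all queries of one backend request inside a single transaction:
    commit on success, roll back on any error."""
    cursor = connection.cursor()
    try:
        yield cursor
        connection.commit()
    except Exception:
        connection.rollback()
        raise
    finally:
        cursor.close()


# Hypothetical usage in a request handler, with the adapter taken from the
# service container:
# with request_transaction(adapter.connection) as cursor:
#     cursor.execute("SELECT ...")
```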
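
For the read side, a rough sketch of what a direct get could look like, querying the view that shares the collection's name. All names, including the exception class, are assumptions for illustration:

```python
class ModelDoesNotExist(Exception):
    """Hypothetical stand-in for the backend's "model not found" error."""


def get(cursor, collection: str, id_: int, mapped_fields: list[str] | None = None) -> dict:
    """Fetch one model directly from the view named after its collection.

    Identifiers are interpolated only for brevity; real code should quote
    them safely (e.g. with psycopg2.sql.Identifier).
    """
    columns = ", ".join(mapped_fields) if mapped_fields else "*"
    cursor.execute(f"SELECT {columns} FROM {collection} WHERE id = %s", (id_,))
    row = cursor.fetchone()
    if row is None:
        raise ModelDoesNotExist(f"{collection}/{id_}")
    names = [description[0] for description in cursor.description]
    return dict(zip(names, row))
```

get_many, get_all, filter, exists, count, min and max would follow the same pattern, with the filter-related ones reusing the datastore's Filter-to-SQL helpers.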
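
For the write translation step, a sketch of how a single n:m list field could be split out of a flat write request into rows for an intermediate table. The table and column names are made up:

```python
def split_nm_field(instance: dict, field: str, table: str, own_column: str, other_column: str):
    """Pop an n:m list field (e.g. group_ids) off the instance and build the
    INSERT parameters for the intermediate table, since the list itself is
    no longer physically stored on the model."""
    other_ids = instance.pop(field, None) or []
    sql = f"INSERT INTO {table} ({own_column}, {other_column}) VALUES (%s, %s)"
    params = [(instance["id"], other_id) for other_id in other_ids]
    return sql, params


# Hypothetical usage while writing a user:
# sql, params = split_nm_field(user, "group_ids", "nm_user_group", "user_id", "group_id")
# cursor.executemany(sql, params)
```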
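
For reserve_ids, per-collection sequences might be the postgres mechanism to use. A sketch, assuming one sequence per collection named <collection>_id_seq (@r-peschke may well know a better approach):

```python
def reserve_ids(cursor, collection: str, amount: int) -> list[int]:
    """Reserve `amount` ids by drawing them from a per-collection sequence.

    Postgres never rolls back sequence values, so the ids stay reserved
    even if the surrounding transaction fails, which fits the semantics
    of a reservation.
    """
    cursor.execute(
        "SELECT nextval(%s) FROM generate_series(1, %s)",
        (f"{collection}_id_seq", amount),
    )
    return [row[0] for row in cursor.fetchall()]
```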

Let me know if you see anything differently or if I'm missing something.


@jsangmeister (Contributor, Author) commented:

Some thoughts about the migrations and how they could be designed:

1) New migration class inside the old migration framework

Besides the EventMigrater and the ModelMigrater classes, we'll need a third type (RelationalMigrater or similar; see the skeleton below). It needs to be implemented in a conventional and a memory-only version. The MigrationReader needs to be adapted so that both versions work with the relational database or, alternatively, use the new read adapter of the backend. All of this must work in addition to the existing migrations. The new migrater must offer some way to execute DDL commands; these can simply be ignored in the memory-only version.
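
To make the shape of this third migrater type more tangible, a rough skeleton; all names are hypothetical and only mirror the description above:

```python
from abc import ABC, abstractmethod


class RelationalMigration(ABC):
    """A migration that operates on the new relational schema."""

    target_migration_index: int

    @abstractmethod
    def migrate(self, reader: "MigrationReader") -> None:
        """Read data via the adapted MigrationReader (or the backend's new
        read adapter) and write the migrated data back."""


class RelationalMigrater:
    """Conventional version: executes migrations against the real database."""

    def execute_ddl(self, statement: str) -> None:
        ...  # run the DDL statement on the database


class MemoryRelationalMigrater(RelationalMigrater):
    """Memory-only version: DDL commands are simply ignored."""

    def execute_ddl(self, statement: str) -> None:
        pass
```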

Furthermore, the migrations themselves need a new format. The event migrations migrated the data with the help of keyframes, taking the current list of events for a position and returning the new list of events. The model migrations took a different approach: they receive no input (only, implicitly, the whole content of the datastore) and return a list of traditional request events (the kind the backend sends to the datastore), which are then applied to the models. The new migration type will work similarly to the latter, but its return type still needs to be determined. This depends on the future of the history: If the history will be created solely via database triggers and functions, it suffices for the migration to write its output directly to the respective tables. If the history is created via some external backend code, the migration will have to funnel its data through the same mechanism. In that case, it might prove beneficial to use a return type which this mechanism already accepts and leave it to the RelationalMigrater to send the resulting data through it.

The first relational migration can be implemented within this framework if it is given some freedom in how it reads data, since it cannot use the new way of reading via the relational tables, but must use the old one. This could be done by manually patching this migration to use the old MigrationReader, or by executing SQL queries directly and iterating over all events that way. I think the latter might be simpler and more flexible (see the sketch below).
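
The direct-SQL variant could look roughly like this; the table and column names are assumptions about the old datastore schema, not the actual definition:

```python
def iter_events(cursor, chunk_size: int = 1000):
    """Stream all events in insertion order without loading everything
    into memory at once."""
    cursor.execute("SELECT position, fqid, type, data FROM events ORDER BY id")
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            return
        yield from rows
```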

Keeping the existing migration framework requires keeping the positions table as well, since it stores the migration_index of all data. Additionally, this table currently contains the history information, although that could be moved, as it's not relevant to the migrations. On a related note: if this version is chosen, I would suggest not removing the old tables during the initial relational migration, but at a later point (some migrations down the line). This way, the data can still be accessed by later migrations when errors in the initial relational migration are found (and, let's be honest, there will be errors).

All in all, while it's possible to add a third migration type to the system as described, it seems very cumbersome. The whole migration system was designed specifically with the event store in mind, and many concepts and methods are not useful or applicable to the other migration types, which I already noticed while writing the model migrations. For example, it is now conceptually impossible for positions to have different migration indices, so it no longer makes sense to save these per position; also, the whole "migrate vs. finalize" distinction is not applicable to model or relational migrations. Therefore, I think it's worth debating whether the current migration system should be kept or whether it's better to implement a new one, specifically designed for the relational database (maybe even using a pre-existing framework/package). I will present two options for that in the following.

2) The OS4 way: Export + import

There was no migration from OS3 to OS4. The only option to migrate the data was to export the OS3 instance, create a new OS4 instance and then import the old data as a meeting. I don't really think we should repeat this, since it's not very user-friendly, but from an implementation perspective it would be a clean cut and make the development of the new framework easiest. Since the new relational database will (probably?) come with a new major version anyway, this could theoretically be an option, but I'd be more inclined to use the third option below.

3) Edge migration

For lack of a better name, I'm calling this "edge migration". Let version 4.x be the last OpenSlides release before OS5. (For simplicity, I'm assuming here that OS5 is the first version with the relational database, but feel free to replace the number with any other.) With this concept, we implement a completely new migration framework in OS5, specifically designed for the relational DB, with a first migration that ports all the data from the old database tables to the relational ones. It is therefore not capable of executing the old migrations. Consequently, all instances must first be upgraded to version 4.x and then further to version 5. For older instances, this means a little more maintenance, but if it is communicated and documented well, it should pose no problem. The instances hosted by us will probably be upgraded to 4.x beforehand anyway. In the code, we can ensure that no older version is directly upgraded to OS5 by checking the migration index of the data, which must match that of the last model migration in version 4.x (it can be hardcoded, since it will never change; see the sketch below).
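
This version guard could be a trivial check against a hardcoded index; the number below is made up:

```python
# Migration index of the last model migration shipped with 4.x; since 4.x
# is frozen, this value can be hardcoded (57 is purely illustrative).
REQUIRED_SOURCE_MIGRATION_INDEX = 57


def assert_upgradable(current_migration_index: int) -> None:
    """Refuse to start the OS5 migrations on data that was not fully
    migrated to 4.x first."""
    if current_migration_index != REQUIRED_SOURCE_MIGRATION_INDEX:
        raise RuntimeError(
            f"Data must be migrated to OpenSlides 4.x (migration index "
            f"{REQUIRED_SOURCE_MIGRATION_INDEX}) before upgrading to OS5; "
            f"found index {current_migration_index}."
        )
```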

This solution tries to combine the best of both worlds: a fresh start for the migration framework while keeping the update process as user-friendly as possible under these restrictions. It would also make it easier to use an existing relational migration framework, which would reduce the implementation work and the error-proneness. Therefore, I would favor this option over the others.


A side note on memory migrations: These will probably be increasingly tricky to implement, independent of the chosen approach, since the data stored in the database no longer consists of simple JSON blobs. An alternative might be to not run the migrations in-memory, but to create an explicit database, fill it with the content to migrate and execute the migrations directly on it. I don't know how feasible this is, though (maybe creating and filling the database takes too long?). Just an idea to try out.

@rrenkert assigned reicda and unassigned luisa-beerboom on May 28, 2024