Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Better migration architecture #366

Open
merlinz01 opened this issue Nov 21, 2024 · 4 comments
Open

RFC: Better migration architecture #366

merlinz01 opened this issue Nov 21, 2024 · 4 comments
Milestone

Comments

@merlinz01
Copy link
Contributor

From my experience using aerich in a production system, the current migration approach leaves much to be desired.

Here are my complaints with the current architecture:

  • Using raw SQL is not database-independent, so you are tied to using the same database type for testing and production.
  • Sometimes the SQL commands are out of order.
  • You can't do custom migration logic. All you can do is tweak a generated SQL script.

This is what I've come up with as a better approach:

I think a migration file should consist of the following items:

  • The current and previous migration ID
  • The old schema as a dict literal
  • The new schema as a dict literal
  • A list of migration operations such as field renames, table creation, etc. with database-specific information only where absolutely unavoidable.
  • A function which can be used to do manual data migrations in Python or SQL, and its complement for downgrades

Benefits of this approach:

  • Database-agnostic
  • A clear way of describing what is going to happen to the database without having to mess with SQL.
  • Custom upgrade and downgrade logic
  • The aerich tool can lint the operations so that the schema stays consistent even if a user edits the migration file.
  • You could ask aerich to generate an null migration file and then implement some data-only (non-schema) updates in the custom migration logic.

Conceptual example:

# This is a migration generated by the aerich tool.
# You may edit this file, but be warned that your changes will be overwritten
# if you run aerich migrate again without upgrading the database to this migration.

from tortoise import BaseDBAsyncClient

# This is the ID of the migration previous to this one, or None if this is the first.
PREVIOUS_MIGRATION_ID = '20241121101234'

# This is the ID of the migration, used to determine if the migration has been run.
MIGRATION_ID = '20241121142100'

# This is the schema of the database as of the last migration (maybe could be imported directly from previous migration?)
OLD_SCHEMA = {
    'MyModel': {
        'table_name': 'my_models',
        'fields': [
            'id': {
                'type': 'IntField',
                'pk': True,
                'null': False,
            },
            'name': {
                'type': 'CharField',
                'pk': False,
                'null': False,
                'max_len': 255,
            },
            ...
        ],
    },
    ....
}

# This is the schema of the database after this migration has been run.
NEW_SCHEMA = {
    'MyModel': {
        'table_name': 'my_models',
        'fields': [
            'id': {
                'type': 'IntField',
                'pk': True,
                'null': False,
            },
            'display_name': {
                'type': 'CharField',
                'pk': False,
                'null': False,
                'max_len': 255,
            },
            ...
        ],
    },
    ....
}

# These are directives to the migration executor about what the migration consists of.
MIGRATION_OPERATIONS = {
    {'type': 'rename_field', 'table': 'my_models', 'old_name': 'name', 'new_name': 'display_name'},
}

# This is where you can implement fancy migration logic if the simple cases don't meet your needs.
async def CUSTOM_MIGRATION_SCRIPT_UPGRADE(dbconn: BaseDBAsyncClient):
    # Example (pseudo-code):
    names = [(id, f"Hi, I'm {name}") for id, name in dbconn.execute('SELECT id, name FROM my_models').all()]

    # Wait for the migrations to be executed
    yield

    dbconn.executemany('UPDATE my_models SET display_name=? WHERE id=?', names)

# This is where you can implement the reverse logic in the same manner if you need to downgrade.
async def CUSTOM_MIGRATION_SCRIPT_DOWNGRADE(dbconn: BaseDBAsyncClient):
    yield

Migration pseudo-code:

# Evaluate the migration file
import migration_file

# Connect to the database
dbconn = get_db_conn()

# Check the previous migration ID against the database
if migration_file.PREVIOUS_MIGRATION_ID is not None and get_last_migration(dbconn) != migration_file.PREVIOUS_MIGRATION_ID:
    raise ValueError('You are running migrations out of order!')

# Make sure the operations update the schema correctly in case the user edited them
schema = migration_file.OLD_SCHEMA
for operation in migration_file.MIGRATION_OPERATIONS:
    modify_schema(schema, operation)
if schema != migration_file.NEW_SCHEMA:
    raise ValueError('The operations do not modify the schema correctly!')

# Run the first part of the user's upgrade logic
asyncgen = await migration_file.CUSTOM_MIGRATION_SCRIPT_UPGRADE(dbconn)
await anext(asyncgen)

# Run the migration operations
for operation in migration_file.MIGRATION_OPERATIONS:
    execute_migration_operation(operation, dbconn, downgrade=False)

# Run the second part of the user's upgrade logic
try:
    await anext(asyncgen)
except StopIteration:
    pass
else:
    raise RuntimeError('Only one yield statement is allowed!')

# Update the migrations
add_migration_to_migrations_table(dbconn, migration_file.MIGRATION_ID)

# Done
@merlinz01
Copy link
Contributor Author

Linking #76 #148 #243

@henadzit
Copy link
Contributor

@merlinz01 thank you for sharing your ideas!

A few comments/questions:

  • Having both OLD_SCHEMA and NEW_SCHEMA would make the migration files quite large
  • How would you populate NEW_SCHEMA in the case where you are starting from an empty (null) migration?
  • Have you looked into other migrations tools? For instance, Django's migrations or alembic? What do you think about their approach to solving migrations?

@merlinz01
Copy link
Contributor Author

In answer to your questions.

  1. Yes, having both old and new schemas would make the migration files significantly larger. The idea here is to reduce ambiguity for the migration tool, to eliminate the need for a database connection when generating a migration file, and to be able to check the operations against the new schema in case the user modifies one or the other. However, it should be workable to "import" the schema from the previous migration, because there is going to be trouble anyway if that file doesn't exist. So instead of OLD_SCHEMA = { ... } we should have something like OLD_SCHEMA = import_schema_from_migration_file(PREVIOUS_MIGRATION_ID).
  2. If a user is not making schema changes, aerich could just generate NEW_SCHEMA = None or MIGRATION_OPERATIONS = None and use that as a signal to skip the schema migration stuff.
  3. Django seems to employ a similar concept as far as defining operations instead of generating SQL. Sample Django migration files:
# Generated by Django 5.1.3 on 2024-11-25 20:41

from django.db import migrations, models


class Migration(migrations.Migration):

    initial = True

    dependencies = []

    operations = [
        migrations.CreateModel(
            name="TestModel",
            fields=[
                (
                    "id",
                    models.BigAutoField(
                        auto_created=True,
                        primary_key=True,
                        serialize=False,
                        verbose_name="ID",
                    ),
                ),
                ("name", models.CharField(max_length=100)),
                ("age", models.IntegerField()),
            ],
        ),
    ]
# Generated by Django 5.1.3 on 2024-11-25 20:42

from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        ("testmigrations", "0001_initial"),
    ]

    operations = [
        migrations.RenameField(
            model_name="testmodel",
            old_name="age",
            new_name="age2",
        ),
    ]

Alembic is fairly similar to what I came up with. I like the fact that it enables type-checking on the operations.

"""test

Revision ID: f0f068244d58
Revises: e8fa71e392c3
Create Date: 2024-11-25 16:08:05.689241

"""
from typing import Sequence, Union

from alembic import op
import sqlalchemy as sa


# revision identifiers, used by Alembic.
revision: str = 'f0f068244d58'
down_revision: Union[str, None] = 'e8fa71e392c3'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def upgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    op.add_column('test_models', sa.Column('test', sa.String(length=20), nullable=False))
    # ### end Alembic commands ###


def downgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_column('test_models', 'test')
    # ### end Alembic commands ###

@waketzheng waketzheng added this to the v0.9 milestone Dec 12, 2024
@merlinz01
Copy link
Contributor Author

Dropping this link here mostly for interest's sake: https://github.com/gobuffalo/fizz/blob/main/README.md

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants