Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(robot-server): Avoid features that will be removed in SQLAlchemy 2.0 #16926

Open
wants to merge 4 commits into
base: EXEC-655-store-commands-error-list-in-db
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions robot-server/pytest.ini
Original file line number Diff line number Diff line change
Expand Up @@ -4,3 +4,8 @@ markers =
ot3_only: Test only functions using the OT3 hardware
addopts = --color=yes --strict-markers
asyncio_mode = auto

# Don't allow any new code that uses features removed in SQLAlchemy 2.0.
# We should remove this when we upgrade to SQLAlchemy 2.0.
filterwarnings =
error::sqlalchemy.exc.RemovedIn20Warning
42 changes: 40 additions & 2 deletions robot-server/robot_server/persistence/_migrations/_util.py
Original file line number Diff line number Diff line change
@@ -1,10 +1,11 @@
"""Shared helpers for migrations."""

import sqlalchemy

import shutil
import typing
from pathlib import Path

import sqlalchemy

from ..database import sqlite_rowid


Expand Down Expand Up @@ -54,3 +55,40 @@ def copytree_if_exists(src: Path, dst: Path) -> None:
shutil.copytree(src=src, dst=dst)
except FileNotFoundError:
pass


def add_column(
engine: sqlalchemy.engine.Engine,
table_name: str,
column: typing.Any,
) -> None:
"""Add a column to an existing SQL table, with an `ALTER TABLE` statement.

Params:
engine: A SQLAlchemy engine to connect to the database.
table_name: The SQL name of the parent table.
column: The SQLAlchemy column object.

Known limitations:

- This does not currently support indexes.
- This does not currently support constraints.
- The column will always be added as nullable. Adding non-nullable columns in
SQLite requires an elaborate and sensitive dance that we do not wish to attempt.
https://www.sqlite.org/lang_altertable.html#making_other_kinds_of_table_schema_changes

To avoid those limitations, instead of this function, consider this:

1. Start with an empty database, or drop or rename the current table.
2. Use SQLAlchemy's `metadata.create_all()` to create an empty table with the new
schema, including the new column.
3. Copy rows from the old table to the new one, populating the new column
however you please.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, this is what Alembic is for.

But I wonder if there's a way to use Alembic as a fancy SQL generator for diffs without buying into the full Alembic ecosystem.

Copy link
Contributor Author

@SyntaxColoring SyntaxColoring Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, this is what Alembic is for.

That is what we thought, but I looked into it when working on this, and I was underwhelmed. Alembic does implement this dance, and that is appealing. But:

First, like you said, there's a full Alembic ecosystem that we'd have to contend with. In particular, we would probably want to run Alembic "inside" our existing migration system. (It can't completely replace our own migration system, because we need to account for migrating regular files, and Alembic can only help with the schema inside the .db file.) Embedding it like that...seems messy? Like, one hypothetical way for it to work would be if Alembic gave us a standalone util function like alembic.add_all_column_constraints(command_table.c.command_status), and we'd call that from within our existing system, but it doesn't seem like Alembic works like that.

Second, as far as I can tell, Alembic leaves us on our own for the data part of these migrations, e.g. populating the new non-nullable column. They recommend some general patterns in sqlalchemy/alembic#972 (reply in thread) and https://alembic.sqlalchemy.org/en/latest/cookbook.html#data-migrations-general-techniques. Those patterns strike me as their own dances that are only marginally better. Especially if we need to rearchitect to fit into the Alembic ecosystem just for the privilege of using them.

But I could definitely have big misconceptions about all of this. I've never actually used Alembic for real. If you want to make a sketch or proof of concept to show what it would look like, I'd definitely love something better than what we have now.

But I wonder if there's a way to use Alembic as a fancy SQL generator for diffs without buying into the full Alembic ecosystem.

Yeah. I'm not sure if this what you're getting at, but there is https://alembic.sqlalchemy.org/en/latest/autogenerate.html, and we might be able to combine that with https://alembic.sqlalchemy.org/en/latest/offline.html. So one option is to run Alembic once on our laptops to autogenerate the ALTER TABLE dance, and then manually integrate that with our data migrations. Is that what you have in mind?

"""
column_type = column.type.compile(engine.dialect)
with engine.begin() as transaction:
transaction.execute(
sqlalchemy.text(
f"ALTER TABLE {table_name} ADD COLUMN {column.key} {column_type}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we use schemas here? Most of the SqlAlchemy code I've seen before passes around a tuple of (schema, table_name) for functions like this. Or is the schema included in the table_name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure this is what you mean by schema in this context, but we represent the SQL schema in terms of Python objects with a sqlalchemy.Metadata. That lives in, e.g. robot_server.persistence.tables.schema_8.

What would passing (schema, table_name) do?

Copy link
Contributor

@ddcc4 ddcc4 Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry, the word "schema" is way too overloaded in the programming world.

For organizing big projects, databases like PostgreSQL let you create folders/directories/namespaces/whatever-you-want-to-call-them to divide up your data. PostgreSQL calls these folders "schemas" (which have nothing to do with the colloquial use of "schema" to refer to the shape of a table). And tables, enum definitions, server-side functions, etc., all live inside a schema.

So to refer to a table in PostgreSQL, you would need its fully qualified name, like:
SELECT something FROM myschema.mytable WHERE ...

The practical upshot is that in the code I worked on, whenever you pass a table name around, you would also need to pass the schema name around. https://docs.sqlalchemy.org/en/20/core/metadata.html#specifying-the-schema-name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaaaah, gotcha, thank you.

No, we don't use that kind of schema. Our .db file has only one, implicit, "main" schema. In SQLite, the myschema.mytable syntax is used for when you're opening multiple .db files in the same connection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In SQLite, the myschema.mytable syntax is used for when you're opening multiple .db files in the same connection.

Oh neat! Then as a sidenote, I think that would let us solve the TODO in your copy_rows_unmodified() (where you wanted to avoid pulling the whole DB into Python/SqlAlchemy and then writing it back out). You could open both the source and destination tables in the same connection, then do INSERT INTO new_table SELECT * FROM old_table, and have the copy be done entirely inside the SQL engine.

)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, if you want to try something cute, folks online seem to recommend constructing the statement like this:

compiler = engine.dialect.ddl_compiler(engine.dialect, None)
column_specification = compiler.get_column_specification(column)
...execute(f"ALTER TABLE {table_name} ADD COLUMN {column_specification}")

to have SqlAlchemy's compiler generate the column definition for you.

11 changes: 1 addition & 10 deletions robot-server/robot_server/persistence/_migrations/v3_to_v4.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@

import sqlalchemy

from ._util import add_column
from ..database import sql_engine_ctx
from ..file_and_directory_names import DB_FILE
from ..tables import schema_4
Expand All @@ -35,16 +36,6 @@ def migrate(self, source_dir: Path, dest_dir: Path) -> None:
dest_engine = exit_stack.enter_context(sql_engine_ctx(dest_db_file))
schema_4.metadata.create_all(dest_engine)

def add_column(
engine: sqlalchemy.engine.Engine,
table_name: str,
column: Any,
) -> None:
column_type = column.type.compile(engine.dialect)
engine.execute(
f"ALTER TABLE {table_name} ADD COLUMN {column.key} {column_type}"
)

add_column(
dest_engine,
schema_4.analysis_table.name,
Expand Down
14 changes: 1 addition & 13 deletions robot-server/robot_server/persistence/_migrations/v4_to_v5.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,10 +9,8 @@
from pathlib import Path
from contextlib import ExitStack
import shutil
from typing import Any

import sqlalchemy

from ._util import add_column
from ..database import sql_engine_ctx
from ..file_and_directory_names import DB_FILE
from ..tables import schema_5
Expand All @@ -35,16 +33,6 @@ def migrate(self, source_dir: Path, dest_dir: Path) -> None:
dest_engine = exit_stack.enter_context(sql_engine_ctx(dest_db_file))
schema_5.metadata.create_all(dest_engine)

def add_column(
engine: sqlalchemy.engine.Engine,
table_name: str,
column: Any,
) -> None:
column_type = column.type.compile(engine.dialect)
engine.execute(
f"ALTER TABLE {table_name} ADD COLUMN {column.key} {column_type}"
)

add_column(
dest_engine,
schema_5.protocol_table.name,
Expand Down
12 changes: 1 addition & 11 deletions robot-server/robot_server/persistence/_migrations/v6_to_v7.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,10 +11,10 @@
from pathlib import Path
from contextlib import ExitStack
import shutil
from typing import Any

import sqlalchemy

from ._util import add_column
from ..database import sql_engine_ctx, sqlite_rowid
from ..tables import DataFileSourceSQLEnum, schema_7
from .._folder_migrator import Migration
Expand Down Expand Up @@ -44,16 +44,6 @@ def migrate(self, source_dir: Path, dest_dir: Path) -> None:

dest_transaction = exit_stack.enter_context(dest_engine.begin())

def add_column(
engine: sqlalchemy.engine.Engine,
table_name: str,
column: Any,
) -> None:
column_type = column.type.compile(engine.dialect)
engine.execute(
f"ALTER TABLE {table_name} ADD COLUMN {column.key} {column_type}"
)

add_column(
dest_engine,
schema_7.run_command_table.name,
Expand Down
18 changes: 6 additions & 12 deletions robot-server/robot_server/persistence/_migrations/v7_to_v8.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@

import sqlalchemy

from ._util import add_column
from ..database import sql_engine_ctx
from ..tables import schema_8
from .._folder_migrator import Migration
Expand Down Expand Up @@ -41,16 +42,6 @@ def migrate(self, source_dir: Path, dest_dir: Path) -> None:

dest_transaction = exit_stack.enter_context(dest_engine.begin())

def add_column(
engine: sqlalchemy.engine.Engine,
table_name: str,
column: Any,
) -> None:
column_type = column.type.compile(engine.dialect)
engine.execute(
f"ALTER TABLE {table_name} ADD COLUMN {column.key} {column_type}"
)

add_column(
dest_engine,
schema_8.run_command_table.name,
Expand All @@ -73,9 +64,12 @@ def add_column(
def _add_missing_indexes(dest_transaction: sqlalchemy.engine.Connection) -> None:
# todo(2024-11-20): Probably add the indexes missing from prior migrations here.
# https://opentrons.atlassian.net/browse/EXEC-827
dest_transaction.execute(
"CREATE UNIQUE INDEX ix_run_run_id_command_status_index_in_run ON run_command (run_id, command_status, index_in_run);"
index = next(
index
for index in schema_8.run_command_table.indexes
if index.name == "ix_run_run_id_command_status_index_in_run"
)
index.create(dest_transaction)


def _migrate_command_table_with_new_command_error_col_and_command_status(
Expand Down
Loading