Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Squash Alembic migrations to improve test execution time #17590

Open
DarkaMaul opened this issue Feb 11, 2025 · 2 comments
Open

Squash Alembic migrations to improve test execution time #17590

DarkaMaul opened this issue Feb 11, 2025 · 2 comments
Labels
developer experience Anything that improves the experience for Warehouse devs testing Test infrastructure and individual tests

Comments

@DarkaMaul
Copy link
Contributor

warehouse test suite currently executes 246 Alembic migrations sequentially during database setup, adding significant overhead to test execution times. I investigated squashing these migrations into a single migration that represents the current database state.

Using alembic's automatic migration feature provides a good starting point, but requires manual review to address limitations:

  1. SQL functions are not automatically exported
  2. Triggers are not included in the export
Configuration Full Tests API Tests
With Squash 19.69s 3.12s
Without Squash 21.75s 3.66s

Note: Currently experiencing 128 failures out of 4402 tests in the full test suite, while API tests are passing successfully.

Proposed Implementation:

  1. Preserve historical migrations by moving them to migrations/old/
  2. Replace current migrations with a single consolidated migration in migrations/versions/

Primary Consideration:

The main tradeoff is losing the ability to easily roll back to intermediate database states. We could mitigate this by choosing an earlier cutoff date for the consolidated migration if needed.

I'm opening this issue to discuss if we should proceed forward before trying to fix every edge case in the migration.

/cc @miketheman @woodruffw

@miketheman miketheman added testing Test infrastructure and individual tests developer experience Anything that improves the experience for Warehouse devs labels Feb 11, 2025
@miketheman
Copy link
Member

I am generally 👍 on things that go faster, so yay!

When it comes to alembic and initial DB state, that's where it's a little tricky.

For the functions and triggers not represented - is that due to an alembic limitation, or some other reason? Do you have a sense of which those are? I'm curious if they predate warehouse or exist for some other reason.

Considering that all migrations live in git history, do we need to preserve them at all? I routinely search git history with git log -p -S <somestring> to find the commits with that text, and then can use other techniques to browse from there.

The main tradeoff is losing the ability to easily roll back to intermediate database states. We could mitigate this by choosing an earlier cutoff date for the consolidated migration if needed.

Possibly, however it's rare that we roll back the db more than a single migration, usually due to some deployment issue or other error.

I think the Most Important Part of this kind of effort is that running the new "base" migration should be a no-op in production, and since the dev db doesn't fully represent the production database, that's a risk we'll have to consider and mitigate..

@DarkaMaul
Copy link
Contributor Author

I finished running the migration today and fixed the tests - everything is green now, but there is a slight coverage difference I still need to investigate.

There are 11 triggers in the database:

  • projects_update_sitemap_bucket
  • projects_update_normalized_name
  • accounts_user_update_sitemap_bucket
  • update_user_password_date
  • update_project_last_serial
  • release_files_requires_python
  • releases_requires_python
  • update_project_total_size_release_files
  • releases_update_is_prerelease
  • normalize_prohibited_project_names (this one was named normalize_blacklist but I renamed it to keep it consistent with the new table name)

They are used to call functions before / after events on certain table:

CREATE TRIGGER normalize_prohibited_project_names
            BEFORE INSERT OR UPDATE ON prohibited_project_names
            FOR EACH ROW EXECUTE PROCEDURE ensure_normalized_prohibited_project_names();
    )

Alembic does not support detecting triggers or procedures. alembic-utils does but you need to register them using their functions (which was not the case in warehouse)

What was needed:

  • adding procedures and triggers
  • adding extensions
  • adding certain tables values ( admin_flags, prohibited_project_names)
  • the Python script generated was slightly invalid (missing imports)
  • adding a constraint that had not been detected (ck_disallow_private_top_level_classifier)

With the newly squashed migration all green, the tests are now even faster (~17.2s) 🎉

I think the Most Important Part of this kind of effort is that running the new "base" migration should be a no-op in production, and since the dev db doesn't fully represent the production database, that's a risk we'll have to consider and mitigate..

This should definitely be taken into account - but I don't know how I can help there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
developer experience Anything that improves the experience for Warehouse devs testing Test infrastructure and individual tests
Projects
None yet
Development

No branches or pull requests

2 participants