Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: make catalog migration lenient #29549

Merged
merged 2 commits into from
Jul 11, 2024

Conversation

betodealmeida
Copy link
Member

@betodealmeida betodealmeida commented Jul 10, 2024

SUMMARY

The function upgrade_catalog_perms is used to add catalog-related permissions when support for catalogs is introduced in a DB engine spec:

The migration needs to query the analytical databases in order to read the schemas, so that new permissions can be added. It might be possible that during the migration the analytical database is offline, resulting in an exception.

This PR wraps the call to database.get_all_schema_names in a try/except block, to prevent the migration from failing. If a given database is offline during the migration, the script will read the existing known schemas from schema_access, so that the migration can succeed.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TESTING INSTRUCTIONS

N/A

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@betodealmeida betodealmeida requested a review from a team as a code owner July 10, 2024 18:59
@github-actions github-actions bot added risk:db-migration PRs that require a DB migration preset-io labels Jul 10, 2024
@betodealmeida
Copy link
Member Author

@sadpandajoe:

  • Updated description with new approach
  • Added unit test

Copy link
Member

@Antonio-RiveroMartnez Antonio-RiveroMartnez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@michael-s-molina
Copy link
Member

@gooroodev please review

@gooroodev
Copy link

Please double-check what I found in the pull request:

🐞Mistake 🤪Typo 🚨Security 🚀Performance 💪Best Practices 📖Readability ❓Others
0 0 0 0 1 1 0

Summary of Proposed Changes

  • 🛠️ Added Session import from sqlalchemy.orm.
  • 🛠️ Introduced get_schemas function to read all known schemas from schema permissions.
  • 🛠️ Refactored upgrade_catalog_perms to use upgrade_schema_perms for schema permission updates.
  • 🛠️ Added upgrade_schema_perms function to handle schema permission updates, including a fallback if the database connection fails.
  • 🛠️ Refactored downgrade_catalog_perms to use downgrade_schema_perms for schema permission updates.
  • 🛠️ Added downgrade_schema_perms function to handle schema permission downgrades.
  • ➕ Added unit test test_upgrade_catalog_perms_graceful to ensure the upgrade process is lenient when the database connection fails.

Identified Issues

ID Type Details Severity Confidence
1 💪Best Practices Broad exception handling in upgrade_schema_perms and downgrade_schema_perms 🟠Medium 🟠Medium
2 📖Readability Inline SQL query in get_schemas could be more readable with better formatting 🟡Low 🟡Low

Issue 1: Broad exception handling in upgrade_schema_perms and downgrade_schema_perms

Explanation:
Broad exception handling can mask other potential issues and make debugging difficult.

File and Lines:

  • superset/migrations/shared/catalogs.py, lines 145-147 and 273-275

Suggested Fix:

def upgrade_schema_perms(database: Database, catalog: str, session: Session) -> None:
    """
    Rename existing schema permissions to include the catalog.
    """
    ssh_tunnel = DatabaseDAO.get_ssh_tunnel(database.id)
    try:
        schemas = database.get_all_schema_names(
            catalog=catalog,
            cache=False,
            ssh_tunnel=ssh_tunnel,
        )
    except DatabaseConnectionError as e:  # Replace with specific exception
        logging.error(f"Database connection failed: {e}")
        schemas = get_schemas(database.database_name)

    for schema in schemas:
        perm = security_manager.get_schema_perm(
            database.database_name,
            None,
            schema,
        )
        existing_pvm = session.query(ViewMenu).filter_by(name=perm).one_or_none()
        if existing_pvm:
            existing_pvm.name = security_manager.get_schema_perm(
                database.database_name,
                catalog,
                schema,
            )

def downgrade_schema_perms(database: Database, catalog: str, session: Session) -> None:
    """
    Rename existing schema permissions to omit the catalog.
    """
    ssh_tunnel = DatabaseDAO.get_ssh_tunnel(database.id)
    try:
        schemas = database.get_all_schema_names(
            catalog=catalog,
            cache=False,
            ssh_tunnel=ssh_tunnel,
        )
    except DatabaseConnectionError as e:  # Replace with specific exception
        logging.error(f"Database connection failed: {e}")
        schemas = get_schemas(database.database_name)

    for schema in schemas:
        perm = security_manager.get_schema_perm(
            database.database_name,
            catalog,
            schema,
        )
        existing_pvm = session.query(ViewMenu).filter_by(name=perm).one_or_none()
        if existing_pvm:
            existing_pvm.name = security_manager.get_schema_perm(
                database.database_name,
                None,
                schema,
            )

Explanation of Fix:
Replace the broad Exception with a more specific exception type, such as DatabaseConnectionError, to handle only the expected error and log the error message for better debugging.

Issue 2: Inline SQL query in get_schemas could be more readable with better formatting

Explanation:
The inline SQL query in get_schemas can be formatted for better readability.

File and Lines:

  • superset/migrations/shared/catalogs.py, lines 94-101

Suggested Fix:

def get_schemas(database_name: str) -> list[str]:
    """
    Read all known schemas from the schema permissions.
    """
    query = f"""
    SELECT
        avm.name
    FROM
        ab_view_menu avm
    JOIN
        ab_permission_view apv ON avm.id = apv.view_menu_id
    JOIN
        ab_permission ap ON apv.permission_id = ap.id
    WHERE
        avm.name LIKE '[{database_name}]%' AND
        ap.name = 'schema_access';
    """
    # [PostgreSQL].[postgres].[public] => public
    return sorted({row[0].split(".")[-1][1:-1] for row in op.execute(query)})

Explanation of Fix:
Reformat the SQL query to improve readability by aligning the clauses and keywords.

General Review of Code Quality and Style

The proposed changes generally improve the code by making the migration process more lenient and adding tests to ensure robustness. However, the broad exception handling should be replaced with more specific exceptions to avoid masking other potential issues. Additionally, minor improvements in SQL query formatting can enhance readability.

--
I only arrive when I am mentioned and asked to review the pull request.
Please reply or add a reaction to this review.

@betodealmeida
Copy link
Member Author

The broad except is needed because it could happen with any of the 50+ drivers we support.

@betodealmeida betodealmeida merged commit d535f3f into master Jul 11, 2024
36 of 37 checks passed
eschutho pushed a commit that referenced this pull request Jul 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
preset-io risk:db-migration PRs that require a DB migration size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants