Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#53: Create a deployment script #54

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

benedeki
Copy link
Contributor

@benedeki benedeki commented Sep 18, 2023

  • deployment script
  • README.md for the deployment script

Closes #53

* deployment script
@benedeki benedeki added the work in progress Work on this item is not yet finished (mainly intended for PRs) label Sep 18, 2023
@github-actions
Copy link

JaCoCo code coverage report - scala 2.11.12

There is no coverage information present for the Files changed

Total Project Coverage 48.81% 🍏

@github-actions
Copy link

JaCoCo code coverage report - scala 2.12.17

There is no coverage information present for the Files changed

Total Project Coverage 48.96% 🍏

@benedeki benedeki marked this pull request as ready for review September 18, 2023 15:51
@benedeki benedeki removed the work in progress Work on this item is not yet finished (mainly intended for PRs) label Sep 18, 2023
@benedeki benedeki self-assigned this Sep 18, 2023
Copy link
Contributor

@miroslavpojer miroslavpojer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • pull
  • code review
  • run

scripts/deployment/deploy.py Outdated Show resolved Hide resolved
scripts/deployment/deploy.py Outdated Show resolved Hide resolved
scripts/deployment/deploy.py Outdated Show resolved Hide resolved
scripts/deployment/deploy.py Outdated Show resolved Hide resolved
scripts/deployment/deploy.py Outdated Show resolved Hide resolved
scripts/deployment/deploy.py Outdated Show resolved Hide resolved
scripts/deployment/deploy.py Outdated Show resolved Hide resolved
scripts/deployment/deploy.py Outdated Show resolved Hide resolved
scripts/deployment/deploy.py Outdated Show resolved Hide resolved
scripts/deployment/deploy.py Outdated Show resolved Hide resolved
scripts/deployment/deploy.py Outdated Show resolved Hide resolved
scripts/deployment/deploy.py Outdated Show resolved Hide resolved
scripts/deployment/deploy.py Outdated Show resolved Hide resolved
elif input_file.endswith(".sql"):
functions.append(read_file(schema_dir + input_file))
elif input_file.endswith(".ddl"):
tables.append(read_file(schema_dir + input_file))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

potential optimization could be to only get the file names in this function, and then the loading of their content could be done in the same way as on line "\n".join(map(read_file, init_sqls)), so it would be consistent and a bit more memory efficient

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More consitent maybe.
More memory efficient why?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More memory efficient why?

because now it has to be hold in memory from this point until it get's garbage collected which is probably after some point of running the sql statements.

If read only in that join function, then the memory requirements are the same but it will be held in memory probably for shorter time. Not a big deal in this scenario I know

benedeki and others added 4 commits September 20, 2023 19:27
Co-authored-by: Ladislav Sulak <[email protected]>
Co-authored-by: miroslavpojer <[email protected]>
* renamed `deploy.py` to `deploy_pg.py`
* added example fo the expected structure to the `README.md`
* included `requirements.txt`
Copy link
Collaborator

@lsulak lsulak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried this script on AUL DB files, and it worked when. However, repetitive runs don't work - generally speaking there is CREATE TABLE X and not CREATE TABLE IF NOT EXISTS X - detecting changes on tables would be complicated for this script and its intended (temporary) use = so we won't be able to replace / update tables with this script but that's okay - besides, functions will be replaced, that's good.

But I feel that this script shouldn't fail and we should be able to replace the implementation of DB functions with it - but now the script fails saying that a table already exists - we could capture this exception psycopg2.errors.DuplicateTable maybe? Or even better, perhaps we could ignore Initializing the database... step and it could be driven by some CLI param.?

So I propose to have this:

  • CREATE DB - no by default, but can be enabled
  • INIT DB - no by default, but can be enabled - this would need one more CLI parameter
  • Populate schemas - yes by default, cannot be disabled
    parser.add_argument(
        "--init-db",
        action="store_true",
        help="initializes the target database (runs the scripts ending with '.ddl' that don't start with '00_'"
    )

return file.read()


def process_dir(directory: str, conn_config: PostgresDBConn, create_db: bool) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i still think that this function is a bit too long / doing too much, I would definitely split the processing of those 3 file categories into 3 separate functions - but I don't insist since this is a temporary script, still very useful.

An obvious improvement would be also to check whether the CLI DB name and DB name from the SQL files are matching, or to have the SQL files as Jinja2 templates and insert all necessary values dynamically. But that would be probably too much for this script & now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are exactly right in all what you write.
And also have good reasoning why not to do it. Hopefully we will have good deployment tool, after...

Copy link
Collaborator

@lsulak lsulak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • code reviewed
  • pulled
  • built
  • ran on my machine against my local PG db with Ursa Unify DB objects

found a few issues and I would love one thing to implement (repetitive run support), otherwise I'm happy to approve

Copy link
Contributor

@miroslavpojer miroslavpojer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • tested

@benedeki benedeki requested a review from Zejnilovic as a code owner October 11, 2023 12:57
@lsulak
Copy link
Collaborator

lsulak commented Nov 12, 2023

I've tried this script on AUL DB files, and it worked when. However, repetitive runs don't work - generally speaking there is CREATE TABLE X and not CREATE TABLE IF NOT EXISTS X - detecting changes on tables would be complicated for this script and its intended (temporary) use = so we won't be able to replace / update tables with this script but that's okay - besides, functions will be replaced, that's good.

But I feel that this script shouldn't fail and we should be able to replace the implementation of DB functions with it - but now the script fails saying that a table already exists - we could capture this exception psycopg2.errors.DuplicateTable maybe? Or even better, perhaps we could ignore Initializing the database... step and it could be driven by some CLI param.?

So I propose to have this:

  • CREATE DB - no by default, but can be enabled
  • INIT DB - no by default, but can be enabled - this would need one more CLI parameter
  • Populate schemas - yes by default, cannot be disabled
    parser.add_argument(
        "--init-db",
        action="store_true",
        help="initializes the target database (runs the scripts ending with '.ddl' that don't start with '00_'"
    )

From my side, I think this is the only thing that could be implemented, otherwise LGTM, I'm even gonna use it soon in Atum Service.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create a deployment script
3 participants