#53: Create a deployment script #54
base: master
Conversation
* deployment script
JaCoCo code coverage report - scala 2.11.12
JaCoCo code coverage report - scala 2.12.17
Co-authored-by: Ladislav Sulak <[email protected]>
- pull
- code review
- run
scripts/deployment/deploy.py
Outdated
elif input_file.endswith(".sql"): | ||
functions.append(read_file(schema_dir + input_file)) | ||
elif input_file.endswith(".ddl"): | ||
tables.append(read_file(schema_dir + input_file)) |
A potential optimization could be to only collect the file names in this function, and then load their content later in the same way as `"\n".join(map(read_file, init_sqls))` does, so it would be consistent and a bit more memory efficient.
More consistent, maybe.
Why more memory efficient?
Why more memory efficient?
Because right now the content has to be held in memory from this point until it gets garbage collected, which probably happens only after some of the SQL statements have run. If it is read only inside that join call, the memory requirements are the same, but the content is likely held in memory for a shorter time. Not a big deal in this scenario, I know.
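For illustration, a minimal sketch of the suggested approach; `read_file` and the `.sql`/`.ddl` split come from the script, while the directory listing and the variable names used here are assumptions:

```python
import os

# Collect only the file paths in the scanning loop; the contents are read
# later, mirroring the existing "\n".join(map(read_file, init_sqls)) pattern.
function_files, table_files = [], []
for input_file in sorted(os.listdir(schema_dir)):
    if input_file.endswith(".sql"):
        function_files.append(os.path.join(schema_dir, input_file))
    elif input_file.endswith(".ddl"):
        table_files.append(os.path.join(schema_dir, input_file))

# File contents are only held in memory while the joined strings are built.
functions_sql = "\n".join(map(read_file, function_files))
tables_sql = "\n".join(map(read_file, table_files))
```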
Co-authored-by: Ladislav Sulak <[email protected]>
Co-authored-by: Ladislav Sulak <[email protected]>
Co-authored-by: miroslavpojer <[email protected]>
* renamed `deploy.py` to `deploy_pg.py`
* added an example of the expected structure to the `README.md`
* included `requirements.txt`
I've tried this script on AUL DB files, and it worked. However, repetitive runs don't work: generally speaking, the files contain CREATE TABLE X and not CREATE TABLE IF NOT EXISTS X.
Detecting changes on tables would be complicated for this script and its intended (temporary) use, so we won't be able to replace or update tables with it, but that's okay; functions, on the other hand, will be replaced, which is good.
Still, I feel this script shouldn't fail, and we should be able to replace the implementation of DB functions with it. Right now the script fails saying that a table already exists. We could catch that exception, psycopg2.errors.DuplicateTable, maybe? Or even better, perhaps we could skip the "Initializing the database..." step and drive it by some CLI parameter?
So I propose to have this:
- CREATE DB - no by default, but can be enabled
- INIT DB - no by default, but can be enabled - this would need one more CLI parameter
- Populate schemas - yes by default, cannot be disabled
parser.add_argument(
    "--init-db",
    action="store_true",
    help="initializes the target database (runs the scripts ending with '.ddl' that don't start with '00_')"
)
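For the repetitive-run problem mentioned above, a rough sketch of what catching `psycopg2.errors.DuplicateTable` could look like; the `conn`, `cur`, and `table_statements` names are placeholders, not the script's actual structure:

```python
from psycopg2 import errors

# Sketch only: execute each table DDL statement individually and skip the
# ones whose table already exists, so a re-run against an already
# initialized database does not abort the whole deployment.
for statement in table_statements:
    try:
        with conn.cursor() as cur:
            cur.execute(statement)
        conn.commit()
    except errors.DuplicateTable:
        conn.rollback()  # clear the failed transaction before continuing
        print("Table already exists, skipping this statement.")
```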
    return file.read()


def process_dir(directory: str, conn_config: PostgresDBConn, create_db: bool) -> None:
I still think that this function is a bit too long / doing too much; I would definitely split the processing of those 3 file categories into 3 separate functions. But I don't insist, since this is a temporary script, and still a very useful one.
An obvious improvement would also be to check whether the DB name from the CLI and the DB name in the SQL files match, or to treat the SQL files as Jinja2 templates and insert all necessary values dynamically. But that would probably be too much for this script right now.
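A tiny sketch of the Jinja2 idea; the template variable name (`db_name`) and the helper function are hypothetical, not something the current script defines:

```python
from jinja2 import Template

def render_sql(path: str, db_name: str) -> str:
    """Read a SQL file as a Jinja2 template and substitute the DB name."""
    with open(path, "r", encoding="utf-8") as f:
        return Template(f.read()).render(db_name=db_name)

# e.g. a file containing `CREATE DATABASE {{ db_name }};` would get the name
# from the CLI arguments instead of hard-coding it in the SQL.
```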
You are exactly right in everything you write, and you also have good reasons for not doing it. Hopefully we will have a good deployment tool, after...
- code reviewed
- pulled
- built
- ran on my machine against my local PG db with Ursa Unify DB objects
I found a few issues, and there is one thing I would love to see implemented (repetitive run support); otherwise I'm happy to approve.
Co-authored-by: Ladislav Sulak <[email protected]>
- tested
From my side, I think this is the only thing that could still be implemented; otherwise LGTM. I'm even gonna use it soon in Atum Service.
README.md for the deployment script
Closes #53