Clickhouse import process #59

Open: sheridancbio wants to merge 11 commits into main from clickhouse-dependent-import-process


sheridancbio (Contributor)

Scripts, property files, and documentation for the blue-green deployment strategy that enables MySQL/ClickHouse database updates during cancer study import.
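Here is a rough illustration of the blue-green idea. It is not the scripts in this PR. Two copies of the database exist, the import always writes to the idle copy, and the "live" pointer flips only after the new copy passes checks. The `BlueGreenState` record, the `run_update` helper, and the database names are hypothetical. The PR's scripts reference a separate update management database, and it is an assumption that state like this is tracked there.

```python
from dataclasses import dataclass
from typing import Callable

COLORS = ("blue", "green")


@dataclass(frozen=True)
class BlueGreenState:
    """Hypothetical record of which of two database copies is currently live."""
    blue_db: str
    green_db: str
    live_color: str

    def __post_init__(self):
        if self.live_color not in COLORS:
            raise ValueError(f"live_color must be one of {COLORS}, got {self.live_color!r}")

    def database_for(self, color: str) -> str:
        return self.blue_db if color == "blue" else self.green_db

    @property
    def idle_color(self) -> str:
        return "green" if self.live_color == "blue" else "blue"

    @property
    def live_database(self) -> str:
        return self.database_for(self.live_color)

    @property
    def idle_database(self) -> str:
        return self.database_for(self.idle_color)


def run_update(state: BlueGreenState,
               import_into: Callable[[str], None],
               validate: Callable[[str], bool]) -> BlueGreenState:
    """Load new study data into the idle copy and switch only if it validates.

    The live copy is never written to, so readers are unaffected while the
    import runs, and a failed import leaves the previous state in place.
    """
    target = state.idle_database
    import_into(target)
    if not validate(target):
        return state
    return BlueGreenState(state.blue_db, state.green_db, state.idle_color)
```

A failed validation returns the unchanged state. The next attempt therefore reuses the same idle copy and can rebuild it from scratch.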

sheridancbio added the enhancement (New feature or request) label on Sep 27, 2024
sheridancbio force-pushed the clickhouse-dependent-import-process branch 2 times, most recently from 6648c06 to 901afad on October 2, 2024 14:31
sheridancbio force-pushed the clickhouse-dependent-import-process branch from 8966e72 to aa61d3b on October 10, 2024 13:59
mandawilson previously approved these changes on Oct 18, 2024
get_genetic_profile_id_list_query="SELECT genetic_profile_id FROM genetic_profile WHERE genetic_alteration_type NOT IN ('GENERIC_ASSAY', 'MUTATION_EXTENDED', 'STRUCTURAL_VARIANT')"
query_argument_template="--query={0}"
query_argument = query_argument_template.format(get_genetic_profile_id_list_query)
clickhouse_client_obtain_genetic_profile_id_list = ["clickhouse", "client", "--config-file=clickhouse_client_config_2024-10-14-09-03-02.yaml", query_argument]
Contributor

Should the config file always be hardcoded to this?

Contributor

I see below you have a TODO to deal with that.
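One way to resolve the hardcoded `--config-file` is to pass the path in as a parameter, for example from the property files, which is an assumption. The sketch below keeps the same `clickhouse client --config-file=... --query=...` argument shape as the snippet above. The function names are hypothetical. It relies on `clickhouse client` printing TabSeparated output in non-interactive mode, so each returned id arrives on its own line.

```python
import subprocess

GENETIC_PROFILE_ID_QUERY = (
    "SELECT genetic_profile_id FROM genetic_profile "
    "WHERE genetic_alteration_type NOT IN "
    "('GENERIC_ASSAY', 'MUTATION_EXTENDED', 'STRUCTURAL_VARIANT')"
)


def build_clickhouse_client_command(config_file, query):
    """Build the argv list; the config file is a parameter instead of a fixed name."""
    # The query is one argv element, so no shell quoting is needed.
    return ["clickhouse", "client", f"--config-file={config_file}", f"--query={query}"]


def parse_id_list(stdout):
    """Parse single-column TabSeparated output into a list of ints."""
    return [int(line) for line in stdout.splitlines() if line.strip()]


def obtain_genetic_profile_id_list(config_file):
    # Raises subprocess.CalledProcessError if the client exits non-zero.
    result = subprocess.run(
        build_clickhouse_client_command(config_file, GENETIC_PROFILE_ID_QUERY),
        capture_output=True, text=True, check=True,
    )
    return parse_id_list(result.stdout)
```

Keeping command construction and output parsing in separate pure functions means they can be checked without a running ClickHouse server.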

@@ -0,0 +1,36 @@
DROP TABLE IF EXISTS sample_to_gene_panel_derived;
DROP TABLE IF EXISTS gene_panel_to_gene_derived;
Contributor

Aren't these commands stored somewhere else?

Contributor Author

Yes. This file was a placeholder. I need to add calls to the script that processes the .sql files from the GitHub repo (clickhouse.sql and materialized_views.sql) and splits them into individual SQL statements (one per file). We also need to detect and skip the special cases (genetic_alteration_derived, generic_assay_data_derived).

So this file will be going away.
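The splitting step described above could look roughly like this. It is a sketch, not the script the reply refers to. It splits on `;` while ignoring semicolons inside single-quoted strings and `--` line comments, writes one statement per numbered file, and skips statements that create or drop the two special-case tables. The helper names, the `NNNN.sql` file naming, and the regex for finding the target object are all assumptions. Block comments, backslash-escaped quotes, and database-qualified names (`db.table`) are not handled.

```python
import re
from pathlib import Path

# Special cases named in the review reply; these are built by separate logic.
SPECIAL_CASE_TABLES = {"genetic_alteration_derived", "generic_assay_data_derived"}

_DEFINED_OBJECT = re.compile(
    r"^\s*(?:CREATE|DROP)\s+(?:OR\s+REPLACE\s+)?(?:MATERIALIZED\s+)?(?:TABLE|VIEW)\s+"
    r"(?:IF\s+(?:NOT\s+)?EXISTS\s+)?`?(\w+)`?",
    re.IGNORECASE,
)


def split_sql_statements(sql_text):
    """Split on ';', ignoring it inside '...' strings and '--' comments."""
    statements, current = [], []
    in_string = in_comment = False
    i = 0
    while i < len(sql_text):
        ch = sql_text[i]
        if in_comment:
            if ch == "\n":  # a line comment ends at the newline
                in_comment = False
                current.append(ch)
            i += 1
            continue
        if in_string:
            current.append(ch)
            if ch == "'":
                if sql_text.startswith("''", i):  # doubled quote is an escape
                    current.append("'")
                    i += 2
                    continue
                in_string = False
            i += 1
            continue
        if sql_text.startswith("--", i):
            in_comment = True  # comment text is dropped from the output
            i += 2
            continue
        if ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt + ";")
            current = []
        else:
            if ch == "'":
                in_string = True
            current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail + ";")
    return statements


def defined_object_name(statement):
    """Name of the table/view a CREATE or DROP statement targets, else None."""
    match = _DEFINED_OBJECT.match(statement)
    return match.group(1) if match else None


def burst_sql_file(sql_path, out_dir, skip=SPECIAL_CASE_TABLES):
    """Write each non-skipped statement to its own numbered file in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for stmt in split_sql_statements(Path(sql_path).read_text(encoding="utf-8")):
        if defined_object_name(stmt) in skip:
            continue
        target = out / f"{len(written) + 1:04d}.sql"
        target.write_text(stmt + "\n", encoding="utf-8")
        written.append(target)
    return written
```

Numbering the output files preserves the original statement order. This matters when a materialized view depends on a table defined earlier in the same file.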

return 1
fi
update_management_database_name="${my_properties['mysql_update_management_database']}"
### TODO : fix this
Contributor

I think this needs to be added back.

return 1
fi
update_management_database_name="${my_properties['mysql_update_management_database']}"
### TODO : fix this
Contributor

Add back?

Contributor Author

Yes.
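The shell fragments above read `mysql_update_management_database` from a `my_properties` array and `return 1` when something is wrong. The sketch below shows the same check in Python. It parses simple `key=value` property lines and reports which required keys are missing or empty, leaving the exit status to the caller. It only handles a subset of `.properties` syntax, and the function names are hypothetical.

```python
from pathlib import Path


def parse_properties(text):
    """Parse simple key=value lines; lines starting with '#' are comments.

    A simplification of Java .properties syntax: ':' separators,
    '!' comments, and line continuations are not supported.
    """
    props = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected key=value, got {raw!r}")
        props[key.strip()] = value.strip()
    return props


def read_properties(path):
    return parse_properties(Path(path).read_text(encoding="utf-8"))


def missing_properties(props, required):
    """Return required keys that are absent or empty, in the order given."""
    return [name for name in required if not props.get(name)]
```

A caller would load the file with `read_properties` and then check `missing_properties(props, ["mysql_update_management_database"])`. A non-empty result maps to the shell script's `return 1`.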

- focus on derived table construction and wrap up steps

Co-authored-by: Manda Wilson <[email protected]>
Co-authored-by: Robert Sheridan <[email protected]>
sheridancbio force-pushed the clickhouse-dependent-import-process branch from c84d0a0 to 4713301 on November 15, 2024 21:05
Labels: enhancement (New feature or request)

2 participants