-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[kSNP4] Add new task/workflow for phylogenetic/SNP analysis #759
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Also temporarily renamed kSNP4 workflow to hopefully show up in doctore
Tests done for ksnp4 with flu Tested ksnp3 Everything looks good |
Michal-Babins
approved these changes
Feb 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation looks good. Code additions look good. Maintains ksnp3 scheme but updates for 4. Good work. Tests are passing.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR closes #
🗑️ This dev branch should be deleted after merging to main.
🧠 Summary
⚡ Impacted Workflows/Tasks
This PR may lead to different results in pre-existing outputs: No
This PR uses an element that could cause duplicate runs to have different results: No
🛠️ Changes
Created a new workflow for
kSNP4
, representing version 4.1.⚙️ Algorithm
Algorithmically, there are no significant changes between
kSNP3
andkSNP4
. According to their paper,Additionally, their README states that upgrading from version 3.1.2 to 4.1 will significantly reduce runtime, with most changes being transparent to the user.
➡️ Inputs
All inputs/flags remain the same as in
kSNP3
except for a few that have been removed. Notably, these removed flags were not used in the previouskSNP3
task/workflow. The following flags have been removed:--genbank
<genbank.gbk> # optional file for SNP annotation--all_annotations
# optional, annotate each locus exhaustively with all the annotations in any of the annotated genomes. Without this option it only provides the first annotation it comes to for a given locus, checking in the order genomes are listed in the -annotate file.⬅️ Outputs
One default output file in
kSNP4
was renamed, while all other outputs remain unchanged fromkSNP3
. See line 65 intask_ksnp4.wdl
for details. To clarify,kSNP3
originally outputs a file namedtree.core.tre
, but inkSNP4
, this has been renamed totree.core_SNPs.parsimony.tre
. However, in both workflows, we automatically rename this file to~{cluster_name}_core.nwk
for consistency.🧪 Testing
Tested
kSNP4
locally with miniwdl and on Terra using a set of samples from thephylogenetics_bacterial_set
data table from the PHB_Validation_v2-3-0 workspace. To assess differences betweenkSNP3
andkSNP4
, I compared all of the major output files from each workflow usingdiff -u
. This allowed me to check for any changes in SNP count or tree structure.Files Compared:
(Terra output variables are listed above their corresponding file name.)
ksnp4_core_snp_matrix
ksnp4_core_tree
ksnp4_pan_tree
ksnp4_vcf_ref_genome
ksnp4_core_snp_table
ksnp4_pan_snp_matrix
ksnp4_vcf_snps_not_in_ref
ksnp4_snps
Results:
pan_tree.nwk
andpan_distance_matrix.csv
had slight differences (mostly due to changes in order).core_tree.nwk
andcore_distance_matrix.csv
also showed minor differences.Terra kSNP3 Test
Terra kSNP4 Test
diff -u STDOUT cjejuni_0810PADBR-1
diff-u STDOUT senterica_simulated
diff -u STDOUT senterica_tuna_scrape
diff -u STDOUT ecoli_1405WAEXK-1 (empty file)
diff -u STDOUT lmonocytogenes_1408MLGX6-3WGS (empty file)
Suggested Scenarios for Reviewer to Test
It might be beneficial to conduct additional tests using some of the optional arguments (or other sample sets). I tested the optional arguments locally, and they work, but I haven’t fully assessed their impact in comparison to
kSNP3
.🔬 Final Developer Checklist
workflows_overview
tables to be the tag for the next upcoming release. If you do not know the tag, please put "vX.X.X"🎯 Reviewer Checklist