-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Functionmotifs #116
Functionmotifs #116
Conversation
…are being applied properly.
changelog: - kmers can now be scored by probability score subtracting the observed kmers in a supplied background set, family set, or combining both background and family - note: some column headers have changed, which may affect downstream analysis (e.g. integration with #115, #116) - to handle user-supplied background files, new rules have been created to count background kmers and combine background kmer counts into a background matrix. The appropriate files for the new workflow have been created. - extensive changes have been made to `snekmer.score` to accommodate the new changes, including: - `snekmer.score.score` now has 3 distinct formulae to compute probability scores according to the desired scoring method - `snekmer.score.feature_class_probabilities` now also integrates the scoring method - the main scoring rule itself has been significantly altered as follows" - all references to the old and not-working "background subtraction" (e.g. separating sequences by "sample" or "background" labels) have been removed - extraneous kmer probability scores for every family are no longer calculated; only the family in question's kmer profile is scored - scoring method now integrated
changelog: - kmers can now be scored by probability score subtracting the observed kmers in a supplied background set, family set, or combining both background and family - note: some column headers have changed, which may affect downstream analysis (e.g. integration with #115, #116) - to handle user-supplied background files, new rules have been created to count background kmers and combine background kmer counts into a background matrix. The appropriate files for the new workflow have been created. - extensive changes have been made to `snekmer.score` to accommodate the new changes, including: - `snekmer.score.score` now has 3 distinct formulae to compute probability scores according to the desired scoring method - `snekmer.score.feature_class_probabilities` now also integrates the scoring method - the main scoring rule itself has been significantly altered as follows" - all references to the old and not-working "background subtraction" (e.g. separating sequences by "sample" or "background" labels) have been removed - extraneous kmer probability scores for every family are no longer calculated; only the family in question's kmer profile is scored - scoring method now integrated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall Comments
- Code is very well documented IMO, great job on that front :)
- I've added (very minor) comments re:code changes.
- I think the function motifs capability would strongly benefit from an example notebook or demo going through results and the interpretation thereof. We can also potentially work on a template for a summary report.
- We need a docs page, as well as general updates to the docs, to reflect the function motifs workflow.
@@ -52,7 +52,7 @@ jobs: | |||
- shell: bash -l {0} | |||
run: mamba install -y -c conda-forge snakemake==7.0 tabulate==0.8.10 | |||
- shell: bash -l {0} | |||
run: pip install -e git+https://github.com/PNNL-CompBio/Snekmer@kmer-association#egg=snekmer | |||
run: pip install -e git+https://github.com/PNNL-CompBio/Snekmer@functionmotifs#egg=snekmer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see why this was changed for local testing, but once we merge in the PR, this should be changed to not point to the functionmotifs branch anymore
snekmer/rules/motif.smk
Outdated
@@ -0,0 +1,209 @@ | |||
"""model.smk: Module for supervised kmer-based annotation models. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be motif.smk
changelog: - RTD pages now include notebook-formatted pages with the learn/apply and motif demos. - Formatted notebooks for compatibility with sphinx notebook formatting. - Cleaned up notebooks, e.g. removing extraneous import statements
changelog: - removed ~HEAD notebook file - moved learn/apply and motif tutorial notebooks to docs folders to add them into sphinx documentation - created symlinks for the above notebooks so that notebooks are still accessible from the resources directory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestions are minor and snekmer motif mode works on my machine without issues. We are close!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would additionally love to see a (minor) deep dive on the results for the two different families, i.e. comparing the highest-scoring kmers for the two families, or brief analysis of the top N kmers. Extra points if there is some sequence motif we can use as an example that makes sense for each given family.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Add Motif mode for identification of functionally relevant sequence motifs