Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Functionmotifs #116

Merged
merged 258 commits into from
Jun 5, 2024
Merged

Functionmotifs #116

merged 258 commits into from
Jun 5, 2024

Conversation

tnitka
Copy link
Collaborator

@tnitka tnitka commented Nov 27, 2023

Add Motif mode for identification of functionally relevant sequence motifs

tnitka added 30 commits April 28, 2023 08:17
@christinehc
Copy link
Collaborator

Please remember to update the version here before pushing. Depending on whether #115 gets merged first or not, try to coordinate versions (I suggest 1.2.0 for #115 and 1.3.0 for this update, unless we do a simultaneous 1.2.0)

christinehc added a commit that referenced this pull request Dec 12, 2023
changelog:
- kmers can now be scored by probability score subtracting the observed kmers in a supplied background set, family set, or combining both background and family
  - note: some column headers have changed, which may affect downstream analysis (e.g. integration with #115, #116)
- to handle user-supplied background files, new rules have been created to count background kmers and combine background kmer counts into a background matrix. The appropriate files for the new workflow have been created.
- extensive changes have been made to `snekmer.score` to accommodate the new changes, including:
  - `snekmer.score.score` now has 3 distinct formulae to compute probability scores according to the desired scoring method
  - `snekmer.score.feature_class_probabilities` now also integrates the scoring method
- the main scoring rule itself has been significantly altered as follows"
  - all references to the old and not-working "background subtraction" (e.g. separating sequences by "sample" or "background" labels) have been removed
  - extraneous kmer probability scores for every family are no longer calculated; only the family in question's kmer profile is scored
  - scoring method now integrated
christinehc added a commit that referenced this pull request Dec 12, 2023
changelog:
- kmers can now be scored by probability score subtracting the observed kmers in a supplied background set, family set, or combining both background and family
  - note: some column headers have changed, which may affect downstream analysis (e.g. integration with #115, #116)
- to handle user-supplied background files, new rules have been created to count background kmers and combine background kmer counts into a background matrix. The appropriate files for the new workflow have been created.
- extensive changes have been made to `snekmer.score` to accommodate the new changes, including:
  - `snekmer.score.score` now has 3 distinct formulae to compute probability scores according to the desired scoring method
  - `snekmer.score.feature_class_probabilities` now also integrates the scoring method
- the main scoring rule itself has been significantly altered as follows"
  - all references to the old and not-working "background subtraction" (e.g. separating sequences by "sample" or "background" labels) have been removed
  - extraneous kmer probability scores for every family are no longer calculated; only the family in question's kmer profile is scored
  - scoring method now integrated
Copy link
Collaborator

@christinehc christinehc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall Comments

  • Code is very well documented IMO, great job on that front :)
  • I've added (very minor) comments re:code changes.
  • I think the function motifs capability would strongly benefit from an example notebook or demo going through results and the interpretation thereof. We can also potentially work on a template for a summary report.
  • We need a docs page, as well as general updates to the docs, to reflect the function motifs workflow.

@@ -52,7 +52,7 @@ jobs:
- shell: bash -l {0}
run: mamba install -y -c conda-forge snakemake==7.0 tabulate==0.8.10
- shell: bash -l {0}
run: pip install -e git+https://github.com/PNNL-CompBio/Snekmer@kmer-association#egg=snekmer
run: pip install -e git+https://github.com/PNNL-CompBio/Snekmer@functionmotifs#egg=snekmer
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see why this was changed for local testing, but once we merge in the PR, this should be changed to not point to the functionmotifs branch anymore

@@ -0,0 +1,209 @@
"""model.smk: Module for supervised kmer-based annotation models.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be motif.smk

tnitka and others added 8 commits January 30, 2024 10:34
changelog:
- RTD pages now include notebook-formatted pages with the learn/apply and motif demos.
- Formatted notebooks for compatibility with sphinx notebook formatting.
- Cleaned up notebooks, e.g. removing extraneous import statements
changelog:
- removed ~HEAD notebook file
- moved learn/apply and motif tutorial notebooks to docs folders to add them into sphinx documentation
- created symlinks for the above notebooks so that notebooks are still accessible from the resources directory
Copy link
Collaborator

@christinehc christinehc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestions are minor and snekmer motif mode works on my machine without issues. We are close!

docs/source/getting_started/config.rst Outdated Show resolved Hide resolved
docs/source/getting_started/usage.rst Outdated Show resolved Hide resolved
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would additionally love to see a (minor) deep dive on the results for the two different families, i.e. comparing the highest-scoring kmers for the two families, or brief analysis of the top N kmers. Extra points if there is some sequence motif we can use as an example that makes sense for each given family.

snekmer/motif.py Outdated Show resolved Hide resolved
snekmer/scripts/motif_motif.py Outdated Show resolved Hide resolved
snekmer/scripts/motif_preselect.py Outdated Show resolved Hide resolved
snekmer/scripts/motif_rescore.py Outdated Show resolved Hide resolved
snekmer/scripts/motif_rescore.py Show resolved Hide resolved
snekmer/scripts/motif_rescore.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@christinehc christinehc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@tnitka tnitka removed the request for review from biodataganache June 5, 2024 20:30
@tnitka tnitka merged commit c3b8d91 into main Jun 5, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants