Functionmotifs #116

Merged
merged 258 commits into from
Jun 5, 2024
Changes from 245 commits
a5ddb35
Fix wildcard error in motif.smk
tnitka Apr 28, 2023
b1e5c5d
Intentionally break motif before current error to check that changes …
tnitka Apr 28, 2023
9b0d4c7
Fix most motif.smk errors, including intentional break
tnitka Apr 28, 2023
508a01b
Fix wildcard and key errors
tnitka Apr 28, 2023
9d91602
Fix undefined name in motif script
tnitka Apr 28, 2023
83b90b7
Fix input error in motif
tnitka Apr 28, 2023
eca87b3
Fix logging error in motif script
tnitka Apr 28, 2023
b26a173
Fix input error in motif script
tnitka Apr 28, 2023
5b17a81
Fix logging error in motif script
tnitka Apr 28, 2023
80bf8d0
Fix input error in motif script
tnitka Apr 28, 2023
c10b240
Fix input error in motif script
tnitka Apr 28, 2023
6aacb30
Fix input error in motif script
tnitka Apr 28, 2023
fcf8a2d
Fix compressed input handling in motif script
tnitka Apr 28, 2023
16e1256
Fix compressed input handling in motif script
tnitka Apr 28, 2023
5278048
Change input score source in motif script to the score matrix instead…
tnitka Apr 28, 2023
2e312d3
Implement bad debugging practice to be reverted in next commit
tnitka Apr 28, 2023
cbafaff
Fix input handling and remove bad debugging practice
tnitka Apr 28, 2023
7f8691e
Fix compressed input handling in motif script
tnitka Apr 28, 2023
381a68b
Fix conflict between dataframe and ndarray usage in motif_motif.py
tnitka Apr 28, 2023
77ea592
Use common basis for all proteins in motif
tnitka Apr 28, 2023
d1f07d2
Fix input error in motif snakefile
tnitka Apr 28, 2023
7096a71
Fix input error in motif snakefile
tnitka Apr 28, 2023
00438e7
Fix input error in motif script
tnitka May 1, 2023
11e7d1d
Fix input processing error in motif
tnitka May 1, 2023
43efb71
Fix input processing error in motif
tnitka May 1, 2023
0422444
Fix input processing error in motif
tnitka May 1, 2023
6e37b51
Fix scoring error in motif script
tnitka May 1, 2023
4ef303f
Fix error in motif scoring
tnitka May 1, 2023
2028438
Fix error in motif scoring
tnitka May 1, 2023
cd5951b
Fix error in motif scoring
tnitka May 1, 2023
52086fa
Fix type error in motif script
tnitka May 1, 2023
9774b51
Fix matrix shape error
tnitka May 1, 2023
3415f4b
Fix matrix shape error in motif script
tnitka May 1, 2023
ec3cc33
Fix input handling error in motif script
tnitka May 1, 2023
0bd376f
Fix input handling error in motif script
tnitka May 1, 2023
8b8a9b1
Fix input handling error in motif script
tnitka May 1, 2023
308ddac
Correct input matrix column used for family labels
tnitka May 1, 2023
f9914d2
Correct input matrix column used for family labels
tnitka May 1, 2023
cbfd862
Fix input handling error in motif script
tnitka May 1, 2023
39c7ed6
Fix input handling error in motif script
tnitka May 1, 2023
74c197b
Fix input handling error in motif script
tnitka May 1, 2023
efb6477
Fix class error in motif script
tnitka May 1, 2023
e757935
Fix class error in motif script
tnitka May 1, 2023
6ea40eb
Fix class error in motif script
tnitka May 1, 2023
bb77727
Fix class error in motif script
tnitka May 1, 2023
af8d11d
Fix issue with DataFrame not being converted to ndarray
tnitka May 1, 2023
43168ff
Fix issue with DataFrame being converted to empty ndarray
tnitka May 1, 2023
734485f
Fix typographical error in motif script that prevented code execution
tnitka May 1, 2023
addd685
Correct error in selecting data labels
tnitka May 1, 2023
7d656fa
Fix type error in scoring parameters
tnitka May 1, 2023
98192c5
Fix type error in scoring parameters
tnitka May 1, 2023
32c61e4
Fix type error in scoring parameters
tnitka May 1, 2023
117dea5
Fix type error in scoring parameters
tnitka May 1, 2023
ab563aa
Fix type error in scoring parameters
tnitka May 1, 2023
84f1b4b
Fix type error in motif.py
tnitka May 1, 2023
d4f69a5
Fix error passing params to scorer
tnitka May 1, 2023
9008d46
Fix error passing params to scorer
tnitka May 1, 2023
bce9b70
Fix error passing params to scorer
tnitka May 1, 2023
4874b50
Fix error passing params to scorer
tnitka May 1, 2023
fc154b3
Fix error passing params to scorer
tnitka May 1, 2023
5669148
Fix error passing params to scorer
tnitka May 1, 2023
25e18aa
Fix error passing params to scorer
tnitka May 1, 2023
1dbd556
Fix error in motif object
tnitka May 1, 2023
535169f
Fix valueerror in motif script
tnitka May 1, 2023
0bca526
Change motif object to use pandas dataframe instead of numpy ndarray
tnitka May 2, 2023
ead24fa
Fix formatting errors in motif
tnitka May 2, 2023
f371cf4
Fix formatting errors in motif
tnitka May 2, 2023
38e87d5
Fix formatting errors in motif
tnitka May 2, 2023
7f7c47d
Fix input error in motif script
tnitka May 2, 2023
2f5558a
Fix input error in motif script
tnitka May 2, 2023
4fee5ea
Fix input error in motif script
tnitka May 2, 2023
31efcde
Fix error passing params to motif module
tnitka May 2, 2023
75430ea
Fix error passing params to score module
tnitka May 2, 2023
1357fb3
Fix error passing params to score module
tnitka May 2, 2023
354ec59
Fix error fetching output from score module
tnitka May 2, 2023
6afaa55
Fix error fetching output from score module
tnitka May 2, 2023
150e6ac
Fix error tabulating permutation scores
tnitka May 2, 2023
6e92538
Fix error tabulating permutation scores
tnitka May 2, 2023
d370955
Fix error tabulating permutation scores
tnitka May 2, 2023
58b082c
Fix error tabulating permutation scores
tnitka May 2, 2023
d6af933
Fix error tabulating permutation scores
tnitka May 2, 2023
e76ebe0
Fix error tabulating permutation scores
tnitka May 2, 2023
4dd3867
Fix error tabulating permutation scores
tnitka May 2, 2023
c59cb9c
Fix error tabulating permutation scores
tnitka May 2, 2023
4b55fce
Fix error tabulating permutation scores
tnitka May 2, 2023
66f3c7f
Fix error tabulating permutation scores
tnitka May 2, 2023
aaf38cd
Fix error tabulating permutation scores
tnitka May 2, 2023
32a247f
Fix reference error in motif module
tnitka May 2, 2023
2e0da3d
Fix reference error in motif module
tnitka May 2, 2023
80bf596
Fix array dimension error in motif script
tnitka May 2, 2023
0ab36d5
Fix array dimension error in motif script
tnitka May 3, 2023
0eb3a7e
Fix array dimension error in motif script
tnitka May 3, 2023
2749537
Fix array dimension error in motif script
tnitka May 3, 2023
bf25fd6
Fix array dimension error in motif script
tnitka May 3, 2023
e4848e3
Allow array concatenation in motif script
tnitka May 3, 2023
69c7995
Fix array dimension error in motif script
tnitka May 3, 2023
836c956
Fix array dimension error in motif script
tnitka May 3, 2023
1f4f6c4
Fix array dimension error in motif script
tnitka May 3, 2023
65df50a
Fix array dimension error in motif script
tnitka May 3, 2023
3130c72
Change p_values class in motif module to use dataframes
tnitka May 3, 2023
9911684
Fix indexerror in motif module
tnitka May 3, 2023
70295b4
Fix type error in motif module
tnitka May 3, 2023
3a4bd06
Fix type error in motif module
tnitka May 3, 2023
6898f68
Fix type error in motif module
tnitka May 3, 2023
09d21d4
Fix type error in motif module
tnitka May 3, 2023
dd07587
Fix type error in motif module
tnitka May 3, 2023
6836b3b
Fix error uncompressing input
tnitka May 3, 2023
df278b1
Fix error uncompressing input
tnitka May 3, 2023
ab32afa
Fix error uncompressing input
tnitka May 3, 2023
8e56cbc
Fix error uncompressing input
tnitka May 3, 2023
270e91d
Fix error uncompressing input
tnitka May 3, 2023
04d80e7
Fix error in p_values class
tnitka May 3, 2023
e14fcb6
Fix input data typing error
tnitka May 3, 2023
e999781
Fix typing error in motif module
tnitka May 3, 2023
08056d3
Fix type error in motif
tnitka May 3, 2023
c1aabe7
Fix type error in motif
tnitka May 3, 2023
5e13e59
Fix type error in motif
tnitka May 3, 2023
79de7e9
Fix type error in motif
tnitka May 3, 2023
eea9dc7
Fix type error in motif
tnitka May 3, 2023
a896562
Fix type error in motif
tnitka May 3, 2023
02e9393
Fix type error in motif
tnitka May 3, 2023
cdfa4e4
Fix type error in motif
tnitka May 3, 2023
a0b2ccd
Fix type error in motif
tnitka May 3, 2023
c34dacd
Fix type error in motif
tnitka May 3, 2023
aac2f03
Fix type error in motif
tnitka May 3, 2023
c5a0f90
Fix type error in motif
tnitka May 3, 2023
59cf33c
Fix type error in motif
tnitka May 3, 2023
0a67d8e
Fix type error in motif
tnitka May 3, 2023
1c2d4b7
Fix type error in motif
tnitka May 3, 2023
e783d62
Fix type error in motif
tnitka May 3, 2023
39b988a
Fix type error in motif
tnitka May 3, 2023
7ff7acf
Remove redundant output
tnitka May 3, 2023
cb0aa78
Prevent snakemake from expecting redundant output
tnitka May 3, 2023
870d4af
Fix error that causes empty output file
tnitka May 3, 2023
ce902bb
Fix error that causes empty output file
tnitka May 3, 2023
210494c
Fix error that causes empty output file
tnitka May 3, 2023
7ec82ba
Fix error that causes empty output file
tnitka May 3, 2023
4118256
Fix error that causes empty output file
tnitka May 3, 2023
24aa508
Fix error that causes empty output file
tnitka May 3, 2023
56ddb0c
Fix error that causes empty output file
tnitka May 3, 2023
f50bb79
Fix error that causes empty output file
tnitka May 3, 2023
e6c4730
Fix error that causes empty output file
tnitka May 3, 2023
0359abb
Fix error that causes empty output file
tnitka May 3, 2023
9ba204f
Fix error that causes empty output file
tnitka May 3, 2023
893bd89
Fix error preventing kmers from being included in output
tnitka May 3, 2023
1882e9a
Fix error preventing kmers from being included in output
tnitka May 3, 2023
f43560b
Fix error preventing all kmers from being scored
tnitka May 3, 2023
62e1964
Fix indexing errors
tnitka May 3, 2023
3df7a20
Fix indexing errors
tnitka May 3, 2023
7b3f280
Fix indexing errors
tnitka May 3, 2023
e33699f
Fix indexing errors
tnitka May 3, 2023
87e0034
Fix indexing errors
tnitka May 3, 2023
641207e
Fix indexing errors
tnitka May 4, 2023
8addb07
Fix indexing errors
tnitka May 4, 2023
c3a4d51
Fix indexing errors
tnitka May 4, 2023
35f771c
Fix indexing errors
tnitka May 4, 2023
274c3af
Fix indexing errors
tnitka May 4, 2023
90df931
Explicitly name columns to fix KeyError in motif
tnitka May 4, 2023
54f0190
Explicitly name columns to fix KeyError in motif
tnitka May 4, 2023
f3a24b6
Explicitly name columns to fix KeyError in motif
tnitka May 4, 2023
99759ff
Explicitly name columns to fix KeyError in motif
tnitka May 4, 2023
c260458
Fix issue selecting kmer sequence in motif
tnitka May 4, 2023
311bb6e
Fix issue selecting kmer sequence in motif
tnitka May 4, 2023
0f54a89
Change iteration from DataFrame to NDArray
tnitka May 4, 2023
7aad4fe
Change iteration from DataFrame to NDArray
tnitka May 4, 2023
10d71c8
Change iteration from DataFrame to NDArray
tnitka May 4, 2023
b6e7625
Change iteration from DataFrame to NDArray
tnitka May 4, 2023
b7eb7e3
Change iteration from DataFrame to NDArray
tnitka May 4, 2023
37de91d
Change iteration from DataFrame to NDArray
tnitka May 4, 2023
ce1ac40
Fix error fetching kmer scores from score
tnitka May 4, 2023
f8cdb37
Fix error fetching kmer scores from score
tnitka May 4, 2023
250f15b
Fix error fetching kmer scores from score
tnitka May 4, 2023
141c95f
Fix type error in p value calculation
tnitka May 4, 2023
1539238
Remove redundant code
tnitka May 8, 2023
c62212b
Fix issue in motif rule that was sometimes causing MissingInputException
tnitka May 10, 2023
be64178
Ensure that all permutation scores are compared to the real score whe…
tnitka May 10, 2023
c403b28
Ensure that all permutation scores are compared to the real score whe…
tnitka May 10, 2023
ba857c5
Ensure that all permutation scores are compared to the real score whe…
tnitka May 10, 2023
7512bb7
Add scores as output from motif
tnitka May 10, 2023
aa80508
Remove score output from motif due to I/O issues
tnitka May 10, 2023
2fcf54b
Remove score output from motif due to I/O issues
tnitka May 10, 2023
d07a9b5
Temporarily print permutation scores to check whether they are identical
tnitka May 10, 2023
fec04e5
Remove printing as it is no longer necessary
tnitka May 10, 2023
710cda1
Remove unnecessary conversions in motif script
tnitka May 10, 2023
89e59f3
Fix syntax in motif script
tnitka May 10, 2023
1cf25a4
Add scores as output from motif
tnitka May 11, 2023
eb8d2b7
Fix issue causing scores to be the same across iterations in motif
tnitka May 11, 2023
e8be1e2
Remove unused code
tnitka May 12, 2023
abb0a70
Change motif output to be sorted by p value
tnitka May 12, 2023
a22fe90
Fix motif output sorting order to put most significant kmers first
tnitka May 12, 2023
b3897ea
Fix issue parsing kmers when called with k=2
tnitka May 12, 2023
9ccf635
Fix motif output sorting order to put most significant kmers first
tnitka May 15, 2023
efd7f02
Change output sorting to use p value first followed by score on real …
tnitka May 15, 2023
9cb9834
Fix motif output sorting order to put highest scoring kmers first for…
tnitka May 15, 2023
1f7a838
Fix formatting of csv containing scores from motif iterations
tnitka May 16, 2023
eab5490
Fix issue causing motif to read kmer NA as np.nan
tnitka May 17, 2023
99e1526
Expand definition of false positives in motif to scores greater than …
tnitka May 17, 2023
dcea2f6
Fix issue causing too few scoring iterations to be compared in motif
tnitka May 18, 2023
39aa1e5
Remove minimum family size from motif
tnitka May 30, 2023
c0631c1
Remove unnecessary family size check from motif snakefile
tnitka Jul 3, 2023
04d38bd
Fix issue sorting negative scores in motif
tnitka Jul 3, 2023
69e2624
Fix issue sorting kmers with negative weights in motif output
tnitka Jul 6, 2023
6f58027
Parallelize the motif workflow
tnitka Jul 12, 2023
333822c
Fix error sorting output of motif workflow
tnitka Jul 19, 2023
788d086
Remove redundant code from motif workflow to slightly reduce memory u…
tnitka Jul 19, 2023
16efac9
Reduce memory usage in motif and model
tnitka Jul 20, 2023
bf8dfb4
Further reduce memory usage in model
tnitka Jul 21, 2023
e5f4bd4
Fix an issue affecting rescoring results
tnitka Jul 24, 2023
9496699
Reduce memory usage in motif
tnitka Jul 25, 2023
670819a
Reduce memory usage of motif module
tnitka Jul 26, 2023
17a24da
Reduce peak memory usage during motif
tnitka Jul 26, 2023
c875698
Fix error normalizing kmer weights in motif
tnitka Jul 31, 2023
99cd156
Fix error normalizing kmer weights in motif
tnitka Jul 31, 2023
4af188e
Fix error normalizing kmer weights in motif
tnitka Jul 31, 2023
d76238b
Fix error normalizing kmer weights in motif
tnitka Jul 31, 2023
fb615b0
Fix error normalizing scores in motif
tnitka Aug 1, 2023
9a52700
Add preselection step to motif workflow
tnitka Aug 3, 2023
f57d8c0
Fix score scaling error in motif
tnitka Aug 3, 2023
d049f5f
Fix error in motif preselection step
tnitka Aug 4, 2023
2cef422
Fix error calculating p-values in motif
tnitka Aug 4, 2023
dfff88f
Apply recursive feature elimination during motif preselection
tnitka Aug 15, 2023
121df5f
Adjust motif preselection stopping criterion to improve results
tnitka Aug 16, 2023
f368ea7
Decrease RFE step size in motif preselection step
tnitka Aug 22, 2023
d8d0e06
Remove redundant code
tnitka Aug 23, 2023
9bc50c2
fixup! Remove redundant code
tnitka Nov 27, 2023
76410b4
Merge branch 'main' into functionmotifs
tnitka Nov 28, 2023
bd0b268
Format/lint code
tnitka Nov 29, 2023
96d88d3
Format/lint motif snakefile
tnitka Nov 29, 2023
c195a9d
Fix whitespace error in snakefmt output
tnitka Nov 29, 2023
2eba304
Format/lint snakefile
tnitka Nov 29, 2023
2507498
update action.yml
tnitka Nov 29, 2023
3c54509
Update test and fix command line parser error introduced during rebase
tnitka Dec 5, 2023
4e81aee
fixup! Update test and fix command line parser error introduced durin…
tnitka Dec 5, 2023
7383c42
Add motif test to CI workflow
tnitka Dec 5, 2023
2d5cae5
Correct snekmer motif test environment
tnitka Dec 5, 2023
8de742d
chore: update _version.py
tnitka Dec 12, 2023
46b526c
Merge branch 'main' into functionmotifs
tnitka Dec 14, 2023
8db7a27
Add Motif tutorial
tnitka Jan 30, 2024
22f0a02
Update docs to include motif
tnitka Feb 1, 2024
4c4ad19
Add model from RFE as output in motif
tnitka Feb 2, 2024
9c47a7a
Update documentation for motif
tnitka Feb 2, 2024
aa1c11d
Fix formatting
tnitka Feb 2, 2024
6924e9e
Add motif report output
tnitka Feb 21, 2024
9f72ba1
docs: add demo pages for learn/apply and motif
christinehc Mar 5, 2024
56f6607
docs: clean up files and update/create symlinks
christinehc Mar 5, 2024
4ff1b0a
Fix formatting
tnitka Mar 15, 2024
cdb04f7
Remove redundant code from motif result script
tnitka Mar 15, 2024
4d86439
move Motif tutorial into separate directory with more informative con…
tnitka Mar 26, 2024
a57cb02
Add motif to README.md
tnitka Jun 5, 2024
64f4fef
Merge branch 'main' into functionmotifs
tnitka Jun 5, 2024
10 changes: 9 additions & 1 deletion .github/workflows/action.yml
Original file line number Diff line number Diff line change
@@ -52,7 +52,7 @@ jobs:
- shell: bash -l {0}
run: mamba install -y -c conda-forge snakemake==7.0 tabulate==0.8.10
- shell: bash -l {0}
- run: pip install -e git+https://github.com/PNNL-CompBio/Snekmer@kmer-association#egg=snekmer
+ run: pip install -e git+https://github.com/PNNL-CompBio/Snekmer@functionmotifs#egg=snekmer
Collaborator
I see why this was changed for local testing, but once the PR is merged, this should no longer point to the functionmotifs branch.


#test clustering step
- name: Snekmer Cluster
@@ -105,3 +105,11 @@ jobs:
source activate snekmer
snekmer apply --configfile .test/config_learnapp.yaml -d .test --cores 1
rm -rf .test/output

# run Snekmer Motif using previously generated model files
- name: Snekmer Motif
run: |
export PATH="/usr/share/miniconda/bin:$PATH"
source activate snekmer
snekmer motif --configfile .test/config.yaml -d .test --cores 1
rm -rf .test/output
4 changes: 3 additions & 1 deletion .test/config.yaml
@@ -48,4 +48,6 @@ score_dir: "output/example-model/"
learnapp:
  save_apply_associations: False


# motif params
motif:
  n: 200
2 changes: 1 addition & 1 deletion requirements.txt
@@ -11,4 +11,4 @@ scikit-learn
tabulate == 0.8.10
umap-learn
hdbscan
- pyarrow
+ pyarrow
10 changes: 7 additions & 3 deletions resources/config.yaml
@@ -39,6 +39,10 @@ model:
random_state: None

# search params
- model_dir: "output/example-model/"
- basis_dir: "output/example-model/"
- score_dir: "output/example-model/"
+ model_dir: "output/model/"
+ basis_dir: "output/kmerize/"
+ score_dir: "output/score/"
+
+ # motif params
+ motif:
+   n: 2000
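
Going by the `p_values` docstring in `snekmer/motif.py` from this PR, `n` is the number of permutations tested, and the reported `p` is the count of permuted scores at or above the real score divided by `n`. Under that reading, `n` also sets the finest p-value the workflow can report. The following is a small sketch of that arithmetic; the helper name `p_value_resolution` is hypothetical and not part of Snekmer.

```python
def p_value_resolution(n: int) -> float:
    """Smallest nonzero p-value obtainable when p = false_positives / n."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return 1 / n


# n values taken from .test/config.yaml (200) and resources/config.yaml (2000)
for n in (200, 2000):
    print(f"n={n}: smallest nonzero p = {p_value_resolution(n)}")
```

A larger `n` gives finer resolution at the cost of more retraining rounds, which may be why the test config uses the smaller value.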
3 changes: 3 additions & 0 deletions resources/tutorial/demo_example/run_demo.sh
@@ -18,3 +18,6 @@ mv output/scoring/*scorer output/example-model/

# run snekmer search on examples using provided config.yaml
snekmer search --configfile=../../config.yaml

# run snekmer motif on examples using provided config.yaml
snekmer motif --configfile=../../config.yaml
1 change: 1 addition & 0 deletions snekmer/__init__.py
@@ -8,6 +8,7 @@
from . import vectorize
from . import report
from . import _version
from . import motif

# from . import walk

2 changes: 1 addition & 1 deletion snekmer/_version.py
@@ -1 +1 @@
- __version__ = "1.2.0"
+ __version__ = "1.3.0"
29 changes: 29 additions & 0 deletions snekmer/cli.py
@@ -257,6 +257,11 @@ def get_argument_parser():
        description="Search sequences against pre-existing models via Snekmer.",
        parents=[parser["smk"]],
    )
    parser["motif"] = parser["subparsers"].add_parser(
        "motif",
        description="Find structurally and functionally relevant motifs via Snekmer.",
        parents=[parser["smk"]]
    )
    parser["learn"] = parser["subparsers"].add_parser(
        "learn",
        description="Learn kmer-annotation associations via Snekmer",
@@ -386,6 +391,30 @@ def main():
            verbose=args.verbose,
            quiet=args.quiet,
        )

    elif args.mode == "motif":
        snakemake(
            resource_filename("snekmer", os.path.join("rules", "motif.smk")),
            configfiles=configfile,
            config=config,
            cluster_config=args.clust,
            cluster=cluster,
            keepgoing=args.keepgoing,
            force_incomplete=True,
            forcerun=args.forcerun,
            cores=args.cores,
            nodes=args.jobs,
            workdir=args.directory,
            dryrun=args.dryrun,
            unlock=args.unlock,
            list_code_changes=args.list_code_changes,
            list_params_changes=args.list_params_changes,
            until=args.until,
            touch=args.touch,
            latency_wait=args.latency,
            verbose=args.verbose,
            quiet=args.quiet,
        )

    elif args.mode == "learn":
        snakemake(
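
The `cli.py` hunks above register a `motif` subcommand that inherits shared options through `parents=[...]` and then dispatch on `args.mode`. A minimal standalone sketch of that pattern with the standard-library `argparse` is shown here. The names `build_parser`, `dispatch`, and the `demo` program name are hypothetical, the two flags are borrowed from the CI command in this PR, and the returned tuple merely stands in for the real `snakemake(...)` call.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Shared options live on a parent parser created with add_help=False,
    # so each subcommand can inherit them without a conflicting -h flag.
    shared = argparse.ArgumentParser(add_help=False)
    shared.add_argument("--configfile", default="config.yaml")
    shared.add_argument("--cores", type=int, default=1)

    parser = argparse.ArgumentParser(prog="demo")
    subparsers = parser.add_subparsers(dest="mode", required=True)
    for name in ("search", "motif"):
        subparsers.add_parser(
            name,
            parents=[shared],
            description=f"Run the {name} workflow.",
        )
    return parser


def dispatch(argv):
    """Parse argv and return what would be handed to the workflow runner."""
    args = build_parser().parse_args(argv)
    # Stand-in for snakemake(...): pick the rule file from the chosen mode.
    return f"rules/{args.mode}.smk", args.configfile, args.cores


print(dispatch(["motif", "--configfile", ".test/config.yaml", "--cores", "1"]))
```

Keeping the shared flags on one parent parser means a new mode only needs its own `add_parser` call and a dispatch branch, which matches the shape of the change in this PR.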
109 changes: 109 additions & 0 deletions snekmer/motif.py
christinehc marked this conversation as resolved.
@@ -0,0 +1,109 @@
"""motif: Identification of structurally and functionally relevant motifs with Snekmer.
Created on Fri Apr 21 15:25:54 2023

author: @tnitka
"""
# ---------------------------------------------------------
# Imports
# ---------------------------------------------------------
# import pickle
# from datetime import datetime

# import snekmer as skm
import pandas as pd
import numpy as np
# import snekmer.motif
# from typing import Any, Dict, List, Optional
# from ._version import __version__
# from .vectorize import KmerBasis
# from .score import KmerScorer
# from .model import SnekmerModel, SnekmerModelCV
#from numpy.typing import NDArray
# from sklearn.base import BaseEstimator, ClassifierMixin
# from sklearn.tree import DecisionTreeClassifier
# from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
# from sklearn.linear_model import LogisticRegression # LogisticRegressionCV
# from sklearn.model_selection import GridSearchCV, cross_validate
# from sklearn.pipeline import make_pipeline, Pipeline
# from sklearn.svm import SVC

# object to permute training data and retrain
class SnekmerMotif:
    """Permute training data and retrain to find highly distinguishing kmers.

    The constructor takes no arguments; the number of permutations, n, is
    passed to `p_values`.
    """

    def __init__(self):
        self.generator = np.random.default_rng()
        # self.scorer = skm.score.KmerScorer()

    def permute(self, X: pd.DataFrame, label, label_col="family"):
        """Shuffle the family labels so the data can be retrained and rescored.

        Parameters
        ----------
        X : Dataframe containing matrix of shape (n_kmers, n_features)
            Labeled training data. Modified in place.
        label : str
            Primary family label.
        label_col : str
            Column with family labels.

        Returns
        -------
        Dataframe
            Training data with permuted labels, for retraining and rescoring.

        """
        # save primary family label
        self.primary_label = label
        self.labels = X[label_col].values

        # shuffle the label array in place, then write it back to X
        self.generator.shuffle(self.labels)
        # self.permuted_labels = self.generator.permutation(self.labels)
        # self.permuted_data = X
        X[label_col] = self.labels

        return X

    def p_values(self, X, y: np.ndarray, n: int):
        """Compare each kmer's real score with its scores on permuted data.

        Parameters
        ----------
        X: Dataframe containing matrix of shape (n_kmers, n_iterations)
            kmer scores from each permutation tested
        y: list or array-like of shape (n_kmers, 1)
            kmer scores from real training data
        n: int
            number of permutations tested

        Returns
        -------
        Dataframe
            matrix containing kmer sequences, scores on real data, number of scores
            on permuted data that equal or exceed that on real data, n_iterations,
            and proportion of scores on permuted data that equal or exceed that on
            real data.

        """
        # self.output = pd.DataFrame(columns=('kmer', 'real score', 'false positives', 'n', 'p'))
        # placeholder first row; removed after the loop
        self.output_matrix = np.empty((1, 5))
        # Iterate over every kmer. NOTE: the original bound, len(y) - 1, skipped
        # the last kmer; this assumes y holds exactly one score per row of X.
        for i in range(len(y)):
            self.seq = X['kmer'].iloc[i]
            self.real_score = y[i]
            # columns 1..n hold the permutation scores for this kmer
            self.false_score = X.iloc[i, 1:(n+1)].ge(self.real_score).sum()
            self.p = self.false_score/n
            self.vec = np.array([[self.seq, self.real_score, self.false_score, n, self.p]])
            self.output_matrix = np.append(self.output_matrix, self.vec, axis=0)

        # drop the placeholder first row
        self.output_matrix = np.delete(self.output_matrix, 0, 0)

        self.output = pd.DataFrame(self.output_matrix, columns=('kmer', 'real score', 'false positives', 'n', 'p'))

        return self.output
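
The `p_values` method above counts, for each kmer, how many permuted scores are greater than or equal to the real score and divides by the number of permutations. As an illustration only, the same comparison can be written as a single NumPy broadcast. The function name `permutation_p_values` and the toy scores are assumptions made for this sketch; this is not the code merged in the PR.

```python
import numpy as np


def permutation_p_values(real_scores, permuted_scores):
    """Return (false_positives, p) for each kmer.

    real_scores     : array-like of shape (n_kmers,)
    permuted_scores : array-like of shape (n_kmers, n_permutations)
    """
    real = np.asarray(real_scores, dtype=float)
    perm = np.asarray(permuted_scores, dtype=float)
    if perm.ndim != 2 or perm.shape[0] != real.shape[0]:
        raise ValueError("permuted_scores must have one row per real score")

    n = perm.shape[1]
    # Broadcast each real score across its row and count permuted scores
    # that are >= the real score (the same comparison as DataFrame.ge).
    false_positives = (perm >= real[:, None]).sum(axis=1)
    return false_positives, false_positives / n


# Toy data: 3 kmers, 4 permutations each (made-up numbers).
real = [0.9, 0.2, 0.5]
perm = [
    [0.1, 0.3, 0.95, 0.2],
    [0.4, 0.2, 0.1, 0.6],
    [0.5, 0.5, 0.5, 0.5],
]
false_positives, p = permutation_p_values(real, perm)
print(false_positives, p)
```

Because the comparison runs over whole arrays, every kmer row is covered and the numeric columns keep their dtypes, whereas packing kmer strings and numbers into one `np.array` converts everything to strings.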