kaiko and casanovo multiple comparisons #13

Open
wants to merge 12 commits into
base: Kaiko-main

jynlee7
Contributor

@jynlee7 jynlee7 commented Jul 20, 2023

This PR adds a script that compares the output sequences from multiple Kaiko and Casanovo runs and writes them to a single TSV file.

  • The script defines three functions: get_mgf_path_by_filename, get_kaiko_path_by_mgf_filename, and get_casanovo_path_by_mgf_filename.
  • Each function does what its name says. get_mgf_path_by_filename takes mgf_paths and returns a dictionary that maps each mgf file name to its mgf path, so the dictionary looks something like this: mgf_dict: dict{'input1': 'path/to/input1.mgf', 'input2': 'path/to/input2.mgf', ...}
  • get_kaiko_path_by_mgf_filename takes a list of Kaiko output file paths and returns mgf_dict: dict{'input1': 'path/to/input1_out.txt', 'input2': 'path/to/input2_out.txt', ...}
  • get_casanovo_path_by_mgf_filename takes a list of Casanovo output file paths and returns mgf_dict: dict{'input1': 'path/to/casanovo_20230710143300.mztab', 'input2': 'path/to/casanovo_20230710143400.mztab', ...}
  • With these dictionaries, one for loop finds all of the common paths out of the 36 mgf files, and a second for loop writes the outputs for those common paths into separate columns of the compare dataframe.
  • The dataframe is then saved to the output_path set in the script.
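
For reviewers, here is a rough sketch of how the three lookup functions and the "common paths" step could fit together. It is not the code in this PR. The `_out` suffix handling follows the example file names above. The Casanovo lookup assumes each mzTab file has a metadata line of the form `MTD<TAB>ms_run[1]-location<TAB>.../input1.mgf`, which I have not verified against these outputs. `common_mgf_names` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def get_mgf_path_by_filename(mgf_paths: Iterable[str]) -> Dict[str, str]:
    """Map each mgf file name without extension (e.g. 'input1') to its path."""
    return {Path(p).stem: str(p) for p in mgf_paths}


def get_kaiko_path_by_mgf_filename(
    kaiko_paths: Iterable[str], suffix: str = "_out"
) -> Dict[str, str]:
    """Map 'input1' to 'path/to/input1_out.txt' by stripping the suffix.

    Assumption: Kaiko outputs are named '<mgf name>_out.txt'.
    """
    by_name = {}
    for p in kaiko_paths:
        stem = Path(p).stem
        if stem.endswith(suffix):
            stem = stem[: -len(suffix)]
        by_name[stem] = str(p)
    return by_name


def get_casanovo_path_by_mgf_filename(
    casanovo_paths: Iterable[str],
) -> Dict[str, str]:
    """Map 'input1' to the mzTab file that was produced from input1.mgf.

    Assumption: the mgf path is recorded in an 'ms_run[...]-location'
    metadata line, since the mzTab file name itself is only a timestamp.
    """
    by_name = {}
    for p in casanovo_paths:
        with open(p) as fh:
            for line in fh:
                fields = line.rstrip("\n").split("\t")
                if (
                    len(fields) >= 3
                    and fields[0] == "MTD"
                    and fields[1].startswith("ms_run[")
                    and fields[1].endswith("-location")
                ):
                    by_name[Path(fields[2]).stem] = str(p)
                    break  # only the first ms_run entry is used
    return by_name


def common_mgf_names(*dicts: Dict[str, str]) -> List[str]:
    """Return the mgf names that appear as keys in every dictionary."""
    if not dicts:
        return []
    names = set(dicts[0])
    for d in dicts[1:]:
        names &= set(d)
    return sorted(names)


# Usage with made-up paths (no files are read for these two lookups).
mgf = get_mgf_path_by_filename(["path/to/input1.mgf", "path/to/input2.mgf"])
kaiko = get_kaiko_path_by_mgf_filename(["path/to/input1_out.txt"])
shared = common_mgf_names(mgf, kaiko)
```

Keying everything by the mgf file name means the second loop only has to iterate over `shared` and look up each tool's output path by the same key.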

Also added the Casanovo outputs for the 300k mgf files. The runs used the GPU so that they would finish in a reasonable amount of time.

Contributor Author

jynlee7 commented Jul 20, 2023

run_casanovo.sh is the bash script I used to run the 300k mgf files through the Casanovo model and get the mztab outputs.

@jynlee7 jynlee7 changed the title from "kaiko compare multiple" to "kaiko and casanovo multiple comparisons" Jul 20, 2023
Contributor Author

jynlee7 commented Jul 20, 2023

To run this code, you need the function aggregate_kaiko_casanovo from design_comparison_table.py.

Contributor Author

jynlee7 commented Jul 24, 2023

@CamiloPosso, I resolved all the conflicts so that it's ready to merge.
