Analysis code for identifying genetic part variants in engineered plasmids

barricklab/widespread-recurrent-part-variants

Analysis code for:

McGuffie, M.J., Barrick, J.E., 2023. Identifying widespread and recurrent variants of genetic parts to improve annotation of engineered DNA sequences. https://doi.org/10.1101/2023.04.10.536277

This code takes as input a directory of CSVs generated by pLannotate v1.2.0 from the Addgene plasmid database. The Addgene plasmid IDs used in the analysis are listed in data/addgene_ids.txt.gz in this repository. The associated plasmid sequences can be downloaded individually from the Addgene website (https://www.addgene.org/browse/) or obtained in bulk from Addgene upon request.
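As a minimal sketch of how the compressed ID list could be loaded, the helper below reads a gzip-compressed text file into a list of strings. The function name `read_id_list` is hypothetical, and the one-ID-per-line layout is an assumption about data/addgene_ids.txt.gz, not something this README states.

```python
import gzip


def read_id_list(path):
    """Return the non-empty, whitespace-stripped lines of a gzipped text file.

    Assumption: the file holds one plasmid ID per line.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]
```

Under that assumption, calling `read_id_list("data/addgene_ids.txt.gz")` from the repository root would return the plasmid IDs as strings.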

The computational environment can be created from the provided env.yml file with conda or mamba, using the following command:

mamba env create -f env.yml

The analysis is then run in the following order:
  1. Annotate plasmids with pLannotate and output the results to a directory of CSVs.
  2. Run 01_parsing_annotate_addgene_csvs.ipynb to collect and clean the data.
  3. Run 02_pairwise_ds_scores.py to calculate DS scores of plasmid pairs.
  4. Run 03_analyzing_clusters.ipynb to cluster and analyze the part variants.
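To illustrate what gathering a directory of per-plasmid CSVs in step 2 might involve, here is a small standard-library sketch. It is not the code in 01_parsing_annotate_addgene_csvs.ipynb. The function name `collect_annotation_rows`, the added `source_file` key, and the idea that each file is named after its plasmid are all assumptions made for the example.

```python
import csv
from pathlib import Path


def collect_annotation_rows(csv_dir):
    """Read every *.csv file in csv_dir and return all rows as dicts.

    Each row gets a 'source_file' key holding the file name without its
    extension, so rows stay traceable to the file they came from.
    Assumption: every CSV has a header row.
    """
    rows = []
    # Sort the paths so the output order is deterministic.
    for csv_path in sorted(Path(csv_dir).glob("*.csv")):
        with open(csv_path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row["source_file"] = csv_path.stem
                rows.append(row)
    return rows
```

Tagging each row with its source file keeps the combined table traceable to individual plasmids, which later steps that compare pairs of plasmids would need.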

The output CSV of identified variants is available at data/supplemental_table_1.csv.gz in this repository.
