This project contains two sets of code.
The first, scrape_accession_rules.py, is a parser for the accession format described at https://www.ncbi.nlm.nih.gov/Sequin/acc.html,
the RefSeq accession formats described in
https://www.ncbi.nlm.nih.gov/books/NBK21091/table/ch18.T.refseq_accession_numbers_and_mole/?report=objectonly and
the SRA accessions described in https://www.ncbi.nlm.nih.gov/books/NBK56913/#search.what_do_the_different_sra_accessi.
The scraper can be run as
scrape_accession_rules.py accession_rules.json
to scrape the rules from those pages and save them in JSON format. The file generated by the scraper is
used by the second script, parse_accession.py, which can be used as a Python module or
as a command-line tool, e.g.
parse_accession.py accession_rules.json AE014297
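To illustrate the kind of structure such rules operate on, here is a minimal sketch that splits an accession into its alphabetic prefix, numeric part, and optional version. It is not the code in parse_accession.py: the function name split_accession and the regular expression are assumptions made for this example only, and looking up what a prefix means still requires accession_rules.json, whose layout is not described here.

```python
import re

# Hypothetical helper for illustration only; NOT part of parse_accession.py.
# Assumes an accession is: letters (optionally followed by "_", as in RefSeq),
# then digits, then an optional ".version" suffix.
_ACCESSION_RE = re.compile(r"^([A-Za-z]+_?)(\d+)(?:\.(\d+))?$")


def split_accession(accession):
    """Return (prefix, digits, version) for an accession string.

    version is an int, or None when the accession has no ".N" suffix.
    Raises ValueError if the string does not fit the assumed pattern.
    """
    match = _ACCESSION_RE.match(accession.strip())
    if match is None:
        raise ValueError("Unrecognised accession: %r" % accession)
    prefix, digits, version = match.groups()
    return prefix.upper(), digits, int(version) if version else None


print(split_accession("AE014297"))     # ('AE', '014297', None)
print(split_accession("NM_000546.6"))  # ('NM_', '000546', 6)
```

The digits are kept as a string so that leading zeros and the digit count, which accession formats typically depend on, are preserved.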
The parse_accession.py script aims to work with all Python versions. The scrape_accession_rules.py
script requires at least Python 3.6 and the modules specified in conda_requirements.txt.
This code is licensed under the BSD 3-Clause license. Please feel free to use and modify the code, but
note that it is released without warranty. See the .py files for details.