A Python script for extracting specific sequences from a FASTA file based on ID or keyword.
Next-generation sequencing produces FASTA files that are too large to open in graphical user interface (GUI) text editors. These files often need to be filtered by a list of IDs taken from the output of other tools (gene expression results, annotation results, etc.). This script provides a simple GUI that lets researchers retrieve specific sequences from large FASTA files (genome or transcriptome assemblies) without the hassle of Unix command-line scripts and tools.
Run the script get_from_fasta_gui.py from the command line or from your favourite Python IDE.
- A list of sequence IDs, one per line, in a CSV file
- A FASTA database to search for the sequences
- Python 2.7 with the easygui and Biopython packages installed
- Access to a graphical display (a windowing environment on Linux machines)
- Add exception handling for long fasta headers
- Add exception handling for program abort (no input files selection)
- Tidy up functions
- Keyword list support
- Command line version
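
The ID-based filtering described above could be sketched without the GUI roughly as follows, for example as a starting point for a command-line version. This is a minimal illustration, not the code in get_from_fasta_gui.py: it uses only the standard library instead of easygui and Biopython, the function names (`read_ids`, `iter_fasta`, `extract_by_id`) are hypothetical, and it assumes the sequence ID is the first whitespace-delimited word of each FASTA header and that IDs sit in the first column of the CSV file.

```python
import csv


def read_ids(csv_path):
    """Return the set of sequence IDs found in the first column of a CSV file."""
    ids = set()
    with open(csv_path) as handle:
        for row in csv.reader(handle):
            # Skip blank lines and empty first cells.
            if row and row[0].strip():
                ids.add(row[0].strip())
    return ids


def iter_fasta(fasta_path):
    """Yield (header, sequence) pairs, reading the file one line at a time."""
    header, chunks = None, []
    with open(fasta_path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def extract_by_id(fasta_path, csv_path, out_path):
    """Write the records whose ID is listed in the CSV file; return the count.

    Assumption: the ID is the first word of the header line.
    """
    wanted = read_ids(csv_path)
    found = 0
    with open(out_path, "w") as out:
        for header, seq in iter_fasta(fasta_path):
            parts = header.split(None, 1)
            if parts and parts[0] in wanted:
                out.write(">%s\n%s\n" % (header, seq))
                found += 1
    return found
```

Because the FASTA file is read line by line and only the current record is held in memory, this approach should cope with large assemblies. Each matching sequence is written on a single line, so the original line wrapping is not preserved.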