IdoBar/FastRetriever
FastRetriever

A python script used to extract specific sequences from a fasta file based on ID or keyword.

Background

Next-generation sequencing produces FASTA files that are too large to open in graphical user interface (GUI) text editors. Quite often these files need to be filtered by a list of IDs taken from the output of other tools (gene expression results, annotation results, etc.). This script provides a simple GUI that lets researchers retrieve specific sequences from large FASTA files (genome/transcriptome assemblies) without the hassle of using Unix command line scripts and tools.

Usage

Run the script get_from_fasta_gui.py from the command line or from your favourite Python IDE.

Requirements

  1. A list of sequence IDs, one per line, in a csv file
  2. A fasta database in which to look for the sequences
  3. Python 2.7 with the easygui and BioPython packages installed
  4. Graphical terminal access (a windowing environment on Linux machines)
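
The two inputs listed above (an ID list and a fasta database) can be combined roughly as in the sketch below. This is not the code of get_from_fasta_gui.py: it assumes Python 3 and uses only the standard library instead of easygui and BioPython, and the function names (read_ids, iter_fasta, filter_by_id) and the demo data are made up for illustration. It also assumes that a sequence ID is the first whitespace-delimited token of the fasta header.

```python
import csv
import io


def read_ids(handle):
    """Collect sequence IDs from a CSV handle, using the first column of each row."""
    return {row[0].strip() for row in csv.reader(handle) if row and row[0].strip()}


def iter_fasta(handle):
    """Yield (header, sequence) pairs one record at a time, so the whole
    file never has to be held in memory."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_id(fasta_handle, wanted_ids):
    """Yield records whose ID is in wanted_ids.

    Assumption: the ID is the first whitespace-delimited token of the header.
    """
    for header, seq in iter_fasta(fasta_handle):
        parts = header.split(None, 1)
        if parts and parts[0] in wanted_ids:
            yield header, seq


# Tiny in-memory demonstration with made-up data.
demo_ids = io.StringIO("seq2\nseq3\n")
demo_fasta = io.StringIO(">seq1 first\nACGT\n>seq2 second\nGGCC\nTTAA\n>seq3\nAAAA\n")
selected = list(filter_by_id(demo_fasta, read_ids(demo_ids)))
for header, seq in selected:
    print(">" + header)
    print(seq)
```

With real data, the two io.StringIO objects would be replaced by file handles opened on the csv file and the fasta database.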

TODO

  1. Add exception handling for long fasta headers
  2. Add exception handling for program abort (no input files selection)
  3. Tidy up functions
  4. Keyword list support
  5. Command line version
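
Items 4 and 5 are not implemented yet. One possible shape for them is sketched below, assuming Python 3 and the standard library only. Everything here is hypothetical: the option names (-k/--keyword), the function names and the file name assembly.fasta are illustrative and do not describe an existing interface of this script. Keyword matching is assumed to be a case-insensitive substring search on the fasta header.

```python
import argparse
import io


def iter_fasta(handle):
    """Yield (header, sequence) pairs from a fasta handle, one record at a time."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_by_keyword(fasta_handle, keywords):
    """Yield records whose header contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    for header, seq in iter_fasta(fasta_handle):
        if any(k in header.lower() for k in lowered):
            yield header, seq


def build_parser():
    """Hypothetical command line interface; none of these options exist yet."""
    parser = argparse.ArgumentParser(
        description="Sketch: keep fasta records whose header contains a keyword.")
    parser.add_argument("fasta", help="path to the fasta file")
    parser.add_argument("-k", "--keyword", action="append", default=[],
                        help="keyword to look for; may be given more than once")
    return parser


def main(argv=None):
    """Parse arguments, then print matching records in fasta format."""
    args = build_parser().parse_args(argv)
    with open(args.fasta) as handle:
        for header, seq in select_by_keyword(handle, args.keyword):
            print(">" + header)
            print(seq)


# Demonstration on made-up data, without touching the file system.
args = build_parser().parse_args(["assembly.fasta", "-k", "kinase"])
demo = io.StringIO(
    ">t1 protein kinase\nACGT\n>t2 hypothetical protein\nGGCC\n>t3 Kinase-like\nTT\nAA\n")
hits = list(select_by_keyword(demo, args.keyword))
print(hits)
```

In a real script, main() would be called from an `if __name__ == "__main__":` guard; it is left out here so the demonstration runs without command line arguments.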
