Constructing pan genome graphs from "protein family" assignments

LPCDRP/pangenome_graphs

 
 

Panaconda

Requirements

// core code
Python 2.7 (for now)
networkx


// layout algorithm
Java

// visualization
Firefox for web-based browsing
Gephi for graph editing and manipulation

Installation

git clone --recursive https://github.com/aswarren/pangenome_graphs.git
cd pangenome_graphs
pip install -r requirements.txt

Running

usage: fam_to_graph.py [-h] [--no_function] [--layout] [--output OUTPUT]  
                       [--rfgraph RFGRAPH] [--diversity {genus,species}]  
                       [--patric_figfam | --patric_plfam | --patric_pgfam | --generic]  
                       [--context {genome,contig,feature}]  
                       [--ksize {3,4,5,6,7,8,9}]  
                       [feature_files [feature_files ...]]  
positional arguments:  
  feature_files         Files of varying format specifying group, genome,  
                        contig, feature, and start in sorted order. stdin also  
                        accepted  
optional arguments:  
  -h, --help            show this help message and exit  
  --no_function         No functions as labels. Keep file size smaller.  
  --layout              run gephi layout code for gexf  
  --output OUTPUT       the path and base name given to the output files. if  
                        not given goes to stdout  
  --rfgraph RFGRAPH     create rf-graph gexf file at the following location  
  --diversity {genus,species}  
                        calculate diversity quotient according to given taxa  
                        level  
  --patric_figfam       PATRIC feature file in tab format  
  --patric_plfam        PATRIC feature file in tab format  
  --patric_pgfam        PATRIC feature file in tab format. selecting pgfams  
  --generic             table specifying the group, genome, contig, feature,  
                        and start in sorted order  
  --context {genome,contig,feature}  
                        the synteny context  
  --ksize {3,4,5,6,7,8,9}  
                        the size of the kmer to use in constructing synteny  
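
To give a feel for what --ksize controls, here is a toy sketch of k-mers built over ordered protein family IDs rather than nucleotides. It is not Panaconda's implementation: the function names (family_kmers, kmer_graph), the input layout, and the choice to link consecutive k-mers are all assumptions made for illustration, and strand and orientation are ignored.

```python
from collections import defaultdict


def family_kmers(families, k):
    """Return every window of k consecutive family IDs, in contig order."""
    return [tuple(families[i:i + k]) for i in range(len(families) - k + 1)]


def kmer_graph(contigs, k):
    """Link consecutive k-mers on each contig.

    `contigs` is a list of (genome, [family IDs in positional order]) pairs
    (a hypothetical layout, not one of the tool's input formats).
    Returns {(left_kmer, right_kmer): set of genomes supporting that edge}.
    """
    edges = defaultdict(set)
    for genome, families in contigs:
        kmers = family_kmers(families, k)
        for left, right in zip(kmers, kmers[1:]):
            edges[(left, right)].add(genome)
    return dict(edges)


# Two made-up genomes that share gene order except for the last family.
contigs = [
    ("genomeA", ["fam1", "fam2", "fam3", "fam4", "fam5"]),
    ("genomeB", ["fam1", "fam2", "fam3", "fam4", "fam9"]),
]
graph = kmer_graph(contigs, k=3)
for (left, right), genomes in sorted(graph.items()):
    print(left, "->", right, sorted(genomes))
```

In this toy model, edges supported by both genomes mark conserved gene order, and the graph forks where the order diverges. A larger k demands a longer run of shared families before two genomes share an edge.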

Example run for creating a graph

python fam_to_graph.py --layout --output data/BrucellaInversion/test_psgraph.gexf --patric_pgfam ./data/BrucellaInversion/*.tab
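
A quick way to sanity-check a run is to count the nodes and edges in the resulting file. The sketch below is not part of Panaconda; it assumes Python 3 and uses only the standard library, relying on GEXF being XML with node and edge elements. The helper names (local_name, summarize_gexf) and the inline sample document are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize_gexf(source):
    """Count <node> and <edge> elements in a GEXF file (path or file object)."""
    counts = {"node": 0, "edge": 0}
    for element in ET.parse(source).iter():
        name = local_name(element.tag)
        if name in counts:
            counts[name] += 1
    return counts


# Tiny hand-written GEXF document, used here instead of real output.
sample = """<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph>
    <nodes><node id="0"/><node id="1"/></nodes>
    <edges><edge id="0" source="0" target="1"/></edges>
  </graph>
</gexf>"""
summary = summarize_gexf(io.StringIO(sample))
print(summary)
```

To check real output, pass the path of the file the run produced, for example summarize_gexf("data/BrucellaInversion/test_psgraph.gexf"). The exact file name written for a given --output value is an assumption here.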

Visualizing data

The resulting GEXF files can be opened in Gephi or in the JavaScript visualizer distributed with Panaconda. The visualizer can load files from local disk, but Chrome currently restricts this. To run the JavaScript-based visualizer locally, use Python to host a web server and view the data through a URL.

To do this:

cd viewer/gexf-js/

Symlink or copy the GEXF file you want to view into the gexf-js folder, e.g. ln -s ../../data/BrucellaInversion/psgraph.gexf ./brucellainversion.gexf

python -m SimpleHTTPServer 8080

In Firefox, navigate to http://localhost:8080/index.html#brucellainversion.gexf
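
The SimpleHTTPServer module exists only in Python 2; in Python 3 the same functionality lives in http.server (python3 -m http.server 8080). If only Python 3 is available for serving the viewer, the steps above could be scripted roughly as follows. This is a sketch assuming Python 3.7 or later; make_server, viewer_url, and serve are hypothetical helper names, not part of Panaconda.

```python
import functools
import http.server


def make_server(directory, port=8080):
    """Build an HTTP server that serves `directory` on localhost:port."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("localhost", port), handler)


def viewer_url(port, gexf_name):
    """URL that opens the viewer with the named GEXF file preloaded."""
    return "http://localhost:%d/index.html#%s" % (port, gexf_name)


def serve(directory="viewer/gexf-js", port=8080,
          gexf_name="brucellainversion.gexf"):
    """Print the viewer URL, then serve until interrupted with Ctrl-C."""
    server = make_server(directory, port)
    print("Open", viewer_url(port, gexf_name))
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Calling serve() from the repository root serves viewer/gexf-js on port 8080 and prints the URL to open in the browser.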

Data

Examples from the paper https://www.biorxiv.org/content/early/2017/11/08/215988

Zipped versions can be found in the data directory.

These data can also be found and manipulated at PATRIC BRC (currently requires a free account) at https://patricbrc.org/workspace/public/[email protected]/Panaconda/PanSyntenyExamples

Currently, the most convenient supported format is PATRIC's feature tab format. Groups can be downloaded from the "feature tab" in PATRIC.
