mapping input contig names to output #10

lskatz · 2017-04-04T16:43:03Z

Given that I get a contig name like RNODE_1_length_2185_cov_1230.39000, how would I know which contig that is in the original file? I can search for the coverage value 1230.39 to find the original contig name NODE_18_length_2240_cov_1230.39_component_1, but I hope there is a better way. Even in the stdout, the contig name is different: EDGE_703_length_2240_cov_1230.39_component_1.

This output is from plasmidSPAdes. Regarding my previous issue #8: sorry, I don't know if or when I will have the time to do a head-to-head comparison.

rozovr · 2017-04-04T17:18:36Z

For cycles that are composed of multiple contigs in the graph, the second output file is meant for this purpose:
.cycs.paths_w_cov.txt - a text file containing information about plasmids composed of multiple contigs.
The README contains information on the format of this file.

For plasmids that are isolated in the graph, I don't think the names are maintained, but you can also parse them out of the original fastg by searching for self edges, whose headers are formatted as X: X;
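
The self-edge search suggested above could be sketched roughly as follows. This is a minimal illustration and not part of recycle.py: the function name find_self_edges is hypothetical, and it assumes SPAdes-style FASTG headers of the form ">NAME:NEIGHBOR1,NEIGHBOR2;" in which a trailing apostrophe marks the reverse-complement strand.

```python
def find_self_edges(fastg_lines):
    """Return names of edges that list themselves as a neighbor.

    Hypothetical helper, not part of recycle.py. Assumes headers look like
    '>NAME:NEIGHBOR1,NEIGHBOR2;' and that a trailing "'" marks the
    reverse-complement strand of an edge.
    """
    self_edges = []
    seen = set()
    for line in fastg_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        header = line[1:].strip().rstrip(";")
        name, sep, neighbor_field = header.partition(":")
        if not sep:
            continue  # edge with no outgoing neighbors
        name = name.strip()
        neighbors = [n.strip() for n in neighbor_field.split(",")]
        if name in neighbors:
            # Report each edge once, whichever strand was matched.
            canonical = name.rstrip("'")
            if canonical not in seen:
                seen.add(canonical)
                self_edges.append(canonical)
    return self_edges


if __name__ == "__main__":
    import sys

    # Usage: python find_self_edges.py assembly_graph.fastg
    with open(sys.argv[1]) as handle:
        for edge_name in find_self_edges(handle):
            print(edge_name)
```

Only same-strand self loops are reported; an edge whose sole neighbor is its own reverse complement is skipped, since that is not a simple circular contig.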

rozovr · 2017-04-04T17:19:12Z

and no worries about the comparison

lskatz · 2017-04-04T21:13:34Z

.cycs.paths_w_cov.txt is a blank file. Do I need to update from 0.62, or how do I get the right contents of it? This is how I run recycle.py:

recycle.py -g plasmidspades.fastg -k 55 -b reads_pe_primary.sort.bam -i True --max_CV 0.5 > cv.log

On this particular run, I have a fasta file with extension .cycs.fasta with header >RNODE_1_length_4125_cov_569.98200. The cov.txt file is zero bytes. Stdout was:

85.0667 702.422769783 2152.29
================== path, coverage levels when added ====================
1432  nodes remain in component

==================final_paths identities after updates: ================
('EDGE_138_length_4180_cov_569.982_component_1',)

rozovr · 2017-04-05T13:57:14Z

Together these outputs mean there was only one plasmid found, and that it was isolated in the original graph. The cov.txt file only gets written to when there are cycles made up of multiple nodes (edges in SPAdes' convention).
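
Until names are carried through, an isolated single-edge plasmid could be matched back to its source name with a heuristic like the one below. It is only a sketch resting on an observation, not on documented behavior: in both examples quoted in this thread the RNODE length is 55 shorter than the source length (2240 vs 2185, 4180 vs 4125), which equals the -k value in the command shown, and the coverage values agree. The function names and the tolerance parameter are hypothetical, and the approach would not apply to cycles built from several edges.

```python
import re

# Pulls the length and coverage fields out of names such as
# RNODE_1_length_4125_cov_569.98200 or
# EDGE_138_length_4180_cov_569.982_component_1.
NAME_RE = re.compile(r"_length_(\d+)_cov_(\d+(?:\.\d+)?)")


def parse_name(name):
    """Return (length, coverage) parsed from a contig or edge name."""
    match = NAME_RE.search(name)
    if match is None:
        raise ValueError("no length/cov fields in name: %r" % name)
    return int(match.group(1)), float(match.group(2))


def match_by_length_and_cov(rnode_names, source_names, k, tol=1e-3):
    """Map each RNODE name to the source names that plausibly produced it.

    Assumption (observed in this thread only): output length equals
    source length minus k, and coverage is unchanged.
    """
    parsed_sources = [(s,) + parse_name(s) for s in source_names]
    mapping = {}
    for rnode in rnode_names:
        r_len, r_cov = parse_name(rnode)
        mapping[rnode] = [
            s for s, s_len, s_cov in parsed_sources
            if s_len - k == r_len and abs(s_cov - r_cov) <= tol
        ]
    return mapping


if __name__ == "__main__":
    result = match_by_length_and_cov(
        ["RNODE_1_length_4125_cov_569.98200"],
        ["EDGE_138_length_4180_cov_569.982_component_1",
         "EDGE_703_length_2240_cov_1230.39_component_1"],
        k=55,
    )
    for rnode, candidates in result.items():
        print(rnode, candidates)
```

An empty or multi-element candidate list signals that the heuristic could not resolve the name unambiguously, so the result should be checked rather than trusted blindly.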

lskatz · 2017-04-05T17:14:44Z

OK, but going back to the original question: would it be possible to add a way of mapping source contigs to output contigs?

rprops · 2017-10-18T10:23:33Z

I would appreciate this feature as well. It would make it possible to further analyse putative contigs in anvi'o by importing the full assembly and mapping the contigs into putative plasmid bins.
