forked from cole-trapnell-lab/cufflinks
-
Notifications
You must be signed in to change notification settings - Fork 1
/
README
74 lines (58 loc) · 3.46 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
CUFFLINKS
----------------------------
Cufflinks is a reference-guided assembler for RNA-Seq experiments. It
simultaneously assembles transcripts from reads and estimates their relative
abundances, without using a reference annotation. The software expects as
input RNA-Seq read alignments in SAM format (http://samtools.sourceforge.net).
Here's an example spliced read alignment record:
s6.25mer.txt-913508 16 chr1 4482736 255 14M431N11M * 0 0 CAAGATGCTAGGCAAGTCTTGGAAG IIIIIIIIIIIIIIIIIIIIIIIII NM:i:0 XS:A:-
This record includes a custom tag used by Cufflinks to determine the strand
of the mRNA from which this read originated. Often, RNA-Seq experiments lose
strand information, and so reads will map to either strand of the genome.
However, strandedness of spliced alignments can often be inferred from the
orientation of consensus splice sites, and Cufflinks requires that spliced
aligments have the custom strand tag XS, which has SAM attribute type "A", and
can take values of "+" or "-". If your RNA-Seq protocol is strand specific,
including this tag for all alignments, including unspliced alignments, will
improve the assembly quality.
The SAM records MUST BE SORTED by reference coordinate, like so:
sort -k 3,3 -k 4,4n hits.sam
The program is fully threaded, and when running with multiple threads, should
be run on a machine with plenty of RAM. 4 GB per thread is probably reasonable
for most experiments. Since many experiments feature a handful of genes that
are very abundantly transcribed, Cufflinks will spend much of its time
assembling" a few genes. When using more than one thread, Cufflinks may
appear to "hang" while these genes are being assembled.
Cufflinks assumes that library fragment lengths are size selected and normally
distributed. When using paired end RNA-Seq reads, you must take care to supply
Cufflinks with the mean and variance on the inner distances between mate
pairs. For the moment, Cufflinks doesn't support assembling mixtures of paired
end reads from different fragment size distributions. Mixtures of single
ended reads (of varying lengths) with paired ends are supported.
Cufflinks also assumes that the donor does not contain major structural
variations with respect to the reference. Tumor genomes are often highly
rearranged, and while Cufflinks may eventually identify gene fusions and
gracefully handle genome breakpoints, users are encouraged to be careful when
using Cufflinks on tumor RNA-Seq experiments.
The full manual may be found at http://cufflinks.cbcb.umd.edu
CUFFCOMPARE
----------------------------
Please see http://cole-trapnell-lab.github.io/cufflinks/manual
REQUIREMENTS
---------------------------
Cufflinks is a standalone tool that requires gcc 4.0 or greater, and runs on
Linux and OS X. It depends on Boost (http://www.boost.org) version 1.38 or
higher.
REFERENCES
---------------------------
Cufflinks builds on many ideas, including some
proposed in the following papers:
Ali Mortazavi, Brian A Williams, Kenneth McCue, Lorian Schaeffer and Barbara
Wold, "Mapping and quantifying mammalian transcriptomes by RNA-Seq",Nature
Methods, volume 5, 621 - 628 (2008)
Hui Jiang and Wing Hung Wong, "Statistical Inferences for isoform expression",
Bioinformatics, 2009 25(8):1026-1032
Nicholas Eriksson, Lior Pachter, Yumi Mitsuya, Soo-Yon Rhee, Chunlin Wang,
Baback Gharizadeh, Mostafa Ronaghi, Robert W. Shafer, Niko Beerenwinkel, "Viral
population estimation using pyrosequencing", PLoS Computational Biology,
4(5):e1000074