What's New in Version 2.1

Built in UMI collapsing algorithm
Parallel processing enabled on Mac, Linux, and Slurm HPC systems
Automatic file detection

Dependencies

Setup

All options from IronThrone v 1.0 still function as described below.
In a single directory, place amplicon R1/R2 files, barcode whitelist file, config file, IronThrone-GoT script, Combine_IronThrone_Parallel_Output.R script, and the appropriate Parallelized UMI Collapse script
fastq files will be automatically detected if filenames contain R1 or R2
Barcode whitelist file will be automatically detected if filename contains B/barcode
Config file will be automatically detected if file extension is .config

New Option Parameters

Option	Description
`-tl/--target_lines`	desired file length for split fastq files (must be multiple of 4, default: 500000)
`-pcr/--pcr_read_threshold`	ratio above which majority of PCR reads must be in order for a UMI to be called definitively (default: 0.5)
`-z/--skip_shuf`	if set to 1, skip the random shuffling of fastq files step, useful if needing to re-run just the IronThrone and UMI collapsing components of the pipeline (default: 0)
`-x/--skip_iron_throne`	if set to 1, skip the IronThrone step of the pipeline, useful if run in combination with `skip_shuf` to only run the UMI collapsing component of the pipeline (default: 0)
`-ld/--levenshtein_distance`	Allowed Levenshtein distance between UMIs determined to be similar enough for collapsing (default: 0.1)

Notes/Bugs to be Ironed out in Future Versions

Amplicon fastqs should be concatenated, paired-end, into 1 R1 and 1 R2 file
fastq read lengths should be identical for all reads within a given R1 or R2 file (R1 and R2 read lengths do not need to be equivalent)
Barcode file can be a complete whitelist provided by 10X (see below) or a custom .txt file of barcodes from a corresponding 10X run. Note: If using the unzipped barcodes.tsv file from CellRanger's filtered_feature_bc_matrix folder, barcodes will often be appended with -1, which will result in an error. You can run sed 's/..$//' < barcodes.tsv > barcodes_trim.tsv to trim these suffixes.
.config file entries should be hard-tab-separated

IronThrone-GoT

Processes GoT amplicon data and generates a table of metrics

(created by the Landau lab at the New York Genome Center)

Getting Ready

Amplicon R1.fastq & R2.fastq files

R1.fastq (includes barcodes and UMIs sequences)
R2.fastq (includes primer, shared and targeted sequences)

IronThrone-GoT supports both uncompressed and GZipped formats.

Config file

To scan the expected sequences at specific positions in amplicon reads,
.config file should be prepared as follows:

(linear_example.config)

GCAGCAGAGAAACAAATGAAGGA 1 23   #(1st row)  primer.sequence start.pos end.pos
AAACAGGACGAGGAGCAGAGG 25 45    #(2nd row)  shared.sequence start.pos end.pos
AAGAAGACAAG 59 69              #(3rd row)      WT.sequence start.pos end.pos
TGAGGACAAAG 59 69              #(4th row)     MUT.sequence start.pos end.pos

(circ_example.config)

GAAGAAGACAAGAAACGCAAAGAGG 1 25        #(1st row)          primer.sequence start.pos end.pos
AGGAGGAGGCAGAGGACAA 26 44             #(2nd row)          shared.sequence start.pos end.pos
GGAGGATGATG 45 55                     #(3rd row)              WT.sequence start.pos end.pos
TTGTCGGAGGA 45 55                     #(4th row)             MUT.sequence start.pos end.pos
TGAGGATGAGGAGGATGAGG 66 85            #(5th row)   PCR#2Fw_in_WT.sequence start.pos end.pos
TGAGGATGAGGAGGATGAGG 71 90            #(6th row)  PCR#2Fw_in_MUT.sequence start.pos end.pos
CACTGAGAATGTAAGAACTACAAACAA 96 122    #(7th row)   PCR#2Rv_in_WT.sequence start.pos end.pos
CACTGAGAATGTAAGAACTACAAACAA 101 127   #(8th row)  PCR#2Fv_in_MUT.sequence start.pos end.pos

A config file should be prepared in tab-separated format. As an example, see testdata/linear_example.config or testdata/circ_example.config.

Whitelist barcode file

This file provides a list of barcodes that are put in preparation for both GoT and 10X libraries, so that false barcodes are distinguished.

Whitelisted barcode files provided by 10X are collected at barcodes10X/ directory.

Choose the whitelist barcode file considering which 10X chemistry version used:

10X v2.chemistry: 737K-august-2016.txt (default)
10X v3.chemistry: 3M-february-2018.txt

Download the 3M-february-2018.txt file here, and save at barcodes10X/ directory.

Also, the length of UMI depends on 10X chemistry version so set --umilen if needed.

10X v2.chemistry: 10 (default)
10X v3.chemistry: 12

Option parameters

Usage

IronThrone-GoT [options] --run linear --fastqR1 <in.R1.fastq> --fastqR2 <in.R2.fastq>
                --config <in.config> --sample <out.prefix> --outdir <out.path>

Required inputs

Option	Description
`-r/--run`	run module for processing `linear` or `circ` GoT (default: `linear`)
`-f1/--fastqR1`	input R1.FASTQ FILE (input file can be in GZip format with .gz extension)
`-f2/--fastqR2`	input R2.FASTQ FILE (input file can be in GZip format with .gz extension)
`-c/--config`	input CONFIG FILE (input file should be in tab-separated)

Additional options

Option	Description
`-m/--mmtch`	allowed mismatch ratio to grep the expected sequences (default: 0.2)
`-p/--postP`	cutoff for the posterior probability in the barcode replacement (default: 0.99)
`-d/--dupcut`	cutoff for the total number of duplication (default: 1)
`-b/--bclen`	length of barcode (default: 16)
`-u/--umilen`	length of UMI (default: 10)
`-w/--whitelist`	file for whitelisted barcodes (737K-august-2016.txt)
`-s/--sample`	prefix for outputs (myGoT)
`-o/--outdir`	path for outputs (./out)
`-l/--log`	logfile name (myGoT.log)
`-t/--thread`	number of threads to run in parallel (default: 4)
`-k/--keepouts`	if set to 1, keeping intermediate files (default: 0)
`-v/--verbose`	if set to 1, returning more logs (default: 0)
`-h/--help`	show option parameters

Output table

(prefix).summTable.txt is a table of aggregated (semi-colon separated) and averaged metrics sorted by unique cell barcode and UMI.

Column	Description
BC	barcode
whitelist	Y:barcode was perfectly matched with the whitelisted barcodes N:barcode was not perfectly matched with the whitelisted barcodes, and replaced with statistical significance
UMI	UMIs sharing the same barcode
num.WT.in.dups	counts of WT calls for each duplicate read (barcode + each UMI)
num.MUT.in.dups	counts of MUT calls for each duplicate read (barcode + each UMI)
num.amb.in.dups	counts of amb calls for each duplicate read (barcode + each UMI)
call.in.dups	calls for each duplicate read (barcode + each UMI)
avg.base_error.R2	average of base errors for entire R2.fastq reads sharing the same barcode
avg.base_error.primer	average of base errors for the primer region in R2.fastq reads sharing the same barcode
avg.base_error.shared	average of base errors for the shared region in R2.fastq reads sharing the same barcode
avg.base_error.WT	average of base errors for the WT region in R2.fastq reads sharing the same barcode
avg.base_error.MUT	average of base errors for the MUT region in R2.fastq reads sharing the same barcode
mismatch.primer	counts of mismatches in the primer region for each duplicate read (barcode + each UMI)
mismatch.shared	counts of mismatches in the shared region for each duplicate read (barcode + each UMI)
mismatch.WT	counts of mismatches in the WT region for each duplicate read (barcode + each UMI)
mismatch.MUT	counts of mismatches in the MUT region for each duplicate read (barcode + each UMI)
WT.calls	counts of total WT call
MUT.calls	counts of total MUT call
amb.calls	counts of total amb call

(Run an example)

Test with example .fastq and .config files at testdata/ directory.

(linear-GoT)

IronThrone-GoT --run linear --config linear_example.config \
               --fastqR1 linear_example.R1.fastq --fastqR2 linear_example.R2.fastq \
               --outdir game --log of --sample throne

(circ-GoT)

IronThrone-GoT --run circ --config circ_example.config \
               --fastqR1 circ_example.R1.fastq --fastqR2 circ_example.R2.fastq \
               --outdir game --log of --sample throne

Name		Name	Last commit message	Last commit date
Latest commit History 52 Commits
Parallelized_UMI_Collapse		Parallelized_UMI_Collapse
barcodes10X		barcodes10X
testdata		testdata
Combine_IronThrone_Parallel_Output.R		Combine_IronThrone_Parallel_Output.R
Genotyping_GEX_Comparison.Rmd		Genotyping_GEX_Comparison.Rmd
GoT_logo.png		GoT_logo.png
IronThrone-GoT		IronThrone-GoT
README.md		README.md
v1_README		v1_README

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

What's New in Version 2.1

Dependencies

Setup

New Option Parameters

Notes/Bugs to be Ironed out in Future Versions

IronThrone-GoT

Getting Ready

Amplicon R1.fastq & R2.fastq files

Config file

Whitelist barcode file

Option parameters

Usage

Required inputs

Additional options

Output table

(Run an example)

About

Uh oh!

Releases 1

Packages

Contributors 2

Uh oh!

Languages

landau-lab/GoT-IronThrone

Folders and files

Latest commit

History

Repository files navigation

What's New in Version 2.1

Dependencies

Setup

New Option Parameters

Notes/Bugs to be Ironed out in Future Versions

IronThrone-GoT

Getting Ready

Amplicon R1.fastq & R2.fastq files

Config file

Whitelist barcode file

Option parameters

Usage

Required inputs

Additional options

Output table

(Run an example)

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 2

Uh oh!

Languages

Packages