Home

Welcome to the Roth Lab Wiki!

General help

High-performance-computing (HPC) clusters

At the Roth lab, we currently have access to three HPC clusters: two at our University of Toronto location and one at the University of Pittsburgh.

galen, located at the Lunenfeld-Tanenbaum Research Institute (LTRI, Mt Sinai Hospital)
- galen uses the "Slurm" scheduling system
- After our move to Pittsburgh, we only have a grace period until approximately September 2024 to use this resource.
dc, located at the Donnelly Centre
- dc uses the "PBS" scheduling system
- After our move to Pittsburgh, we only have a grace period until approximately September 2024 to use this resource.
cluster.csb.pitt.edu, located at the Department of Computational Systems Biology, University of Pittsburgh
- cluster uses the "Slurm" scheduling system as well.

Although the clusters use different scheduling systems, our custom abstraction layer, clusterutil, offers a unified interface to all of them.

Web servers

We have two web servers at the LTRI in Toronto: dalai, our web development server, and yantra, our web production server.

Help articles:

Making data downloadable

Sequencing data storage and handling

We have two servers for storing and demultiplexing our sequencing data:

rothseq1, located at the LTRI (Mt Sinai)
rothsequt, located at the Donnelly Centre

These servers do not have access to the HPC clusters. However, rothseq1 shares user home directories with galen, making it easy to share data between the two.

Help articles:

Demultiplexing Illumina data

TileSeq

TileSeq is a method for analyzing variant effect libraries. Instead of using barcodes to establish clone identity, TileSeq reads out the genotypes of the clones directly, albeit only within short "tiles" of the mutagenized target sequence. These tiles are designed to be ~150bp in length, which is just short enough to be covered completely by a standard Illumina read. This way, information from both the forward (R1) and reverse (R2) reads can be used to distinguish real variants from sequencing errors.

TileSeq data is analyzed in two steps. First, the tileseq_mut pipeline aligns reads to the template and calls and counts variants. Second, the tileseqMave pipeline calculates the enrichment of variants between conditions, applies an error model and filters, and generates QC outputs.
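To illustrate why covering a tile with both reads helps, here is a minimal sketch in Python. It is not the tileseq_mut implementation; the function name `call_variants` and the toy sequences are hypothetical. It assumes both reads are already aligned to the template without indels and that R2 has already been reverse-complemented. A mismatch is only accepted as a variant if R1 and R2 agree on it.

```python
def call_variants(template, r1, r2):
    """Return (position, ref, alt) for mismatches that both reads agree on.

    Assumes template, r1 and r2 are equal-length, gap-free strings in the
    same orientation (hypothetical simplification for illustration).
    """
    calls = []
    for pos, (ref, b1, b2) in enumerate(zip(template, r1, r2)):
        # A mismatch seen in only one read is treated as a sequencing error.
        if b1 == b2 and b1 != ref:
            calls.append((pos, ref, b1))
    return calls


template = "ACGTACGT"
r1 = "ACGAACGT"  # mismatch at position 3 (T>A)
r2 = "ACGAACCT"  # mismatch at position 3 (T>A) and at position 6 (G>C)

print(call_variants(template, r1, r2))  # [(3, 'T', 'A')]
```

The G>C change at position 6 appears in R2 only, so it is discarded, while the T>A change at position 3 is supported by both reads and is reported.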

BarSeq

BarSeq is another method for analyzing variant effect libraries. Here, clones carry (mostly) unique barcodes. The association between barcode and genotype is established using long-read sequencing (PacBio), which is analyzed via the Pacybara pipeline. After the selection assay, the barcodes in the condition-specific libraries are sequenced via Illumina short reads and analyzed via Pacybartender, a wrapper around bartender (also found in the pacybara repo).
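The two-stage idea can be sketched as follows. This is a simplified illustration, not what Pacybara or Pacybartender actually do: the function `count_genotypes`, the barcodes and the genotype labels are all hypothetical, and barcodes are matched exactly rather than with any error-tolerant clustering.

```python
from collections import Counter


def count_genotypes(barcode_to_genotype, observed_barcodes):
    """Count genotypes in one condition via a barcode lookup table.

    barcode_to_genotype: dict built beforehand from long-read data.
    observed_barcodes: barcodes read from one short-read library.
    Returns (Counter of genotype counts, number of unmatched barcodes).
    """
    counts = Counter()
    unmatched = 0
    for barcode in observed_barcodes:
        genotype = barcode_to_genotype.get(barcode)
        if genotype is None:
            unmatched += 1  # barcode not seen in the long-read library
        else:
            counts[genotype] += 1
    return counts, unmatched


# Hypothetical lookup table and short-read barcodes
lookup = {"AAAC": "p.Ala2Val", "GGTT": "p.Gly5Ser", "CCCA": "WT"}
reads = ["AAAC", "GGTT", "AAAC", "TTTT", "CCCA"]

counts, unmatched = count_genotypes(lookup, reads)
print(dict(counts))  # {'p.Ala2Val': 2, 'p.Gly5Ser': 1, 'WT': 1}
print(unmatched)     # 1
```

Running this once per condition gives per-genotype counts that can then be compared between conditions.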

[Barseq] Analyzing Barseq data

MaveVis and other visualizations

MaveVis is a visualization tool for variant effect maps. It displays maps as 'genophenograms', i.e. heatmaps of variant effect scores on a grid of all amino acid positions vs all possible amino acid changes at the given position. It also has the option of displaying additional information tracks, such as sequence conservation, protein domains, secondary structure, accessible surface area and interaction interfaces. A web interface for visualizing datasets from MaveDB via MaveVis is available here.
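The grid underlying such a heatmap can be sketched in a few lines of Python. This is only an illustration of the data layout, not MaveVis code: the function `build_grid`, the alphabetical row order and the example scores are assumptions made for this sketch.

```python
# The 20 standard amino acids in one-letter code (row order is an assumption)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_grid(scores, n_positions):
    """Arrange variant effect scores on an amino acid x position grid.

    scores: dict mapping (position, alt_amino_acid) -> score, positions 1-based.
    Returns a list of rows (one per amino acid), each with n_positions cells;
    cells without a measurement stay None.
    """
    grid = [[None] * n_positions for _ in AMINO_ACIDS]
    for (pos, aa), score in scores.items():
        grid[AMINO_ACIDS.index(aa)][pos - 1] = score
    return grid


# Hypothetical scores for a three-residue protein
grid = build_grid({(1, "A"): 0.5, (3, "Y"): 1.0}, n_positions=3)
print(grid[0])   # row for A: [0.5, None, None]
print(grid[19])  # row for Y: [None, None, 1.0]
```

Each cell of this grid corresponds to one square of the heatmap, and missing measurements remain empty.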

Pathogenicity LLRs

Translating scores to pathogenicity LLRs

MaveQuest

MaveQuest is an online database for querying literature-curated functional assays, phenotypes and clinical interests of human genes for Multiplexed Assays of Variant Effect (MAVE) studies.

Start at the main Wiki page: [MaveQuest] Main Page

Other Wiki pages:

MaveRegistry

MaveRegistry is a collaborative resource for sharing progress on Multiplexed Assays of Variant Effect (MAVE).

Start at the main Wiki page: [MaveRegistry] Main Page

Other Wiki pages:

UK Biobank Projects

UK Biobank is a large, long-term biobank study in the United Kingdom that investigates the respective contributions of genetic predisposition and environmental exposure to the development of disease.

This section contains projects in the lab using UK Biobank data.

Start at the main Wiki page: [UKB Projects] Main Page
