The goal is to predict mechanism-of-action (MoA) annotations from a drug's biological response.
From Kaggle:
"Today, with the advent of more powerful technologies, drug discovery has changed from the serendipitous approaches of the past to a more targeted model based on an understanding of the underlying biological mechanism of a disease. In this new framework, scientists seek to identify a protein target associated with a disease and develop a molecule that can modulate that protein target. As a shorthand to describe the biological activity of a given molecule, scientists assign a label referred to as mechanism-of-action or MoA for short.
One approach is to treat a sample of human cells with the drug and then analyze the cellular responses with algorithms that search for similarity to known patterns in large genomic databases, such as libraries of gene expression or cell viability patterns of drugs with known MoAs."
sig_id: A unique Connectivity Map identifier assigned to each signature generated by the L1000 assay. A signature is the entire vector of differential gene expression values, one per gene; it represents the biological response.
cp_type: indicates how the sample was treated: either with a small-molecule compound (trt_cp) or with a control perturbation (ctrl_vehicle). Control perturbations have no MoAs.
cp_time: indicates the treatment duration: 24, 48, or 72 hours.
cp_dose: indicates the treatment dose: high or low.
g-#: gene expression data. There are 772 gene features (g-0 to g-771) for each sig_id. Each value describes the degree to which a gene's expression increased or decreased in response to a chemical perturbagen.
c-#: cell viability data. There are 100 cell-line features (c-0 to c-99) for each sig_id, assessed collectively by PRISM (Profiling Relative Inhibition Simultaneously in Mixture).
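As a quick check of the column layout described above, the following sketch groups column names by prefix. The header is rebuilt from the description rather than read from the competition file, and `split_feature_columns` is a hypothetical helper name.

```python
def split_feature_columns(columns):
    """Group column names into metadata, gene (g-#), and cell viability (c-#) features."""
    meta, genes, cells = [], [], []
    for name in columns:
        if name.startswith("g-"):
            genes.append(name)
        elif name.startswith("c-"):
            cells.append(name)
        else:
            meta.append(name)
    return meta, genes, cells


# Assumption: header reconstructed from the field descriptions, not from the real CSV.
header = (
    ["sig_id", "cp_type", "cp_time", "cp_dose"]
    + [f"g-{i}" for i in range(772)]
    + [f"c-{i}" for i in range(100)]
)

meta, genes, cells = split_feature_columns(header)
print(len(meta), len(genes), len(cells))  # 4 772 100
```

With a real file, the same function could be applied to the first row returned by `csv.reader`.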
L1000 assay: a high-throughput gene expression assay that measures the mRNA transcript abundance of 978 "landmark" genes from human cells.
- Collect raw data: the bead identities (two genes per bead) and the fluorescent intensity for every bead.
- Peak deconvolution: assign the correct expression level to each of the two genes whose transcripts bind to beads of the same color.
- Construct a calibration curve based on the control transcripts to rescale the experimental data.
- Use QNORM to standardize the shape of the expression profile distribution on each plate.
- Use quantile normalization to standardize the data across plates.
- It yields 872 values per signature: 772 are the gene features g-0 to g-771, and the remaining 100 are the cell viability features c-0 to c-99.
- The expression values, combined with the treatment type, time, and dose, are assigned to one unique sig_id.
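The quantile normalization step in the workflow above can be illustrated with a generic, minimal sketch. This is not the L1000 pipeline's own code: `quantile_normalize` is a hypothetical helper, the input values are made up, and ties are broken by position rather than by averaging ranks.

```python
def quantile_normalize(profiles):
    """Quantile-normalize equal-length profiles so they share one distribution.

    profiles: list of lists, one inner list per sample (or plate).
    Ties are broken by position; a production version would average tied ranks.
    """
    n = len(profiles[0])
    if any(len(p) != n for p in profiles):
        raise ValueError("all profiles must have the same length")

    # Reference distribution: mean of the k-th smallest value across profiles.
    sorted_profiles = [sorted(p) for p in profiles]
    reference = [sum(col) / len(profiles) for col in zip(*sorted_profiles)]

    normalized = []
    for p in profiles:
        # Indices of p ordered from its smallest to its largest value.
        order = sorted(range(n), key=lambda i: p[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]  # made-up values
norm = quantile_normalize(raw)
print(norm)  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization every profile contains the same set of values, and only the ranking within each profile is preserved.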
- The cellular responses are measured by two methods:
  - Gene expression (G) by the L1000 assay.
  - Cell viability (C) by PRISM.
- Both G and C are measured by mRNA abundance.
- The data are collected and go through the processing workflow before analysis.
- About controls
  - G and C in controls are significantly different from those of the treated samples.
  - Because doubling times differ between cell lines, C differs significantly among cell lines in controls. (Clustering?)
  - Because the controls have little impact on the cells, dose and time do not have a significant impact on G and C in controls.
- About gene expression and cell viability
  - G and C are related in some way.
- About targets
  - The drugs are labeled with the protein they target and how they influence that protein.
  - The three most common drug types are inhibitor, antagonist, and agonist.
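One way to tally drug types, sketched under the assumption that each target label ends with the action type after the last underscore. The label names below are invented for illustration, and `count_action_types` is a hypothetical helper; real labels would come from the competition's target file.

```python
from collections import Counter


def count_action_types(target_names):
    """Count the final underscore-separated token of each target label."""
    return Counter(name.rsplit("_", 1)[-1] for name in target_names)


# Illustrative labels only (assumption: "<protein>_<action>" naming).
example_targets = [
    "protein_a_inhibitor",
    "protein_b_inhibitor",
    "protein_c_inhibitor",
    "receptor_x_antagonist",
    "receptor_y_antagonist",
    "receptor_z_agonist",
]

counts = count_action_types(example_targets)
print(counts.most_common(3))
# [('inhibitor', 3), ('antagonist', 2), ('agonist', 1)]
```

This counts label columns per action type; counting drugs per type would additionally require summing the label values over samples.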
H_0: The average gene expression in controls at dose 2 is the same at 24 and 48 hours.
H_a: The average gene expression in controls at dose 2 differs between 24 and 48 hours.
H_0: The average gene expression in controls at dose 2 is the same at 48 and 72 hours.
H_a: The average gene expression in controls at dose 2 differs between 48 and 72 hours.
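The notes do not say which test is used for these hypotheses. As one possible approach, the sketch below runs a two-sided permutation test on the difference in means, which avoids distributional assumptions. The function name, the sample values, and the number of permutations are all illustrative assumptions.

```python
import random


def permutation_test_mean_diff(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between a and b."""
    rng = random.Random(seed)
    n_a = len(a)
    observed = abs(sum(a) / n_a - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        left, right = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Made-up per-sample average expression values for two time points.
ctrl_24h = [0.10, -0.20, 0.05, 0.30, -0.10, 0.00]
ctrl_48h = [0.15, -0.25, 0.10, 0.20, -0.05, 0.05]

p_value = permutation_test_mean_diff(ctrl_24h, ctrl_48h, n_permutations=2000)
print(f"p = {p_value:.3f}")
```

A small p-value would be evidence against H_0; the same call can be repeated for the 48 vs. 72 hour comparison.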
H_0: The gene expression values in one observation are normally distributed.
H_a: The gene expression values in one observation are not normally distributed.
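The normality hypothesis could be checked in several ways, and the notes do not name one. The sketch below uses the Jarque-Bera test as an example, with its asymptotic chi-square (2 degrees of freedom) p-value. The function name is hypothetical and the sample is simulated, not taken from the dataset.

```python
import math
import random


def jarque_bera(values):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("values are constant")
    m3 = sum((x - m) ** 3 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    # Under H_0, JB is asymptotically chi-square with 2 degrees of freedom,
    # whose survival function is exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


# Simulated stand-in for the 772 gene values of one observation.
rng = random.Random(0)
simulated_row = [rng.gauss(0.0, 1.0) for _ in range(772)]

jb_stat, jb_p = jarque_bera(simulated_row)
print(f"JB = {jb_stat:.3f}, p = {jb_p:.3f}")
```

Because the p-value is asymptotic, it is more trustworthy for a full 772-value row than for small samples.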
- Mechanisms of Action (MoA) Prediction Description
- Connectopedia
- Corsello et al. “Discovering the anticancer potential of non-oncology drugs by systematic viability profiling,” Nature Cancer, 2020
- Subramanian et al. “A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles,” Cell, 2017