Predicting MoA (Mechanisms of Action) Annotations

Goals

To predict MoA annotations from the biological responses of cells treated with drugs.

Background

From Kaggle:
"Today, with the advent of more powerful technologies, drug discovery has changed from the serendipitous approaches of the past to a more targeted model based on an understanding of the underlying biological mechanism of a disease. In this new framework, scientists seek to identify a protein target associated with a disease and develop a molecule that can modulate that protein target. As a shorthand to describe the biological activity of a given molecule, scientists assign a label referred to as mechanism-of-action or MoA for short.
One approach is to treat a sample of human cells with the drug and then analyze the cellular responses with algorithms that search for similarity to known patterns in large genomic databases, such as libraries of gene expression or cell viability patterns of drugs with known MoAs."

Acknowledgements

Data Dictionary[1,2]

sig_id: A Connectivity Map unique identifier assigned to each signature generated from the L1000 assay. A signature is the entire vector of differential gene expression values, one per gene. The signature provides a representation of the biological response.
cp_type: indicates how the samples are treated: either with small-molecule compounds (trt_cp) or with a control perturbation (ctrl_vehicle). Control perturbations have no MoAs.
cp_time: indicates treatment time: 24, 48, and 72 hours.
cp_dose: indicates treatment dose: high or low (coded D1 and D2 in the data).
g-#: gene expression features. There are 772 gene features (g-0 to g-771) for each sig_id. Each value describes the degree to which a gene's expression is increased or decreased in response to a chemical perturbagen.
c-#: cell viability features. There are 100 cell lines (c-0 to c-99) for each sig_id, assessed collectively by PRISM (Profiling Relative Inhibition Simultaneously in Mixture).
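A minimal Python sketch of how the columns described above might be grouped once a features file is loaded. It assumes the header uses exactly the names listed here (sig_id, cp_type, cp_time, cp_dose, then g-# and c-# columns); the helper names and the three toy rows are hypothetical and only exercise the code, they are not real data. Control rows are dropped because control perturbations have no MoAs.

```python
import csv
import io

# Metadata columns as named in the data dictionary above.
META_COLUMNS = ("sig_id", "cp_type", "cp_time", "cp_dose")


def split_columns(header):
    """Group column names into metadata, gene-expression (g-#) and cell-viability (c-#) lists."""
    meta = [c for c in header if c in META_COLUMNS]
    genes = [c for c in header if c.startswith("g-")]
    cells = [c for c in header if c.startswith("c-")]
    return meta, genes, cells


def treated_rows(rows):
    """Keep compound-treated rows only; ctrl_vehicle rows carry no MoA labels."""
    return [r for r in rows if r["cp_type"] == "trt_cp"]


# Hypothetical toy table with 2 gene and 2 cell columns (illustrative values only).
TOY_CSV = """sig_id,cp_type,cp_time,cp_dose,g-0,g-1,c-0,c-1
id_a,trt_cp,24,D1,1.06,0.57,-0.60,-0.38
id_b,ctrl_vehicle,48,D2,0.07,0.20,0.41,0.15
id_c,trt_cp,72,D2,-0.32,0.95,-1.20,-0.88
"""

reader = csv.DictReader(io.StringIO(TOY_CSV))
meta, genes, cells = split_columns(reader.fieldnames)
rows = treated_rows(list(reader))
print(meta, genes, cells)
print([r["sig_id"] for r in rows])
```

On the full feature table one would expect len(genes) == 772 and len(cells) == 100. With pandas, df.filter(regex=r"^g-") selects the same gene columns.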

Key Concepts Explained

L1000 assay: a high-throughput gene expression assay that measures the mRNA transcript abundance of 978 "landmark" genes from human cells.

Data Processing Workflow for Sig_id[2,3,4]

Collect raw data: the bead identities (two genes per bead) and the fluorescence intensity for every bead.
Peak deconvolution: assign the correct expression level to each of the two genes whose transcripts bind to beads of the same color.
Construct a calibration curve based on the control transcripts to rescale the experimental data.
Use QNORM to standardize the shape of the expression profile distribution on each plate.
Use quantile normalization to standardize the data across plates.
- This yields 872 feature values per sig_id: 772 gene expression features (g-0 to g-771) and 100 cell viability features (c-0 to c-99).
The expression values, combined with the treatment type, time, and dose, are assigned to one unique sig_id.
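The normalization steps above can be illustrated with the textbook quantile normalization algorithm: sort each sample, average the values at each rank across samples, then write that average back to every value holding that rank. The sketch below is plain Python and is not the CMap/L1000 production pipeline; whether it normalizes within a plate or across plates depends only on which samples are passed in. Ties are broken by position rather than averaged, which is a simplifying assumption.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length samples so they share one value distribution.

    samples: list of lists, one inner list of expression values per sample.
    Returns new lists in the same order; ties are broken by original position.
    """
    n_values = len(samples[0])
    if any(len(s) != n_values for s in samples):
        raise ValueError("all samples must have the same number of values")

    # Reference distribution: mean of the k-th smallest value across all samples.
    sorted_samples = [sorted(s) for s in samples]
    reference = [sum(col) / len(samples) for col in zip(*sorted_samples)]

    normalized = []
    for s in samples:
        # Indices of s in ascending order of value (stable sort breaks ties by position).
        order = sorted(range(n_values), key=lambda i: s[i])
        out = [0.0] * n_values
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # value with rank k gets the k-th reference value
        normalized.append(out)
    return normalized


print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```

For example, quantile_normalize([[5, 2, 3], [4, 1, 6]]) gives [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]: both samples end up with the same set of values {1.5, 3.5, 5.5}, each placed by its original rank.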

Initial Thoughts & Hypotheses

Thoughts

The cellular responses are measured by two methods:
- Gene expression (G) by the L1000 assay.
- Cell viability (C) by PRISM.
- Both G and C are measured via mRNA abundance.
- The data are collected and go through the processing workflow before analysis.
About controls
- The G and C values in controls are significantly different from those of the treated samples.
- Because each cell line has a different doubling time, C differs significantly among controls. (Clustering?)
- Because the control treatment has little impact on the cells, dose and time do not have a significant impact on G and C in controls.
About gene expression and cell viability
- G and C are related in some way.
About targets
- The drugs are labeled with the protein they are targeting and how they influence the protein.
- The top 3 drug types are inhibitor, antagonist, and agonist.
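The drug types above can be tallied from the target names, assuming each MoA label is written as "<target>_<action>" (as in cyclooxygenase_inhibitor and nfkb_inhibitor, which appear in this repository's notebook names). The helper names and the short target list below are illustrative, not the full label set; labels that do not follow this pattern would need special handling.

```python
from collections import Counter


def action_type(target_name):
    """Return the part after the last underscore, e.g. 'nfkb_inhibitor' -> 'inhibitor'."""
    return target_name.rsplit("_", 1)[-1]


def count_action_types(target_names, weights=None):
    """Count MoA targets per action suffix; optional weights (e.g. sample counts) per target."""
    counts = Counter()
    for i, name in enumerate(target_names):
        counts[action_type(name)] += 1 if weights is None else weights[i]
    return counts


# Illustrative subset of target names (hypothetical selection).
targets = [
    "cyclooxygenase_inhibitor",
    "nfkb_inhibitor",
    "dopamine_receptor_antagonist",
    "adrenergic_receptor_agonist",
]
print(count_action_types(targets).most_common(3))
```

Passing per-target sample counts (for example, the column sums of the label table) as weights counts treated samples rather than distinct MoAs, which may change the ranking.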

Hypotheses

Gene expression data in controls are the same over time

H_0: The average gene expression in controls with dose 2 is the same at 24 and 48 hours.
H_a: The average gene expression in controls with dose 2 differs significantly between 24 and 48 hours.

H_0: The average gene expression in controls with dose 2 is the same at 48 and 72 hours.
H_a: The average gene expression in controls with dose 2 differs significantly between 48 and 72 hours.
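The README does not say which test is used for these two hypotheses. A two-sided permutation test on the difference of means is one assumption-light option, sketched below in plain Python; each input list is assumed to hold one number per control sample (for example, that sample's mean over the g- features) for dose 2 at one time point. The function name and parameters are hypothetical. Welch's t-test (scipy.stats.ttest_ind with equal_var=False) is a common parametric alternative.

```python
import random
import statistics


def permutation_test_means(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test of H_0: mean(a) == mean(b).

    Returns (observed difference, p-value). The p-value uses the (count + 1) / (n + 1)
    correction so it is never exactly zero.
    """
    rng = random.Random(seed)  # fixed seed makes the result reproducible
    observed = statistics.fmean(a) - statistics.fmean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling of samples under H_0
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_permutations + 1)


# Hypothetical per-sample mean expression values for controls at two time points.
ctrl_24h = [0.10, 0.05, 0.12, 0.08]
ctrl_48h = [0.11, 0.07, 0.09, 0.10]
print(permutation_test_means(ctrl_24h, ctrl_48h))
```

Because two comparisons are made (24 vs. 48 and 48 vs. 72 hours), a Bonferroni-adjusted threshold such as 0.05 / 2 is one conservative way to decide significance.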

Gene expression values in one observation are normally distributed.

H_0: The gene expression values in one observation are normally distributed.
H_a: The gene expression values in one observation are not normally distributed.
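The README does not name a normality test. The Jarque-Bera test is used below as a swapped-in choice because its asymptotic p-value has a closed form (the chi-squared survival function with 2 degrees of freedom is exp(-x/2)), so the sketch needs only the standard library. Here values would be the 772 g- values of a single sig_id; scipy.stats.jarque_bera, scipy.stats.shapiro and scipy.stats.normaltest are library alternatives.

```python
import math
import statistics


def jarque_bera(values):
    """Jarque-Bera normality test.

    Returns (JB statistic, p-value). Under H_0 (normality) JB is asymptotically
    chi-squared with 2 degrees of freedom, whose survival function is exp(-x / 2).
    """
    n = len(values)
    mean = statistics.fmean(values)
    deviations = [v - mean for v in values]
    # Biased (population) central moments, as in the standard JB definition.
    m2 = sum(d ** 2 for d in deviations) / n
    if m2 == 0:
        raise ValueError("values are constant; skewness and kurtosis are undefined")
    m3 = sum(d ** 3 for d in deviations) / n
    m4 = sum(d ** 4 for d in deviations) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    return jb, math.exp(-jb / 2)


print(jarque_bera([-1, 1, -1, 1, -1, 1]))
```

For the toy input [-1, 1, -1, 1, -1, 1] the skewness is 0 and the kurtosis is 1, so JB = 6/6 * (0 + (1 - 3)^2 / 4) = 1 and p = exp(-0.5), about 0.61. The p-value is asymptotic, so it is more trustworthy for hundreds of values than for a handful.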

Yongliang-Shi/Mechanism-of-Action-Prediction