Commit

🎨 specify headlines (in hierarchy)
Henry Webel committed May 29, 2024
1 parent 7cd7979 commit af5e573
Showing 2 changed files with 17 additions and 15 deletions.
21 changes: 9 additions & 12 deletions Readme.md
@@ -1,31 +1,31 @@
# <center>CLASTER
## <center> Modeling nascent RNA transcription from chromatin landscape and structure <center>

**Abstract**
## Abstract

_Different cell types and their associated functionalities can emerge from a single genomic sequence when certain regions are expressed while others remain silenced. The study of gene regulation and its potential malfunctioning in different cellular contexts is hence pivotal to understand both development and disease. We present the Chromatin Landscape and Structure to Expression Regressor (CLASTER), an epigenetic-based deep neural network that can integrate different data modalities describing the chromatin landscape and its 3D structure in their raw format. CLASTER effectively translates them into nascent transcription levels measured by EU-seq at a kilobasepair resolution. Our predictions reached a Pearson correlation with targets above r=0.86 at both bin and gene levels, without relying on DNA sequence nor explicitly extracted chromatin features. The model mostly used the information found within 10 kbp of the predicted locus to perform the predictions, even when a wide genomic region of 1 Mbp was available. Explicit modeling of long-range interactions using multi-headed attention and high-resolution chromatin contact maps had little impact on model performance, despite the model correctly identifying elements in these inputs influencing nascent transcription. The trained model then served as a platform to predict the transcriptional impact of simulated epigenetic silencing perturbations. Our results point towards a rather local, integrative and combinatorial paradigm of gene regulation, where changes in the chromatin environment surrounding a gene shape its context-specific transcription. We conclude that the predominant locality and limitations of current machine learning approaches might emerge as a genuine signature of genomic organization, having broad implications for future modeling approaches._

![Claster image](./images/Claster_image.png)
![Claster image](https://raw.githubusercontent.com/RasmussenLab/CLASTER/master/images/Claster_image.png)

**CLASTER overview:** CLASTER integrates the chromatin landscape (accessibility, promoter and enhancer activities and chromatin silencing) and structure (Micro-C) to predict nascent transcription levels measured by EU-seq.
**CLASTER overview** CLASTER integrates the chromatin landscape (accessibility, promoter and enhancer activities and chromatin silencing) and structure (Micro-C) to predict nascent transcription levels measured by EU-seq.

## In this repository

This repository contains the files and scripts required to reproduce the results of the paper and a short tutorial. The repository consists of the following folders:

```configurations```:
### `configurations`
- Configuration files (.yaml) required to build different flavours of CLASTER.

```images```:
### `images`
- Overview of CLASTER's architecture.

```inputs```:
### `inputs`

The folder contains the test set inputs for both data modalities, i.e. samples exploring regions of 1 Mbp centered at the TSS of protein coding genes found in chr4 (in mice). They will be used in the tutorial to exemplify how can we train and validate CLASTER.

```scripts```:
### `scripts`

- [`0_Tutorial.ipynb`](scripts/0_Tutorial.ipynb): The notebook provides a rapid overview of the most important steps in CLASTER's pipeline, including training and validating the network using the EIR framework.
- [`0_Tutorial.ipynb`](https://github.com/RasmussenLab/CLASTER/blob/master/scripts/0_Tutorial.ipynb): The notebook provides a rapid overview of the most important steps in CLASTER's pipeline, including training and validating the network using the EIR framework.
- `1_Data_obtention.ipynb`: This notebook guides the user through the data obtention process, including:
- Data download from publicly available repositories:
- Inputs: Chromatin landscape (ATAC-seq, H3K4me3, H3K27ac and H3K27me3 in mESCs) and structure (Micro-C maps in mESCs)
@@ -45,9 +45,6 @@ These were used to benchmark CLASTER. It includes:
- Code to fine-tune Hyena-DNA's backbone and the added head together.
- `3_Data_analysis.ipynb`: The notebook contains the functions used to perform the data analysis and create the figures included in the manuscript.

```targets```:
### `targets`

The folder contains the target EU-seq profiles matching the input (test) samples.



11 changes: 8 additions & 3 deletions scripts/0_Tutorial.ipynb
@@ -4,8 +4,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# <center> TUTORIAL <center>\n",
"# <center> _Training and validating CLASTER_ <center>\n",
"# TUTORIAL\n",
"**_Training and validating CLASTER_**\n",
"\n",
"*Authors:* \n",
"\n",
@@ -100,7 +100,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# 1. Inputs and outputs\n",
"## 1. Inputs and outputs\n",
"The obtention of inputs and outputs from publicly available sources is detailed in notebook ```1_Data_obtention.ipynb```. In this tutorial we will however provide you with the already created inputs and targets for all samples in the test set.\n",
"\n",
"Input samples and their matching targets are named after the ENSEMBL ID code for the protein coding gene located at the center of the region of interest. We kept the orientation of the genes, and hence the EU-seq signal can go both towards the right or towards the left. \n",
@@ -1612,6 +1612,11 @@
"\n",
"If in doubt, feel free to reach out to us!\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {