Commit f4794f1

Enable training on continuous values and add data documentation
1 parent 765a606 commit f4794f1

29 files changed

+895
-765
lines changed

README.rst

+31-30
Original file line number | Diff line number | Diff line change
@@ -22,35 +22,21 @@ DeepCpG: Deep neural networks for predicting single-cell DNA methylation
2222
.. |Tweet| image:: https://img.shields.io/twitter/url/http/shields.io.svg?style=social
2323
:target: https://twitter.com/intent/tweet?text=Checkout+%23DeepCpG%3A+%23DeepLearning+for+predicting+DNA+methylation%2C+%40cangermueller
2424

25-
DeepCpG is a deep neural network for predicting the methylation state of CpG dinucleotides in multiple cells. It allows to accurately impute incomplete DNA methylation profiles, to discover predictive sequence motifs, and to quantify the effect of sequence mutations. (`Angermueller et al, 2017 <http://biorxiv.org/content/early/2017/02/01/055715>`_).
25+
DeepCpG [1]_ is a deep neural network for predicting the methylation state of CpG dinucleotides in multiple cells. It can accurately impute incomplete DNA methylation profiles, discover predictive sequence motifs, and quantify the effect of sequence mutations (`Angermueller et al., 2017 <http://biorxiv.org/content/early/2017/02/01/055715>`_).
2626

2727
**Please help to improve DeepCpG**, by reporting bugs, typos in notebooks and documentation, or any ideas on how to make things better. You can submit an `issue <https://github.com/cangermueller/deepcpg/issues>`_ or send me an `email <mailto:[email protected]>`_.
2828

29-
.. image:: docs/fig1.png
30-
:scale: 50%
29+
.. figure:: fig1.png
30+
:width: 640 px
31+
:align: left
3132
:alt: DeepCpG model architecture and applications
32-
:align: center
33-
34-
**DeepCpG model training and applications.** (a) Sparse
35-
single-cell CpG profiles, for example as obtained from scBS-seq or
36-
scRRBS-seq. Methylated CpG sites are denoted by ones, un-methylated CpG
37-
sites by zeros, and question marks denote CpG sites with unknown methylation
38-
state (missing data). (b) Modular architecture of DeepCpG. The DNA module
39-
consists of two convolutional and pooling layers to identify predictive motifs
40-
from the local sequence context, and one fully connected layer to model motif
41-
interactions. The CpG module scans the CpG neighbourhood of multiple cells
42-
(rows in b), using a bidirectional gated recurrent network (GRU),
43-
yielding compressed features in a vector of constant size. The fusion module
44-
learns interactions between higher-level features derived from the DNA- and
45-
CpG module to predict methylation states in all cells. (c,d) The trained
46-
DeepCpG model can be used for different downstream analyses, including
47-
genome-wide imputation of missing CpG sites (c) and the discovery of DNA
48-
sequence motifs that are associated with DNA methylation levels or
49-
cell-to-cell variability (d).
5033

51-
.. code::
34+
**DeepCpG model architecture and applications.**
35+
36+
\(a\) Sparse single-cell CpG profiles as obtained from scBS-seq or scRRBS-seq. Methylated CpG sites are denoted by ones, unmethylated CpG sites by zeros, and question marks denote CpG sites with unknown methylation state (missing data). (b) DeepCpG model architecture. The DNA model consists of two convolutional and pooling layers to identify predictive motifs from the local sequence context, and one fully connected layer to model motif interactions. The CpG model scans the CpG neighborhood of multiple cells (rows in b), using a bidirectional gated recurrent network (GRU), yielding compressed features in a vector of constant size. The Joint model learns interactions between higher-level features derived from the DNA- and CpG model to predict methylation states in all cells. (c, d) The trained DeepCpG model can be used for different downstream analyses, including genome-wide imputation of missing CpG sites (c) and the discovery of DNA sequence motifs that are associated with DNA methylation levels or cell-to-cell variability (d).
37+
5238

53-
Angermueller, Christof, Heather Lee, Wolf Reik, and Oliver Stegle. Accurate Prediction of Single-Cell DNA Methylation States Using Deep Learning. http://biorxiv.org/content/early/2017/02/01/055715 bioRxiv, February 1, 2017, 55715. doi:10.1101/055715.
39+
.. [1] Angermueller, Christof, Heather Lee, Wolf Reik, and Oliver Stegle. Accurate Prediction of Single-Cell DNA Methylation States Using Deep Learning. http://biorxiv.org/content/early/2017/02/01/055715 bioRxiv, February 1, 2017, 55715. doi:10.1101/055715.
5440
5541
5642
@@ -63,13 +49,16 @@ Table of contents
6349
* `Model Zoo`_
6450
* `FAQ`_
6551
* `Content`_
52+
* `Changelog`_
6653
* `Contact`_
6754

6855

6956
News
7057
====
7158

72-
* **170305**: New documentation about DeepCpG module architectures `released <http://deepcpg.readthedocs.io/modules.html>`_!
59+
* **170404**: New documentation about creating and analyzing DeepCpG data `released <http://deepcpg.readthedocs.io/data.html>`_!
60+
* **170404**: Training on continuous data, e.g. from bulk experiments, now `supported <http://deepcpg.readthedocs.io/data.html>`_!
61+
* **170305**: New documentation about DeepCpG model architectures `released <http://deepcpg.readthedocs.io/models.html>`_!
7362
* **170302**: New guide on DeepCpG model training `released <http://deepcpg.readthedocs.io/train.html>`_!
7463
* **170228**: New example shell scripts for building a DeepCpG pipeline `released <./examples/README.md>`_!
7564

@@ -146,7 +135,7 @@ Example:
146135
--nb_epoch 30
147136
--out_dir ./model
148137
149-
This command uses chromosomes 1-3 for training and 10-13 for validation. ``---dna_model``, ``--cpg_model``, and ``--joint_model`` specify the architecture of the CpG, DNA, and joint module, respectively (see manuscript for details). Training will stop after at most 30 epochs and model files will be stored in ``./model``.
138+
This command uses chromosomes 1-3 for training and 10-13 for validation. ``--dna_model``, ``--cpg_model``, and ``--joint_model`` specify the architecture of the DNA, CpG, and Joint model, respectively (see manuscript for details). Training will stop after at most 30 epochs and model files will be stored in ``./model``.
150139

151140

152141
4. Use ``dcpg_eval.py`` to impute methylation profiles and evaluate model performances.
@@ -157,9 +146,9 @@ This command uses chromosomes 1-3 for training and 10-13 for validation. ``---dn
157146
./data/*.h5
158147
--model_files ./model/model.json ./model/model_weights_val.h5
159148
--out_data ./eval/data.h5
160-
--out_report ./eval/report.csv
149+
--out_report ./eval/report.tsv
161150
162-
This command predicts missing methylation states on all chromosomes and evaluates prediction performances using known methylation states. Predicted states will be stored in ``./eval/data.h5`` and performance metrics in ``./eval/report.csv``.
151+
This command predicts missing methylation states on all chromosomes and evaluates prediction performance using known methylation states. Predicted states will be stored in ``./eval/data.h5`` and performance metrics in ``./eval/report.tsv``.
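Since the report now has a ``.tsv`` extension, it can presumably be read as a tab-separated table. The following is a minimal sketch under that assumption; the column names and the sample row are hypothetical placeholders, not the actual output of ``dcpg_eval.py``, and ``read_report`` is an illustrative helper rather than part of DeepCpG.

```python
import csv
import io

# Hypothetical report content; real column names and values may differ.
sample_report = "metric\toutput\tvalue\nauc\tcell_a\t0.5\n"


def read_report(handle):
    """Parse a tab-separated report with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = read_report(io.StringIO(sample_report))
for row in rows:
    # Each row maps column names from the header line to string values.
    print(row["metric"], row["output"], row["value"])
```

For a real file, the same helper could be called with ``open("./eval/report.tsv", newline="")`` in place of the in-memory sample.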
163152

164153

165154
5. Export imputed methylation profiles to HDF5 or bedGraph files:
@@ -182,7 +171,7 @@ You can find example notebooks and scripts on how to use DeepCpG `here <examples
182171
Documentation
183172
=============
184173

185-
The `DeepCpG documentation <http://deepcpg.readthedocs.io>`_ provides information on training, hyper-parameter selection, and module architectures.
174+
The `DeepCpG documentation <http://deepcpg.readthedocs.io>`_ provides information on training, hyper-parameter selection, and model architectures.
186175

187176

188177
Model Zoo
@@ -195,7 +184,7 @@ FAQ
195184
===
196185

197186
**Why am I getting warnings 'No CpG site at position X!' when using `dcpg_data.py`?**
198-
This means that some sites in ``--cpg_profile`` files are not CpG sites, e.g. there is no CG dinucleotide at the given position in the DNA sequence. Make sure that ``--dna_files`` point to the correct genome and CpG sites are correctly aligned. Since DeepCpG currently does not support allele-specific methylation, data from different alleles must be merged (recommended) or only one allele be used.
187+
This means that some sites in ``--cpg_profile`` files are not CpG sites, i.e. there is no CG dinucleotide at the given position in the DNA sequence. Make sure that ``--dna_files`` points to the correct genome and that CpG sites are correctly aligned. Since DeepCpG currently does not support allele-specific methylation, data from different alleles must be merged (recommended) or only one allele must be used.
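The check behind this warning can be illustrated with a short sketch. It is not DeepCpG's implementation: ``non_cpg_positions`` is a hypothetical helper, and it assumes 0-based positions that point at the ``C`` of a ``CG`` dinucleotide, which may differ from the coordinate convention of your input files.

```python
def non_cpg_positions(sequence, positions):
    """Return the 0-based positions that do not start a CG dinucleotide."""
    seq = sequence.upper()  # tolerate soft-masked (lower-case) sequence
    return [pos for pos in positions if seq[pos:pos + 2] != "CG"]


# Index:    0123456789
sequence = "ATCGGACGTT"
# Positions 2 and 6 start a CG dinucleotide; position 4 starts "GA".
print(non_cpg_positions(sequence, [2, 6, 4]))  # [4]
```

If most positions are reported, an off-by-one coordinate convention or a mismatched genome build is a likely cause.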
199188

200189
**How can I train models on one or more GPUs?**
201190
DeepCpG uses the `Keras <https://keras.io>`_ deep learning library, which supports `Theano <http://deeplearning.net/software/theano/>`_ or `TensorFlow <https://www.tensorflow.org/>`_ as backend. If you are using TensorFlow, DeepCpG will automatically run on all available GPUs. If you are using Theano, you have to set the flag ``device=GPU`` in the ``THEANO_FLAGS`` environment variable.
@@ -208,7 +197,6 @@ You can find more information about Keras backends `here <https://keras.io/backe
208197

209198

210199

211-
212200
Content
213201
=======
214202
* ``/deepcpg/``: Source code
@@ -218,6 +206,19 @@ Content
218206
* ``/tests``: Test files
219207

220208

209+
Changelog
210+
=========
211+
212+
1.0.3
213+
-----
214+
Extends ``dcpg_data.py``, updates documentation, and fixes minor bugs.
215+
* Extends ``dcpg_data.py`` to support bedGraph and TSV input files.
216+
* Enables training on continuous methylation states.
217+
* Adds new documentation about creating and analyzing data.
218+
* Updates API documentation.
219+
220+
221+
221222
Contact
222223
=======
223224
* Christof Angermueller

deepcpg/__init__.py

+1-1
@@ -1 +1 @@
1-
__version__ = '1.0.2'
1+
__version__ = '1.0.3'

deepcpg/evaluation.py

+1-1
@@ -14,7 +14,7 @@
1414

1515

1616
def cor(y, z):
17-
"""Computes Pearon's correlation."""
17+
"""Compute Pearson correlation coefficient."""
1818
return np.corrcoef(y, z)[0, 1]
1919

2020
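For reference, the value that ``np.corrcoef(y, z)[0, 1]`` returns is Pearson's r. The following sketch writes the same quantity out in plain Python; ``pearson`` is a hypothetical helper for illustration and is not part of ``deepcpg/evaluation.py``.

```python
import math


def pearson(y, z):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(y) != len(z) or len(y) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_y = sum(y) / len(y)
    mean_z = sum(z) / len(z)
    # Sums of products of deviations; the common 1/n factor cancels in the ratio.
    cov = sum((a - mean_y) * (b - mean_z) for a, b in zip(y, z))
    var_y = sum((a - mean_y) ** 2 for a in y)
    var_z = sum((b - mean_z) ** 2 for b in z)
    return cov / math.sqrt(var_y * var_z)


print(pearson([1, 2, 3], [2, 4, 6]))
print(pearson([1, 2, 3], [3, 2, 1]))
```

Unlike the NumPy call, this sketch raises ``ZeroDivisionError`` when either input is constant, since the correlation is undefined in that case.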

docs/fig1.png

-422 KB
Binary file not shown.

docs/source/conf.py

+2-2
@@ -68,9 +68,9 @@
6868
# built documents.
6969
#
7070
# The short X.Y version.
71-
version = '1.0.2'
71+
version = '1.0.3'
7272
# The full version, including alpha/beta/rc tags.
73-
release = '1.0.2'
73+
release = '1.0.3'
7474

7575
# The language for content autogenerated by Sphinx. Refer to documentation
7676
# for a list of supported languages.
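The commit sets both ``version`` and ``release`` to ``'1.0.3'``, although the comment describes ``version`` as the short X.Y form. One possible way to keep the two in sync, shown here only as a sketch and not as what the repository does, is to derive the short version from the release string:

```python
# The full version, including alpha/beta/rc tags.
release = '1.0.3'
# The short X.Y version, derived from the first two components of `release`.
version = '.'.join(release.split('.')[:2])

print(version, release)
```

With this approach only ``release`` needs to be edited on a version bump.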
