42 changes: 0 additions & 42 deletions deeptools4.0.0_changes.md

This file was deleted.

115 changes: 22 additions & 93 deletions docs/content/changelog.rst
@@ -1,7 +1,8 @@
Changes in deepTools4.0
=======================

Changes:
Plotting
--------

* Plots in general:
- Removed Plotly for all graphics.
@@ -25,97 +26,25 @@ Changes:
- The scree plot shows lines for the individual and the cumulative variation.
- By default, points are drawn as rainbow-colored circles.

Changes in deepTools2.0
========================
Core
----

.. contents::
:local:

Major changes
-------------

.. note:: The major changes encompass features for **increased efficiency**, **new sequencing data types**, and **additional plots**, particularly for QC.

Moreover, deepTools modules can now be used by other Python programs.
The :ref:`api` is part of the new documentation.

Accommodating additional data types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* Correlations and comparisons can now be calculated for **bigWig files** (in addition to BAM files) using ``multiBigwigSummary`` and ``bigwigCompare``.

* **RNA-seq:** split reads are now natively supported.
* bamCoverage
- ``--no_collapse``: do not merge adjacent bins that have equal coverage values.

* **MNase-seq:** using the new option ``--MNase`` in ``bamCoverage``, one can now compute read coverage only taking the 2 central base pairs of each mapped fragment into account.
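The sketch below shows center-only counting as this entry describes it for ``--MNase``. It assumes 0-based, half-open fragment coordinates ``[start, end)`` defined by the two mates. ``central_bases`` and ``mnase_coverage`` are hypothetical helpers, not deepTools functions. For odd-length fragments, the sketch takes the central base and the base to its left, which is only one of several possible conventions.

```python
from typing import List, Sequence, Tuple


def central_bases(start: int, end: int, width: int = 2) -> Tuple[int, int]:
    """Return [s, e) covering the `width` central bases of fragment [start, end)."""
    length = end - start
    if length < width:
        raise ValueError("fragment is shorter than the requested center width")
    mid = start + length // 2  # central base (odd length) or first base right of center (even)
    s = mid - width // 2
    return s, s + width


def mnase_coverage(fragments: Sequence[Tuple[int, int]], chrom_len: int) -> List[int]:
    """Per-base coverage that counts only the central bases of each fragment."""
    coverage = [0] * chrom_len
    for start, end in fragments:
        s, e = central_bases(start, end)
        for pos in range(s, e):
            coverage[pos] += 1
    return coverage


# a 150 bp fragment contributes only positions 174 and 175
print(central_bases(100, 250))  # (174, 176)
```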

Structural updates
^^^^^^^^^^^^^^^^^^^

* All modules have comprehensive, automated tests that check that they still work correctly after any code modification.
* Virtualization for stability: we now provide a ``docker`` image and enable the easy deployment of deepTools via the Galaxy ``toolshed``.
* Our documentation is now version-aware thanks to readthedocs and ``sphinx``.
* The API is public and documented.

Renamed tools
^^^^^^^^^^^^^

* **heatmapper** to :doc:`tools/plotHeatmap`
* **profiler** to :doc:`tools/plotProfile`
* **bamCorrelate** to :doc:`tools/multiBamSummary`
* **bigwigCorrelate** to :doc:`tools/multiBigwigSummary`
* **bamFingerprint** to :doc:`tools/plotFingerprint`


Increased efficiency
^^^^^^^^^^^^^^^^^^^^

* We dramatically improved the **speed** of bigWig-related tools (:doc:`tools/multiBigwigSummary` and ``computeMatrix``) by using the new `pyBigWig module <https://github.com/dpryan79/pyBigWig>`_.

* It is now possible to generate one composite heatmap and/or meta-gene image based on **multiple bigwig files** in one go (see :doc:`tools/computeMatrix`, :doc:`tools/plotHeatmap`, and :doc:`tools/plotProfile` for examples).

* ``computeMatrix`` now also accepts multiple input BED files. Each is treated as a group within a sample and is plotted independently.

* We added **additional filtering options for handling BAM files**, decreasing the need for prior filtering using tools other than deepTools: The ``--samFlagInclude`` and ``--samFlagExclude`` parameters can, for example, be used to only include (or exclude) forward reads in an analysis.

* We separated the generation of read count tables from the calculation of pairwise correlations, which was previously handled by ``bamCorrelate``. Read counts are now calculated first using ``multiBamSummary`` or ``multiBigwigSummary``. The resulting output file can then be used to calculate and plot pairwise correlations with ``plotCorrelation`` or to run a principal component analysis with ``plotPCA``.
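The ``--samFlagInclude`` and ``--samFlagExclude`` parameters operate on the bitwise SAM FLAG field. The sketch below assumes the usual convention: a read is kept only if every bit of the include mask is set and no bit of the exclude mask is set. ``passes_flag_filter`` is a hypothetical helper, not part of the deepTools API.

```python
REVERSE_STRAND = 0x10  # SAM FLAG bit 16: read mapped to the reverse strand
DUPLICATE = 0x400      # SAM FLAG bit 1024: PCR or optical duplicate


def passes_flag_filter(flag: int, include: int = 0, exclude: int = 0) -> bool:
    """Keep a read only if it has all `include` bits and none of the `exclude` bits."""
    if include and (flag & include) != include:
        return False
    if exclude and (flag & exclude) != 0:
        return False
    return True


# keep only forward-strand reads by excluding the reverse-strand bit
print(passes_flag_filter(0, exclude=REVERSE_STRAND))   # True
print(passes_flag_filter(16, exclude=REVERSE_STRAND))  # False
```

Under that assumption, ``--samFlagExclude 16`` keeps only forward reads and ``--samFlagInclude 16`` keeps only reverse reads.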

New features and tools
^^^^^^^^^^^^^^^^^^^^^^

* Correlation analyses are no longer limited to BAM files -- bigwig files are possible, too! (see :doc:`tools/multiBigwigSummary`)

* Correlation coefficients can now be computed even if the data contains NaNs.

* Added **new quality control** tools:
- use :doc:`tools/plotCoverage` to plot the coverage over base pairs
- use :doc:`tools/plotPCA` for principal component analysis
- :doc:`tools/bamPEFragmentSize` can be used to calculate the average fragment size for paired-end read data

* Added **hierarchical clustering**, in addition to *k*-means, to ``plotProfile`` and ``plotHeatmap``.

* ``plotProfile`` has many more options to make compelling summary plots
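A common way to compute a correlation when the data contains NaNs is pairwise deletion: drop every position where either sample is NaN, then correlate the remaining pairs. The pure-Python sketch below does this for Pearson's r. It illustrates the technique and is not necessarily how deepTools handles missing values internally. ``pearson_ignoring_nan`` is a hypothetical name.

```python
import math
from typing import Sequence


def pearson_ignoring_nan(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r over positions where neither x nor y is NaN (pairwise deletion)."""
    pairs = [(a, b) for a, b in zip(x, y) if not (math.isnan(a) or math.isnan(b))]
    if len(pairs) < 2:
        return math.nan  # too few complete pairs to correlate
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return math.nan  # a constant sample has no defined correlation
    return sxy / math.sqrt(sxx * syy)


nan = float("nan")
# the third position is skipped; the remaining pairs lie on y = 2x
print(pearson_ignoring_nan([1.0, 2.0, nan, 4.0], [2.0, 4.0, 6.0, 8.0]))
```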


Minor changes
-------------

Changed parameter names and settings
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* ``computeMatrix`` can now read files with DOS newline characters.
* ``--missingDataAsZero`` was renamed to ``--skipNonCoveredRegions`` for clarity in ``bamCoverage`` and ``bamCompare``.
* Read extension was made optional and we removed the need to specify a default fragment length for most of the tools: ``--fragmentLength`` was thus replaced by the new optional parameter ``--extendReads``.
* Added option ``--skipChromosomes`` to ``multiBigwigSummary``, which can be used to, for example, skip all 'random' chromosomes.
* Added an option to add titles to QC plots.
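``--extendReads`` is optional and may be given with or without a length. One way to express this with Python's ``argparse`` is ``nargs='?'`` combined with ``const=True``. The sketch below assumes that definition, which is not copied from deepTools' ``parserCommon``. It then splits the parsed value into a boolean and a length, mirroring the ``extendReads``/``extendReadsLen`` handling added to ``bamCompare2.py``, ``bamCoverage2.py`` and ``multiBamSummary2.py`` in this PR. ``build_parser`` and ``split_extend_reads`` are hypothetical helpers.

```python
import argparse
from typing import Tuple, Union


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # absent -> False, bare flag -> True (const), flag with a value -> that int
    parser.add_argument('--extendReads', '-e', type=int, nargs='?',
                        const=True, default=False, metavar='INT bp')
    return parser


def split_extend_reads(value: Union[bool, int, None]) -> Tuple[bool, int]:
    """Return (extendReads, extendReadsLen); a length of 0 means none was given."""
    if not value:                # None, False or 0: no extension
        return False, 0
    if isinstance(value, bool):  # bare flag; checked before int because
        return True, 0           # bool is a subclass of int in Python
    return True, int(value)      # explicit extension length in bp


parser = build_parser()
for argv in ([], ['--extendReads'], ['--extendReads', '200']):
    print(argv, split_extend_reads(parser.parse_args(argv).extendReads))
```

The order of the two ``isinstance`` checks matters. ``isinstance(True, int)`` is ``True`` in Python, so testing ``int`` first would treat a bare flag as a length of 1.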

Bug fixes
^^^^^^^^^

* Resolved an error introduced by ``numpy version 1.10`` in ``computeMatrix``.
* Improved plotting for ``plotProfile`` with the plot types 'overlapped_lines' and 'heatmap'.
* Fixed problem with BED intervals in ``multiBigwigSummary`` and ``multiBamSummary`` that returned wrongly labeled raw counts.
* ``multiBigwigSummary`` now also considers chromosomes as identical when the names between samples differ by 'chr' prefix, e.g. chr1 vs. 1.
* Fixed a problem with wrongly labeled proper read pairs in a BAM file. We now perform additional checks to determine whether a read pair is a proper pair: the reads must face each other and must not be farther apart than 4x the mean fragment length.
* For ``bamCoverage`` and ``bamCompare``, the behavior of ``scaleFactor`` was updated. When it is combined with a normalization option (``--normalizeTo1x`` or ``--normalizeUsingRPKM``), the given scaling factor is now multiplied by the factor computed by the respective normalization method.
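The two extra proper-pair checks can be sketched as follows. ``Mate`` is a hypothetical minimal record; a real implementation would read these fields from a BAM record, for example via pysam. The sketch assumes positions are 0-based leftmost coordinates and applies the 4x threshold to the absolute template length. ``is_proper_pair`` is likewise a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class Mate:
    """Hypothetical minimal view of one read of a pair (not a pysam object)."""
    chrom: str
    start: int        # 0-based leftmost mapped position
    is_reverse: bool  # True if the read maps to the reverse strand


def is_proper_pair(read: Mate, mate: Mate, template_length: int,
                   mean_fragment_length: float) -> bool:
    if read.chrom != mate.chrom:
        return False
    # mates must not be farther apart than 4x the mean fragment length
    if abs(template_length) > 4 * mean_fragment_length:
        return False
    # mates on the same strand cannot face each other
    if read.is_reverse == mate.is_reverse:
        return False
    forward, reverse = (mate, read) if read.is_reverse else (read, mate)
    # facing each other: the forward mate starts at or before the reverse mate
    return forward.start <= reverse.start
```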


* computeMatrix
- The ``--sortRegions no`` option no longer exists.
- Ascending/descending sorting no longer sub-sorts regions by position.
- The ``--quiet`` / ``-q`` option no longer exists.
- BED files passed to ``computeMatrix`` no longer support '#' to define groups.
- Chromosome name matching (e.g. chr1 <-> 1, chrMT <-> MT) is no longer performed.

* normalization
- ``--exactScaling`` is no longer an option; exact scaling is always performed.

* alignmentSieve
- The ``label``, ``smartLabels`` and ``genomeChunkLength`` options were removed.
- ``ignoreDuplicates`` was removed. To skip duplicates, exclude the duplicate flag (1024) with ``--samFlagExclude`` instead.

Testing
-------
48 changes: 19 additions & 29 deletions pydeeptools/deeptools/bamCompare2.py
@@ -135,6 +135,11 @@ def getOptionalArgs():
'This is determined BEFORE any applicable pseudocount '
'is added.',
action='store_true')
optional.add_argument('--no_collapse',
help='By default, adjacent bins that have the same value are collapsed, which drastically reduces the size of the output file. '
'Set this flag to opt out of this behavior.',
default=True,
action='store_false')

return parser

@@ -252,6 +257,15 @@ def main(args=None):
args.samFlagExclude = 0
if not args.region:
args.region = 'None'
if not args.extendReads:
args.extendReads = False
args.extendReadsLen = 0
elif isinstance(args.extendReads, bool):
args.extendReadsLen = 0
args.extendReads = True
elif isinstance(args.extendReads, int):
args.extendReadsLen = args.extendReads
args.extendReads = True
if not args.blackListFileName:
args.blackListFileName = 'None'
else:
@@ -272,8 +286,10 @@ def main(args=None):
args.scaleFactorsMethod, # scaling method
args.operation,
args.pseudocount,
args.extendReads,
args.extendReadsLen,
args.centerReads,
args.blackListFileName,
args.ignoreDuplicates,
args.minMappingQuality,
args.samFlagInclude,
args.samFlagExclude,
@@ -283,32 +299,6 @@ def main(args=None):
args.ignoreForNormalization,
args.binSize, # bin size
args.region, # regions
True # verbose
args.verbose, # verbose
args.no_collapse, # collapse the output file or not (True unless --no_collapse is given).
)

# #if args.normalizeUsing == "RPGC":
# # sys.exit("RPGC normalization (--normalizeUsing RPGC) is not supported with bamCompare!")
# #if args.normalizeUsing == 'None':
# args.normalizeUsing = None # For the sake of sanity
# if args.scaleFactorsMethod != 'None' and args.normalizeUsing:
# sys.exit("`--normalizeUsing {}` is only valid if you also use `--scaleFactorsMethod None`! To prevent erroneous output, I will quit now.\n".format(args.normalizeUsing))

# # Get mapping statistics
# bam1, mapped1, unmapped1, stats1 = bamHandler.openBam(args.bamfile1, returnStats=True, nThreads=args.numberOfProcessors)
# bam1.close()
# bam2, mapped2, unmapped2, stats2 = bamHandler.openBam(args.bamfile2, returnStats=True, nThreads=args.numberOfProcessors)
# bam2.close()

# scale_factors = get_scale_factors(args, [stats1, stats2], [mapped1, mapped2])
# if scale_factors is None:
# # check whether one of the depth norm methods are selected
# if args.normalizeUsing is not None:
# args.scaleFactor = 1.0
# # if a normalization is required then compute the scale factors
# args.bam = args.bamfile1
# scale_factor_bam1 = get_scale_factor(args, stats1)
# args.bam = args.bamfile2
# scale_factor_bam2 = get_scale_factor(args, stats2)
# scale_factors = [scale_factor_bam1, scale_factor_bam2]
# else:
# scale_factors = [1, 1]
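The ``--no_collapse`` option added in ``bamCompare2.py`` uses ``default=True`` with ``action='store_false'``. It therefore stores ``True`` when the flag is absent, so ``args.no_collapse`` actually means "collapse bins" when it is passed on. The snippet below demonstrates this with plain ``argparse``. The ``dest='collapse'`` variant is only a suggested alternative for readability, not what this PR implements.

```python
import argparse

# As in this PR: the attribute is named no_collapse but is True when collapsing is ON.
parser = argparse.ArgumentParser()
parser.add_argument('--no_collapse', default=True, action='store_false',
                    help='Do not merge adjacent bins with equal values.')

print(parser.parse_args([]).no_collapse)                 # True  -> bins are collapsed
print(parser.parse_args(['--no_collapse']).no_collapse)  # False -> bins are not collapsed

# Hypothetical alternative: same command-line flag, attribute named after what it enables.
# store_false already defaults to True, so no explicit default is needed.
alt = argparse.ArgumentParser()
alt.add_argument('--no_collapse', dest='collapse', action='store_false',
                 help='Do not merge adjacent bins with equal values.')
print(alt.parse_args([]).collapse)                 # True
print(alt.parse_args(['--no_collapse']).collapse)  # False
```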
24 changes: 20 additions & 4 deletions pydeeptools/deeptools/bamCoverage2.py
@@ -101,6 +101,12 @@ def get_optional_args():
choices=['forward', 'reverse'],
default=None)

optional.add_argument('--no_collapse',
help='By default, adjacent bins that have the same value are collapsed, which drastically reduces the size of the output file. '
'Set this flag to opt out of this behavior.',
default=True,
action='store_false')

return parser

def scaleFactor(string):
@@ -136,9 +142,18 @@ def main(args=None):
if not args.normalizeUsing:
args.normalizeUsing = 'None'
if not args.Offset:
args.Offset = [1, -1]
args.Offset = [0, 0]
elif len(args.Offset) == 1:
args.Offset = [args.Offset[0], 0]
if not args.extendReads:
args.extendReads = 0
args.extendReads = False
args.extendReadsLen = 0
elif isinstance(args.extendReads, bool):
args.extendReadsLen = 0
args.extendReads = True
elif isinstance(args.extendReads, int):
args.extendReadsLen = args.extendReads
args.extendReads = True
if not args.filterRNAstrand:
args.filterRNAstrand = 'None'
if not args.blackListFileName:
@@ -170,6 +185,7 @@ def main(args=None):
args.MNase,
args.Offset,
args.extendReads,
args.extendReadsLen,
args.centerReads,
args.filterRNAstrand,
args.blackListFileName,
@@ -178,7 +194,6 @@ def main(args=None):
args.smoothLength,
args.binSize, # bin size
# Filtering options
args.ignoreDuplicates,
args.minMappingQuality,
args.samFlagInclude,
args.samFlagExclude,
@@ -187,5 +202,6 @@ def main(args=None):
# running options
args.numberOfProcessors, # threads
args.region, # regions
args.verbose # verbose
args.verbose, # verbose
args.no_collapse,
)
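The collapsing that ``--no_collapse`` turns off merges runs of adjacent bins with the same value into a single interval, much like a run-length encoding of a bedGraph track. The sketch below assumes fixed-width bins starting at position 0. ``collapse_bins`` is a hypothetical helper, not the deepTools implementation.

```python
from typing import List, Sequence, Tuple


def collapse_bins(values: Sequence[float], bin_size: int) -> List[Tuple[int, int, float]]:
    """Merge adjacent fixed-width bins with equal values into (start, end, value) runs."""
    runs: List[Tuple[int, int, float]] = []
    for i, value in enumerate(values):
        start, end = i * bin_size, (i + 1) * bin_size
        if runs and runs[-1][2] == value:
            runs[-1] = (runs[-1][0], end, value)  # extend the current run
        else:
            runs.append((start, end, value))      # start a new run
    return runs


print(collapse_bins([0, 0, 3, 3, 3, 1], 50))
# [(0, 100, 0), (100, 250, 3), (250, 300, 1)]
```

NaN never compares equal to itself, so bins stored as NaN would not be merged by this sketch. A real writer would also have to clip the last bin to the chromosome length.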
11 changes: 10 additions & 1 deletion pydeeptools/deeptools/multiBamSummary2.py
@@ -224,6 +224,15 @@ def process_args(args=None):
args.outRawCounts = "None"
if not args.scalingFactors:
args.scalingFactors = "None"
if not args.extendReads:
args.extendReads = False
args.extendReadsLen = 0
elif isinstance(args.extendReads, bool):
args.extendReadsLen = 0
args.extendReads = True
elif isinstance(args.extendReads, int):
args.extendReadsLen = args.extendReads
args.extendReads = True
# defaults for the filtering options
if not args.samFlagInclude:
args.samFlagInclude = 0
@@ -248,7 +257,6 @@ def main(args=None):
"""
args = process_args(args)
print(f"args = {args}")
print("running r_mbams")
r_mbams(
args.command,
args.bamfiles,
@@ -264,6 +272,7 @@ def main(args=None):
args.blackListFileName,
args.verbose,
args.extendReads,
args.extendReadsLen,
args.centerReads,
args.samFlagInclude,
args.samFlagExclude,
16 changes: 8 additions & 8 deletions pydeeptools/deeptools/parserCommon.py
@@ -257,14 +257,14 @@ def normalization_options():
default=None,
required=False)

group.add_argument('--exactScaling',
help='Instead of computing scaling factors based on a sampling of the reads, '
'process all of the reads to determine the exact number that will be used in '
'the output. This requires significantly more time to compute, but will '
'produce more accurate scaling factors in cases where alignments that are '
'being filtered are rare and lumped together. In other words, this is only '
'needed when region-based sampling is expected to produce incorrect results.',
action='store_true')
# group.add_argument('--exactScaling',
# help='Instead of computing scaling factors based on a sampling of the reads, '
# 'process all of the reads to determine the exact number that will be used in '
# 'the output. This requires significantly more time to compute, but will '
# 'produce more accurate scaling factors in cases where alignments that are '
# 'being filtered are rare and lumped together. In other words, this is only '
# 'needed when region-based sampling is expected to produce incorrect results.',
# action='store_true')

group.add_argument('--ignoreForNormalization', '-ignore',
help='A list of space-delimited chromosome names '