deeptools · WardDeb · Apr 3, 2025 · Feb 4, 2025 · Feb 5, 2025 · Feb 5, 2025
diff --git a/deeptools4.0.0_changes.md b/deeptools4.0.0_changes.md
diff --git a/docs/content/changelog.rst b/docs/content/changelog.rst
@@ -1,7 +1,8 @@
 Changes in deepTools4.0
 =======================
 
-Changes:
+Plotting
+--------
 
 * Plots in general:
     - Removed Plotly for all graphics.
@@ -25,97 +26,25 @@ Changes:
 	- Scree plot is showing lines for individual and accumulated variation.
 	- Points are by default rainbow colored circles.
 
-Changes in deepTools2.0
-========================
+Core
+----
 
-.. contents:: 
-    :local:
-
-Major changes
--------------
-
-.. note:: The major changes encompass features for **increased efficiency**, **new sequencing data types**, and **additional plots**, particularly for QC.
-
-Moreover, deepTools modules can now be used by other python programs.
-The :ref:`api` is part of the new documentation.
-
-Accommodating additional data types
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-* correlation and comparisons can now be calculated for **bigWig files** (in addition to BAM files) using ``multiBigwigSummary`` and ``bigwigCompare``
-
-* **RNA-seq:** split-reads are now natively supported
+* bamCoverage
+    - ``--no_collapse`` flag: do not merge adjacent bins that have equal coverage values.
 
-* **MNase-seq:** using the new option ``--MNase`` in ``bamCoverage``, one can now compute read coverage only taking the 2 central base pairs of each mapped fragment into account.
-
-Structural updates
-^^^^^^^^^^^^^^^^^^^
-
-* All modules have comprehensive and automatic tests that evaluate proper functioning after any modification of the code.
-* Virtualization for stability: we now provide a ``docker`` image and enable the easy deployment of deepTools via the Galaxy ``toolshed``.
-* Our documentation is now version-aware thanks to readthedocs and ``sphinx``.
-* The API is public and documented.
-
-Renamed tools
-^^^^^^^^^^^^^
-
-* **heatmapper** to :doc:`tools/plotHeatmap`
-* **profiler** to :doc:`tools/plotProfile`
-* **bamCorrelate** to :doc:`tools/multiBamSummary`
-* **bigwigCorrelate** to :doc:`tools/multiBigwigSummary`
-* **bamFingerprint** to :doc:`tools/plotFingerprint`
-
-
-Increased efficiency
-^^^^^^^^^^^^^^^^^^^^
-
-* We dramatically improved the **speed** of bigwig related tools (:doc:`tools/multiBigwigSummary` and ``computeMatrix``) by using the new `pyBigWig module <https://github.com/dpryan79/pyBigWig>`_.
-
-* It is now possible to generate one composite heatmap and/or meta-gene image based on **multiple bigwig files** in one go (see :doc:`tools/computeMatrix`, :doc:`tools/plotHeatmap`, and :doc:`tools/plotProfile` for examples)
-
-* ``computeMatrix`` now also accepts multiple input BED files. Each is treated as a group within a sample and is plotted independently.
-
-* We added **additional filtering options for handling BAM files**, decreasing the need for prior filtering using tools other than deepTools: The ``--samFlagInclude`` and ``--samFlagExclude`` parameters can, for example, be used to only include (or exclude) forward reads in an analysis.
-
-* We separated the generation of read count tables from the calculation of pairwise correlations that was previously handled by ``bamCorrelate``. Now, read counts are calculated first using ``multiBamSummary`` or ``multiBigWigCoverage`` and the resulting output file can be used for calculating and plotting pairwise correlations using ``plotCorrelation`` or for doing a principal component analysis using ``plotPCA``.
-
-New features and tools
-^^^^^^^^^^^^^^^^^^^^^^
-
-* Correlation analyses are no longer limited to BAM files -- bigwig files are possible, too! (see :doc:`tools/multiBigwigSummary`)
-
-* Correlation coefficients can now be computed even if the data contains NaNs.
-
-* Added **new quality control** tools:
-      - use :doc:`tools/plotCoverage` to plot the coverage over base pairs
-      - use :doc:`tools/plotPCA` for principal component analysis
-      - :doc:`tools/bamPEFragmentSize` can be used to calculate the average fragment size for paired-end read data
-
-* Added the possibility for **hierarchical clustering**, besides *k*-means to ``plotProfile`` and ``plotHeatmap``
-
-* ``plotProfile`` has many more options to make compelling summary plots
-
-
-Minor changes
--------------
-
-Changed parameters names and settings
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-* ``computeMatrix`` can now read files with DOS newline characters.
-* ``--missingDataAsZero`` was renamed to ``--skipNonCoveredRegions`` for clarity in ``bamCoverage`` and ``bamCompare``.
-* Read extension was made optional and we removed the need to specify a default fragment length for most of the tools: ``--fragmentLength`` was thus replaced by the new optional parameter ``--extendReads``.
-* Added option ``--skipChromosomes`` to ``multiBigwigSummary``, which can be used to, for example, skip all 'random' chromosomes.
-* Added the option for adding titles to QC plots.
-
-Bug fixes
-^^^^^^^^^
-
-* Resolved an error introduced by ``numpy version 1.10`` in ``computeMatrix``.
-* Improved plotting features for ``plotProfile`` when using as plot type: 'overlapped_lines' and 'heatmap'
-* Fixed problem with BED intervals in ``multiBigwigSummary`` and ``multiBamSummary`` that returned wrongly labeled raw counts.
-* ``multiBigwigSummary`` now also considers chromosomes as identical when the names between samples differ by 'chr' prefix, e.g. chr1 vs. 1.
-* Fixed problem with wrongly labeled proper read pairs in a BAM file. We now have additional checks to determine if a read pair is a proper pair: the reads must face each other and are not allowed to be farther apart than 4x the mean fragment length.
-* For ``bamCoverage`` and ``bamCompare``, the behavior of ``scaleFactor`` was updated such that now, if given in combination with the normalization options (``--normalizeTo1x`` or ``--normalizeUsingRPKM``), the given scaling factor will be multiplied with the factor computed by the respective normalization method.
-
-
+* computeMatrix
+    - The ``--sortRegions no`` option no longer exists.
+    - Sorting with ``ascend``/``descend`` no longer sub-sorts regions by position.
+    - The ``--quiet``/``-q`` option no longer exists.
+    - BED files passed to ``computeMatrix`` can no longer use '#' lines to define groups.
+    - Chromosome name matching (e.g. ``chr1`` <-> ``1``, ``chrMT`` <-> ``MT``) is no longer performed.
+
+* normalization
+    - ``--exactScaling`` is no longer an option; exact scaling is always performed.
+
+* alignmentSieve
+    - The ``label``, ``smartLabels`` and ``genomeChunkLength`` options are removed.
+    - ``ignoreDuplicates`` is removed; to drop duplicate reads, exclude them via ``samFlagExclude`` instead (e.g. ``1024``, the SAM duplicate flag).
+
+Testing
+-------
diff --git a/pydeeptools/deeptools/bamCompare2.py b/pydeeptools/deeptools/bamCompare2.py
@@ -135,6 +135,11 @@ def getOptionalArgs():
                           'This is determined BEFORE any applicable pseudocount '
                           'is added.',
                           action='store_true')
+    optional.add_argument('--no_collapse',
+                          help='By default, adjacent bins that have the same value are collapsed, which can reduce the size of the output file drastically. '
+                          'Set this flag to opt out of this behavior.',
+                          default=True,
+                          action='store_false')
 
     return parser
 
@@ -252,6 +257,15 @@ def main(args=None):
         args.samFlagExclude = 0
     if not args.region:
         args.region = 'None'
+    if not args.extendReads:
+        args.extendReads = False
+        args.extendReadsLen = 0
+    elif isinstance(args.extendReads, bool):
+        args.extendReadsLen = 0
+        args.extendReads = True
+    elif isinstance(args.extendReads, int):
+        args.extendReadsLen = args.extendReads
+        args.extendReads = True
     if not args.blackListFileName:
         args.blackListFileName = 'None'
     else:
@@ -272,8 +286,10 @@ def main(args=None):
         args.scaleFactorsMethod, # scaling method
         args.operation,
         args.pseudocount,
+        args.extendReads,
+        args.extendReadsLen,
+        args.centerReads,
         args.blackListFileName,
-        args.ignoreDuplicates,
         args.minMappingQuality,
         args.samFlagInclude,
         args.samFlagExclude,
@@ -283,32 +299,6 @@ def main(args=None):
         args.ignoreForNormalization,
         args.binSize, # bin size
         args.region, # regions
-        True # verbose
+        args.verbose, # verbose
+        args.no_collapse, # True (default) collapses adjacent equal-valued bins in the output file
     )
-
-    # #if args.normalizeUsing == "RPGC":
-    # #    sys.exit("RPGC normalization (--normalizeUsing RPGC) is not supported with bamCompare!")
-    # #if args.normalizeUsing == 'None':
-    #     args.normalizeUsing = None  # For the sake of sanity
-    # if args.scaleFactorsMethod != 'None' and args.normalizeUsing:
-    #     sys.exit("`--normalizeUsing {}` is only valid if you also use `--scaleFactorsMethod None`! To prevent erroneous output, I will quit now.\n".format(args.normalizeUsing))
-
-    # # Get mapping statistics
-    # bam1, mapped1, unmapped1, stats1 = bamHandler.openBam(args.bamfile1, returnStats=True, nThreads=args.numberOfProcessors)
-    # bam1.close()
-    # bam2, mapped2, unmapped2, stats2 = bamHandler.openBam(args.bamfile2, returnStats=True, nThreads=args.numberOfProcessors)
-    # bam2.close()
-
-    # scale_factors = get_scale_factors(args, [stats1, stats2], [mapped1, mapped2])
-    # if scale_factors is None:
-    #     # check whether one of the depth norm methods are selected
-    #     if args.normalizeUsing is not None:
-    #         args.scaleFactor = 1.0
-    #         # if a normalization is required then compute the scale factors
-    #         args.bam = args.bamfile1
-    #         scale_factor_bam1 = get_scale_factor(args, stats1)
-    #         args.bam = args.bamfile2
-    #         scale_factor_bam2 = get_scale_factor(args, stats2)
-    #         scale_factors = [scale_factor_bam1, scale_factor_bam2]
-    #     else:
-    #         scale_factors = [1, 1]
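Because `--no_collapse` is declared with `default=True` and `action='store_false'`, `args.no_collapse` is `True` (collapse) unless the flag is given, which is why it is forwarded as the "collapse or not" argument. The sketch below reproduces that flag and pairs it with a simple run-length merge of adjacent bins; `collapse_bins` is a hypothetical illustration of what collapsing means for bedGraph-like output, not the backend's actual implementation.

```python
import argparse
from typing import List, Tuple

Bin = Tuple[str, int, int, float]  # (chrom, start, end, value)


def collapse_bins(bins: List[Bin]) -> List[Bin]:
    """Hypothetical sketch: merge touching bins on the same chromosome
    that carry the same value into a single interval."""
    merged: List[Bin] = []
    for chrom, start, end, value in bins:
        if merged:
            pchrom, pstart, pend, pvalue = merged[-1]
            if pchrom == chrom and pend == start and pvalue == value:
                merged[-1] = (pchrom, pstart, end, pvalue)  # extend previous run
                continue
        merged.append((chrom, start, end, value))
    return merged


parser = argparse.ArgumentParser()
# Same declaration as in the diff: attribute is True by default,
# and becomes False when --no_collapse is passed.
parser.add_argument('--no_collapse', default=True, action='store_false')

bins = [("chr1", 0, 50, 2.0), ("chr1", 50, 100, 2.0), ("chr1", 100, 150, 0.0)]
for argv in ([], ["--no_collapse"]):
    args = parser.parse_args(argv)
    out = collapse_bins(bins) if args.no_collapse else bins
    print(argv, len(out))
# [] 2
# ['--no_collapse'] 3
```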
diff --git a/pydeeptools/deeptools/bamCoverage2.py b/pydeeptools/deeptools/bamCoverage2.py
@@ -101,6 +101,12 @@ def get_optional_args():
                           choices=['forward', 'reverse'],
                           default=None)
 
+    optional.add_argument('--no_collapse',
+                          help='By default, adjacent bins that have the same value are collapsed, which can reduce the size of the output file drastically. '
+                          'Set this flag to opt out of this behavior.',
+                          default=True,
+                          action='store_false')
+
     return parser
 
 def scaleFactor(string):
@@ -136,9 +142,18 @@ def main(args=None):
     if not args.normalizeUsing:
         args.normalizeUsing = 'None'
     if not args.Offset:
-        args.Offset = [1, -1]
+        args.Offset = [0, 0]
+    elif len(args.Offset) == 1:
+        args.Offset = [args.Offset[0], 0]
     if not args.extendReads:
-        args.extendReads = 0
+        args.extendReads = False
+        args.extendReadsLen = 0
+    elif isinstance(args.extendReads, bool):
+        args.extendReadsLen = 0
+        args.extendReads = True
+    elif isinstance(args.extendReads, int):
+        args.extendReadsLen = args.extendReads
+        args.extendReads = True
     if not args.filterRNAstrand:
         args.filterRNAstrand = 'None'
     if not args.blackListFileName:
@@ -170,6 +185,7 @@ def main(args=None):
         args.MNase,
         args.Offset,
         args.extendReads,
+        args.extendReadsLen,
         args.centerReads,
         args.filterRNAstrand,
         args.blackListFileName,
@@ -178,7 +194,6 @@ def main(args=None):
         args.smoothLength,
         args.binSize, # bin size
         # Filtering options
-        args.ignoreDuplicates,
         args.minMappingQuality,
         args.samFlagInclude,
         args.samFlagExclude,
@@ -187,5 +202,6 @@ def main(args=None):
         # running options
         args.numberOfProcessors, # threads
         args.region, # regions
-        args.verbose # verbose
+        args.verbose, # verbose
+        args.no_collapse, # True (default) collapses adjacent equal-valued bins in the output file
     )
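The `bamCoverage2.py` hunk changes the `--Offset` default from `[1, -1]` to `[0, 0]` and pads a single value with a trailing `0`. A small sketch of that normalization as a standalone function; `normalize_offset` is a hypothetical name, and the `argparse` declaration (`type=int, nargs='+'`) is an assumption about how `--Offset` is declared, since that declaration is not part of this diff.

```python
import argparse
from typing import List, Optional

parser = argparse.ArgumentParser()
# Assumed declaration: one or more integers, None when omitted.
parser.add_argument('--Offset', type=int, nargs='+')


def normalize_offset(offset: Optional[List[int]]) -> List[int]:
    """Mirror of the diff's logic: no value -> [0, 0], one value -> [value, 0];
    anything longer is passed through unchanged, as in the diff."""
    if not offset:
        return [0, 0]
    if len(offset) == 1:
        return [offset[0], 0]
    return list(offset)


for argv in ([], ['--Offset', '12'], ['--Offset', '5', '10']):
    print(argv, normalize_offset(parser.parse_args(argv).Offset))
# [] [0, 0]
# ['--Offset', '12'] [12, 0]
# ['--Offset', '5', '10'] [5, 10]
```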
diff --git a/pydeeptools/deeptools/multiBamSummary2.py b/pydeeptools/deeptools/multiBamSummary2.py
@@ -224,6 +224,15 @@ def process_args(args=None):
         args.outRawCounts = "None"
     if not args.scalingFactors:
         args.scalingFactors = "None"
+    if not args.extendReads:
+        args.extendReads = False
+        args.extendReadsLen = 0
+    elif isinstance(args.extendReads, bool):
+        args.extendReadsLen = 0
+        args.extendReads = True
+    elif isinstance(args.extendReads, int):
+        args.extendReadsLen = args.extendReads
+        args.extendReads = True
     # defaults for the filtering options
     if not args.samFlagInclude:
         args.samFlagInclude = 0
@@ -248,7 +257,6 @@ def main(args=None):
     """
     args = process_args(args)
     print(f"args = {args}")
-    print("running r_mbams")
     r_mbams(
         args.command,
         args.bamfiles,
@@ -264,6 +272,7 @@ def main(args=None):
         args.blackListFileName,
         args.verbose,
         args.extendReads,
+        args.extendReadsLen,
         args.centerReads,
         args.samFlagInclude,
         args.samFlagExclude,
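The same normalization of `--extendReads` now appears in `bamCompare2.py`, `bamCoverage2.py` and `multiBamSummary2.py`. It relies on the option yielding `False`, `True` or an `int`; the `argparse` declaration below is an assumption modelled on deepTools 3.x (`nargs='?'`, `const=True`), since it is not part of this diff. Checking `bool` before `int` matters because `bool` is a subclass of `int` in Python. `split_extend_reads` is a hypothetical helper that a shared module could hold; unlike the diff, it raises on unexpected types instead of leaving `extendReadsLen` unset.

```python
import argparse

parser = argparse.ArgumentParser()
# Assumed declaration: omitted -> False, bare flag -> True (const), with value -> int.
parser.add_argument('--extendReads', '-e', type=int, nargs='?',
                    const=True, default=False)


def split_extend_reads(value):
    """Return (extendReads, extendReadsLen) following the diff's branches."""
    if not value:                  # omitted (False) or an explicit 0
        return False, 0
    if isinstance(value, bool):    # must precede the int check: bool is an int
        return True, 0             # length left to the backend (assumption)
    if isinstance(value, int):
        return True, value
    raise TypeError(f"unexpected --extendReads value: {value!r}")


for argv in ([], ['--extendReads'], ['--extendReads', '200']):
    args = parser.parse_args(argv)
    print(argv, split_extend_reads(args.extendReads))
# [] (False, 0)
# ['--extendReads'] (True, 0)
# ['--extendReads', '200'] (True, 200)
```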

diff --git a/pydeeptools/deeptools/parserCommon.py b/pydeeptools/deeptools/parserCommon.py
@@ -257,14 +257,14 @@ def normalization_options():
                        default=None,
                        required=False)
 
-    group.add_argument('--exactScaling',
-                       help='Instead of computing scaling factors based on a sampling of the reads, '
-                       'process all of the reads to determine the exact number that will be used in '
-                       'the output. This requires significantly more time to compute, but will '
-                       'produce more accurate scaling factors in cases where alignments that are '
-                       'being filtered are rare and lumped together. In other words, this is only '
-                       'needed when region-based sampling is expected to produce incorrect results.',
-                       action='store_true')
+    # group.add_argument('--exactScaling',
+    #                    help='Instead of computing scaling factors based on a sampling of the reads, '
+    #                    'process all of the reads to determine the exact number that will be used in '
+    #                    'the output. This requires significantly more time to compute, but will '
+    #                    'produce more accurate scaling factors in cases where alignments that are '
+    #                    'being filtered are rare and lumped together. In other words, this is only '
+    #                    'needed when region-based sampling is expected to produce incorrect results.',
+    #                    action='store_true')
 
     group.add_argument('--ignoreForNormalization', '-ignore',
                        help='A list of space-delimited chromosome names '