Releases: sof202/ChromBinarize

v1.3.0

20 Sep 14:09
02b9dd3

Features

  • Breaking change: Most R functions have been moved to the chrombinarize R package, which now lives in this repository (#2)
    • Tests are now available to show functions are working as expected
  • Error messages originating from shell scripts are now coloured red for readability (using ANSI escape codes)

Refactors

  • Renamed files stemming from processing WGBS and oxBS together (now just referred to as 5hmC files)
  • File types are now more consistent across scripts thanks to #2. All functions now take 'bedmethyl', 'comparison_bedmethyl' or 'bin_counts' files.
  • Many more checks are implemented when reading in files to avoid errors due to bad file format (with helpful error messages to explain why files are being rejected).
  • The method for combining 5hmC and 5mC signals in bedmethyl files has been cleaned up.
    • Rather than relying on lead and lag functions, it now extracts the 5mC and 5hmC signals separately, then combines the two.
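
The "extract separately, then combine" approach can be sketched as follows. This is a minimal Python illustration, not the package's R code: the row layout is a simplified, hypothetical stand-in for a bedmethyl file, and the function names are invented for the example. The `m` and `h` codes follow modkit's convention for 5mC and 5hmC.

```python
# Hypothetical, simplified bedmethyl rows: (chrom, start, mod_code, percent_modified).
# Real bedmethyl files carry more columns; 'm' = 5mC and 'h' = 5hmC.
rows = [
    ("chr1", 100, "m", 80.0),
    ("chr1", 100, "h", 5.0),
    ("chr1", 250, "m", 10.0),
    ("chr1", 250, "h", 60.0),
    ("chr1", 400, "m", 95.0),  # no matching 5hmC row
]


def split_by_mod(rows, code):
    """Return {(chrom, start): percent} for a single modification code."""
    return {(chrom, start): pct for chrom, start, mod, pct in rows if mod == code}


def combine_signals(rows):
    """Extract 5mC and 5hmC separately, then join them on position."""
    mc = split_by_mod(rows, "m")
    hmc = split_by_mod(rows, "h")
    shared = sorted(mc.keys() & hmc.keys())  # keep positions present in both
    return [(chrom, start, mc[(chrom, start)], hmc[(chrom, start)])
            for chrom, start in shared]


combined = combine_signals(rows)
for record in combined:
    print(record)
```

Joining on position does not depend on row order, whereas a lead/lag approach only works when the 5mC and 5hmC rows for a site sit next to each other.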

Fixes

  • The method for finding the average methylation percent of nearby CpGs is now faster and cleaner. It can include more than 2 CpGs in the average, rather than only the immediate neighbours.
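
As a rough illustration of averaging over more than the immediate neighbours, the sketch below takes up to `k` sites on each side of every CpG. It is written in Python rather than R, and it assumes that "nearby" means a fixed number of neighbouring sites; the release notes do not say whether the package uses a count or a genomic distance, and `neighbour_average` is a hypothetical name.

```python
def neighbour_average(percents, k):
    """Mean methylation percent of up to k sites either side of each site.

    The site itself is excluded, and sites near the ends use fewer neighbours.
    Returns None where a site has no neighbours at all.
    """
    averages = []
    for i in range(len(percents)):
        left = percents[max(0, i - k):i]
        right = percents[i + 1:i + 1 + k]
        nearby = left + right
        averages.append(sum(nearby) / len(nearby) if nearby else None)
    return averages


percents = [10.0, 20.0, 90.0, 80.0, 30.0]
immediate = neighbour_average(percents, k=1)  # only the adjacent sites
wider = neighbour_average(percents, k=2)      # up to four surrounding sites
print(immediate)
print(wider)
```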

Documentation

  • Moved all documentation over to mkdocs (#1) for better organisation
    • Docs can now be found here
  • Documentation for individual R functions is available upon loading the chrombinarize R package (provided by Roxygen)

Other (GitHub specific)

  • Added CI for testing changes to R code
  • Added badge to readme to show CI status
  • Added contributing guidelines
  • Added a PR template

v1.2.0

01 Aug 09:29
0616288

Features

  • [BREAKING CHANGE]: Software requirements have been slashed. A new build process has been implemented that uses conda and the R package renv. In principle, this improves portability and reduces the prerequisites for running the scripts.
  • Added an option to use only CpG sites that are in CpG islands when fitting the binomial distribution to erroneous read probability.
    • This is possibly less desirable for ONT data where base calling might be affected by CpG density. It is more likely to be useful for BS-Seq data
    • This step is optional in case you cannot obtain a CpG island reference for your dataset (hg19 and hg38 are provided for you)
  • Added a script that allows the user to convert a binary file from one bin size to another (useful with datasets with multiple modalities)
  • [BREAKING CHANGE]: Changed method for determining densely methylated CpG sites. New method uses density information with the beta distribution rather than frequency/count information with the Poisson distribution.
    • This works better across a wider range of bin sizes.
    • Changed log file locations to be more structured (sorted by job name and user)
    • Changed some default values in the config-setup file
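
To give a feel for the binomial model of erroneous reads mentioned above, the sketch below asks how likely it is that a given number of methylated reads at a site arose from read errors alone. It is a Python illustration under stated assumptions: the 5% error rate is made up, and neither the function name nor the way ChromBinarize estimates or applies this probability is taken from the project.

```python
from math import comb


def binomial_tail(n_reads, n_methylated, error_rate):
    """P(X >= n_methylated) for X ~ Binomial(n_reads, error_rate)."""
    return sum(
        comb(n_reads, k) * error_rate**k * (1 - error_rate) ** (n_reads - k)
        for k in range(n_methylated, n_reads + 1)
    )


# Hypothetical value: a 5% chance that a single read is wrongly called methylated.
error_rate = 0.05

# One methylated read out of 20 is easily explained by errors alone...
p_few = binomial_tail(n_reads=20, n_methylated=1, error_rate=error_rate)
# ...whereas ten out of 20 is very unlikely to be.
p_many = binomial_tail(n_reads=20, n_methylated=10, error_rate=error_rate)

print(f"{p_few:.4f} {p_many:.2e}")
```

A small tail probability suggests the methylation signal at that site is unlikely to be noise; where to place the cut-off is a separate choice.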

Refactors

  • Moved parameters back into config file (keeping all configuration in one file is easier)
  • Renamed some files to be more informative of what they do
  • Moved end location of sparse and dense binary files to make it easier to use ChromHMM's MergeBinary command
  • Removed obsolete Rscript that installed R libraries (in favour of renv)

v1.1.0

18 Jul 12:58

Features

  • Added a script to install any R libraries you may be missing
  • Overhauled approach to isolating 5hmC using oxBS and WGBS
    • This is outlined in the README. In essence, the pipeline now uses confidence intervals to decide whether a site's combined methylation signal is significantly greater than its 5mC signal (implying a significant 5hmC signal).
  • Added logging
    • Logs are only displayed if DEBUG_MODE (in parameters) is set to 1 as most of this information is only useful for timing
    • Common sources of errors (from thresholds) are printed when relevant
  • A maximum read depth threshold has been added to the config file, for users who consider sites with extreme read depths (relative to the rest of the dataset) to be invalid
    • If you do not, just set this value to a very high number (the default)
  • A separate script has been created for binarizing WGBS data (for when you do not have accompanying oxBS data)
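
The confidence-interval idea for isolating 5hmC can be sketched as below. This is an illustration only: it assumes Wilson score intervals at roughly 95% and a simple "intervals do not overlap" rule, neither of which is confirmed by these notes (the README describes the actual method). WGBS reports 5mC and 5hmC combined, while oxBS reports 5mC alone, so a WGBS interval lying wholly above the oxBS interval hints at real 5hmC. All function names are hypothetical.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def has_significant_5hmc(wgbs_meth, wgbs_depth, oxbs_meth, oxbs_depth):
    """True when the WGBS (5mC + 5hmC) interval lies wholly above the oxBS (5mC) one."""
    wgbs_low, _ = wilson_interval(wgbs_meth, wgbs_depth)
    _, oxbs_high = wilson_interval(oxbs_meth, oxbs_depth)
    return wgbs_low > oxbs_high


clear_gap = has_significant_5hmc(45, 50, 20, 50)  # 90% vs 40% methylated reads
overlap = has_significant_5hmc(27, 50, 25, 50)    # 54% vs 50% methylated reads
print(clear_gap, overlap)
```

Comparing intervals rather than raw percentages means that low-depth sites, whose intervals are wide, are less likely to be called as 5hmC on the strength of a few reads.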

Bug fixes

  • ChromBinarize now correctly assigns 5mC to oxBS data (rather than 5hmC)

Other

  • Updated software requirements
  • Several test Rscripts have been removed
  • Error messages on supplying the incorrect config file location are now more helpful
  • README has been updated to aid in explanations
  • All output files are now TSVs (usually with a .bed extension)
  • R script that calculates probabilities of erroneous methylation has been cleaned up (removal of previously used columns)
  • DEBUG_MODE has been moved to parameters.txt

v1.0.1

04 Jul 12:43
3635184

Minor bug fixes

  • Binarization no longer forces the hydroxymethylation thresholds on every methylation type (previously they were applied regardless of methylation type)
  • Correctly uses gzip to compress files instead of gunzip (which does the opposite, whoops)

Features

  • Added a 'debug mode' option to the config file. If set to true, intermediate files for binarization steps are kept (previous behaviour). Otherwise these files are deleted (new default behaviour).
  • Added setup script to ease usage of scripts outside of specific UoE HPC system

v1.0.0

03 Jul 09:53

Initial release

Main features:

  • Binarize ONT bed files created using modkit
  • Binarize oxBS/WGBS bed files created using wgbs_tools
  • Binarize ATAC/ChIP peak called bed files created using MACS
  • Analyse ONT/BS bed files further for sense checking and cross comparison