Releases: sof202/ChromBinarize
v1.3.0
Features
- Breaking change: Most R functions have been moved to the chrombinarize R package, which is now found in this repository (#2)
- Tests are now available to show functions are working as expected
- Error messages originating from shell scripts are now coloured red for readability (ANSI codes are used for this)
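As a rough illustration of the ANSI colouring mentioned above, the sketch below wraps a message in the standard red escape sequence. It is written in Python purely for illustration (the project's own error messages come from shell scripts), and the function name `red_error` is hypothetical rather than part of ChromBinarize.

```python
import sys

RED = "\033[31m"   # ANSI SGR code for a red foreground
RESET = "\033[0m"  # ANSI SGR code that restores default attributes


def red_error(message: str) -> str:
    """Return `message` wrapped so that ANSI-aware terminals show it in red."""
    return f"{RED}ERROR: {message}{RESET}"


if __name__ == "__main__":
    # Errors go to stderr so they stay separate from normal output.
    print(red_error("config file not found"), file=sys.stderr)
```

Terminals that do not interpret ANSI codes will print the escape characters literally, so a real script may want to check whether its output is a terminal first.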
Refactors
- Renamed files stemming from processing WGBS and oxBS together (now just referred to as 5hmc files)
- File types are now more consistent across scripts thanks to (#2). All functions now take in 'bedmethyl', 'comparison_bedmethyl' or 'bin_counts' files.
- Many more checks are implemented when reading in files to avoid errors due to bad file format (with helpful error messages to explain why files are being rejected).
- The method for combining 5hmC and 5mC signals in bedmethyl files has been cleaned up.
- Rather than relying on lead and lag functions, it now extracts the 5mC and 5hmC signals separately, then combines the two.
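The "extract separately, then combine" idea for 5mC and 5hmC signals can be sketched roughly as follows. This is not the package's R implementation: the simplified row layout `(chrom, start, mod_code, percent)`, the codes `"m"` and `"h"`, and the choice to fill a missing signal with `0.0` are all assumptions made for illustration.

```python
def combine_signals(rows):
    """Extract 5mC and 5hmC per site separately, then merge them by position.

    `rows` is an iterable of hypothetical simplified records:
    (chrom, start, mod_code, percent_modified), where mod_code is
    assumed to be "m" for 5mC and "h" for 5hmC.
    """
    mc = {}
    hmc = {}
    for chrom, start, code, pct in rows:
        if code == "m":
            mc[(chrom, start)] = pct
        elif code == "h":
            hmc[(chrom, start)] = pct

    combined = []
    # Union of sites; a signal missing at a site is reported as 0.0 (assumption).
    for chrom, start in sorted(set(mc) | set(hmc)):
        site = (chrom, start)
        combined.append((chrom, start, mc.get(site, 0.0), hmc.get(site, 0.0)))
    return combined
```

Keying both signals on position means the result does not depend on the two modification rows for a site being adjacent in the file, which is the fragility that a lead/lag approach tends to have.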
Fixes
- The method for finding the average methylation percent of surrounding/nearby CpGs is now faster and cleaner. It now allows more than 2 CpGs to be included in the average (previously only the immediate neighbours were considered).
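A minimal sketch of averaging over a configurable number of neighbouring CpGs is given below. It assumes the sites are already sorted by position and that "nearby" means adjacent in that ordering rather than within a fixed genomic distance; the function name and the `n_neighbours` parameter are hypothetical.

```python
def neighbour_average(percents, index, n_neighbours=2):
    """Mean methylation percent of up to `n_neighbours` CpGs on each side
    of `index`, excluding the site itself.

    Returns None when the site has no neighbours at all.
    """
    lo = max(0, index - n_neighbours)
    hi = min(len(percents), index + n_neighbours + 1)
    # Neighbours to the left and to the right, skipping the site at `index`.
    window = percents[lo:index] + percents[index + 1:hi]
    if not window:
        return None
    return sum(window) / len(window)
```

With `n_neighbours=1` this reduces to the older behaviour of using only the two immediate neighbours; sites near the start or end of the list simply use whichever neighbours exist.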
Documentation
- Moved all documentation over to mkdocs (#1) for better organisation
- Docs can now be found here
- Documentation for individual R functions is available upon loading the chrombinarize R package (provided by Roxygen)
Other (GitHub specific)
- Added CI for testing changes to R code
- Added badge to readme to show CI status
- Added contributing guidelines
- Added a PR template
v1.2.0
Features
- [BREAKING CHANGE]: Software requirements have been slashed. New build process has been implemented that uses conda and the R package renv. This in principle allows for more portability and reduces prerequisites for running the scripts.
- Added an option to use only CpG sites that are in CpG islands when fitting the binomial distribution to erroneous read probability.
- This is possibly less desirable for ONT data where base calling might be affected by CpG density. It is more likely to be useful for BS-Seq data
- This step is optional in case you cannot obtain a CpG island reference for your dataset (hg19 and hg38 are provided for you)
- Added a script that allows the user to convert a binary file from one bin size to another (useful with datasets with multiple modalities)
- [BREAKING CHANGE]: Changed method for determining densely methylated CpG sites. New method uses density information with the beta distribution rather than frequency/count information with the Poisson distribution.
- This will work better with more varied bin sizes.
- Changed log file locations to be more structured (sorted by job name and user)
- Changed some default values in the config-setup file
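To give a feel for the bin-size conversion mentioned in the list above, the sketch below merges consecutive bins of a binary signal into larger ones. The release notes do not describe the rule the actual script uses, so the "a merged bin is 1 if any source bin is 1" rule, the restriction to integer coarsening factors, and the name `coarsen_bins` are all assumptions.

```python
def coarsen_bins(rows, factor):
    """Merge every `factor` consecutive bins into one larger bin.

    `rows` is a list of per-bin lists of 0/1 values (one value per mark).
    A mark is set to 1 in the merged bin if it is 1 in any source bin
    (assumed rule). A trailing partial group is merged on its own.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    merged = []
    for i in range(0, len(rows), factor):
        chunk = rows[i:i + factor]
        # zip(*chunk) yields one tuple per mark across the bins in this chunk.
        merged.append([max(column) for column in zip(*chunk)])
    return merged
```

For example, going from 200 bp bins to 400 bp bins would correspond to `factor=2` under these assumptions.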
Refactors
- Moved parameters back into config file (keeping all configuration in one file is easier)
- Renamed some files to be more informative of what they do
- Moved end location of sparse and dense binary files to make it easier to use ChromHMM's MergeBinary command
- Removed obsolete Rscript that installed R libraries (in favour of renv)
v1.1.0
Features
- Added a script to install any R libraries you may be missing
- Overhauled approach to isolating 5hmC using oxBS and WGBS
- This is outlined in the README. In essence, the pipeline now uses confidence intervals to discern whether a site's combined methylation signal is greater than its 5mC signal (implying a significant 5hmC signal).
- Added logging
- Logs are only displayed if DEBUG_MODE (in parameters) is set to 1 as most of this information is only useful for timing
- Common sources of errors (from thresholds) are printed when relevant
- If you are convinced that sites with extreme read depths (in comparison to other reads) are invalid, a maximum read depth threshold has been added to the config file
- If you aren't convinced, just set this value to a very high number (default)
- A separate script has been created for binarizing WGBS data (for when you do not have accompanying oxBS data)
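The confidence-interval idea for isolating 5hmC described in the list above can be sketched as follows. The README's exact interval method is not stated here, so the Wilson score interval, the default `z=1.96`, and the function names are assumptions chosen for illustration; the sketch treats WGBS as the combined 5mC + 5hmC signal and oxBS as the 5mC-only signal.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to roughly 95% coverage.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


def has_significant_5hmc(wgbs_meth, wgbs_depth, oxbs_meth, oxbs_depth, z=1.96):
    """True when the combined (WGBS) interval lies entirely above the
    5mC-only (oxBS) interval, i.e. the two intervals do not overlap."""
    combined_low, _ = wilson_interval(wgbs_meth, wgbs_depth, z)
    _, mc_high = wilson_interval(oxbs_meth, oxbs_depth, z)
    return combined_low > mc_high
```

Requiring non-overlapping intervals is deliberately conservative: a site with 90/100 methylated reads in WGBS and 50/100 in oxBS passes, whereas 55/100 against 50/100 does not.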
Bug fixes
- ChromBinarize now correctly assigns 5mC to oxBS data (rather than 5hmC)
Other
- Updated software requirements
- Several test Rscripts have been removed
- Error messages on supplying the incorrect config file location are now more helpful
- README has been updated to aid in explanations
- All output files are now TSVs (usually with a .bed extension)
- R script that calculates probabilities of erroneous methylation has been cleaned up (removal of previously used columns)
- DEBUG_MODE has been moved to parameters.txt
v1.0.1
Minor bug fixes
- Binarization was previously forcing the thresholds used for hydroxymethylation (regardless of methylation type)
- Correctly uses gzip to compress files instead of gunzip (which does the opposite, whoops)
Features
- Added a 'debug mode' option to the config file. If set to true, intermediate files for binarization steps are kept (previous behaviour). Otherwise these files are deleted (new default behaviour).
- Added setup script to ease usage of scripts outside of specific UoE HPC system
v1.0.0
Initial release
Main features:
- Binarize ONT bed files created using modkit
- Binarize oxBS/WGBS bed files created using wgbs_tools
- Binarize ATAC/ChIP peak called bed files created using MACS
- Analyse ONT/BS bed files further for sense checking and cross comparison