- The concat operation now supports a special mode for concatenating a large number of files in parallel when the output is passed downstream to sort (reported by @ekg). This mode is ~30-fold faster per thread and is parallelizable.
- Introducing the experimental `aggregation` subroutine. Modelled on the Datashader pipeline, it lets Tomahawk natively aggregate datasets of billions of points into pixel datasets that can be plotted efficiently with raster operations.
- The slicing operator for `two` files has been reinstated. Use `-O` to set the output type (`two` or `ld`) when invoking the `view` subroutine.
- Bug fixes
  - Problem when importing variants within 1 base of each other (reported by @ekg)
  - Memory requirement problems when parallel sort merging very large files (reported by @ekg)
- Removed the default missingness filter (threshold changed from 0.2 to 1.0) to support importing `bcf` files with few samples
- Fixed a problem with linked intervals during slicing. Slicing now correctly selects targeted associations based on both mates (reported by @ekg)
- Default build now uses SSE4.2 only
- The `dylib` now builds correctly on OSX
- Added filtering on maximum cell counts when viewing `two` files (flags `-1`, `-2`, `-3`, `-4`)
- Bug fixes
  - Viewing a `two` file with `-l` or `-u` now correctly filters output when passed as the sole parameter
  - Now correctly filters `two` files
- The meta index for faster queries of `two` files now builds correctly
- Interval slicing for sorted `two` files now filters hits on the from positions only
- Haplotype counts in binary `two` files are now stored as `double` instead of `float`
- All changes are breaking because of structural changes. Files generated before this release are not supported.
- The build process is now smoother
- Shared C++ library is now built by default
- All changes are breaking because of structural changes. Files generated before this release are not supported.
- Added a large number of new FLAG fields to `two` files
- Reintroduced speed mode for calculations when contingency tables are not required
- None
- Bug fixes
  - Viewing sorted `two` files is fixed for edge cases
  - Using old sorted files does not break functionality, but they have to be re-sorted
- Sorting `two` files now correctly produces two indices used for fast queries
- Viewing sorted `two` files is significantly faster
- Bug fixes
  - Sort merge (`tomahawk sort -M`) now produces the correct index
- None
- Bug fixes
  - Sort merge (`tomahawk sort -M`) now produces the correct output
  - The `view` ABI command now correctly triggers the `-h`/`-H` flags
- All changes are breaking. This release is not backwards compatible!
- All major classes are now implemented as STL-like containers and are as decoupled as possible
- External sort file handles now buffer a small amount of data to reduce random access lookups
- `two`/`twk` entries are no longer forced to be accessed through unaligned memory addresses
- Both `two` and `twk` are now self-indexing. All external indices are now invalid
- Bug fixes
  - Too many to list here
- Example updates:
  - Added an R script demonstrating simple plotting using `base` graphics
  - Added several figures demonstrating some key concepts of Tomahawk
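
The experimental `aggregation` subroutine follows the Datashader idea of reducing an arbitrarily large point set onto a fixed-size pixel grid before anything is plotted. The minimal sketch that follows illustrates that binning step in Python. It is not Tomahawk's implementation. The function name `aggregate_points`, its parameters, and the `count`/`mean` reductions are hypothetical choices made for illustration.

```python
from typing import Iterable, List, Tuple


def aggregate_points(
    points: Iterable[Tuple[float, float, float]],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    width: int,
    height: int,
    reduction: str = "count",
) -> List[List[float]]:
    """Hypothetical sketch: bin (x, y, value) points onto a height x width pixel grid.

    reduction="count" returns the number of points per pixel.
    reduction="mean" returns the mean value per pixel (0.0 for empty pixels;
    Datashader itself would report empty pixels as NaN).
    Points outside x_range/y_range are dropped.
    """
    if reduction not in ("count", "mean"):
        raise ValueError(f"unsupported reduction: {reduction}")
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    x_min, x_max = x_range
    y_min, y_max = y_range
    if x_max <= x_min or y_max <= y_min:
        raise ValueError("ranges must satisfy max > min")

    # Memory depends only on the canvas size, not on the number of points.
    counts = [[0] * width for _ in range(height)]
    sums = [[0.0] * width for _ in range(height)]
    x_scale = width / (x_max - x_min)
    y_scale = height / (y_max - y_min)

    for x, y, value in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue
        # Map each coordinate to a pixel index; points exactly on the upper
        # edge are clamped into the last pixel. Row 0 corresponds to y_min.
        col = min(int((x - x_min) * x_scale), width - 1)
        row = min(int((y - y_min) * y_scale), height - 1)
        counts[row][col] += 1
        sums[row][col] += value

    if reduction == "count":
        return [[float(c) for c in row] for row in counts]
    return [
        [s / c if c else 0.0 for s, c in zip(srow, crow)]
        for srow, crow in zip(sums, counts)
    ]


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.5), (0.1, 0.1, 1.0), (9.9, 9.9, 0.2)]
    print(aggregate_points(pts, (0, 10), (0, 10), 2, 2))
    print(aggregate_points(pts, (0, 10), (0, 10), 2, 2, reduction="mean"))
```

Because the grid size is fixed by `width` and `height`, `points` can be a generator that streams records from disk. That property is what makes this style of aggregation practical for billions of points. The resulting grid can then be colour-mapped as a raster image.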