Discuss "compressive genomics" #52

jeromekelleher · 2023-12-07T13:06:01Z

Loh et al. argue for the idea of compressive genomics and follow up with ideas on compressive acceleration.

These are attractive ideas, but they only work in certain situations and on cleaned-up data. We will always start out with messy variant calls, and we need a software stack and data structures that can work with them.

hammer · 2023-12-08T15:57:54Z

These ideas are not in conflict with also being able to scale work out across processors and servers, and they can apply even to messy data. There was a lot of effort in the Hadoop ecosystem to identify compression codecs that were splittable (our friend Tom White wrote about the topic in his book) and had the right tradeoff between computation and storage efficiency (e.g. Snappy was an improvement at the time). Much of the work since then has gone into using instruction set extensions to make hardware-friendly codecs, and into algorithms that operate directly on compressed data, a topic discussed as far back as "Data compression and database performance" (1991), for example.
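
To make "operate directly on compressed data" concrete, here is a minimal sketch using run-length encoding, which is a stand-in chosen for simplicity and not a codec taken from any of the works mentioned above. The genotype vector and the helper names (`rle_encode`, `rle_sum`, `rle_count`) are hypothetical, made up for illustration.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse a sequence into (value, run_length) pairs."""
    return [(v, sum(1 for _ in run)) for v, run in groupby(values)]


def rle_sum(runs):
    """Sum the original values using only the runs (no decompression)."""
    return sum(v * n for v, n in runs)


def rle_count(runs, target):
    """Count occurrences of `target` using only the runs."""
    return sum(n for v, n in runs if v == target)


# Hypothetical alternate-allele counts (0, 1 or 2) for 16 samples at one site.
genotypes = [0] * 6 + [1] * 2 + [0] * 3 + [2] + [0] * 4

runs = rle_encode(genotypes)
alt_allele_count = rle_sum(runs)      # touches 5 runs, not 16 values
hom_ref_count = rle_count(runs, 0)
```

Both aggregates cost time proportional to the number of runs, not the number of samples, so the benefit depends on how long the runs are. Noisy calls shorten the runs and shrink the saving, but the results stay correct.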

hammer · 2023-12-08T15:59:56Z

This was a nice prompt to scan for recent work on this topic in the databases world; https://github.com/maxi-k/btrblocks looks quite interesting!
