Discuss "compressive genomics" #52

Open
jeromekelleher opened this issue Dec 7, 2023 · 2 comments

@jeromekelleher
Collaborator

Loh et al. argue for the idea of compressive genomics and follow up with the idea of compressive acceleration.

These are attractive ideas, but they only work in certain situations and on cleaned-up data. We will always start out with messy variant calls, and we need a software stack and data structures that can work with them.

@hammer

hammer commented Dec 8, 2023

These ideas do not conflict with scaling work out across processors and servers, and they can apply even to messy data. There was a lot of effort in the Hadoop ecosystem to identify compression codecs that were splittable (our friend Tom White wrote about the topic in his book) and that struck the right tradeoff between computation and storage efficiency (e.g. Snappy was an improvement at the time). Much of the work since then has gone into using instruction set extensions to build hardware-friendly codecs, and into algorithms that operate directly on compressed data, an idea discussed as far back as "Data compression and database performance" (1991).
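
As a rough illustration of what "operating directly on compressed data" can mean, here is a minimal sketch using run-length encoding. It is not taken from any of the works mentioned above; the function names (`rle_encode`, `rle_sum`, `rle_count`) and the dosage values are hypothetical, chosen only to show that simple aggregates can be computed run by run without expanding the data.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_sum(runs):
    """Sum the original sequence without expanding the runs."""
    return sum(value * length for value, length in runs)


def rle_count(runs, target):
    """Count occurrences of target in the original sequence, run by run."""
    return sum(length for value, length in runs if value == target)


# Hypothetical genotype dosages (0, 1 or 2 alternate alleles) at one site.
dosages = [0, 0, 0, 0, 1, 1, 0, 0, 2, 0, 0, 0]
runs = rle_encode(dosages)

# Both aggregates touch one entry per run rather than one per sample.
total_alt_alleles = rle_sum(runs)
homozygous_ref = rle_count(runs, 0)
```

The cost of each aggregate scales with the number of runs rather than the number of samples, so the benefit depends on how repetitive the data is; noisy calls with short runs would gain much less.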

@hammer

hammer commented Dec 8, 2023

This was a nice prompt to scan for recent work on this topic in the databases world; https://github.com/maxi-k/btrblocks looks quite interesting!
