Are there any means to speed up intersect? #1091

Open
lokapal opened this issue May 18, 2024 · 0 comments

lokapal commented May 18, 2024

I have 4C experiments to process, so I naturally have millions of anchor-zone reads, and computing the intersection is unbearably slow. Each replicate contains almost 15 million reads in total. After 4 hours, bedtools intersect has processed only 3.8 million reads, and the output file is growing very slowly.
The command line is:

bedtools intersect -bed -iobuf 100G -sorted -wa -wb -u -g hg38.genome.txt -a rep1.bam -b rep2.bam > intersect.bed

I have paired-end (PE) reads, so I don't like the idea of converting everything to bedGraph and then finding the intersections in 5 seconds.
bedtools uses only 1 CPU core, although the computer has plenty of free cores.
System: Ubuntu 22.04 x64, Samsung 980 Pro 1TB SSD.
I don't need a 'true' intersection. I need the complete list of unmodified reads/alignments that intersect between the replicate files, so that I can obtain a BAM file of the intersecting reads and process it further with featureCounts. Are there alternatives to bedtools for this task?
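
To make the goal concrete, here is a minimal Python sketch of the operation I mean: a single pass over two coordinate-sorted interval lists that keeps every record from A that overlaps at least one record in B, unmodified. The function name `reads_with_overlap` and the `(chrom, start, end, name)` tuple layout are illustrative assumptions, not bedtools internals, and the sketch ignores read pairing and assumes both inputs are sorted by chromosome name (string order) and then by start, with half-open coordinates.

```python
def reads_with_overlap(a_reads, b_reads):
    """Yield each A record that overlaps at least one B record.

    Assumes both inputs are sorted by (chrom, start), with chromosomes
    in plain string order, and that intervals are half-open [start, end).
    Records are (chrom, start, end, name) tuples (hypothetical layout).
    """
    b_iter = iter(b_reads)
    pending = next(b_iter, None)  # next B record not yet pulled in
    window = []                   # B records that may still overlap an A record

    for a in a_reads:
        chrom, start, end = a[0], a[1], a[2]

        # Pull in every B record that starts before the end of this A record.
        while pending is not None and (pending[0], pending[1]) < (chrom, end):
            window.append(pending)
            pending = next(b_iter, None)

        # Drop B records that can no longer overlap this or any later A record.
        window = [b for b in window if b[0] == chrom and b[2] > start]

        # Report the A record unchanged if any remaining B record overlaps it.
        if any(b[1] < end for b in window):
            yield a


# Toy data: two "replicates" with a handful of alignments each.
rep1 = [("chr1", 0, 100, "a1"), ("chr1", 10, 20, "a2"),
        ("chr1", 200, 300, "a3"), ("chr2", 5, 50, "a4")]
rep2 = [("chr1", 50, 60, "b1"), ("chr1", 299, 400, "b2"),
        ("chr2", 50, 80, "b3")]

kept = [r[3] for r in reads_with_overlap(rep1, rep2)]
print(kept)
```

This only pins down the semantics I need (each rep1 alignment reported once, as-is, if it overlaps anything in rep2); it says nothing about how bedtools implements `-sorted` or about how fast a real tool should be.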
