Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The bedtools multiinter -header option outputs broken BED #1085

Open
dbolser opened this issue Apr 5, 2024 · 2 comments
Open

The bedtools multiinter -header option outputs broken BED #1085

dbolser opened this issue Apr 5, 2024 · 2 comments

Comments

@dbolser
Copy link

dbolser commented Apr 5, 2024

Hi, Thanks for creating bedtools @arq5x !

When I use:
bedtools multiinter -header -i a.bed b.bed

the resulting output looks like this:

chrom   start   end     num     list    a.bed        b.bed
chr1    16366   16374   1       1       1       0
chr1    18235   18249   1       2       0       1
chr1    18461   18475   1       2       0       1
chr1    25597   25605   1       1       1       0
chr1    34568   34576   1       1       1       0
chr1    36631   36645   1       2       0       1
chr1    42427   42435   1       1       1       0
chr1    44984   44992   1       1       1       0
...

Note that this is not bed format, which should be:

#chrom   start   end     num     list    a.bed        b.bed
chr1    16366   16374   1       1       1       0
chr1    18235   18249   1       2       0       1
chr1    18461   18475   1       2       0       1
chr1    25597   25605   1       1       1       0
chr1    34568   34576   1       1       1       0
chr1    36631   36645   1       2       0       1
chr1    42427   42435   1       1       1       0
chr1    44984   44992   1       1       1       0
...

At least I think that's what it should be ... I'm actually not sure now I check the BED specification...

HOWEVER, the lack of the # DOES break your own tools, e.g. multiinter itself expects the BED file header to start with #, and so does genomecov (and so does LiftOver from UCSC...)

So perhaps this bug is better described as "multiinter and genomecov fall over when given a valid bed header...", with a separate bug for LiftOver?

THx

@dbolser
Copy link
Author

dbolser commented Apr 5, 2024

I don't want to do this:

time bedtools genomecov \
    -i <(echo -n '#' && bgzcat Data3/PWM/multiinter.bed.gz)
    -g ../../ENCFF792NJK.tsv \
    > genomecov-multiinter.tsv # hashtag sad face

@dbolser
Copy link
Author

dbolser commented Apr 5, 2024

Also tabix doesn't like the header without a preceding #, e.g. although tabix -S 1 -p bed multiinter.bed.gz works, tabix -H multiinter.bed.gz doesn't (it does when you have the #) and as an extension, Python's pysam.TabixFile("multiinter.bed.gz").header also 'fails'.

So although nothing in BED formally requires a # for the header, it seems a lot of tooling has adopted that convention.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant