Problem with showing IGV pileup for CRAM served from Nebula #1127

Open
tonydisera opened this issue Dec 5, 2024 · 28 comments

@tonydisera
Collaborator

Asking Ralph R. for a URL to a CRAM to troubleshoot this problem...

@tonydisera tonydisera self-assigned this Dec 5, 2024
@tonydisera tonydisera added this to the 4.11.4 milestone Dec 5, 2024
@zorgster

zorgster commented Dec 9, 2024

I have a Nebula CRAM file... and it hangs when I press the Read Pileup link (I've not logged in for a while, so this is new to me)..

I have subsetted my CRAM file to gene STAT3.. and subsetted my VCF to the same region... this much smaller file also hangs when I press 'Read Pileup'.
I have converted the file to BAM and it still hangs.

[In order to read the CRAM file I had to download GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz.. use samtools seq_cache_populate.pl script to create a cache.. and then set environment variables that samtools uses to use the reference cache.. No other GRCh38 worked... but that wouldn't affect the BAM file... ]

Both the BAM and the CRAM load into https://igv.org/ and appear the same.

@tonydisera tonydisera pinned this issue Dec 9, 2024
@tonydisera
Collaborator Author

tonydisera commented Dec 9, 2024

Thank you @zorgster for verifying that another Nebula user has this same problem. It would be a great help if you could send me your subsetted BAM for gene STAT3 for troubleshooting. Please email me directly to coordinate.

Best regards,
Tony

@zorgster

The dependency for igv in vue-pileup is "igv": "^2.2.13". In the modules folder, I see IGV 2.4.0. Could this have been a bug in 2.4.0?

In the history for IGV - a few changes spotted - https://github.com/igvteam/igv.js/releases:
3.0.5 - a bug fix in 3.0.5 in which some actions could cause the browser to freeze when loading large regions..
2.13.8 - BAM index optimisations
2.13.3 - The URIs in CRAM files did not work.. igvteam/igv.js#1548
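
For context on the version mismatch above: an npm caret range such as `^2.2.13` accepts any release from 2.2.13 up to, but not including, 3.0.0, so finding 2.4.0 in the modules folder is consistent with the declared dependency. A minimal sketch of that rule, assuming plain MAJOR.MINOR.PATCH strings with a major version of at least 1 (the helper names are made up for illustration and this is not npm's own implementation):

```javascript
// Minimal caret-range check, assuming plain "MAJOR.MINOR.PATCH" versions
// with MAJOR >= 1 (no pre-release tags, no 0.x special cases).
function parseVersion(text) {
  const parts = text.split(".").map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error("unsupported version string: " + text);
  }
  return parts;
}

// Returns -1, 0 or 1 depending on how version a orders against version b.
function compareVersions(a, b) {
  for (let k = 0; k < 3; k++) {
    if (a[k] !== b[k]) return a[k] < b[k] ? -1 : 1;
  }
  return 0;
}

// True when `version` satisfies `^base`, i.e. base <= version < (major + 1).0.0.
function satisfiesCaret(version, range) {
  if (!range.startsWith("^")) throw new Error("not a caret range: " + range);
  const base = parseVersion(range.slice(1));
  if (base[0] < 1) throw new Error("0.x caret ranges are not handled in this sketch");
  const v = parseVersion(version);
  return v[0] === base[0] && compareVersions(v, base) >= 0;
}

for (const candidate of ["2.2.12", "2.4.0", "2.13.8", "3.0.5"]) {
  console.log(candidate, satisfiesCaret(candidate, "^2.2.13"));
}
```

Under this rule 2.4.0 and 2.13.8 both fall inside `^2.2.13`, while the 3.0.5 fix does not, so picking up that release would need an explicit change to the dependency range.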

@tonydisera
Collaborator Author

Wow, Oliver @zorgster, that is impressive troubleshooting! I bet you nailed it. I am away from my desk, but will give this a try when I am back at my computer. The IGV pileup is working for some bams/crams web served, but it could be failing under specific conditions triggered by the newer version of igv.js. Many thanks!

@zorgster

zorgster commented Dec 11, 2024

I have loaded the [email protected] project, and edited their example "cram-vcf.html" file to load hg38 + the STAT3.vcf and .cram ... everything loaded ok.

I've cloned Vue-pileup... edited the file to load up the cram file... Could this be relevant for the CRAM files?

Error: CRAM version 67 not supported
    at new e (igv.esm.js:18358:1)
    at t.eval (igv.esm.js:18385:1)
    at w (igv.esm.js:18385:1)
    at Generator.eval [as _invoke] (igv.esm.js:18385:1)
    at t.<computed> [as next] (igv.esm.js:18385:1)
    at n (igv.esm.js:18378:1)
    at eval (igv.esm.js:18378:1)

and for the BAM file:

Error: http://localhost:8080/STAT3.bam.bai is not a bai file
    at eval (igv.esm.js:19487:1)
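
A side note on the number in that CRAM error: per the CRAM specification a file starts with the four magic bytes "CRAM", followed by one byte each for the major and minor version, and 67 happens to be the ASCII code for "C". A guess, not something confirmed in this thread, is that the parser took its version byte from the wrong place or was not looking at the start of a CRAM file. A minimal sketch of reading that header (the function name `readCramFileDefinition` is made up for illustration and is not igv.js code):

```javascript
// Read the start of a CRAM file definition: 4 magic bytes, then major and minor version.
function readCramFileDefinition(bytes) {
  if (bytes.length < 6) throw new Error("need at least 6 bytes");
  const magic = String.fromCharCode(bytes[0], bytes[1], bytes[2], bytes[3]);
  return { magic, isCram: magic === "CRAM", major: bytes[4], minor: bytes[5] };
}

// Hand-built header bytes for a CRAM v3.0 file: "CRAM", 3, 0.
const cramHeader = Uint8Array.from([0x43, 0x52, 0x41, 0x4d, 3, 0]);
const cramDefinition = readCramFileDefinition(cramHeader);
console.log(cramDefinition.magic, cramDefinition.major + "." + cramDefinition.minor);

// The first magic byte, read as a number, is the 67 seen in the error message.
console.log("C".charCodeAt(0));
```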

@zorgster

Vue-pileup seems to be failing to detect the BAI as an index file. Is it reading the bytes in little endian? Are there two problems? one for CRAM (version 67?) and the other for the BAM index .. BAI?

Using this code (in bash) - it confirms the 'magic number' is 21578050 for BAI which matches the number in the IGV.esm.js file in the loadIndex function. Although it must be fine because I can use it in other IGV products.

magic=$(xxd -l 4 -p STAT3.bam.bai | awk '{print "0x" substr($0,7,2) substr($0,5,2) substr($0,3,2) substr($0,1,2)}')
magic_int=$((magic))
echo $magic_int

21578050
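
The same check can be sketched in JavaScript, which is closer to what the browser code has to do. The BAI magic is the four bytes 42 41 49 01 ("BAI" followed by 1), and read as a little-endian 32-bit integer they give 21578050. The helper names below are illustrative only, not taken from igv.js:

```javascript
// Bytes 42 41 49 01 ("BAI\1") read as a little-endian int32.
const BAI_MAGIC = 21578050;

// Read the first four bytes of a Uint8Array as a little-endian 32-bit integer.
function readInt32LE(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return view.getInt32(0, true);
}

// True when the buffer starts with the BAI magic number.
function looksLikeBai(bytes) {
  return bytes.length >= 4 && readInt32LE(bytes) === BAI_MAGIC;
}

const baiStart = Uint8Array.from([0x42, 0x41, 0x49, 0x01]);
console.log(readInt32LE(baiStart), looksLikeBai(baiStart));

// A gzip header (1f 8b ...) is an example of bytes that would fail the check.
const gzipStart = Uint8Array.from([0x1f, 0x8b, 0x08, 0x04]);
console.log(looksLikeBai(gzipStart));
```

If the index on disk passes this check but the browser still reports "is not a bai file", it may be worth looking at what bytes the server actually returned for that URL, for example an error page or a compressed response.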

@tonydisera
Collaborator Author

Many thanks, @zorgster, for your excellent troubleshooting! I'm going to bring @anderspitman into the conversation. He wrote the Vue.js component that wraps igv.js.

@anderspitman, we have a recent problem with viewing IGV pileup for certain bams and crams. This was noticed by some of our Nebula users. I may need your help figuring out why the pileup component is failing for these bams/crams. The good news is that we have this incredible user @zorgster, who has kindly spent time narrowing in on the problem. He has provided a subsetted bam that we can use to troubleshoot as well. Can you think of any changes to the pileup component that would have resulted in a 'CRAM version 67 not supported' error message or '..... is not a bai file'?

@anderspitman
Member

It looks like vue-pileup hasn't changed in 5 years, so if this is a new problem I suspect it has to do with the data files. Maybe the file formats have evolved and we're now out of date?

It's interesting that it fails even after you update igv.js though. One thing to note is that when using the pileup dialog it's going to use the igv.js embedded in vue-pileup, but when opening in a separate tab it's going to navigate you to igv.iobio.io, which has its own (old) version. I honestly can't remember why I did it this way but it seems like a terrible design now.

@tonydisera
Collaborator Author

Interesting. Well, I won't troubleshoot that way!

It is strange that read pileup works on the demo data in gene.iobio but seems to freeze with Oliver's provided file.

@zorgster

I spoke too soon - I have restarted from a fresh clone and now I can load these into Vue-pileup - I may have altered the IGV version unwittingly.

image

@tonydisera
Collaborator Author

I wonder if this is a build blunder on my part. @zorgster noticed this:

> The dependency for igv in vue-pileup is "igv": "^2.2.13". In the modules folder, I see IGV 2.4.0. Could this have been a bug in 2.4.0?

@tonydisera
Collaborator Author

Thanks @zorgster, that is good info! I'm communicating with @anderspitman on Slack, so I will pass this along.

@tonydisera
Collaborator Author

Cool @zorgster. If you can load the vue-pileup standalone, then the problem must be in the gene.iobio build. I will try to isolate the problem from that angle.

@zorgster

zorgster commented Dec 11, 2024

I used the following for the Proband track... (in App.vue of Vue-pileup)

        {
          name: 'Proband',
          alignmentURL: "http://localhost:8080/STAT3.cram",
          alignmentIndexURL: "http://localhost:8080/STAT3.cram.crai",
          variantURL: "http://localhost:8080/STAT3.vcf.gz",
        },

And for Pileup:

    <pileup
      heading="Read pileup BALLY SNP 17:42322301"
      referenceURL="https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa"
      indexURL="https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa.fai"
      locus="chr17:42322290-42322330"
      :tracks='tracks'
    />
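
For comparison with the standalone igv.js test, the props above could be translated into an igv.js browser configuration roughly as follows. This is a sketch under assumptions: the function `toIgvOptions` is hypothetical and is not how vue-pileup builds its config; only the option names (`reference.fastaURL`, `reference.indexURL`, `locus`, and per-track `type`, `format`, `name`, `url`, `indexURL`) come from igv.js.

```javascript
// Hypothetical mapping from vue-pileup style props to an igv.js options object.
function toIgvOptions({ referenceURL, indexURL, locus, tracks }) {
  const igvTracks = [];
  for (const t of tracks) {
    if (t.alignmentURL) {
      igvTracks.push({
        type: "alignment",
        // Guess the format from the file extension; anything not .cram is treated as BAM.
        format: t.alignmentURL.endsWith(".cram") ? "cram" : "bam",
        name: t.name,
        url: t.alignmentURL,
        indexURL: t.alignmentIndexURL,
      });
    }
    if (t.variantURL) {
      const variantTrack = {
        type: "variant",
        format: "vcf",
        name: t.name + " variants",
        url: t.variantURL,
      };
      // Only set the index when one was supplied.
      if (t.variantIndexURL) variantTrack.indexURL = t.variantIndexURL;
      igvTracks.push(variantTrack);
    }
  }
  return { reference: { fastaURL: referenceURL, indexURL }, locus, tracks: igvTracks };
}

const exampleOptions = toIgvOptions({
  referenceURL: "https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa",
  indexURL: "https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa.fai",
  locus: "chr17:42322290-42322330",
  tracks: [
    {
      name: "Proband",
      alignmentURL: "http://localhost:8080/STAT3.cram",
      alignmentIndexURL: "http://localhost:8080/STAT3.cram.crai",
      variantURL: "http://localhost:8080/STAT3.vcf.gz",
    },
  ],
});
console.log(JSON.stringify(exampleOptions, null, 2));
```

In a page that has igv.js loaded, an object like this is what would be handed to `igv.createBrowser(div, options)`.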

@tonydisera
Collaborator Author

> I spoke too soon - I have restarted from a fresh clone and now I can load these into Vue-pileup - I may have altered the IGV version unwittingly.

Hi @zorgster, which version of igv.js does your local build of vue-pileup use? I'm seeing the same 'Magic number' problem you are w/ my local vue-pileup build. Also, I am curious what version of node you used to build vue-pileup. I had to use v16, but this could be a local environment issue.

@zorgster

zorgster commented Dec 11, 2024

It wasn't clear which version of node I needed for Vue.pileup... I ran nvm use 13 before every npm command from within VSCode because it kept slipping back to 23.. I ran it with 13. Other versions of IGV I loaded were 2.6.0, 2.6.6 and 3.1.1...

Along the way I also spotted that there is a version change from CRAM v3.0 to v3.1... (around 2022).
https://academic.oup.com/bioinformatics/article/38/6/1497/6499262 ...

But I'm back thinking the bam and cram loading in gene.iobio may have a similar cause.. In gene.iobio, they are uploaded to the FL Proxy? and then used from that location? Mine worked when they were on http on the same location... if that helps...?

@anderspitman
Member

anderspitman commented Dec 11, 2024

@tonydisera noticed that even though the latest version of vue-pileup on npm is 0.11.0, the repo is only at 0.10.0. Apparently I never pushed the latest changes. That code may be lost now. I did a diff on the two versions:

Only in 0.10: lib
diff -r 0.10/package.json 0.11/package.json
3c3
<   "version": "0.10.0",
---
>   "version": "0.11.0",
11c11
<     "igv": "^2.2.13",
---
>     "igv": "^2.4.1",
diff -r 0.10/src/App.vue 0.11/src/App.vue
27a28
>           variantIndexURL: "https://s3.amazonaws.com/iobio/samples/vcf/platinum-exome.vcf.gz.tbi",
33a35
>           variantIndexURL: "https://s3.amazonaws.com/iobio/samples/vcf/platinum-exome.vcf.gz.tbi",
diff -r 0.10/src/components/Pileup.vue 0.11/src/components/Pileup.vue
31c31
< import igv from '../../lib/igv.esm'
---
> import igv from 'igv'
162a163
>       indexURL: tracks[0].variantIndexURL,
166c167
<   const url = 'https://s3.amazonaws.com/static.iobio.io/dev/igv.iobio.io/index.html?config=' + JSON.stringify(igvConfig)
---
>   const url = 'https://igv.iobio.io?config=' + JSON.stringify(igvConfig)

@anderspitman
Member

Are we feeling confident that the latest version of igv.js is working? If so, and to avoid build issues in the future, maybe we should just make a new vanilla JS web component and wrap the latest version. The wrapper is very minimal anyway and Vue plays nicely with web components.

@zorgster

zorgster commented Dec 11, 2024

I cloned the current igv.js repo, edited the cram-vcf.html and it loaded it with the VCF, BAM and CRAM all at once... (via npx http-server ) (on node 13.14.0) ...with the BAM, CRAM and VCF files in the same folder as the examples/cram-vcf.html file.

@tonydisera
Collaborator Author

> Are we feeling confident that the latest version of igv.js is working? If so, and to avoid build issues in the future, maybe we should just make a new vanilla JS web component and wrap the latest version. The wrapper is very minimal anyway and Vue plays nicely with web components.

Yes! This is a great suggestion, @anderspitman . We may need the vue-pileup component for tighter integration in other apps, like our new rnasplice.iobio.iobio, but for our use case in gene.iobio, this would be the way to go.

@tonydisera
Collaborator Author

> I cloned the current igv.js repo, edited the cram-vcf.html and it loaded it with the VCF, BAM and CRAM all at once... (via npx http-server ) (on node 13.14.0) ...with the BAM, CRAM and VCF files in the same folder as the examples/cram-vcf.html file.

Thank you @zorgster. I build with node v13.14.0 for gene.iobio. I can reproduce the bug in vue-pileup v 0.10 with the STAT3.bam file you provided. It works with all other files I have tried. I verified that the proxy is working on other files, so we can eliminate that variable.

@zorgster

zorgster commented Dec 12, 2024

Can I just check... The full CRAM file from Nebula is 60 GB for 30X... does this get uploaded to the proxy? I suspect that this wouldn't be possible in the brief time it takes to load the website. How does this work... if it must then interrogate the CRAM file for reads aligning to the region displayed in Vue-pileup? Is it perhaps trying to load the whole file? (Which should work quickly for the smaller file..)
EDIT: Just brushed up on Omnistreams (used by gene.iobio), plus HTTP Range Requests and CORS in the IGV docs.

@anderspitman there's a note in the docs saying that a server might require HTTP Range headers to be set for the requests to work from IGV.js.. one thing I do think is that the freeze is because something is trying to pull in a large amount of data? It might be noticeable on a network traffic monitor for the server at the time the Pileup is requested?

Edit.. realised that would affect all files...

@anderspitman
Member

Yeah bioinformatics tools usually don't work at all with servers that don't support range requests. The files tend to just be too big.
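
One way to probe whether a given server honours range requests is to ask for the first few bytes and look at the status code: 206 means the range was served, 200 means the server ignored the `Range` header and is sending the whole file. A small sketch, with made-up helper names, assuming a runtime that provides `fetch` (and, when run from a browser against another origin, a server that also sends the right CORS headers):

```javascript
// Classify the status code returned for a request sent with a Range header.
function interpretRangeStatus(status) {
  if (status === 206) return "partial"; // server honoured the range
  if (status === 200) return "full";    // server ignored the Range header
  return "error";                       // e.g. 416 or an access problem
}

// Ask for the first 100 bytes of a file and report how the server responded.
// fetchImpl can be swapped out, for instance with a stub when testing.
async function probeRangeSupport(url, fetchImpl = fetch) {
  const response = await fetchImpl(url, { headers: { Range: "bytes=0-99" } });
  return interpretRangeStatus(response.status);
}

console.log(interpretRangeStatus(206), interpretRangeStatus(200), interpretRangeStatus(416));
```

Calling `probeRangeSupport` on the URL of a 60 GB CRAM should come back as "partial" almost immediately if the server is suitable for igv.js.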

@zorgster

zorgster commented Dec 12, 2024

Are there any restrictions on row length when reading in the BAM files? My Nebula reads are 150 nt, plus the 150-character quality string, plus about 6 additional fields beyond the 11 required. And gatk ValidateSamFile complains about the read groups not starting with a number (although that happens a lot anyway)...

@AlistairNWard
Member

No, there shouldn't be. 150bp reads are pretty standard, and reads can get a lot longer than that.

@zorgster

zorgster commented Dec 14, 2024

Hi...
I have been adding console.log points into the code to isolate where it stops.

I added one before and after extractQuery in IGV.js, and the log after extractQuery (called 'b4 create browser') was not hit. So I set a few inside.

This image is where my browser appears frozen, but the count of hits on "eq tokens.length " is still increasing by 1000 every 7 seconds...

image

This is using the example (WES) data in gene.iobio:

image

@zorgster

zorgster commented Dec 14, 2024

The browser freezing issue is - if Sex is not selected by the user in gene.iobio - in the href, sex0 has no '=' and the code is trying to split the variables by '='...

because tokens.length == 1, i is never incremented in the for loop... (fixed in [email protected], with an else statement...)

From extractQuery in igv.js

    for (i = i1 + 1; i < i2;) {
      j = uri.indexOf("&", i);
      if (j < 0) j = i2;
      s = uri.substring(i, j);
      tokens = s.split("=", 2);

      // Debug logging added while tracing the hang
      console.log(" i1 " + i1 + " - i2 " + i2 + " - i " + i + " - j " + j);
      console.log("eq s = " + s);
      if (tokens.length === 2) {
        key = tokens[0];
        value = decodeURIComponent(tokens[1]);

        if ('file' === key) {
          // IGV desktop style file parameter
          files = value.split(',');
        } else if ('index' === key) {
          // IGV desktop style index parameter
          indexURLs = value.split(',');
        } else {
          config[key] = value;
        }

        // i only advances inside this branch, so a token without '=' (e.g. "sex0") never moves on
        i = j + 1;
      }
    }

http://localhost:4026/?gene=STAT3&genes=STAT3&species=Human&build=GRCh38&affectedSibs=&unaffectedSibs=&rel0=proband&sex0&vcf0=https%3A%2F%2Flf-proxy.iobio.io%2Fpyvh-vd5u%2FBALLYBROOM.vcf.gz&tbi0=https%3A%2F%2Flf-proxy.iobio.io%2Fpyvh-vd5u%2FBALLYBROOM.vcf.gz.tbi&bam0=https%3A%2F%2Flf-proxy.iobio.io%2Fb04f-mv5c%2FBALLYBROOM.cram&bai0=https%3A%2F%2Flf-proxy.iobio.io%2Fb04f-mv5c%2FBALLYBROOM.cram.crai&sample0=BALLYBROOM&affectedStatus0=affected
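
To make the failure mode concrete, here is a minimal sketch of the same loop with the index advanced on every pass, so a bare key such as `sex0` is skipped instead of stalling the loop. This is an illustration only: the name `extractQuerySketch` is made up, the `file` and `index` special cases are left out, and it is not the actual change made in igv.js.

```javascript
// Parse "key=value" pairs between "?" and "#" in a URL.
function extractQuerySketch(uri) {
  const config = {};
  const i1 = uri.indexOf("?");
  if (i1 < 0) return config;
  let i2 = uri.lastIndexOf("#");
  if (i2 < 0) i2 = uri.length;

  for (let i = i1 + 1; i < i2; ) {
    let j = uri.indexOf("&", i);
    if (j < 0 || j > i2) j = i2;
    const tokens = uri.substring(i, j).split("=", 2);
    if (tokens.length === 2) {
      config[tokens[0]] = decodeURIComponent(tokens[1]);
    }
    // Advance unconditionally so a token without "=" cannot stall the loop.
    i = j + 1;
  }
  return config;
}

const parsedQuery = extractQuerySketch(
  "http://localhost:4026/?gene=STAT3&rel0=proband&sex0" +
    "&vcf0=https%3A%2F%2Flf-proxy.iobio.io%2Fpyvh-vd5u%2FBALLYBROOM.vcf.gz&sample0=BALLYBROOM"
);
console.log(parsedQuery);
```

With the original loop the same URL never gets past `sex0`, which matches the steadily growing log count seen while the browser appeared frozen.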

Selecting a gender loads the modal IGV... but now shows the track error.

Error: Error accessing resource: 
https://lf-proxy.iobio.io/i3bd-3cg1/BALLYBROOM.cram?someRandomSeed=0.en5e8ufrzsp 
Status: 0
    at handleError (igv.js:25641:1)
    at xhr.onerror (igv.js:25622:1)
    ...
    asyncGeneratorStep  @ igv.js:2419

At least that solves the hanging browser problem!

@zorgster

zorgster commented Dec 14, 2024

Also I now see:

Access to XMLHttpRequest at 'https://lf-proxy.iobio.io/3p2m-60em/STAT3.cram?someRandomSeed=0.31t1siraqkk' 
from origin 'http://localhost:4026' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is 
present on the requested resource.

igv.js:25634 
GET https://lf-proxy.iobio.io/3p2m-60em/STAT3.cram?someRandomSeed=0.31t1siraqkk 
net::ERR_FAILED 206 (Partial Content)

Now I have worked around the browser hanging - by selecting 'Male' - I also see these errors on the main site:

origin 'https://gene.iobio.io' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is 
present on the requested resource.

The sample data loads ok - it is being loaded from AWS S3.
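
For reference, the check the browser is applying here can be sketched as below. The 206 status in the log suggests the proxy did serve the byte range, but without an `Access-Control-Allow-Origin` header on the response the browser withholds it from the page. The function name is made up for illustration, and the sketch ignores credentialed requests, where `*` is not accepted:

```javascript
// Simplified version of the browser's origin check for a cross-origin response.
function corsAllowsOrigin(pageOrigin, allowOriginHeader) {
  if (!allowOriginHeader) return false; // header missing: the response is blocked
  const value = allowOriginHeader.trim();
  return value === "*" || value === pageOrigin;
}

console.log(corsAllowsOrigin("https://gene.iobio.io", null));
console.log(corsAllowsOrigin("https://gene.iobio.io", "*"));
console.log(corsAllowsOrigin("http://localhost:4026", "https://gene.iobio.io"));
```

That the sample data on S3 loads fine would be consistent with S3 sending this header while the proxy does not, though that is an inference from the messages above rather than something verified here.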
