
Request: iteration function without allocation #28

Open · romgrk opened this issue Sep 11, 2019 · 5 comments

romgrk commented Sep 11, 2019

Hey,

So, for performance reasons, would it be possible to implement an API that doesn't allocate an object for each entry? It could look something like this:

const { BigWig } = require('@gmod/bbi')

async function fillBuffer() {
  const ti = new BigWig({ path: 'volvox.bw' })
  const header = await ti.getHeader()
  const length = header.refsByNumber[0].length
  // one 4-byte float per base, so the buffer needs length * 4 bytes
  const buffer = Buffer.alloc(length * 4)
  // proposed API: a callback per entry, with no feature object allocated
  await ti.iterate('chr1', 0, length, { scale: 1 }, (start, end, score) => {
    for (let position = start; position < end; position++)
      buffer.writeFloatLE(score, position * 4)
  })
  return buffer
}
cmdcolin (Collaborator) commented

I have heard of this technique referred to as "bring your own buffer"... It may be possible to do this. Do you have significant evidence of the performance degradation?

I can see you are using scale: 1, so that would probably be intensive across the whole length of a chromosome. You could consider using one of the other reductionLevels so the query involves less data, which would probably be faster (a sketch of this is below), but if you require the lowest scale then I can see how that would be resource intensive.
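
For reference, here is a minimal sketch of fetching a coarser reduction level through the existing getFeatures API. The exact behavior of the scale option as used here (values below 1 selecting coarser zoom levels) is an assumption; basesPerSpan may be the clearer option.

const { BigWig } = require('@gmod/bbi')

async function coarseCoverage() {
  const bw = new BigWig({ path: 'volvox.bw' })
  // a scale below 1 should let the library pick a coarser reduction
  // level, so each feature summarizes many bases and far fewer
  // objects are allocated than at scale: 1
  const features = await bw.getFeatures('chr1', 0, 50000, { scale: 0.01 })
  return features // [{ start, end, score }, ...]
}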

romgrk (Author) commented Sep 11, 2019

Yes, we do require scale: 1. We're converting bigWig files into loompy files for an implementation of the ga4gh-rnaseq API, so we need to fill a buffer with every value. This is for multiple tracks at once, so we're filling lots of buffers with lots of entries. The API also allows returning multiple tracks of the whole bigWig file, so I'm pretty sure any saved allocation would decrease the memory cost of the process. (Speed is not an issue, though; I meant performance in terms of memory.)
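
For context, a rough sketch of the allocation-heavy path we have today with the existing getFeatures API (the names here are placeholders): every entry comes back as a { start, end, score } object that we only touch to copy its score into the buffer.

const { BigWig } = require('@gmod/bbi')

async function fillBufferToday(path, refName, length) {
  const bw = new BigWig({ path })
  const buffer = Buffer.alloc(length * 4)
  // getFeatures allocates one object per entry...
  const features = await bw.getFeatures(refName, 0, length, { scale: 1 })
  // ...which is discarded as soon as its score is copied out
  for (const { start, end, score } of features) {
    for (let position = start; position < end; position++)
      buffer.writeFloatLE(score, position * 4)
  }
  return buffer
}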

cmdcolin (Collaborator) commented Sep 11, 2019

I'll just ask a couple more questions:

  1. Is there anything particular about this library (bbi-js) that makes it especially well suited to your app? Do you do these conversions on the fly? For a large data ingestion, I imagine I would just convert to bedgraph or regular wig (a sketch of that follows this list) and stream it into your data warehouse/hdf5/loompy.
  2. Do you have any interest in implementing this yourself? I had a similar request in generic-filehandle#20 (Possible unnecessary data copying from remotefile/blobfile) and I'd love to see progress on it, but until it becomes a bottleneck for my own use cases (primarily genome browser apps) it's hard for me to push it up the priority queue.
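
To illustrate point 1, a minimal sketch of that kind of conversion done in JS (the function is hypothetical; kentUtils' bigWigToBedGraph does the same job from the command line):

const { BigWig } = require('@gmod/bbi')

// bedGraph lines are: chrom <tab> start <tab> end <tab> value
async function toBedGraph(path, refName, start, end) {
  const bw = new BigWig({ path })
  const features = await bw.getFeatures(refName, start, end, { scale: 1 })
  return features
    .map(f => `${refName}\t${f.start}\t${f.end}\t${f.score}`)
    .join('\n')
}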

cmdcolin (Collaborator) commented

The purpose is compatibility with https://github.com/romgrk/node-loompy, so it keeps everything in the JS ecosystem?

romgrk (Author) commented Sep 12, 2019

Yes, we're using that module that we wrote to keep it all in JS.

For your points:

  1. We have tons of bigWig tracks (not sure how many, but >10,000, maybe >100,000). The files are provided to us in that format and we need them that way for other purposes, so it's not practical to convert them and keep both formats; we would run out of space.
  2. Sure, I'll try to find some time and open a PR.
