
Request: iteration function without allocation #28

Open · romgrk opened this issue Sep 11, 2019 · 5 comments

romgrk commented Sep 11, 2019

Hey,

So, for performance reasons, would it be possible to implement an API that doesn't allocate an object for each entry? It could look something like this:

const { BigWig } = require('@gmod/bbi')

async function fillBuffer() {
  const ti = new BigWig({ path: 'volvox.bw' })
  const header = await ti.getHeader()
  const length = header.refsByNumber[0].length
  // one 4-byte float per base, so the buffer needs length * 4 bytes
  const buffer = Buffer.alloc(length * 4)
  // proposed API: a callback per entry, with no feature object allocated
  await ti.iterate('chr1', 0, length, { scale: 1 }, (start, end, score) => {
    for (let position = start; position < end; position++)
      buffer.writeFloatLE(score, position * 4)
  })
  return buffer
}
cmdcolin (Collaborator) commented

I have heard of this technique referred to as "bring your own buffer"... It may be possible to do this. Do you have significant evidence of the performance degradation?

I can see you are using scale: 1, so that would probably be intensive across the whole length of a chromosome. You could consider using one of the other reductionLevels so the query involves less data, which would probably be faster (a sketch of this is below), but if you require the lowest scale then I can see how that would be resource intensive.
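
For reference, here is a minimal sketch of fetching a coarser reduction level through the existing getFeatures API. The exact behavior of the scale option as used here (values below 1 selecting coarser zoom levels) is an assumption; basesPerSpan may be the clearer option.

const { BigWig } = require('@gmod/bbi')

async function coarseCoverage() {
  const bw = new BigWig({ path: 'volvox.bw' })
  // a scale below 1 should let the library pick a coarser reduction
  // level, so each feature summarizes many bases and far fewer
  // objects are allocated than at scale: 1
  const features = await bw.getFeatures('chr1', 0, 50000, { scale: 0.01 })
  return features // [{ start, end, score }, ...]
}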

romgrk (Author) commented Sep 11, 2019

Yes, we do require scale: 1. We're converting bigWig files into loompy files for an implementation of the ga4gh-rnaseq API, so we need to fill a buffer with every value. This is for multiple tracks at once, so we're filling lots of buffers with lots of entries. The API also allows returning multiple tracks of the whole bigWig file, so I'm pretty sure any saved allocation would decrease the memory cost of the process. (Speed is not an issue, though; I meant performance in terms of memory.)
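
For context, a rough sketch of the allocation-heavy path we have today with the existing getFeatures API (the names here are placeholders): every entry comes back as a { start, end, score } object that we only touch to copy its score into the buffer.

const { BigWig } = require('@gmod/bbi')

async function fillBufferToday(path, refName, length) {
  const bw = new BigWig({ path })
  const buffer = Buffer.alloc(length * 4)
  // getFeatures allocates one object per entry...
  const features = await bw.getFeatures(refName, 0, length, { scale: 1 })
  // ...which is discarded as soon as its score is copied out
  for (const { start, end, score } of features) {
    for (let position = start; position < end; position++)
      buffer.writeFloatLE(score, position * 4)
  }
  return buffer
}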

cmdcolin (Collaborator) commented Sep 11, 2019

I'll just ask a couple more questions:

  1. Is there anything particular about this library (bbi-js) that makes it especially well suited to your app? Do you do these conversions on the fly? For a large data ingestion, I imagine I would just convert to bedgraph or regular wig (a sketch of that follows this list) and stream it into your data warehouse/hdf5/loompy.
  2. Do you have any interest in implementing this yourself? I had a similar request in generic-filehandle#20 (Possible unnecessary data copying from remotefile/blobfile) and I'd love to see progress on it, but until it becomes a bottleneck for my own use cases (primarily genome browser apps) it's hard for me to push it up the priority queue.
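
To illustrate point 1, a minimal sketch of that kind of conversion done in JS (the function is hypothetical; kentUtils' bigWigToBedGraph does the same job from the command line):

const { BigWig } = require('@gmod/bbi')

// bedGraph lines are: chrom <tab> start <tab> end <tab> value
async function toBedGraph(path, refName, start, end) {
  const bw = new BigWig({ path })
  const features = await bw.getFeatures(refName, start, end, { scale: 1 })
  return features
    .map(f => `${refName}\t${f.start}\t${f.end}\t${f.score}`)
    .join('\n')
}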

cmdcolin (Collaborator) commented

The purpose is compatibility with https://github.com/romgrk/node-loompy, so it keeps everything in the JS ecosystem?

romgrk (Author) commented Sep 12, 2019

Yes, we're using that module that we wrote to keep it all in JS.

For your points:

  1. We have tons of bigWig tracks (not sure how many, but >10,000, maybe >100,000). The files are provided to us in that format and we need them that way for other purposes, so it's not practical to convert them and keep both formats; we would run out of space.
  2. Sure, I'll try to find some time and open a PR.
