Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sorting an ArrayOfBufferBackedObjects doesn't actually work? #18

Open
ibsusu opened this issue Jul 31, 2021 · 7 comments
Open

Sorting an ArrayOfBufferBackedObjects doesn't actually work? #18

ibsusu opened this issue Jul 31, 2021 · 7 comments

Comments

@ibsusu
Copy link

ibsusu commented Jul 31, 2021

Should I create my own sort function in order to make this work? The built in ArrayOfBufferBackedObjects sort doesn't actually sort the buffer.

// worker.js
importScripts(
  "comlink.js",
  "buffer-backed-object.modern.js" // this had to be modified becasue of the exports to properly name AOBBO, BBO, and structSize
); 

function testSort(buf, key){
  console.log("sort buf", buf);
  const descriptors = {
    name: BufferBackedObject.UTF8String(90),
    otherKey: BufferBackedObject.UTF8String(10)
  };

  const boar = new ArrayOfBufferBackedObjects(buf, descriptors);
  console.log("worker sort boar", boar[0].name, boar[1].name);
  function compare(itemA, itemB){
      if (itemA[key] < itemB[key]){
        return -1;
      }
      if (itemA[key] > itemB[key]){
        return 1;
      }
      return 0;
  }

  let u8 = new Uint8Array(boar.buffer);
  console.log("worker buffer u8 before sort", u8.slice(100));

  boar.sort(compare);
  console.log("worker boar 0 and 1", boar[0].name, boar[1].name);

  console.log("worker buffer u8 after sort", u8.slice(100));

  // return buf;//Comlink.transfer(buf, [buf]);
  return Comlink.transfer(boar.buffer, [boar.buffer]);
}
//ui.js

let namePromises = [];
for (let i=0;i<1000;++i){
    namePromises.push(utils.getRandomName());
}

let names = await Promise.all(namePromises);
let items = [];
console.log("creating items");
let start = window.performance.now();
for(let i=0;i<names.length;++i){
    items.push({otherKey: "yay", name: names[i]});
}

console.log(`finished creating items in ${(window.performance.now() - start)/1000} seconds`);

let bbos = [];
console.log("creating bbos");
start = window.performance.now();
let descriptors = {
    name: BufferBackedObject.UTF8String(90),
    otherKey: BufferBackedObject.UTF8String(10)
};

const descriptorSize = window.structSize(descriptors);
const buf = new ArrayBuffer(descriptorSize*names.length);
const boar = new ArrayOfBufferBackedObjects(buf, descriptors);

const buf2 = new ArrayBuffer(descriptorSize*names.length);
let boar2 = new ArrayOfBufferBackedObjects(buf2, descriptors);

for(let i=0;i<names.length;++i){
    // sorted on this thread
    const bbo = boar[i];
    bbo.name = names[i];
    bbo.otherKey = "yay";

    // sorted on the worker
    const bbo2 = boar2[i];
    bbo2.name = names[i];
    bbo2.otherKey = "yay";
}

console.log(`finished creating bbos in ${(window.performance.now() - start)/1000} seconds`);

function compare(itemA, itemB){
    if (itemA.name < itemB.name){
      return -1;
    }
    if (itemA.name > itemB.name){
      return 1;
    }
    return 0;
}

// sorted just fine.
console.log("starting the item sort");
start = window.performance.now();
let sorted = items.sort(compare);
console.log(`finished sorting in ${(window.performance.now() - start)/1000} seconds`);

// It's sorted but the buffer remains the same
console.log("starting the bbo sort");
start = window.performance.now();
let sorted2 = boar.sort(compare);
console.log(`finished sorting in ${(window.performance.now() - start)/1000} seconds`);

let worker = new Worker("worker.js");
let bboWorker = Comlink.wrap(worker);

// this is never sorted because the buffer isn't changed on the worker thread.
console.log("starting the bbo2 sort");
start = window.performance.now();
let boar2Check = await bboWorker.testSort(Comlink.transfer(boar2.buffer, [boar2.buffer]), "name");
console.log(`finished sorting in ${(window.performance.now() - start)/1000} seconds`);
@surma
Copy link
Collaborator

surma commented Jul 31, 2021

Interesting. Yeah that is possible.

Do you mind writing a test/example that is a lot more reduced (for example without Comlink). The buffer-backed-object.test.js file has a lot of examples.

@ibsusu
Copy link
Author

ibsusu commented Jul 31, 2021

I was trying to make a pull request but it wants me to sign something. Here you go.

 it("can be sorted", function () {
    // Copyright: user ibsusu on github.com
    // License: I don't care what you do with the code below. You have all rights to the below code and the rights will never be revoked by ibsusu or anyone else.
    // I'm not signing the contributor license agreement.
    // I just want sorting to work.

    const descriptors = {
      firstKey: BufferBackedObject.UTF8String(80),
      middleKey: BufferBackedObject.ArrayBuffer(10),
    };

    const descriptorSize = structSize(descriptors);

    const randomNames = [
      "bill",
      "mike",
      "dave",
      "kat",
      "brian",
      "adam",
      "sammy",
      "martin",
      "yanaris",
    ];

    // the expected order of the names
   const firstKeyOrderedNames = [
      "adam",
      "bill",
      "brian",
      "dave",
      "kat",
      "martin",
      "mike",
      "sammy",
      "yanaris",
    ];

    const buffer = new ArrayBuffer(descriptorSize * randomNames.length);
    const boar = new ArrayOfBufferBackedObjects(buffer, descriptors);

    // this may be unnecessary for testing
    let randomNumberArrays = [
      [61, 129, 35, 3, 65, 151, 190, 139, 89, 212],
      [43, 196, 83, 213, 118, 216, 142, 132, 220, 85],
      [223, 245, 232, 30, 114, 42, 254, 129, 112, 59],
      [159, 73, 73, 38, 199, 212, 60, 100, 181, 248],
      [22, 197, 38, 37, 46, 139, 77, 202, 181, 183], // dupe
      [178, 69, 33, 61, 59, 53, 116, 117, 66, 81],
      [232, 97, 56, 20, 140, 122, 162, 88, 242, 58],
      [24, 147, 73, 201, 205, 253, 200, 191, 99, 183],
      [22, 197, 38, 37, 46, 139, 77, 202, 181, 183], // dupe
      [190, 154, 53, 52, 94, 109, 131, 101, 43, 172],
      [168, 220, 117, 172, 64, 219, 129, 80, 207, 43],
    ];

    // fill in the BBOs
    for (let i = 0; i < randomNames.length; ++i) {
      // fill the strings
      const bbo = boar[i];
      bbo.firstKey = randomNames[i];

      // fill the arraybuffers
      // let dataView = new DataView(bbo.middleKey);
      // for(let j=0;j<10;++j){
      //   dataView.setUint8(j, randomNumberArrays[i][j]);
      // }
    }

    // our compare function
    function compareFirst(itemA, itemB) {
      if (itemA.firstKey < itemB.firstKey) {
        return -1;
      }
      if (itemA.firstKey > itemB.firstKey) {
        return 1;
      }
      return 0;
    }

    // sort it and get the list of names
    boar.sort(compareFirst);
    let sortedNames = boar.map((bbo) => { return bbo.firstKey;});
    console.log("sortedNames", sortedNames); 

    // check that the boar is sorted via the normal 'get' routine
    for (let i = 0; i < sortedNames; ++i) {
      expect(sortedNames[i]).toBe(firstKeyOrderedNames[i]);
    }

    // check that the boar is sorted via pulling directly from the buffer.
    for (let i = 0; i < boar.length; ++i) {
      let nameInBuffer = new TextDecoder()
        .decode(new Uint8Array(boar.buffer, i * descriptorSize, 80))
        .replace(/\u0000+$/, "");
      console.log("name in buffer", nameInBuffer);
      expect(nameInBuffer).toBe(firstKeyOrderedNames[i]);
    }
  });

@surma
Copy link
Collaborator

surma commented Aug 2, 2021

Thanks for that! This was really helpful.

So you are right. ArrayOfBufferBackedObject (AOBBO) creates an array full of proxies. The sorting works in that the proxies are being reordered inside the array, but this shuffling is not mirrored to the underlying ArrayBuffer. I’ll have a think on how to best handle this. I might just special-case sort() for now.

(cc @developit in case he has any ideas)

@developit
Copy link

One option would be to instead provide a Set-compatible API rather than an Array lookalike. Arrays are tricky to proxy with support for push(), splice(), .length=n, etc.

It's also interesting to consider the performance implications and potential that a Set derivative would have. Removals could be implemented using copyWithin, and has() could search only at each byte offset within the backing buffer to compare the buffers directly.

One thing I ran into with someone who was using the AOBBO setup from this library was that assignment of whole objects to an Array index doesn't work (which is reasonable, but the Array API surface makes it less obvious that this will be the case).

@ibsusu
Copy link
Author

ibsusu commented Aug 2, 2021

got assignment working. check this gist for the line.
https://gist.github.com/ibsusu/9a3d902bad83d23cb3b50c395e9fa1e6#file-buffer-backed-objects-js-L241

The funny thing is that using TimSort on this doesn't work. I think the reason is that assignment of the BBO to a tmp variable is essentially just creating a reference to that location in the ArrayBuffer rather than a copy of the Uint8Array and the underlying buffer. I'm not sure how to handled this tbh.

@ibsusu
Copy link
Author

ibsusu commented Aug 2, 2021

Got sorting working. The implementation is a bit different from the original so just pull what you need from it or use it all. It passes all of the original tests + the sorting test. it utilizes a custom sort but any(?) sort will work so long as any pivot or swap value is modified to use a separate buffer.

https://gist.github.com/ibsusu/cc95a8f7aed4a1d58b54c1db256bcb79

I'm done, going to sleep.

@ibsusu
Copy link
Author

ibsusu commented Aug 4, 2021

Alright, last iteration since I have a passable implementation working in my development code now. There are some things to note regarding functionality:

  1. The Constructor of the AOBBO now optionally takes an object that implements a sort function of some kind. This makes it easy to have a webworker handle the sorting of the AOBBO and pass it back - but the sort method is now async.
  2. The AOBBO.sort() can optionally take a string of a key or a comparison function. Comparison functions are handled in the thread context they're created because the webworker can't utilize them without serializing and eval. If a string is passed in it will be used as the lookup for the text encoded bytes of the descriptor and processed in a webworker handler if it exists.
  3. when the webworker finishes its sorting it transfers the buffer back, the dataview is recreated and all of the Proxied Uint8Arrays are deleted so they can be remade again upon their individual access.

all the files + some simple benchmarking
https://gist.github.com/ibsusu/0c424504b206949f809516b04e82de5d

Just some thoughts here:

  1. Proxies are weird and I don't know how they're different from a class that extends some other class yet.
  2. The webworker approach that I took utilizing only one worker is about 8x faster than using a comparison function on the the originating thread because the first one accesses the ArrayBuffer directly vs accessing it through the get methods.
  3. Sorting 100,000 raw objects with two fields similar to
let descriptors = {
     name: Bbo.UTF8String(90),
     otherKey: Bbo.UTF8String(10)
 };

took 0.1235 seconds with a simple comparison function on the name key. Sorting 100,000 AOBBOs on the originating thread using a comparison function and quicksort took 27.09 seconds, and sorting 100,000 AOBBOs on a webworker using the direct ArrayBuffer accessing TextEncoded string comparison in quicksort on the webworker took 4.3975 seconds so it's still pretty slow.
4. I think splitting the tasks among a pool of workers might make this more worth it but the goal at the moment is just to keep tasks off the ui thread.
5. When I tried creating DataViews in the webworker to do comparison via Uint32s or BigUint64s in the quicksort it was much slower than using the TextEncoder then string comparsion. I don't know why this is but I assume something is being optimized under the hood. It would be nice to have some sort of zero copy way of doing comparisons but I've tried DataViews, subarrays and TextEncoding and TextEncoding won so, ¯_(ツ)_/¯ .

Anyways I'm going to move on in my project now. Thanks for creating this library, it's really interesting. Feel free to use some or none of the code. ✌️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants