Skip to content

Making Buddy both async and sync

Andreas edited this page May 18, 2016 · 7 revisions

When you write a library for all Haxe targets, including Node.js, which is a special variant of the js target, things can get a bit tricky.

Node is designed for asynchronous behavior, and it has a nice and simple "error-first callback" pattern for that. An in-depth explanation is available here, but essentially the last argument to an async method should follow these two rules:

  1. The first argument of the callback is reserved for an error object. If an error occurred, it will be returned by the first err argument.
  2. The second argument of the callback is reserved for any successful response data. If no error occurred, err will be set to null and any successful data will be returned in the second argument.

Using such a method in Haxe will look like:

customers.getById(123, function(err, customer) {
    if(err != null) { trace(err); return; }
    trace(customer.email);
});

This could lead to callback hell:

function done(err) {
    // Handle errors and move on
}

customers.getById(123, function(err, customer) {
    if(err != null) return done(err);
    orders.getByCustomerId(customer.id, function(err, orders) {
        if(err != null) return done(err);
        for(order in orders) {
            // Only three levels? Keep going...
        }
    });
});

Fortunately Haxe, with it's macro magic, can transform these into:

var err, customer = @async customers.getById(123);
if(err != null) return done(err);

var err, orders = @async orders.getByCustomerId(customer.id);
if(err != null) return done(err);

for(order in orders) {
    // Better
}

This is thanks to the asynctools library (originating from haxe-js-kit, thank you!), and with a callback for error handling, you can make it even simpler:

function done(err) {
    // Handle errors
}

var err, customer = @async(err => done) customers.getById(123);
// if(err != null) return done(err); is injected here by the macro

var err, orders = @async(err => done) orders.getByCustomerId(customer.id);
// Same here

for(order in orders) {
    // Better
}

That takes care of callbacks quite nicely. You might use promises instead, if you like return values and a more verbose syntax. I like to keep it clean.

The next problem is the for loop just introduced:

for(order in orders) {
    // What happens here?
    var err = @async(err => done) sendReminderEmail(order);
}

Since sendReminderEmail is asynchronous, it will return immediately and the loop will process all the emails in parallel, as fast as possible. This could be useful for sending email, but in Buddy I need to wait for each test to complete before running the next. Parallel test execution would be too messy.

Fortunately, asynctools to the rescue again:

using AsyncTools;

var err = @async(err => done) orders.aEachSeries(function(order, done2) {
    sendReminderEmail(order, done2);
});

// All email successfully sent in series here

Let's sync

At this point I thought that everything was fine, just plan for asynchronous from the start, implement as above and you're set! Unfortunately I got a feeling that something was wrong when I saw a stack trace from a larger test suite. It was hundreds, even thousands of levels deep! And for larger test suites, it crashed on many targets. C#, Java, PHP, it didn't matter.

I eventually realized that most tests in the test suites were synchronous. And this is how the asynctools library works, simplified:

function aEach(items, itemFunction, done) {
    var it = items.iterator();
    function next() {
        if(!it.hasNext()) return done(null);
        itemFunction(it.next(), function(err) {
            // We're back from the user-supplied itemFunction
            if(err != null) done(err);
            else next(); // Looping... with a twist
        });
    }
    next(); // Start everything
}

When things are async, as in javascript for Node and the browser, everything's fine, the methods will return immediately. But on many other targets, or for any synchronous itemFunction for that matter, what happens? next() will be called recursively. The loop behavior for async has turned into a stack-eating monster when things are running synchronously!

This was a tough revelation, because Buddy can mix sync and async anywhere:

public function new() {
    describe("Mixing sync and async", {
        it("is convenient", function(done) { // Now async, with a callback
            makeHttpRequest('http://slowsite.com', function(response) {
                response.status.should.be(200);
                done();
            });
        });

        it("should not crash with stack overflow", { // Back to sync
            haxe.CallStack.callStack().length.should.not.be(Math.INF);
        });

        describe("With multiple describe levels", {
            it("should still not crash", {
                // The stack-eating monster lives here
            });
        });
    });
}

Combine that with all the before/after functions that should be run on each test execution, async or sync:

beforeAll, beforeEach, afterEach, afterAll

...try to do that without popping the stack, and we're heading into trouble.

What to do? I really wanted to keep the convenience of defining the tests any way you like. Separating the code is not easy, all it takes is one async test in the middle of 50 sync tests, and you need to treat the subsequent tests asynchronously.

A pattern emerges

After some time, the outlines of a pattern started to emerge:

function testsDone(err) {
    // We're done
}

var result = runTests(tests, function(err) {
    if(err != null) trace('error: $err');
    testsDone(err);
});
// If we're running sync, we get a result back:
if(result != null) {
    if(result.err != null) trace('error: $err');
    testsDone(result.err);
}

And by factorizing the callback method:

function testsDone(err) {
    // We're done
}

function runTestsCompleted(err) {
    if(err != null) trace('error: $err');
    testsDone(err);
}

var result = runTests(tests, runTestsCompleted);
if(result != null) runTestsCompleted(result.err);

Could this be made to work? It's up to the runTests implementation.

var results = [];

function runTests(tests : Iterable<Test>, done : Error -> Void) : Null<{err: Null<Error>}> {
    var isAsync = tests.exists(function(t) return t.isAsync);

    return if(isAsync) {
        tests.aEachSeries(tests, function(test, done) {
            // Processing the test here...
            test.run(function(err, result) {
                if(err != null) results.push(result);
                done(err); // aEachSeries
            });
        }, done); // runTests
    } else {
        var returnValue = {err: null};

        // A synchronous loop, no problem with the stack.
        for(test in tests) {
            test.run(function(err, result) {
                if(err != null) returnValue = {err: err};
                else results.push(result);
            });
            if(returnValue.err != null) return returnValue;
        });

        returnValue;
    }
}

There is a code split, and differences in implementation, but there's not much to do about it. Async and sync simply don't mix! But it works!

There can surely be optimizations, and some are made in Buddy, for example if the first N tests are synchronous, you can run them first. You could split the tests into two queues, run all the sync tests first, then asyncs (if they exist). But this shows it's at least possible to handle a mixed sequence of async and sync tests, without being eaten by monsters like Zalgo.

The Zalgo article is written for Node, which has some additional measures like process.nextTick, but for an "all targets" library in Haxe we don't have that luxury. So common sense has to be applied, and careful consideration of things. Ending with a quote by Trygve Reenskaug, the MVC and DCI author, and surely many others:

Programming is hard.

Clone this wiki locally