Skip to content

tomquas/miyamoto

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

38 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Miyamoto Task Queue

Miyamoto is a fast, clusterable task queue inspired by Google App Engine's task 
queue. This means it speaks HTTP with a RESTful API for enqueuing tasks and 
uses HTTP callbacks (webhooks) for processing tasks. Worker daemons can be any 
web daemon.

End result? Super easy asynchronous processing.

Like the App Engine task queue, Miyamoto features task scheduling and rate 
limiting. Unlike the App Engine task queue, it provides idempotency semantics
and helps you avoid duplicate execution of tasks.

Miyamoto is designed for:
 - High throughput (fast)
 - Fault tolerance (highly available)
 - Data replication (minimal lost messages)
 - No major dependencies or client libs (easy to set up and use)
 - Simple design/implementation (easy to hack!)

Miyamoto is bad if you want:
 - Slow disk persistence (but we may add it)
 - Traditional task polling
 - Smart clients
 - AMQP, JMS, STOMP or other interfaces*

Not out of the box, but you can still achieve:
 - Synchronous tasks
 - Return values / result storage
 - Workers behind firewall / NAT
 - Dynamically adding to the worker pool
 
* = Excluding ZeroMQ, which is used internally, so there are optional ZMQ interfaces
 
GETTING STARTED

Make sure you have Python 2.6+ installed with gevent. Run on port 8088 with:

    INTERFACE=127.0.0.1 python scripts/start.py

USAGE

Once it's running (currently hardcoded to 8088), set up a pretend web server with netcat:

    nc -l 9099

Now let's post a task to it scheduled to run in 5 seconds: 

    curl -d "task.url=http://localhost:9099/worker&task.countdown=5&foo=bar" http://localhost:8088/queue

You should get back some JSON:

    {"status": "scheduled", "id": "7e998f34-bed7-44f4-80cd-49620b0ab562"}

And in 5 seconds, you should see on your listening netcat:

    POST /worker HTTP/1.1
    Accept-Encoding: identity
    Content-Length: 7
    Host: localhost:9099
    Content-Type: application/x-www-form-urlencoded
    X-Task-Id: 7e998f34-bed7-44f4-80cd-49620b0ab562
    Connection: close
    User-Agent: miyamoto/0.1
    
    foo=bar

You can also add tasks as JSON objects. Here's one without a delay:

    curl -H "Content-Type: applicatoin/json" -d '{"url": "http://localhost:9099/worker", "params": {"foo": "bar"}}' http://localhost:8088/queue
   

WHAT'S GOING ON?

In theory, you'd run a cluster of Miyamoto nodes. Each one is exactly the same, 
except that one will be the leader. You seed the leader with the first node, but
otherwise, leaders are elected if, say, the leader dies. Internally there is a list
of hosts in the cluster that form a ring. Whenever you add tasks to any node, it 
will send the task round robin to another node in the cluster for basic load 
balancing. But it will do this N times where N is a replication factor. Each replicant
has an added delay to execution. If any of them run, the others are canceled.
In this way you can schedule tasks and not worry that hosts might go down, achieving
HA of service and data. 

ADVERTISED FEATURES NOT YET DONE

- rate limiting
- idempotency safeguards
- error handling

INSPIRATION

App Engine - API, features
Celery - features, anti-features
Memcache - ideas around consistent hashing
Membase - replication, clustering
ZooKeeper - realtime group membership, leader election
Kestrel - speed, simplicity
ZeroMQ - oh, the possibilities

CONTRIBUTORS

Evan Cooke - offset replica scheduling idea
JJ Behrens - code review, feedback

LICENSE

MIT

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published