Skip to content

Sparrow is a high throughput, low latency, and fault-tolerant distributed cluster scheduler. Sparrow is designed for applications that require resource allocations frequently for very short jobs, such as analytics frameworks. Sparrow schedules from a distributed set of schedulers that maintain no shared state. Instead, to schedule a job, a sched…

License

Notifications You must be signed in to change notification settings

vishalsingh8989/sparrow

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Sparrow Scheduler

Sparrow is a high throughput, low latency, and fault-tolerant distributed cluster scheduler. Sparrow is designed for applications that require resource allocations frequently for very short jobs, such as analytics frameworks. Sparrow schedules from a distributed set of schedulers that maintain no shared state. Instead, to schedule a job, a scheduler obtains intantaneous load information by sending probes to a subset of worker machines. The scheduler places the job's tasks on the least loaded of the probed workers. This technique allows Sparrow to schedule in milliseconds, two orders of magnitude faster than existing approaches. Sparrow also handles failures: if a scheduler fails, a client simply directs scheduling requests to an alternate scheduler.

To read more about Sparrow, check out our paper.

If you're interested in using Sparrow, we recommend that you join the Sparrow mailing list.

Code Layout

/sparrow/src (maven-style source directory)

/sparrow/src/main/java/edu/berkeley/sparrow (most of the interesting java code)

/sparrow/src/main/java/edu/berkeley/sparrow/examples (Example applications that use Sparrow)

/deploy/ (contains ec2 deployment scripts)

Building Sparrow

You can build Sparrow using Maven:

$ mvn compile
$ mvn package -Dmaven.test.skip=true

Sparrow and Spark

We have ported Spark, an in-memory analytics framework, to schedule using Sparrow. To use Spark with Sparrow, you'll need to use the our forked version of Spark, which includes the Sparrow scheduling plugin.

Research

Sparrow is a research project within the U.C. Berkeley AMPLab. The Sparrow team, listed roughly order of height, consists of Kay Ousterhout, Patrick Wendell, Matei Zaharia, and Ion Stoica. Please contact us for more information.

About

Sparrow is a high throughput, low latency, and fault-tolerant distributed cluster scheduler. Sparrow is designed for applications that require resource allocations frequently for very short jobs, such as analytics frameworks. Sparrow schedules from a distributed set of schedulers that maintain no shared state. Instead, to schedule a job, a sched…

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 50.7%
  • Java 43.1%
  • Shell 5.2%
  • Thrift 1.0%