The key contributions are:
- GAS (gather, apply, scatter) programming model
- Using vertex-cuts instead of edge-cuts to lay out data for power-law graphs
- Balancing computation & minimizing communication
- Introduction
- Graph-Parallel Abstractions
- Pregel
- GraphLab
- Characterization
- Challenges of Natural Graphs
- PowerGraph Abstraction
- GAS Vertex-Programs
- Delta Caching
- Initiating Future Computation
- Bulk Synchronous Execution
- Asynchronous Execution
- Comparison with GraphLab/Pregel
- Distributed Graph Placement
- Balanced p-way Vertex-Cut
- Greedy Vertex-Cuts
- Abstraction Comparison
- Computation Imbalance
- Communication Imbalance
- Runtime Comparison
- Implementation and Evaluation
- Graph Loading and Placement
- Synchronous Engine (Sync)
- Asynchronous Engine (Async)
- Async. Serializable Engine (Async+S)
- Fault Tolerance
- MLDM Applications
- Related Work
- Conclusions and Future Work
- Background 1: Natural Graphs
- Graphs IRL (e.g., social networks/the Internet) follow a power-law degree distribution
- A small subset of the vertices have very high degrees, while most vertices have a small degree
- Existing graph-parallel frameworks depend on a balanced degree distribution for performance
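The power-law claim above can be written concretely; this is a standard statement of the distribution (the skew exponent α is commonly cited as being around 2 for natural graphs):

```latex
% Probability that a vertex has degree d, for skew exponent \alpha:
\[
  \mathbf{P}(d) \propto d^{-\alpha}
\]
% Smaller \alpha means a heavier tail, i.e., more very-high-degree vertices.
```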
- Background 2: Existing frameworks (Pregel, GraphLab) cannot handle natural graphs well
- Work balancing: Existing graph-parallel frameworks treat vertices symmetrically and have storage/communication/computation costs linear in degree
- Partitioning: Pregel/GraphLab depend on partitioning the graph, which is hard for natural graphs; both fall back to random partitioning, which performs poorly
- Communication/storage: Major bottlenecks at high-degree vertices due to the skewed distribution
- Computation: Existing frameworks do not parallelize individual vertex programs, limiting their scalability in skewed graphs
- Gather: Information from adjacent vertices/edges is reduced by a generalized "sum" operation (commutative and associative)
- Apply: The gathered sum is used with the current value to update the current vertex value
- Scatter: The new value is used to update data on adjacent edges
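The three GAS phases above can be sketched with PageRank as the running example. This is a minimal single-machine Python sketch, not PowerGraph's actual (C++) API; the `pagerank` driver and graph representation are my own illustration:

```python
# Minimal single-machine sketch of the GAS model using PageRank.
# The edge-list representation and driver loop are illustrative only.

DAMPING = 0.85

def gather(rank, out_degree):
    # Contribution of one in-neighbor: its rank spread over its out-edges.
    return rank / out_degree

def sum_op(a, b):
    # Generalized commutative/associative "sum" used to reduce gathers.
    return a + b

def apply(total):
    # Combine the gathered sum with the damping factor: new vertex value.
    return (1.0 - DAMPING) + DAMPING * total

def pagerank(edges, num_vertices, iterations=20):
    ranks = [1.0] * num_vertices
    out_degree = [0] * num_vertices
    for src, _ in edges:
        out_degree[src] += 1
    for _ in range(iterations):
        acc = [0.0] * num_vertices  # gathered accumulators
        for src, dst in edges:      # gather phase, reduced with sum_op
            acc[dst] = sum_op(acc[dst], gather(ranks[src], out_degree[src]))
        ranks = [apply(a) for a in acc]  # apply phase
        # scatter phase would update edge data / activate neighbors; omitted
    return ranks
```

Because the gather is a commutative/associative reduction, PowerGraph can run it in parallel across the machines holding a high-degree vertex's edges.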
- Edge-Cuts
- Every vertex is placed on a machine, and edges span across machines
- If adjacent vertices are on different machines, they use "ghost" vertices -> changes need to be synchronized to ghosts
- In natural graphs, many edges span machines; balanced edge-cut algorithms perform poorly on them, so GraphLab and Pregel fall back to randomized placement, which cuts most edges (bad)
- Vertex-Cuts
- Every edge is placed on a machine, and vertices may be across machines
- Intuition: The distribution of vertex degree is highly skewed, but the number of vertices adjacent to a given edge is constant (always 2)
- Each vertex is replicated ("mirrors") across the machines where its adjacent edges lie
- This results in a better balance for natural graphs
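The vertex-cut idea can be sketched as follows: assign edges to machines, derive each vertex's mirror set, and measure the average replication factor (the quantity a good vertex-cut minimizes). This is my own illustrative Python, not PowerGraph's implementation, and it shows only the simple randomized placement, not the greedy heuristic:

```python
import random

def random_vertex_cut(edges, num_machines, seed=0):
    # Assign each EDGE to a machine; a vertex is then mirrored on
    # every machine that holds at least one of its edges.
    rng = random.Random(seed)
    placement = {}   # edge -> machine
    mirrors = {}     # vertex -> set of machines holding a copy
    for (u, v) in edges:
        m = rng.randrange(num_machines)
        placement[(u, v)] = m
        mirrors.setdefault(u, set()).add(m)
        mirrors.setdefault(v, set()).add(m)
    return placement, mirrors

def replication_factor(mirrors):
    # Average number of copies per vertex: storage/communication cost
    # of the cut. A perfect cut has replication factor 1.0.
    return sum(len(ms) for ms in mirrors.values()) / len(mirrors)
```

Note that every edge lives on exactly one machine, so per-edge work is trivially balanced; only vertex state is replicated.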
- Delta caching
- At each vertex, the gathered accumulator is cached; the scatter function can return a delta that is applied directly to the neighbor's cached accumulator
- If no delta is returned, the neighbor's cached accumulator is invalidated, forcing a full re-gather on its next activation
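The cache/invalidate behavior can be sketched like this (a toy single-machine sketch; the class and method names are my own, not PowerGraph's API):

```python
# Illustrative sketch of delta caching: each vertex keeps a cached
# accumulator; scatter may return a delta that patches the neighbor's
# cache instead of forcing a full re-gather.

class DeltaCache:
    def __init__(self):
        self.cache = {}  # vertex -> cached gathered accumulator

    def on_scatter(self, neighbor, delta):
        if delta is None:
            # No delta returned: the cached value may be stale, so drop
            # it; the next gather recomputes it from scratch.
            self.cache.pop(neighbor, None)
        elif neighbor in self.cache:
            # Apply the delta in place, skipping the full gather.
            self.cache[neighbor] += delta

    def gather_or_cached(self, vertex, full_gather):
        # Run the (expensive) full gather only on a cache miss.
        if vertex not in self.cache:
            self.cache[vertex] = full_gather(vertex)
        return self.cache[vertex]
```

This only works because the gather "sum" is commutative and associative, so a delta can be folded into the cached accumulator without replaying the other contributions.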
- Execution model: Sync vs. Async
- Sync (bulk synchronous)
- 3 "minor-steps": Gather for all active vertices -> Apply -> Scatter
- Barrier after each minor-step; Changes are committed at the end of each minor-step and visible on the next
- Async (asynchronous)
- Changes are immediately available to other vertices
- Execute active vertices as cores become available
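The sync engine's barrier semantics can be sketched in a few lines: gather and apply both read the values committed at the previous minor-step, and new values become visible only after the barrier. A single-process Python stand-in (the function shape and names are my own, not PowerGraph's):

```python
# Minimal sketch of one superstep of the bulk synchronous engine:
# minor-steps separated by barriers; changes commit at each barrier.

def sync_engine_superstep(active, vertices, in_neighbors, gather, apply_fn):
    # Minor-step 1: gather for ALL active vertices against the OLD values.
    acc = {v: sum(gather(vertices[u]) for u in in_neighbors[v])
           for v in active}
    # --- barrier ---
    # Minor-step 2: apply; new values are not visible until after the barrier.
    new_vertices = dict(vertices)
    for v in active:
        new_vertices[v] = apply_fn(vertices[v], acc[v])
    # --- barrier ---
    # Minor-step 3: scatter would run here, activating vertices for the
    # NEXT superstep; omitted for brevity.
    return new_vertices
```

The async engine drops the barriers: updates are visible immediately, and active vertices run whenever a core frees up, which is faster for some MLDM algorithms but forfeits the deterministic superstep semantics.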