- Personal Blog - Index
- How to Create Picture-in-Picture Effect / Video Overlay for a Presentation Video
- How to Do Your Part to Protect the Environment in Wisconsin
- How to Get a Driver's License in Wisconsin
- How to Travel from the U.S. to China aboard AA127 in June 2021
- How to Transfer Credits Back to UW-Madison
- Resources on Learning Academic Writing (for Computer Science)
- Towards applying to CS Ph.D. programs
- Machine Learning Systems - Index
- MLSys Papers - Short Notes
- [2011 NSDI] Dominant Resource Fairness: Fair Allocation of Multiple Resource Types
- [2014 OSDI] Scaling Distributed Machine Learning with the Parameter Server
- [2018 OSDI] Gandiva: Introspective Cluster Scheduling for Deep Learning
- [2018 SIGCOMM] Chameleon: Scalable Adaptation of Video Analytics via Temporal and Cross-camera ...
- [2018 NIPS] Dynamic Space-Time Scheduling for GPU Inference
- [2019 ATC] Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads
- [2019 NSDI] Tiresias: A GPU Cluster Manager for Distributed Deep Learning
- [2019 SOSP] ByteScheduler: A Generic Communication Scheduler for Distributed DNN Training ...
- [2019 SOSP] PipeDream: Generalized Pipeline Parallelism for DNN Training
- [2019 SOSP] Parity Models: Erasure-Coded Resilience for Prediction Serving Systems
- [2019 NIPS] GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
- [2020 SC] ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
- [2020 OSDI] Gavel: Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads
- [2020 OSDI] AntMan: Dynamic Scaling on GPU Clusters for Deep Learning
- [2020 OSDI] BytePS: A High Performance and Generic Framework for Distributed DNN Training
- [2020 SIGCOMM] Reducto: On-Camera Filtering for Resource-Efficient Real-Time Video Analytics
- [2020 EuroSys] AlloX: Compute Allocation in Hybrid Clusters
- [2020 VLDB] PyTorch Distributed: Experiences on Accelerating Data Parallel Training
- [2020 NetAI] Is Network the Bottleneck of Distributed Training?
- [2020 NSDI] Themis: Fair and Efficient GPU Cluster Scheduling
- [2021 MLSys] Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification
- [2021 VLDB] Analyzing and Mitigating Data Stalls in DNN Training
- [2021 FAST] CheckFreq: Frequent, Fine-Grained DNN Checkpointing
- [2021 EuroMLSys] Interference-Aware Scheduling for Inference Serving
- [2021 OSDI] Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning
- [2021 MLSys] Wavelet: Efficient DNN Training with Tick-Tock Scheduling
- [2021 NSDI] SwitchML: Scaling Distributed Machine Learning with In-Network Aggregation
- Big Data Systems - Index
- Big Data Systems Papers - Short Notes
- [2003 SOSP] The Google File System
- [2004 OSDI] MapReduce: Simplified Data Processing on Large Clusters
- [2010 SIGMOD] Pregel: A System for Large-Scale Graph Processing
- [2011 NSDI] Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
- [2012 NSDI] Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster ...
- [2012 OSDI] PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs
- [2019 FAST] DistCache: Provable Load Balancing for Large-Scale Storage Systems with Distributed...
- [2021 HotOS] From Cloud Computing to Sky Computing
- [2021 EuroSys] NextDoor: Accelerating graph sampling for graph machine learning using GPUs
- High Performance Computing Course Notes
- Lecture 1: Course Overview
- Lecture 2: From Code to Instructions. The FDX Cycle. Instruction Level Parallelism.
- Lecture 3: Superscalar architectures. Measuring Computer Performance. Memory Aspects.
- Lecture 4: The memory hierarchy. Caches.
- Lecture 5: Caches, wrap up. Virtual Memory.
- Lecture 6: The Walls to Sequential Computing. Moore’s Law.
- Lecture 7: Parallel Computing. Flynn's Taxonomy. Amdahl's Law.
- Lecture 8: GPU Computing Intro. The CUDA Programming Model. CUDA Execution Configuration.
- Lecture 9: GPU Memory Spaces
- Lecture 10: GPU Scheduling Issues.
- Lecture 11: Execution Divergence. Control Flow in CUDA. CUDA Shared Memory Issues.
- Lecture 12: Global Memory Access Patterns and Implications.
- Lecture 13: Atomic operations in CUDA. GPU code optimization rules of thumb.
- Lecture 14: CUDA Case Studies. (1) 1D Stencil Operation. (2) Vector Reduction in CUDA.
- Lecture 15: CUDA Case Studies. (3) Parallel Prefix Scan on the GPU. Using Multiple Streams in CUDA.
- Lecture 16: Streams, and overlapping data copy with execution.
- Lecture 17: GPU Computing: Advanced Features.
- Lecture 18: GPU Computing with thrust and cub.
- Lecture 19: Hardware aspects relevant in multi-core, shared memory parallel computing.
- Lecture 20: Multi-core Parallel Computing with OpenMP. Parallel Regions.
- Lecture 21: OpenMP Work Sharing.
- Lecture 22: OpenMP Work Sharing
- Lecture 23: OpenMP NUMA Aspects. Caching and OpenMP.
- Lecture 24: Critical Thinking. Code Optimization Aspects.
- Lecture 25: Computing with Supercomputers.
- Lecture 26: MPI Parallel Programming General Introduction. Point-to-Point Communication.
- Lecture 27: MPI Parallel Programming Point-to-Point communication: Blocking vs. Non-blocking sends.
- Lecture 28: MPI Parallel Programming: MPI Collectives. Overview of topics covered in the class.
- Cloud Computing Course Notes
- Operating Systems Papers - Index
- CS 736 @ UW-Madison Fall 2020 Reading List
- All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-Consistent Applications
- ARC: A Self-Tuning, Low Overhead Replacement Cache
- A File is Not a File: Understanding the I/O Behavior of Apple Desktop Applications
- Biscuit: The benefits and costs of writing a POSIX kernel in a high-level language
- Data Domain: Avoiding the Disk Bottleneck in the Data Domain Deduplication File System
- Disco: Running Commodity Operating Systems on Scalable Multiprocessors
- FFS: A Fast File System for UNIX
- From WiscKey to Bourbon: A Learned Index for Log-Structured Merge Trees
- LegoOS: A Disseminated, Distributed OS for Hardware Resource Disaggregation
- LFS: The Design and Implementation of a Log-Structured File System
- Lottery Scheduling: Flexible Proportional-Share Resource Management
- Memory Resource Management in VMware ESX Server
- Monotasks: Architecting for Performance Clarity in Data Analytics Frameworks
- NFS: Sun's Network File System
- OptFS: Optimistic Crash Consistency
- RAID: A Case for Redundant Arrays of Inexpensive Disks
- RDP: Row-Diagonal Parity for Double Disk Failure Correction
- Resource Containers: A New Facility for Resource Management in Server Systems
- ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and Replay
- Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism
- SnapMirror: File-System-Based Asynchronous Mirroring for Disaster Recovery
- The Linux Scheduler: a Decade of Wasted Cores
- The Unwritten Contract of Solid State Drives
- Venti: A New Approach to Archival Storage
- Earlier Notes