Add ZeroRedundancyOptimizer to chapters 2 & 3

Docs: https://pytorch.org/docs/2.4/distributed.optim.html#torch.distributed.optim.ZeroRedundancyOptimizer

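```python
import torch
from torch.distributed.optim import ZeroRedundancyOptimizer

# `model` and `args` come from the existing training script.
# Keyword arguments such as `lr` and `fused` are forwarded to `optimizer_class`.
optimizer = ZeroRedundancyOptimizer(
    model.parameters(),
    optimizer_class=torch.optim.AdamW,
    lr=args.lr,
    fused=True,
)
```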

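Very easy to use: it wraps the existing optimizer class, and it immediately reduces per-rank memory usage by sharding the optimizer state across ranks instead of replicating it on each one.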
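
As a rough illustration of the memory reduction, the sketch below estimates AdamW optimizer-state memory per rank. It assumes fp32 state, two state tensors per parameter (`exp_avg` and `exp_avg_sq`), and a perfectly even split across ranks, which the real partitioning may not achieve. It ignores step counters and any other buffers. The helper name `adamw_state_bytes_per_rank` and the 1B parameter count are hypothetical and are not taken from the chapters.

```python
def adamw_state_bytes_per_rank(
    num_params: int, world_size: int, bytes_per_element: int = 4
) -> float:
    """Approximate AdamW optimizer-state memory per rank, in bytes."""
    # AdamW keeps two state tensors per parameter: exp_avg and exp_avg_sq.
    total_bytes = 2 * num_params * bytes_per_element
    # Assumption: the sharded state is split evenly across all ranks.
    return total_bytes / world_size


GIB = 1024**3
num_params = 1_000_000_000  # hypothetical model size

for world_size in (1, 8):
    per_rank = adamw_state_bytes_per_rank(num_params, world_size)
    print(f"world_size={world_size}: ~{per_rank / GIB:.2f} GiB of optimizer state per rank")
```

Under these assumptions the per-rank optimizer state shrinks in proportion to the number of ranks. Parameters, gradients, and activations are unaffected, so the overall saving is smaller than this estimate alone suggests.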