0.11.1a2
Pre-release
This release is a weekly alpha version of PaddlePaddle. It should only be used for internal testing and is not a production-ready version.
Release log
Performance gain and memory optimization
Config and Env:
- model: SE-ResNet-150
- Input: 3 x 224 x 224
- batch_size: 25
- CentOS 6.3, Tesla P40, single card.
The comparison results before optimization:
| | Speed | Memory |
|---|---|---|
| Fluid (before) | 1.95 sec/iter | 18341 MB |
| PyTorch | 1.154 sec/iter | 13359 MB |
| Fluid / PyTorch | 1.6898 | 1.3729 |
After optimizing the speed:
| | Speed | Memory |
|---|---|---|
| Fluid (opti_speed) | 1.45 sec/iter | 17222 MB |
| PyTorch | 1.154 sec/iter | 13359 MB |
| Fluid / PyTorch | 1.2565 | 1.2892 |
After optimizing the memory usage:
| | Speed | Memory |
|---|---|---|
| Fluid (opti_mem) | 1.93 sec/iter | 14388 MB |
| PyTorch | 1.154 sec/iter | 13359 MB |
| Fluid / PyTorch | 1.6724 | 1.0770 |
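The `Fluid (opti_mem)` row above was measured with Fluid's memory-reuse pass turned on. The sketch below shows how such a pass is typically enabled before creating the executor; it assumes the `fluid.memory_optimize` transpiler entry point and uses a small placeholder network rather than SE-ResNet-150.

```python
import paddle.fluid as fluid

# Placeholder network standing in for SE-ResNet-150.
image = fluid.layers.data(name='image', shape=[3, 224, 224], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
prediction = fluid.layers.fc(input=image, size=10, act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=label)
avg_cost = fluid.layers.mean(cost)
fluid.optimizer.SGD(learning_rate=0.01).minimize(avg_cost)

# Assumption: the memory-optimization transpiler rewrites the program so
# that variables whose lifetimes do not overlap share the same buffer.
fluid.memory_optimize(fluid.default_main_program())

place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
```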
- Overall performance gain.
- Details in issue #8990.
- Release GPU memory during training.
- [WIP] Feed data from C++.
- Add a basic RecordIO API (a reader sketch follows this list).
- Polish the C++ Reader operators.
- Add a DoubleBuffer Reader.
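A rough sketch of how the RecordIO API and the DoubleBuffer Reader fit together is shown below. It assumes the `fluid.recordio_writer.convert_reader_to_recordio_file`, `fluid.layers.open_recordio_file`, `fluid.layers.double_buffer`, and `fluid.layers.read_file` helpers; exact names may differ in this alpha, and the data is synthetic.

```python
import numpy as np
import paddle.fluid as fluid

# Synthetic batched reader: 8 batches of 32 (feature, label) samples.
def batch_reader():
    for _ in range(8):
        yield [(np.random.random(784).astype('float32'),
                np.random.randint(0, 10)) for _ in range(32)]

# Convert the Python reader into a RecordIO file once, up front.
image = fluid.layers.data(name='image', shape=[784], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
feeder = fluid.DataFeeder(feed_list=[image, label], place=fluid.CPUPlace())
fluid.recordio_writer.convert_reader_to_recordio_file(
    './sample.recordio', reader_creator=batch_reader, feeder=feeder)

# Read the file back through the C++ Reader operators and wrap the
# reader in a double buffer so the next batch is prefetched while the
# current one is being consumed.
data_file = fluid.layers.open_recordio_file(
    filename='./sample.recordio',
    shapes=[[-1, 784], [-1, 1]],
    lod_levels=[0, 0],
    dtypes=['float32', 'int64'])
data_file = fluid.layers.double_buffer(data_file)
image_batch, label_batch = fluid.layers.read_file(data_file)
```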
Distributed training
- Now supports distributed sparse updates (see the sketch after this list).
- [WIP] Send/recv using zero-copy gRPC transfer.
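For context on the sparse-update path, the sketch below builds an embedding with `is_sparse=True`; its gradient is then a SelectedRows (sparse) tensor, which is what the distributed sparse update sends to the parameter servers row by row instead of as a dense matrix. The network and sizes are illustrative only.

```python
import paddle.fluid as fluid

# Sparse-updated embedding: only the rows indexed in a batch produce
# gradient rows, so only those rows need to be transmitted and updated.
words = fluid.layers.data(name='words', shape=[1], dtype='int64', lod_level=1)
label = fluid.layers.data(name='label', shape=[1], dtype='int64')

emb = fluid.layers.embedding(input=words, size=[100000, 128], is_sparse=True)
pooled = fluid.layers.sequence_pool(input=emb, pool_type='sum')
prediction = fluid.layers.fc(input=pooled, size=2, act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=label)
avg_cost = fluid.layers.mean(cost)
fluid.optimizer.SGD(learning_rate=0.01).minimize(avg_cost)
```

When such a program is transpiled for distributed training, these sparse gradient rows are what get sent to the parameter servers.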