0.11.1a2

Pre-release
@reyoung released this 13 Mar 08:07
1f757f5

This release is a weekly alpha build of PaddlePaddle. It is intended for internal testing only and is not production-ready.

Release log

Performance gain and memory optimization

Config and Env:

  • Model: SE-ResNet-150
  • Input: 3 x 224 x 224
  • batch_size: 25
  • Environment: CentOS 6.3, Tesla P40, single card

The comparison results before optimization:

                 Speed             Memory
Fluid(before)    1.95 sec/iter     18341 MB
PyTorch          1.154 sec/iter    13359 MB
Fluid/PyTorch    1.6898            1.3729
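
The Fluid/PyTorch row is the Fluid figure divided by the PyTorch figure, so a value above 1 means Fluid is slower or uses more memory. A minimal sketch that reproduces the row from the numbers in the table above (the variable and function names are illustrative, not part of PaddlePaddle):

```python
# Figures copied from the table above (single Tesla P40, batch_size 25).
fluid_before = {"sec_per_iter": 1.95, "memory_mb": 18341}
pytorch = {"sec_per_iter": 1.154, "memory_mb": 13359}


def ratio(a, b, key):
    """Return a[key] / b[key], rounded to four decimal places."""
    return round(a[key] / b[key], 4)


speed_ratio = ratio(fluid_before, pytorch, "sec_per_iter")
memory_ratio = ratio(fluid_before, pytorch, "memory_mb")
print(speed_ratio, memory_ratio)  # 1.6898 1.3729
```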

After optimizing the speed:

                     Speed             Memory
Fluid(opti_speed)    1.45 sec/iter     17222 MB
PyTorch              1.154 sec/iter    13359 MB
Fluid/PyTorch        1.2565            1.2892

After optimizing the memory usage:

                   Speed             Memory
Fluid(opti_mem)    1.93 sec/iter     14388 MB
PyTorch            1.154 sec/iter    13359 MB
Fluid/PyTorch      1.6724            1.0770

  • Overall performance gain.
  • Free GPU memory that is no longer needed while training.
  • [WIP] Feed data from C++
    • Add basic RecordIO API
    • Polish C++ Reader operators
    • Add DoubleBuffer Reader
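
The release notes do not describe how the DoubleBuffer Reader is implemented. As a rough illustration of the general double-buffering idea only, the sketch below prefetches batches on a background thread while the consumer works on the current one. It is plain Python, not the PaddlePaddle C++ reader operator, and `double_buffered`, `toy_reader`, and `buffer_size` are hypothetical names.

```python
import queue
import threading

_SENTINEL = object()  # marks the end of the underlying reader


def double_buffered(reader, buffer_size=2):
    """Yield items from reader() while a background thread prefetches ahead.

    Hypothetical helper: at most `buffer_size` items are held in memory.
    For brevity, an exception in the reader simply ends the stream, and a
    consumer that stops early leaves the daemon thread blocked until exit.
    """
    buf = queue.Queue(maxsize=buffer_size)

    def producer():
        try:
            for item in reader():
                buf.put(item)  # blocks while the buffer is full
        finally:
            buf.put(_SENTINEL)

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buf.get()  # blocks until the next batch is ready
        if item is _SENTINEL:
            return
        yield item


def toy_reader():
    """Stand-in for a slow data source; yields five small batches."""
    for i in range(5):
        yield [i, i + 1]


batches = list(double_buffered(toy_reader))
print(batches)  # [[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]]
```

The bounded queue is the design point: the producer can run at most `buffer_size` batches ahead, so data loading overlaps with compute without unbounded memory growth.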

Distributed training

  • Distributed sparse updates are now supported.
  • [WIP] Send/recv using zero-copy gRPC transfer.
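
The notes do not spell out what the sparse update involves. In general, a sparse update touches only the parameter rows that received a gradient in the current batch (for example, the embedding rows that were looked up), so only those rows need to be transmitted and modified. The sketch below shows that idea with plain SGD on a single process; it is an assumption about the general technique, not PaddlePaddle's implementation, and `sparse_sgd_update` is a hypothetical name.

```python
def sparse_sgd_update(table, sparse_grad, lr):
    """Apply SGD only to the rows present in sparse_grad.

    table:       list of parameter rows (each row is a list of floats)
    sparse_grad: dict mapping row index -> gradient row (hypothetical format)
    lr:          learning rate
    """
    for row_id, grad_row in sparse_grad.items():
        row = table[row_id]
        for j, g in enumerate(grad_row):
            row[j] -= lr * g
    return table


embedding = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
grad = {1: [0.5, 0.5], 3: [1.0, -1.0]}  # only rows 1 and 3 were touched

sparse_sgd_update(embedding, grad, lr=0.5)
print(embedding)  # [[1.0, 1.0], [1.75, 1.75], [3.0, 3.0], [3.5, 4.5]]
```

Rows 0 and 2 are never read or written, which is what makes the approach attractive when the parameter table is large and each batch touches only a small fraction of it.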