
PaddlePaddle 1.1.0

Released by @panyx0718 on 31 Oct 02:48 · commit 66024e9

Release Notes

Major New Features and Improvements

Framework

  • The "eager deletion" memory optimization strategy now supports sub-blocks of control flow operators (e.g. if-else, while), significantly reducing the memory consumption of models that use control flow operators.

  • Optimized the split operator, significantly improving performance.

  • Extended the multiclass_nms operator to support polygon bounding boxes.

  • Added a CUDA implementation of the generate_proposals operator, significantly improving performance.

  • Supported fusing the affine_channel operator with the batch_norm operator, significantly improving performance.

  • Optimized the forward and backward passes of the depthwise_conv operator, significantly improving performance.

  • Optimized the reduce_mean operator, significantly improving performance.

  • Optimized the sum operator; when the input is a Tensor, one zero-memory pass is eliminated, improving performance.

  • Optimized the top_k operator, significantly improving performance.

  • Added a new sequence_slice operator: for a sequence, slices out a sub-sequence of a specified length starting at a specified offset.

  • Added a new sequence_unpad operator, which converts a padded Tensor back into a LoDTensor.

  • Added new sequence_reverse, roi_align, and affine_channel operators (a usage sketch of the sequence operators follows this list).
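As an illustration of the sequence operators listed above, here is a minimal sketch assuming they are exposed through the fluid.layers Python API of this release; the tensor shapes, feed names, and slice offsets are invented for the example.

```python
import numpy as np
import paddle.fluid as fluid

# A padded batch: each example is a sequence padded to length 10 with feature size 5.
x = fluid.layers.data(name='x', shape=[10, 5], dtype='float32')
# Real (unpadded) length of every sequence in the batch.
length = fluid.layers.data(name='length', shape=[1], dtype='int64')

# sequence_unpad: convert the padded Tensor back into a LoDTensor.
seqs = fluid.layers.sequence_unpad(x=x, length=length)

# sequence_reverse: reverse every sequence in the LoDTensor.
reversed_seqs = fluid.layers.sequence_reverse(seqs)

# sequence_slice: take a per-example sub-sequence, starting at `offset` with the
# given `sub_len` (both shaped [batch, 1]; a batch of 2 is assumed here, so the
# fed data would also need to contain exactly 2 sequences).
offset = fluid.layers.assign(np.array([[0], [1]]).astype('int32'))
sub_len = fluid.layers.assign(np.array([[2], [1]]).astype('int32'))
sub_seqs = fluid.layers.sequence_slice(input=seqs, offset=offset, length=sub_len)
```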

Server Inference

  • Added an AVX/no-AVX auto-switch feature for deployment, allowing major models to automatically switch among AVX, AVX2, and AVX512.

  • Improved inference usability: only one header file and one library need to be included (a minimal deployment sketch follows this list).

  • Significantly improved inference performance, notably for ICNet.
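As a rough illustration of the server-side deployment flow, below is a minimal Python-side sketch assuming a model previously saved with fluid.io.save_inference_model; the directory name, feed shape, and CPU-only setup are hypothetical, and the single-header/single-library point above refers to the C++ inference library, which is not shown here.

```python
import numpy as np
import paddle.fluid as fluid

place = fluid.CPUPlace()
exe = fluid.Executor(place)

# Load an inference program saved earlier with fluid.io.save_inference_model.
# './mobilenet_v1_infer' is a hypothetical directory.
[infer_program, feed_names, fetch_targets] = fluid.io.load_inference_model(
    dirname='./mobilenet_v1_infer', executor=exe)

# Run one forward pass on a dummy image batch.
image = np.random.rand(1, 3, 224, 224).astype('float32')
results = exe.run(infer_program,
                  feed={feed_names[0]: image},
                  fetch_list=fetch_targets)
print(results[0].shape)
```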

Mobile Inference

  • Added Mali GPU and Adreno GPU support for the mobilenet v1 model.

  • Added support for the resnet34 and resnet50 models on FPGA development boards such as the ZU5 and ZU9.
