PaddlePaddle 1.1.0
Release Notes
Major New Features and Improvements
Framework
- The "eager deletion" memory optimization strategy now supports sub-blocks in control flow operators (e.g. if-else, while), significantly reducing the memory consumption of models that use control flow operators.
- Optimized the split operator, significantly improving performance.
- Extended the multiclass_nms operator to support polygon bounding boxes.
- Added a CUDA implementation of the generate_proposals operator, significantly improving performance.
- Added support for fusing the affine_channel operator with the batch_norm operator, significantly improving performance.
- Optimized the depthwise_conv operator, significantly improving performance.
- Optimized the reduce_mean operator, significantly improving performance.
- Optimized the sum operator, significantly improving performance.
- Optimized the top_k operator, significantly improving performance.
- Added the sequence_slice operator, which slices a sub-sequence out of a sequence given a start position and a length.
- Added the sequence_unpad operator, which converts a padded Tensor to a LoDTensor.
- Added the sequence_reverse, roi_align, and affine_channel operators.
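
The semantics of sequence_slice and sequence_unpad described above can be illustrated with a minimal pure-Python sketch. It does not call PaddlePaddle: nested lists stand in for a LoDTensor, and the function signatures, argument names (offsets, lengths), and the (data, lod) return value are assumptions made for illustration, not the operators' actual interfaces.

```python
def sequence_slice(sequences, offsets, lengths):
    """Keep `length` items of each sequence, starting at `offset`.

    Hypothetical stand-in for the operator: one offset and one length
    per sequence in the batch.
    """
    out = []
    for seq, offset, length in zip(sequences, offsets, lengths):
        if offset < 0 or length < 0 or offset + length > len(seq):
            raise ValueError("slice exceeds sequence bounds")
        out.append(seq[offset:offset + length])
    return out


def sequence_unpad(padded, lengths):
    """Strip padding from a [batch, max_len] nested list.

    Returns (data, lod): `data` is the concatenation of the valid items,
    and `lod` holds the level-0 offsets marking where each sequence
    starts and ends inside `data`.
    """
    data, lod = [], [0]
    for row, length in zip(padded, lengths):
        data.extend(row[:length])
        lod.append(lod[-1] + length)
    return data, lod


batch = [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]
print(sequence_slice(batch, offsets=[0, 1, 1], lengths=[2, 1, 3]))
# [[1, 2], [5], [8, 9, 10]]

padded = [[1, 2, 0, 0], [3, 4, 5, 0], [6, 0, 0, 0]]
print(sequence_unpad(padded, lengths=[2, 3, 1]))
# ([1, 2, 3, 4, 5, 6], [0, 2, 5, 6])
```

The offset list returned by the second function mirrors how a LoDTensor records variable-length sequences in a single flat buffer, which is why unpadding needs the true lengths as an extra input.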
Server Inference
- Added automatic switching between AVX and non-AVX code paths, allowing major models to switch automatically among AVX, AVX2, and AVX512.
- Improved inference usability: only one header and one library need to be included.
- Significantly improved inference performance.
Mobile Inference
- Added Mali GPU and Adreno GPU support for the mobilenet v1 model.
- Added ZU5 and ZU9 FPGA support for the resnet34 and resnet50 models.
Release Notes
Major New Features and Improvements
Framework
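- The GPU memory optimization strategy eager deletion now supports optimizing sub-blocks in control flow (e.g. if-else, while), significantly reducing the GPU memory consumption of models that contain control flow.
- Optimized the split operator, significantly improving performance.
- Extended the multiclass_nms operator to support polygon prediction boxes.
- Added a CUDA implementation of the generate_proposals operator, significantly improving performance.
- Fused the batch_norm operator through the affine_channel operator, significantly improving performance.
- Optimized the forward and backward passes of the depthwise_conv operator, significantly improving performance.
- Optimized the reduce_mean operator.
- Optimized the sum operator: when the inputs are Tensors, it now saves the time of one zero-memory operation.
- Optimized the top_k operator, significantly improving performance.
- Added the sequence_slice operator, which slices a subsequence of a specified length out of a sequence, starting from a specified position.
- Added the sequence_unpad operator, which converts a padded Tensor to a LoDTensor.
- Added the sequence_reverse, roi_align, and affine_channel operators.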
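
The sum operator item above mentions saving one zero-memory operation when the inputs are Tensors. The release notes do not describe the implementation, so the following pure-Python sketch shows only one plausible reading: seed the output with a copy of the first input instead of zero-filling it and then adding every input. Both function names are hypothetical, and plain lists stand in for Tensors.

```python
def sum_zero_init(inputs):
    """Baseline: zero-fill the output, then accumulate every input."""
    out = [0.0] * len(inputs[0])   # extra pass that only writes zeros
    for x in inputs:
        for i, v in enumerate(x):
            out[i] += v
    return out


def sum_copy_first(inputs):
    """Variant: copy the first input into the output, skipping the zero-fill."""
    out = list(inputs[0])          # one copy replaces zero-fill plus first add
    for x in inputs[1:]:
        for i, v in enumerate(x):
            out[i] += v
    return out


tensors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(sum_zero_init(tensors))   # [9.0, 12.0]
print(sum_copy_first(tensors))  # [9.0, 12.0]
```

Both variants return the same result; the second simply touches the output buffer one time fewer, which is the kind of saving the item appears to describe.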
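Server Inference
- Added automatic switching between AVX and NOAVX at deployment time; key models can switch automatically among AVX, AVX2, and AVX512.
- Improved the usability of the inference library: only one header file and one library need to be included.
- Significantly improved ICNet inference performance.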
Mobile Inference
- Added support for the mobilenet v1 model on Mali GPU and Adreno GPU.
- Added support for the resnet34 and resnet50 models on FPGA development boards such as ZU5 and ZU9.