Quantized Human Action Recognition on Edge AMD SoC-FPGAs


This is the official efficient real-time HW/SW co-design for a quantized two-stream Human Action Recognition CNN (FPGA-QHAR) on PYNQ SoC-FPGAs. The work has been accepted as a conference paper in the IEEE Xplore Digital Library as FPGA-QHAR: Throughput-Optimized for Quantized Human Action Recognition on The Edge, and will be presented in December 2023 at the IEEE 20th International Conference on Smart Communities: Improving Quality of Life Using AI, Robotics, and IoT.

Description & System Overview

This paper proposes an end-to-end, efficient, customized, and quantized two-stream HAR CNN architecture based on SimpleNet-PyTorch, trained on the UCF101 and UCF24 datasets and implemented as a HW/SW co-design on AMD PYNQ SoC-FPGAs using a partially streaming dataflow architecture. The design achieves real-time performance of 24 FPS with 81% prediction accuracy on a connected camera.
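As a rough illustration of the two-stream idea (not the exact architecture or quantization scheme used in this repository), the PyTorch sketch below runs a spatial (RGB) stream and a temporal (optical-flow) stream through small Conv-BN-ReLU backbones and fuses them by averaging class scores; all layer sizes, the 112x112 input resolution, and the averaging fusion are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyStream(nn.Module):
    """Minimal SimpleNet-style stream: Conv -> BN -> ReLU blocks plus a classifier.
    Channel counts and depth are illustrative, not the paper's configuration."""
    def __init__(self, in_channels, num_classes=24):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

class TwoStreamHAR(nn.Module):
    """Two-stream HAR: RGB frames feed the spatial stream, optical-flow fields feed
    the temporal stream; predictions are fused by averaging the logits."""
    def __init__(self, num_classes=24, flow_channels=2):
        super().__init__()
        self.spatial = TinyStream(3, num_classes)
        self.temporal = TinyStream(flow_channels, num_classes)

    def forward(self, rgb, flow):
        return 0.5 * (self.spatial(rgb) + self.temporal(flow))

model = TwoStreamHAR(num_classes=24)          # UCF24 has 24 action classes
rgb = torch.randn(1, 3, 112, 112)             # one RGB frame
flow = torch.randn(1, 2, 112, 112)            # one optical-flow field (dx, dy)
print(model(rgb, flow).shape)                 # torch.Size([1, 24])
```

For deployment, a network like this would additionally be quantized (e.g., to low-bit weights and activations) before being mapped onto the streaming dataflow accelerator.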


Contributions


  • Developed a scalable inference accelerator for QHAR on top of the SimpleNet-PyTorch CNN and NetDBFPGA.
  • The accelerator fuses all convolution, batch-norm, and ReLU operations into a single homogeneous layer and uses the Lucas-Kanade motion-flow method, enabling an optimized on-chip compute engine on the FPGA; GPU, CPU, and Jetson platforms do not offer this fusion capability (a generic fusion sketch follows this list).
  • Provided a complete open-source framework (training-to-implementation stack) for QHAR on SoC-FPGAs and other hardware platforms.
  • Demonstrated that UCF24, the smaller subset of UCF101, positively affects performance, accuracy, resource utilization, and throughput.
  • The community can build upon our code to explore efficient implementations of multimodal fusion for comprehensive ADAS with HAR action understanding on low-power FPGAs, which are considered a solution for a wide range of autonomous applications.
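To make the layer-fusion contribution above concrete: at inference time a BatchNorm layer can be folded into the weights and bias of the preceding convolution, so the on-chip engine only needs a single Conv+ReLU operation per block. The snippet below is a generic PyTorch sketch of that fold, not the repository's HLS implementation; layer shapes are arbitrary.

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold BatchNorm statistics into the preceding convolution so that one Conv2d
    (followed by ReLU) reproduces Conv -> BN at inference time."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)       # gamma / sqrt(var + eps)
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    conv_bias = conv.bias.data if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.data = (conv_bias - bn.running_mean) * scale + bn.bias.data
    return fused

# Sanity check: the fused layer matches Conv -> BN in eval mode.
conv, bn = nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8)
bn.eval()
x = torch.randn(1, 3, 16, 16)
with torch.no_grad():
    print(torch.allclose(bn(conv(x)), fuse_conv_bn(conv, bn)(x), atol=1e-5))  # True
```

The optical-flow input mentioned above is commonly produced with a pyramidal Lucas-Kanade routine (for example OpenCV's calcOpticalFlowPyrLK); the exact preprocessing used by this repository may differ.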

Getting Started

Requirement

  • Local Nvidia GPU or Google Cloud GPU with Colab
  • Linux Ubuntu 18.04
  • Python 3.7+
  • PyTorch v1.12.0+
  • Vivado 2018.3
  • PYNQ framework 2.6
  • PYNQ-supported AMD SoC-FPGAs (e.g., Kria KV260 & ZCU104)

HW/SW training & implementation

  • PyTorch folder for training.
  • HLS folder for the synthesis of the accelerator.
  • PYNQ_Hardware folder for deployment on Xilinx SoC-FPGAs running PYNQ Linux (see the deployment sketch below).
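For orientation, deployment on a PYNQ board generally follows the pattern below, using the standard pynq Overlay, allocate, and DMA APIs. The bitstream name, buffer shapes, data types, and the axi_dma_0 IP name are placeholders, not the actual artifacts generated by the HLS and PYNQ_Hardware folders.

```python
import numpy as np
from pynq import Overlay, allocate

# Load the accelerator bitstream (placeholder name for the generated design).
overlay = Overlay("fpga_qhar.bit")

# Physically contiguous buffers for one quantized input frame and the class scores.
# Shapes and dtypes are illustrative and must match the HLS interface.
in_buf = allocate(shape=(3, 112, 112), dtype=np.uint8)
out_buf = allocate(shape=(24,), dtype=np.int32)

in_buf[:] = 0  # fill with a preprocessed, quantized camera frame here

# Stream the frame in and the scores out through an AXI DMA, if the design exposes one.
dma = overlay.axi_dma_0            # placeholder IP name; inspect overlay.ip_dict
dma.sendchannel.transfer(in_buf)
dma.recvchannel.transfer(out_buf)
dma.sendchannel.wait()
dma.recvchannel.wait()

print("Predicted class:", int(np.argmax(out_buf)))
```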

License

All source code is made available under a BSD 3-clause license. You can freely use and modify the code, without warranty, so long as you provide attribution to the authors. See LICENSE.md for the full license text.

The manuscript has been accepted and will be published soon as a conference paper in the IEEE Xplore Digital Library.

Citation

TBD

Acknowledgments

Inspiration, code snippets, references, etc.