FPGA-based acceleration of transposed convolution
- Input padding + convolution
- Removal of input buffer
- tiling output channels and input/output dim
- unrolling by 256 in input channel dim
- unrolling by 256 in output channel dim
- Multiple AXI ports
- 512-bit for input and kernel
- 256-bit for output and bias (due to channel tile size of 16)
- unrolling by 256 in input channel dim
- Multiple AXI ports
- Increase tile size in output channel to 32
- 512-bit for all ports
- all v2.1 optimizations
- additional unrolling by one kernel dim
- input 512-bit stream
- bias 16-bit stream
- kernel 512-bit stream
- output 256-bit stream
- output 512-bit stream
- output channel tile size 32
- conditional temp assignment within compute loop
- deeper streams
- bias 512-bit stream
- unrolling in kernel dimension
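
The "input padding + convolution" formulation listed at the top can be illustrated with a small software reference. The sketch below is not the FPGA kernel: it assumes a single channel, integer data, a square kernel, and `pad <= K - 1`, and the names `Mat`, `tconv_direct`, and `tconv_padded` are hypothetical. It only shows that inserting zeros between input pixels, padding the border, and running an ordinary convolution with the flipped kernel gives the same result as the direct (scatter) form of transposed convolution.

```cpp
#include <cassert>
#include <vector>

using Mat = std::vector<std::vector<int>>;

// Direct (scatter) transposed convolution, used only as a reference:
// every input pixel adds a scaled copy of the kernel into the output.
Mat tconv_direct(const Mat& in, const Mat& k, int stride, int pad) {
    const int H = static_cast<int>(in.size());
    const int W = static_cast<int>(in[0].size());
    const int K = static_cast<int>(k.size());  // assumed square kernel
    const int OH = (H - 1) * stride - 2 * pad + K;
    const int OW = (W - 1) * stride - 2 * pad + K;
    Mat out(OH, std::vector<int>(OW, 0));
    for (int i = 0; i < H; ++i)
        for (int j = 0; j < W; ++j)
            for (int a = 0; a < K; ++a)
                for (int b = 0; b < K; ++b) {
                    const int y = i * stride + a - pad;
                    const int x = j * stride + b - pad;
                    if (y >= 0 && y < OH && x >= 0 && x < OW)
                        out[y][x] += in[i][j] * k[a][b];
                }
    return out;
}

// Input padding + convolution: spread the input with (stride - 1) zeros
// between pixels, add a zero border of (K - 1 - pad), then slide the
// flipped kernel over the padded input with unit stride.
Mat tconv_padded(const Mat& in, const Mat& k, int stride, int pad) {
    const int H = static_cast<int>(in.size());
    const int W = static_cast<int>(in[0].size());
    const int K = static_cast<int>(k.size());  // assumed square kernel
    assert(pad <= K - 1);                      // assumption of this sketch
    const int border = K - 1 - pad;
    const int PH = (H - 1) * stride + 1 + 2 * border;
    const int PW = (W - 1) * stride + 1 + 2 * border;
    Mat padded(PH, std::vector<int>(PW, 0));
    for (int i = 0; i < H; ++i)
        for (int j = 0; j < W; ++j)
            padded[border + i * stride][border + j * stride] = in[i][j];

    const int OH = PH - K + 1;
    const int OW = PW - K + 1;
    Mat out(OH, std::vector<int>(OW, 0));
    for (int y = 0; y < OH; ++y)
        for (int x = 0; x < OW; ++x)
            for (int a = 0; a < K; ++a)
                for (int b = 0; b < K; ++b)
                    out[y][x] += padded[y + a][x + b] * k[K - 1 - a][K - 1 - b];
    return out;
}
```

In the padded form every output pixel is a plain gather over a K x K window, so the inner loops have no write conflicts; this is the property that makes tiling and unrolling of the kind listed above straightforward to apply.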