# GPGPU with GLES

A library for performing GPGPU (General-Purpose GPU) computations on the BeagleBone Black using OpenGL ES.

Originally conceived during Google Summer of Code 2021 for beagleboard.org.

Although this README contains most of the necessary details, for more in-depth descriptions and a development diary please visit the project blog.

## TL;DR

With this library you can accelerate your computations using the SGX GPU built into the BBB. Beware: GPU <=> CPU transfers incur an overhead, so choose your computations wisely!

The motivation for the project was the scarcity of heterogeneous computing on the BBB platform: most computations were done on the CPU, on the PRU, or on both, while the GPU on the SoC lay mostly untouched apart from the rare occasions where rendering was required. This project aims to change that!

Usually, one would simply use compute shaders and focus on writing efficient GPU computing code. However, we are limited to OpenGL ES 2.0, which does not support compute shaders. Therefore, we must resort to a trick: make the GPU believe it is rendering something, while our shaders actually just perform computations.
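The idea behind the trick can be sketched as a GLES2 fragment shader that performs the computation and writes the result into the output "pixels" of an offscreen framebuffer texture. This is an illustrative sketch only (the uniform/varying names are made up, not the library's actual shaders), showing a hypothetical element-wise addition of two arrays packed into textures:

```glsl
precision mediump float;

uniform sampler2D u_a;    // first input array, packed into a texture (illustrative name)
uniform sampler2D u_b;    // second input array (illustrative name)
varying vec2 v_texCoord;  // coordinate of the output "pixel" this fragment computes

void main() {
    // Each fragment computes one output element; instead of being shown
    // on screen, the result lands in a texture that is read back later.
    // Note: with plain 8-bit textures, values are clamped to [0, 1],
    // so real GPGPU shaders must pack/unpack their data accordingly.
    gl_FragColor = texture2D(u_a, v_texCoord) + texture2D(u_b, v_texCoord);
}
```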

The targeted chips are the SGX530 (on the BBB) and the SGX544 (on the BBAI - TODO: yet untested).

Support for other platforms should follow quite easily (the BBB was the target of this project and is therefore the preferred platform), assuming the system has OpenGL ES and EGL libraries preinstalled and exposes the appropriate video devices (/dev/dri/render*).

**Caveat:**

For the time being you need a headless dummy plug (similar to this one) to simulate a connected display device. This is being worked on with Imagination engineers here.

## The library structure

This library allows you to call either:

- a single operation (single-shot)
- a chain of same/different operations (chain)

In practice, you will most likely be calling a chain of commands that are offloaded to the GPU and collected only after they have all completed. Nonetheless, some operations may be heavy enough to be worth executing in a single shot!

## Preparing the environment

As mentioned earlier, this section guides you through the BBB-specific steps (noting which steps are common).

### Download and flash the image

In order to prepare your environment, you first need to install the BBB image containing the necessary libraries. Flashing is described in detail here.

You can also follow the steps listed here to move the system to on-board eMMC.

You can run the commands below either on the host or on the target, depending on where you want to run the library.

### Install prerequisites (on the target board)

```shell
sudo apt update && sudo apt install cmake
```

### Clone and build the repository

Clone the repository:

```shell
git clone https://github.com/JDuchniewicz/GPGPU-with-GLES
```

Create the build folder and enter it:

```shell
mkdir GPGPU-with-GLES/cmake-build && cd GPGPU-with-GLES/cmake-build
```

Run the cmake command:

```shell
cmake ..
```

If running on your host, you need to specify the EGL and OpenGL ES include directories:

```shell
cmake .. -DEGL_INCLUDE_DIR=/usr/include/ -DGLES2_INCLUDE_DIR=/usr/include
```

Finally, build it:

```shell
make
```

## Examples

Examples of how to use the library are located in the examples directory and are compiled into the bin folder; they can be run from there.

## Benchmarks

This repository contains benchmarking code, which you can extend to benchmark your own shaders and sequences of operations. The binary that runs these benchmarks is in the bin/benchmark directory.

The following two graphs show the performance of 2D convolution with a 5x5 kernel when called through the chain API and repeated twice:

More benchmarks can be seen in the benchmarking post.

Important! Refer to the benchmarks to assess whether your algorithm is suitable for GPGPU computing and, in particular, to choose the proper size of your data. Remember that texture sizes must be powers of 2, with a maximum of 2048 (on the BBB). Feel free to submit a PR with your proposed algorithm and benchmarks for it :)

## Development and extensions

All contributions are welcome! Most importantly, the library is still missing several operations:

- [ ] 1D convolution
- [ ] Matrix multiplication
- [ ] Your operation of choice :)