MultiGPU workflow of CARLsim4
This document explains the simulation workflow of CARLsim4 to leverage multiple Graphics Processing Units (GPUs).
The above figure shows the different states of the CARLsim simulator during execution. The general workflow follows that of CARLsim3, with three major states: the Config state, the Setup state, and the Runtime state. See http://www.socsci.uci.edu/~jkrichma/CARLsim/doc/ch2_basic_concepts.html for details on the general workflow. Here, we focus only on the additions to the CARLsim4 kernel for multi-GPU simulation.
In comparison to the CARLsim3 kernel, two major changes have been made in the CARLsim4 kernel:
- Modularity: The configuration and setup tasks of the workflow are aggregated into one module named SNN-Manager, and the runtime tasks are divided into two functionally equivalent modules, one CPU-based and one GPU-based, chosen according to the type of simulation.
- Partition of the network to GPU-local space: All network data structures are partitioned into new spaces local to each GPU. During runtime, each GPU needs only the data related to its own neurons, groups, and connections.
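The idea of a GPU-local index space can be illustrated with a minimal sketch. The `Group` and `LocalGroup` structs and the `partitionGroups` and `globalToLocalId` functions below are hypothetical names chosen for illustration; they are not CARLsim's actual data structures or API, and the real kernel partitions considerably more state than neuron ids.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified group record; not CARLsim's actual data structure.
struct Group {
    int globalStartId; // first global neuron id of the group
    int numNeurons;    // number of neurons in the group
    int gpuId;         // GPU this group is assigned to
};

// A group as seen from the GPU that owns it.
struct LocalGroup {
    int globalGroupId; // index of the group in the global list
    int localStartId;  // first neuron id in the GPU-local index space
    int numNeurons;
};

// Build one group list per GPU. Local neuron ids restart at 0 on every GPU,
// so each GPU only has to hold arrays sized for its own neurons.
std::vector<std::vector<LocalGroup>> partitionGroups(const std::vector<Group>& groups,
                                                     int numGPUs) {
    std::vector<std::vector<LocalGroup>> local(numGPUs);
    std::vector<int> nextLocalId(numGPUs, 0);
    for (int g = 0; g < static_cast<int>(groups.size()); ++g) {
        const int gpu = groups[g].gpuId;
        assert(gpu >= 0 && gpu < numGPUs);
        local[gpu].push_back(LocalGroup{g, nextLocalId[gpu], groups[g].numNeurons});
        nextLocalId[gpu] += groups[g].numNeurons;
    }
    return local;
}

// Translate a global neuron id into its GPU-local id.
// Returns -1 if the neuron does not live on the given GPU.
int globalToLocalId(const std::vector<Group>& groups,
                    const std::vector<LocalGroup>& localGroups, int globalId) {
    for (const LocalGroup& lg : localGroups) {
        const Group& g = groups[lg.globalGroupId];
        if (globalId >= g.globalStartId && globalId < g.globalStartId + g.numNeurons)
            return lg.localStartId + (globalId - g.globalStartId);
    }
    return -1;
}
```

For example, with three groups of 100, 50, and 30 neurons assigned to GPUs 0, 1, and 0, global neuron 155 (the sixth neuron of the third group) maps to local id 105 on GPU 0, while GPU 1 numbers its 50 neurons from 0 to 49.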
The SNN-Manager module is shown in the above figure. It performs the config and setup stages of the network, and these operations run on the CPU. The config stage is almost unchanged from CARLsim3. The most important change here is the user-defined 'numGPUs' parameter passed to the SNN constructor:
int numGPUs = 4;
CARLsim sim("netName", GPU_MODE, USER, numGPUs, 42);
The blue rectangle in the above figure shows the additions to the setup stage in CARLsim4. Their overall purpose is to convert the network data structures into the local index space of each GPU. We go through each of these functions below:
- [compileGroupConfig] Find the maximum delay for each group from its incoming connections
- [compileGroupConfig] Assign global neuron ids to each group in the order: Excitatory-Regular | Inhibitory-Regular | Inhibitory-Poisson | Excitatory-Poisson
- [collectGlobalNetworkConfig] Scan all connect configs to find the maximum delay in the global network
- [collectGlobalNetworkConfig] Scan all group configs to find the total number of (reg, pois, exc, inh) neurons in the global network
- TODO: verifyNetwork to perform all the consistency checks, such as 'numNeurons vs. sum of all neurons'
- Create groupPartitionLists: Assign each group to one of the GPUs. The user can select the GPU for each group.
- Create localConnectLists: Connection configs that connect local groups of each GPU
- Create externalConnectLists:
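
The first few setup steps listed above (per-group maximum delay, global neuron id assignment, and the global maximum delay) can be sketched as follows. This is an illustration under stated assumptions, not the CARLsim4 implementation: `GroupConfig`, `ConnectConfig`, `GroupType`, and the three functions are hypothetical, simplified stand-ins for the kernel's own structures, and a group with no incoming connection is assumed to keep a delay of 0.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical group categories, listed in the id-assignment order given above.
enum class GroupType { ExcRegular, InhRegular, InhPoisson, ExcPoisson };

// Hypothetical, simplified config records; CARLsim's own structures differ.
struct GroupConfig {
    GroupType type;
    int numNeurons;
    int maxDelay;      // filled by computeGroupMaxDelays
    int globalStartId; // filled by assignGlobalIds
};

struct ConnectConfig {
    int preGroup;  // index of the source group
    int postGroup; // index of the target group
    int maxDelay;  // largest synaptic delay of this connection
};

// Each group's maximum delay is the largest delay among its incoming connections.
void computeGroupMaxDelays(std::vector<GroupConfig>& groups,
                           const std::vector<ConnectConfig>& conns) {
    for (const ConnectConfig& c : conns) {
        GroupConfig& post = groups[c.postGroup];
        post.maxDelay = std::max(post.maxDelay, c.maxDelay);
    }
}

// Assign contiguous global neuron ids, visiting the groups category by category.
// Returns the total number of neurons in the network.
int assignGlobalIds(std::vector<GroupConfig>& groups) {
    const GroupType order[] = {GroupType::ExcRegular, GroupType::InhRegular,
                               GroupType::InhPoisson, GroupType::ExcPoisson};
    int nextId = 0;
    for (GroupType t : order) {
        for (GroupConfig& g : groups) {
            if (g.type == t) {
                g.globalStartId = nextId;
                nextId += g.numNeurons;
            }
        }
    }
    return nextId;
}

// Scan all connection configs for the maximum delay in the whole network.
int findGlobalMaxDelay(const std::vector<ConnectConfig>& conns) {
    int maxDelay = 0;
    for (const ConnectConfig& c : conns) maxDelay = std::max(maxDelay, c.maxDelay);
    return maxDelay;
}
```

Because ids are handed out per category rather than in creation order, a group created first may still receive the highest ids if it is an Excitatory-Poisson group. The returned neuron total is the kind of value a consistency check such as the planned verifyNetwork could compare against.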