Memory consumption of VoxelNet limits the number of muons and voxels that can be used #97

Open · GilesStrong opened this issue Mar 25, 2022 · 0 comments

Labels: enhancement (New feature or request) · Functionality (Issue adds to the functionality of the package) · medium priority (Should be fixed soon, but doesn't disastrously impact project)

Problem

VoxelNet acts on tensors of shape (volumes, voxels, muons, features) and, as part of its graph construction, expands these into (volumes, voxels, muons, muons, new features) before collapsing back to the original shape. Although the forward method loops over the volumes (so the shape actually being processed is just (voxels, muons, features)), the memory consumption is still very high, since the pairwise expansion grows quadratically with the number of muons.
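For scale, a minimal sketch of why the pairwise expansion dominates memory (illustrative sizes and tensor names only, not the actual VoxelNet code):

```python
import torch

n_voxels, n_muons, n_feats = 100, 200, 16  # illustrative sizes only

x = torch.randn(n_voxels, n_muons, n_feats)  # one volume: (voxels, muons, features)

# Graph construction pairs every muon with every other muon in each voxel:
# concatenating receiver and sender features materialises a
# (voxels, muons, muons, 2 * features) tensor.
pairs = torch.cat(
    (
        x.unsqueeze(2).expand(-1, -1, n_muons, -1),  # receiver features (view, free)
        x.unsqueeze(1).expand(-1, n_muons, -1, -1),  # sender features (view, free)
    ),
    dim=-1,
)  # the cat makes it real: 100 * 200 * 200 * 32 floats * 4 B ≈ 0.5 GB here

# Doubling the number of muons quadruples this intermediate.
print(pairs.shape, f"{pairs.numel() * pairs.element_size() / 1e9:.2f} GB")
```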

Potential solutions

Loop over voxels in the first part of the network

The first part of the network computes a muon representation per voxel, i.e. a (voxels, muon representation) tensor, and this computation is independent of the other voxels. This means the muon representations could be computed serially rather than in parallel, reducing memory consumption at the cost of processing time (see the sketch below).
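A minimal sketch of the serial alternative, assuming a hypothetical `muon_rep_net` module that maps one voxel's (muons, features) tensor to a fixed-size representation:

```python
import torch
from torch import Tensor, nn

def muon_reps_serial(x: Tensor, muon_rep_net: nn.Module) -> Tensor:
    """Compute the per-voxel muon representation serially.

    x: (voxels, muons, features); muon_rep_net maps (muons, features) -> (rep_feats,).
    Graph-construction temporaries now exist for one voxel at a time instead of
    for all voxels at once, trading processing time for memory.
    """
    reps = [muon_rep_net(voxel) for voxel in x.unbind(0)]  # serial loop over voxels
    return torch.stack(reps, dim=0)  # (voxels, rep_feats)
```

Wrapping each call in `torch.utils.checkpoint.checkpoint` would additionally stop autograd from keeping every voxel's activations alive for the backward pass, at the cost of recomputing them.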

Compile parts of the network

PyTorch makes it "easy" to compile parts of the network in C++ and CUDA. According to Jan Kieseler, this heavily reduces memory consumption and processing time, at the cost of development time and model flexibility. He has sent me some examples, and I have also gone through the official PyTorch tutorial on writing and compiling kernels. The main difficulty is that the backward pass to compute the gradients must also be written manually, and how optimally it is written can have a heavy impact on performance: in my testing of PyTorch's examples, the backward pass was actually slower when compiled, but the forward pass was slightly quicker. The sketch below shows the structure such a custom op takes.
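A toy sketch of the forward/backward structure involved, written as a pure-Python `torch.autograd.Function` rather than a compiled extension (a C++/CUDA op follows the same pattern; the op itself is made up for illustration):

```python
import torch

class PairwiseSqDist(torch.autograd.Function):
    """Toy custom op: squared pairwise distances with a hand-written backward.

    A compiled C++/CUDA extension has the same forward/backward structure; this
    pure-Python version only illustrates what must be derived and written by hand.
    """

    @staticmethod
    def forward(ctx, x):  # x: (muons, features)
        diff = x.unsqueeze(1) - x.unsqueeze(0)  # (muons, muons, features)
        ctx.save_for_backward(diff)  # a fused kernel would recompute this instead
        return diff.pow(2).sum(-1)  # (muons, muons)

    @staticmethod
    def backward(ctx, grad_out):  # grad_out: (muons, muons)
        (diff,) = ctx.saved_tensors
        # dL/dx_i = 2 * sum_j (g_ij + g_ji) * (x_i - x_j)
        g = grad_out + grad_out.transpose(0, 1)  # symmetrise incoming gradient
        return 2 * (g.unsqueeze(-1) * diff).sum(1)

# Verifying the manual backward against autograd's numerical check:
x = torch.randn(5, 3, dtype=torch.double, requires_grad=True)
assert torch.autograd.gradcheck(PairwiseSqDist.apply, (x,))
```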

There are several parts of the GNN that are candidates for compilation:

  • When expanding to the (voxels, muons, muons, new features) tensor, what we actually want is (voxels, muons, k-nearest muons, new features), but we currently still have to compute and store the distances between all pairs of muons. This kNN indexing could be compiled to reduce memory consumption (see the first sketch after this list).
  • The (voxels, muons, k-nearest muons, new features) tensor is then collapsed by aggregating across the k-nearest muons into (voxels, muons, aggregate features). The whole kNN + aggregation could be compiled to save even more memory, at the cost of some model flexibility.
  • In the graph-collapse stage, where we convert (voxels, muons, aggregate features) to (voxels, muon representation), the muon features go through a self-attention step which internally computes a temporary (voxels, muons, muons) tensor. This could also be compiled to save memory (see the second sketch after this list).
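A sketch of the current uncompiled kNN + aggregation pattern (hypothetical function and tensor names), annotated with the intermediates a fused kernel could avoid materialising:

```python
import torch

def knn_aggregate(x: torch.Tensor, feats: torch.Tensor, k: int) -> torch.Tensor:
    """x: (voxels, muons, coord_feats), feats: (voxels, muons, new_feats).

    Returns (voxels, muons, new_feats), aggregated over each muon's k nearest
    neighbours (which here include the muon itself, since its distance is zero).
    """
    # Full pairwise distances: a (voxels, muons, muons) temporary that a
    # compiled kernel could stream through without ever materialising.
    dists = torch.cdist(x, x)
    idx = dists.topk(k, dim=-1, largest=False).indices  # (voxels, muons, k)

    # Gathered neighbour features: (voxels, muons, k, new_feats), the second
    # large temporary a fused kNN + aggregation kernel would avoid.
    n_voxels, n_muons, n_feats = feats.shape
    nbr = feats.unsqueeze(1).expand(-1, n_muons, -1, -1).gather(
        2, idx.unsqueeze(-1).expand(-1, -1, -1, n_feats)
    )
    return nbr.mean(2)  # aggregate across the k-nearest muons
```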
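And a sketch of the self-attention step's quadratic temporary (a made-up single-head attention for illustration, not the actual VoxelNet layer):

```python
import torch
import torch.nn.functional as F

def self_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """q, k, v: (voxels, muons, d)."""
    # The (voxels, muons, muons) score matrix is the temporary a fused kernel
    # would avoid, computing the softmax-weighted sum block-wise instead.
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v  # (voxels, muons, d)
```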