Update GeoFormer to use CUDA 11.3, PyTorch 1.11.0, and spconv 2.3.6 #2
Hello, I am interested in using GeoFormer, but my GPU does not support CUDA 10.2, so I had to update the dependencies to try it out. I ran into some issues along the way, so this is part PR, part issue. I hope you can give me some guidance so that I can get GeoFormer running and complete the PR.
The error
When I run test.py or test_fs.py, there are no detections returned, and then at scene 77 the following error is raised:
I traced the issue with a breakpoint: at some point the scores are all too low, so an empty list is returned.

According to the spconv documentation, the weight layout in 1.x is RSCK, while 2.x uses RSKC or KRSC. With the 2.x default set, the model weights only load if I uncomment your weight permutation in the loading script. However, I then get the empty detections and the error described above. I have also tried commenting those lines out again and setting the weight layout to "RSCK", "RSKC", and "KRSC", but every one of these results in shape mismatches between the model definition and the loaded weights.

I suspect the empty detections come from the weights being loaded in the wrong order. Perhaps the input and output channels are swapped, or the weight for the x coordinate is swapped with the weight for z, etc.? That would produce correct layer sizes but spurious model results.

Could you print out the weight values of one of the model layers as it loads for you with spconv 1.x and CUDA 10.2? If I know the values, I can check whether the weights are being loaded in the correct order.
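To illustrate the suspicion above, here is a minimal sketch in plain Python of how the axis order for a layout conversion can be derived, and why a shape check alone cannot catch a wrong order when a layer has equal input and output channel counts. The helper names (`expand_layout`, `layout_permutation`, `permuted_shape`) are hypothetical and not part of spconv or this repository. It assumes 3D kernels, with R and S standing for the kernel's spatial axes, C for input channels, and K for output channels.

```python
# Hypothetical helpers, not part of spconv or GeoFormer.
# Assumption: 3D kernels, so the "RS" block covers three spatial axes.
SPATIAL = ("kd", "kh", "kw")


def expand_layout(layout):
    """Turn a layout string such as 'RSCK' into named axes."""
    axes = []
    for block in layout.replace("RS", "X"):  # treat "RS" as one spatial block
        axes.extend(SPATIAL if block == "X" else (block,))
    return tuple(axes)


def layout_permutation(src, dst):
    """Axis order to pass to tensor.permute() to convert src into dst."""
    src_axes = expand_layout(src)
    return tuple(src_axes.index(axis) for axis in expand_layout(dst))


def permuted_shape(shape, perm):
    """Shape a tensor would have after permute(*perm)."""
    return tuple(shape[i] for i in perm)


to_krsc = layout_permutation("RSCK", "KRSC")
to_rskc = layout_permutation("RSCK", "RSKC")

# A layer with C != K: swapping C and K changes the shape, so a wrong
# layout shows up as a shape mismatch at load time.
uneven = (3, 3, 3, 16, 32)
print(permuted_shape(uneven, to_rskc))

# A layer with C == K: the shape is unchanged after swapping C and K, so
# a wrong layout loads without error but scrambles the weights.
square = (3, 3, 3, 32, 32)
print(permuted_shape(square, to_rskc) == square)
```

Because square layers pass the shape check under either channel order, comparing actual weight values for one layer between the two environments is the more reliable test.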
Changes Made
I've added a Dockerfile that uses CUDA 11.3, spconv 2.3.6 for CUDA 11.3, and PyTorch 1.11.0. PyTorch was upgraded to this version to avoid this bug with MinkowskiEngine. If you use my Dockerfile, note that I installed pointgroup_ops and pointnet2 from within the running container rather than during the image build. You may not encounter this problem, but I could not get Docker to recognize CUDA during the build in order to install those packages, whereas it was recognized inside the container.
I removed THC,
and updated the imports for spconv.
Those are the relevant changes. The other changes were made automatically by my Python linter.
Please let me know if you have any other thoughts on why the detections returned are empty.