This repository contains the code for my thesis on LiDAR-based UAV pose estimation in ocean environments. The work presents a deep point transformer network to predict UAV poses from LiDAR point cloud data.
For more details, refer to the full thesis:
Simon, Karl Philip. 2025. Ship-relative UAV pose estimation with 3D LiDAR. MS thesis, The George Washington University. ProQuest.
- Python 3.8 or higher
- PyTorch
- For additional dependencies, see the original PointTransformer repository.
- Clone the repository:

  ```shell
  git clone https://github.com/fdcl-gwu/point-transformer.git
  cd point-transformer
  ```

- To train the model, run:

  ```shell
  python train_pose.py
  ```

- Training output is written to `/log/pose_estimation` and contains the model `.pth` file.
- To run inference, execute:

  ```shell
  python test_pose.py
  ```

Predictions are saved as a JSON file named `results.json` with the following structure:
```jsonc
{
  "file": "000001.txt",                                                    // Name of the input file
  "gt_kp": [[x1, y1, z1], [x2, y2, z2], ...],                              // Ground-truth keypoints (list of 3D coordinates)
  "pred_kp": [[x1, y1, z1], [x2, y2, z2], ...],                            // Predicted keypoints (list of 3D coordinates)
  "gt_rotation": [[r11, r12, r13], [r21, r22, r23], [r31, r32, r33]],      // Ground-truth rotation matrix
  "gt_translation": [tx, ty, tz],                                          // Ground-truth translation vector
  "pred_rotation": [[r11, r12, r13], [r21, r22, r23], [r31, r32, r33]],    // Predicted rotation matrix
  "pred_translation": [tx, ty, tz],                                        // Predicted translation vector
  "scale": s,                                                              // Scale factor used to normalize the input point cloud
  "centroid": [cx, cy, cz],                                                // Centroid of the input point cloud
  "points": [[x1, y1, z1], [x2, y2, z2], ...],                             // Original input point cloud (list of 3D points)
  "refined_rotation": [[r11, r12, r13], [r21, r22, r23], [r31, r32, r33]], // Refined rotation matrix after GICP
  "refined_translation": [tx, ty, tz],                                     // Refined translation vector after GICP
  "gicp_converged": true,                                                  // Boolean indicating if GICP refinement converged
  "gicp_iterations": 3                                                     // Number of iterations performed by GICP
}
```
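As an illustration of how the saved fields can be consumed, the rotation and translation error of one `results.json` entry can be computed as below. This is a minimal sketch, not part of the repository; the `entry` dict is synthetic, standing in for one record loaded via `json.load`.

```python
import math

def rotation_error_deg(R_gt, R_pred):
    # Geodesic distance: the angle of the relative rotation R_gt^T R_pred,
    # recovered from its trace via acos((trace - 1) / 2).
    trace = sum(R_gt[i][j] * R_pred[i][j] for i in range(3) for j in range(3))
    c = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp against rounding
    return math.degrees(math.acos(c))

def translation_error(t_gt, t_pred):
    # Euclidean distance between the two translation vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t_gt, t_pred)))

# Synthetic record mimicking one entry of results.json
# (in practice: records = json.load(open("results.json"))).
entry = {
    "gt_rotation": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    "pred_rotation": [[0, -1, 0], [1, 0, 0], [0, 0, 1]],  # 90 deg about z
    "gt_translation": [0.0, 0.0, 0.0],
    "pred_translation": [0.1, 0.0, 0.0],
}
rot_err = rotation_error_deg(entry["gt_rotation"], entry["pred_rotation"])
trans_err = translation_error(entry["gt_translation"], entry["pred_translation"])
```

The same two functions can be reused with `refined_rotation`/`refined_translation` to quantify how much the GICP step improves the network's raw prediction.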
The dataloader expects the dataset to follow this structure:

```text
Dataset/
└── gazebo-metadata/
    ├── clouds/     # Point cloud .txt files (1024 points each, normalized)
    ├── poses/      # Ground-truth poses in quaternion format
    └── keypoints/  # Ship keypoints (x, y, z) for alignment
```

Note: `gazebo-metadata` is just an intermediate directory and can be renamed or left as a placeholder.
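The per-cloud normalization noted in the tree above (center at the centroid, scale into the unit sphere) can be sketched as follows. This is a hypothetical helper for illustration, not the actual code in `SimNetDataLoader.py`:

```python
import numpy as np

def normalize_to_unit_sphere(points):
    """Center a cloud at its centroid and scale it into the unit sphere."""
    centroid = points.mean(axis=0)
    centered = points - centroid
    scale = np.linalg.norm(centered, axis=1).max()  # farthest point from centroid
    return centered / scale, centroid, scale

# Synthetic stand-in for one clouds/*.txt file (1024 x 3).
rng = np.random.default_rng(0)
cloud = rng.normal(size=(1024, 3))
normalized, centroid, scale = normalize_to_unit_sphere(cloud)
```

Returning `centroid` and `scale` alongside the normalized cloud lets a caller undo the normalization later, which matches the `scale` and `centroid` fields recorded in `results.json`.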
- All files must be text files, and must be named sequentially as `000000.txt`, `000001.txt`, etc.
- Point clouds must contain 1024 points, scaled to the unit sphere (see `SimNetDataLoader.py` for details).
- Poses are represented as quaternions and resolved in the ship frame. Each pose has the following structure:

  ```json
  {
    "x": -3.0525446766862956,
    "y": 1.8613828372730112,
    "z": 2.7186098109594017,
    "qx": 0.025889202460656503,
    "qy": 0.1006027945007082,
    "qz": -0.14496384882175387,
    "qw": 0.9839686526863461
  }
  ```
- The code uses 40 keypoints, each containing `x`, `y`, and `z` coordinates.
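Since poses are stored as quaternions while `results.json` reports rotation matrices, a conversion happens somewhere in the pipeline. A minimal sketch of the standard unit-quaternion-to-rotation-matrix formula (for illustration only, not the repository's own code) is:

```python
import math

def quat_to_rotmat(qx, qy, qz, qw):
    """Convert a quaternion (x, y, z, w order, as in the pose files) to a 3x3 rotation matrix."""
    # Normalize defensively in case the stored quaternion drifted from unit length.
    n = math.sqrt(qx*qx + qy*qy + qz*qz + qw*qw)
    qx, qy, qz, qw = qx / n, qy / n, qz / n, qw / n
    return [
        [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - qz*qw),     2*(qx*qz + qy*qw)],
        [2*(qx*qy + qz*qw),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - qx*qw)],
        [2*(qx*qz - qy*qw),     2*(qy*qz + qx*qw),     1 - 2*(qx*qx + qy*qy)],
    ]

R = quat_to_rotmat(0.0, 0.0, 0.0, 1.0)  # identity quaternion
```

Note the (x, y, z, w) ordering used here matches the `qx`/`qy`/`qz`/`qw` keys in the pose files; some libraries expect (w, x, y, z) instead, so check the convention before mixing in external tooling.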