This repository contains a complete training pipeline that integrates the NVIDIA TAO Toolkit with Valohai, using the KITTI dataset for object detection with the DetectNet_v2 model.
This project demonstrates how to:
- Preprocess KITTI dataset images and labels into TFRecord format
- Configure and train a DetectNet_v2 model using NVIDIA TAO Toolkit
- Evaluate the trained model's performance
- Run the entire workflow using a Valohai pipeline
- `load_data.py`: Downloads and preprocesses the KITTI data (images, labels, specs)
- `train.py`: Launches DetectNet_v2 training with NVIDIA TAO
- `evaluate.py`: Evaluates the trained model using the TAO Toolkit
- `valohai.yaml`: Defines the pipeline and steps for Valohai execution
- `requirements.txt`: Lists the Python packages required for preprocessing and training orchestration
Before running the project, make sure you add your `NGC_API_KEY` to your Valohai project's registry settings:
- Image pattern: `nvcr.io/*`
- Username: `$oauthtoken`
- Password: `YOUR_NGC_API_KEY`
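An NGC API key can be generated under Setup → API Key in your NVIDIA NGC account (ngc.nvidia.com); these credentials let Valohai pull the TAO container images from `nvcr.io`.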
The pipeline automates the full model development workflow:
- Downloads KITTI object detection dataset (images and labels)
- Extracts, parses, and optionally subsets the dataset
- Converts data to TFRecord format compatible with TAO Toolkit
- Outputs zipped datasets for Valohai input versioning
Parameters:
- `subset`: Number of images to include
- `num_plot_images`: Number of sample images to visualize during preprocessing
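As a rough illustration of how a preprocessing step like this plugs into Valohai, the sketch below reads the parameters above and writes a zipped TFRecord archive to Valohai's output directory so it can be versioned and passed to the next step. It is a minimal sketch of the pattern, not the repository's actual `load_data.py`; the local `tfrecords` directory and the output file name are assumptions.

```python
import argparse
import os
import zipfile

# Valohai picks up anything written to /valohai/outputs/ as a versioned artifact.
OUTPUTS_DIR = os.getenv("VH_OUTPUTS_DIR", "/valohai/outputs")

parser = argparse.ArgumentParser()
parser.add_argument("--subset", type=int, default=1000,
                    help="Number of KITTI images to include")
parser.add_argument("--num_plot_images", type=int, default=3,
                    help="Number of sample images to visualize")
args = parser.parse_args()

# ... download, extract, and subset KITTI, then run TAO's dataset conversion
# so that a local `tfrecords/` directory holds the converted records ...

# Zip the generated TFRecords so the training step can consume them as one input.
os.makedirs(OUTPUTS_DIR, exist_ok=True)
with zipfile.ZipFile(os.path.join(OUTPUTS_DIR, "tfrecords.zip"), "w") as zf:
    for root, _, files in os.walk("tfrecords"):
        for name in files:
            path = os.path.join(root, name)
            zf.write(path, arcname=os.path.relpath(path, "tfrecords"))
```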
- Trains a DetectNet_v2 model using NVIDIA's TAO Toolkit Docker container
- Uses the TFRecords and spec files created in the previous step
- Saves the resulting `.hdf5` model file and logs
- Outputs training progress
Configurable parameters:
- `epochs`
- `batch_size_per_gpu`
- `use_batch_norm`
- `val_split`
Check the training spec file for more configurable parameters.
Environment variables (defined in `valohai.yaml`) handle:
- GPU usage
- TAO Docker flags
- Output and data directories
- NGC API authentication
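To make the training step concrete, the sketch below shows one plausible way a wrapper like `train.py` could inject the Valohai parameters into the experiment spec and launch DetectNet_v2 training inside the TAO container. The `num_epochs` and `batch_size_per_gpu` fields come from the public DetectNet_v2 training spec, but the wrapper itself, the input paths, and the use of `NGC_API_KEY` as the model key are illustrative assumptions, and the exact entrypoint name can differ between TAO Toolkit versions.

```python
import argparse
import os
import re
import subprocess

parser = argparse.ArgumentParser()
parser.add_argument("--epochs", type=int, default=120)
parser.add_argument("--batch_size_per_gpu", type=int, default=4)
args = parser.parse_args()

spec_path = "/valohai/inputs/specs/detectnet_v2_train.txt"  # hypothetical input name
results_dir = os.getenv("VH_OUTPUTS_DIR", "/valohai/outputs")
ngc_key = os.environ["NGC_API_KEY"]  # used as the TAO model encryption key here

# Patch the experiment spec with the Valohai parameter values.
with open(spec_path) as f:
    spec = f.read()
spec = re.sub(r"num_epochs:\s*\d+", f"num_epochs: {args.epochs}", spec)
spec = re.sub(r"batch_size_per_gpu:\s*\d+",
              f"batch_size_per_gpu: {args.batch_size_per_gpu}", spec)
patched_spec = "/tmp/train_spec.txt"
with open(patched_spec, "w") as f:
    f.write(spec)

# Inside the TAO container the DetectNet_v2 entrypoint is available directly.
subprocess.run(
    ["detectnet_v2", "train", "-e", patched_spec, "-r", results_dir, "-k", ngc_key],
    check=True,
)
```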
- Evaluates trained `.hdf5` models using TFRecords and the original validation data
- Generates metrics (e.g., precision, recall) and visual output snapshots
- Uses the same TAO container and config setup as training
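An evaluation wrapper can follow the same pattern; the sketch below calls TAO's `detectnet_v2 evaluate` on the trained model and lets the metrics print to the execution log. The input paths and key handling are again assumptions rather than the repository's actual `evaluate.py`.

```python
import os
import subprocess

spec_path = "/valohai/inputs/specs/detectnet_v2_train.txt"  # hypothetical input name
model_path = "/valohai/inputs/model/model.hdf5"             # trained model from the previous step
ngc_key = os.environ["NGC_API_KEY"]

# Runs TAO's built-in evaluation, which reports per-class metrics such as AP.
subprocess.run(
    ["detectnet_v2", "evaluate", "-e", spec_path, "-m", model_path, "-k", ngc_key],
    check=True,
)
```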
```
vh pipeline run train_and_evaluate
```
This command will:
- Load and preprocess the KITTI dataset
- Train a DetectNet_v2 model using TAO Toolkit
- Evaluate the trained model
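Running this requires the Valohai CLI (`pip install valohai-cli`) and a project linked to this repository with `vh project link`; the executions themselves then run on Valohai workers rather than on your local machine.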
This project uses the KITTI Object Detection dataset:
- Images: KITTI Images
- Labels: KITTI Labels
Spec files must follow the TAO Toolkit formatting.
The training pipeline uses DetectNet_v2, an NVIDIA TAO object detection architecture optimized for real-time applications. Model and training parameters are defined in spec files, which are version-controlled and passed via Valohai inputs.
Ensure your Valohai executions install required packages:
```
pip install -r requirements.txt
```
TAO Toolkit itself runs within NVIDIA’s prebuilt containers.
This project uses the tools and datasets listed below. See each project's license for details.
- NVIDIA TAO Toolkit for powerful model training
- Valohai for automating machine learning workflows
- KITTI Dataset by Karlsruhe Institute of Technology & Toyota Technological Institute at Chicago