Azure-Samples/azure-nvidia-robotics-reference-architecture
πŸ€– Azure Robotics Reference Architecture with NVIDIA OSMO

This reference architecture provides a production-ready framework for orchestrating robotics and AI workloads on Microsoft Azure using NVIDIA technologies such as Isaac Lab, Isaac Sim, and OSMO. It demonstrates end-to-end reinforcement learning workflows, scalable training pipelines, and deployment processes with Azure-native authentication, storage, and ML services.

πŸš€ Key Features

OSMO handles workflow orchestration and job scheduling, while Azure provides elastic GPU compute, persistent checkpointing, MLflow experiment tracking, and enterprise-grade security.

  • Infrastructure as Code - Terraform modules referencing microsoft/edge-ai components for reproducible deployments
  • Containerized Workflows - Docker-based Isaac Lab training with NVIDIA GPU support
  • CI/CD Integration - Automated deployment pipelines with GitHub Actions
  • MLflow Integration - Automatic experiment tracking and model versioning
    • Automatic metric logging from SKRL agents to Azure ML
    • Comprehensive tracking of episode statistics, losses, optimization metrics, and timing data
    • Configurable logging intervals and metric filtering
    • See MLflow Integration Guide for details
  • Scalable Compute - Auto-scaling GPU nodes based on workload demands
  • Cost Optimization - Pay-per-use compute with automatic scaling
  • Enterprise Security - Entra ID integration
  • Global Deployment - Multi-region support for worldwide teams

πŸ—Ό Architecture Overview

This reference architecture integrates:

  • NVIDIA OSMO - Workflow orchestration and job scheduling
  • Azure Machine Learning - Experiment tracking and model management
  • Azure Kubernetes Service - Software in the Loop (SIL) training
  • Azure Arc for Kubernetes - Software in the Loop (SIL) and Hardware in the Loop (HIL) training
  • Azure Storage - Persistent data and checkpoint storage
  • Azure Key Vault - Secure credential management
  • Azure Monitor - Comprehensive logging and metrics

🌍 Real World Examples

OSMO orchestration on Azure enables production-scale robotics training across industries. Some examples include:

  • Warehouse AMRs - Train navigation policies with 1000+ parallel environments on auto-scaling AKS GPU nodes, checkpoint to Azure Storage, track experiments in Azure ML
  • Manufacturing Arms - Develop manipulation strategies with physics-accurate simulation, leveraging Azure's global regions for distributed teams and pay-per-use GPU compute
  • Legged Robots - Optimize locomotion policies with MLflow experiment tracking for sim-to-real transfer
  • Collaborative Robots - Create safe interaction policies with Azure Monitor logging and metrics, enabling compliance auditing and performance diagnostics at scale

See OSMO workflow examples for job configuration templates.

πŸ§‘πŸ½β€πŸ’» Prerequisites and Requirements

Required Tools

Azure Requirements

  • Azure subscription with contributor access
  • Sufficient quota for GPU VMs (Standard_NC6s_v3 or higher)
  • Azure Machine Learning workspace (or permissions to create one)

NVIDIA Requirements

  • NVIDIA Developer account with OSMO access
  • NGC API key for container registry access

πŸƒβ€βž‘οΈ Quick Start

```shell
./setup-dev.sh
```

The setup script installs Python 3.11 via uv, creates a virtual environment at .venv/, and installs training dependencies.

Install hve-core for Copilot-assisted development workflows. The simplest method is the VS Code Extension.

VS Code Configuration

The workspace is configured with `python.analysis.extraPaths` pointing to `src/`, enabling imports like:

```python
from training.utils import AzureMLContext, bootstrap_azure_ml
```

Select the `.venv/bin/python` interpreter in VS Code for IntelliSense support.

The workspace .vscode/settings.json also configures Copilot Chat to load instructions, prompts, and chat modes from hve-core:

| Setting | hve-core Paths |
| --- | --- |
| `chat.modeFilesLocations` | `../hve-core/.github/chatmodes`, `../hve-core/copilot/beads/chatmodes` |
| `chat.instructionsFilesLocations` | `../hve-core/.github/instructions`, `../hve-core/copilot/beads/instructions` |
| `chat.promptFilesLocations` | `../hve-core/.github/prompts`, `../hve-core/copilot/beads/prompts` |

These paths resolve when hve-core is installed as a peer directory or via the VS Code Extension. Without hve-core, Copilot still functions but shared conventions, prompts, and chat modes are unavailable.
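Translated into `.vscode/settings.json`, those entries would look roughly like this. This is a sketch based on the paths listed above; VS Code's Copilot Chat settings map each location to an enablement flag, and the exact schema may vary by VS Code version:

```json
{
  "chat.modeFilesLocations": {
    "../hve-core/.github/chatmodes": true,
    "../hve-core/copilot/beads/chatmodes": true
  },
  "chat.instructionsFilesLocations": {
    "../hve-core/.github/instructions": true,
    "../hve-core/copilot/beads/instructions": true
  },
  "chat.promptFilesLocations": {
    "../hve-core/.github/prompts": true,
    "../hve-core/copilot/beads/prompts": true
  }
}
```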

πŸ§ͺ Running Tests

Once a tests/ directory exists, run the test suite:

```shell
uv run pytest tests/
```

Run tests selectively by category:

```shell
# Unit tests only (fast, no external dependencies)
uv run pytest tests/ -m "not slow and not gpu"
```

See the Testing Requirements section in CONTRIBUTING.md for test organization, markers, and coverage targets.
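The `slow` and `gpu` markers deselected above would be applied roughly like this. This is a hypothetical test module for illustration; the repository's actual marker set and registration live in its pytest configuration:

```python
# Hypothetical tests/test_example.py illustrating the marker convention.
# Markers would normally also be registered under [tool.pytest.ini_options]
# (or pytest.ini) to avoid "unknown marker" warnings.
import pytest


@pytest.mark.slow  # deselected by: uv run pytest -m "not slow and not gpu"
def test_full_training_run():
    assert True  # placeholder for an end-to-end training check


def test_unit_fast():  # unmarked: always selected, no external dependencies
    assert 1 + 1 == 2
```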

🧹 Cleanup and Uninstall

Reverse the changes made by setup-dev.sh and tear down deployed infrastructure.

Remove Development Environment

```shell
# Remove Python virtual environment
rm -rf .venv

# Remove cloned IsaacLab repository
rm -rf external/IsaacLab

# Remove Node.js linting dependencies (if installed separately via npm install)
rm -rf node_modules

# Remove uv cache (optional, frees disk space)
uv cache clean
```

Destroy Azure Infrastructure

```shell
cd deploy/001-iac
terraform destroy -var-file=terraform.tfvars
```

> [!WARNING]
> `terraform destroy` permanently deletes all deployed Azure resources including storage, AKS clusters, and Key Vault. Ensure training data and model checkpoints are backed up before destroying infrastructure.

See Cost Considerations for details on resource costs and cleanup timing.

🧱 Repository Structure

```text
.
├── deploy/
│   ├── 000-prerequisites/              # Prerequisites validation and setup
│   ├── 001-iac/                        # Infrastructure as Code deployment
│   ├── 002-setup/                      # Post-infrastructure setup
│   ├── 003-data/                       # Data preparation and upload
│   └── 004-workflow/                   # Training workflow execution
│       ├── job-templates/              # Job configuration templates
│       └── osmo/                       # OSMO inline workflow submission (see osmo/README.md)
├── src/
│   ├── terraform/                      # Infrastructure as Code
│   │   └── modules/                    # Reusable Terraform modules
│   └── training/                       # Training code and tasks
│       ├── common/                     # Shared utilities
│       ├── scripts/                    # Framework-specific training scripts configured for Azure services
│       │   ├── rsl_rl/                 # RSL_RL training scripts
│       │   └── skrl/                   # SKRL training scripts
│       └── tasks/                      # Placeholder for Isaac Lab training tasks
```

πŸͺͺ License

This project is licensed under the MIT License. See LICENSE for details.

πŸ”’ Security

Review security guidance before deploying this reference architecture:

  • SECURITY.md - vulnerability reporting and security considerations for deployers
  • Security Guide - detailed security configuration inventory, deployment responsibilities, and checklist

🀝 Support

For issues and questions:

πŸ—ΊοΈ Roadmap

See the project roadmap for priorities, timelines, and success metrics covering Q1 2026 through Q1 2027.

πŸ™ Acknowledgments

This reference architecture builds upon:
