OM1 Video Processor

A Docker-based video streaming solution for OpenMind that captures video from local cameras, performs face recognition, and streams the results to OpenMind's video ingestion API.

Overview

This tool uses the OM1 modules to create an intelligent video streaming pipeline that:

Captures video from a local camera (e.g., /dev/video0)
Performs real-time face recognition with bounding boxes and name overlays
Captures audio from a microphone (e.g., default_mic_aec)
Streams the processed video and audio directly to OpenMind's video ingestion API via RTSP

Features

GPU-accelerated processing: Optimized for NVIDIA Jetson platforms with CUDA support
Real-time face recognition: Live face detection with bounding boxes and name overlays
Audio capture: Integrated microphone support with PulseAudio
Direct RTSP streaming: Streams directly to OpenMind's API without intermediate relay
Automatic restart: All services restart automatically if they fail
FPS monitoring: Real-time performance metrics display
Configurable devices: Supports multiple camera and microphone configurations

Prerequisites

Docker and Docker Compose
NVIDIA Jetson device with JetPack 6.1 (or compatible NVIDIA GPU system)
A USB camera or built-in webcam (default: /dev/video0)
A microphone device (default: default_mic_aec)
OpenMind API credentials
Linux system with V4L2 and ALSA support

Setup

Clone this repository:

git clone https://github.com/OpenMind/OM1-video-processor.git
cd OM1-video-processor

Set your OpenMind API credentials as environment variables:

export OM_API_KEY_ID="your_api_key_id"
export OM_API_KEY="your_api_key"

(Optional) Configure camera and microphone devices:

export CAMERA_INDEX="/dev/video0"    # Default camera device
export MICROPHONE_INDEX="default_mic_aec"     # Default microphone device

Note

See the OpenMind Avatar documentation for audio and video device configuration details.

Ensure your devices are accessible:

# Check available video devices
ls /dev/video*

# List video devices with v4l2
v4l2-ctl --list-devices

# Check available audio devices
pactl list sources short
pactl list sinks short
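
The listing commands above show which devices exist. A small pre-flight check can also confirm that the chosen camera node is usable before the container starts. The following is a minimal sketch in POSIX sh; check_camera is a hypothetical helper name, not something this repository provides, and it assumes the camera is exposed as a V4L2 character device.

```shell
# check_camera: hypothetical helper, not part of this repository.
# Verifies that a path exists, is a character device (as V4L2 nodes are),
# and is readable by the current user.
check_camera() {
    dev="${1:-${CAMERA_INDEX:-/dev/video0}}"
    if [ ! -e "$dev" ]; then
        echo "missing: $dev"
        return 1
    fi
    if [ ! -c "$dev" ]; then
        echo "not a character device: $dev"
        return 1
    fi
    if [ ! -r "$dev" ]; then
        echo "not readable by $(id -un): $dev"
        return 2
    fi
    echo "ok: $dev"
}
```

Calling check_camera with no argument falls back to CAMERA_INDEX and then to /dev/video0. A return code of 2 points at a permissions problem rather than a missing device.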

Usage

Start the streaming service:

docker-compose up -d

View logs:

docker-compose logs -f

Stop the service:

docker-compose down

Configuration

The system is configured through several components:

Docker Compose Configuration

The docker-compose.yml file configures:

NVIDIA runtime: GPU acceleration for face recognition processing
Network mode: Host networking for direct device access
Privileged mode: Required for camera and audio device access
Device mapping: Camera (default /dev/video0) and audio (/dev/snd) devices
Environment variables: OpenMind API credentials, device indices, and PulseAudio configuration
Shared memory: 4GB allocated for efficient video processing

Processing Pipeline

The streaming pipeline consists of two processes managed by Supervisor:

MediaMTX: RTSP server for stream routing and management
OM Face Recognition Stream: Main processing service that:
- Captures video from the specified camera device
- Performs real-time face recognition with GPU acceleration
- Overlays bounding boxes, names, and FPS information
- Captures audio from the specified microphone
- Streams directly to OpenMind's RTSP ingestion endpoint

Environment Variables

The following environment variables can be configured:

OM_API_KEY_ID: Your OpenMind API key ID (required)
OM_API_KEY: Your OpenMind API key (required)
CAMERA_INDEX: Camera device path (default: /dev/video0)
MICROPHONE_INDEX: Microphone device identifier (default: default_mic_aec)
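
Because two of these variables are required and two have defaults, it can help to validate them in one place before starting the service. The sketch below is one possible way to do that in POSIX sh; check_env is a hypothetical helper name, while the variable names and default values are the ones listed above.

```shell
# check_env: hypothetical helper, not part of this repository.
# Fails if a required credential is unset and fills in the documented
# defaults for the optional device variables.
check_env() {
    rc=0
    for name in OM_API_KEY_ID OM_API_KEY; do
        # Indirect variable lookup that works in POSIX sh.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "error: $name is not set" >&2
            rc=1
        fi
    done
    # Optional variables fall back to the documented defaults.
    : "${CAMERA_INDEX:=/dev/video0}"
    : "${MICROPHONE_INDEX:=default_mic_aec}"
    export CAMERA_INDEX MICROPHONE_INDEX
    echo "camera=$CAMERA_INDEX microphone=$MICROPHONE_INDEX"
    return "$rc"
}
```

A typical use would be `check_env && docker-compose up -d`, so the stack only starts when both credentials are present.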

Ports

The following ports are used internally:

8554: RTSP (MediaMTX local server)
1935: RTMP (MediaMTX local server)
8889: HLS (MediaMTX local server)
8189: WebRTC (MediaMTX local server)

Note: The main video stream is sent directly to OpenMind's RTSP endpoint at rtsp://api-video-ingest.openmind.org:8554/
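
Since the container uses host networking, these ports are bound directly on the host, and a service that already listens on one of them could conflict with the local MediaMTX server. The following sketch filters `ss -ltn` output for the ports listed above. ports_in_use is a hypothetical helper name, and the check only covers TCP listeners.

```shell
# ports_in_use: hypothetical helper, not part of this repository.
# Reads `ss -ltn` style output on stdin and prints each of the given
# ports that already has a listener.
ports_in_use() {
    awk -v wanted="$*" '
        BEGIN {
            n = split(wanted, list, " ")
            for (i = 1; i <= n; i++) want[list[i]] = 1
        }
        NR > 1 {
            # Column 4 is Local Address:Port; the port follows the last colon.
            k = split($4, parts, ":")
            port = parts[k]
            if ((port in want) && !(port in seen)) {
                seen[port] = 1
                print port
            }
        }
    '
}

# Usage on the host:
#   ss -ltn | ports_in_use 8554 1935 8889 8189
```

An empty result means none of the listed TCP ports is taken. Run it before starting the container, since afterwards MediaMTX itself will appear as the listener.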

Troubleshooting

Camera not detected:

# Check available video devices
ls /dev/video*

# Test camera with v4l2
v4l2-ctl --list-devices

# Test specific camera device
v4l2-ctl --device=/dev/video0 --list-formats-ext

Audio issues:

# Check available audio sources (recording) and sinks (playback)
pactl list sources short
pactl list sinks short

# Record a 5-second test clip from the microphone, then play it back
arecord -D default_mic_aec -f cd -d 5 test.wav
aplay test.wav

Note

PulseAudio has the noise suppression module enabled by default for better audio quality. Use arecord to test the raw microphone input without noise suppression.
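
To check for one specific microphone instead of scanning the whole list by eye, the short `pactl` listing can be filtered by name. This is a minimal sketch: has_source is a hypothetical helper name, and it assumes the value of MICROPHONE_INDEX is the PulseAudio source name shown in the second tab-separated column of `pactl list sources short`.

```shell
# has_source: hypothetical helper, not part of this repository.
# Reads `pactl list sources short` output on stdin (tab-separated, with the
# source name in the second column) and succeeds if the given name is listed.
has_source() {
    awk -F '\t' -v name="$1" '
        $2 == name { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Usage:
#   pactl list sources short | has_source "${MICROPHONE_INDEX:-default_mic_aec}" \
#       && echo "microphone source found"
```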

Permission issues:

# Add your user to the video and audio groups (log out and back in to apply)
sudo usermod -a -G video,audio "$USER"

# Ensure device permissions
sudo chmod 666 /dev/video0
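
Before changing anything, it may be worth checking which of the two groups the current session actually lacks. The sketch below compares the output of `id -nG` against a list of required groups; missing_groups is a hypothetical helper name.

```shell
# missing_groups: hypothetical helper, not part of this repository.
# First argument: space-separated group list, as printed by `id -nG`.
# Remaining arguments: groups to require. Prints the ones that are absent.
missing_groups() {
    have=" $1 "
    shift
    for group in "$@"; do
        case "$have" in
            *" $group "*) ;;          # already a member
            *) echo "$group" ;;       # not a member yet
        esac
    done
}

# Usage:
#   missing_groups "$(id -nG)" video audio
```

No output means the session already belongs to every listed group, in which case the problem is more likely to be the device node permissions.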

Check container logs:

# View all logs
docker-compose logs

# View specific service logs
docker-compose logs om1_video_processor

# Follow logs in real-time
docker-compose logs -f om1_video_processor

GPU/CUDA issues:

# Check NVIDIA runtime availability
docker info | grep nvidia

# Test CUDA in container
docker run --rm --runtime=nvidia --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

Architecture

┌──────────────┐    ┌─────────────────────────────────┐    ┌─────────────────┐
│    Camera    │───▶│     OM Face Recognition         │───▶│   OpenMind API  │
│  /dev/video0 │    │   - GPU-accelerated processing  │    │   RTSP Ingest   │
└──────────────┘    │   - Face detection & naming     │    │                 │
                    │   - Bounding box overlay        │    └─────────────────┘
┌──────────────┐    │   - FPS monitoring              │
│  Microphone  │───▶│   - Audio capture & streaming   │
│ default_mic_ │    └─────────────────────────────────┘
│    aec       │
└──────────────┘

Development

Building the image:

docker-compose build

Customizing processing settings:

Edit the command in video_processor/supervisord.conf to modify the om_face_recog_stream parameters:

--device: Camera device path
--rtsp-mic-device: Microphone device identifier
--draw-boxes: Enable/disable bounding box overlays
--draw-names: Enable/disable name overlays
--show-fps: Enable/disable FPS display
--no-window: Run in headless mode
--remote-rtsp: OpenMind RTSP ingestion endpoint
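
To preview a command line before pasting it into supervisord.conf, the parameters can be assembled with a small helper. This is only a sketch: stream_cmd is a hypothetical name, the flag names come from the list above, and whether the overlay flags are bare switches or take a value is an assumption, so they are passed through unchanged. The URL is the endpoint quoted in the Ports section, and the value actually used in supervisord.conf may differ. `uv run om_face_recog_stream --help` remains the authoritative reference.

```shell
# stream_cmd: hypothetical helper that only prints a command line.
# Arguments: camera device, microphone identifier, remote RTSP URL,
# then any extra flags, which are appended as given.
stream_cmd() {
    device="$1"
    mic="$2"
    remote="$3"
    shift 3
    printf 'om_face_recog_stream --device %s --rtsp-mic-device %s --remote-rtsp %s' \
        "$device" "$mic" "$remote"
    for extra in "$@"; do
        printf ' %s' "$extra"
    done
    printf '\n'
}

# Assumed example values; adjust to match the actual configuration.
stream_cmd /dev/video0 default_mic_aec rtsp://api-video-ingest.openmind.org:8554/ \
    --draw-boxes --draw-names --show-fps --no-window
```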

Local development:

# Install dependencies locally
uv sync --all-extras

# Run the face recognition stream locally
uv run om_face_recog_stream --help

License

MIT License - see LICENSE file for details.

Contributing

Fork the repository
Create a feature branch
Make your changes
Submit a pull request

Support

For issues related to:

OpenMind API: Contact OpenMind support
OM1 Modules: Check the OM1 modules repository
MediaMTX: Check the MediaMTX documentation
NVIDIA Jetson: Check the JetPack documentation
This tool: Open an issue in this repository
