A Docker-based video streaming solution for OpenMind that captures video from local cameras, performs face recognition, and streams the results to OpenMind's video ingestion API.
This tool uses the OM1 modules to create an intelligent video streaming pipeline that:
- Captures video from a local camera (e.g., `/dev/video0`)
- Performs real-time face recognition with bounding boxes and name overlays
- Captures audio from a microphone (e.g., `default_mic_aec`)
- Streams the processed video and audio directly to OpenMind's video ingestion API via RTSP
Key features:

- GPU-accelerated processing: Optimized for NVIDIA Jetson platforms with CUDA support
- Real-time face recognition: Live face detection with bounding boxes and name overlays
- Audio capture: Integrated microphone support with PulseAudio
- Direct RTSP streaming: Streams directly to OpenMind's API without intermediate relay
- Automatic restart: All services restart automatically if they fail
- FPS monitoring: Real-time performance metrics display
- Configurable devices: Supports multiple camera and microphone configurations
Prerequisites:

- Docker and Docker Compose
- NVIDIA Jetson device with JetPack 6.1 (or compatible NVIDIA GPU system)
- A USB camera or built-in webcam (default: `/dev/video0`)
- A microphone device (default: `default_mic_aec`)
- OpenMind API credentials
- Linux system with V4L2 and ALSA support
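Before continuing, you can sanity-check the basics (the L4T release file below is present on Jetson/JetPack systems):

```bash
# Verify Docker and Docker Compose are installed
docker --version
docker-compose --version

# On a Jetson, check the L4T / JetPack base version
cat /etc/nv_tegra_release
```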
Setup:

- Clone this repository:

  ```bash
  git clone https://github.com/OpenMind/OM1-video-processor.git
  cd OM1-video-processor
  ```
- Set your OpenMind API credentials as environment variables:

  ```bash
  export OM_API_KEY_ID="your_api_key_id"
  export OM_API_KEY="your_api_key"
  ```
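  Alternatively, docker-compose automatically reads a `.env` file from the project root, so the same credentials can live there instead of your shell profile:

  ```bash
  # .env in the project root (picked up automatically by docker-compose)
  OM_API_KEY_ID=your_api_key_id
  OM_API_KEY=your_api_key
  ```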
- (Optional) Configure camera and microphone devices:

  ```bash
  export CAMERA_INDEX="/dev/video0"          # Default camera device
  export MICROPHONE_INDEX="default_mic_aec"  # Default microphone device
  ```
  Note: Refer to the OpenMind Avatar documentation for audio and video device configuration details.
- Ensure your devices are accessible:

  ```bash
  # Check available video devices
  ls /dev/video*

  # List video devices with v4l2
  v4l2-ctl --list-devices

  # Check available audio devices
  pactl list sources short
  pactl list sinks short
  ```
Once configured, manage the stack with Docker Compose:

```bash
# Start the services in the background
docker-compose up -d

# Follow the logs
docker-compose logs -f

# Stop the services
docker-compose down
```

The system is configured through several components:
The `docker-compose.yml` file configures:
- NVIDIA runtime: GPU acceleration for face recognition processing
- Network mode: Host networking for direct device access
- Privileged mode: Required for camera and audio device access
- Device mapping: Camera (default `/dev/video0`) and audio (`/dev/snd`) devices
- Environment variables: OpenMind API credentials, device indices, and PulseAudio configuration
- Shared memory: 4GB allocated for efficient video processing
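As a rough sketch, those settings map onto a compose file along these lines; the service layout and exact values here are assumptions, and the real `docker-compose.yml` in this repository is authoritative:

```yaml
# Illustrative sketch only - see the repository's docker-compose.yml
services:
  om1_video_processor:
    build: .                # assumption: image built from this repo
    runtime: nvidia         # GPU acceleration for face recognition
    network_mode: host      # host networking for direct device access
    privileged: true        # required for camera and audio device access
    shm_size: "4gb"         # shared memory for video processing
    devices:
      - /dev/video0:/dev/video0
      - /dev/snd:/dev/snd
    environment:
      - OM_API_KEY_ID=${OM_API_KEY_ID}
      - OM_API_KEY=${OM_API_KEY}
      - CAMERA_INDEX=${CAMERA_INDEX:-/dev/video0}
      - MICROPHONE_INDEX=${MICROPHONE_INDEX:-default_mic_aec}
```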
The streaming pipeline consists of two processes managed by Supervisor:
- MediaMTX: RTSP server for stream routing and management
- OM Face Recognition Stream: Main processing service that:
  - Captures video from the specified camera device
  - Performs real-time face recognition with GPU acceleration
  - Overlays bounding boxes, names, and FPS information
  - Captures audio from the specified microphone
  - Streams directly to OpenMind's RTSP ingestion endpoint
The following environment variables can be configured:
- `OM_API_KEY_ID`: Your OpenMind API key ID (required)
- `OM_API_KEY`: Your OpenMind API key (required)
- `CAMERA_INDEX`: Camera device path (default: `/dev/video0`)
- `MICROPHONE_INDEX`: Microphone device identifier (default: `default_mic_aec`)
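Since the compose file forwards these variables from the shell, you can also override the devices for a single launch without editing any files (the device names below are just examples):

```bash
# One-off override of the default devices (example device names)
CAMERA_INDEX=/dev/video1 MICROPHONE_INDEX=usb_mic_aec docker-compose up -d
```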
The following ports are used internally:
- 8554: RTSP (MediaMTX local server)
- 1935: RTMP (MediaMTX local server)
- 8889: HLS (MediaMTX local server)
- 8189: WebRTC (MediaMTX local server)
Note: The main video stream is sent directly to OpenMind's RTSP endpoint at `rtsp://api-video-ingest.openmind.org:8554/`
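The local MediaMTX server can also be used to preview the stream on the device itself; the stream path below is an assumption, so check the MediaMTX configuration for the actual one:

```bash
# Preview the local RTSP stream (the /stream path is an assumption)
ffplay rtsp://localhost:8554/stream
```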
If something is not working, start by checking the devices:

```bash
# Check available video devices
ls /dev/video*

# Test camera with v4l2
v4l2-ctl --list-devices

# Test specific camera device
v4l2-ctl --device=/dev/video0 --list-formats-ext
```

```bash
# Check available audio recording devices
pactl list sources short
pactl list sinks short

# Test microphone recording
arecord -D default_mic_aec -f cd test.wav
aplay test.wav
```

Note: PulseAudio has a noise suppression module enabled by default for better audio quality. Use `arecord` to test the raw microphone input without noise suppression.
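To check whether such a module is actually loaded, list the PulseAudio modules; the exact module name (commonly `module-echo-cancel`) depends on your setup:

```bash
# List loaded PulseAudio modules; look for echo-cancel / noise suppression
pactl list modules short | grep -i -E "echo|noise"
```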
```bash
# Add your user to video and audio groups
sudo usermod -a -G video,audio $USER

# Ensure device permissions
sudo chmod 666 /dev/video0
```

```bash
# View all logs
docker-compose logs

# View specific service logs
docker-compose logs om1_video_processor

# Follow logs in real-time
docker-compose logs -f om1_video_processor
```

```bash
# Check NVIDIA runtime availability
docker info | grep nvidia

# Test CUDA in container
docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.0-base nvidia-smi
```

Architecture overview:

```
┌──────────────┐    ┌─────────────────────────────────┐    ┌─────────────────┐
│   Camera     │───▶│ OM Face Recognition             │───▶│  OpenMind API   │
│ /dev/video0  │    │ - GPU-accelerated processing    │    │   RTSP Ingest   │
└──────────────┘    │ - Face detection & naming       │    │                 │
                    │ - Bounding box overlay          │    └─────────────────┘
┌──────────────┐    │ - FPS monitoring                │
│  Microphone  │───▶│ - Audio capture & streaming     │
│ default_mic_ │    └─────────────────────────────────┘
│     aec      │
└──────────────┘
```
To rebuild the image after making changes:

```bash
docker-compose build
```

Edit the command in `video_processor/supervisord.conf` to modify the `om_face_recog_stream` parameters:

- `--device`: Camera device path
- `--rtsp-mic-device`: Microphone device identifier
- `--draw-boxes`: Enable/disable bounding box overlays
- `--draw-names`: Enable/disable name overlays
- `--show-fps`: Enable/disable FPS display
- `--no-window`: Run in headless mode
- `--remote-rtsp`: OpenMind RTSP ingestion endpoint
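For illustration, the edited entry might look like the following sketch; the program name and flag values are assumptions based on the defaults described above:

```ini
; Sketch only - check video_processor/supervisord.conf for the real entry
[program:om_face_recog_stream]
command=uv run om_face_recog_stream
    --device /dev/video0
    --rtsp-mic-device default_mic_aec
    --draw-boxes --draw-names --show-fps
    --no-window
    --remote-rtsp rtsp://api-video-ingest.openmind.org:8554/
autorestart=true
```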
For local development outside Docker:

```bash
# Install dependencies locally
uv sync --all-extras

# Run the face recognition stream locally
uv run om_face_recog_stream --help
```

MIT License - see LICENSE file for details.
To contribute:

- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
For issues related to:
- OpenMind API: Contact OpenMind support
- OM1 Modules: Check the OM1 modules repository
- MediaMTX: Check the MediaMTX documentation
- NVIDIA Jetson: Check the JetPack documentation
- This tool: Open an issue in this repository