This guide covers the data import script for ingesting data from the Pyronear platform API into the annotation API. This script provides an end-to-end workflow that fetches sequences and detections from the production platform and transfers them to your annotation API, automatically generating annotations and preparing them for human review.
The data ingestion system uses a single comprehensive script:

- `import`: End-to-end processing that fetches platform data and generates annotations in one streamlined workflow

This script provides a complete pipeline from raw platform data to annotation-ready sequences, with proper processing stage management.
- Python environment with the `uv` package manager
- Access to the Pyronear platform API
- Running annotation API instance (local or remote)
- Valid platform API credentials (both regular and admin access)
You need both regular and admin credentials for the platform API:
- Regular credentials: For accessing sequences, detections, and cameras
- Admin credentials: For accessing organization information via the `/api/v1/organizations` endpoint
All credentials live in `annotation_api/.env`. Copy `annotation_api/.env.example` to `annotation_api/.env` and fill in the values:
```bash
PLATFORM_LOGIN=your_platform_username
PLATFORM_PASSWORD=your_platform_password
PLATFORM_ADMIN_LOGIN=your_admin_username
PLATFORM_ADMIN_PASSWORD=your_admin_password
```

Each script in `scripts/data_transfer/ingestion/platform/` loads `.env` at startup via python-dotenv (which handles dotenv quoting correctly, including values with `$`). Make does not parse `.env`. Shell-level environment variables take priority, so `MAIN_ANNOTATION_LOGIN=foo make ...` still overrides the file.
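As a minimal sketch of that loading behavior (python-dotenv's standard API, with the file path used in this repo):

```python
# Sketch: how python-dotenv loading interacts with shell-level variables.
# load_dotenv() does not override variables already set in the environment,
# which is why `MAIN_ANNOTATION_LOGIN=foo make ...` wins over the file.
import os
from dotenv import load_dotenv

load_dotenv("annotation_api/.env")
login = os.environ["PLATFORM_LOGIN"]       # from .env, unless set in the shell
password = os.environ["PLATFORM_PASSWORD"]
```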
The import script provides a streamlined workflow that combines platform data fetching with automated annotation generation. This is the recommended approach for all use cases as it takes sequences from the platform API all the way to annotation-ready status in a single command.
The script executes the following pipeline:
- Fetch Platform Data: Retrieves sequences and detections from the platform API → posts them to the annotation API
- For Each Sequence:
  - Generate Annotation: Analyzes AI predictions and creates a sequence annotation → sets the stage to `READY_TO_ANNOTATE`
- Sequential Processing: Processes sequences one by one for better error control
- Automatic Overwriting: Always updates existing annotations (no force flag needed)
- Error Resilient: Continues processing other sequences if one fails, logs errors clearly
- Stage Management: Proper transitions from no annotation → `READY_TO_ANNOTATE`
- Comprehensive Statistics: Tracks success/failure rates for sequences and annotations
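The behaviors above amount to a simple sequential loop. The following is an illustrative sketch only, with hypothetical helper names (`fetch_sequences`, `generate_annotation`), not the script's actual code:

```python
# Illustrative sketch: sequential processing, continue-on-failure,
# and success/failure tallies. Helpers are hypothetical stand-ins.
import logging

stats = {"total": 0, "succeeded": 0, "failed": 0, "annotations": 0}
for sequence in fetch_sequences(date_from, date_end):   # hypothetical helper
    stats["total"] += 1
    try:
        annotation = generate_annotation(sequence)      # hypothetical helper
        annotation.stage = "READY_TO_ANNOTATE"          # stage transition
        stats["succeeded"] += 1
        stats["annotations"] += 1
    except Exception as exc:
        stats["failed"] += 1
        logging.error("Sequence %s failed: %s", sequence.id, exc)  # keep going
logging.info("Final statistics: %s", stats)
```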
```bash
# Full pipeline for a date range (recommended approach)
uv run python -m scripts.data_transfer.ingestion.platform.import \
  --date-from 2024-01-01 --date-end 2024-01-02 --loglevel info

# Process all AI predictions (no confidence filtering)
uv run python -m scripts.data_transfer.ingestion.platform.import \
  --date-from 2024-01-01 --confidence-threshold 0.0 --loglevel info

# Dry run to preview what would be processed
uv run python -m scripts.data_transfer.ingestion.platform.import \
  --date-from 2024-01-01 --dry-run --loglevel debug

# Skip platform fetch (use existing sequences in annotation API)
uv run python -m scripts.data_transfer.ingestion.platform.import \
  --date-from 2024-01-01 --skip-platform-fetch --loglevel info
```
```bash
# Custom API endpoints and detection limits
uv run python -m scripts.data_transfer.ingestion.platform.import \
  --url-api-platform "https://alertapi.pyronear.org" \
  --url-api-annotation "http://localhost:5050" \
  --date-from 2024-01-01 --date-end 2024-01-07 \
  --detections-limit 50 --detections-order-by desc \
  --loglevel info
```
```bash
# Fine-tune annotation generation parameters
uv run python -m scripts.data_transfer.ingestion.platform.import \
  --date-from 2024-01-01 \
  --confidence-threshold 0.5 \
  --iou-threshold 0.4 \
  --min-cluster-size 2 \
  --loglevel debug
```

| Parameter | Description | Default | Required |
|---|---|---|---|
| `--date-from` | Start date (`YYYY-MM-DD` format) | - | Yes |
| `--date-end` | End date (`YYYY-MM-DD` format) | Current date | No |
| Parameter | Description | Default | Required |
|---|---|---|---|
| `--url-api-platform` | Platform API base URL | `https://alertapi.pyronear.org` | No |
| `--url-api-annotation` | Annotation API base URL | `http://localhost:5050` | No |
| Parameter | Description | Default | Required |
|---|---|---|---|
| `--detections-limit` | Max detections per sequence | `30` | No |
| `--detections-order-by` | Order detections by `created_at` (`asc`/`desc`) | `asc` | No |
| Parameter | Description | Default | Required |
|---|---|---|---|
| `--confidence-threshold` | Min AI prediction confidence (0.0–1.0) | `0.0` | No |
| `--iou-threshold` | Min IoU for clustering overlapping boxes | `0.3` | No |
| `--min-cluster-size` | Min boxes required in a cluster | `1` | No |
| Parameter | Description | Default | Required |
|---|---|---|---|
| `--dry-run` | Preview actions without execution | `false` | No |
| `--skip-platform-fetch` | Skip platform data fetching | `false` | No |
| `--loglevel` | Logging level (`debug`/`info`/`warning`/`error`) | `info` | No |
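The `--iou-threshold` and `--min-cluster-size` parameters refer to standard intersection-over-union clustering. Here is a small, self-contained sketch of the IoU computation they rely on (textbook formula, not the script's actual source):

```python
# Intersection-over-Union for two boxes in (x1, y1, x2, y2) form.
# Boxes whose IoU meets --iou-threshold would be grouped into one cluster;
# clusters smaller than --min-cluster-size would be discarded.
def iou(a: tuple[float, float, float, float],
        b: tuple[float, float, float, float]) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 1, 1), (0.5, 0, 1.5, 1)))  # 0.333..., above the 0.3 default
```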
```bash
# 1. Test with a dry run (no side effects)
uv run python -m scripts.data_transfer.ingestion.platform.import \
  --date-from 2024-01-01 --date-end 2024-01-02 \
  --dry-run --loglevel debug

# 2. If successful, run the full pipeline
uv run python -m scripts.data_transfer.ingestion.platform.import \
  --date-from 2024-01-01 --date-end 2024-01-02 --loglevel info
```

```bash
# Process a week's worth of data with higher confidence filtering
uv run python -m scripts.data_transfer.ingestion.platform.import \
  --date-from 2024-01-01 --date-end 2024-01-08 \
  --confidence-threshold 0.7 --detections-limit 100 \
  --loglevel info
```

```bash
# Use a custom annotation API endpoint (e.g., a staging environment)
uv run python -m scripts.data_transfer.ingestion.platform.import \
  --url-api-annotation "http://staging.annotation-api.com" \
  --date-from 2024-01-01 --date-end 2024-01-02 \
  --loglevel info
```

```bash
# Skip platform fetch and just generate annotations for existing sequences
uv run python -m scripts.data_transfer.ingestion.platform.import \
  --date-from 2024-01-01 --date-end 2024-01-02 \
  --skip-platform-fetch --loglevel info
```

- Uses ProcessPoolExecutor for concurrent data fetching
- Progress bars (via tqdm) show real-time progress for long operations
- Efficient handling of large date ranges and multiple sequences
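A generic sketch of that pattern, concurrent fetching with a progress bar (`fetch_one` is a hypothetical worker, not the script's function):

```python
# Sketch: concurrent fetching with ProcessPoolExecutor and a tqdm progress bar.
from concurrent.futures import ProcessPoolExecutor
from tqdm import tqdm

def fetch_one(item_id: int) -> dict:
    # Hypothetical worker; the real script fetches sequences/detections here.
    return {"id": item_id}

if __name__ == "__main__":
    ids = range(100)
    with ProcessPoolExecutor() as pool:
        results = list(tqdm(pool.map(fetch_one, ids), total=len(ids)))
```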
- Fetches platform sequences, detections, cameras, and organizations
- Transforms data to match the annotation API schema format
- Downloads detection images and uploads them to the annotation API
- Handles coordinate normalization and prediction data formatting
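For example, the coordinate normalization mentioned above maps a pixel-space bounding box into the [0, 1] range (a standard transformation; the script's exact convention may differ):

```python
# Normalize a pixel-space box (x1, y1, x2, y2) by the image dimensions
# so coordinates land in [0, 1], as annotation APIs commonly expect.
def normalize_box(box, width, height):
    x1, y1, x2, y2 = box
    return (x1 / width, y1 / height, x2 / width, y2 / height)

print(normalize_box((128, 64, 640, 360), width=1280, height=720))
# (0.1, 0.0888..., 0.5, 0.5)
```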
- Analyzes AI predictions to create sequence annotations automatically
- Clusters overlapping bounding boxes across temporal frames
- Applies configurable confidence and IoU thresholds
- Sets the processing stage to `READY_TO_ANNOTATE` for human review
- Validates all required environment variables before execution
- Comprehensive error reporting for API failures
- Detailed logging at multiple levels (debug, info, warning, error)
- Graceful handling of missing data or network issues
- Continues processing if individual sequences fail
The script manages annotation processing stages automatically:
- No Annotation: Sequence exists but has no annotation
- READY_TO_ANNOTATE: Annotation created from AI predictions and ready for human review
Platform Data → Annotation API → Generate Annotations (READY_TO_ANNOTATE)
The script provides comprehensive statistics upon completion:
```
Processing completed!
Final Statistics:
Total sequences: 15
Successful sequences: 14
Failed sequences: 1
Annotations created: 14
```
Error: Missing platform credentials...
Solution: Ensure all four variables are set in `annotation_api/.env` (copied from `annotation_api/.env.example`):

```bash
PLATFORM_LOGIN=your_username
PLATFORM_PASSWORD=your_password
PLATFORM_ADMIN_LOGIN=your_admin
PLATFORM_ADMIN_PASSWORD=your_admin_password
```
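A quick way to confirm the file contains all four keys, using python-dotenv's `dotenv_values` (the scripts already depend on python-dotenv):

```python
# Report any of the four required credentials missing from annotation_api/.env.
from dotenv import dotenv_values

required = ["PLATFORM_LOGIN", "PLATFORM_PASSWORD",
            "PLATFORM_ADMIN_LOGIN", "PLATFORM_ADMIN_PASSWORD"]
values = dotenv_values("annotation_api/.env")
missing = [key for key in required if not values.get(key)]
print("Missing credentials:", missing or "none")
```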
Error: Failed to fetch access token or 401 Unauthorized
Solutions:
- Verify credentials are correct for the platform API
- Check that admin credentials have organization access permissions
- Ensure platform API endpoint is accessible
Error: Invalid combination of --date-from and --date-end parameters
Solution: Ensure --date-from is earlier than or equal to --date-end:
```bash
# Correct
--date-from 2024-01-01 --date-end 2024-01-02

# Incorrect
--date-from 2024-01-02 --date-end 2024-01-01
```

Error: Connection errors to the annotation API
Solutions:
- Verify the annotation API is running: `curl http://localhost:5050/docs`
- Check the `--url-api-annotation` parameter
- Ensure network connectivity between the script and the annotation API
Issue: Script timeouts with large date ranges
Solutions:
- Use smaller date ranges and run multiple times
- Increase `--detections-limit` if you need more detections per sequence
- Use `--dry-run` first to test data fetching performance
Issue: Some sequences or detections fail to process
Expected Behavior: The script reports partial success and continues processing
Action: Review logs for specific failure reasons, often related to:
- Invalid data from platform API
- Network timeouts for image downloads
- Data validation failures in annotation API
- Insufficient AI predictions for annotation generation
Use `--loglevel debug` for detailed troubleshooting information:
```bash
uv run python -m scripts.data_transfer.ingestion.platform.import \
  --date-from 2024-01-01 --date-end 2024-01-02 \
  --loglevel debug
```

This will show:
- Detailed API request/response information
- Data transformation steps
- Individual sequence and detection processing results
- Annotation generation details and statistics
- Timing information for performance analysis
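Typical wiring for such a `--loglevel` flag looks like the following (a generic sketch; the script's actual argument parsing may differ):

```python
# Map a --loglevel string onto Python's logging module.
import argparse
import logging

parser = argparse.ArgumentParser()
parser.add_argument("--loglevel", default="info",
                    choices=["debug", "info", "warning", "error"])
args = parser.parse_args()
logging.basicConfig(level=getattr(logging, args.loglevel.upper()),
                    format="%(asctime)s %(levelname)s %(message)s")
logging.debug("visible only with --loglevel debug")
```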
Use `--dry-run` to test the pipeline without making changes:
```bash
uv run python -m scripts.data_transfer.ingestion.platform.import \
  --date-from 2024-01-01 --date-end 2024-01-02 \
  --dry-run --loglevel info
```

This is useful for:
- Validating credentials and API connectivity
- Testing data transformation logic
- Previewing annotation generation results
- Estimating processing time for large datasets
- Debugging without side effects
- Data Import: Use this script to import platform data and generate initial annotations
- Human Review: Review and validate the auto-generated annotations
- Quality Control: Refine annotations, mark false positives, add missing smoke
- Export: Use the API client to export annotated data for ML training
The script maintains all relationships between:
- Organizations → Cameras → Sequences → Detections
- Geographic coordinates and camera metadata
- Temporal sequencing and detection timing
- AI prediction data and bounding boxes
- Generated sequence annotations with processing stages
This ensures your local annotation API has complete context for annotation work.
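To picture those relationships, here is a hypothetical sketch of the hierarchy (field names are illustrative only, not the annotation API's actual schema):

```python
# Hypothetical shape of the preserved relationships; illustrative only.
from dataclasses import dataclass, field

@dataclass
class Detection:
    id: int
    recorded_at: str             # temporal ordering within the sequence
    bboxes: list[tuple]          # AI prediction bounding boxes

@dataclass
class Sequence:
    id: int
    camera_id: int
    stage: str                   # e.g. "READY_TO_ANNOTATE" after import
    detections: list[Detection] = field(default_factory=list)

@dataclass
class Camera:
    id: int
    organization_id: int
    lat: float                   # geographic coordinates / camera metadata
    lon: float
```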
This script is ideal when you want to:
- Batch Process: Import and prepare multiple sequences for annotation work
- Automate Pipeline: Set up regular imports from platform to annotation API
- Quality Control: Generate annotations for human review and validation
- ML Training: Prepare annotated datasets with bounding boxes
After running this script, sequences will be in READY_TO_ANNOTATE stage and ready for:
- Human annotation review and validation
- Bounding box refinement
- False positive classification
- Quality control workflows
```bash
# Daily import routine (last 24 hours)
uv run python -m scripts.data_transfer.ingestion.platform.import \
  --date-from $(date -d '1 day ago' '+%Y-%m-%d') \
  --date-end $(date '+%Y-%m-%d') \
  --loglevel info

# Weekly batch processing with high confidence filtering
uv run python -m scripts.data_transfer.ingestion.platform.import \
  --date-from 2024-01-01 --date-end 2024-01-08 \
  --confidence-threshold 0.7 \
  --detections-limit 100 \
  --loglevel info

# Development/testing with existing data (skip platform fetch)
uv run python -m scripts.data_transfer.ingestion.platform.import \
  --date-from 2024-01-01 --date-end 2024-01-02 \
  --skip-platform-fetch --dry-run --loglevel debug
```
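For scheduled runs, the daily routine above can also be wrapped in Python, which sidesteps the GNU-specific `date -d` syntax (a sketch; adjust paths to your checkout):

```python
# Daily-import wrapper equivalent to the first command above.
import subprocess
from datetime import date, timedelta

yesterday = (date.today() - timedelta(days=1)).isoformat()
subprocess.run(
    ["uv", "run", "python", "-m",
     "scripts.data_transfer.ingestion.platform.import",
     "--date-from", yesterday,
     "--date-end", date.today().isoformat(),
     "--loglevel", "info"],
    check=True,
)
```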