edge-endpoint can be configured by python-sdk by timmarkhuff · Pull Request #359 · groundlight/edge-endpoint

timmarkhuff · 2026-03-20T23:38:29Z

Summary

Adds HTTP endpoints for reading and modifying the edge endpoint configuration at runtime, without requiring a Helm redeploy. A Python SDK companion PR provides the client-side methods.

New endpoints:

GET /edge-config -- returns the active EdgeEndpointConfig
PUT /edge-config -- replaces the active config (diffs against current state, adds/removes detector pods accordingly)
GET /edge-detector-readiness -- reports which configured detectors have inference pods ready to serve
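
As a rough illustration of how a client might talk to these endpoints without the SDK, here is a minimal sketch using only the Python standard library. The base URL, the JSON request body, the JSON response, and the helper names are all assumptions made for illustration; the PR description does not specify the wire format beyond the paths and HTTP methods.

```python
import json
import urllib.request

# Assumption: address of a locally reachable edge endpoint; adjust for your deployment.
BASE_URL = "http://localhost:30101"


def build_get_config_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET /edge-config request."""
    return urllib.request.Request(f"{base_url}/edge-config", method="GET")


def build_put_config_request(config: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a PUT /edge-config request.

    Assumption: the endpoint accepts the config as a JSON body.
    """
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/edge-config",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send a prepared request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a running edge endpoint, `send(build_get_config_request())` would fetch the active config under these assumptions. The request builders are kept separate from `send` so they can be inspected without network access.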

Key design decisions

Config as a shared file on PVC

Multiple uvicorn workers cannot share in-memory state. The active config is persisted to a YAML file on the existing PVC (/opt/groundlight/edge/config/active-edge-config.yaml). Workers read it via an mtime-based cache (EdgeConfigManager.active()), which calls os.path.getmtime on each access and only re-parses when the file has changed. This gives cross-worker consistency with negligible overhead.
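
The mtime-based caching idea can be sketched roughly as follows. This is not the actual EdgeConfigManager code; the class name `MtimeCachedFile`, the `parse` callback, and the `parse_count` counter are hypothetical and exist only to show the pattern of checking `os.path.getmtime` on each access and re-parsing when it changes.

```python
import os
import tempfile
from typing import Any, Callable, Optional, Tuple


class MtimeCachedFile:
    """Hypothetical helper: re-parse a file only when its modification time changes."""

    def __init__(self, path: str, parse: Callable[[str], Any]) -> None:
        self._path = path
        self._parse = parse
        self._cached_mtime: Optional[float] = None
        self._cached_value: Any = None
        self.parse_count = 0  # illustration only: counts how often the file was re-read

    def get(self) -> Any:
        # One cheap stat call per access; the file is read only if the mtime differs.
        mtime = os.path.getmtime(self._path)
        if mtime != self._cached_mtime:
            with open(self._path, "r", encoding="utf-8") as f:
                self._cached_value = self._parse(f.read())
            self._cached_mtime = mtime
            self.parse_count += 1
        return self._cached_value


def demo() -> Tuple[int, str]:
    """Write a file, read it twice, change it, and read it again."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "config.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("refresh_rate: 60")
        cache = MtimeCachedFile(path, parse=str.strip)
        cache.get()
        cache.get()  # cache hit: mtime unchanged, so no re-parse
        with open(path, "w", encoding="utf-8") as f:
            f.write("refresh_rate: 30")
        # Force a visibly different mtime so the demo does not depend on timestamp resolution.
        st = os.stat(path)
        os.utime(path, (st.st_atime, st.st_mtime + 10))
        value = cache.get()  # mtime changed, so the file is re-parsed
        return cache.parse_count, value
```

One caveat with any mtime-based scheme: two writes that land within the filesystem's timestamp resolution can be indistinguishable, which is why the demo bumps the mtime explicitly.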

Config loading priority at startup

EdgeConfigManager.load_startup_config() checks sources in order:

EDGE_CONFIG env var (for Docker/test setups/Balena deployments)
Helm-mounted ConfigMap (always wins when present)
Active config on PVC (survives restarts; preserves runtime changes when no Helm config is provided)
Pydantic defaults
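
The ordered lookup above amounts to "return the first source that yields a config". A minimal sketch of that pattern, with hypothetical loader callables standing in for the real env var, ConfigMap, PVC, and defaults sources:

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical type: a loader returns a config dict, or None when its source is absent.
ConfigSource = Callable[[], Optional[dict]]


def first_available(sources: Sequence[Tuple[str, ConfigSource]]) -> Tuple[str, dict]:
    """Return the name and config of the first source that provides one."""
    for name, load in sources:
        config = load()
        # Compare against None explicitly: an empty dict is still a valid config.
        if config is not None:
            return name, config
    raise LookupError("no config source produced a value")


# Illustrative stand-ins: the first two sources are absent, the PVC file exists.
example_sources = [
    ("env", lambda: None),
    ("helm", lambda: None),
    ("pvc", lambda: {"detectors": [{"detector_id": "det_abc"}]}),
    ("defaults", lambda: {}),
]
```

With these stand-ins, `first_available(example_sources)` selects the `"pvc"` entry. Putting the defaults last guarantees the lookup always succeeds in practice.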

Detector reconciliation

PUT /edge-config triggers reconcile_config(), which:

Diffs the desired detectors against DB records (compute_detector_diff)
Marks removed detectors as pending_deletion in the DB
Creates new deployment records for added detectors
Saves the new config to disk

The inference-model-updater picks up these DB changes on its next loop iteration, deleting pods for pending_deletion records and then creating pods for new ones. As a result, a config update can take longer than strictly necessary, because it has to wait for the next refresh loop. I considered having the inference-model-updater poll for config updates while it waits and end the wait early when it sees one, but decided not to add that (yet). YAGNI.
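
The diffing step can be thought of as two set differences. The sketch below is an assumption about the general shape of such a diff, not the PR's compute_detector_diff; the function name `diff_detectors` and its signature are hypothetical.

```python
from typing import Iterable, Set, Tuple


def diff_detectors(desired: Iterable[str], existing: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (to_add, to_remove) detector IDs.

    desired: detector IDs in the newly submitted config.
    existing: detector IDs that currently have deployment records.
    """
    desired_set = set(desired)
    existing_set = set(existing)
    to_add = desired_set - existing_set  # in the new config, no record yet
    to_remove = existing_set - desired_set  # has a record, dropped from the new config
    return to_add, to_remove
```

For example, with `desired=["det_a", "det_b"]` and `existing=["det_b", "det_c"]`, `det_a` would be added, `det_c` would be marked for removal, and `det_b` would be left untouched.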

Deprecation of default-edge-config.yaml

The static deploy/helm/groundlight-edge-endpoint/files/default-edge-config.yaml has been removed. Default config is now defined by Pydantic model defaults in the python-sdk. When no config file is provided to Helm, _helpers.tpl generates an empty YAML object, and the system uses Pydantic defaults.

Notable refactorings

EdgeConfigManager (new, replaces edge_config_loader.py): Singleton-style class with class methods for config lifecycle -- load_startup_config(), save(), active(), detector_configs(), detector_config().
EdgeInferenceManager simplification: Removed stored detector_inference_configs, inference_client_urls, oodd_inference_client_urls, min_times_between_escalations, and the 30-second scheduler poll. The manager now computes pod URLs on the fly from detector IDs and reads config from EdgeConfigManager.active(). Only last_escalation_times remains as runtime state.
naming.py (new): Extracted get_edge_inference_service_name and get_edge_inference_model_name out of edge_inference.py to break a circular import between edge_inference.py and edge_config_manager.py.
pending_deletion column: Added to the InferenceDeployment DB model. The column has a server default of False, so existing databases are compatible without migration.

Authentication

These new endpoints are unauthenticated, consistent with all other non-inference endpoints on the edge endpoint (status page, metrics, health checks). The edge endpoint assumes a trusted local network.

GET /edge-config and GET /edge-detector-readiness are comparable in sensitivity to the existing status page and metrics.json, which already expose detector IDs, model versions, and readiness.

PUT /edge-config is the only unauthenticated write endpoint that modifies system state. An attacker with network access could add or remove detectors. This is an accepted tradeoff under the trusted-network assumption. If edge endpoints are ever exposed to untrusted networks, this endpoint should be the first to get auth.

Release strategy

This PR must be deployed before the companion SDK PR (python-sdk 0.26.0). The SDK's gl.edge.* methods call endpoints that only exist after this PR is deployed. Releasing the SDK first would cause those methods to 404.

The edge endpoint's groundlight dependency does not need to change for this PR -- it uses SDK features already available in 0.25.x. However, once the SDK is released at 0.26.0, a follow-up bump to >=0.26.0, <0.27.0 is recommended.

Tests

test/api/test_edge_config.py: GET returns 200, PUT validates body (422 on invalid, 200 on valid)
test/core/test_edge_config_manager.py: compute_detector_diff (6 cases), apply_detector_changes (3 cases), EdgeConfigManager (save/load/active roundtrip, mtime caching, startup priority, detector config lookups)

Load Testing

This PR touches code in the inference hot path (in what I hope is a trivial way). To check for performance regressions, I ran the benchmark below before and after the changes. The results are nearly identical: each run reached a "Max Steady RPS" of 30 with the count-step-yolox-tracking pipeline.

uv run python multiple_client_throughput_test.py COUNT --edge-pipeline-config count-step-yolox-tracking --time-between-ramp 10 --max-clients 8

With changes:

Before changes:

timmarkhuff · 2026-03-27T00:51:38Z

test/integration/test-with-k3s-helm.sh

-cp deploy/helm/groundlight-edge-endpoint/files/default-edge-config.yaml $EDGE_CONFIG_FILE
-sed -i "s/detector_id: \"\"/detector_id: \"$DETECTOR_ID\"/" $EDGE_CONFIG_FILE
-sed -i "s/refresh_rate: 60/refresh_rate: $REFRESH_RATE/" $EDGE_CONFIG_FILE
+cat > $EDGE_CONFIG_FILE <<EOF


We no longer have default-edge-config.yaml (since we rely on Pydantic defaults), so we need to construct a config from scratch here.


brandon-wada

I'm reminded how rusty I am on the edge code. At least all the parts look like they make sense to me

brandon-wada · 2026-03-31T23:41:31Z

app/api/routes/edge_detector_readiness.py

+    detector is responding to health checks.
+    """
+    config = EdgeConfigManager.active()
+    detector_ids = [d.detector_id for d in config.detectors if d.detector_id]


When is detector_id falsey?

brandon-wada · 2026-03-31T23:55:40Z

deploy/helm/groundlight-edge-endpoint/templates/edge-deployment.yaml

          subPath: dummy-nginx.conf
        - name: edge-endpoint-persistent-volume
          mountPath: /opt/groundlight/edge/sqlite
+        - name: edge-endpoint-persistent-volume


Do we want to reuse the existing volume for this?

timmarkhuff requested a review from a team as a code owner March 20, 2026 23:38

timmarkhuff mentioned this pull request Mar 20, 2026

SDK Configures Edge groundlight/python-sdk#419

Open

timmarkhuff and others added 20 commits March 27, 2026 00:48

first pass

cc22f48

checkpoint

169849a

changing how default values are handled

100e08d

Automatically reformatting code with black and isort

7203e3e

got set-edge-endpoint mostly working

22ceeb5

code clean up

0a4e7be

Automatically reformatting code with black and isort

a1db671

adding a test

bec8a53

Automatically reformatting code with black and isort

8d076e1

checkpoint

e08d4e7

Automatically reformatting code with black and isort

219193a

making sure inference-model-updater can see config on disk

3f2ab1b

Automatically reformatting code with black and isort

74cdd77

adding detector readiness endpoint

e7ae21d

refactor of edge config management

a4a01e6

fixing a test

815f81c

refactoring

f423f86

code cleanup

7c5e4e0

Automatically reformatting code with black and isort

950a87d

Automatically reformatting code with black and isort

51f7da8

timmarkhuff force-pushed the tim/edge-accepts-config-requests branch from 4ac4b37 to 51f7da8 Compare March 27, 2026 00:50

timmarkhuff commented Mar 27, 2026


timmarkhuff and others added 6 commits March 31, 2026 00:19

code cleanup and fixing a test

96d4b21

fixing a test again

22e0e7a

adjusting a file name

676c8ea

refatoring edge detector readiness

c66d8af

Automatically reformatting code with black and isort

d301edd

code cleanup and adding tests

9ba67ea

timmarkhuff and others added 10 commits March 31, 2026 16:40

Merge branch 'tim/edge-accepts-config-requests' of github.com:groundl…

435013e

…ight/edge-endpoint into tim/edge-accepts-config-requests

Automatically reformatting code with black and isort

963863c

Merge branch 'main' into tim/edge-accepts-config-requests

1cfce19

fixing a failing test

4b2d6db

fixing a failing test

4292472

Automatically reformatting code with black and isort

8a617a0

trigger CI

36243f7

Merge branch 'tim/edge-accepts-config-requests' of github.com:groundl…

3ef0981

…ight/edge-endpoint into tim/edge-accepts-config-requests

fixing an edge case

247ce55

Automatically reformatting code with black and isort

eb52e05

timmarkhuff requested review from brandon-wada and honeytung March 31, 2026 19:17

brandon-wada approved these changes Apr 1, 2026

timmarkhuff wants to merge 36 commits into main from tim/edge-accepts-config-requests