SDK Configures Edge by timmarkhuff · Pull Request #419 · groundlight/python-sdk

timmarkhuff · 2026-03-20T23:58:52Z

Summary

Adds SDK methods for reading and modifying edge endpoint configuration at runtime, accessible via a namespaced gl.edge pattern. Depends on companion Edge PR.

gl = ExperimentalApi(endpoint="http://my-edge-endpoint:30101")

config = gl.edge.get_config()
gl.edge.set_config(new_config, timeout_sec=600)

New code

`EdgeAPI` class (`groundlight/edge/api.py`)

Provides three methods:

get_config() -- calls GET /edge-config, returns EdgeEndpointConfig
set_config(config, timeout_sec) -- calls PUT /edge-config, then polls GET /edge-detector-readiness until all desired detectors are ready (or timeout)
get_detector_readiness() -- calls GET /edge-detector-readiness, returns dict[str, bool]

All HTTP calls go through _request(), which raises EdgeNotAvailableError on 404 or connection failure with an informative message guiding the user to check their endpoint configuration.

`EdgeNotAvailableError`

New exception in groundlight.client. Raised when an edge-only method is called against a non-edge endpoint (cloud API, unreachable host). The error message includes a hint about the GROUNDLIGHT_ENDPOINT env var.

`gl.edge` access pattern

ExperimentalApi.__init__ creates an EdgeAPI instance as self.edge. Edge methods are only accessible through this namespace.

Version

Bumped to 0.26.0 (minor version bump for new public API surface).

Release order

The companion edge-endpoint PR must be deployed first. The gl.edge.* methods call endpoints (/edge-config, /edge-detector-readiness) that only exist after the edge PR is deployed. Releasing this SDK version first would cause those methods to raise EdgeNotAvailableError (404).

Authentication note

These methods do not add authentication headers beyond what the SDK already sends (the standard x-api-token). The edge endpoint does not currently enforce auth on the config or readiness endpoints. The SDK sends the token in case auth is added in the future, but it is not required today.

…on-sdk into tim/sdk-configures-edge

brandon-wada

Couple of fixes to take a look at, mostly to ensure some stylistic rules

brandon-wada · 2026-03-31T22:52:10Z

test/unit/test_edge_config.py

Maybe worth a small mocked test on the put method and detector readiness method

Added test_edge_set_config_sends_payload_and_polls and test_edge_get_detector_readiness.

brandon-wada · 2026-03-31T22:58:46Z

src/groundlight/experimental_api.py

        self.detector_reset_api = DetectorResetApi(self.api_client)

        self.edge_api = EdgeApi(self.api_client)
+        self.edge = EdgeAPI(self)


Something has gone very wrong here. I think what we're looking for is for edge_base_url to be defined in the EdgeApi, and for there to only be one EdgeAPI defined on the experimental client

These are two separate APIs that serve different purposes. EdgeApi (OpenAPI-generated) calls the cloud API for presigned model download URLs -- used internally by download_mlbinary(). EdgeEndpointApi (new) talks directly to the edge endpoint for config management (gl.edge.get_config(), etc.).

I have adjusted the naming and added comments to disambiguate.

brandon-wada · 2026-03-31T22:59:52Z

src/groundlight/client.py

    pass


+class EdgeNotAvailableError(GroundlightClientError):


Is there a reason to define this in client rather than experimental_api? This error should only be possible in while using the experimental client, no?

Correct that it is only be possible from the experimental client at this point.

We don't currently have any exceptions that are specific to the experimental client, so I decided not to put it there. I considered grouping it with the edge stuff, e.g. groundlight.edge.exceptions, but I decided not to introduce a new pattern for a single exception.

brandon-wada · 2026-03-31T23:02:54Z

src/groundlight/edge/api.py

+        deadline = time.time() + timeout_sec
+        while time.time() < deadline:
+            readiness = self.get_detector_readiness()
+            if desired_ids and all(readiness.get(did, False) for did in desired_ids):


Via Claude: edge condition if the detector list is empty. It should either return immediately or return a different error rather than timeout

brandon-wada · 2026-03-31T23:08:31Z

src/groundlight/edge/api.py

+        headers = self._client.get_raw_headers()
+        try:
+            response = requests.request(
+                method, url, headers=headers, verify=self._client.configuration.verify_ssl, timeout=10, **kwargs


Claude has a nit: 10 second default is fine but should be overridable via kwargs

I think this would add complexity without any value to the SDK user. All of these methods should return very quickly. 10 seconds is a reasonable and generous timeout. If the request is taking longer than this, there is something wrong with the network, which won't be fixed by adjusting the timeout.

Internal method, I'll buy that

brandon-wada · 2026-03-31T23:13:35Z

src/groundlight/edge/api.py

+        self._request("PUT", "/edge-config", json=config.to_payload())
+
+        poll_interval_seconds = 1
+        desired_ids = {d.detector_id for d in config.detectors if d.detector_id}


Claude: d.detector_id should never be falsey, it's misleading to have the check here
Me: However, in the current pydantic config I don't think we actually forbid the empty string. We probably should though

Agreed. I added pydantic validation and removed the defensive code that assumes detector ID can be falsey.

…on-sdk into tim/sdk-configures-edge

brandon-wada · 2026-04-07T00:53:08Z

docs/docs/guide/8-edge.md

brandon-wada · 2026-04-07T00:54:53Z

src/groundlight/edge/api.py

+        headers = self._client.get_raw_headers()
+        try:
+            response = requests.request(
+                method, url, headers=headers, verify=self._client.configuration.verify_ssl, timeout=10, **kwargs


Internal method, I'll buy that

## Summary Adds HTTP endpoints for reading and modifying the edge endpoint configuration at runtime, without requiring a Helm redeploy. A [Python SDK companion PR](groundlight/python-sdk#419) provides the client-side methods. **New endpoints:** - `GET /edge-config` -- returns the active `EdgeEndpointConfig` - `PUT /edge-config` -- replaces the active config (diffs against current state, adds/removes detector pods accordingly) - `GET /edge-detector-readiness` -- reports which configured detectors have inference pods ready to serve ## Key design decisions ### Config as a shared file on PVC Multiple uvicorn workers cannot share in-memory state. The active config is persisted to a YAML file on the existing PVC (`/opt/groundlight/edge/config/active-edge-config.yaml`). Workers read it via an mtime-based cache (`EdgeConfigManager.active()`), which calls `os.path.getmtime` on each access and only re-parses when the file has changed. This gives cross-worker consistency with negligible overhead. ### Config seeding via init container An `apply-edge-config` init container copies the Helm ConfigMap to the active config file on PVC. It uses a revision fingerprint (revision + deploy timestamp) stored on PVC to distinguish `helm install`/`upgrade` from restarts -- only copying when the fingerprint changes. This ensures Helm config is applied on deploy while SDK changes survive restarts. At startup, `main.py` checks: `EDGE_CONFIG` env var (for Balena) > active config on PVC > Pydantic defaults. ### Backward compatibility Older Helm charts lack the init container. In that case, `main.py` detects the missing active config file and falls back to copying the Helm ConfigMap directly. ### Detector reconciliation `PUT /edge-config` triggers `reconcile_config()`, which: 1. Diffs the desired detectors against DB records (`compute_detector_diff`) 2. Marks removed detectors as `pending_deletion` in the DB 3. Creates new deployment records for added detectors 4. Saves the new config to disk The `inference-model-updater` picks up these DB changes on its next loop iteration -- deleting pods for `pending_deletion` records, then creating pods for new ones. This could cause the config update process to take a bit longer than necessary (because we have to wait for the refresh loop to occur). I thought about adding logic to make the inference-model-updater poll for config updates while it waits, and terminate early if it sees one. In the end, I decided to not add that (yet). YAGNI. ### Deprecation of default-edge-config.yaml The static `deploy/helm/groundlight-edge-endpoint/files/default-edge-config.yaml` has been removed. Default config is now defined by Pydantic model defaults in the `python-sdk`. When no config file is provided to Helm, `_helpers.tpl` generates an empty YAML object, and the system uses Pydantic defaults. ## Notable refactorings - **`EdgeConfigManager`** (new, replaces `edge_config_loader.py`): Singleton-style class with class methods for config lifecycle -- `save()`, `active()`, `detector_configs()`, `detector_config()`. - **`EdgeInferenceManager` simplification**: Removed stored `detector_inference_configs`, `inference_client_urls`, `oodd_inference_client_urls`, `min_times_between_escalations`, and the 30-second scheduler poll. The manager now computes pod URLs on the fly from detector IDs and reads config from `EdgeConfigManager.active()`. Only `last_escalation_times` remains as runtime state. - **`naming.py`** (new): Extracted `get_edge_inference_service_name` and `get_edge_inference_model_name` out of `edge_inference.py` to break a circular import between `edge_inference.py` and `edge_config_manager.py`. - **`pending_deletion` column**: Added to the `InferenceDeployment` DB model. The column has a server default of `False`, so existing databases are compatible without migration. - **`apply-edge-config` init container** (new): Busybox-based init container added to the edge-endpoint Deployment. Uses a revision fingerprint to conditionally copy Helm config to PVC. Script lives in `files/apply-edge-config.sh`, embedded in the `edge-config` ConfigMap. ## Authentication These new endpoints are **unauthenticated**, consistent with all other non-inference endpoints on the edge endpoint (status page, metrics, health checks). The edge endpoint assumes a trusted local network. `GET /edge-config` and `GET /edge-detector-readiness` are comparable in sensitivity to the existing status page and `metrics.json`, which already expose detector IDs, model versions, and readiness. `PUT /edge-config` is the only unauthenticated write endpoint that modifies system state. An attacker with network access could add or remove detectors. This is an accepted tradeoff under the trusted-network assumption. If edge endpoints are ever exposed to untrusted networks, this endpoint should be the first to get auth. ## Release strategy This PR must be deployed **before** the companion SDK PR (`python-sdk` `0.26.0`). The SDK's `gl.edge.*` methods call endpoints that only exist after this PR is deployed. Releasing the SDK first would cause those methods to 404. The edge endpoint's `groundlight` dependency does not need to change for this PR -- it uses SDK features already available in `0.25.x`. However, once the SDK is released at `0.26.0`, a follow-up bump to `>=0.26.0, <0.27.0` is recommended. ## Tests - `test/api/test_edge_config.py`: GET returns 200, PUT validates body (422 on invalid, 200 on valid) - `test/core/test_edge_config_manager.py`: `compute_detector_diff` (6 cases), `apply_detector_changes` (3 cases), `EdgeConfigManager` (save/load/active roundtrip, mtime caching, startup priority, detector config lookups) ## Load Testing This PR touches some code in the hot path of inference (in what I hope is a trivial way). To ensure that I am not introducing any performance regressions, I performed the following benchmark. The results appear nearly identical, each test achieving a "Max Steady RPS" of 30 with the `countstep-yolox-tracking` pipeline. ```python uv run python multiple_client_throughput_test.py COUNT --edge-pipeline-config count-step-yolox-tracking --time-between-ramp 10 --max-clients 8 ``` With changes: <img width="1000" height="600" alt="image" src="proxy.php?url=https%3A%2F%2Fgithub.com%2F%3Ca+href%3D"https://github.com/user-attachments/assets/7414a6f4-cb97-443f-b4a9-3724a8edfc2d">https://github.com/user-attachments/assets/7414a6f4-cb97-443f-b4a9-3724a8edfc2d" /> Before changes: <img width="1000" height="600" alt="image" src="proxy.php?url=https%3A%2F%2Fgithub.com%2F%3Ca+href%3D"https://github.com/user-attachments/assets/ff5473e2-7f13-4da0-97a5-077a455179cf">https://github.com/user-attachments/assets/ff5473e2-7f13-4da0-97a5-077a455179cf" /> --------- Co-authored-by: Auto-format Bot <[email protected]>

first pass

157ea54

timmarkhuff mentioned this pull request Mar 20, 2026

edge-endpoint can be configured by python-sdk groundlight/edge-endpoint#359

Merged

timmarkhuff and others added 13 commits March 24, 2026 22:17

adding detector readiness check

b3f35a2

Automatically reformatting code

993857f

implementing the gl.edge pattern

237c39f

resolving merge conflicts

69b256d

Automatically reformatting code

dde221c

cleanup

e7c7e96

Merge branch 'tim/sdk-configures-edge' of github.com:groundlight/pyth…

4e89881

…on-sdk into tim/sdk-configures-edge

Automatically reformatting code

b0b6b82

bumping version

9c45e06

Merge branch 'tim/sdk-configures-edge' of github.com:groundlight/pyth…

ac06a60

…on-sdk into tim/sdk-configures-edge

addressing linter errors and cleaning up code

91e5839

fixing another linter issue

8446a4e

fixing another linter error

8d9464f

timmarkhuff requested review from brandon-wada and honeytung March 31, 2026 19:17

brandon-wada reviewed Mar 31, 2026

View reviewed changes

timmarkhuff and others added 12 commits April 1, 2026 00:07

fixing an edge case

bce92fe

adding pydantic validation

e96ea12

adjusting comment

89b083f

improving some names

00198f2

fixing linter error

80b3ac5

adding documentation

2e393e9

responding to pr feedback

8e576f4

adding more pydantic validation

21282ba

fixing a test

5a4062d

Automatically reformatting code

bc3828f

adjusting test and documentation

5da4875

Merge branch 'tim/sdk-configures-edge' of github.com:groundlight/pyth…

a2ec211

…on-sdk into tim/sdk-configures-edge

timmarkhuff requested a review from brandon-wada April 6, 2026 17:36

brandon-wada approved these changes Apr 7, 2026

View reviewed changes

timmarkhuff merged commit bda1afb into main Apr 7, 2026
8 checks passed

Conversation

timmarkhuff commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

New code

EdgeAPI class (groundlight/edge/api.py)

EdgeNotAvailableError

gl.edge access pattern

Version

Release order

Authentication note

Uh oh!

brandon-wada left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

timmarkhuff commented Mar 20, 2026 •

edited

Loading

`EdgeAPI` class (`groundlight/edge/api.py`)

`EdgeNotAvailableError`

`gl.edge` access pattern