Fix the handling of worker selection for requested jobs in the Launcher class (and potentially subclasses; see #703) of dmod.scheduler. The current implementation is quite brittle. It does not account for potential non-default configuration adjustments in the deployment at large, nor does it properly handle all possible situations (a problem that will be amplified after #662).
Current behavior
In the current, Docker-based implementation, the determine_image_for_job function handles job worker Docker image selection. It hard-codes the registry to 127.0.0.1:5000, even though DMOD supports configuring the internal registry differently via the deployment environment config. For example, if DOCKER_INTERNAL_REGISTRY is set to something else in the .env config file, that value is used when worker images are built and pushed, but it is not what the Launcher tries to use.
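To make the mismatch concrete, here is a minimal sketch of image-name construction that honors the configured registry. The helper names (`resolve_registry`, `build_worker_image_name`) are hypothetical and not part of dmod.scheduler, and the sketch assumes the scheduler process can actually see DOCKER_INTERNAL_REGISTRY in its environment, which may not hold in every deployment.

```python
import os
from typing import Mapping, Optional

# The registry value that is currently hard-coded, per this issue.
DEFAULT_REGISTRY = "127.0.0.1:5000"


def resolve_registry(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured internal registry, or the default if unset.

    Hypothetical helper: assumes DOCKER_INTERNAL_REGISTRY is visible to
    the process doing worker image selection.
    """
    env = os.environ if env is None else env
    registry = env.get("DOCKER_INTERNAL_REGISTRY", "").strip()
    # Drop any trailing slash so the joined name has exactly one separator.
    return registry.rstrip("/") if registry else DEFAULT_REGISTRY


def build_worker_image_name(
    image: str = "ngen",
    tag: str = "latest",
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Build '<registry>/<image>:<tag>' from the resolved registry."""
    return f"{resolve_registry(env)}/{image}:{tag}"
```

With no override this yields the same name as today's hard-coded value, so default deployments would be unaffected; passing `env` explicitly is only there to keep the helper easy to test.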
A related but separate flaw: there is limited (if any) validation of whether the desired image exists in the referenced registry, or graceful error handling when it does not. Even if the right registry is configured, the desired image may not have been pushed (yet). Strictly speaking, this is a potential problem even with the current hard-coded restriction to "127.0.0.1:5000/ngen:latest", and once #662 is complete, the practical situations in which this can happen will expand greatly.
Expected behavior
The Launcher class should properly reflect non-default configuration settings for the internal Docker registry, or otherwise be able to synchronize its behavior to align with such settings elsewhere. It should also be able to gracefully handle (in tandem with other DMOD classes and services) the error condition of an expected or requested job worker image version not being available in the registry configured for the deployment.
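One possible shape for the graceful handling described above is sketched below. Everything here is an assumption for illustration: `WorkerImageUnavailableError` and `ensure_image_available` are hypothetical names, and the registry lookup is injected as a callable so the sketch does not presume how DMOD would actually query the registry.

```python
from typing import Callable


class WorkerImageUnavailableError(RuntimeError):
    """Hypothetical error: the worker image is missing from the registry."""

    def __init__(self, image_name: str, reason: str = ""):
        self.image_name = image_name
        detail = f": {reason}" if reason else ""
        super().__init__(f"worker image '{image_name}' is not available{detail}")


def ensure_image_available(
    image_name: str,
    image_exists: Callable[[str], bool],
) -> str:
    """Return image_name if the lookup confirms it exists, else raise.

    image_exists is an injected lookup (an assumption of this sketch);
    any exception it raises is wrapped so callers handle one error type.
    """
    try:
        found = image_exists(image_name)
    except Exception as exc:
        raise WorkerImageUnavailableError(
            image_name, f"registry lookup failed ({exc})"
        ) from exc
    if not found:
        raise WorkerImageUnavailableError(image_name, "not found in registry")
    return image_name
```

A caller could catch the single error type before starting the job and report the failure through whatever job-status mechanism applies, rather than failing at container start. With the Docker SDK for Python, `client.images.get_registry_data` is one candidate for backing the lookup, though whether it suits DMOD's registry setup is not established here.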
If other similar subclasses are developed related to #703 before this issue is resolved, they should also properly reflect/align with configuration as applicable for that implementation, and gracefully handle expected worker versions being unavailable.