As a developer, if the core server completes an evaluation, I don't want the user to receive a failure because of a timeout while relaying outputs to the shim. #330
Labels: bug
See VLab #120579-238. The evaluation was pretty small and fast with little stdout and little output. In the stdout log, I saw this:
Note the warning. The status endpoint returned a failure. Better messaging about that failure (above) would be nice, since the underlying problem was only a warning.
In the shim, I saw this exception (in context):
Evan postulated that something slowed the shim's ability to get the final status update and outputs from the core server within a set timeout, and that this led to an exception, which in turn led to the status failure being returned to the user.
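To make the suspected failure mode concrete, here is a minimal sketch of one possible mitigation: retry the final status fetch when it times out, and if every attempt times out, report a distinct relay-timeout status instead of an evaluation failure. This is not the shim's actual code. The names `relay_final_status` and `fetch`, the `relay_timeout` status value, and the retry parameters are all hypothetical, and it assumes the timeout surfaces as Python's built-in `TimeoutError`.

```python
import time
from typing import Callable, Dict


def relay_final_status(
    fetch: Callable[[], Dict[str, str]],
    attempts: int = 3,
    backoff_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch the final status via `fetch`, retrying when it times out.

    `fetch` is a hypothetical stand-in for whatever call the shim makes to
    the core server; it is assumed to raise TimeoutError on a timeout.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except TimeoutError:
            # Back off exponentially before the next attempt, but do not
            # sleep after the last one.
            if attempt < attempts:
                sleep(backoff_seconds * 2 ** (attempt - 1))

    # Every attempt timed out. The evaluation itself may have completed, so
    # report a relay problem rather than an evaluation failure.
    return {
        "status": "relay_timeout",
        "message": "Timed out retrieving the final status from the core server.",
    }
```

With something like this, a transient slowdown between the shim and the core server would either be absorbed by a retry or be reported to the user as a relay problem, not as a failed evaluation.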
The problem is not easily reproducible. The exact same evaluation completed successfully minutes later, and I have not seen that exception, or any related exception (i.e., one representing a shim-server breakdown), in months.
I'll add a comment with notes from Evan and James in the ticket after posting.
Hank