validator: handle transient errors #5803

davidpanderson · 2024-09-10T00:27:05Z

A 'transient error' is one that will go away after a while, e.g. an fopen() failure caused by a broken NFS mount. In general, the BOINC back-end code (validation, assimilation) handles transient errors sensibly:
if there's a bad NFS mount, it retries validation for a few hours rather than marking thousands of jobs as failed.
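The retry-then-give-up policy described above can be sketched as follows. This is a minimal illustration, not BOINC's code: the `Attempt` and `Action` enums, `decide()`, and the four-hour `kRetryWindowSecs` are all hypothetical (the PR only says "a few hours").

```cpp
#include <cassert>
#include <ctime>

// Hypothetical outcome of one validation attempt (not a BOINC type).
enum class Attempt { OK, TRANSIENT_ERROR, PERMANENT_ERROR };

// Hypothetical decision taken after an attempt.
enum class Action { ACCEPT, RETRY_LATER, MARK_FAILED };

// Assumed retry window; the PR only says "a few hours".
constexpr std::time_t kRetryWindowSecs = 4 * 3600;

// first_transient: time of the first transient failure for this job, 0 if none yet.
Action decide(Attempt a, std::time_t now, std::time_t& first_transient) {
    assert(now > 0);  // 0 is reserved as the "no transient failure yet" sentinel
    switch (a) {
    case Attempt::OK:
        first_transient = 0;  // success clears any earlier transient state
        return Action::ACCEPT;
    case Attempt::PERMANENT_ERROR:
        return Action::MARK_FAILED;  // no point retrying
    case Attempt::TRANSIENT_ERROR:
        if (first_transient == 0) first_transient = now;
        // Keep retrying inside the window; give up only once it has expired.
        if (now - first_transient < kRetryWindowSecs) return Action::RETRY_LATER;
        return Action::MARK_FAILED;
    }
    return Action::MARK_FAILED;  // unreachable; keeps compilers quiet
}
```

The point of the window is that a mount outage of an hour or two costs only a delay, while a problem that never clears still ends in a definite failure.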

Extend this to script-based validation.
If a script (either init_result or compare_results) exits with 3, treat that as a transient error.
Treat other nonzero exits (or lack of an exit code) as a permanent error.
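A sketch of the exit-code rule, assuming the validator runs the script as a child process and reaps it with POSIX `waitpid()` (the same status-word format POSIX `system()` returns). `ScriptOutcome`, `classify_wait_status()` and the demo helper `run_child_exiting_with()` are hypothetical names, not taken from `script_validator.cpp`.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Exit code a validation script uses to report a transient error (per this PR).
constexpr int kTransientExitCode = 3;

enum class ScriptOutcome { SUCCESS, TRANSIENT_ERROR, PERMANENT_ERROR };

// Classify a status word as filled in by waitpid().
ScriptOutcome classify_wait_status(int status) {
    if (!WIFEXITED(status)) {
        // No exit code (e.g. killed by a signal): treat as permanent.
        return ScriptOutcome::PERMANENT_ERROR;
    }
    int code = WEXITSTATUS(status);
    if (code == 0) return ScriptOutcome::SUCCESS;
    if (code == kTransientExitCode) return ScriptOutcome::TRANSIENT_ERROR;
    return ScriptOutcome::PERMANENT_ERROR;  // any other nonzero exit
}

// Hypothetical demo helper: fork a child that exits with `code`, reap it, classify.
ScriptOutcome run_child_exiting_with(int code) {
    pid_t pid = fork();
    if (pid == 0) _exit(code);  // child: exit immediately, no stdio flush
    int status = 0;
    if (pid < 0 || waitpid(pid, &status, 0) < 0) {
        return ScriptOutcome::PERMANENT_ERROR;  // could not run or reap the child
    }
    return classify_wait_status(status);
}
```

A script author following this convention would, for example, exit with status 3 when an output file cannot be opened because a mount is down, and with 1 when the file exists but its contents are malformed.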

More generally (for all validators) add a return value VAL_RESULT_TRANSIENT_ERROR for init_result() and compare_results(). This means any transient error.
Previously we checked only for ERR_OPENDIR.
And for compare_results() we treated all nonzero returns as permanent.
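How a validator might act on these return values, as a sketch only. The constants `kErrOpendir` and `kValResultTransientError` stand in for BOINC's `ERR_OPENDIR` and `VAL_RESULT_TRANSIENT_ERROR`; their numeric values are invented here. Continuing to treat `ERR_OPENDIR` as transient in `after_init_result()` is also an assumption.

```cpp
#include <cassert>

// Stand-ins for ERR_OPENDIR and VAL_RESULT_TRANSIENT_ERROR; values are made up.
constexpr int kErrOpendir = -1001;
constexpr int kValResultTransientError = -1002;

// Hypothetical disposition after calling one of the validator hooks.
enum class Disposition { PROCEED, RETRY_LATER, PERMANENT_ERROR };

// init_result(): previously only ERR_OPENDIR was transient; now the new code is too.
Disposition after_init_result(int retval) {
    if (retval == 0) return Disposition::PROCEED;
    if (retval == kValResultTransientError || retval == kErrOpendir) {
        return Disposition::RETRY_LATER;
    }
    return Disposition::PERMANENT_ERROR;
}

// compare_results(): previously every nonzero return was permanent.
Disposition after_compare_results(int retval) {
    if (retval == 0) return Disposition::PROCEED;
    if (retval == kValResultTransientError) return Disposition::RETRY_LATER;
    return Disposition::PERMANENT_ERROR;
}
```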

Fixes #5799


codecov · 2024-09-10T00:41:21Z

Codecov Report

Attention: Patch coverage is 0% with 68 lines in your changes missing coverage. Please review.

Project coverage is 10.49%. Comparing base (b11cd70) to head (d94d625).
Report is 4 commits behind head on master.

Files with missing lines	Patch %	Lines
sched/validate_util2.cpp	0.00%	49 Missing ⚠️
sched/script_validator.cpp	0.00%	19 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##             master    #5803      +/-   ##
============================================
- Coverage     10.50%   10.49%   -0.02%     
  Complexity     1068     1068              
============================================
  Files           280      280              
  Lines         35972    36019      +47     
  Branches       8448     8444       -4     
============================================
  Hits           3780     3780              
- Misses        31798    31845      +47     
  Partials        394      394

Files with missing lines	Coverage Δ
sched/db_purge.cpp	`0.00% <ø> (ø)`
sched/script_validator.cpp	`0.00% <0.00%> (ø)`
sched/validate_util2.cpp	`0.00% <0.00%> (ø)`

AenBleidd added the labels "C: Server - Validator", "P: Minor", "PR: Bugfix" on Sep 10, 2024

AenBleidd added this to the Server Release 1.6.0 milestone Sep 10, 2024

AenBleidd approved these changes Sep 10, 2024


AenBleidd merged commit a2b61ad into master Sep 10, 2024
146 of 147 checks passed

AenBleidd deleted the dpa_script_val2 branch September 10, 2024 02:08
