One worker with strange symptoms #2051
Blocked with a pointer to this thread. (Yesterday I'd seen some further LTC time losses after the unblock, but let them slide; with this recent batch of errors, I reblocked.)
There are 3 things here
BTW when blocking a worker it is best to also send an email. There is a link which brings up a convenient template. Indicate in the message when the email was sent so that someone else does not send another message.
This is common under load, i.e. when the server is not able to keep up and we hit the 30s timeouts (multiple times).
What's peculiar about this case is that the server wasn't under load; it was strictly worker-side as far as I could figure. The fleet was on at the time, but there were no other signs of load: no SPSA fails, and no vdv workers were stale. So I'm pretty sure that these stales, from just this one worker, were caused by the worker, as vdbergh notes.
At the present time I'm not really willing to initiate email to strangers... Is it possible to use a GitHub email address?
Blocked again today for more stale task spam: https://tests.stockfishchess.org/actions?action=&user=Sylvain27&text=
At this point it appears that banning the worker doesn't even accomplish anything, so I leave it in other people's hands.
I think the only way to robustly debug such issues is for the worker to upload the log associated with a task to the server, where it should be accessible to a new kind of user: "fishtest-developer". But to do this in a clean way, I think this would require an overhaul in the worker from logging with printf to a proper logging framework.
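To make the per-task log idea a bit more concrete, here is a minimal sketch using Python's standard `logging` module. It is not the worker's actual code: `make_task_logger`, `build_log_payload`, the logger name `worker.task.<id>`, and the payload fields are all hypothetical, and no upload endpoint is assumed to exist on the server.

```python
import io
import logging


def make_task_logger(task_id):
    """Return (logger, buffer). Records for this task are captured in buffer.

    Hypothetical helper: the names and format are assumptions for illustration.
    """
    buffer = io.StringIO()
    handler = logging.StreamHandler(buffer)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger = logging.getLogger(f"worker.task.{task_id}")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep task records out of the root logger
    logger.handlers.clear()  # avoid duplicate handlers if the id is reused
    logger.addHandler(handler)
    return logger, buffer


def build_log_payload(task_id, buffer, max_chars=100_000):
    """Build a dict the worker could send; keeps only the tail of long logs.

    max_chars is assumed to be positive.
    """
    text = buffer.getvalue()
    return {"task_id": task_id, "log": text[-max_chars:]}


# Example: log an event for one task and package the captured text.
logger, buf = make_task_logger("example-task")
logger.warning("Finished match uncleanly")
payload = build_log_payload("example-task", buf)
```

Keeping one logger per task means the captured text can be attached to that task alone, and truncating to the tail bounds the upload size.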
So this user is really obnoxious. They keep unblocking the worker without fixing the issue (I even wrote them an email about it, to no avail). If we restricted connections by username instead of by IP address (an easy change), then we could limit the number of workers this user can use (their other workers are good, last time I looked).
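A rough sketch of what a per-username cap could look like is below. It is an in-memory illustration only, not fishtest's actual connection handling: `WorkerLimiter`, `try_connect`, and `disconnect` are hypothetical names, and a real server would need to persist this state and expire stale workers.

```python
from collections import defaultdict


class WorkerLimiter:
    """Caps the number of concurrently connected workers per username.

    Hypothetical sketch: keyed by username rather than by IP address.
    """

    def __init__(self, max_workers_per_user):
        self.max_workers_per_user = max_workers_per_user
        self._active = defaultdict(set)  # username -> set of worker ids

    def try_connect(self, username, worker_id):
        """Return True if the worker may connect, False if over the cap."""
        workers = self._active[username]
        if worker_id in workers:
            return True  # already counted; reconnects do not use a new slot
        if len(workers) >= self.max_workers_per_user:
            return False
        workers.add(worker_id)
        return True

    def disconnect(self, username, worker_id):
        """Free the slot held by this worker, if any."""
        self._active[username].discard(worker_id)
```

With a cap of 2, a third distinct worker for the same username is refused until one of the first two disconnects, while other usernames are unaffected.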
vdbergh (or anyone), I'd like it if you could investigate this worker: https://tests.stockfishchess.org/actions?action=failed_task&user=&text=Sylvain27
The user unblocked this worker today after I'd blocked it yesterday (it was timelossing on LTC!). After the unblock, there's a smattering of "Finished match uncleanly" and "/api/request_spsa: request['spsa'] (value:{'num_games': 124, 'wins': 0, 'losses': 0, 'draws': 0}) is not of type 'valid_spsa_results'", and later on it was giving "Stale active task".
So something is up with it. It's a symptom pattern that I've not seen before, and I don't think I can really investigate through just the webpages.