Make cluster directory access more robust #24
Comments
From @LeeKamentsky on January 29, 2014 19:28 I can always try a certain number of times, maybe with pauses in between.
From @dlogan on January 29, 2014 19:37 Aha, you are likely right. I just checked and (so far) the node is always the same, node1625.
From @dlogan on January 29, 2014 21:23 I emailed Help to get them to look at node1625.
From @ljosa on January 31, 2014 19:15 I think these kinds of errors are rarely temporary enough that it makes sense to sleep and retry; that only delays the inevitable. Better to fail fast and restart failed jobs from the top level. Ideally, BatchProfiler should do that ASAP instead of waiting for a human to diagnose and trigger restarts, but I guess we don't want to rewrite BP right now…
From @dlogan on January 29, 2014 19:17
Batch # 4203 http://imagingweb.broadinstitute.org/batchprofiler/cgi-bin/FileUI/CellProfiler/BatchProfiler/ViewBatch.py?batch_id=4203
is running; however, 4% of its batches have failed so far, all with the same error (example below). They all seem to be temporary directory access failures. I presume temporary because I can manually cd to the supposedly offending directory just fine.
Instead of me resubmitting them manually, can we add a "try, wait, try again" loop in loadimages?
...
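For reference, a "try, wait, try again" loop could look roughly like the sketch below. This is not the actual loadimages code: `list_directory_with_retry` and its parameters (`attempts`, `delay_seconds`, `sleep`) are hypothetical names made up for illustration, and it assumes the failure surfaces as an `OSError` when the directory is listed.

```python
import os
import time


def list_directory_with_retry(path, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Hypothetical helper: list `path`, retrying a few times on OSError.

    Assumes the cluster directory failure shows up as an OSError
    raised by os.listdir; the real error may differ.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return os.listdir(path)
        except OSError:
            # Out of attempts: re-raise so the job fails visibly.
            if attempt == attempts:
                raise
            # Otherwise pause before trying again.
            sleep(delay_seconds)
```

Keeping `attempts` small means a genuinely bad node still fails reasonably fast, so the job can be restarted elsewhere instead of stalling.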
Copied from original issue: CellProfiler/CellProfiler#1033