flag_autopowerspec fails with 'IOError: [Errno 28] No space left on device ' #1561
Comments
@bennahugo could you have a look? |
I'm not very familiar with the Ilifu cluster setup, so I can't really comment there. To me it looks like a space problem in your run directory on the cluster when making plots at high resolution. You may need to run from somewhere with more space than home, depending on your quota allocations. The default for the plotting is 300 dpi; setting this much lower in your recipe may help. I don't use this software any more, though -- it is much better and more reliable to flag GNSS saturation, LNA cycling errors and dropouts by hand, and I would recommend this approach. Plotting the autocorrelations (*&&& notation in CASA) and flagging the relevant time periods is quick to do. |
@spectram unfortunately the dpi is not a user setting in caracal. If you want to test @bennahugo's idea, you could try setting the dpi to a much lower value in caracal/caracal/workers/flag_worker.py Line 130 in 25161c2
|
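To get a feel for why the dpi could matter, here is a rough sketch of how the raster size of a plot scales with dpi. The 16 x 9 inch figure size is a made-up assumption (the actual figure size used by the worker is not stated in this thread), and `raster_size` is a hypothetical helper, not part of caracal.

```python
def raster_size(width_in, height_in, dpi, bytes_per_pixel=4):
    """Pixel dimensions and uncompressed RGBA size of a figure rendered at `dpi`."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px, height_px, width_px * height_px * bytes_per_pixel


# Hypothetical 16 x 9 inch figure; the real figure size is an assumption here.
for dpi in (300, 100):
    w, h, nbytes = raster_size(16, 9, dpi)
    print(f"dpi={dpi}: {w} x {h} px, {nbytes / 1e6:.1f} MB uncompressed")
```

Going from 300 to 100 dpi reduces the pixel count by a factor of 9. The size of the file actually written also depends on the image format and compression, so treat this only as an upper-level estimate.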
Thanks @paoloserra and @bennahugo. The mount should have sufficient space (83 TB remaining, I believe). I will check with the ilifu helpdesk once again. I can also try modifying the dpi parameter in the code to see if that resolves the issue. Further, if space were the issue, other datasets would have produced the same error, right? |
Not sure -- it is clearly an IO error. It is likely not the total disk
capacity that is an issue, but your quotas (quota -s). Most likely it is to
do with where you are running. I can imagine making waterfall plots of 32k
channels at high resolution may cause space issues if you are running from
an entrypoint where there isn't a lot of quota (home instead of scratch
space?)
Benjamin Hugo
|
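The free-space side of the check suggested above can be sketched in Python. This is only a minimal sketch: `space_report` is a hypothetical helper, and the three paths checked (run directory, temp directory, home) are assumptions about where a job might write. It reports free bytes and free inodes, since Errno 28 can also be raised when a filesystem runs out of inodes. It does not report per-user quotas; for those, `quota -s` is still needed.

```python
import os
import shutil
import tempfile


def space_report(path):
    """Return free space and free inodes for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # total, used, free in bytes
    vfs = os.statvfs(path)           # Unix only
    return {
        "path": os.path.abspath(path),
        "free_gib": usage.free / 2**30,
        # Inodes available to unprivileged users; some network
        # filesystems do not report a meaningful value here.
        "free_inodes": vfs.f_favail,
    }


if __name__ == "__main__":
    # Assumed candidate write locations: run directory, temp dir, home.
    for p in (os.getcwd(), tempfile.gettempdir(), os.path.expanduser("~")):
        r = space_report(p)
        print(f"{r['path']}: {r['free_gib']:.1f} GiB free, "
              f"{r['free_inodes']} inodes free")
```

Running this from inside the job (rather than on a login node) shows what the compute node itself sees for each location.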
I changed dpi to 100 and the error persists. Perhaps it has something to do with the "All-NaN slice encountered" runtime warning. I am attaching the latest log below (it's the same as the previous log). From Jeremy (ilifu):
I am leaning towards not using flag_autopowerspec to circumvent the issue. |
Not sure, perhaps something is funky with the boundaries when there is no unflagged data in a chunk. You can try adjusting the chunk length, perhaps. I would just disable this step and flag saturation events by hand though :) |
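For reference, the "All-NaN slice encountered" warning is what NumPy emits when a NaN-aware reduction is applied to a slice with no valid samples, as would happen for a fully flagged chunk. The sketch below reproduces it on toy data. The array shapes, the use of `np.nanmax`, and the way flags are turned into NaNs are all assumptions for illustration; which reduction the flagger actually runs is not known from this thread.

```python
import warnings

import numpy as np

# Toy "waterfall": 4 time chunks x 8 channels of autocorrelation power.
power = np.random.default_rng(0).normal(10.0, 1.0, size=(4, 8))
flags = np.zeros_like(power, dtype=bool)
flags[2, :] = True  # chunk 2 is fully flagged

masked = np.where(flags, np.nan, power)  # flagged samples become NaN

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    per_chunk_max = np.nanmax(masked, axis=1)  # warns for the all-NaN row

print(per_chunk_max)  # the entry for chunk 2 is nan
print([str(w.message) for w in caught])

# Guard: only reduce over chunks that still have unflagged data.
has_data = ~np.all(flags, axis=1)
safe_max = np.full(power.shape[0], np.nan)
safe_max[has_data] = np.nanmax(masked[has_data], axis=1)
```

In NumPy itself the warning only means the result for that slice is NaN. Whether that NaN then upsets a later step in the flagger is a separate question that this sketch does not answer.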
|
While running the flag worker on a 32K - 100 MHz dataset on ilifu, the job fails at the autopowerspec flagging step. The full log is attached; an error that stands out is
# IOError: [Errno 28] No space left on device
- There is an All-NaN slice encountered warning just before the error. The yaml inputs are as follows.
The scratch3 mount has sufficient storage and the job has more than enough RAM (160 GB across 10 cores; seff reports ~25% memory efficiency). Jeremy also confirmed that there were no alerts about the local disk on the compute node reaching capacity. Two attempts on this dataset with slightly different memory allocations have yielded the same result. I haven't encountered this error on other datasets. Please advise.
log-caracal-autopowerspec.txt