Fatal Python error: Cannot recover from stack overflow. #80
Comments
Please run in single-threaded mode to recover the actual trace. The cron jobs
ran a few days ago, so I don't think this is upstream breakage.
…On Wed, 17 Feb 2021, 19:12 Oleg Smirnov, ***@***.***> wrote:
- Tricolour version: 0.1.7.-py3
- Python version: 3.6
- Operating System: 18.04 (hall.ru.ac.za)
Description
Boom!
What I Did
Fresh venv, pip install tricolour, then:
$ tricolour -fs polarisation ../msdir/1557347448_sdp_l0-ESO137_001-corr-subset.ms/
tricolour - 2021-02-17 19:06:55,921 INFO -
*******************************************************************************
_______ _ _
|__ __| (_) | |
| |_ __ _ ___ ___ | | ___ _ _ _ __
| | '__| |/ __/ _ \| |/ _ \| | | | '__|
| | | | | (_| (_) | | (_) | |_| | |
|_|_| |_|\___\___/|_|\___/ \__,_|_|
Viva la révolution!
tricolour - 2021-02-17 19:06:55,921 INFO - Flagging on the DATA column
tricolour.mask - 2021-02-17 19:06:55,921 INFO - Looking for static masks...
tricolour.mask - 2021-02-17 19:06:55,921 INFO - Searching /etc/tricolour
tricolour.mask - 2021-02-17 19:06:55,921 INFO - Searching /home/oms/.venv/cc/etc/tricolour
tricolour.mask - 2021-02-17 19:06:55,921 INFO - Searching /home/oms/.config/tricolour
tricolour.mask - 2021-02-17 19:06:55,921 INFO - Searching /home/oms/.tricolour
tricolour.mask - 2021-02-17 19:06:55,921 INFO - Searching /home/oms/.venv/cc/lib/python3.6/site-packages/tricolour/data
tricolour.mask - 2021-02-17 19:06:55,921 INFO - Found static mask file /home/oms/.venv/cc/lib/python3.6/site-packages/tricolour/data/4k_lband_meerkat.staticmask
tricolour.mask - 2021-02-17 19:06:55,922 INFO - Found static mask file /home/oms/.venv/cc/lib/python3.6/site-packages/tricolour/data/4k_uhfband_meerkat.staticmask
tricolour.mask - 2021-02-17 19:06:55,923 INFO - Loaded mask /home/oms/.venv/cc/lib/python3.6/site-packages/tricolour/data/4k_lband_meerkat.staticmask (non-dilated) with 41.50% flagged bandwidth between 0.856 and 1.712 GHz
tricolour.mask - 2021-02-17 19:06:55,923 INFO - Loaded mask /home/oms/.venv/cc/lib/python3.6/site-packages/tricolour/data/4k_uhfband_meerkat.staticmask (non-dilated) with 4.64% flagged bandwidth between 0.544 and 1.088 GHz
tricolour - 2021-02-17 19:06:55,923 INFO - *****************************************
tricolour - 2021-02-17 19:06:55,923 INFO - The following strategies will be applied:
tricolour - 2021-02-17 19:06:55,923 INFO - *****************************************
tricolour - 2021-02-17 19:06:55,923 INFO - 0: flag_nans_zeros (nan_dropouts_flag)
tricolour - 2021-02-17 19:06:55,923 INFO - 1: apply_static_mask (background_static_mask)
tricolour - 2021-02-17 19:06:55,923 INFO - accumulation_mode: or
tricolour - 2021-02-17 19:06:55,923 INFO - uvrange:
tricolour - 2021-02-17 19:06:55,924 INFO - 2: sum_threshold (background_flags)
tricolour - 2021-02-17 19:06:55,924 INFO - outlier_nsigma: 10
tricolour - 2021-02-17 19:06:55,924 INFO - windows_time: [1, 2, 4, 8]
tricolour - 2021-02-17 19:06:55,924 INFO - windows_freq: [1, 2, 4, 8]
tricolour - 2021-02-17 19:06:55,924 INFO - background_reject: 2.0
tricolour - 2021-02-17 19:06:55,924 INFO - background_iterations: 5
tricolour - 2021-02-17 19:06:55,924 INFO - spike_width_time: 12.5
tricolour - 2021-02-17 19:06:55,924 INFO - spike_width_freq: 10.0
tricolour - 2021-02-17 19:06:55,924 INFO - time_extend: 3
tricolour - 2021-02-17 19:06:55,924 INFO - freq_extend: 3
tricolour - 2021-02-17 19:06:55,924 INFO - freq_chunks: 10
tricolour - 2021-02-17 19:06:55,924 INFO - average_freq: 1
tricolour - 2021-02-17 19:06:55,924 INFO - flag_all_time_frac: 0.6
tricolour - 2021-02-17 19:06:55,924 INFO - flag_all_freq_frac: 0.8
tricolour - 2021-02-17 19:06:55,924 INFO - rho: 1.3
tricolour - 2021-02-17 19:06:55,924 INFO - num_major_iterations: 5
tricolour - 2021-02-17 19:06:55,924 INFO - 3: uvcontsub_flagger (residual_flag_initial)
tricolour - 2021-02-17 19:06:55,924 INFO - major_cycles: 7
tricolour - 2021-02-17 19:06:55,924 INFO - or_original_from_cycle: 1
tricolour - 2021-02-17 19:06:55,924 INFO - taylor_degrees: 20
tricolour - 2021-02-17 19:06:55,924 INFO - sigma: 15.0
tricolour - 2021-02-17 19:06:55,924 INFO - 4: flag_nans_zeros (nan_dropouts_reflag)
tricolour - 2021-02-17 19:06:55,924 INFO - 5: apply_static_mask (uvrange_static_mask)
tricolour - 2021-02-17 19:06:55,924 INFO - accumulation_mode: or
tricolour - 2021-02-17 19:06:55,924 INFO - uvrange: 0~550
tricolour - 2021-02-17 19:06:55,924 INFO - 6: sum_threshold (final_st_very_broad)
tricolour - 2021-02-17 19:06:55,924 INFO - outlier_nsigma: 10
tricolour - 2021-02-17 19:06:55,924 INFO - windows_time: [1, 2, 4, 8]
tricolour - 2021-02-17 19:06:55,924 INFO - windows_freq: [32, 48, 64, 128]
tricolour - 2021-02-17 19:06:55,924 INFO - background_reject: 2.0
tricolour - 2021-02-17 19:06:55,924 INFO - background_iterations: 5
tricolour - 2021-02-17 19:06:55,925 INFO - spike_width_time: 6.5
tricolour - 2021-02-17 19:06:55,925 INFO - spike_width_freq: 64.0
tricolour - 2021-02-17 19:06:55,925 INFO - time_extend: 3
tricolour - 2021-02-17 19:06:55,925 INFO - freq_extend: 3
tricolour - 2021-02-17 19:06:55,925 INFO - freq_chunks: 10
tricolour - 2021-02-17 19:06:55,925 INFO - average_freq: 1
tricolour - 2021-02-17 19:06:55,925 INFO - flag_all_time_frac: 0.6
tricolour - 2021-02-17 19:06:55,925 INFO - flag_all_freq_frac: 0.8
tricolour - 2021-02-17 19:06:55,925 INFO - num_major_iterations: 1
tricolour - 2021-02-17 19:06:55,925 INFO - 7: sum_threshold (final_st_broad)
tricolour - 2021-02-17 19:06:55,925 INFO - outlier_nsigma: 10
tricolour - 2021-02-17 19:06:55,925 INFO - windows_time: [1, 2, 4, 8]
tricolour - 2021-02-17 19:06:55,925 INFO - windows_freq: [1, 2, 4, 8]
tricolour - 2021-02-17 19:06:55,925 INFO - background_reject: 2.0
tricolour - 2021-02-17 19:06:55,925 INFO - background_iterations: 5
tricolour - 2021-02-17 19:06:55,925 INFO - spike_width_time: 6.5
tricolour - 2021-02-17 19:06:55,925 INFO - spike_width_freq: 10.0
tricolour - 2021-02-17 19:06:55,925 INFO - time_extend: 3
tricolour - 2021-02-17 19:06:55,925 INFO - freq_extend: 3
tricolour - 2021-02-17 19:06:55,925 INFO - freq_chunks: 10
tricolour - 2021-02-17 19:06:55,925 INFO - average_freq: 1
tricolour - 2021-02-17 19:06:55,925 INFO - flag_all_time_frac: 0.6
tricolour - 2021-02-17 19:06:55,925 INFO - flag_all_freq_frac: 0.8
tricolour - 2021-02-17 19:06:55,925 INFO - rho: 1.3
tricolour - 2021-02-17 19:06:55,925 INFO - num_major_iterations: 1
tricolour - 2021-02-17 19:06:55,925 INFO - 8: sum_threshold (final_st_narrow)
tricolour - 2021-02-17 19:06:55,925 INFO - outlier_nsigma: 10
tricolour - 2021-02-17 19:06:55,925 INFO - windows_time: [1, 2, 4, 8]
tricolour - 2021-02-17 19:06:55,925 INFO - windows_freq: [1, 2, 4, 8]
tricolour - 2021-02-17 19:06:55,925 INFO - background_reject: 2.0
tricolour - 2021-02-17 19:06:55,925 INFO - background_iterations: 5
tricolour - 2021-02-17 19:06:55,925 INFO - spike_width_time: 2
tricolour - 2021-02-17 19:06:55,926 INFO - spike_width_freq: 10.0
tricolour - 2021-02-17 19:06:55,926 INFO - time_extend: 3
tricolour - 2021-02-17 19:06:55,926 INFO - freq_extend: 3
tricolour - 2021-02-17 19:06:55,926 INFO - freq_chunks: 10
tricolour - 2021-02-17 19:06:55,926 INFO - average_freq: 1
tricolour - 2021-02-17 19:06:55,926 INFO - flag_all_time_frac: 0.6
tricolour - 2021-02-17 19:06:55,926 INFO - flag_all_freq_frac: 0.8
tricolour - 2021-02-17 19:06:55,926 INFO - rho: 1.3
tricolour - 2021-02-17 19:06:55,926 INFO - num_major_iterations: 1
tricolour - 2021-02-17 19:06:55,926 INFO - 9: uvcontsub_flagger (residual_flag_final)
tricolour - 2021-02-17 19:06:55,926 INFO - major_cycles: 10
tricolour - 2021-02-17 19:06:55,926 INFO - or_original_from_cycle: 0
tricolour - 2021-02-17 19:06:55,926 INFO - taylor_degrees: 25
tricolour - 2021-02-17 19:06:55,926 INFO - sigma: 13.0
tricolour - 2021-02-17 19:06:55,926 INFO - 10: flag_autos (flag_autos)
tricolour - 2021-02-17 19:06:55,926 INFO - 11: combine_with_input_flags (combine_with_input_flags)
tricolour - 2021-02-17 19:06:55,926 INFO - ***************** END ********************
tricolour - 2021-02-17 19:06:55,926 INFO - Flagging based on quadrature polarized power
tricolour - 2021-02-17 19:06:57,151 INFO - Only considering scans '2, 4, 6, 8, 10, 12, 14' as per user selection criterion
tricolour - 2021-02-17 19:06:57,152 INFO - Adding field 'ESO137-001' scan 2 to compute graph for processing
/home/oms/.venv/cc/lib/python3.6/site-packages/tricolour/packing.py:346: PerformanceWarning: Increasing number of chunks by factor of 77
dtype=np.bool)
/home/oms/.venv/cc/lib/python3.6/site-packages/tricolour/packing.py:354: PerformanceWarning: Increasing number of chunks by factor of 77
dtype=data.dtype)
/home/oms/.venv/cc/lib/python3.6/site-packages/tricolour/packing.py:361: PerformanceWarning: Increasing number of chunks by factor of 77
dtype=flags.dtype)
Thread 0x00007fb772ffd700 (most recent call first):
File "/usr/lib/python3.6/threading.py", line 295 in wait
File "/usr/lib/python3.6/queue.py", line 164 in get
File "/usr/lib/python3.6/concurrent/futures/thread.py", line 67 in _worker
File "/usr/lib/python3.6/threading.py", line 864 in run
File "/usr/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/usr/lib/python3.6/threading.py", line 884 in _bootstrap
Thread 0x00007fb7737fe700 (most recent call first):
File "/usr/lib/python3.6/threading.py", line 295 in wait
File "/usr/lib/python3.6/queue.py", line 164 in get
File "/usr/lib/python3.6/multiprocessing/pool.py", line 463 in _handle_results
File "/usr/lib/python3.6/threading.py", line 864 in run
File "/usr/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/usr/lib/python3.6/threading.py", line 884 in _bootstrap
Thread 0x00007fb773fff700 (most recent call first):
File "/usr/lib/python3.6/threading.py", line 295 in wait
File "/usr/lib/python3.6/queue.py", line 164 in get
File "/usr/lib/python3.6/multiprocessing/pool.py", line 415 in _handle_tasks
File "/usr/lib/python3.6/threading.py", line 864 in run
File "/usr/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/usr/lib/python3.6/threading.py", line 884 in _bootstrap
Thread 0x00007fb790ff9700 (most recent call first):
File "/usr/lib/python3.6/multiprocessing/pool.py", line 406 in _handle_workers
File "/usr/lib/python3.6/threading.py", line 864 in run
File "/usr/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/usr/lib/python3.6/threading.py", line 884 in _bootstrap
Thread 0x00007fb7917fa700 (most recent call first):
File "/usr/lib/python3.6/threading.py", line 295 in wait
File "/usr/lib/python3.6/queue.py", line 164 in get
File "/usr/lib/python3.6/multiprocessing/pool.py", line 108 in worker
File "/usr/lib/python3.6/threading.py", line 864 in run
File "/usr/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/usr/lib/python3.6/threading.py", line 884 in _bootstrap
Thread 0x00007fb791ffb700 (most recent call first):
File "/usr/lib/python3.6/threading.py", line 295 in wait
File "/usr/lib/python3.6/queue.py", line 164 in get
File "/usr/lib/python3.6/multiprocessing/pool.py", line 108 in worker
File "/usr/lib/python3.6/threading.py", line 864 in run
File "/usr/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/usr/lib/python3.6/threading.py", line 884 in _bootstrap
...
Aborted (core dumped)
|
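For anyone reading the dump in the report: the `Thread 0x... (most recent call first):` blocks look like the all-threads traceback that CPython prints on a fatal error, and the threads shown are pool workers and handlers waiting on queues, not the code that overflowed. That is presumably why a single-threaded rerun was requested. Below is a minimal, stdlib-only sketch that produces the same kind of listing on demand. It is not tricolour code, and the `dump_all_threads` helper is a hypothetical name used only for illustration.

```python
import faulthandler
import tempfile
import threading


def dump_all_threads():
    """Return the current stack of every thread as text (hypothetical helper)."""
    # faulthandler writes straight to the file descriptor, so a real file is needed.
    with tempfile.TemporaryFile(mode="w+") as fh:
        faulthandler.dump_traceback(file=fh, all_threads=True)
        fh.seek(0)
        return fh.read()


# Start one idle worker so the report contains more than the main thread.
stop = threading.Event()
worker = threading.Thread(target=stop.wait, daemon=True)
worker.start()

report = dump_all_threads()

stop.set()
worker.join()
print(report)
```

Calling `faulthandler.enable()` early in a script asks the interpreter to emit such a dump automatically on fatal signals, which can help when a crash only shows up intermittently.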
With
Infinite recursion maybe? |
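On the infinite-recursion guess: in CPython, runaway recursion in pure Python code normally surfaces as a catchable `RecursionError`. As far as I understand, the fatal "Cannot recover from stack overflow" variant is what older interpreters such as 3.6 print when the stack keeps growing after that exception has already been raised, for example because something deep in the call chain swallowed it. The sketch below only shows the ordinary, recoverable case. The function names are made up for illustration and have nothing to do with tricolour or dask internals.

```python
import sys


def recurse_forever(depth=0):
    """Stand-in for accidental unbounded recursion (no base case)."""
    return recurse_forever(depth + 1)


def probe():
    """Trigger the recursion and report the interpreter's error message."""
    try:
        recurse_forever()
    except RecursionError as exc:
        # Caught at the outermost level, after the stack has unwound.
        return str(exc)
    return None


message = probe()
print("recursion limit:", sys.getrecursionlimit())
print("error:", message)
```

If the real failure is of this kind, the single-threaded rerun should show the same frames repeating many times in the traceback.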
Dunno if it's related, but occasional dumping of cores has been seen before: IanHeywood/oxkat#34 Running the same thing twice would sometimes dump, sometimes not. (I had to laugh when Jack said "computers are nothing if not deterministic" in his excellent RCI talk.) EDIT: This could also be a singularity issue, I really have no idea. |
I did |
Odd, very odd. Can we try rolling back dask and dask-ms to a state prior to the amended scheduling metadata changes? I've never seen this prior to Ian's first mentions. Specifically, say dask 2.30? This really looks like an upstream issue in dask itself. Edit: I certainly did not see it in my MANY MANY runs in September/October to attempt profiling and characterization for the paper. |
For reference's sake:
sjperkins@hall:/home/oms$ . .venv/tricol/bin/activate
(tricol) sjperkins@hall:/home/oms$ python --version
Python 3.6.9
(tricol) sjperkins@hall:/home/oms$ pip freeze
asciitree==0.3.3
dask==2021.2.0
dask-ms==0.2.6
donfig==0.6.0
fasteners==0.16
future==0.18.2
llvmlite==0.35.0
numba==0.52.0
numcodecs==0.7.3
numpy==1.19.5
pkg-resources==0.0.0
python-casacore==3.3.1
PyYAML==5.4.1
scipy==1.5.4
six==1.15.0
threadpoolctl==2.1.0
toolz==0.11.1
tricolour==0.1.7
zarr==2.6.1 |
So which versions of dask and dask-ms would you like me to try? |
Let's roll her back to September/October last year (https://pypi.org/project/dask/2.29.0/); the current dask-ms release (https://github.com/ska-sa/dask-ms/blob/0.2.6/setup.py) indicates no conflict (>= 2.2.0). You can iteratively roll back until we find the breaking release (hopefully).
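Rolling back one release at a time works, but if the list of candidate releases is long, a bisection needs fewer reinstalls. Below is a minimal sketch under some stated assumptions: the releases are ordered oldest to newest, the oldest is known good, the newest is known bad, and once the problem appears it stays. The release labels and the `is_bad` check are placeholders. In practice `is_bad` would mean "install that version in a fresh venv and see whether the run aborts", and nothing here says which real release is at fault.

```python
from typing import Callable, Sequence


def first_bad(releases: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest release for which is_bad() is True.

    Assumes releases[0] is good, releases[-1] is bad, and the
    good-to-bad switch happens exactly once.
    """
    lo, hi = 0, len(releases) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(releases[mid]):
            hi = mid
        else:
            lo = mid
    return releases[hi]


# Hypothetical labels, oldest first; not real dask version numbers.
RELEASES = ["rel-1", "rel-2", "rel-3", "rel-4", "rel-5", "rel-6"]

tested = []


def simulated_is_bad(release: str) -> bool:
    """Purely simulated outcome so the sketch runs without installing anything."""
    tested.append(release)
    return RELEASES.index(release) >= 3


culprit = first_bad(RELEASES, simulated_is_bad)
print(culprit, "found after testing", tested)
```

Since the crash was reported as intermittent elsewhere in this thread, each "good" verdict probably deserves more than one run before it is trusted.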
Can confirm that it breaks with |
I set up a virtual environment on jake. pip freeze results follow:
sjperkins@jake:~$ . ~/venv/tricolour/bin/activate
(tricolour) sjperkins@jake:~$ pip freeze
asciitree==0.3.3
dask==2021.2.0
dask-ms==0.2.6
donfig==0.6.0
fasteners==0.16
future==0.18.2
llvmlite==0.35.0
numba==0.52.0
numcodecs==0.7.3
numpy==1.19.5
pkg-resources==0.0.0
python-casacore==3.3.1
PyYAML==5.4.1
scipy==1.5.4
six==1.15.0
threadpoolctl==2.1.0
toolz==0.11.1
tricolour @ git+https://github.com/ska-sa/tricolour.git@536ee527d86d2c06c9ae0ac78b443d3622db2abf
zarr==2.6.1
I also copied
$ (tricolour) sjperkins@jake:~$ tricolour data/1557347448_sdp_l0-ESO137_001-corr-subset.ms/ -nw 4
[## ] | 6% Completed | 1min 30.0s
/home/sjperkins/venv/tricolour/lib/python3.6/site-packages/tricolour/flagging.py:1027: RuntimeWarning: Mean of empty slice
avgvis = np.nanmean(vis_scratch[cp, :, :], axis=0)
[## ] | 6% Completed | 2min 17.1s
/home/sjperkins/venv/tricolour/lib/python3.6/site-packages/tricolour/flagging.py:1027: RuntimeWarning: Mean of empty slice
avgvis = np.nanmean(vis_scratch[cp, :, :], axis=0)
[#### ] | 10% Completed | 23min 2.2s
[#### ] | 10% Completed | 23min 2.5s
[#### ] | 11% Completed | 26min 3.9s
It's still going, no crashes yet. |
Could you double check that? #80 (comment) suggests you've got tricolour 0.1.7 in your virtual environment, but you may be using a different venv. |
Confirmed. pip freeze still reports 0.1.7 even after |
Maybe a |
pip freeze should display something like this: tricolour @ git+https://github.com/ska-sa/tricolour.git@536ee527d86d2c06c9ae0ac78b443d3622db2abf |
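To make the check less error-prone, the two `pip freeze` forms seen in this thread can be told apart mechanically: `name==version` for a release pinned from an index, and `name @ url` for a direct (for example git) install. A small sketch follows. `parse_freeze_line` is a hypothetical helper, not part of pip, and it only handles these two forms.

```python
def parse_freeze_line(line):
    """Split one `pip freeze` line into (name, kind, detail).

    kind is "direct" for `name @ url`, "pinned" for `name==version`,
    and "unknown" for anything else.
    """
    line = line.strip()
    if " @ " in line:
        name, ref = line.split(" @ ", 1)
        return name, "direct", ref
    if "==" in line:
        name, version = line.split("==", 1)
        return name, "pinned", version
    return line, "unknown", ""


# The two lines quoted in this thread.
released = parse_freeze_line("tricolour==0.1.7")
from_git = parse_freeze_line(
    "tricolour @ git+https://github.com/ska-sa/tricolour.git"
    "@536ee527d86d2c06c9ae0ac78b443d3622db2abf"
)

print(released)
print(from_git)
```

If the `tricolour` entry still comes back as `pinned`, the environment is running the PyPI release and not master.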
Ha, I did not know this! OK, that installed, and the master copy works now.
|
Right, since we've seen something in dask 2020.x breaking 0.1.7, we're going to "catch up" with it, then pin 0.1.8 to the current and known working version, right? Suggest keeping master unpinned though to make sure we catch any new problems. |
@bennahugo does test nightlies with Jenkins (this was actually caught in #78), but yeah, we should probably test nightly in GitHub Actions too. @smasoka Do you have bandwidth to set this up? |
0.1.8 released https://pypi.org/project/tricolour/ |