Destroyed pool fix: prevent SIGSEGV on IOC exit when pvAccess holds NDArrays after driver/pool destroyed#570

Open
kgofron wants to merge 3 commits into areaDetector:master from kgofron:destroyed-pool

kgofron (Member) commented Feb 19, 2026

Segmentation fault

The title, "fix: prevent SIGSEGV on IOC exit when pvAccess holds NDArrays after driver/pool destroyed", refers to a segmentation fault that occurs when the IOC exits after an acquisition has been performed (that is, after pool memory has been allocated).

epics> auto_settings.sav: 2354 of 2354 PV's connected
ACQUIRE CHANGE: ADAcquire=1 (was 0), current ADStatus=0
PrvHst: Checking if TCP streaming should start - WritePrvHst=0
PrvHst: WritePrvHst is disabled (0) - TCP streaming not started
After acquireStart: ADStatus=1
ACQUIRE CHANGE: ADAcquire=0 (was 1), current ADStatus=1
PrvImg TCP connection closed by peer
PrvHst TCP disconnected
After acquireStop: ADStatus=0

epics> exit
PrvHst TCP disconnected
./st.cmd: line 5: 2343260 Segmentation fault      ../../bin/linux-x86_64/tpx3App st_base.cmd

With the fix applied to ADCore 3.14.0 master, the same exit no longer reports a segmentation fault:

epics> exit
PrvHst TCP disconnected

Problem

When an IOC exits (e.g. the user types exit) after an acquisition has run, the process can crash with SIGSEGV (signal 11). The crash occurs in NDArrayPool::release() (or an equivalent use of the pool) after the detector driver and its NDArrayPool have already been destroyed.
Cause: shutdown order. The detector driver destructor runs first and deletes pNDArrayPoolPvt_. Later, the pvAccess ServerContext is torn down (atexit) while its MonitorElements still hold NDArray-derived data. The deleter used by ntndArrayConverter (freeNDArray) calls NDArray::release() on those arrays. By then the pool is gone, so release() operates on freed memory and the process receives SIGSEGV.
This has been seen with areaDetector IOCs (e.g. ADTimePix3) using ADCore 3.12.1 and 3.14.0. See issue areaDetector/ADTimePix3#5.
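
The ordering problem can be illustrated with a minimal sketch. It assumes std::shared_ptr with a custom deleter as a stand-in for the reference pvAccess holds; PoolSketch, ArraySketch, freeArraySketch and demoShutdownOrder are hypothetical names, not ADCore or pvAccess APIs. The deleter only records what it would have touched, so the sketch itself never dereferences freed memory.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are not the ADCore or pvAccess types.
struct PoolSketch {};
struct ArraySketch { PoolSketch *pPool; };

// Shared state for the demo: whether the pool still exists, and an event trace.
static bool g_poolAlive = false;
static std::vector<std::string> g_events;

// Plays the role of the freeNDArray deleter: it runs whenever the last owner
// lets go, which may be long after the pool was deleted. A real deleter would
// call release() here; the sketch only records the state it would find.
void freeArraySketch(ArraySketch *) {
    g_events.push_back(g_poolAlive ? "release: pool alive"
                                   : "release: pool already destroyed");
}

std::vector<std::string> demoShutdownOrder() {
    g_events.clear();
    PoolSketch *pool = new PoolSketch;
    g_poolAlive = true;
    ArraySketch frame{pool};

    // The "server" keeps a reference with a custom deleter attached.
    std::shared_ptr<ArraySketch> heldByServer(&frame, freeArraySketch);

    // Step 1: the driver destructor deletes its pool.
    delete pool;
    g_poolAlive = false;
    g_events.push_back("driver destroyed pool");

    // Step 2: server teardown drops the last reference; the deleter fires now.
    heldByServer.reset();

    assert(g_events.size() == 2);
    return g_events;
}
```

In this sketch the deleter always runs after the pool is gone, which is the window in which the real release() would touch freed memory.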

Approach

Two parts:
“Destroyed pool” registry

  • Before the driver deletes its pool, it registers the pool pointer in a static set.
  • In NDArray::release(), we check that set using only the pool address (no dereference).
  • If the pool was registered as destroying, we set pNDArrayPool = NULL and return without calling the pool.
    So any late release() (from PVA or elsewhere) no-ops safely, even for NDArrays that are not the driver’s pArrays[] (e.g. copies handed to PVA).
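
A minimal sketch of the registry idea described above, not the code in this PR. PoolSketch and ArraySketch are hypothetical stand-ins for NDArrayPool and NDArray, kSuccess and kError stand in for the ADCore status codes, and allocating the set and mutex on the heap without ever freeing them is an assumption made here so that they outlive any static destructors.

```cpp
#include <cassert>
#include <mutex>
#include <set>

// Stand-ins for the ADCore status codes (assumed values, for illustration).
const int kSuccess = 0;
const int kError = -1;

class PoolSketch;

// Stand-in for NDArray: only the pool pointer and release() matter here.
struct ArraySketch {
    PoolSketch *pPool = nullptr;
    int release();
};

// Stand-in for NDArrayPool with the two static registry functions.
class PoolSketch {
public:
    int release(ArraySketch *) { return kSuccess; }

    // Record a pool address as destroyed. Entries are never removed.
    static void registerDestroyingPool(PoolSketch *p) {
        std::lock_guard<std::mutex> lock(registryMutex());
        registry().insert(p);
    }

    // Address comparison only; p is never dereferenced.
    static bool isPoolDestroyed(PoolSketch *p) {
        std::lock_guard<std::mutex> lock(registryMutex());
        return registry().count(p) != 0;
    }

private:
    // Intentionally leaked so late callers during process exit can still use them.
    static std::set<PoolSketch *> &registry() {
        static std::set<PoolSketch *> *s = new std::set<PoolSketch *>;
        return *s;
    }
    static std::mutex &registryMutex() {
        static std::mutex *m = new std::mutex;
        return *m;
    }
};

int ArraySketch::release() {
    if (pPool == nullptr) return kError;
    if (PoolSketch::isPoolDestroyed(pPool)) {
        pPool = nullptr;   // forget the dead pool; do not call into it
        return kError;
    }
    return pPool->release(this);
}

// Usage: one normal release, then a late release after the pool is deleted.
int demoLateRelease() {
    PoolSketch *pool = new PoolSketch;
    ArraySketch frame;
    frame.pPool = pool;
    assert(frame.release() == kSuccess);      // pool alive: normal path

    PoolSketch::registerDestroyingPool(pool); // must happen before the delete
    delete pool;

    int status = frame.release();             // late call: address lookup only
    assert(frame.pPool == nullptr);
    return status;
}
```

One consequence of an add-only, address-keyed set is worth keeping in mind: if a later pool happened to be allocated at a previously registered address, it would be treated as destroyed. That is unlikely to matter at process exit, but it may be relevant for code that creates and destroys pools repeatedly.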

asynNDArrayDriver destructor

  • Store maxAddr in a member maxAddr_.
  • In ~asynNDArrayDriver(): call NDArrayPool::registerDestroyingPool(pNDArrayPoolPvt_), null pNDArrayPool on each pArrays[i], then delete pNDArrayPoolPvt_.
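
The destructor ordering above could look roughly like the following sketch. DriverSketch, PoolSketch, ArraySketch and eventLog are hypothetical names used only for illustration; the registry is single-threaded here (no mutex) to keep the focus on the order of the three steps.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct PoolSketch;
struct ArraySketch { PoolSketch *pPool = nullptr; };

// Trace of teardown steps, so the order can be checked.
std::vector<std::string> &eventLog() {
    static std::vector<std::string> log;
    return log;
}

struct PoolSketch {
    ~PoolSketch() { eventLog().push_back("delete pool"); }

    static std::set<PoolSketch *> &destroyed() {
        static std::set<PoolSketch *> s;
        return s;
    }
    static void registerDestroyingPool(PoolSketch *p) {
        destroyed().insert(p);
        eventLog().push_back("register");
    }
};

class DriverSketch {
public:
    explicit DriverSketch(int maxAddr)
        : maxAddr_(maxAddr), pArrays(maxAddr, nullptr), pPoolPvt_(new PoolSketch) {}

    ~DriverSketch() {
        // 1. Mark the pool as destroyed before it is deleted.
        PoolSketch::registerDestroyingPool(pPoolPvt_);
        // 2. Detach the arrays the driver still references from the pool.
        for (int i = 0; i < maxAddr_; ++i) {
            if (pArrays[i]) pArrays[i]->pPool = nullptr;
        }
        eventLog().push_back("null pArrays");
        // 3. Only now delete the pool.
        delete pPoolPvt_;
    }

    // Declared in the same order as the constructor initializer list.
    int maxAddr_;
    std::vector<ArraySketch *> pArrays;
    PoolSketch *pPoolPvt_;
};

// Usage: destroy a driver that still references one array, then check the order.
bool demoTeardownOrder() {
    eventLog().clear();
    ArraySketch frame;
    {
        DriverSketch drv(2);
        frame.pPool = drv.pPoolPvt_;
        drv.pArrays[0] = &frame;
    }   // ~DriverSketch runs here
    const std::vector<std::string> expected = {"register", "null pArrays", "delete pool"};
    return eventLog() == expected && frame.pPool == nullptr;
}
```

Storing the address count in a member matters because the loop in the destructor needs a bound that is still available at teardown time.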

Changes

| File | Change |
| --- | --- |
| NDArray.h | Declare NDArrayPool::registerDestroyingPool(NDArrayPool*) and NDArrayPool::isPoolDestroyed(NDArrayPool*). |
| NDArrayPool.cpp | Implement both with a static std::set<NDArrayPool*> and a mutex. Pools are only ever added; the set is process-lifetime. |
| NDArray.cpp | At the start of NDArray::release(), if isPoolDestroyed(pNDArrayPool) then set pNDArrayPool = NULL and return ND_ERROR without calling the pool. |
| asynNDArrayDriver.h | Add private member int maxAddr_. |
| asynNDArrayDriver.cpp | Constructor: initialize maxAddr_(maxAddr) (initializer order matches member declaration). Destructor: call registerDestroyingPool(pNDArrayPoolPvt_), then loop over pArrays[0..maxAddr_-1] and set pArrays[i]->pNDArrayPool = NULL, then delete pNDArrayPoolPvt_. |

ADCore314_fix.md

References
