
Conversation

Collaborator

@Jiaqi-Lv Jiaqi-Lv commented Jan 27, 2026

While I was working on the KongNet notebook PR, I ran into out-of-memory issues: Python kept crashing when processing a relatively large WSI. After investigating (with, admittedly, a lot of help from AI), we traced the root cause to the infer_wsi() function in SemanticSegmentor.

I relied heavily on AI to make this work, as I found some parts quite difficult to understand, so careful review is needed.

The example slide was: https://huggingface.co/datasets/TIACentre/TIAToolBox_Remote_Samples/blob/main/sample_wsis/D_P000019_PAS_CPG.tif. The code I was running was:

from pathlib import Path

from tiatoolbox.models.engine.semantic_segmentor import SemanticSegmentor

segmentor = SemanticSegmentor(model="fcn_resnet50_unet-bcss")
out = segmentor.run(
    images=[Path(wsi_path)],
    patch_mode=False,
    device="cuda",
    save_dir=output_path,
    overwrite=True,
    output_type="annotationstore",
    auto_get_mask=True,
    memory_threshold=25,
    num_workers=0,
    batch_size=8,
)

Before this PR, this code kept crashing on my workstation (32 GB of RAM); memory usage spiked to 100% just before the crash.

Root Causes:

  • Horizontal merge built per-row canvases spanning the full slide width; these large arrays spiked RAM near the end of WSI processing.
  • Masked-output alignment allocated dense zero-filled arrays in RAM for sparse locations.
  • The probability spill path forced a full compute() of the Dask array (see the back-of-the-envelope estimate below).
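
For a sense of scale, here is a back-of-the-envelope estimate of a full-slide probability map. The dimensions and class count below are purely illustrative (not measured from the sample slide), but they show why forcing a full compute() cannot work on a 32 GB machine:

import numpy as np

# Hypothetical slide dimensions and class count, for illustration only.
width_px, height_px, num_classes = 100_000, 80_000, 4
bytes_per_value = np.dtype(np.float32).itemsize  # 4 bytes

full_map_gib = width_px * height_px * num_classes * bytes_per_value / 1024**3
print(f"{full_map_gib:.0f} GiB")  # ~119 GiB, far more than 32 GB of RAM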

Key Changes:

  • Row-bounded horizontal merge: limit canvas width to the current row span, shrinking per-row allocations. [semantic_segmentor.py:1070-1081]
  • NumPy coercion before in-place merges to avoid Dask out= parameter errors. [semantic_segmentor.py:1014-1022]
  • Incremental save_to_cache: flush canvas/count row-chunks to Zarr without materializing the full arrays. [semantic_segmentor.py:1103-1158]
  • Streaming probability spill: write block-by-block instead of computing the whole Dask array; handle 2D/3D blocks with ndim-aware indexing (a minimal sketch follows this list). [semantic_segmentor.py:1234-1268], [semantic_segmentor.py:1263-1267]
  • Full-batch placement to disk: masked-output alignment now uses per-batch temp Zarr directories, eliminating giant in-RAM zero arrays and avoiding chunk-shape collisions. Call site: [semantic_segmentor.py:497-505]; impl: [semantic_segmentor.py:1355-1413]
  • Cleanup/maintenance: docstring fixes.
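
To make the streaming-spill item above concrete, here is a minimal, self-contained sketch of the idea. It is not the PR's actual implementation; the function name, chunking, and Zarr layout are assumptions for illustration:

import dask.array as da
import zarr

def spill_dask_to_zarr(probabilities: da.Array, spill_path: str) -> zarr.Array:
    """Write a Dask array to a Zarr store one block slab at a time.

    Avoids calling probabilities.compute(), which would materialize the
    whole array in RAM; peak memory stays around one slab of blocks.
    """
    out = zarr.open(
        spill_path,
        mode="w",
        shape=probabilities.shape,
        chunks=probabilities.chunksize,
        dtype=probabilities.dtype,
    )
    offset = 0
    # Iterate over block slabs along axis 0; each slab is computed individually.
    for block_idx in range(probabilities.numblocks[0]):
        block = probabilities.blocks[block_idx].compute()
        # ndim-aware write: the trailing ellipsis covers both 2D and 3D blocks.
        out[offset : offset + block.shape[0], ...] = block
        offset += block.shape[0]
    return out

The real code distinguishes 2D and 3D blocks explicitly; the ellipsis above is simply the shortest way to express that in a sketch.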

Problems Solved:

  • OOMs and late-stage crashes during WSI processing.
  • Masked-output alignment allocating massive dense arrays in RAM.
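
On the masked-output alignment point, here is a rough sketch of the disk-backed placement used to avoid dense in-RAM zeros. Names, chunking, and the helper function are assumptions for illustration, not the actual code:

import tempfile
from pathlib import Path

import numpy as np
import zarr

def place_batch_on_disk(
    batch_outputs: np.ndarray,
    batch_indices: np.ndarray,
    total_locations: int,
    tmp_root: Path,
) -> zarr.Array:
    """Scatter a batch of masked outputs into a disk-backed Zarr array.

    Instead of allocating np.zeros((total_locations, *sample_shape)) in RAM,
    create a Zarr array in a per-batch temporary directory and write only
    the rows that actually have outputs.
    """
    sample_shape = batch_outputs.shape[1:]
    store_dir = tempfile.mkdtemp(prefix="full_batch_tmp_", dir=str(tmp_root))
    full = zarr.open(
        store_dir,
        mode="w",
        shape=(total_locations, *sample_shape),
        # One row per chunk keeps chunk shapes consistent across batches.
        chunks=(1, *sample_shape),
        dtype=batch_outputs.dtype,
    )
    for row, idx in zip(batch_outputs, batch_indices):
        full[int(idx)] = row
    return full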

@Jiaqi-Lv Jiaqi-Lv requested a review from Copilot January 27, 2026 17:17
@Jiaqi-Lv Jiaqi-Lv self-assigned this Jan 27, 2026
@Jiaqi-Lv Jiaqi-Lv added the enhancement New feature or request label Jan 27, 2026
@Jiaqi-Lv Jiaqi-Lv changed the title 🔨 Fix Semantic Segmentor Memory Issue 🔨 Fix Semantic Segmentor Memory Spill Issue Jan 27, 2026
Contributor

Copilot AI left a comment

Pull request overview

This pull request addresses memory management issues in the semantic segmentor by implementing several optimizations: using zarr-backed arrays to reduce peak memory usage during batch processing, incrementally flushing data to disk instead of computing entire dask arrays at once, and improving memory threshold calculations. The changes aim to prevent out-of-memory errors when processing large whole slide images.

Changes:

  • Modified memory threshold monitoring to use currently available memory instead of initial snapshot
  • Refactored prepare_full_batch to use zarr-backed arrays for intermediate storage
  • Updated save_to_cache to incrementally flush data block-by-block
  • Modified merge_vertical_chunkwise to process probability blocks iteratively
  • Added cleanup of temporary directories and improved zarr file removal
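
As a rough illustration of the first item above (checking currently available memory rather than a start-up snapshot), the logic is essentially a re-query of system memory on every check. A minimal sketch, assuming psutil and a threshold expressed in GB (the PR's actual unit and function name may differ):

import psutil

def below_memory_threshold(memory_threshold_gb: float) -> bool:
    """Return True when currently available RAM drops below the threshold.

    Re-querying psutil each time tracks real memory pressure, instead of
    comparing against a snapshot taken once at start-up.
    """
    available_gb = psutil.virtual_memory().available / 1024**3
    return available_gb < memory_threshold_gb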

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

  • tiatoolbox/models/engine/semantic_segmentor.py: Core memory optimizations including zarr-backed storage, incremental data flushing, improved memory monitoring, and temporary file handling
  • tests/engines/test_semantic_segmentor.py: Adjusted assertion tolerances to account for numerical precision changes from zarr storage


"""
# Ensure we operate on NumPy to avoid Dask out-parameter issues when merging.
if not isinstance(blocks, np.ndarray):

Copilot AI Jan 27, 2026

Converting zarr arrays to numpy using np.asarray will load the entire zarr array into memory, which defeats the purpose of using zarr for memory-efficient storage. This could cause memory issues when processing large arrays. Since this function is called from merge_horizontal with canvas_np that may contain zarr arrays from prepare_full_batch, this conversion can trigger out-of-memory errors on large datasets.

Suggested change
-    if not isinstance(blocks, np.ndarray):
+    # Do not convert zarr arrays to NumPy here to avoid loading the entire
+    # array into memory; iterating over a zarr array yields NumPy blocks.
+    if not isinstance(blocks, (np.ndarray, zarr.Array)):

Comment on lines +1168 to +1169
canvas_block = canvas.blocks[block_idx, 0, 0].compute()
count_block = count.blocks[block_idx, 0, 0].compute()

Copilot AI Jan 27, 2026

The code assumes the dask arrays have 3D block structure by accessing blocks[block_idx, 0, 0]. If the canvas or count arrays have different dimensionality or block structure, this will raise an IndexError. Consider checking the dimensionality or using more flexible indexing to handle edge cases.

)
raw_predictions["coordinates"] = da.concatenate(coordinates, axis=0)

shutil.rmtree(save_path.with_name("full_batch_tmp"))

Copilot AI Jan 27, 2026

The cleanup assumes the directory exists and will fail with FileNotFoundError if prepare_full_batch was never called or if no batches were processed. Consider checking existence first using Path.exists() or wrapping in a try-except block with ignore_errors=True parameter for shutil.rmtree to prevent this from causing inference failures.

Suggested change
-    shutil.rmtree(save_path.with_name("full_batch_tmp"))
+    tmp_dir = save_path.with_name("full_batch_tmp")
+    if tmp_dir.exists():
+        shutil.rmtree(tmp_dir)

save_path: Path | str = "temp_fullbatch",
*,
is_last: bool,
) -> tuple[np.ndarray, np.ndarray, np.ndarray]:

Copilot AI Jan 27, 2026

The return type annotation specifies np.ndarray for the first element, but the function now returns a zarr.Array (created on line 1416). This type mismatch can cause issues downstream when the returned value is passed to concatenate_none (line 500), which only handles np.ndarray and da.Array types. Either convert the zarr array to numpy before returning, or update the type annotation and handle zarr arrays appropriately in concatenate_none.

Comment on lines +1412 to +1413
save_path_dir.mkdir(parents=True, exist_ok=True)
temp_dir = Path(tempfile.mkdtemp(prefix="full_batch_tmp_", dir=str(save_path_dir)))

Copilot AI Jan 27, 2026

The function creates zarr arrays backed by temporary directories (line 1413), but these temporary directories are never cleaned up within this function. They are created for each batch and accumulate in the save_path directory. While line 584 in infer_wsi attempts cleanup with shutil.rmtree(save_path.with_name("full_batch_tmp")), this only removes the parent directory, not the individual tempfile.mkdtemp-created subdirectories if they still have open handles or if the zarr arrays are still being accessed. This can lead to resource leaks and disk space issues.

Suggested change
-    save_path_dir.mkdir(parents=True, exist_ok=True)
-    temp_dir = Path(tempfile.mkdtemp(prefix="full_batch_tmp_", dir=str(save_path_dir)))
+    # Create a dedicated parent directory for full-batch temporary stores.
+    base_tmp_dir = save_path_dir.with_name("full_batch_tmp")
+    base_tmp_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(
+        tempfile.mkdtemp(prefix="full_batch_tmp_", dir=str(base_tmp_dir))
+    )

-    if is_last:
+    if is_last and len(full_output_locs):
         pad_len = len(full_output_locs)
         full_batch_output.resize(total_size + pad_len, *sample_shape)

Copilot AI Jan 27, 2026

The zarr.Array.resize method signature typically requires individual dimension arguments, not a tuple followed by unpacked values. This line should be full_batch_output.resize(total_size + pad_len, *sample_shape) or pass all dimensions as a single tuple. The current syntax mixing tuple and unpacking may cause a TypeError at runtime.

@Jiaqi-Lv Jiaqi-Lv marked this pull request as ready for review January 27, 2026 18:33
Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@shaneahmed shaneahmed added this to the Release v2.0.0 milestone Jan 28, 2026
@shaneahmed shaneahmed changed the base branch from dev-define-KongNet-model to dev-define-engines-abc January 28, 2026 09:50
@shaneahmed shaneahmed changed the base branch from dev-define-engines-abc to dev-define-KongNet-model January 28, 2026 09:50