Skip to content

Add afterMerge() lifecycle hook to KnnVectorsWriter and PerFieldKnnVe…#15952

Open
MrFlap wants to merge 1 commit intoapache:mainfrom
MrFlap:after-merge
Open

Add afterMerge() lifecycle hook to KnnVectorsWriter and PerFieldKnnVe…#15952
MrFlap wants to merge 1 commit intoapache:mainfrom
MrFlap:after-merge

Conversation

@MrFlap
Copy link
Copy Markdown

@MrFlap MrFlap commented Apr 10, 2026

Resolves #15935

Problem

PerFieldKnnVectorsFormat subclasses that allocate merge-time resources (such as thread pools for concurrent HNSW graph construction) have no way to know when a merge completes. getKnnVectorsFormatForField is called per-field during merge, but there is no corresponding callback when the merge finishes. This makes it impossible to cleanly release per-merge resources, leading to resource leaks.

Solution

Add a protected void afterMerge() throws IOException method with a no-op default to both KnnVectorsWriter and PerFieldKnnVectorsFormat.

A consumer can then override afterMerge() to shrink a shared thread pool after each merge completes, ensuring the pool scales elastically with the number of concurrent merges rather than leaking threads indefinitely.

Changes

KnnVectorsWriter.java: Added afterMerge() no-op method. Wrapped the field merge loop in merge() with try/finally { afterMerge() }.
PerFieldKnnVectorsFormat.java: Added afterMerge() no-op method on the format. FieldsWriter overrides afterMerge() to delegate to PerFieldKnnVectorsFormat.this.afterMerge().

cc @navneet1v @shatejas

@navneet1v
Copy link
Copy Markdown
Contributor

@MrFlap can you add entry in change log and also unit test for the new interface

Comment on lines -107 to 127
if (mergeState.infoStream.isEnabled("VV")) {
mergeState.infoStream.message("VV", "merge done " + mergeState.segmentInfo);
if (mergeState.infoStream.isEnabled("VV")) {
mergeState.infoStream.message("VV", "merge done " + mergeState.segmentInfo);
}
}
}
} finally {
afterMerge();
}
finish();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doesn't finish work? It seems like the writer should know its in a merge context already and finish should just handle that? Adding a new interface seems unnecessary?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would like to manage the threads in PerFieldKnnVectorsFormat, which doesn't get a callback once KnnVectorsWriter.finish() happens. Also, the suggested afterMerge() runs in a finally block, so we can be sure to close out the threads even on exception.

Copy link
Copy Markdown
Member

@benwtrent benwtrent Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. On an exception with merge, all files, etc. should be closed, do we want to move finish into the finally? There may be edge cases, for sure...
  2. Are we sure extending PerFieldKnnVectorsFormat is the way forward here? My point is that this type of thing is doable via experts in Lucene in different ways than adding another top-level API to the KnnVectorFormat which then impacts all formats.

I want to be very careful about adding an expert level API that Apache Lucene itself doesn't actually use.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On an exception with merge, all files, etc. should be closed, do we want to move finish into the finally? There may be edge cases, for sure...

I'm not sure I can give a completely educated call on this, but it seems like the right thing to do. If mergeOneField throws mid-way, the finish() call will be skipped.

Are we sure extending PerFieldKnnVectorsFormat is the way forward here? My point is that this type of thing is doable via experts in Lucene in different ways than adding another top-level API to the KnnVectorFormat which then impacts all formats.

Understandable. I'm not a Lucene expert, but I don't think there is an easy way for PerFieldKnnVectorsFormat to know when a merge completes. I'll continue looking into this. cc @navneet1v

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am saying, that any expert extending PerFieldKnnVectorsFormat can likely do this in another way and requiring a Lucene change that Lucene doesn't benefit from is a tricky situation.

I am just wondering if this is the best way for Lucene.

@MrFlap
Copy link
Copy Markdown
Author

MrFlap commented Apr 16, 2026

@MrFlap can you add entry in change log and also unit test for the new interface

Adding these.

@github-actions github-actions bot added this to the 11.0.0 milestone Apr 16, 2026
…ctorsFormat

Signed-off-by: Andrew Klepchick <aklepchi@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Hook for merge thread pool

3 participants