fix(HuggingFaceLocalGenerator): remove stop_words cross-product in reply post-processing #11502

Closed
alvinttang wants to merge 1 commit into deepset-ai:main from alvinttang:fix/hf-local-generator-stop-words-cross-product

@alvinttang

Refs #11409.

HuggingFaceLocalGenerator.run post-processes replies with a nested list comprehension:

replies = [reply.replace(stop_word, '').rstrip()
           for reply in replies
           for stop_word in self.stop_words]

That's a cross-product. With N replies and M stop words it emits N*M replies, and each emitted reply has had only one of the stop words removed. When M > 1, a reply that contains a stop word other than the one applied to it silently keeps that stop word.
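
A small stand-alone reproduction of the comprehension above may make the effect easier to see. The replies and stop words here are made-up sample data chosen for illustration, not values taken from the test suite, and nothing from Haystack is imported:

```python
# Made-up sample data (an assumption for illustration only).
replies = ["Paris is the capital. STOP", "France is in Europe. END"]
stop_words = ["STOP", "END"]

# Same shape as the quoted comprehension: one output per (reply, stop_word) pair.
cross_product = [
    reply.replace(stop_word, "").rstrip()
    for reply in replies
    for stop_word in stop_words
]

print(len(cross_product))  # 4 entries for 2 replies and 2 stop words
for item in cross_product:
    print(repr(item))
```

With this sample data, two of the four entries still end in a stop word, because each entry had only one of the two stop words applied to it.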

The chat sibling at chat/hugging_face_local.py:660 already does this correctly with a sequential loop, so this PR aligns the non-chat path with the same pattern.
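
A minimal sketch of the sequential-loop pattern, under stated assumptions: the helper name `strip_stop_words` is hypothetical, and the body is not copied from chat/hugging_face_local.py or from this PR's diff. It only illustrates applying every stop word to the same reply before that reply is emitted once:

```python
from typing import List


def strip_stop_words(replies: List[str], stop_words: List[str]) -> List[str]:
    """Hypothetical helper: remove every stop word from every reply.

    Returns exactly one cleaned string per input reply.
    """
    cleaned = []
    for reply in replies:
        # Apply the stop words one after another to the same string.
        for stop_word in stop_words:
            reply = reply.replace(stop_word, "")
        # Trim trailing whitespace once, after all stop words are gone.
        cleaned.append(reply.rstrip())
    return cleaned


result = strip_stop_words(
    ["Paris is the capital. STOP", "France is in Europe. END"],
    ["STOP", "END"],
)
print(result)
```

Whether `rstrip()` runs once at the end (as here) or after each replacement is a design choice this sketch assumes; the two can differ when a stop word is surrounded by whitespace in the middle of a reply.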

RED

Two new regression tests, run against main, both fail:

FAILED test_run_stop_words_removal_with_multiple_stop_words
  assert {'replies': [...4 entries...]} == {'replies': ['Paris is the capital.', 'France is in Europe.']}

FAILED test_run_stop_words_removal_all_stop_words_removed_from_each_reply
  assert {'replies': ['Hello STOP world']} == {'replies': ['Hello  world']}

The original test_run_stop_words_removal (single stop word) keeps passing.

GREEN

test_run_stop_words_removal                                              PASSED
test_run_stop_words_removal_with_multiple_stop_words                     PASSED
test_run_stop_words_removal_all_stop_words_removed_from_each_reply       PASSED

Full file: 28 passed, 1 deselected (integration, model download), 6 warnings in 140.60s. No regression elsewhere.

@alvinttang alvinttang requested a review from a team as a code owner June 4, 2026 00:37
@alvinttang alvinttang requested review from anakin87 and removed request for a team June 4, 2026 00:37
@vercel

vercel Bot commented Jun 4, 2026

Someone is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

@CLAassistant

CLAassistant commented Jun 4, 2026

CLA assistant check
All committers have signed the CLA.

@anakin87 anakin87 left a comment

Member

Thank you for this PR.

Please sign the CLA, then ping me and I'll proceed with the actual review.

@alvinttang alvinttang force-pushed the fix/hf-local-generator-stop-words-cross-product branch from 9fc7c40 to 7697c9b Compare June 6, 2026 05:55
@alvinttang

Author

recheck

…ply post-processing

With N replies and M stop_words, the previous nested comprehension
produced N*M replies instead of N. Each of those replies had only one
stop word stripped, so any other stop word it contained was left in.

Switching to a sequential loop (already what the chat sibling at
chat/hugging_face_local.py:660 does) keeps the count at N and removes
every stop word from every reply.

Refs deepset-ai#11409
@alvinttang alvinttang force-pushed the fix/hf-local-generator-stop-words-cross-product branch from e971f30 to 6ee9af3 Compare June 6, 2026 12:45
@alvinttang

Author

@anakin87 CLA is signed now (had to re-author the commit under the right email to make cla-assistant pick it up). Ready for review whenever you have a moment, thanks.

@bogdankostic

Contributor

Thanks @alvinttang, the bug has already been fixed in #11413.

4 participants