fix: suffixes_prefixes_titles always reflects current set state#166
fix: suffixes_prefixes_titles always reflects current set state#166gaoflow wants to merge 8 commits into
Conversation
|
Two follow-up commits added based on review feedback:
The original fix dropped the cache entirely, which is ~1000x slower per call (~69 µs vs ~70 ns). This commit restores it with correct invalidation:
690 tests pass (682 pre-existing + 8 new × dual-run fixture). |
|
A follow-up on the cache-invalidation machinery added in
The fix moves the behavior onto a small
The in-place Behavior, New tests cover the paths the previous version left unguarded: the replaced-manager re-wiring path, the "orphaned manager no longer invalidates the live cache" guarantee (asserted by cache identity, since the leak's only symptom is an unnecessary recompute), |
The `suffixes_prefixes_titles` property on `Constants` cached its result in `_pst` after the first access. Any subsequent `add()` or `remove()` call on `titles`, `prefixes`, `suffix_acronyms`, or `suffix_not_acronyms` was silently ignored by the cache, so `is_rootname()` kept returning the stale result. Concretely, a word added to `C.titles` after the property was first accessed would still be treated as a rootname (first/middle/last) by `join_on_conjunctions`, even though `is_title()` correctly returned `True` for it. This contradicts the documented behaviour of per-instance config customisation described in AGENTS.md. Fix: drop the `_pst` cache entirely and compute the union fresh on every access. The four-set union is cheap and the simplest correct approach. Add five tests that assert the property and `is_rootname` stay consistent with the live sets after `add()`/`remove()` calls.
…_setattr__ The original cache was dropped to fix staleness, but recomputing the set union on every call is ~1000x slower. This restores the cache with a correct invalidation strategy: SetManager fires an on_change callback after add/remove, and Constants.__setattr__ clears _pst and re-wires the callback whenever one of the four contributing attrs is replaced (covering both user mutations and conftest teardown restores). The conftest manual `CONSTANTS._pst = None` reset is removed — it is now handled automatically by __setattr__ during collection restore. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…lper Add missing test cases flagged during PR review: - prime-then-mutate test that would catch a regression to cached staleness - add tests for suffix_acronyms and suffix_not_acronyms (previously untested) - remove test for prefixes (mirroring the existing titles remove test) - assertFalse(x in ...) -> assertNotIn for the two remove tests Also adds assertNotIn to HumanNameTestBase to support the new assertions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Six of the suffixes_prefixes_titles tests passed against the unfixed code because they read the property only once on a cold cache, so they asserted "the union includes its source sets" rather than "the cache stays consistent after a mutation." Prime the cache before mutating so they actually exercise invalidation; verified they now fail (14 across the dual-run fixture) against master's pre-fix config. Change the property guard from `if not self._pst` to `if self._pst is None` so a legitimately empty union caches instead of recomputing on every access — and so "cached" means "computed," which is the state the primed tests rely on. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… leak Replace the catch-all Constants.__setattr__ + _PST_ATTRS frozenset with a _CachedUnionMember descriptor on the four cached attributes (prefixes, suffix_acronyms, suffix_not_acronyms, titles). The invalidation/rewiring behavior now lives beside the attributes it governs, runs only for those four (not on every attribute assignment), and removes the hand-synced attribute-name list. Fixes a stale-callback leak: reassigning a manager now detaches the replaced one (_on_change = None) so an orphaned reference can no longer invalidate the live cache. Drop the unused on_change constructor parameter from SetManager — wiring is always done by the owning Constants after construction (e.g. the config-teardown setattr path), so the constructor channel was dead and misleading. Behavior, mypy, and ruff are unchanged; parse throughput is identical to the cached baseline (~18us/name vs ~263us/name without the cache). Tests: add coverage for the replaced-manager rewiring path, the orphan-no-longer-invalidates guarantee, add_with_encoding invalidation, and suffix-set removal after priming; add an assertIs shim to the test base. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
_CachedUnionMember stores prefixes/titles/etc. under a leading-underscore private name (_prefixes, _titles, …). The previous __getstate__ filtered out everything starting with '_', so those four attrs were silently dropped from every pickle. __setstate__ then had nothing to restore and accessed the class-level descriptor object rather than a SetManager. Fix: map '_xxx' back to 'xxx' when the attr is a _CachedUnionMember, so __setstate__ receives the public name and can restore through the descriptor, re-wiring _on_change callbacks in the process. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
388c65b to
9abb861
Compare
- SetManager.remove() only fires _on_change when an element was actually removed, avoiding unnecessary cache churn on no-op calls - _CachedUnionMember.__set__ raises TypeError for non-SetManager values instead of silently skipping hook wiring (which would break invalidation) - Constants.__setstate__ guards against malformed/truncated pickle state using _CachedUnionMember introspection, matching __getstate__'s approach - Add comment to __init__ explaining the descriptor-assignment ordering dependency that establishes _pst - UnicodeDecodeError fallback in base.py now preserves actual/expected in the assertion message - Replace duplicate add-title test with pickle round-trip callback test Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
UpdateSince opening this PR, a few things have happened:
From my end I'm happy with the state of this. I'll wait a few days to see if the original contributor has any feedback before merging. |
Took master's assertIs (with improved error message) and kept assertIsNot addition. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Problem
Constants.suffixes_prefixes_titlescached its result in_pstafter thefirst access and never invalidated that cache. Any subsequent
add()orremove()call ontitles,prefixes,suffix_acronyms, orsuffix_not_acronymswas silently ignored by the stale cache.This creates an observable inconsistency:
is_title()/is_prefix()/is_suffix()query the live sets and return the correct answer, butis_rootname()— which delegates tosuffixes_prefixes_titles— keepsreturning the stale cached answer, causing it to contradict the other helpers.
Minimal reproduction:
The same inconsistency affects titles and suffixes added or removed after the
property is first read.
join_on_conjunctionsrelies onis_rootnametocount
rootname_piecesand choose whether to join single-letter conjunctions,so a stale
_pstcan silently skew that decision.Fix
Keep
_pstas a cache but wire invalidation callbacks so it is clearedwhenever any of the four contributing
SetManagerattributes changes.A new
_CachedUnionMemberdescriptor manages each ofprefixes,suffix_acronyms,suffix_not_acronyms, andtitles. On assignment itdetaches the old manager's callback, wires the new manager's
_on_changeto_invalidate_pst, and clears the cache.SetManager.add_with_encodingandremovefire_on_changeafter each mutation, so in-place changesinvalidate the cache too.
This preserves the performance benefit of caching while fixing the staleness.
Tests
Thirteen new tests in
tests/test_constants.pyverify that:suffixes_prefixes_titlesreflects add/remove on all four contributing sets,both on a cold cache and after priming.
is_rootnamestays consistent withis_title/is_prefixafter mutations.SetManagerinvalidates the cache and wires the newmanager's callbacks.
add_with_encodinginvalidates the cache.All 734 tests pass (20 xfailed).
This pull request was prepared with the assistance of AI, under my direction and review.