HumanName loses CONSTANTS singleton identity on pickle/deepcopy (has_own_config flips, pickle bloat) #169

@derek73

Summary

A default HumanName shares the module-level CONSTANTS singleton as its .C.
By default, pickle and copy.deepcopy serialize .C by value, so after a
round-trip .C is a detached copy rather than the singleton. This flips
has_own_config from False to True and bloats every serialized default name
with a full Constants copy (1000+ entries).

Surfaced during the round-trip work in #168 (which fixes the value-level
round-trip bugs but deliberately leaves this identity/semantic issue out of
scope).

Reproduction

import pickle, copy
from nameparser import HumanName
from nameparser.config import CONSTANTS

hn = HumanName("John Doe")
print(hn.has_own_config, hn.C is CONSTANTS)              # False True

r = pickle.loads(pickle.dumps(hn))
print(r.has_own_config, r.C is CONSTANTS)                # True  False

d = copy.deepcopy(hn)
print(d.has_own_config, d.C is CONSTANTS)                # True  False

Impact

  • has_own_config reports True for a name that was using the shared config —
    the property effectively lies after a round-trip.
  • The restored name no longer tracks later changes to the global CONSTANTS.
  • Each pickled/deep-copied default name carries a full Constants copy instead
    of a reference. Real memory/size cost when serializing names in bulk.
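The three effects above can be reproduced without nameparser installed. The sketch below uses hypothetical stand-ins (`Config` for Constants, `SHARED` for CONSTANTS, `Name` for HumanName); these names and the 1000-entry set are assumptions for illustration, not the library's actual classes or data.

```python
import pickle


class Config:
    """Hypothetical stand-in for nameparser's Constants."""

    def __init__(self):
        # Roughly mimic a config holding many entries.
        self.titles = {f"title{i}" for i in range(1000)}


SHARED = Config()  # stand-in for the module-level CONSTANTS singleton


class Name:
    """Hypothetical stand-in for HumanName, with no custom pickle hooks."""

    def __init__(self, text, config=SHARED):
        self.text = text
        self.C = config

    @property
    def has_own_config(self):
        return self.C is not SHARED


name = Name("John Doe")
restored = pickle.loads(pickle.dumps(name))

# 1. Identity is lost: the restored instance holds a detached copy.
identity_lost = restored.C is not SHARED and restored.has_own_config

# 2. The detached copy does not see later changes to the shared config.
SHARED.titles.add("dr")
tracks_global = "dr" in restored.C.titles

# 3. Size cost: compare with a pickle of the name text alone.
full_size = len(pickle.dumps(name))
bare_size = len(pickle.dumps({"text": name.text}))

print(identity_lost, tracks_global, full_size > bare_size)  # True False True
```

The absolute byte counts depend on the config contents, so only the comparison is printed.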

This is pre-existing (the old serialization also copied .C by value); #168 did
not introduce it.

Proposed fix

Add HumanName.__getstate__/__setstate__ (or __reduce__) that special-case
self.C is CONSTANTS and restore the reference instead of a copy. Sketch:

def __getstate__(self):
    state = self.__dict__.copy()
    if self.C is CONSTANTS:
        state['C'] = None        # sentinel: shared module config
    return state

def __setstate__(self, state):
    if state.get('C') is None:
        state['C'] = CONSTANTS
    self.__dict__.update(state)

(self.C is never legitimately None — the constructor replaces None with a
fresh Constants — so None is a safe sentinel. Confirm before relying on it.)
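To check the sentinel approach in isolation, here is a minimal runnable sketch on hypothetical stand-in classes (`Config`, `SHARED`, and `Name` are assumptions that mirror Constants, CONSTANTS, and HumanName; they are not the library's code). It applies the same two hooks and round-trips a default and a custom-config instance through both pickle and copy.deepcopy.

```python
import copy
import pickle


class Config:
    """Hypothetical stand-in for nameparser's Constants."""

    def __init__(self):
        self.titles = {"dr", "mr"}


SHARED = Config()  # stand-in for the module-level CONSTANTS singleton


class Name:
    """Hypothetical stand-in for HumanName with the sentinel hooks."""

    def __init__(self, text, config=SHARED):
        self.text = text
        self.C = config

    @property
    def has_own_config(self):
        return self.C is not SHARED

    def __getstate__(self):
        state = self.__dict__.copy()
        if self.C is SHARED:
            state["C"] = None  # sentinel: shared module config
        return state

    def __setstate__(self, state):
        if state.get("C") is None:
            state["C"] = SHARED  # restore the reference, not a copy
        self.__dict__.update(state)


default = Name("John Doe")
custom = Name("Jane Roe", Config())

round_trips = (
    ("pickle", lambda obj: pickle.loads(pickle.dumps(obj))),
    ("deepcopy", copy.deepcopy),
)
for label, round_trip in round_trips:
    d, c = round_trip(default), round_trip(custom)
    # Default keeps the singleton; custom keeps its own (equal) config.
    print(label, d.C is SHARED, d.has_own_config,
          c.has_own_config, c.C.titles == custom.C.titles)
# pickle True False True True
# deepcopy True False True True
```

copy.deepcopy goes through the same `__getstate__`/`__setstate__` pair as pickle, which is why one pair of hooks covers both paths here.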

Notes / things to decide

  • This changes HumanName serialization semantics, so it warrants its own PR and
    a changelog note.
  • Needs tests, for both pickle and copy.deepcopy: a default name round-trips
    with .C is CONSTANTS and has_own_config == False; an instance-config name
    still round-trips its own config (covered today by the tests added in
    #168).
  • Decide whether copy.copy (shallow) behavior should match.
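For the shallow-copy question, the sketch below may help frame the decision. It uses hypothetical stand-ins (`Config`, `SHARED`, `PlainName`, `HookedName` are assumptions, not nameparser classes) to show what copy.copy does with and without the sentinel hooks. Because copy.copy copies attribute references rather than values, the config object is shared either way.

```python
import copy


class Config:
    """Hypothetical stand-in for nameparser's Constants."""

    def __init__(self):
        self.titles = {"dr", "mr"}


SHARED = Config()  # stand-in for the module-level CONSTANTS singleton


class PlainName:
    """Hypothetical stand-in for HumanName without pickle hooks."""

    def __init__(self, text, config=SHARED):
        self.text = text
        self.C = config


class HookedName(PlainName):
    """Same stand-in with the sentinel hooks from the proposed fix."""

    def __getstate__(self):
        state = self.__dict__.copy()
        if self.C is SHARED:
            state["C"] = None  # sentinel: shared module config
        return state

    def __setstate__(self, state):
        if state.get("C") is None:
            state["C"] = SHARED
        self.__dict__.update(state)


results = {}
for cls in (PlainName, HookedName):
    own = Config()
    default_copy = copy.copy(cls("John Doe"))
    custom_copy = copy.copy(cls("Jane Roe", own))
    # (default copy keeps the singleton, custom copy shares its config)
    results[cls.__name__] = (default_copy.C is SHARED, custom_copy.C is own)

print(results)
# {'PlainName': (True, True), 'HookedName': (True, True)}
```

In this stand-in, shallow copies already keep `.C is SHARED` and the hooks do not change that, so the open question is mainly whether a shallow copy sharing a custom config by reference is the desired behavior.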
