Add sentinel field to NumericalAttribute #42

Open
copybara-service[bot] wants to merge 1 commit into main from cl/935074118

copybara-service[bot] commented Jun 19, 2026

Add sentinel field to NumericalAttribute

Adds a configurable sentinel field to NumericalAttribute (defaults
to np.nan) that controls the value assigned to out-of-domain entries
during reverse discretization when clip_to_range is False.
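As a rough illustration of the behavior described above, here is a minimal scalar sketch. The helper `reverse_bin`, the `edges` list, and the choice of the bin midpoint as the reconstructed value are all assumptions made for this example; they are not taken from the library.

```python
import math


def reverse_bin(index, edges, sentinel=math.nan, clip_to_range=False):
    """Hypothetical helper: map a bin index back to a representative value."""
    n_bins = len(edges) - 1
    if 0 <= index < n_bins:
        # Assumption: an in-domain bin is represented by its midpoint.
        return (edges[index] + edges[index + 1]) / 2
    if clip_to_range:
        # Out-of-domain index: clamp to the nearest domain boundary.
        return edges[0] if index < 0 else edges[-1]
    # Out-of-domain index without clipping: hand back the sentinel.
    return sentinel


edges = [0.0, 10.0, 20.0, 30.0]  # three bins
in_domain = reverse_bin(1, edges)
default_out = reverse_bin(7, edges)                   # NaN by default
custom_out = reverse_bin(7, edges, sentinel=-1)       # caller-chosen sentinel
clipped = reverse_bin(7, edges, clip_to_range=True)   # sentinel is not used
```

The sentinel only matters on the last branch, which matches the description: it applies to out-of-domain entries when `clip_to_range` is False.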

Previously, out-of-domain values were unconditionally mapped to None
(scalar path) or np.nan (vectorized path). With this change, users
can supply a custom sentinel value (e.g. -1) so that integer-dtype
arrays stay integer instead of being silently promoted to float. When
interval_handling='interval', sentinel must be a string; for all other
modes it must be numeric (int or float).
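The dtype promotion mentioned above can be reproduced with plain NumPy. This is a standalone sketch, not the code in vectorized_transformations.py; the arrays `codes` and `values` and the lookup scheme are assumptions for illustration.

```python
import numpy as np

codes = np.array([0, 2, 5, 1])   # bin codes; 5 is out of domain
values = np.array([10, 20, 30])  # assumed integer representative per bin

in_domain = codes < len(values)
# Use a safe index for out-of-domain codes so the lookup itself cannot fail.
safe_codes = np.where(in_domain, codes, 0)

# A NaN fill value forces the integer result to be promoted to float.
with_nan = np.where(in_domain, values[safe_codes], np.nan)

# An integer fill value keeps the integer dtype.
with_int = np.where(in_domain, values[safe_codes], -1)

print(with_nan.dtype, with_int.dtype)
```

With a NaN fill the result is a float array, while the `-1` fill leaves it as an integer array, which is the motivation given for the configurable sentinel.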

The pipeline proto does not persist sentinel, so a deserialized
attribute always gets the default NaN.

Changes:

  • domain.NumericalAttribute: new sentinel field (float|int|str)
    with NaN default, bidirectional type validation (string ↔ interval),
    and NaN-safe __eq__/__hash__
  • transformations.py: reverse() returns sentinel when clip_to_range=False
  • vectorized_transformations.py: undiscretize() uses configurable sentinel
  • pydantic_api.py: convert NaN→None at output layer for pydantic compat
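The validation and NaN handling listed above could look roughly like the sketch below. The class `AttributeSketch`, the mode name `"midpoint"`, and the helper `nan_to_none` are hypothetical stand-ins, not the actual `NumericalAttribute` or pydantic_api.py code. How the real class treats the NaN default in interval mode is not stated in this PR, so the sketch simply requires an explicit string there.

```python
import math
from dataclasses import dataclass
from typing import Union


def _is_nan(value):
    return isinstance(value, float) and math.isnan(value)


@dataclass(eq=False)  # equality and hashing are written by hand below
class AttributeSketch:
    interval_handling: str = "midpoint"  # hypothetical non-interval mode name
    sentinel: Union[float, int, str] = math.nan

    def __post_init__(self):
        is_interval = self.interval_handling == "interval"
        # Bidirectional check: string sentinel if and only if interval mode.
        if is_interval and not isinstance(self.sentinel, str):
            raise ValueError("interval mode requires a string sentinel")
        if not is_interval and not isinstance(self.sentinel, (int, float)):
            raise ValueError("non-interval modes require an int or float sentinel")

    def _key(self):
        # NaN != NaN in Python, so replace it with a fixed marker before comparing.
        sentinel_key = ("nan",) if _is_nan(self.sentinel) else ("value", self.sentinel)
        return (self.interval_handling, sentinel_key)

    def __eq__(self, other):
        if not isinstance(other, AttributeSketch):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


def nan_to_none(value):
    """Hypothetical output-layer helper: report NaN as None."""
    return None if _is_nan(value) else value


a = AttributeSketch(sentinel=float("nan"))
b = AttributeSketch(sentinel=float("nan"))
print(a == b, hash(a) == hash(b), nan_to_none(a.sentinel))
```

Normalizing NaN inside `_key` keeps `__eq__` and `__hash__` consistent with each other, so two attributes with the default sentinel compare equal and can share a dict or set entry.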

copybara-service[bot] force-pushed the cl/935074118 branch 4 times, most recently from 37ace23 to 5fc5291 on June 20, 2026 02:10
PiperOrigin-RevId: 935074118