Skip to content

Perl_check_hash_fields_and_hekify - hekifying NVs is not locale-safe#24252

Open
richardleach wants to merge 1 commit intoPerl:bleadfrom
richardleach:chfandhekify_locale_NVs
Open

Perl_check_hash_fields_and_hekify - hekifying NVs is not locale-safe#24252
richardleach wants to merge 1 commit intoPerl:bleadfrom
richardleach:chfandhekify_locale_NVs

Conversation

@richardleach
Copy link
Contributor

Perl_check_hash_fields_and_hekify used to convert IV, NV, and PV OP_CONST SV values into shared HEKs at compile time. However, the decimal separator character is locale dependent, so it is unsafe to perform a one-off conversion of an NV to a hash key using the compile time locale.

This commit adds a check to ensure that NVs are no longer hekified.


  • This set of changes does not require a perldelta entry.

@richardleach
Copy link
Contributor Author

Note: I'll try to add a test.

@richardleach richardleach marked this pull request as draft March 3, 2026 17:44
@richardleach
Copy link
Contributor Author

Hmm, exists($h{1.2}) still seems to get hekified. Converted this PR to a draft.

@richardleach
Copy link
Contributor Author

No, it does actually seem fine.

@richardleach richardleach marked this pull request as ready for review March 3, 2026 17:52
@khwilliamson
Copy link
Contributor

I'm trying to understand the motivation behind this. No matter what the locale, a dot is supposed to be accepted as the decimal radix character., in addition to whatever the locale's official character is. We shouldn't be outputting a dot but we should always accept them. Is there a real life example where this didn't happen?

@richardleach
Copy link
Contributor Author

This PR followed from the discussion here: #24228 (comment)

@tonycoz
Copy link
Contributor

tonycoz commented Mar 3, 2026

I'm trying to understand the motivation behind this. No matter what the locale, a dot is supposed to be accepted as the decimal radix character., in addition to whatever the locale's official character is. We shouldn't be outputting a dot but we should always accept them. Is there a real life example where this didn't happen?

The problem is the other direction, in $h{1.2} the 1.2 is treated as an NV, but is converted at compile time to a string.

This is a problem under use locale; since it uses the compile-time locale rather than the runtime locale:

# 5.40.1 because it's handy
$ LC_ALL=de_DE.UTF-8 perl -Mlocale -MPOSIX=setlocale,LC_ALL -le 'setlocale(LC_ALL, "en_AU.UTF-8") or die; $h{1.2} =1; print keys %h'
1,2

(we use . for the decimal in Australia)

This is more obviously a bug if we also add runtime evaluation:

$ LC_ALL=de_DE.UTF-8 perl -Mlocale -MPOSIX=setlocale,LC_ALL -le 'setlocale(LC_ALL, "en_AU.UTF-8") or die; $x=1.2; $h{1.2} =1; $h{$x} = 2; print for keys %h'
1.2
1,2

Note the problem isn't locales, it's that we convert to a string too early.

&& SvTYPE(sv) < SVt_PVMG
&& SvOK(sv)
&& !SvROK(sv)
&& !SvNOK(sv) /* Decimal separator depends on runtime locale */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be safe to do the NV to string conversion if IN_LC_COMPILETIME(LC_NUMERIC) is false, so maybe:

  && !(SvNOK(sv) && IN_LC_COMPILETIME(LC_NUMERIC))

which I think is more readable than:

&& (!SvNOK(sv) || !IN_LC_COMPILETIME(LC_NUMERIC))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IN_LC_COMPILETIME() check should be cheap in the non-use locale; common case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does IN_LC_COMPILETIME work? Is it safe against things like hekification within BEGIN {} blocks before the use locale; is encountered, or require locale; (if that's allowed) at runtime?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I've added a couple of tests into t/run/locale.t and the one relevant to this PR fails if the !SvNOK(sv) test is changed to !(SvNOK(sv) && IN_LC_COMPILETIME(LC_NUMERIC)). Not sure if that's just the way the tests have been written though....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't looked, but it looks like the run time vs compile time distinction in the macros is broken (or unneeded).

During compilation, while generating OPs, PL_curcop points at PL_compiling, and during peep-time (still compiling) PL_curcop points at the most recent COP, so it's always correct to check PL_curcop.

Functions like Perl_ckwarn_common() which can be called at compiler or run time only check PL_curcop.

@khwilliamson ^^

So this should be checking IN_LC_RUNTIME(LC_NUMERIC) and the new tests pass if I change it:

diff --git a/op.c b/op.c
index c36327cf67..5974518ab5 100644
--- a/op.c
+++ b/op.c
@@ -2806,7 +2806,7 @@ Perl_check_hash_fields_and_hekify(pTHX_ UNOP *rop, SVOP *key_op, int real)
             && SvTYPE(sv) < SVt_PVMG
             && SvOK(sv)
             && !SvROK(sv)
-            && !SvNOK(sv) /* Decimal separator depends on runtime locale */
+            && !(SvNOK(sv) && IN_LC_RUNTIME(LC_NUMERIC)) /* Decimal separator depends on runtime locale */
             && real)
         {
             SSize_t keylen;

`Perl_check_hash_fields_and_hekify` used to convert IV, NV, and PV
`OP_CONST` SV values into shared HEKs at compile time. However, the
decimal  separator character is locale dependent, so it is unsafe to
perform a one-off conversion of an NV to a hash key using the compile
time locale.

This commit adds a check at optimization time to ensure that NVs are
not unsafely hekified.

It also adds a couple of new regression tests.
@richardleach richardleach force-pushed the chfandhekify_locale_NVs branch from 60f6b05 to 3a2eb9a Compare March 7, 2026 19:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants