Inconsistent alphabets across languages #9

kschulst · 2023-03-29T06:58:41Z

Hi! It seems that the default base62 alphabet is defined differently across languages:

Java implementation:
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
Ref: https://github.com/mysto/java-fpe/blob/main/src/main/java/com/privacylogistics/FF3Cipher.java#L515

Python implementation:
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
Ref: https://github.com/mysto/python-fpe/blob/main/ff3/ff3.py#L75

This will lead to inconsistent results between implementations if the user does not define the alphabet explicitly.
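To illustrate why the ordering matters, here is a minimal sketch (not code from either library) that maps the same string to numerals under both default alphabets quoted above. The helper names `to_numerals` and `from_numerals` are hypothetical; the two alphabet strings are the ones from the Java and Python references.

```python
import string

# Default alphabets as quoted in this issue.
JAVA_DEFAULT = string.digits + string.ascii_uppercase + string.ascii_lowercase
PYTHON_DEFAULT = string.digits + string.ascii_lowercase + string.ascii_uppercase


def to_numerals(text, alphabet):
    """Hypothetical helper: map each character to its index in the alphabet."""
    return [alphabet.index(ch) for ch in text]


def from_numerals(numerals, alphabet):
    """Hypothetical helper: map numerals back to characters."""
    return "".join(alphabet[n] for n in numerals)


sample = "Ab9z"
java_numerals = to_numerals(sample, JAVA_DEFAULT)
python_numerals = to_numerals(sample, PYTHON_DEFAULT)

# Same text, different numerals, so the cipher operates on different inputs.
print(java_numerals)
print(python_numerals)

# Numerals produced with one alphabet decode to case-swapped text with the other.
crossed = from_numerals(java_numerals, PYTHON_DEFAULT)
print(crossed)
```

Since an FPE cipher works on the numerals rather than the characters, any difference at this step carries through to the ciphertext.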


bschoening · 2023-03-29T21:11:05Z

Hi Kenneth, thanks for detailing this issue with the base62 alphabet. It's unfortunate that these two implementations have reversed the order here.

The solution is probably to externalize the test vectors as YAML or JSON and share them. However, aligning the order will break previously encoded data for whichever package changes. I'll look into this, as the benefits of shared test vectors are significant (in addition to correcting this alignment issue).

kschulst · 2023-04-03T11:32:35Z

If the alphabets will be aligned across languages, what would be the preferred definition of the base62 alphabet?
digits+uppercase+lowercase or digits+lowercase+uppercase?

Using the ordinal value from the ASCII table as a pointer, that would suggest that digits+uppercase+lowercase could be the reference definition. On the other hand, if other implementations already utilise alphabets such as 0123456789abcdef, the digits+lowercase+uppercase option could be regarded as an extension of such alphabets. 🤔

It would be nice to know, so that my implementation can settle on a good "default override" alphabet.

bschoening · 2023-04-03T12:46:39Z

Yes, ASCII would suggest digits+uppercase+lowercase.
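As a quick check of the ASCII claim, the following sketch sorts the 62 characters by code point using only the Python standard library. It is an illustration, not part of any Mysto package.

```python
import string

# Start from the digits+lowercase+uppercase order and sort by code point.
chars = string.digits + string.ascii_lowercase + string.ascii_uppercase
ascii_sorted = "".join(sorted(chars))

# '0' is 48, 'A' is 65, 'a' is 97, so digits come first, then uppercase, then lowercase.
print(ord("0"), ord("A"), ord("a"))
print(ascii_sorted)
```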

However, the NIST test vectors for FPE use digits+lowercase and do not use uppercase in their plaintext / ciphertext; see, for example, Sample #5. Note that this affects only the test cases; the alphabet ordering is not part of the standard.


FF3-AES128
Key is EF 43 59 D8 D5 80 AA 4F 7F 03 6D 6F 04 FC 6A 94
Radix = 26
--------------------------------------------------------------
PT is <0123456789abcdefghi>
CT is g2pk40i992fn20cjakb
---------------------------

at https://csrc.nist.gov/csrc/media/projects/cryptographic-standards-and-guidelines/documents/examples/ff3samples.pdf
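The following sketch shows why this sample depends on the ordering. It assumes that a radix-26 alphabet is taken as the first 26 characters of the default base62 alphabet; that is an assumption made for illustration, not a statement about how any particular implementation derives it. The helper `representable` is hypothetical.

```python
import string

RADIX = 26
PT = "0123456789abcdefghi"  # plaintext from the NIST sample above
CT = "g2pk40i992fn20cjakb"  # ciphertext from the NIST sample above

# Assumption: the radix-26 alphabet is a prefix of the default base62 alphabet.
lower_first = (string.digits + string.ascii_lowercase + string.ascii_uppercase)[:RADIX]
upper_first = (string.digits + string.ascii_uppercase + string.ascii_lowercase)[:RADIX]


def representable(text, alphabet):
    """Hypothetical helper: True if every character of text is in the alphabet."""
    return all(ch in alphabet for ch in text)


print(lower_first)
print(upper_first)
print(representable(PT, lower_first), representable(CT, lower_first))
print(representable(PT, upper_first), representable(CT, upper_first))
```

Under that assumption, the sample can only be expressed with the digits+lowercase prefix, so an implementation that defaults to digits+uppercase+lowercase would need an explicit alphabet to run it.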

Implementations that support custom alphabets could handle this as a special case, but not all of the Mysto languages support custom alphabets at this point. My C implementation, for example, does not support it currently.

I may ask NIST why they chose digits+lowercase, instead of using the ASCII ordering. I'd say ASCII is the more traditional ordering.

kschulst · 2023-04-03T14:25:35Z

Based on the NIST test vectors, and for compatibility with the other Mysto FPE implementations, it seems that the safest bet is to go with a digits+lowercase+uppercase alphabet as the default then?

To give you some context, we are working on a custom extension to the Google Tink library. Tink does not currently provide any cryptographic primitives for FPE, so we have created a custom FPE abstraction. Instead of implementing FPE from scratch, we are using the Mysto library family under the hood as a baseline implementation for FF3-1 across languages. Disclaimer: I am not associated with Google or Tink.

The alphabet of choice (such as ALPHANUMERIC) is encoded into the Tink FPE key material, so it would be nice to be aligned with the defaults applied by the Mysto-libraries.

(It would be very interesting to discuss other usage aspects regarding the Mysto-library and get your opinion on the design decisions that we have made for Tink FPE (python and java version). But that is probably out of context for this issue 🙂)

bschoening · 2023-04-04T15:03:24Z

Kenneth,

After further thought, both Unicode and ASCII use the ordinal sort order of digits+uppercase+lowercase. The digits+lowercase ordering is sort of an accident of how NIST has defined the test vectors, which is unfortunate, but can be worked around.

To correct this in the Java version, I'll remove the support for radix 26 in the FF3Cipher(key, tweak, radix) constructor.

bschoening · 2023-04-12T15:13:09Z

@kschulst please look at the latest trunk for java-fpe and see if this looks good to you. I plan to revise the Python and other implementations to use the general lexicographical ordering of digits, uppercase and then lowercase.

kschulst · 2023-04-13T20:33:35Z

Totally agree with this approach 👍

kschulst mentioned this issue Apr 13, 2023

Rearrange alphanumeric alphabet statisticsnorway/tink-fpe-java#10

Merged

kschulst mentioned this issue Apr 14, 2023

Rearrange default alphanumeric alphabet statisticsnorway/tink-fpe-python#40

Merged
