

Inconsistent alphabets across languages #9

Open
kschulst opened this issue Mar 29, 2023 · 7 comments

Comments

@kschulst
Contributor

Hi! It seems that the default base62 alphabet is defined differently across languages:

Java implementation:
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
Ref: https://github.com/mysto/java-fpe/blob/main/src/main/java/com/privacylogistics/FF3Cipher.java#L515

Python implementation:
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
Ref: https://github.com/mysto/python-fpe/blob/main/ff3/ff3.py#L75

This will lead to inconsistent results between implementations if the user does not define the alphabet explicitly.
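To make the consequence concrete: FF3-1 encrypts a string of numerals, and the alphabet is what maps each character to a numeral. The minimal Python sketch below (the helpers `to_numerals` and `from_numerals` are hypothetical, not taken from either library) shows that the same plaintext becomes a different numeral string under the two orderings, and that the same numerals decode to different text.

```python
# The two base62 orderings quoted in this issue.
JAVA_BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
PYTHON_BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_numerals(text: str, alphabet: str) -> list[int]:
    """Map each character to its index in the alphabet (its numeral value)."""
    return [alphabet.index(ch) for ch in text]


def from_numerals(numerals: list[int], alphabet: str) -> str:
    """Map numeral values back to characters of the alphabet."""
    return "".join(alphabet[n] for n in numerals)


plaintext = "aZ9"
java_numerals = to_numerals(plaintext, JAVA_BASE62)
python_numerals = to_numerals(plaintext, PYTHON_BASE62)
print(java_numerals)    # [36, 35, 9]
print(python_numerals)  # [10, 61, 9]

# The same numerals read back through the other alphabet give different text.
print(from_numerals(java_numerals, PYTHON_BASE62))  # Az9
```

Because the cipher only ever sees the numerals, a ciphertext produced under one default ordering will not decode to the right characters under the other unless the alphabet is passed explicitly.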

@bschoening
Member

Hi Kenneth, thanks for detailing this issue with the base62 alphabet. It's unfortunate that these two implementations order the uppercase and lowercase letters the other way round.

The solution is probably to externalize the test vectors as YAML or JSON and share them. However, aligning the order will break previously encoded data for whichever package changes. I'll look into this, as the benefits of shared test vectors are significant (in addition to correcting this alignment issue).
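A minimal sketch of what a shared vector file might look like, assuming JSON and a hypothetical schema (the field names and the `load_vectors` helper are illustrative, not an agreed format). Each entry carries its alphabet explicitly, so no implementation relies on its own default ordering. The single entry uses the key, plaintext and ciphertext of NIST FF3-AES128 Sample #5; its alphabet is inferred as the first 26 characters of digits+lowercase, and the tweak, which a real vector needs, is deliberately left out rather than guessed.

```python
import json

# Hypothetical shared vector file. A complete FF3-1 vector would also carry
# the tweak; it is omitted here on purpose.
VECTORS_JSON = """
[
  {
    "name": "NIST FF3-AES128 Sample #5",
    "key": "EF4359D8D580AA4F7F036D6F04FC6A94",
    "radix": 26,
    "alphabet": "0123456789abcdefghijklmnop",
    "plaintext": "0123456789abcdefghi",
    "ciphertext": "g2pk40i992fn20cjakb"
  }
]
"""

REQUIRED = {"name", "key", "radix", "alphabet", "plaintext", "ciphertext"}


def load_vectors(text: str) -> list[dict]:
    """Parse vectors and check they are self-consistent before any cipher runs."""
    vectors = json.loads(text)
    for v in vectors:
        missing = REQUIRED - set(v)
        if missing:
            raise ValueError(f"{v.get('name', '?')}: missing {sorted(missing)}")
        if len(v["alphabet"]) != v["radix"]:
            raise ValueError(f"{v['name']}: alphabet length does not match radix")
        for field in ("plaintext", "ciphertext"):
            if not set(v[field]) <= set(v["alphabet"]):
                raise ValueError(f"{v['name']}: {field} uses characters outside the alphabet")
    return vectors


vectors = load_vectors(VECTORS_JSON)
print(vectors[0]["name"], vectors[0]["radix"])
```

Each language's test suite could load the same file, run its own cipher on every entry with the explicit alphabet, and compare against `ciphertext`; the loader here only checks that the vectors are internally consistent.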

@kschulst
Contributor Author

kschulst commented Apr 3, 2023

If the alphabets are aligned across languages, what would be the preferred definition of the base62 alphabet?
digits+uppercase+lowercase or digits+lowercase+uppercase?

Using ASCII ordinal values as a guide would suggest digits+uppercase+lowercase as the reference definition. On the other hand, if other implementations already utilise alphabets such as 0123456789abcdef, the digits+lowercase+uppercase option could be regarded as an extension of such alphabets.🤔
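Both arguments can be checked directly with Python's standard `string` module, as a small illustration: sorting by code point reproduces digits+uppercase+lowercase, while only digits+lowercase+uppercase starts with the lowercase hex alphabet.

```python
import string

DIGITS_UPPER_LOWER = string.digits + string.ascii_uppercase + string.ascii_lowercase
DIGITS_LOWER_UPPER = string.digits + string.ascii_lowercase + string.ascii_uppercase

# Argument 1: digits+uppercase+lowercase is exactly ASCII (code point) order.
print("".join(sorted(DIGITS_UPPER_LOWER)) == DIGITS_UPPER_LOWER)  # True
print("".join(sorted(DIGITS_LOWER_UPPER)) == DIGITS_LOWER_UPPER)  # False

# Argument 2: digits+lowercase+uppercase extends the lowercase hex alphabet.
HEX = "0123456789abcdef"
print(DIGITS_LOWER_UPPER.startswith(HEX))  # True
print(DIGITS_UPPER_LOWER.startswith(HEX))  # False
```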

It would be nice to know, so that my implementation can settle on a good "default override" alphabet.

@bschoening
Member

bschoening commented Apr 3, 2023

Yes, ASCII would suggest digits+uppercase+lowercase.

However, the NIST test vectors for FPE use digits+lowercase and do not use uppercase in their plaintext / ciphertext; see, for example, Sample #5. Note that this affects only the test cases: the alphabet ordering is not part of the standard.


FF3-AES128
Key is EF 43 59 D8 D5 80 AA 4F 7F 03 6D 6F 04 FC 6A 94
Radix = 26
--------------------------------------------------------------
PT is <0123456789abcdefghi>
CT is g2pk40i992fn20cjakb
---------------------------

Source: https://csrc.nist.gov/csrc/media/projects/cryptographic-standards-and-guidelines/documents/examples/ff3samples.pdf

Implementations that support custom alphabets could handle this as a special case, but not all of the Mysto languages support custom alphabets at this point. My C implementation, for example, does not support it currently.
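A sketch of why radix 26 is the sticking point, assuming (as this thread implies, not verified against either code base) that a radix-only constructor takes the first `radix` characters of its default base62 alphabet. The hypothetical helper `default_alphabet` applies that rule under each ordering and checks whether the characters of the Sample #5 vector quoted above fit.

```python
import string

DIGITS_LOWER_UPPER = string.digits + string.ascii_lowercase + string.ascii_uppercase
DIGITS_UPPER_LOWER = string.digits + string.ascii_uppercase + string.ascii_lowercase


def default_alphabet(base62: str, radix: int) -> str:
    """Assumed convention: a radix-only cipher uses the first `radix` characters."""
    if not 2 <= radix <= len(base62):
        raise ValueError("radix out of range for a base62 prefix")
    return base62[:radix]


# Plaintext and ciphertext of NIST FF3-AES128 Sample #5 (radix 26).
sample_pt = "0123456789abcdefghi"
sample_ct = "g2pk40i992fn20cjakb"

for name, base62 in [("digits+lower+upper", DIGITS_LOWER_UPPER),
                     ("digits+upper+lower", DIGITS_UPPER_LOWER)]:
    alphabet = default_alphabet(base62, 26)
    fits = set(sample_pt + sample_ct) <= set(alphabet)
    print(f"{name}: {alphabet} -> sample representable: {fits}")
```

Under digits+lowercase+uppercase the radix-26 prefix is 0123456789abcdefghijklmnop and the sample fits; under digits+uppercase+lowercase it is 0123456789ABCDEFGHIJKLMNOP and it does not, so an implementation using that ordering would need an explicit custom alphabet to reproduce the NIST vector.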

I may ask NIST why they chose digits+lowercase, instead of using the ASCII ordering. I'd say ASCII is the more traditional ordering.

@kschulst
Contributor Author

kschulst commented Apr 3, 2023

Based on the NIST test vectors, and for compliance with the other Mysto FPE implementations, it seems the safest bet is to go with a digits+lowercase+uppercase alphabet as the default then?

To give you some context, we are working on a custom extension to the Google Tink library. Tink does not currently provide any cryptographic primitives for FPE, so we have created a custom FPE abstraction. Instead of implementing FPE from scratch, we are using the Mysto library family under the hood as a baseline implementation of FF3-1 across languages. Disclaimer: I am not associated with Google or Tink.

The alphabet of choice (such as ALPHANUMERIC) is encoded into the Tink FPE key material, so it would be nice to be aligned with the defaults applied by the Mysto libraries.
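One way to keep key material independent of library defaults, sketched in Python under stated assumptions: store the literal alphabet string alongside its name, so that a later change to a default ordering cannot silently change how existing ciphertexts decode. `FpeKeyParams`, `ALPHABETS` and the alphabet names are hypothetical and not part of Tink or the Mysto libraries; ALPHANUMERIC here follows the digits+uppercase+lowercase ordering.

```python
import string
from dataclasses import dataclass

# Hypothetical alphabet names; not identifiers from Tink or the Mysto libraries.
ALPHABETS = {
    "DIGITS": string.digits,
    "ALPHANUMERIC": string.digits + string.ascii_uppercase + string.ascii_lowercase,
}


@dataclass(frozen=True)
class FpeKeyParams:
    """Key parameters that store the literal alphabet, not just its name."""
    alphabet_name: str
    alphabet: str

    def __post_init__(self) -> None:
        # Every character must map to exactly one numeral.
        if len(set(self.alphabet)) != len(self.alphabet):
            raise ValueError("alphabet contains duplicate characters")

    @classmethod
    def from_name(cls, name: str) -> "FpeKeyParams":
        return cls(name, ALPHABETS[name])

    @property
    def radix(self) -> int:
        return len(self.alphabet)


params = FpeKeyParams.from_name("ALPHANUMERIC")
print(params.radix)  # 62
```

When the key is later used, the stored `alphabet` string would be passed to the cipher explicitly, rather than relying on whatever default the underlying library ships.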

(It would be very interesting to discuss other usage aspects of the Mysto library and get your opinion on the design decisions we have made for Tink FPE (Python and Java versions). But that is probably out of scope for this issue 🙂)

@bschoening
Member

bschoening commented Apr 4, 2023

Kenneth,

After further thought, both Unicode and ASCII use the ordinal sort order of digits+uppercase+lowercase. It's sort of an accident of how NIST has defined the test vectors, which is unfortunate, but can be worked around.

To correct this in the Java version, I'll remove the support for radix 26 in the FF3Cipher(key, tweak, radix) constructor.

@bschoening
Member

bschoening commented Apr 12, 2023

@kschulst please look at the latest trunk for java-fpe and see if this looks good to you. I plan to revise the Python and other implementations to use the general lexicographical ordering of digits, uppercase and then lowercase.

@kschulst
Contributor Author

Totally agree with this approach 👍
