Added SPAKE2, SPAKE2-EE, PAKE2+, and PAKE2+EE #273

Closed
wants to merge 10 commits into from

Sc00bz
Contributor

@Sc00bz Sc00bz commented Jan 1, 2016

No description provided.

@Sc00bz
Contributor Author

Sc00bz commented Jan 3, 2016

If anyone wants to run some tests, this is how you're supposed to use it.

SPAKE2/SPAKE2-EE

var useEE = true; // SPAKE2-EE (or false for SPAKE2)
var salt = sjcl.random.randomWords(4);
var sharedKey = sjcl.misc.pbkdf2("password", salt, 20000);

// Replace "Alice" and "Bob" with real IDs
var aSpake2 = sjcl.pake.createSpake2("Alice", "Bob", useEE);
var bSpake2 = sjcl.pake.createSpake2("Alice", "Bob", useEE);

var aData = aSpake2.startA(sharedKey); // send aData & salt to b
var bData = bSpake2.startB(sharedKey); // send bData to a

var aKey = aSpake2.finish(bData);
var bKey = bSpake2.finish(aData);
console.log(aKey);
console.log(bKey);

PAKE2+/PAKE2+EE

var useEE = true; // PAKE2+EE (or false for PAKE2+)
var salt = "data from server or username@domain"; // get salt from server
var sharedKey = sjcl.misc.pbkdf2("password", salt, 20000);

// Replace "Alice" and "example.com" with real IDs
var cPake2Plus = sjcl.pake.createPake2Plus("Alice", "example.com", useEE);
var sPake2Plus = sjcl.pake.createPake2Plus("Alice", "example.com", useEE);

var cData = cPake2Plus.startClient(sharedKey); // send cData to server
var sDbData = cPake2Plus.generateServerData(sharedKey); // read from DB
var sData = sPake2Plus.startServer(sDbData.pwKey1_M, sDbData.pwKey1_N, sDbData.pwKey2, sDbData.pwKey3_G); // send sData to client

var cKey = cPake2Plus.finish(sData);
var sKey = sPake2Plus.finish(cData);
console.log(cKey);
console.log(sKey);

Optional

var aVerifierA = sjcl.misc.hkdf(aKey, null, "Verifier A"); // send to b
var aVerifierB = sjcl.misc.hkdf(aKey, null, "Verifier B");
var aSessionKey = sjcl.misc.hkdf(aKey, null, "Session key");

var bVerifierA = sjcl.misc.hkdf(bKey, null, "Verifier A");
var bVerifierB = sjcl.misc.hkdf(bKey, null, "Verifier B"); // send to a
var bSessionKey = sjcl.misc.hkdf(bKey, null, "Session key");

if (sjcl.bitArray.equal(aVerifierA, bVerifierA)) // b checks this
if (sjcl.bitArray.equal(aVerifierB, bVerifierB)) // a checks this
  alert("aSessionKey == bSessionKey");

@ggozad
Contributor

ggozad commented Jan 3, 2016

This is a great contribution, will definitely go through but might take a while.
Is it possible to include somewhere (where?) documentation for this kind of thing?

@Nilos
Collaborator

Nilos commented Jan 3, 2016

Usually if javadoc is available it is added to the documentation after merge (here: http://bitwiseshiftleft.github.io/sjcl/)
Also adding information to the wiki is highly appreciated!

@Sc00bz
Contributor Author

Sc00bz commented Jan 3, 2016

You need to include sjcl.js first and random.js last. I think all the others can be in any order.

<script type="text/javascript" src="sjcl.js"></script>
<script type="text/javascript" src="aes.js"></script>
<script type="text/javascript" src="bitArray.js"></script>
<script type="text/javascript" src="bn.js"></script>
<script type="text/javascript" src="codecString.js"></script>
<script type="text/javascript" src="ecc.js"></script> *** from this patch
<script type="text/javascript" src="hkdf.js"></script>
<script type="text/javascript" src="hmac.js"></script>
<script type="text/javascript" src="pake.js"></script> *** from this patch
<script type="text/javascript" src="sha256.js"></script>
<script type="text/javascript" src="random.js"></script>

P.S. Thanks @ggozad. While looking at the dependencies I noticed a bug in random.js: #274

@Sc00bz
Contributor Author

Sc00bz commented Jan 3, 2016

Oh, I think I read that wrong... I thought you wanted to know what files to include :). Eh, it's probably useful for someone.

@ggozad
Contributor

ggozad commented Jan 6, 2016

Is there a test suite that should be included with the PR?

@Sc00bz
Contributor Author

Sc00bz commented Jan 13, 2016

Added tests, but there are no official test vectors, nor any real guidance on how to generate pwKey1, pwKey2, and pwKey3; N and M per curve; or the elligator edition's N and M. So I just test that it generates the same keys when the shared keys are the same and different keys when the shared keys are different.

@ggozad
Contributor

ggozad commented Jan 13, 2016

Thank you Steve. I could not find official test vectors either. Having looked at https://github.com/warner/python-spake2/blob/master/spake2/test_spake2.py, it seems @warner does the same thing for the Python library. Can you please confirm the two libraries generate the same keys?

@Sc00bz
Contributor Author

Sc00bz commented Jan 13, 2016

The things that should be looked at:

  • N and M for all the curves to make sure they were generated correctly. I generated them in Sage starting with x=2 and incremented until I found two points on the curve.
  • sjcl.ecc.curve.deterministicRandomPoint() which is simplified Shallue-Woestijne-Ulas. I think there's a faster way to do this because I do pow mod (p-1)/4 twice: line 321 and 322 of ecc.js.
  • sjcl.pake._generateKeys() 128 bits extra should be fine for "uniform enough" distribution.
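
The search described in the first bullet can be sketched in Python rather than Sage. This is a minimal sketch for NIST P-256 only. The function names are hypothetical, and the choice of which square root to keep for y is an assumption, since the comment does not say which one was used.

```python
# NIST P-256 domain parameters: y^2 = x^3 + A*x + B over GF(P).
P = 2**256 - 2**224 + 2**192 + 2**96 - 1
A = P - 3
B = 0x5AC635D8AA3A93E7B3EBBD55769886BC651D06B0CC53B0F63BCE3C3E27D2604B


def lift_x(x):
    """Return (x, y) if x is the abscissa of a curve point, else None."""
    rhs = (pow(x, 3, P) + A * x + B) % P
    # P % 4 == 3, so a square root (when one exists) is rhs^((P+1)/4) mod P.
    y = pow(rhs, (P + 1) // 4, P)
    if y * y % P != rhs:
        return None  # rhs is not a quadratic residue: no point has this x
    return (x, y)  # assumption: keep whichever root the formula yields


def first_points(count, start=2):
    """Scan x = start, start + 1, ... and collect the first `count` points."""
    found = []
    x = start
    while len(found) < count:
        point = lift_x(x)
        if point is not None:
            found.append(point)
        x += 1
    return found


M, N = first_points(2)
print("M.x =", M[0], " N.x =", N[0])
```

P-256 has cofactor 1, so any affine point found this way already has the full group order. On a curve with a larger cofactor the candidates would also need an order check.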

@Sc00bz
Contributor Author

Sc00bz commented Jan 13, 2016

That uses Ed25519... oh wow that's flawed. They use the same password key for "pwKey1" and "pwKey2". Also they went with

key = H(aId || ":" || bId || ":" || aMsg || ":" || bMsg || ":" || a*b*P || ":" || pw)

vs what I did

key = H(beInt32(bitLen(aId)) || aId || beInt32(bitLen(bId)) || bId || aMsg || bMsg || a*b*P || pwKey2)

I did this to avoid collisions with IDs (i.e. aId = "asdf:asdf", bId = "asdf" vs. aId = "asdf", bId = "asdf:asdf"). Also note that aMsg, bMsg, a*b*P, and pwKey2 are fixed-length binary data.
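
The ID collision is easy to reproduce. The following is a minimal sketch that encodes only the two IDs (not the full transcript) both ways and hashes the result. The helper names are hypothetical. The length prefix follows the beInt32(bitLen(id)) form given in the comment.

```python
import hashlib
import struct


def colon_joined(a_id: bytes, b_id: bytes) -> bytes:
    """Ambiguous encoding: a colon inside an ID shifts the field boundary."""
    return a_id + b":" + b_id


def length_prefixed(a_id: bytes, b_id: bytes) -> bytes:
    """Each ID is preceded by its bit length as a big-endian 32-bit integer."""
    def field(value: bytes) -> bytes:
        return struct.pack(">I", len(value) * 8) + value
    return field(a_id) + field(b_id)


pair_1 = (b"asdf:asdf", b"asdf")
pair_2 = (b"asdf", b"asdf:asdf")

for encode in (colon_joined, length_prefixed):
    digest_1 = hashlib.sha256(encode(*pair_1)).hexdigest()
    digest_2 = hashlib.sha256(encode(*pair_2)).hexdigest()
    print(encode.__name__, "collides" if digest_1 == digest_2 else "distinct")
# prints:
# colon_joined collides
# length_prefixed distinct
```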

@ggozad
Contributor

ggozad commented Jan 14, 2016

Is there a way to verify general correctness by comparing with some other implementation? I will be reviewing more next week, have some code comments but will do in a batch.

@warner

warner commented Jan 15, 2016

Nice stuff! It'd be great to have interoperability between our libraries. Some notes (for reference) about my python library (which @ggozad referenced):

  • I'm using the Ed25519 group because it's safe and fast. The corresponding Curve25519 group doesn't make it easy to do point addition, which you need for PAKE (but not for basic DH). My library can also use some prime-order integer groups (a few are included, copied from a NIST document that was cited in the J-PAKE demo code), but there's no good reason to use them: Ed25519 is faster for the 256-bit security parameter.
  • It creates M and N by sha512-hashing "M" or "N", treating that as an integer (mod Q), treating that as the Y coordinate, recovering the "positive" X coordinate, checking that the point is both on the curve and of the right order, and if it isn't then we increment Y and try again until we succeed. As far as I know, this is a safe-but-conservative approach. The most important property is that nobody knows the discrete log of these values.
  • There are two forms: the "asymmetric" form uses distinct N and M, while the "symmetric" form uses the same value for N and M. I've shown this to a couple of people: Mike Hamburg (at least) said he thought it's safe, although it means the proof reduces to CDH-Squared instead of regular CDH. I'm pestering Dan Boneh about it as well. (I think this is what you meant by "pwKey1"/"pwKey2").
  • I agree that the combiner for the transcript hash needs to be safe against parsing confusion, and that colon-joining the two ID values doesn't give you that. I'll change my library to make it safe. How about H(aId):H(bId):aMsg:bMsg:a*b*P:pw ? I.e. let the hash function take care of the variable-length ID strings, and then everything is fixed-length except the last string (the password)? I think that'd be simpler than picking a canonical serialization of the length (which then depends upon whether you're talking bits or bytes, what endianness to use, plus it has an arbitrary maximum length).

BTW, I wanted a symmetric form for applications that don't have a good way to figure out "who's on first", like my magic-wormhole application when the two participants come up with a code-phrase in person, then paste it into their computers later. I initially had both sides run two SPAKE2 protocols in parallel (each side doing one as Alice and a second as Bob), and then combined the results. Mike convinced me that it was safe to use M==N and cut the traffic/computation in half. If that turns out not to be safe, I'll switch to adding a roundtrip to my protocol, so the two sides can negotiate who is whom before starting a single SPAKE2 process.

@Sc00bz
Contributor Author

Sc00bz commented Jan 15, 2016

I like H(aId):H(bId):aMsg:bMsg:a*b*P:pw. Are you thinking of H(aId) || ":" || H(bId) || ":" || aMsg || ":" || bMsg || ":" || a*b*P || ":" || pw because I don't see why you'd need those colons.


The "pwKey1" vs "pwKey2" issue is that with PAKE2+ the server has pwKey1 (or precalculated pwKey1*N and pwKey1*M), pwKey2, and pwKey3*P. Also for PAKE2+, "pw" is supposed to be salted and stretched to slow down offline attacks when the DB gets dumped. I just assumed that the only difference between SPAKE2 and PAKE2+ is that extra value "pwKey3*b*P".

SPAKE2:

A = a*P + pwKey1*N
B = b*P + pwKey1*M
H(aId || bId || A || B || a*b*P || pwKey2)

PAKE2+:

A = a*P + pwKey1*N
B = b*P + pwKey1*M
H(aId || bId || A || B || a*b*P || pwKey2 || pwKey3*b*P)
(Server knows b and pwKey3*P and client knows pwKey3 and b*P)

The way I generated pwKey1, pwKey2, and pwKey3:
"HKDF(ikm, info)"
algoName = "SPAKE2", "SPAKE2-EE", "PAKE2+", or "PAKE2+EE" depending on which algorithm.

if (elligatorEdition)
  pwKey1_M = HKDF(pw, algoName + " PW1 M")
  pwKey1_N = HKDF(pw, algoName + " PW1 N")
else
  pwKey1 = HKDF(pw, algoName + " PW1")
pwKey2 = HKDF(pw, algoName + " PW2")
if (PAKE2+ or PAKE2+EE)
  pwKey3 = HKDF(pw, algoName + " PW3")

This might make more sense to do:

pwKey1 = HKDF(pw, algoName + " PW1")
pwKey2 = HKDF(pw, algoName + " PW2")
if (PAKE2+ or PAKE2+EE)
  pwKey3 = HKDF(pw, algoName + " PW3")

...

if (elligatorEdition)
  pwKey1_M = elligator(HKDF(pwKey1, "Elligator M"))
  pwKey1_N = elligator(HKDF(pwKey1, "Elligator N"))
else
  pwKey1_M = pwKey1*M
  pwKey1_N = pwKey1*N

@Sc00bz
Contributor Author

Sc00bz commented Jan 15, 2016

Huh I can't find anywhere that you need to generate two keys for SPAKE2 and three keys for PAKE2+. So I guess I'm wrong.

SPAKE2:

A = a*P + pw*N
B = b*P + pw*M
H(aId || bId || A || B || a*b*P || pw)

PAKE2+:

pwKey1 = HKDF(pw, "PW1")
pwKey2 = HKDF(pw, "PW2")
A = a*P + pwKey1*N
B = b*P + pwKey1*M
H(aId || bId || A || B || a*b*P || pwKey1 || pwKey2*b*P)

@Sc00bz
Contributor Author

Sc00bz commented Jan 15, 2016

@warner Should I just use SHA512 instead of HKDF? To generate a scalar I did "HKDF(..., byte size of the cyclic group order + 16 bytes) % group order", and for simplified Shallue-Woestijne-Ulas I did "HKDF(..., field prime byte size + 16 bytes) % field prime". The only "issue" is P521, since 2**512-1 is less than both the field prime and the group order.

SPAKE2:

pwScalar = SHA512(pw)
A = a*P + pwScalar*M
B = b*P + pwScalar*N
H(H(aId) || H(bId) || A || B || a*b*P || pw)

SPAKE2-EE:

A = a*P + elligator(SHA512("M" + pw))
B = b*P + elligator(SHA512("N" + pw))
H(H(aId) || H(bId) || A || B || a*b*P || pw)

PAKE2+:

pw1 = SHA512("1" + pw)
pw2 = SHA512("2" + pw)
A = a*P + pw1*M
B = b*P + pw1*N
H(H(aId) || H(bId) || A || B || a*b*P || pw1 || pw2*b*P)

PAKE2+EE:

pw1 = SHA512("1" + pw)
pw2 = SHA512("2" + pw)
A = a*P + elligator(SHA512("M" + pw1))
B = b*P + elligator(SHA512("N" + pw1))
H(H(aId) || H(bId) || A || B || a*b*P || pw1 || pw2*b*P)

@Sc00bz
Contributor Author

Sc00bz commented Jan 15, 2016

@warner So I just realized you do compressed points. So "A" and "B" in the hash will be different. Also I noticed "# try for compatibility with Boneh's JS version" in your code. Did Dan Boneh write SPAKE2 in JS?

Added option to compress points to sjcl.ecc.point.toBits().
Replaced pake.js's use of HKDF with SHA512.
Replaced pake.js's pwKey1, pwKey2, and pwKey3 with pwScalar, pw, and pw2Scalar.
Changed SPAKE2's key output to H(H(aId) || H(bId) || A || B || a*b*G || pw)
Changed PAKE2+'s key output to H(H(aId) || H(bId) || A || B || a*b*G || pw || pw2Scalar*b*G)

New internal password keys/points:
SPAKE2:
pw = sharedKey
pwScalar = SHA512(pw)

SPAKE2-EE:
pw = sharedKey
pw_M = elligator(SHA512("M" || pw))
pw_N = elligator(SHA512("N" || pw))

PAKE2+:
pw = SHA512("1" || sharedKey)
pwScalar = pw
pw2 = SHA512("2" || sharedKey)

PAKE2+EE:
pw = SHA512("1" || sharedKey)
pw_M = elligator(SHA512("M" || pw))
pw_N = elligator(SHA512("N" || pw))
pw2 = SHA512("2" || sharedKey)
@Sc00bz
Contributor Author

Sc00bz commented Jan 15, 2016

This should match @warner's when he changes his to "H(H(aId) || H(bId) || A || B || a*b*P || pw)", except there's no Ed25519 in SJCL and there are no NIST curves in his.

@Sc00bz
Contributor Author

Sc00bz commented Jan 15, 2016

Just verified this matches @warner's when he changes his to "H(H(aId) || H(bId) || A || B || a*b*P || pw)". I added Ed25519 to SJCL, but it's not in this pull request because I think that will lower the chances of this being merged. If you want to test it: http://pastebin.com/dR9xTUP0

I still need to implement "sjcl.ecc.tEdCurve.deterministicRandomPoint()".

@Sc00bz
Contributor Author

Sc00bz commented Jan 16, 2016

Get warner/python-spake2#3 and add the following to the end of spake2.py and run python setup.py test (super lazy).

sA = SPAKE2_A(b"password", idA=b"alice", idB=b"bob")
m1A = sA.start()
print hexlify(m1A)
m1B = raw_input('Enter the message: ')
kA = sA.finish(unhexlify(m1B))
print hexlify(kA)
exit()

Get the current pull request and http://pastebin.com/dR9xTUP0 (ecc.js). In the dev console do:

var bSpake2 = sjcl.pake.createSpake2("alice", "bob", false, sjcl.ecc.curves.ed25519);
var bData = bSpake2.startB("password");
console.log(sjcl.codec.hex.fromBits(bData));

Swap messages with spake2.py and in the dev console do:

var spake2_py = "418c1b37427797a199817932c780a6333731c0f572da718ab3d998ba69ae001fe5";
var bKey = bSpake2.finish(sjcl.codec.hex.toBits(spake2_py));
console.log(sjcl.codec.hex.fromBits(bKey));

You'll get something like this:
[screenshot: spake2]

@warner

warner commented Jan 16, 2016

I landed the hash-the-IDs PR.. thanks for the patch!

@Sc00bz
Contributor Author

Sc00bz commented Feb 10, 2016

  • SPAKE2: pw_scalar = HKDF(pw_bytes, cxt="SPAKE2 pw", len=byteLen(n)+16)%n

👍

SPAKE2+:
It looks like they didn't include the client or server identities in the transcript because they're already included in the password generation hash: pi0+pi1 = H'(pw,idP,idQ). So we need to define that function clearly, but we can leave the identities out of the final transcript.

Ah, is H' supposed to be a key stretching algorithm like PBKDF2, scrypt, Argon2, etc.? The assumption for my implementation was that you had already salted and stretched the password. Granted, the easiest way was scrypt(pw, salt=userName+"@"+serverDomain, ...), which is basically what H' would be doing.

  • pwstuff = pw + SHA256(idA) + SHA256(idB)
  • pw0_bytes = HKDF(pwstuff, cxt="SPAKE2+ pw0 bytes", len=256)
  • pw0_scalar = HKDF(pw0_bytes, cxt="SPAKE2+ pw0 scalar", len=byteLen(n)+16)%n
  • pw1_scalar = HKDF(pwstuff, cxt="SPAKE2+ pw1 scalar", len=byteLen(n)+16)%n
  • I think pwstuff should be wrapped in a key stretching algorithm or pw is explicitly required to be already salted and stretched.
  • If pw is already salted and stretched then HKDF(pwstuff, ... -> HKDF(ikm=pw, salt=SHA256(idA) + SHA256(idB), ... I think this is more how HKDF should be used.
  • I assume that "len=256" is 256 bits.

Besides the key stretching stuff 👍

  • SPAKE2: H(sha256(pw)+sha256(idA)+sha256(idB)+X+Y+Z)
  • SPAKE2+: H(pw0_bytes+X+Y+Z+N)

👍

@Sc00bz
Contributor Author

Sc00bz commented Feb 12, 2016

  • SPAKE2: pw_scalar = HKDF(pw_bytes, cxt="SPAKE2 pw", len=byteLen(n)+16)%n

Is pw_bytes just the password?


What about elligator edition? My guess is you'd suggest this:

SPAKE2-EE

  • pw_M = elligator(HKDF(pw_bytes, cxt="SPAKE2 pw M", len=byteLen(n)+16)%n)
  • pw_N = elligator(HKDF(pw_bytes, cxt="SPAKE2 pw N", len=byteLen(n)+16)%n)

SPAKE2+EE

  • pw0_M = elligator(HKDF(pw0_bytes, cxt="SPAKE2+ pw0 M", len=byteLen(n)+16)%n)
  • pw0_N = elligator(HKDF(pw0_bytes, cxt="SPAKE2+ pw0 N", len=byteLen(n)+16)%n)

@warner

warner commented Feb 13, 2016

Yeah, I think stretching should happen before SPAKE2+ even gets started.. that'll compose better. No point in baking a particular stretching algorithm into the PAKE layer.

Especially since people usually layer additional stretching on top of their stored values, over time (pbkdf2(pw) today, bcrypt(pbkdf2(pw)) tomorrow, scrypt(bcrypt(pbkdf2(pw))) next thursday).. That layering would get really messy if it had to incorporate SPAKE2+'s hash-to-element function.

I'm mildly in favor of including idA/idB in the function that converts pw to pw0/pw1, even if the application's stretching function might also include idA/idB. It's important to include it somewhere, because we don't include it in the final transcript hash, and doing it ourselves means we'll tolerate applications which fail to do it during the stretch. If the app includes it in the stretch too, that's fine too.

I like your idea of using salt=SHA256(idA)+SHA256(idB) inside both the pw0_bytes and pw1_scalar HKDF calls. I guess not in the pw0_scalar computation, since it's already included in the predecessor.

Yeah, len=256 is bits.

So:

  • stretched_pw = whatever-is-out-of-scope(pw)
  • pw0_bytes = HKDF(stretched_pw, salt=sha256(idA)+sha256(idB), ctx="SPAKE2+ pw0 bytes", len=256bits)
  • pw0_scalar = HKDF(pw0_bytes, ctx="SPAKE2+ pw0 scalar", len=byteLen(n)+16)%n
  • pw1_scalar = HKDF(stretched_pw, salt=sha256(idA)+sha256(idB), ctx="SPAKE2+ pw1 scalar", len=byteLen(n)+16)%n

And then the server stores pw0_bytes, pw0_scalar, and basepoint*pw1_scalar, but specifically does not store pw or pw1_scalar.
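
The derivation listed above can be sketched with the Python standard library. This is a minimal sketch, not the code of either library. It assumes SHA-256 for both HKDF and the ID hashes, a big-endian reading of the HKDF output, and the Ed25519 subgroup order as a stand-in for n. All function names are hypothetical. The group operation basepoint*pw1_scalar is left out.

```python
import hashlib
import hmac


def hkdf_sha256(ikm, info, length, salt=b""):
    """HKDF (RFC 5869) with SHA-256: extract, then expand to `length` bytes."""
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# Assumption: the Ed25519 subgroup order stands in for the group order n.
N_ORDER = 2**252 + 27742317777372353535851937790883648493


def to_scalar(ikm, info, salt=b""):
    """Oversize the HKDF output by 16 bytes, then reduce modulo n."""
    length = (N_ORDER.bit_length() + 7) // 8 + 16
    oversized = hkdf_sha256(ikm, info, length, salt)
    return int.from_bytes(oversized, "big") % N_ORDER  # byte order assumed


def derive_spake2plus(stretched_pw, id_a, id_b):
    """Return (pw0_bytes, pw0_scalar, pw1_scalar) from a stretched password."""
    salt = hashlib.sha256(id_a).digest() + hashlib.sha256(id_b).digest()
    pw0_bytes = hkdf_sha256(stretched_pw, b"SPAKE2+ pw0 bytes", 32, salt)
    pw0_scalar = to_scalar(pw0_bytes, b"SPAKE2+ pw0 scalar")
    pw1_scalar = to_scalar(stretched_pw, b"SPAKE2+ pw1 scalar", salt)
    return pw0_bytes, pw0_scalar, pw1_scalar


pw0_bytes, pw0_scalar, pw1_scalar = derive_spake2plus(
    b"output of the out-of-scope stretch", b"alice", b"example.com")
print(pw0_bytes.hex(), hex(pw0_scalar), hex(pw1_scalar), sep="\n")
```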

Yeah, for SPAKE2, pw_bytes is just the password, although apps are free to do some stretching first too. It protects against brute-force attacks (against the server-stored value) just like it would in SPAKE2+. The difference is that the SPAKE2 server value can be used to spoof the client, but the SPAKE2+ value cannot. (There might be other values derived from the original password, making it useful to protect the password from that sort of attacker even though the PAKE could be spoofed).

I hesitate to say it, but since we seem to be having such a good time with HKDF, I wonder if we should use it for the final transcript too? Like

  • SPAKE2: HKDF(ikm=sha256(pw)+Z, salt=sha256(idA)+sha256(idB)+X+Y)
  • SPAKE2+: HKDF(ikm=pw0_bytes+Z+N, salt=X+Y)

I'm -0 on that, but I can imagine an argument for consistency. It takes us further away from the published algorithm, though.

@Sc00bz
Contributor Author

Sc00bz commented Feb 14, 2016

And then the server stores pw0_bytes, pw0_scalar, and basepoint*pw1_scalar, but specifically does not store pw or pw1_scalar.

The server only needs to store pw0_bytes and basepoint*pw1_scalar, since pw0_scalar is derived from pw0_bytes. Storing pw0_bytes, pw0_scalar*M, pw0_scalar*N, and pw1_scalar*basepoint allows the server implementation to work with both SPAKE2+ and SPAKE2+EE, and it's less work for the server.

I hesitate to say it, but since we seem to be having such a good time with HKDF, I wonder if we should use it for the final transcript too? Like

  • SPAKE2: HKDF(ikm=sha256(pw)+Z, salt=sha256(idA)+sha256(idB)+X+Y)
  • SPAKE2+: HKDF(ikm=pw0_bytes+Z+N, salt=X+Y)

I'm -0 on that, but I can imagine an argument for consistency. It takes us further away from the published algorithm, though.

I vote for H(...) but I'm also on the fence.

  • SPAKE2: H(sha256(pw)+sha256(idA)+sha256(idB)+X+Y+Z)
  • SPAKE2+: H(pw0_bytes+X+Y+Z+N)

@Sc00bz
Contributor Author

Sc00bz commented Feb 17, 2016

I'll try to add some tests to python-spake2 that use pre-computed scalars, so we can get some deterministic interoperability unit tests. Those tests will need some API hook or mock that lets the test replace the random internal scalar with a predetermined value.. not sure what the best approach for that would be. I'm ok with having the test reach inside the object and replace a method to accomplish that.

I just remembered I never said anything about this. It's easy to have deterministic test vectors for SJCL and should be added:

var tmp = sjcl.bn.random;

// "Sets" scalar to 0x123456789abcdef
sjcl.bn.random = function(modulus, paranoia) { return new sjcl.bn("123456789abcdef"); };

// ** Call sjcl.pake's start function here **

// Copy back the real sjcl.bn.random()
sjcl.bn.random = tmp;
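
The same save, replace, restore pattern can be sketched on the Python side with unittest.mock. This is only an illustration: ToySession is a hypothetical stand-in for a PAKE object, not python-spake2's API, and the modulus is an example value.

```python
import secrets
from unittest import mock

GROUP_ORDER = 2**252 + 27742317777372353535851937790883648493  # example modulus


class ToySession:
    """Hypothetical stand-in for a PAKE object that picks a secret scalar."""

    def start(self):
        self.scalar = secrets.randbelow(GROUP_ORDER)
        return self.scalar


session = ToySession()
with mock.patch("secrets.randbelow", return_value=0x123456789ABCDEF):
    fixed = session.start()  # inside the block the "random" scalar is pinned

print(hex(fixed))  # → 0x123456789abcdef
```

The context manager restores the real secrets.randbelow on exit, which plays the role of copying tmp back in the JavaScript version.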

I already made the changes for the following. Should I wait just in case we want to change the output to use HKDF instead of just a hash? (P.S. H() is user selected and defaults to SHA256. It is also used as the base hash function for HKDF.)

SPAKE2

pw_scalar = HKDF(pw, cxt="SPAKE2 pw", len=byteLen(n)+16)%n
key = H(H(pw)+H(idA)+H(idB)+X+Y+Z)

SPAKE2-EE

pw_M = elligator(HKDF(pw, cxt="SPAKE2 pw M", len=byteLen(n)+16)%n)
pw_N = elligator(HKDF(pw, cxt="SPAKE2 pw N", len=byteLen(n)+16)%n)
key = H(H(pw)+H(idA)+H(idB)+X+Y+Z)

SPAKE2+

stretched_pw = whatever-is-out-of-scope(pw)
pw0_bytes = HKDF(stretched_pw, salt=H(idA)+H(idB), ctx="SPAKE2+ pw0 bytes", len=256bits)
pw0_scalar = HKDF(pw0_bytes, ctx="SPAKE2+ pw0 scalar", len=byteLen(n)+16)%n
pw1_scalar = HKDF(stretched_pw, salt=H(idA)+H(idB), ctx="SPAKE2+ pw1 scalar", len=byteLen(n)+16)%n
key = H(pw0_bytes+X+Y+Z+N)

SPAKE2+EE

stretched_pw = whatever-is-out-of-scope(pw)
pw0_bytes = HKDF(stretched_pw, salt=H(idA)+H(idB), ctx="SPAKE2+ pw0 bytes", len=256bits)
pw0_M = elligator(HKDF(pw0_bytes, cxt="SPAKE2+ pw0 M", len=byteLen(n)+16)%n)
pw0_N = elligator(HKDF(pw0_bytes, cxt="SPAKE2+ pw0 N", len=byteLen(n)+16)%n)
pw1_scalar = HKDF(stretched_pw, salt=H(idA)+H(idB), ctx="SPAKE2+ pw1 scalar", len=byteLen(n)+16)%n
key = H(pw0_bytes+X+Y+Z+N)

warner added a commit to warner/python-spake2 that referenced this pull request May 10, 2016
* split expandstring() into two functions:
  * expand_password (for password_to_scalar)
  * expand_arbitrary_element_seed (for arbitrary_element)
* change HKDF context_info= for both
  * This should match the proposed SJCL changes, in
    bitwiseshiftleft/sjcl#273
* remove element_hasher= from Group constructor
* change Ed25519 to use the same scheme
warner added a commit to warner/python-spake2 that referenced this pull request May 12, 2016
This changes password_to_scalar() and arbitrary_element() to a new
derivation function, using HKDF and matching the proposed functions in
bitwiseshiftleft/sjcl#273 . Users of this
library from after this commit will not interoperate with those who use
the library from before this commit: they will get "WrongPasswordError"
all the time.

closes #5 (rebased the original commits)
warner added a commit to warner/python-spake2 that referenced this pull request May 12, 2016
This modifies the finalization function (which hashes the transcript and
shared group element into the final key) to match the proposal in
bitwiseshiftleft/sjcl#273 . As with the previous
compatibility-breaking patches, applications which use this revision of
the SPAKE2 library will not be able to communicate with those using the
previous version.
@warner

warner commented May 12, 2016

Sorry I've been so out of touch on this one. I just landed the patches to change python-spake2's final hash function to match this (key = H(H(pw)+H(idA)+H(idB)+X+Y+Z)). I also updated the symmetric form (idA=idB, which doesn't exist in this SJCL patch, but I'd like to see it added some day) to something similar (key = H(H(pw)+H(idSym)+sorted(X,Y)+Z)).

I also recently landed a patch that changes the password-to-scalar function to match the above proposal (pw_scalar = HKDF(pw, cxt="SPAKE2 pw", len=byteLen(n)+16) % n). I haven't implemented SPAKE2+ or the elligator-edition forms yet, but when I do, I'll use the proposals above.

https://github.com/warner/python-spake2/blob/master/src/spake2/test/test_compat.py has test vectors that should let us check interoperability. For the finalization hash, the following should hold true (where the output is a bytestring, displayed here in hex):

finalize("idA", "idB", "X_msg", "Y_msg", "K_bytes", "pw") = aa02a627537543399bb1b4b430646480b6d36ab5c44842e738c8f78694d8afac
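
A minimal sketch of that finalization hash, assuming H is SHA-256 and that + is plain byte concatenation. The function body is an illustration rather than python-spake2's source, and it has not been checked against the vector quoted above.

```python
import hashlib


def finalize(id_a, id_b, x_msg, y_msg, k_bytes, pw):
    """key = H(H(pw) + H(idA) + H(idB) + X + Y + Z), with H = SHA-256."""
    def h(data):
        return hashlib.sha256(data).digest()

    # The three hashed fields are fixed length, so the variable-length
    # password and IDs cannot shift the boundaries of the later fields.
    transcript = h(pw) + h(id_a) + h(id_b) + x_msg + y_msg + k_bytes
    return h(transcript)


key = finalize(b"idA", b"idB", b"X_msg", b"Y_msg", b"K_bytes", b"pw")
print(key.hex())
```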

and the password-to-scalar for the Ed25519 group should have this (note that the scalar is an integer, but displayed here in hex):

pw2scalar("pw") = cf090b60384cb818b12c8d972dfbaf910c0c7295c5cfe560e508f5f062f3960f
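
A minimal sketch of the password-to-scalar step for the Ed25519 group order, assuming HKDF-SHA256 with an empty salt and a big-endian reading of the oversized output. Both are assumptions, and the thread does not say which byte order the hex above uses, so this sketch has not been checked against that vector.

```python
import hashlib
import hmac


def hkdf_sha256(ikm, info, length, salt=b""):
    """HKDF (RFC 5869) with SHA-256: extract, then expand to `length` bytes."""
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# Order of the prime-order subgroup used by Ed25519.
ED25519_ORDER = 2**252 + 27742317777372353535851937790883648493


def password_to_scalar(pw, order=ED25519_ORDER):
    """pw_scalar = HKDF(pw, info="SPAKE2 pw", len=byteLen(n)+16) % n."""
    length = (order.bit_length() + 7) // 8 + 16  # 32 + 16 bytes for Ed25519
    oversized = hkdf_sha256(pw, b"SPAKE2 pw", length)
    return int.from_bytes(oversized, "big") % order  # byte order assumed


scalar = password_to_scalar(b"pw")
print(format(scalar, "064x"))
```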

I've also added backwards-compatibility tests of the scalar-to-bytes conversion function (e.g. ed25519 scalars are serialized little-endian, in my codebase, inside a function that suspends the whole SPAKE2 conversation for later resumption), the overall SPAKE2 operation (using a deterministic RNG), and the "arbitrary element" function (which turns a seed like "N" or "M" into an un-discrete-loggable group element).

I'm eager to build some tests that confirm interoperability of the overall SPAKE2 operation. Feel free to copy my tests, but the "overall" test is pretty dependent upon the particular way I built the deterministic RNG (basically SHA256 "CTR mode") and the exact way in which the "pick a random scalar" function uses the RNG. It might be easier to change the test to let you inject a pre-determined scalar into the SPAKE2 object, and include that "secret" scalar as part of the test vector.

It'd be nice to get compatibility between our implementations for that arbitrary-element function too, so it's obvious in all codebases that the values they use were generated safely (i.e. it'd be great if both said arbitrary_element(seed="M"), rather than one doing that and the other pasting in a big opaque serialized point, for which you'd have to hunt down the source of the other library to confirm that it was built from a seed). But that isn't strictly necessary.. as long as they use the same points, the two SPAKE2 implementations should interoperate.

I plan to make a new release of python-spake2 in the next few days. I'm aiming for a 1.0 release by the end of the month, to support a https://github.com/warner/magic-wormhole 1.0 release in the same timeframe.

@Nilos
Collaborator

Nilos commented May 30, 2016

@Sc00bz @warner @ggozad any more work on this? Is this ready for review?

@warner

warner commented May 31, 2016

I've made a release (python-spake2==0.7), and I'm presenting magic-wormhole at PyCon this week, so I'm mostly committed to forwards-compatibility from here on out. (I'm holding off on python-spake2==1.0 until I land SPAKE2+ support too, but the SPAKE2 support will need to be the same as what's in 0.7). I think the next step will be for one of us (maybe @Sc00bz , maybe me in a few weeks when I'm done with the conference) to add some tests to the SJCL branch that use the same vectors as the ones from python-spake2.

@ggozad
Contributor

ggozad commented May 31, 2016

I am ok with the code which I had reviewed before. Happy to help with tests next week if @Sc00bz does not beat me to it. @warner I hadn't seen python-spake2, would it be possible to highlight which tests you think should be covered here? Have fun at pycon!

Replaced pake.js's use of SHA512 with HKDF.
Changed SPAKE2's key output to H(H(pw) || H(aId) || H(bId) || A || B || a*b*G)
Changed SPAKE2+'s key output to H(HKDF(pw, H(aId) || H(bId), "SPAKE2+ pw0 bytes") || A || B || a*b*G || pw2Scalar*b*G)
Fixed defaults for compressPoints and littleEndian
@Sc00bz
Contributor Author

Sc00bz commented Jun 1, 2016

I checked, and it matches the python-spake2 0.7 version.

M and N changed to:

sjcl.ecc.curves.ed25519.M = sjcl.ecc.curves.ed25519.fromBits(sjcl.codec.hex.toBits("15cfd18e385952982b6a8f8c7854963b58e34388c8e6dae891db756481a02312"), 1);
sjcl.ecc.curves.ed25519.N = sjcl.ecc.curves.ed25519.fromBits(sjcl.codec.hex.toBits("f04f2e7eb734b2a8f8b472eaf9c3c632576ac64aea650b496a8a20ff00e583c3"), 1);

Or you can grab the updated ecc-ed25519.js

@Sc00bz
Contributor Author

Sc00bz commented Jun 2, 2016

@warner I'm going to work on tests and on generating M and N from HKDF(...). I noticed that info is "SPAKE2 arbitrary element", which is 24 bytes. You should try for at most 22 bytes so that it fits in one block, but it doesn't really matter.
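
One plausible reading of the 22-byte figure, assuming HKDF over HMAC-SHA256: in every expand round after the first, the inner hash processes the 64-byte padded key block followed by T(i-1) || info || counter, and SHA-256 padding needs 9 more bytes. With 32 + 22 + 1 + 9 = 64, the data after the key block fits in exactly one compression block. The helper names below are hypothetical.

```python
def sha256_blocks(message_len: int) -> int:
    """Number of 64-byte compression blocks SHA-256 needs for a message."""
    # Padding adds one 0x80 byte plus an 8-byte length, rounded up to 64.
    return (message_len + 1 + 8 + 63) // 64


def inner_hash_blocks(info_len: int, hash_len: int = 32,
                      block_len: int = 64) -> int:
    """Blocks in HMAC's inner hash for HKDF-Expand rounds after the first.

    The inner hash input is (key XOR ipad) || T(i-1) || info || counter.
    """
    return sha256_blocks(block_len + hash_len + info_len + 1)


for info in (b"SPAKE2 arbitrary element", b"x" * 22):
    print(len(info), "bytes of info ->", inner_hash_blocks(len(info)), "blocks")
# prints:
# 24 bytes of info -> 3 blocks
# 22 bytes of info -> 2 blocks
```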

@titanous

titanous commented Aug 19, 2016

I just came across this and wanted to point out draft-irtf-cfrg-spake2-03, which specifies things like a specific format of the transcript hash data (with eight-byte little endian length prefixes for each field). It's still an early draft, so you can submit feedback on the CFRG mailing list.

@jaredhirsch

@Sc00bz @ggozad Hi, looks like this PR has been sitting for a while. What needs to happen for this to land?

@ggozad
Contributor

ggozad commented Apr 26, 2017

@6a68 Last I remember @Sc00bz and @warner were going to coordinate with regards to compatibility and tests with python. I am fine with the state of it as is, it is a very valuable contribution to sjcl.

@jml

jml commented May 18, 2017

FWIW, @exarkun and I are working on a Haskell implementation of SPAKE2 (ignoring PAKE2+ etc. for the time being).

It seems there are a bunch of things that sit above the mathematical protocol (e.g. which hash algorithm to use, how group elements get mapped to bytes & vice versa, etc.) that need to be properly defined for full interoperability.

I haven't fully digested all the discussion in this PR, but it seems that @Sc00bz and @warner have gone some way to identifying these things & making decisions. I would be grateful if they could be listed out here in bullet point form! I guess this is what the draft-irtf-cfrg-spake2-03 (linked from #273) is about, but I notice that it has expired.

In any case, I'll keep plugging away, will give this PR a thorough read, and do my best to document what I'm doing so reviewing interoperability gets easier.

Should I update here, or is there somewhere else more appropriate?

@jml

jml commented May 19, 2017

Attempting to summarise interoperability decisions above. Restricting to SPAKE2 for now.

  • To convert the password (a bytestring) into a scalar: HKDF(pw_bytes, cxt="SPAKE2 pw", len=byteLen(n) + 16)
    • Note: I am not 100% clear on
      • what n represents here
      • whether this is an extract-and-expand HKDF operation or something else
      • where the + 16 comes from or why it's there
  • To construct the session key: H(H(pw_bytes) + H(idA) + H(idB) + X_msg + Y_msg + K_bytes), where:
    • H is an arbitrary hash function, defaulting to SHA256
    • + is concatenation
    • idA & idB are arbitrary bytestrings representing the identities of the sides
    • X_msg and Y_msg are the blinded X & Y values, converted into bytes (see below)
    • K_bytes is the result of the second step of the protocol (not exactly sure what I'm supposed to call this. In haskell-spake2 it's generateKeyMaterial), converted into bytes (again, see below)
  • For symmetric SPAKE2: H(H(pw) + H(id) + msg1 + msg2 + K_bytes), where:
    • id is the symmetric ID. It will be the same for both sides, that's why it's symmetric
    • [msg1, msg2] = sorted([X_msg, Y_msg]), where sorting is bytewise lexical
  • When sending blinded values on the wire (X* & Y* in Abdalla & Pointcheval — is there a better term for these?):
    • turn the element into bytes $SOMEHOW (Note: I don't see a clear agreement on this)
    • prefix the value with a single byte: A (0x41), B (0x42), or S (0x53) to indicate side A, side B, or symmetric respectively
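
The symmetric session-key rule above can be sketched as follows, assuming H is SHA-256. The function name is hypothetical. Python's sorted() on bytes objects compares bytewise, which matches the "bytewise lexical" ordering described.

```python
import hashlib


def finalize_symmetric(id_sym, msg_mine, msg_theirs, k_bytes, pw):
    """key = H(H(pw) + H(id) + msg1 + msg2 + K_bytes), messages sorted."""
    def h(data):
        return hashlib.sha256(data).digest()

    # Sorting removes the dependence on who sent which message, so both
    # sides hash the same transcript without agreeing on roles first.
    msg1, msg2 = sorted([msg_mine, msg_theirs])
    return h(h(pw) + h(id_sym) + msg1 + msg2 + k_bytes)


side_1 = finalize_symmetric(b"id", b"S\x02msg", b"S\x01msg", b"K_bytes", b"pw")
side_2 = finalize_symmetric(b"id", b"S\x01msg", b"S\x02msg", b"K_bytes", b"pw")
print(side_1 == side_2)  # → True
```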

Thus, before exchanging, two nodes need to agree on the following, out-of-band:

  • hash algorithm, H
  • group to use
  • arbitrary members of group to use for blinding
  • symmetric or asymmetric
  • their respective IDs

I think that's it, but I could be wrong.

Open questions

This section just collects my questions, most of which are embedded in the description above.

  • what is the n in the password-to-scalar function?
  • where does the + 16 come from?
  • exactly what HKDF operation is being performed?
  • how are blinded elements turned into bytes to be sent on the wire?
  • how do we determine M, N, S?
    • does there need to be a well-known, agreed-upon way of turning simple bytestrings into group elements?
    • does this mechanism need to vary by group, or can it be defined in general terms?
  • how does endianness come into play?

Am I missing anything? Could someone have a go at answering my open questions? If not, I'll get there eventually.

jml added a commit to LeastAuthority/haskell-spake2 that referenced this pull request May 25, 2017
Much input derived from a PR to implement this for Javascript:
bitwiseshiftleft/sjcl#273
@jml

jml commented May 25, 2017

Answers to my own questions:

what is the n in the password-to-scalar function?

It is the size of the elements of the group, in bytes.

where does the + 16 come from?

Feelings. Oversizing the password by a certain number of bytes means the resulting scalar is more uniformly distributed over the group
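
The effect of oversizing can be seen on a toy scale. This is a minimal sketch with a made-up modulus of 5 standing in for the group order. It counts how often each residue appears when every value of a given bit width is reduced modulo 5.

```python
from collections import Counter


def residue_counts(bits, modulus):
    """Count how often each residue occurs over all `bits`-bit values."""
    return Counter(value % modulus for value in range(2**bits))


modulus = 5  # toy stand-in for the group order; it needs 3 bits
for extra_bits in (0, 7):
    counts = residue_counts(3 + extra_bits, modulus)
    print(extra_bits, "extra bits:", sorted(counts.items()))
# prints:
# 0 extra bits: [(0, 2), (1, 2), (2, 2), (3, 1), (4, 1)]
# 7 extra bits: [(0, 205), (1, 205), (2, 205), (3, 205), (4, 204)]
```

With no extra bits, some residues are twice as likely as others. With 7 extra bits the counts differ by at most one in about 205. The 16 extra bytes in the thread are 128 extra bits, which shrinks the relative imbalance to roughly 2^-128.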

exactly what HKDF operation is being performed?

expand and extract together

how are blinded elements turned into bytes to be sent on the wire?

Groups must implement their own means of turning elements into bytes.
The number of bytes should be fixed so that all elements encode to the same length bytestring.
The i2osp function in Haskell's cryptonite library seems to do the right thing, even if I don't know what that is.

how do we determine M, N, S?
does there need to be a well-known, agreed-upon way of turning simple bytestrings into group elements?

Not necessarily, but it's a desirable property for being able to readily agree on protocol instantiations (if that's the correct term) between implementations.

does this mechanism need to vary by group, or can it be defined in general terms?

It must vary group-by-group.
Note that in python-spake2, groups even differ in the way they turn a bytestring into a seed integer,
with Ed25519 oversizing the HKDF expansion
while IntegerGroup does not.
This distinction seems unnecessary to me.

how does endianness come into play?

Still no idea, but i2osp seems to sort it out, at least with Python interoperability.


7 participants