
Signed Documents

Jens Alfke edited this page Jul 5, 2015 · 13 revisions

This is a specification for digitally signing a JSON document. It's not tied to Couchbase Lite (or the Couchbase Sync Gateway) though it was created for use with them. Nor do those products require signed documents.

Signing a document provides these benefits:

  • The enclosed public key can be used as an identifier of the entity that signed the document.
  • Any unauthorized modification of the document can be detected, as it'll invalidate the signature.

Thus a signature serves as a form of authentication. Why do we need this when servers like the Sync Gateway and CouchDB already support several types of authentication?

  • The signature authenticates a document, not a connection. This is a very important distinction when documents are replicated, especially when they can pass between multiple servers. A document may be forwarded by an entity that didn't create it, so the fact that the replicator connection is authenticated does not authenticate the document. The document has to carry its own credentials.
  • Public keys allow for many types of identity and authentication. In the simplest case, an entity can create a key-pair and use the public key as its sole identification; this is useful even though it doesn't tie that entity to any external form of ID. More complex systems can use a hierarchical public-key infrastructure like X.509 or a "web of trust" like PGP.

History:

  • 2009: Initial draft of an earlier spec, heavily influenced by SDSI, an experimental public-key infrastructure that used S-expressions as its universal data representation.
  • Feb/March 2014: Significant evolution/rewrite of the old spec. First appearance in the Couchbase Lite wiki.
  • August 2014: Simplified the data format by removing the use of nested arrays tagged with algorithm IDs. Added support for Curve25519 algorithm.
  • July 2015: Replaced references to Curve25519 with Ed25519, since the latter is what's actually used for signatures. (Curve25519 itself is a Diffie-Hellman key-agreement function, used for encryption rather than signing.)

Usage

There are two main algorithms here: the signature algorithm takes a JSON object and a private key and produces a signature object, and the verification algorithm takes a JSON object and a signature object and determines whether or not the signature is valid for that object.

Unlike some other JSON-signature systems, the object being signed doesn't need to be specially encoded. This is important because it doesn't get in the way of systems (like Couchbase Lite or CouchDB) that read the object.

Another advantage is that the signature doesn't need to be contained in the signed object (or vice versa.) It is common for the signature to be contained -- and there's a special (signed) property defined for it -- but there are situations where this isn't practical. For example, some storage systems may require metadata such as a signature to be stored externally. In this case it's up to the application to have a way to find the signature of an object.

Usage With Couch-Style Document Databases

When signing documents belonging to a CouchDB-family database (also including Couchbase Lite and the Couchbase Sync Gateway) it's important to handle the document metadata correctly.

The key point is that the document ID and the parent revision ID must be signed. Otherwise the document can be used for replay attacks: if the doc ID isn't signed, an attacker can change it to another ID and create a copy of the document; if the signature doesn't include the parent revision ID, an attacker can re-post the document at any time, reverting it to an older version.

Specifically:

  • When signing a document or verifying a signature, its _id property MUST be included in the JSON being signed.
  • The parent rev ID isn't stored in the document, so it needs to be explicitly added as a parent_rev property. This property MUST be stored in a signed Couch-type document (and included in the JSON being signed) and MUST be equal to the _rev property of the parent revision, unless this is a first-generation document with no parent revision, in which case the property MUST be absent.
  • The only other metadata property that typically appears is _attachments -- in principle this would be good to sign, but I am not sure its exact contents will stay the same across replication (for example the digest properties might change format.) More research is needed.
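The rules above can be sketched in Python as a function that builds the view of a document that gets signed, plus the matching verifier-side check on parent_rev. Both function names are hypothetical. Two choices are assumptions rather than part of this spec: the document's own _rev is dropped (the database assigns it on save, so it can't be known when signing), and _attachments is left unsigned until its behavior across replication is settled. The same parent_rev value must also be saved in the stored document itself.

```python
import copy
from typing import Any, Dict, Optional


def couch_signing_body(doc: Dict[str, Any], parent_rev: Optional[str]) -> Dict[str, Any]:
    """Return the view of a Couch-style document that gets canonically digested and signed.

    parent_rev is the _rev of the parent revision, or None for a first-generation doc.
    """
    if "_id" not in doc:
        raise ValueError("_id MUST be present so that it is covered by the signature")
    body = copy.deepcopy(doc)
    # Assumption: the new revision's own _rev is assigned by the database on save.
    body.pop("_rev", None)
    # Assumption: _attachments stays unsigned until its replication behavior is settled.
    body.pop("_attachments", None)
    # An embedded signature is never part of the content being signed.
    body.pop("(signed)", None)
    if parent_rev is None:
        body.pop("parent_rev", None)  # first generation: parent_rev MUST be absent
    else:
        body["parent_rev"] = parent_rev
    return body


def parent_rev_matches(doc: Dict[str, Any], actual_parent_rev: Optional[str]) -> bool:
    """Verifier-side check that the signed parent_rev names the real parent revision."""
    if actual_parent_rev is None:
        return "parent_rev" not in doc
    return doc.get("parent_rev") == actual_parent_rev
```

A verifier would call parent_rev_matches with the _rev of the revision that is actually the parent in its local revision tree, and reject the document if it returns False.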

Data Formats

Standard Algorithms

  • SHA: A SHA-family digest (SHA-1, SHA-256, etc.)
  • RSA: An RSA public key, encoded in ASN.1 BSAFE format
  • 25519: An Ed25519 public key

Some algorithms have multiple key or output sizes; the size doesn't need to be specified in the algorithm name because it can easily be inferred from the size of the associated data.
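Because the algorithm name omits the size, a verifier has to infer the variant from the decoded data. Here is a minimal Python sketch for digests, assuming only SHA-1 and the SHA-2 family are in use. (A 32-byte digest could in principle also come from SHA3-256, which this mapping ignores.) The helper names are hypothetical. Both example digests at the end of this page decode to 20 bytes, which identifies them as SHA-1.

```python
import base64
import hashlib

# Assumed mapping from digest length (bytes) to a hashlib algorithm name.
SHA_BY_LENGTH = {20: "sha1", 28: "sha224", 32: "sha256", 48: "sha384", 64: "sha512"}


def sha_algorithm_for(digest_b64: str) -> str:
    """Infer which SHA variant produced a base64-encoded digest from its length."""
    digest = base64.b64decode(digest_b64, validate=True)
    algorithm = SHA_BY_LENGTH.get(len(digest))
    if algorithm is None:
        raise ValueError(f"no known SHA variant yields a {len(digest)}-byte digest")
    return algorithm


def sha_digest_b64(data: bytes, algorithm: str) -> str:
    """Digest raw bytes with the given variant and return the base64 form."""
    return base64.b64encode(hashlib.new(algorithm, data).digest()).decode("ascii")
```

Ed25519 public keys are always 32 bytes, so key_25519 needs no such inference.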

Note: Ed25519 signatures are generated according to the algorithm used by the crypto_sign_detached function of libsodium.

Signature Object

This is a JSON object that acts as a digital signature of some other JSON object (without specifying where that other object is.)

A signature object has at least the following properties, each of which has a base64 string as its value:

  • digest_SHA: A cryptographic digest of the canonical encoding of the object being signed.
  • key_RSA or key_25519: The public key of the key-pair performing the signing.
  • sig: The digital signature of the canonical encoding of the signature object minus this field. (The algorithm doesn't need to be specified: it's implicitly the same as the one used for the key property.)

Optional properties include:

  • date: A JSON-format (ISO-8601) timestamp identifying when the signature was generated. Example: "2014-08-29T16:22:28Z"
  • expires: The number of minutes the signature remains valid after being generated. (If missing, the signature never expires.)
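Here is a minimal Python sketch of a syntax check for a signature object with the properties listed above. It treats only key_RSA and key_25519 as recognized key properties, and `check_signature_syntax` is a hypothetical name. Note that `datetime.fromisoformat` is somewhat more lenient than strict ISO-8601 (it also accepts a bare date, for instance).

```python
import base64
import binascii
from datetime import datetime
from typing import Any, Dict

KNOWN_KEY_PROPERTIES = ("key_RSA", "key_25519")


def check_signature_syntax(sig: Dict[str, Any]) -> None:
    """Raise ValueError if a signature object is missing or malforms a property."""
    _require_base64(sig, "digest_SHA")
    _require_base64(sig, "sig")
    present = [p for p in KNOWN_KEY_PROPERTIES if p in sig]
    if len(present) != 1:
        raise ValueError("expected exactly one of key_RSA or key_25519")
    _require_base64(sig, present[0])
    if "date" in sig:
        date = sig["date"]
        if not isinstance(date, str):
            raise ValueError("date must be an ISO-8601 string")
        # Before Python 3.11, fromisoformat() rejects a trailing "Z", so rewrite it.
        datetime.fromisoformat(date[:-1] + "+00:00" if date.endswith("Z") else date)
    if "expires" in sig:
        expires = sig["expires"]
        if isinstance(expires, bool) or not isinstance(expires, int):
            raise ValueError("expires must be an integer number of minutes")


def _require_base64(sig: Dict[str, Any], prop: str) -> None:
    value = sig.get(prop)
    if not isinstance(value, str):
        raise ValueError(f"{prop} must be a base64 string")
    try:
        base64.b64decode(value, validate=True)
    except binascii.Error as err:
        raise ValueError(f"{prop} is not valid base64") from err
```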

Signed Object

This is simply a JSON object that directly contains its signature as the value of a (signed) property. Obviously this property needs to be ignored while computing the canonical digest of the object.
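Separating the embedded signature from the content to be digested is a small operation. Here is a Python sketch with hypothetical helper names that work on copies, so the caller's object is never mutated:

```python
import copy
from typing import Any, Dict, Optional, Tuple

SIGNED_PROPERTY = "(signed)"


def split_signed_object(obj: Dict[str, Any]) -> Tuple[Dict[str, Any], Optional[Dict[str, Any]]]:
    """Return (content to digest, embedded signature or None) without mutating obj."""
    content = copy.deepcopy(obj)
    signature = content.pop(SIGNED_PROPERTY, None)
    return content, signature


def embed_signature(content: Dict[str, Any], signature: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of content with the signature stored under the (signed) property."""
    signed = copy.deepcopy(content)
    signed[SIGNED_PROPERTY] = signature
    return signed
```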

Algorithms

Generating Canonical Digests

Digest algorithms like SHA operate on raw binary data, not abstract objects like JSON. There are many different ways to encode the same JSON object as data, which will all result in different digests. So for the signer and verifier to agree on the same digest of an object, we have to define a canonical encoding algorithm that always maps equivalent objects to identical data.

There is no standard for canonical JSON encoding yet, but the OLPC group has documented one that's pretty reasonable:

  • No whitespace.
  • Numbers must be representable as 48-bit integers (i.e. in the range [-2^47 .. 2^47-1].)
  • Numbers cannot have decimal points nor scientific notation nor leading zeros. "-0" is not allowed.
  • Strings (including keys) are converted to Unicode Normalization Form C.
  • No escape sequences in strings, other than \" and \\. All other characters must be represented literally, including control characters.
  • Object keys are lexicographically sorted by Unicode character values (code points). The sorting occurs before escape sequences are added.
  • The entire output is encoded in UTF-8.
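The rules above translate fairly directly into a recursive encoder. Here is a minimal Python sketch that operates on values as produced by `json.loads`. It is not a reference implementation of the OLPC spec. Python compares strings by code point, which gives the required key order. The canonical digest is then a SHA hash of the returned bytes, for example `hashlib.sha1(canonical_json(obj)).digest()`.

```python
import unicodedata
from typing import Any

MIN_INT, MAX_INT = -(2**47), 2**47 - 1  # the 48-bit integer range


def canonical_json(value: Any) -> bytes:
    """Return the canonical UTF-8 encoding of a parsed JSON value."""
    return _encode(value).encode("utf-8")


def _encode(value: Any) -> str:
    if value is None:
        return "null"
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        if not MIN_INT <= value <= MAX_INT:
            raise ValueError(f"integer {value} is outside the 48-bit range")
        return str(value)  # no leading zeros, no decimal point, never "-0"
    if isinstance(value, str):
        return _encode_string(value)
    if isinstance(value, list):
        return "[" + ",".join(_encode(item) for item in value) + "]"
    if isinstance(value, dict):
        # Normalize keys first, then sort by code point (Python's native str order).
        items = sorted(((_nfc(k), v) for k, v in value.items()), key=lambda kv: kv[0])
        if len({k for k, _ in items}) != len(items):
            raise ValueError("two keys are identical after NFC normalization")
        return "{" + ",".join(_encode_string(k) + ":" + _encode(v) for k, v in items) + "}"
    # Floats (and anything else) have no canonical form.
    raise ValueError(f"cannot canonically encode a {type(value).__name__}")


def _nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


def _encode_string(s: str) -> str:
    # Only backslash and double quote are escaped; every other character,
    # including control characters, is emitted literally.
    return '"' + _nfc(s).replace("\\", "\\\\").replace('"', '\\"') + '"'
```

`json.loads` yields `int` for integer literals and `float` for anything with a decimal point or exponent, so a value such as `1.0` is rejected here rather than silently reformatted.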

Note: Non-integers are forbidden because different formatting libraries will convert them to textual form in different ways.

Note: Integers are restricted to 48 bits, not 64, because many JSON parsers convert numbers to double-precision floating point, which is a 64-bit value but has only 53 bits of precision (mantissa).

Note: The above-linked OLPC spec says "string are uninterpreted bytes" and "arbitrary [binary] content may be represented as a string" — this is untrue. The JSON specification states that "a string is a sequence of zero or more Unicode characters". The encoding of a string must therefore be valid UTF-8 data. The only safe way to store binary blobs in JSON is to encode them somehow, typically as Base64.

Creating A JSON Signature

  1. Compute the canonical digest of the object being signed.
  2. Create an unfinished signature object with only the digest (digest_SHA) and key (key_RSA or key_25519) properties.
  3. If desired, add date and expires properties.
  4. Add any other optional properties desired.
  5. Compute the canonical digest of the unfinished signature object.
  6. Generate a digital signature of that canonical digest, using the private key that matches the public key used in step 2.
  7. Add the base64-encoded signature as the sig property of the signature object.
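Here is a minimal Python sketch of these steps. It makes several assumptions not stated in this spec:

- `sign` is a hypothetical callable wrapping the real primitive (for Ed25519, libsodium's crypto_sign_detached through whatever binding the application uses).
- Step 6 signs the raw digest bytes.
- SHA-1 is the default only because the examples on this page use 20-byte digests.
- Refusing expires without date is inferred from the verification rules.

The canonical encoder is a compact copy of the rules in the previous section.

```python
import base64
import hashlib
import unicodedata
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Optional


def canonical_json(value: Any) -> bytes:
    """Compact canonical encoder: 48-bit ints, NFC strings, code-point-sorted keys."""
    return _enc(value).encode("utf-8")


def _str(s: str) -> str:
    s = unicodedata.normalize("NFC", s)
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def _enc(v: Any) -> str:
    if v is None:
        return "null"
    if isinstance(v, bool):
        return "true" if v else "false"
    if isinstance(v, int):
        if not -(2**47) <= v < 2**47:
            raise ValueError("integer outside the 48-bit range")
        return str(v)
    if isinstance(v, str):
        return _str(v)
    if isinstance(v, list):
        return "[" + ",".join(map(_enc, v)) + "]"
    if isinstance(v, dict):
        items = sorted(((unicodedata.normalize("NFC", k), x) for k, x in v.items()),
                       key=lambda kv: kv[0])
        if len({k for k, _ in items}) != len(items):
            raise ValueError("duplicate keys after NFC normalization")
        return "{" + ",".join(_str(k) + ":" + _enc(x) for k, x in items) + "}"
    raise ValueError(f"cannot canonically encode a {type(v).__name__}")


def _b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def create_signature(obj: Dict[str, Any], public_key: bytes, sign: Callable[[bytes], bytes],
                     key_property: str = "key_25519", digest_alg: str = "sha1",
                     date: Optional[datetime] = None, expires_minutes: Optional[int] = None,
                     extra: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a signature object for obj (which must already exclude its (signed) property)."""
    if expires_minutes is not None and date is None:
        raise ValueError("assumption: expires is only meaningful together with date")
    # Steps 1-2: digest of the object, plus the signer's public key.
    sig_obj: Dict[str, Any] = {
        "digest_SHA": _b64(hashlib.new(digest_alg, canonical_json(obj)).digest()),
        key_property: _b64(public_key),
    }
    # Step 3: optional date (timezone-aware datetime, written as UTC) and expiration.
    if date is not None:
        sig_obj["date"] = date.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    if expires_minutes is not None:
        sig_obj["expires"] = expires_minutes
    # Step 4: any other optional properties.
    sig_obj.update(extra or {})
    # Steps 5-7: digest the unfinished object, sign that digest, store it as "sig".
    unsigned_digest = hashlib.new(digest_alg, canonical_json(sig_obj)).digest()
    sig_obj["sig"] = _b64(sign(unsigned_digest))
    return sig_obj
```

Injecting `sign` keeps the sketch independent of any particular crypto library. For RSA, pass `key_property="key_RSA"` together with an RSA signing callable.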

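Note: Properties that should not be covered by the signature (for example underscore-prefixed metadata) must be removed from the object before its canonical digest is computed. For Couch-style documents, however, the _id property MUST remain, as described above.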

Verifying A JSON Signature

  1. Temporarily remove from the target object any properties that should not be considered part of its content; for instance, metadata like a date received (or the (signed) property itself, in the case of a signed object.)
  2. Compute the canonical digest of the target object, using the algorithm given in the signature's digest property.
  3. Compare this digest with the one contained in the digest property of the signature. If they aren't equal, fail (the object does not match what was signed.)
  4. Copy the signature object and remove the sig property from the copy.
  5. Compute the canonical digest of the copied signature object.
  6. Verify the digest against the signature contained in the sig property, using the public key contained in the key property. If verification fails, fail (the signature itself has been altered.)
  7. If the signature contains a date property:
      a. If that date is in the future, fail (not valid yet, or else there's unacceptable clock skew.)
      b. If the signature also contains an expires property, add that number of minutes to the date. If the resulting time is in the past, fail (signature expired).
  8. Succeed: the signature is valid!

At any step:

  • If any value in the signature object is invalid (date not in ISO-8601 format, invalid base64, a non-string where a base64 string is expected, etc.), then fail (the signature is syntactically invalid).
  • If any algorithm name (such as the suffix of a digest_ or key_ property) is unrecognized or the program can't perform that algorithm, then fail (not possible to verify the signature.) It is not known whether the signature is valid, but the application should not trust the signature or the object that was signed.
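Here is a minimal Python sketch of verification. Step 1 is left to the caller, who passes content that has already had non-content properties such as (signed) removed. The sketch rests on these assumptions:

- `verify` is a hypothetical callable taking (key algorithm suffix, public key, signed bytes, signature) and wrapping the real primitive (e.g. libsodium's crypto_sign_verify_detached for Ed25519).
- The SHA variant is inferred from the digest length.
- The same variant digests the signature object.
- Timestamps without a zone are treated as UTC.
- `now` must be a timezone-aware datetime.

```python
import base64
import binascii
import hashlib
import hmac
import unicodedata
from datetime import datetime, timedelta, timezone
from typing import Any, Callable, Dict, Optional


class SignatureError(Exception):
    """Raised when verification fails; the message says why."""


Verifier = Callable[[str, bytes, bytes, bytes], bool]  # hypothetical crypto hook
SHA_BY_LENGTH = {20: "sha1", 32: "sha256", 48: "sha384", 64: "sha512"}
KNOWN_KEY_ALGORITHMS = {"RSA", "25519"}


def canonical_json(value: Any) -> bytes:
    """Compact canonical encoder: 48-bit ints, NFC strings, code-point-sorted keys."""
    return _enc(value).encode("utf-8")


def _str(s: str) -> str:
    s = unicodedata.normalize("NFC", s)
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def _enc(v: Any) -> str:
    if v is None:
        return "null"
    if isinstance(v, bool):
        return "true" if v else "false"
    if isinstance(v, int):
        if not -(2**47) <= v < 2**47:
            raise ValueError("integer outside the 48-bit range")
        return str(v)
    if isinstance(v, str):
        return _str(v)
    if isinstance(v, list):
        return "[" + ",".join(map(_enc, v)) + "]"
    if isinstance(v, dict):
        items = sorted(((unicodedata.normalize("NFC", k), x) for k, x in v.items()),
                       key=lambda kv: kv[0])
        if len({k for k, _ in items}) != len(items):
            raise ValueError("duplicate keys after NFC normalization")
        return "{" + ",".join(_str(k) + ":" + _enc(x) for k, x in items) + "}"
    raise ValueError(f"cannot canonically encode a {type(v).__name__}")


def _unb64(value: Any) -> bytes:
    if not isinstance(value, str):
        raise SignatureError("expected a base64 string")
    try:
        return base64.b64decode(value, validate=True)
    except binascii.Error as err:
        raise SignatureError("invalid base64") from err


def _parse_date(value: Any) -> datetime:
    if not isinstance(value, str):
        raise SignatureError("date must be a string")
    try:
        d = datetime.fromisoformat(value[:-1] + "+00:00" if value.endswith("Z") else value)
    except ValueError as err:
        raise SignatureError("date is not ISO-8601") from err
    return d if d.tzinfo else d.replace(tzinfo=timezone.utc)


def verify_signature(content: Dict[str, Any], sig_obj: Dict[str, Any], verify: Verifier,
                     now: Optional[datetime] = None) -> None:
    """Return normally if the signature is valid for content; raise SignatureError if not."""
    # Steps 2-3: recompute the object's digest and compare.
    expected = _unb64(sig_obj.get("digest_SHA"))
    alg = SHA_BY_LENGTH.get(len(expected))
    if alg is None:
        raise SignatureError("unrecognized digest algorithm")
    if not hmac.compare_digest(hashlib.new(alg, canonical_json(content)).digest(), expected):
        raise SignatureError("object does not match what was signed")
    # Steps 4-6: digest the signature object minus "sig", then check "sig" with the key.
    key_props = [p for p in sig_obj if p.startswith("key_")]
    if len(key_props) != 1 or key_props[0][4:] not in KNOWN_KEY_ALGORITHMS:
        raise SignatureError("missing or unsupported key algorithm")
    unsigned = {k: v for k, v in sig_obj.items() if k != "sig"}
    try:
        sig_digest = hashlib.new(alg, canonical_json(unsigned)).digest()
    except ValueError as err:
        raise SignatureError("signature object is not canonically encodable") from err
    if not verify(key_props[0][4:], _unb64(sig_obj[key_props[0]]), sig_digest,
                  _unb64(sig_obj.get("sig"))):
        raise SignatureError("signature has been altered")
    # Date checks: not valid before its date, and not after date + expires minutes.
    if "date" in sig_obj:
        date = _parse_date(sig_obj["date"])
        now = now or datetime.now(timezone.utc)
        if date > now:
            raise SignatureError("not valid yet (or unacceptable clock skew)")
        if "expires" in sig_obj:
            expires = sig_obj["expires"]
            if isinstance(expires, bool) or not isinstance(expires, int):
                raise SignatureError("expires must be an integer")
            if date + timedelta(minutes=expires) < now:
                raise SignatureError("signature expired")
```

Raising an exception with a reason (rather than returning a bare boolean) lets the application log why a document was rejected. Either way, the document should be treated as untrusted.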

Example

RSA-2048, no date or expiration:

{
    "digest_SHA": "CVVhu8Ux+kv7dZHV+9gV7q5tWVA=",
    "key_RSA": "MIIBCgKCAQEAyjt5YE/R0f8fkQS95RjV0KqogjtNIIMiv7EuxYZLXL0AxXHKH13VmUTUis9PjtlsW3CoBNldPUyj5Xmujn39AlIhTioUXtBOrySrfAiaqfo28ytavY2q2X75YzQwLbTt1mRaP4Vl/NyYFf1sx7EfBoC807VcXbSbajxf7T5E9o/zwBgTME6nvXy1OPT+LkOHPmoat9RM37rOhBhA1hLHG2zxeQum31Ck0TrKJrefgmITQQ/SQsX5d8b9vXwvYrc7enGU0EfxBL4Ni62+mPYuFkh6uYrvoTNZ0wqSnhljF8C8JsxwQZ7zUWvRFhpsR1Xk71XYGcis/ZxiWXOQ+7LvAQIDAQAB",
    "sig": "nBMh3nrOPwsQIrduAhHSaXIwtHQ74xFz0S4YN2IrSPhxUtTomuRSwO0vHHjHHZFKyReYJUikmVrJ7gkObdO325E07bRYfRG2phao1R1D/Jmj0rBEhAXaDbkfSd99URJjzsjxCagwRXU2JjrjNsih53dUJXKwYcyPpjgwhBy7Nzs8PjJCr4szA//ckLtSBI8G5pjY8eTrPR2udLIflwUgji51sxRvT6+GRFjqWH9JeLPoyvK6J1E3+xsCj397dUAcodCgomotnjghC/VywK/O7wDjgA9aj8/OyMhTyf3MPGjF05zQj2ggjo76Yuqz9Z7aHp5A9eJeARNKqTy2646gWQ=="
}

Document with an embedded signature using Ed25519, with date and one-hour expiration:

{
  "foo" : 1234,
  "bar" : ["hi", "there"],
  "(signed)" : {
    "digest_SHA" : "LIf7ohS5NIajwHNUbmmfilKVgf0=",
    "sig" : "r7d8KgAiQfHUXRCJiOFjBnPAF1\/59Cnk0Yvqx58xwOt1GnsMZ+jPxaPmTbgN+mTgFjBzi7dt4xwLtN3gPw9UgA==",
    "key_25519" : "CvRaGuU\/Hlod4+wK4PR2EZTF3NMV5zZ6f7OZK4qARQ8=",
    "expires" : 60,
    "date" : "2014-08-29T22:44:48Z"
  }
}