Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zeros embedded in subject and issuer fields when using certificates generated with makecert (e.g. self-signed for test-signing during driver development) #95

Open
hugmyndakassi opened this issue Jan 30, 2024 · 7 comments
Labels
bug Something isn't working C:uthenticode The core uthenticode library enhancement New feature or request help wanted Extra attention is needed

Comments

@hugmyndakassi
Copy link
Contributor

hugmyndakassi commented Jan 30, 2024

It appears as if makecert writes Windows-style wchar_t* into these fields verbatim instead of encoding them as, say, UTF-8. This is an issue as it will lead to the strings being interpreted as single-character string, given every other character is a zero byte.

I worked around this locally by creating a function that filters these:

static inline std::string filter_escaped_zero_bytes(char const* instr)
{
    auto input = std::string(instr);
    static std::string const needle = "\\x00";
    std::ostringstream os;
    std::size_t currpos = 0, oldpos{};
    do
    {
        oldpos = currpos;
        currpos = input.find(needle, currpos);
        if (std::string::npos == currpos)
        {
            break;
        }
        os << input.substr(oldpos, currpos - oldpos);
        currpos += needle.size();
    } while (true);
    os << input.substr(oldpos); // remainder
    return os.str();
}

... and using that in the ctor (Certificate::Certificate):

subject_ = filter_escaped_zero_bytes(subject.get());
issuer_ = filter_escaped_zero_bytes(issuer.get());

Not sure this is an issue you'll want to address at all, since it appears that MS and its makecert tool deviate from the standard here. Just wanted to bring this to your attention as well as to the attention of those watching the project.

Feel free to close this pretty much immediately.

PS: this isn't particularly optimized, because I only ever encountered this with those certificates created by makecert and we only use these in testing scenarios. But I'm sure there is still some room for improvement.

@woodruffw
Copy link
Member

Thanks for the report! Yeah, I'm not sure if there's a great option for us here:

  • The main relevant standard is probably RFC 5280, which says that individual RDN attribute values have to be either PrintableString or UTF8String. So mkcert is just plain wrong to use UTF-16 or UCS-2 or whatever it's doing. OTOH, they may be following RFC 3280, which is much looser about value type encodings.
  • The subject and issuer are both exposed as std::string intentionally, since consumers should treat them as a (ptr, len) pair that may have all kinds of weird bytes internally (such as embedded NULLs).

So, two ideas I have:

  • Tweak the documentation slightly to make it clear that get_subject() and get_issuer() may return all kinds of nonsense, including NULL bytes.
  • Change get_subject() and get_issuer() to return std::vector<std::byte> instead, which will emphasize that they aren't necessarily a naive printable encoding. This will be a breaking change, but IMO an acceptable one.

Thoughts?

@woodruffw
Copy link
Member

  • The subject and issuer are both exposed as std::string intentionally, since consumers should treat them as a (ptr, len) pair that may have all kinds of weird bytes internally (such as embedded NULLs).

Ah, never mind -- I forgot that we call X509_NAME_oneline which boils it down to char *, causing all this pain. I think we need to replace that call with X509_NAME_print_ex and pass in ASN1_STRFLGS_UTF8_CONVERT, which will perform the UTF-8 conversion.

@woodruffw woodruffw added bug Something isn't working enhancement New feature or request help wanted Extra attention is needed C:uthenticode The core uthenticode library labels Jan 30, 2024
@hugmyndakassi
Copy link
Contributor Author

Isn't ASN1_STRFLGS_UTF8_CONVERT for ASN1_STRING_print_ex() as opposed to X509_NAME_print_ex()? What am I missing?

@woodruffw
Copy link
Member

Isn't ASN1_STRFLGS_UTF8_CONVERT for ASN1_STRING_print_ex() as opposed to X509_NAME_print_ex()? What am I missing?

It's supported by both: from the OpenSSL docs:

The complete set of the flags supported by X509_NAME_print_ex() is listed below.

...

Additionally all the options supported by ASN1_STRING_print_ex() can be used to control how each field value is displayed.

From man 3 X509_NAME_print_ex, also here: https://docs.openssl.org/3.4/man3/X509_NAME_print_ex/

My understanding is that flag should work, but there might be other flags that have similar behavior that have better/more suitable behavior. I haven't gone through all of them 🙂

@hugmyndakassi
Copy link
Contributor Author

Yep, it does work. PR incoming for that one.

@hugmyndakassi
Copy link
Contributor Author

hugmyndakassi commented Sep 30, 2024

Ouch, this is trickier than I thought. Getting the name out isn't a problem. But getting it out in the old form is. I can't seem to come up with a combination of flags that could emulate the old behavior of X509_NAME_oneline.

The flag that would seem to make most sense — XN_FLAG_ONELINE — doesn't work. Example for the issuer field:

C = US, O = "DigiCert, Inc.", CN = DigiCert Trusted G4 Code Signing RSA4096 SHA384 2021 CA1

The flag XN_FLAG_RFC2253 looks saner in general, but may not be desirable due to the escaping:

CN=DigiCert Trusted G4 Code Signing RSA4096 SHA384 2021 CA1,O=DigiCert\, Inc.,C=US

Original form with X509_NAME_oneline():

/C=US/O=DigiCert, Inc./CN=DigiCert Trusted G4 Code Signing RSA4096 SHA384 2021 CA1

The most notable difference being the inverse order.

In other words, without going through hoops it will be hard to retain backwards-compatibility. I reckon the correct approach would use X509_NAME_cmp or X509_cmp instead of relying on string representations for comparison. But I know from experience that folks will nevertheless opt to do string comparison and complain bitterly if the format doesn't comply with their particular expectation.


One method that may work would be XN_FLAG_RFC2253 & (~XN_FLAG_DN_REV) which mimics the old order, but then would require some post-processing to deal with:

  • missing leading slash
  • replacing commas by slashes
  • unnecessary escape sequences

Without that it still looks like this:

C=US,O=DigiCert\, Inc.,CN=DigiCert Trusted G4 Code Signing RSA4096 SHA384 2021 CA1

Please advise, @woodruffw ...

This code may come in handy when trying it out:

static inline bool name_to_string(std::string& strname, X509_NAME *name, unsigned long flags)
{
  std::unique_ptr<BIO, decltype(&BIO_free)> name_bio(BIO_new(BIO_s_mem()), BIO_free);
  // auto nameptr = impl::OpenSSL_ptr(X509_NAME_oneline(name, nullptr, 0), impl::OpenSSL_free);
  // strname = std::string(nameptr.get());

  if (-1 != X509_NAME_print_ex(name_bio.get(), name, 0, (flags) & ~(ASN1_STRFLGS_ESC_MSB)))
  {
    char* data = nullptr;
    auto len = BIO_get_mem_data(name_bio.get(), &data);
      strname = std::string(data, len);
      return true;
  }
  return false;
}

@woodruffw
Copy link
Member

Hmm, that's unfortunate. I'm of two minds on this:

  1. uthenticode doesn't stipulate that get_subject() returns a consistent format, and our current encoding is arguably bad and should be changed anyways (since it's specific to OpenSSL, not a standard encoding of Distinguished Names.
  2. More generally, this suggests that get_subject() -> string is probably too high-level of an API: I added that originally just for pretty-printing, not for people to do bytewise comparisons with 🙂

So, I think we should do two things here:

  1. get_subject() should use XN_FLAG_RFC2253. That'll be a compatibility break technically, but we never guaranteed the string format anyways. If you think that'll be too disruptive in 2.x though, we'll document it and do it with a 3.0 series release.
  2. We should probably add a new API, something like get_subject_components() or similar, which returns a vector<string> of all of the DN's entries, serialized individually. I think this can be done with X509_NAME_entry_count and X509_NAME_get_entry.

For now, (1) is probably fine though. (2) is something we can consider if users actually complain and want low-level access to each name component.

hugmyndakassi added a commit to hugmyndakassi/uthenticode that referenced this issue Sep 30, 2024
- Certificate got another ctor which takes the flags to pass when
  formatting the X509_NAME values
- The default formatting changed to XN_FLAG_RFC2253 but can be overridden
  from the outside by defining UTHENTICODE_DEFAULT_XN_FLAGS
- This introduces an incompatibility _if_ the caller assumes that the
  issuer and subject can be compared in their string form
hugmyndakassi added a commit to hugmyndakassi/uthenticode that referenced this issue Sep 30, 2024
- Certificate got another ctor which takes the flags to pass when
  formatting the X509_NAME values
- The default formatting changed to XN_FLAG_RFC2253 but can be overridden
  from the outside by defining UTHENTICODE_DEFAULT_XN_FLAGS
- This introduces an incompatibility _if_ the caller assumes that the
  issuer and subject can be compared in their string form
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working C:uthenticode The core uthenticode library enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants