Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TLS #17

Open
wants to merge 46 commits into
base: master
Choose a base branch
from
Open

TLS #17

wants to merge 46 commits into from

Conversation

snej
Copy link
Contributor

@snej snej commented Aug 20, 2019

I've implemented TLS support since my project requires it. There are many TLS libraries, so I've kept the interface abstract, and provided an implementation that uses mbedTLS.

Design:

tls_socket subclasses stream_socket. Its constructor takes an existing open stream_socket that wraps around and takes over. The client reads/writes the tls_socket, which calls into the TLS library, which in turn reads/writes the underlying stream_socket.

There's also a tls_context class which acts as a factory for tls_sockets, and holds state that's to be shared between sockets, like the trusted root CAs and (on the server side) the certificate and private key.

Both of these are abstract and get subclassed by the real implementation(s).

Caveats:

  • Has only been compiled with Clang, on macOS.
  • Has only been built with Xcode, i.e. I haven't updated the CMake build.
  • I don't know how you want to handle the mbedTLS dependency, so it has to be separately installed.
  • No unit tests yet.
  • I've tried to adhere to your coding style, but it's different from my own (I don't use snake_case) so I may have left camelCase names by mistake.

Architectural Issues:

tls_socket is a subclass of socket (via stream_socket) that overrides the read/write methods, but it breaks some of socket's assumptions, or rather its concrete subclasses do:

  1. It doesn't have a file descriptor. (The socket it's wrapping does, but the client should never see that.)
  2. The TLS API calls don't set errno, they return error codes directly.
  3. The TLS error codes are not in the same 'namespace' as POSIX/sockets errors.
  4. A TLS socket isn't duplicable, nor easily moveable, because it has state other than just a file descriptor.

It's not my place to refactor your existing APIs to account for these, so I've resorted to some hacks for now:

  1. tls_socket's constructor sets a bogus file descriptor, a negative value (but not -1), so that the socket superclass will know that it's open.
  2. I added a set_last_error() method to socket, so mbed_tls_socket can store the errors it gets back from the mbedTLS API.
  3. Fortunately all errno values are positive, while mbedTLS errors are negative, so client code can use the sign bit to distinguish between them. Of course I can't guarantee this is true of OpenSSL, SecureTransport, WolfSSL, etc., so I'm just dodging the problem!
  4. I allocate tls_socket instances on the heap and manage them with unique_ptr.

(I'm also ignoring the complications implied by DTLS, i.e. TLS-over-UDP, which would require a socket class that doesn't inherit from stream_socket...)

Architectural Suggestions:

These problems go away if tls_socket doesn't subclass socket, because then there's no existing API it has to adhere to. Unfortunately this would be really awkward for clients that need to support both TLS and non-TLS sockets, because they no longer share an API.

Sounds like it's time for another layer of abstraction! ("Any problem can be solved by adding a layer of abstraction, except for the problem of too many layers of abstraction.")

The best idea I can come up with is to split out the file descriptor and stream concepts from the base classes. So socket becomes purely abstract, with subclasses fd_socket (adding the file descriptors) and tls_socket.

Then split out the read/write API into reader and writer interfaces. The concrete socket classes can expose instances via accessors. (If the reader and writer each have their own last-error properties, that also solves the thread-safety problem of reader & writer threads!)

@snej snej changed the title Tls TLS Aug 20, 2019
@fpagliughi
Copy link
Owner

Awesome. I was thinking SSL/TLS should be next, and it looks like you hit a number of issues that I was worried/thinking about. I'll need some time to go over all of this, but my thinking was:

  • SSL/TLS sockets should adhere to the current API as much as possible.
  • People not using secure sockets should not pay a penalty for having them added to the library (i.e. zero cost abstraction). So:
    • The base API should not be less efficient for TCP/UDP. So I'm leaning away from abstract base classes and virtual functions (though haven't totally ruled them out).
    • The the new code should be optional at build time.

Some other things that I see here:

  • I was leaning towards OpenSSL, as it seems the most popular on Linux, but this library started as one for embedded systems, so we should probably do it in a way that the choice of SSL library is pluggable (if possible), or selectable at build time. (I can't imagine someone using multiple SSL libraries in a single build).
  • I've gone through many coding styles with C++ over the years, but finally resolved that the C++ std library should be the definitive standard for naming convention. They use snake case for classes and methods. So now, so do I.

And one minor thing:

  • socket::clear() resets the last error code back to zero by default. It also takes an optional int. So you can use that to set the last error on an object. Perhaps it's not very clear, but it's probably mainly useful for derived classes. I think I stole that from iostreams; can't remember. So set_last_error() is not necessary.

@borrrden
Copy link
Contributor

With my updates, it has now been tested on Windows as well. We spent a lot of time getting rid of OpenSSL from our codebase, which is part of the reason for the decision to go with mbedTLS instead. As a bonus, it's made with embedded systems in mind and it's way easier to compile.

@snej snej force-pushed the TLS branch 3 times, most recently from 7dfa729 to 1ca3726 Compare September 5, 2019 20:40
@fpagliughi fpagliughi added this to the v0.8 milestone Sep 9, 2019
@snej snej force-pushed the TLS branch 3 times, most recently from e297281 to cb7966c Compare September 12, 2019 20:24
These return the error code directly, instead of storing it in the
socket's last_error property.
snej and others added 13 commits September 25, 2019 17:59
Add 'functional' include needed by Windows
1. The async connect loop on Windows doesn't return WSAEINPROGRESS, but rather WSAEWOULDBLOCK (this is the standard return value, at least in Winsock2, for operations that cannot be completed immediately when the socket is in non-blocking mode)

2. A subtle error in stream_socket.cpp made the underlying problem unnoticeable (`return ret == SOCKET_ERROR ? ret : written` when `ret` is `int` and `written` is `DWORD` (`unsigned long`) returns the value typed as the higher width of the two types in this case, and so -1 loses its sign information and that causes chaos for callers who check for `ret < 0`).  The ultimate problem is that, unfortunately, `WSASendMsg` only works for datagram or raw sockets, not TCP sockets like we have here.  `WSASend` needs to be used instead.
It's more modern and is actually what people think is going to happen when they use SO_REUSEADDR
So check for GNU behavior using __GLIBC__ instead
Only in cases of importing system certs.  Some of them may be so old (or maybe so new?) as to be unable to be read by mbedTLS.  The behavior should not be to discard everything.  However when specifying our own chains we want everything to be valid.
So that when throwing an exception up the chain, a positive integer doesn't get interpreted as an unrelated POSIX code.  Instead indicate that the verification failed.
snej and others added 18 commits April 29, 2020 11:47
Client code needs it to find the actual handle, the peer address, etc.
(I'd override those methods, but they're not virtual.)
This is important because if shutdown has been called, then the socket probably is not good for general use anymore and this state should be considered when making decisions about using it.
It doesn't behave like the others, and acts as SO_REUSEPORT
and/or Windows desktop apps compiled in unicode mode

1. Replace macros with their hardcoded ASCII equivalents to avoid an invalid cast of WCHAR* to string
2. Replace the unavailable CertOpenSystemStore with the more low level CertOpenStore
mbedtls_context::set_root_cert_locator() can be used on platforms
such as iOS, where the list of trusted root certs isn't available, but
there is API to find the root cert that signed a given cert.
1. There is no way to reset the state of the validation back to using the system certificates, so take setting a null root cert locator callback to mean that
2. When using a pinned cert, all errors up the chain should be ignored because all we care about are the potential errors on the leaf
Even if the handshake failed, so that it can be examined and pinned on the other side if necessary
Instead just pull out its raw data since that is what it will end up as anyway
This is an optional part of the TLS handshake in which a list of acceptable CAs are sent to the client along with the certificate request.  A well behaved client will check this and not send any certs that are not issued by these CAs.  This doesn't jive well with some late stage authentication methods that don't make use of CA certs, so provide an option to disable it (it is enabled by default).
WINAPI_FAMILY_PARTITION is only defined on Windows
The single byte string version is deprecated and causes compiler warnings
@logidelic
Copy link

This sounds great. Just wondering if others are using this branch? Are there any plans to integrate it?

MBEDTLS_DEBUG_C is a flag that can be disabled in mbedtls/config.h,
to disable mbedTLS's debug logging. I tried doing so, which revealed
two compile errors in mbdetls_context.cpp. This fixes them.
snej added 2 commits October 6, 2021 22:58
I noticed that the `mbedtls_socket` constructor was passing wrong value to
`setup_bio` -- the constructor's bool is called "blocking", whereas the param
is called "nonblocking".

This causes the mbedTLS recv callbacks to be setup wrong; the docs for
`mbedtls_ssl_set_bio` say:

> "The two most common use cases are:
>  - non-blocking I/O, f_recv != NULL, f_recv_timeout == NULL
>  - blocking I/O, f_recv == NULL, f_recv_timout != NULL"

but we end up doing it backwards.

Strangely this doesn't seem to have led to problems, since we've never noticed
this bug before. But it seems worth fixing.
While testing I ran into a situation where a read was returning -1 signaling an
error, but the socket error was 0. This then triggered an assertion failure in
LiteCore.

mbedTLS was returning the code MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY, which our
`translate_mbed_error` method converts to errno 0, i.e. it's not an error at
our level. However, `check_mbed_io` still returned -1.

Fixed by making `check_mbed_io` return 0, not -1, if the error code translates
to 0.
@borrrden
Copy link
Contributor

At this point I don't have any confidence that this is going to be merged here, so I think it should be closed. I am planning to change our branch name anyway, but to answer the old question: I doubt any "others" besides us are using this branch.

@fpagliughi
Copy link
Owner

fpagliughi commented Oct 20, 2021

Hey @Borrden! Thanks again to you and @snej for all the work on this, and apologies that I haven't had much time to deal with this library in a while as my professional work has completely switched over to Rust for the last few years.

As I stated in my original comment, I do want TLS to be the next addition to this library, but as most of the companies I work with still use OpenSSL, and it's still quite popular, I didn't want to go with a single alternate SSL library, but wanted to figure a way to compile in one or another.

I also wanted to make sure that TLS was a zero-cost abstraction. TCP and UNIX-domain sockets should see no overhead from the TLS code. (I can't remember why I thought that might not be the case with this PR, but I wanted to make sure first).

But the biggest thing is that there seems to be so much great stuff in this PR that has nothing to do with TLS, and even if I didn't want to go with the mbedTLS exactly as coded here, there's still a lot of this PR that I absolutely do want to get merged in.

Give me a little more time to go over this before you close it. I hadn't been thinking to reject this PR. I just haven't had time to give it the attention it deserves. I'll try to block out some time soon.

@borrrden
Copy link
Contributor

I'm not sure we've taken any steps or not to work towards the goals that you have (though maybe?). That combined with the fact that since the PR this branch has received all of our work (bug fixes, Windows testing, etc, that weren't specific to TLS support, and weren't meant firstly for this PR, but rather just to be put into the product) means that as time goes on it gets harder and harder to review. I certainly don't mean to put any blame anywhere, I just wanted to give it a good poke to see what happened.

@fpagliughi
Copy link
Owner

Sorry, this repo hasn't gotten a lot of love lately; I just haven't been doing much C++ the last few years, and it's hard to get your brain back into C++ mode when you've been away a while!

But... I've got some C++ work coming up this year, and is as good a time as any to get back in the groove and get this library updated.

I've cherry-picked most of the non-TLS commits in this PR into the develop branch and will clean them up and expand to fit with the rest of the API and verify with the targets I have available.

I hope to get to a TLS implementation within the next few months. As I mentioned long ago, I need OpenSSL support for my work, so I assume the attack would be to make sure we have generic tls_context and tls_socket interfaces which can be implemented by OpenSSL or by this mbed_tls_context, etc, fit these mbed classes in, and get started on the OpenSSL implementation.

I assume the user would opt in based on build options.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants