RFC: Enable TLS external session cache #14553

Open
LuyaoZhong opened this issue Jan 3, 2021 · 18 comments

@LuyaoZhong
Contributor

LuyaoZhong commented Jan 3, 2021

Title: Enable TLS external session cache

Description:

TLS provides several session resumption mechanisms that allow an abbreviated handshake. Envoy supports the "session ticket" and "session ID" mechanisms when the TLS version is older than 1.3. For "session ID", Envoy stores the session in memory on the server side (the internal session cache). TLS implementations also support an external session cache, which would be an enhancement to Envoy's TLS session management: if the TLS session is not found in the internal storage, or lookups in the internal storage have been deactivated, the server tries the external storage if one is available. With an external cache, Envoy could be stateless and still leverage session resumption, sessions could be retained for a longer time, and a session could be shared between multiple Envoy proxies.
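To make the lookup order described above concrete, here is a minimal Python sketch. It is only an illustration of the idea, not Envoy or TLS library code: the `SessionCache` class, its method names, and the plain dict standing in for the external store are all hypothetical.

```python
from typing import Callable, Dict, Optional


class SessionCache:
    """Hypothetical two-tier lookup: internal memory first, then an external store."""

    def __init__(
        self,
        external_get: Optional[Callable[[bytes], Optional[bytes]]] = None,
        internal_enabled: bool = True,
    ) -> None:
        self._internal: Dict[bytes, bytes] = {}
        self._external_get = external_get
        self._internal_enabled = internal_enabled

    def store(self, session_id: bytes, session: bytes) -> None:
        # Only keep a local copy when the internal cache is active.
        if self._internal_enabled:
            self._internal[session_id] = session

    def lookup(self, session_id: bytes) -> Optional[bytes]:
        # 1. Try the internal cache unless lookups there are deactivated.
        if self._internal_enabled:
            session = self._internal.get(session_id)
            if session is not None:
                return session
        # 2. Fall back to the external store if one is configured.
        if self._external_get is not None:
            return self._external_get(session_id)
        # 3. Nothing cached: the peer would have to do a full handshake.
        return None


# A plain dict stands in for the shared external store (assumption).
shared_store = {b"id-1": b"session-blob"}
proxy_a = SessionCache(external_get=shared_store.get)
proxy_b = SessionCache(external_get=shared_store.get, internal_enabled=False)
```

Both `proxy_a` and `proxy_b` can resolve `b"id-1"` even though neither holds it locally, which is the "shared between multiple Envoy proxies" case.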

I have an initial idea for the design and implementation. The API I have in mind looks roughly like this:
extensions.transport_sockets.tls.v3.DownstreamTlsContext
{
  "common_tls_context": "{...}",
  "require_client_certificate": "{...}",
  "session_ticket_keys": "{...}",
  "session_ticket_keys_sds_secret_config": "{...}",
  "disable_stateless_session_resumption": "...",
  "session_timeout": "{...}",
  "ocsp_staple_policy": "...",
  "external_session_cache": "" // introduce this new field extensions.transport_sockets.tls.v3.TlsExternalSessionCache
}

extensions.transport_sockets.tls.v3.TlsExternalSessionCache
{
  "session_storage_type": "redis", // redis is one of the conventional choices for external cache
  "session_storage_cluster": "redis_cluster"
}

Relevant links:

Session Caching

@LuyaoZhong LuyaoZhong added the triage Issue requires triage label Jan 3, 2021
@LuyaoZhong LuyaoZhong changed the title Enable TLS external session cache RFC: Enable TLS external session cache Jan 3, 2021
@mattklein123 mattklein123 added area/tls and removed triage Issue requires triage labels Jan 5, 2021
@mattklein123
Member

cc @ggreenway @PiotrSikora

I think it would be interesting to look at external session caches, but I would definitely want this to be pluggable from the start.

@ggreenway
Contributor

Makes sense to me. +1 on making it pluggable. One option is just a simple gRPC service, and adapters to a backing store (redis, etc) are the responsibility of the user.
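As a rough illustration of the "pluggable" idea, the Python sketch below separates a backend-agnostic interface from an adapter. Everything here is hypothetical: `SessionStore`, `InMemoryStore`, and the `put`/`get` signatures are invented for illustration and are not a proposed Envoy API. In the suggested design the interface would be a gRPC service and the adapter would talk to Redis or another store.

```python
import time
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional, Tuple


class SessionStore(ABC):
    """Hypothetical backend-agnostic interface that the proxy side would call."""

    @abstractmethod
    def put(self, session_id: bytes, session: bytes, ttl_seconds: float) -> None:
        ...

    @abstractmethod
    def get(self, session_id: bytes) -> Optional[bytes]:
        ...


class InMemoryStore(SessionStore):
    """Stand-in adapter; a Redis adapter would implement the same two methods."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # session_id -> (serialized session, absolute expiry time)
        self._entries: Dict[bytes, Tuple[bytes, float]] = {}

    def put(self, session_id: bytes, session: bytes, ttl_seconds: float) -> None:
        self._entries[session_id] = (session, self._clock() + ttl_seconds)

    def get(self, session_id: bytes) -> Optional[bytes]:
        entry = self._entries.get(session_id)
        if entry is None:
            return None
        session, expires_at = entry
        if self._clock() >= expires_at:
            # Expired sessions must not be resumed; drop them on access.
            del self._entries[session_id]
            return None
        return session
```

Because callers depend only on `SessionStore`, swapping the backing store does not touch the calling code, which is the property being asked for here.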

@LuyaoZhong
Contributor Author

LuyaoZhong commented Jan 6, 2021

@ggreenway @mattklein123 Thanks for your comments. Could you give me some guidance or references? I am new to the Envoy project, and it would be great if I could contribute this feature to the community.

Enabling an external session cache requires setting some callback functions on the SSL context, so I think the main code in the TLS transport socket extension has to be modified. So do you mean making only the external session storage a gRPC service that provides an API to connect to a concrete backing store?

@mattklein123 mattklein123 added the help wanted Needs help! label Feb 3, 2021
@mattklein123
Member

Sorry for the long delay in response.

> So do you mean making only the external session storage a gRPC service that provides an API to connect to a concrete backing store?

Yes, exactly. I think if we develop a gRPC API that can be optionally called from the TLS code, this will allow arbitrary backend implementations. We have many examples of APIs at this point that are "side-calls" of this nature. Take a look at:

  1. https://github.com/envoyproxy/envoy/blob/main/api/envoy/service/ratelimit/v3/rls.proto and how it's used in the various rate limit filters (both network and HTTP)
  2. https://github.com/envoyproxy/envoy/blob/main/api/envoy/service/metrics/v3/metrics_service.proto and how it's used in the metrics service stat sink
  3. https://github.com/envoyproxy/envoy/blob/main/api/envoy/service/ext_proc/v3alpha/external_processor.proto and how it's used in the new ext_proc HTTP filter which allows a gRPC API/backend to implement some of the HTTP filter semantics.

^ should hopefully be enough to get you started but feel free to ping back if you have any questions and we can help more. Also, for this feature I would recommend doing a small gdoc with a proposed design that we can discuss. Thank you!

@LuyaoZhong
Contributor Author

@mattklein123 Thank you for providing these references. I'll draft a gdoc when I get time. It might take a while since I'll be on vacation for one or two weeks; I'll post an update when I'm back. Thanks. :)

@LuyaoZhong
Contributor Author

Sorry for the long silence; some urgent personal work came up. @mattklein123 @ggreenway Do we have a template for design docs?

@mattklein123
Member

We don't have a fixed template right now. I would just type up something in whatever format you want and we can go from there. I would just cover standard design doc stuff (problem statement, goals, non-goals, high level design, etc.)

@LuyaoZhong
Contributor Author

@mattklein123 Hi, I have drafted a minimal design doc. Reviews are welcome.
Design Doc

@ggreenway
Contributor

I think it would be good to discuss if/how this would be applicable to TLS 1.3, and how this compares to session tickets (ie when would you use a session cache instead of session tickets).

@LuyaoZhong
Contributor Author

LuyaoZhong commented Mar 5, 2021

@ggreenway @mattklein123 I added a background section to the design doc to answer your questions.
Also, I would like to ask where I should start my PoC. I have figured out how to add a config API. For the gRPC service, however, I added my protobuf file, but no C++ headers were generated when I built Envoy. I am also not clear on how to write a gRPC client based on it. I have written a standalone gRPC client before, but I don't know how to make one work inside Envoy. Are there any guide docs? Thanks in advance.

If possible, I would like to add more details to the design doc, such as interface definitions; that might help with the coding. :)

@LuyaoZhong
Contributor Author

Hi @ggreenway @mattklein123, I'd like to start with a PoC and polish my design doc at the same time. I have almost finished the API part. Currently I am stuck on generating pb.h files from my proto file (I need to implement an Envoy built-in gRPC client based on the current design). Is there any tool in Envoy to help with that?

@ggreenway
Contributor

Proto compilation is done by the build system. For an example, look at the ratelimit filter (source/extensions/filters/common/ratelimit). In the BUILD file, it specifies a dependency on the protos in the deps via @envoy_api//envoy/service/ratelimit/v3:pkg_cc_proto. You can see the gRPC service defined in api/envoy/service/ratelimit/v3/rls.proto, and the BUILD file in that directory.

@LuyaoZhong
Contributor Author

> Proto compilation is done by the build system. For an example, look at the ratelimit filter (source/extensions/filters/common/ratelimit). In the BUILD file, it specifies a dependency on the protos in the deps via @envoy_api//envoy/service/ratelimit/v3:pkg_cc_proto. You can see the gRPC service defined in api/envoy/service/ratelimit/v3/rls.proto, and the BUILD file in that directory.

Thanks.

@LuyaoZhong
Contributor Author

@ggreenway I'm a little confused about the Envoy built-in gRPC clients, e.g. ratelimit and ext_proc. They all implement an async client, so the request and response are separated. For the TLS external session cache, though, I need the response immediately after sending the request, since I need that session to do the quick handshake. Am I supposed to implement a sync client? Do you have any suggestions?

@ggreenway
Contributor

For it to work properly, you must be able to treat it as async. If you try to make it sync, an envoy worker will be blocked waiting on the response, and performance will be unacceptable.
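The point about blocking can be sketched in Python, with asyncio standing in for a single Envoy worker's event loop. This is only an analogy under stated assumptions, not Envoy code: `external_lookup`, `handshake`, `serve`, and `LookupStats` are hypothetical names, and `asyncio.sleep` stands in for the round trip to the external cache.

```python
import asyncio
from typing import List


class LookupStats:
    """Tracks how many external lookups are outstanding at the same time."""

    def __init__(self) -> None:
        self.in_flight = 0
        self.max_in_flight = 0


async def external_lookup(session_id: str, stats: LookupStats) -> str:
    stats.in_flight += 1
    stats.max_in_flight = max(stats.max_in_flight, stats.in_flight)
    # Stands in for the network round trip; the event loop is free meanwhile.
    await asyncio.sleep(0.01)
    stats.in_flight -= 1
    return f"session-for-{session_id}"


async def handshake(session_id: str, stats: LookupStats) -> str:
    # This handshake is suspended here, but other handshakes keep running.
    return await external_lookup(session_id, stats)


async def serve(session_ids: List[str], stats: LookupStats) -> List[str]:
    # One "worker" drives all handshakes concurrently on a single thread.
    return await asyncio.gather(*(handshake(s, stats) for s in session_ids))


stats = LookupStats()
results = asyncio.run(serve(["a", "b", "c"], stats))
```

With a synchronous client, each lookup would hold the thread for the full round trip, so at most one lookup could be in flight per worker; here all three are outstanding at once, which is what `stats.max_in_flight` records.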

@l8huang
Contributor

l8huang commented Oct 3, 2023

@LuyaoZhong what's the current status of this RFC? Thanks

@soulxu
Member

soulxu commented Oct 9, 2023

@l8huang I think @LuyaoZhong has already moved on to other interests. If you are interested in this, feel free to take it.

@zhangbo1882
Contributor

I have picked up this task and submitted an initial PR: #35014
