-
Notifications
You must be signed in to change notification settings - Fork 144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Agent partially applies new fleet-server TLS configuration #5888
Comments
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
@pierrehilbert, @nimarezainia, @cmacknz we need to fully define and document what use-cases we'll cover with the proxy and TLS configuration. There are possible conflicting scenarios when the agent does not receive a proxy, certificate+key+passphrase or CAs. just like that: https://xkcd.com/1172/ |
@leehinman I guess you're the one who can help to define what are real use cases or not |
@AndersonQ the use cases that we need to support should be defined already. I had a doc sometime ago and permutations of how we connect via a proxy are documented, I can forward if needed. I would say that we need to fix the code to ensure those use cases are functioning properly. is this about the proxy having a different tls config? |
yes, about a new proxy removing any of the tls config. Either the clien certificates or the CA. Considering the control plane connection: agent -> fleet-server, TLS is configured through the CLI and a proxy might be added Depending on type of the proxy the agent might still have to perform a TLS handshake with fleet-server (pass-through proxy)
each of them can are evaluated in isolation and follow the precedence: which means, once set, any of them can only be changed, never erased This allows the agent to be individually configured, having different agents with However, right now, once a mTLS proxy is configured, it's not possible to remove The side effect is that if the client certificate (certificate, certificate key and key passphrase path) are a path and are deleted, the agent might not start up anymore as it'll look for the certificate (certificate, certificate key and key passphrase path). I think it's more about we well defining and document what can and cannot be done. The less intrusive is to keep it as is. However I believe we need to document clearly the implications of it and eventually, if needed, work to support removing proxy and TLS configurations thought the policy |
From a UI perspective, if you are adding a proxy and that proxy requires mTLS, then having the UI present all the TLS config together makes sense. It also is necessary to have a single atomic policy change for all of these parameters at once. The UI below looks right in the context of adding a new proxy. This is assuming that the certificate and certificate key boxes are the client mTLS certificate and key. If they are, we should make that explicit. I think generally if someone is using mTLS it is likely to be a universal requirement of their deployment and not a quirk of a specific proxy, so it is highly unlikely mTLS is going to be suddenly removed. That said, not being able to remove mTLS if it was enabled at install time is a problem that will limit our ability to debug issues. I suspect the core problem here is that clearing the proxy client cert and key has no effect if agent was configured locally with a client cert and key. It should be possible for Fleet to tell agent to ignore the install time cert and key. This is a more general problem for parameters we allow to be set at install time that is not specific to this particular issue. Unless I am missing something, we should open a separate issue to allow giving Fleet complete control over the agent configuration regardless of what got set on the command line at install or enroll time. |
It's already atomic in the sense the agent either applies all changes or none if the validation fails. However it'll apply the precedence "entity by entity".
or through the policy. Not being able to remove a previously set config the is the expected and current behaviour.
That is the dream. What I believe we need to decide is if, while we do not have this full control from fleet, the agent will treat proxy, CA, client cert as a single entity or keep the current behaviour. It's my understanding it makes more send to make them all a single entity, which, given the proxy address has changed, would allow to remove CAs and the client certificate/key. |
One thing is worth pointing out, the agent won't apply a configuration that causes loss of connectivity. Therefore, regardless the inconsistent state it might end up on, it'll still be able to reach fleet-server |
When I said "atomic" I meant from the UI perspective, the mTLS configuration and the proxy URL should come in the same policy change action or agent won't apply it. The Proxy configuration UI is currently correct in grouping them all together.
I don't fully understand this. The agent "entity" is an outgoing HTTP connection which may pass through a proxy and/or may use mTLS and/or may require additional certificate authorities to validate the certificate presented by the server it connects to. mTLS and the use of a proxy URL have no other relation, other than the fact that people may configure both a proxy and mTLS. The agent configuration needs no concept of a proxy entity right now. The UI side of this models proxies as a single entity in a single page because people editing a proxy will want to consider the TLS configuration of the proxy. "Proxy" as an entity is a UI only concept here, and facilitates sending agent all of the required changes to support a proxy in one policy change action. |
@AndersonQ does this mean we have to fix it asap in order to make it land in 8.16.1? Is this bad enough that we should block 8.16? Shall we already create a known issue? |
@jlind23 No, we're fine:
|
@cmacknz, by "single entity" I meant the agent would check if the proxy changed, if so, it'd apply the TLS configuration from the policy, even if it's empty. As you said, it's what one would expect to happen given the current UI |
Been thinking about this for a bit from a PKI standpoint. (Assumption) Because Fleet is a central management for configuration of agents, we would expect the configuration received from Fleet to completely override any local configuration. This would mean "erasing" an option if it wasn't present in the fleet supplied policy, but present in the local policy (not what happens today). Edge cases:
|
Well said @leehinman, we perhaps need an explicit way to annotate which parts of configuration can or must be per agent, because there is no way to centrally manage per agent values today. The mTLS private key and certificates are clearest example we've had so far. CC @nimarezainia The workflow we'd need to enable is rotating the mTLS private key of a single agent.
If I understand this right, this would be making an assumption in the agent that the UI will only ever update proxy and mTLS parameters together? I don't think agent should assume anything about how the UI will group changes together when there is not an explicitly defined API contract we can validate against (we don't have that here yet). |
That would be the end result. What we have today is trying to respect the separation Lee stated. "Proxy", "client cert and key" and "ca" are configured separately, but once set, they cannot be removed. |
@cmacknz is the next step here do build out the necessary UI enhancement for managing per-agent configuration (e.g. mTLS private key and certificate)? Or is there an immediate path forward where we can resolve this issue entirely on the Agent side without needing an UI work as a pre-requisite? |
@ycombinator I'm not Craig, but my take on it is the following: We cannot do it fully on the agent side, not the desired final state:
What we can do on the agent side it to couple any proxy change with the CA and client tls private keys and certificates. So in the edge case where a CA and client cert were set though the CLI (during enroll) and a new proxy is set with absent CA and client cert, the agent would also remove the CA and client cert. This is what can be done without UI changes. This would create a more consistent state, but goes against the behaviour that an nil or absent proxy/CA/cert would not remove an existing proxy/CA/cert. I believe tying proxy+CA/cert together is the more consistent approach. That means, if the proxy changes, the agent would apply what ever CA/cert is in the policy, even if it means removing what was set by the CLI |
The Elastic Agent can receive porxy and TLS configuration for fleet-server thorough its CLI, during install/enroll, and the policy. The configurations from the policy take precedence over the CLI, however and empty configuration from the policy does not changes the current configuration.
If the Elastic Agent is installed with a proxy using mTLS and the policy has another proxy configured with simple/one way TLS, the agent will apply the new proxy and CA, but keep the old certificate, certificate key and key passphrase/key passphrase path. Letting the agent on an inconsistent state. Such state might even lead to a failure to start if the old certificate-related configurations are files and the files are removed. If it happens, on start up, the agent will try to load those files and fail, preventing the agent from starting.
The culprit is
elastic-agent/internal/pkg/agent/application/actions/handlers/handler_action_policy_change.go
Lines 201 to 264 in 0580e53
which handles the proxy, TLS Certificate and the TLS CA configurations as different entities, when they should be treated as one.
In other words, if either change, the proxy or the TLS certificate or the CAs, they all should be replaced. If there is any change in policyConfig.Transport, apply the whole new Transport. Right now as proxy, TLS certificate and CA are handled separately, if the CLI defines a proxy with mTLS and the policy has another proxy with one way TLS, the result config will be a mix of both instead of only the new proxy with one way TLS. The expected result is to have the new proxy URL+headers and the CAs, but the certificate and certificate key must be cleared.
For confirmed bugs, please report:
inspect
commandThe text was updated successfully, but these errors were encountered: