Support for untrusted CAs in datanode remote reindex migration #19775

Merged
merged 16 commits into from
Jul 4, 2024

@todvora todvora commented Jun 28, 2024

Description

This PR adds support for untrusted certificate authorities during the remote reindex migration to the datanode. The connection check step now reports when none of our trust managers trusts the remote host. The user can then check the "trust unknown certificates" checkbox. The connection check then uses a trust manager that accepts all certificates and reports the unknown ones. These unknown certificates are transported, together with the allowlist value, to the datanode, which adds them to its truststore. The truststore is regenerated on each startup, so these certificates disappear with the next process restart and do not persist.
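As an illustration of the "accept all, report unknown" idea described above, here is a minimal sketch in plain JSSE. It is not the code from this PR: the class name `CollectingTrustManager` and its `unknownCertificates()` method are hypothetical, and the sketch assumes that "unknown" means "rejected by a delegate trust manager".

```java
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import javax.net.ssl.X509TrustManager;

/**
 * Hypothetical sketch: wraps a real trust manager, never rejects a server
 * chain, and remembers every chain the delegate would have rejected.
 */
public class CollectingTrustManager implements X509TrustManager {
    private final X509TrustManager delegate;
    private final List<X509Certificate> unknown = new CopyOnWriteArrayList<>();

    public CollectingTrustManager(X509TrustManager delegate) {
        this.delegate = delegate;
    }

    @Override
    public void checkServerTrusted(X509Certificate[] chain, String authType) {
        try {
            delegate.checkServerTrusted(chain, authType);
        } catch (CertificateException e) {
            // Do not fail the handshake; record the chain for later review.
            unknown.addAll(Arrays.asList(chain));
        }
    }

    @Override
    public void checkClientTrusted(X509Certificate[] chain, String authType)
            throws CertificateException {
        // Client certificates keep the delegate's normal behaviour.
        delegate.checkClientTrusted(chain, authType);
    }

    @Override
    public X509Certificate[] getAcceptedIssuers() {
        return delegate.getAcceptedIssuers();
    }

    /** Certificates seen during handshakes that the delegate did not trust. */
    public List<X509Certificate> unknownCertificates() {
        return List.copyOf(unknown);
    }
}
```

Such a trust manager could be passed to `SSLContext.init(null, new TrustManager[]{tm}, null)` for the connection check only. Because it accepts every server certificate, it should not be used for regular traffic.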

Motivation and Context

Fixes #19759

How Has This Been Tested?

Manually, and with added unit tests.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactoring (non-breaking change)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.

@todvora todvora marked this pull request as ready for review June 28, 2024 08:44

@gally47 gally47 left a comment

Frontend LGTM. Thanks for working on this.


@moesterheld moesterheld left a comment

Tested, works and lgtm.
However, I still see a small flaw that needs more information for the user, or she will trip over it (like I did).

If the CA used by OpenSearch (OS) is in the truststore used by Graylog, and you provision the data nodes and restart Graylog without disabling the truststore, the migration will consider the certificate trusted and will not add it to the data node's truststore.
For the time being, I would add a note to the restart screen of the migration telling the user to disable the truststore.

javaOpts.add("-Xmx%s".formatted(opensearchSecurityConfiguration.getOpensearchHeap()));

opensearchSecurityConfiguration.getTruststore().ifPresent(truststore -> {
javaOpts.add("-Djavax.net.ssl.trustStore=" + truststore.location().toAbsolutePath());

Without having tested it, I am not sure whether OS (OpenSearch) will be able to access, e.g., an S3 bucket for data tiering if we override the default trust store.


Good point, let me see if we can merge the default truststore with ours and provide a complete set of certificates to OpenSearch.
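One way such a merge could look with the standard `KeyStore` and `TrustManagerFactory` APIs is sketched below. This is only an assumption about the approach, not the implementation that was merged: `TruststoreMerger`, `mergeWithDefaults` and the alias prefixes are hypothetical names.

```java
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import java.util.Collection;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

/** Hypothetical helper that combines the JVM default CAs with extra certificates. */
public final class TruststoreMerger {
    private TruststoreMerger() {
    }

    /** Returns a new in-memory keystore with the default trust anchors plus the extras. */
    public static KeyStore mergeWithDefaults(Collection<X509Certificate> extraCertificates)
            throws Exception {
        KeyStore merged = KeyStore.getInstance(KeyStore.getDefaultType());
        merged.load(null, null); // initialise an empty keystore

        // A null keystore makes the factory load the JVM's default trust anchors
        // (or whatever javax.net.ssl.trustStore points to, if that property is set).
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init((KeyStore) null);

        int index = 0;
        for (TrustManager tm : tmf.getTrustManagers()) {
            if (tm instanceof X509TrustManager) {
                for (X509Certificate ca : ((X509TrustManager) tm).getAcceptedIssuers()) {
                    merged.setCertificateEntry("default-" + index++, ca);
                }
            }
        }
        for (X509Certificate extra : extraCertificates) {
            merged.setCertificateEntry("custom-" + index++, extra);
        }
        return merged;
    }
}
```

The resulting keystore could then be written to disk and passed to the OpenSearch process through `-Djavax.net.ssl.trustStore`, so that default CAs (for example those needed to reach S3) stay trusted alongside the custom ones.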

todvora commented Jul 3, 2024

Tested, works and lgtm. However, I still see a little flaw which needs more info for the user or she will trip over it (like i did)

If you have the CA used by OS in the truststore used by Graylog, provision the data nodes and restart Graylog without disabling the truststore, the migration will think it is legit and will not add it to the data node's truststore. For the time being, I would add a comment to the restart screen of the migration to disable the truststore.

To mitigate that, we could always propagate the certificate of the remote OpenSearch: automatically if Graylog trusts it, and with user confirmation if Graylog doesn't.

Another, more advanced approach would be to let the datanodes run the connection check and indices discovery, ideally each node once. This would provide an even safer verification that the datanode<->remote_opensearch communication works.

@moesterheld

I would implement the automatic propagation. Whilst the advanced approach sounds good, it raises more questions on error handling for individual failing nodes and so on.
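The automatic propagation rule discussed here (always send the remote certificate, asking the user only when Graylog does not already trust it) can be summarised in a small decision function. This is a hedged sketch of the rule as described in this thread, not code from the PR; `CertificatePropagation`, `Action` and `decide` are hypothetical names.

```java
/** Hypothetical sketch of the propagation rule discussed in this thread. */
public final class CertificatePropagation {

    /** What to do with the remote OpenSearch certificate. */
    public enum Action { PROPAGATE, ASK_FOR_CONFIRMATION }

    private CertificatePropagation() {
    }

    /**
     * @param trustedByGraylog whether Graylog's own trust managers accept the certificate
     * @param userConfirmed    whether the user ticked the "trust unknown certificates" checkbox
     */
    public static Action decide(boolean trustedByGraylog, boolean userConfirmed) {
        if (trustedByGraylog || userConfirmed) {
            // Trusted or explicitly confirmed: send the certificate to the datanode.
            return Action.PROPAGATE;
        }
        // Untrusted and not yet confirmed: the user has to opt in first.
        return Action.ASK_FOR_CONFIRMATION;
    }
}
```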


@moesterheld moesterheld left a comment

LGTM now. Thank you for adding this feature.
Tested with a new self-signed CA for the data node and with an existing, uploaded CA.

@moesterheld moesterheld merged commit b6cb2c1 into master Jul 4, 2024
6 checks passed
@moesterheld moesterheld deleted the feature/trust-unknown-ca-remote-reindex branch July 4, 2024 13:39
Development

Successfully merging this pull request may close these issues.

[bug] Data node does not start up successfully after importing custom CA.