[RFC] Replace Java Security Manager (JSM) #1687

Open
1 task done
reta opened this issue Dec 9, 2021 · 93 comments
Assignees
Labels
enhancement Enhancement or improvement to existing feature or request Roadmap:Security Project-wide roadmap label security Anything security related v2.19.0 Issues and PRs related to version 2.19.0 v3.0.0 Issues and PRs related to version 3.0.0

Comments

@reta
Collaborator

reta commented Dec 9, 2021

Is your feature request related to a problem? Please describe.
It was announced a while ago that SecurityManager is going to be phased out of the JDK. The first step, the deprecation of the SecurityManager (JEP-411), landed in JDK 17 and issues the following warning on OpenSearch builds or server startup:

WARNING: System::setSecurityManager will be removed in a future release

JDK 18 pushes it even further and now fails on startup (please see https://bugs.openjdk.java.net/browse/JDK-8270380). Running OpenSearch builds or the server on JDK 18 EA fails with:

Caused by: java.lang.UnsupportedOperationException: The Security Manager is deprecated and will be removed in a future release
	at java.base/java.lang.System.setSecurityManager(System.java:416)

It now requires a JVM command line option to enable it explicitly (please see [1]):

-Djava.security.manager=allow 
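
The version-dependent behaviour described above can be summarised as a small decision function. This is only a sketch: the class and method names are hypothetical, it assumes JDK 12 or later (where the `allow` / `disallow` tokens exist), and it only models whether a later `System.setSecurityManager()` call can succeed; it does not install anything.

```java
// Hypothetical helper, not part of OpenSearch: models when a deferred
// System.setSecurityManager() call can still succeed, based on the JDK
// feature release and the java.security.manager system property.
final class SecurityManagerSupport {
    private SecurityManagerSupport() {}

    /**
     * @param featureVersion JDK feature release, e.g. Runtime.version().feature()
     * @param property       value of -Djava.security.manager, or null if unset
     */
    static boolean canInstallAtRuntime(int featureVersion, String property) {
        if (featureVersion >= 24) {
            return false; // JDK 24 disables the Security Manager permanently
        }
        if (property == null) {
            // Unset behaves like "allow" up to JDK 17 and like "disallow" from JDK 18.
            return featureVersion < 18;
        }
        // "allow", "", "default" or a class name all permit installation.
        return !property.equals("disallow");
    }

    public static void main(String[] args) {
        int feature = Runtime.version().feature();
        String property = System.getProperty("java.security.manager");
        System.out.println("JDK " + feature + ", java.security.manager=" + property
            + ", can install at runtime: " + canInstallAtRuntime(feature, property));
    }
}
```

Running it with and without `-Djava.security.manager=allow` on the JDK in question shows which branch applies.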

Describe the solution you'd like
There is no alternative or replacement for the SecurityManager (to understand why, Project Loom is to "blame"), please see [2]. One option is to just drop it; it sounds risky, but combined with the Plugin Sandbox (please see [3], [4]) it may be a viable option. Other options include (but are not limited to): bytecode instrumentation, a Java agent, a custom classloader.

Describe alternatives you've considered
We could keep it as long as we can, but once removed from the JDK, it will be a problem.

Additional context
The upcoming JDK-24 release disables SecurityManager permanently [6].
Please see the links below.

[1] https://inside.java/2021/12/06/quality-heads-up/
[2] https://inside.java/2021/04/23/security-and-sandboxing-post-securitymanager/
[3] #1572
[4] #1422
[5] A possible JEP to replace SecurityManager after JEP 411
[6] openjdk/jdk#21498

@reta reta added enhancement Enhancement or improvement to existing feature or request untriaged labels Dec 9, 2021
@dblock
Member

dblock commented Dec 9, 2021

@nknize suggested we remove security manager in 2.0, labelling issue as such - once we have agreed here on what to do for this issue let's open a campaign parent issue in https://github.com/opensearch-project/opensearch-plugins/

@dblock dblock added v2.0.0 Version 2.0.0 and removed untriaged labels Dec 9, 2021
@reta
Collaborator Author

reta commented Dec 9, 2021

@dblock would you mind if I submit a small patch for 1.3.x+ so it could be run on JDK 18? Thank you

PS: To clarify why, JDK 18 is scheduled to be released in March, right around 1.4.x (planned) release, I suspect a number of people may give it a try. The change is only adding the command line property, non breaking.

@dblock
Member

dblock commented Dec 9, 2021

@dblock would you mind if I submit a small patch for 1.3.x+ so it could be run on JDK 18? Thank you

PS: To clarify why, JDK 18 is scheduled to be released in March, right around 1.4.x (planned) release, I suspect a number of people may give it a try. The change is only adding the command line property, non breaking.

I'm A-OK with anything non-breaking on 1.x.

@nknize
Collaborator

nknize commented Dec 9, 2021

You mean something like adding support to disable the security manager via -Djava.security.manager=disable? (EDIT: I should've read past the first line :) )

I suspect tests will blow up since the test infrastructure leverages a custom SecurityManager via SecureSM. That's going to be more impactful. I'd love some thoughts from @rmuir or @uschindler on this as they are much closer to the JDK security bits than I.

@rmuir
Contributor

rmuir commented Dec 10, 2021

I think the issue is written up correctly. You'll want to set -Djava.security.manager=allow from startup scripts (e.g. .bat/.sh), and from gradle when running tests? Otherwise System.setSecurityManager() will fail.

Lucene uses a custom security manager too, no issues on JDK18. we just initialize it differently than opensearch, right at JVM startup time: -Djava.security.manager=org.apache.lucene.util.TestSecurityManager.

But in your case here, it is a little different because system starts up with no security manager, then parses some config files and maybe does a few evil things on startup, then it installs security manager via System.setSecurityManager(). That's the difference, the deferred initialization. So now for JDK18 you have to set "allow" property for that call to not fail.

@rmuir
Contributor

rmuir commented Dec 10, 2021

Separately, as far as alternatives, I can suggest a few things:

  1. Keep the SystemCallFilter. This is unrelated to security manager and will stop RCE dead in its tracks, as it disables fork()/exec() etc completely in an irreversible way.
  2. Look into enhancing the systemd unit to compensate. You can do a lot here, such as allow/block lists of filesystem paths, and more. Recommended introduction. Especially file paths would be great, if you have a directory traversal vulnerability, it is way better to fail with a filesystem error than to transfer some private files. But in addition to file paths, you can also do fancy stuff such as system-call filtering (except for fork/exec which is why you still need to keep part 1), capability drops, etc.
  3. consider hardening Docker environment too. current entrypoint just runs the shell script, maybe it could instead use the systemd unit, to also benefit from work already done above.
  4. adjust existing classloader filtering: example. The filtering-classloader currently integrates with security manager, just as a convenient way to provide a list of allowable classes, but it doesn't have to work this way. It can be changed to get its list of allowed classes some other way, and then things like scripting languages at least keep that protection.
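
As a rough illustration of point 4, a filtering classloader can take its allow list directly instead of deriving it from security manager policy. This is a minimal sketch, not the OpenSearch implementation: the class name `AllowListClassLoader` and the way the list is supplied are assumptions made for the example.

```java
import java.util.Set;

// Hypothetical sketch: a classloader that only resolves class names found on an
// explicit allow list, with no dependency on SecurityManager or Policy.
final class AllowListClassLoader extends ClassLoader {
    private final Set<String> allowedClassNames;

    AllowListClassLoader(ClassLoader parent, Set<String> allowedClassNames) {
        super(parent);
        this.allowedClassNames = Set.copyOf(allowedClassNames);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        // Reject anything that is not explicitly allowed before delegating to the parent.
        if (!allowedClassNames.contains(name)) {
            throw new ClassNotFoundException(name + " is not on the allow list");
        }
        return super.loadClass(name, resolve);
    }

    public static void main(String[] args) throws Exception {
        AllowListClassLoader loader = new AllowListClassLoader(
            ClassLoader.getSystemClassLoader(), Set.of("java.lang.String"));
        System.out.println(loader.loadClass("java.lang.String"));
        try {
            loader.loadClass("java.lang.Runtime");
        } catch (ClassNotFoundException e) {
            System.out.println("blocked: " + e.getMessage());
        }
    }
}
```

The filter only applies to lookups that go through this loader, so it suits cases such as scripting languages where the script engine is handed the restricted loader.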

I don't recommend directly going the LSM route (AppArmor, SELinux, etc). There's a lot of complexity to those, and it's so system-specific which, if any, are even available. I'd start with systemd which is basically universal now on linux systems, and it gets you the biggest wins anyway (e.g. filtering filesystem and so on).

@rmuir
Contributor

rmuir commented Dec 10, 2021

Another win for stuff like ingest-attachment would be to just run the tika server (separate service/container) and have this plugin call out to it with a REST call. IMO it would be better security for using tika and they provide such a server these days. Then the tika could run in its own stricter separate sandbox.

but that strategy won't work for all the code: There's no one-size/fits-all solution. For example, things like analysis modules/plugins are extremely performance sensitive, and really need to just be passed to IndexWriter. At the same time, these plugins have less security risk (compared to e.g. Tika or scripting languages), so it's not a huge deal: they are just exposing lucene analyzers :)

@reta
Collaborator Author

reta commented Dec 10, 2021

Thank you very much, @rmuir

I think the issue is written up correctly. You'll want to set -Djava.security.manager=allow from startup scripts (e.g. .bat/.sh), and from gradle when running tests? Otherwise System.setSecurityManager() will fail.

That is right.

@rmuir
Contributor

rmuir commented Dec 14, 2021

I've also made my opinion loudly clear on twitter that removing SecurityManager without replacement is a bad idea for java right now. At least providing a "replacement" first (ideally enabled by default), to help protect server-side apps against the worst vulnerabilities, is really needed. Java is filled with security landmines.

Doubt anything will change on the java side, but I tried. I don't have the resources/energy to write up JEP proposals or anything to try to make real change here though, sorry.

@reta
Collaborator Author

reta commented Dec 14, 2021

Thanks @rmuir , I think the large part with respect to "what the replacement should be" is still unknown, as it is dictated by Project Loom that is not there yet. But I do 💯 agree on the point: removing SecurityManager without replacement is a bad idea.

@rmuir
Contributor

rmuir commented Dec 15, 2021

if you think of the entire internet (not just opensearch), i really do feel that something similar to the openbsd pledge() api would be at least a minimal replacement. process-wide: drop permissions to fork/exec (RCE), maybe drop network connect() permissions to hosts you don't need, maybe drop permissions to file paths you don't need. In many cases, perhaps the OS can enforce the functionality, in other cases, maybe java needs to do it.

but there's also the separate problem that java includes insecure functionality like JNDI ("landmines"), by default. Besides sandboxing, we need to get good secure defaults here and disable dangerous crap by default. it is a multi-pronged approach.

@anasalkouz anasalkouz added v3.0.0 Issues and PRs related to version 3.0.0 and removed v2.0.0 Version 2.0.0 labels Apr 12, 2022
@Pallavi-AWS
Member

Pallavi-AWS commented Apr 20, 2022

Do we have a decision on whether OpenSearch will deprecate SecurityManager in a future release or will command line option be used? If it will be deprecated, will there be a replacement? @dblock @nknize @rmuir. thanks,

@reta
Collaborator Author

reta commented Apr 20, 2022

@Pallavi-AWS the recent (one of many) discussions on OpenJDK mailing list hint there won't be replacements for SecurityManager (very likely, at least) as well as there won't be suitable mechanisms provided for implementing your own. For JDK-18, we explicitly allow SecurityManager but there is no official decision being made on deprecation since no replacement is available.

[1] https://mail.openjdk.java.net/pipermail/security-dev/2022-April/029643.html

@rmuir
Contributor

rmuir commented Apr 20, 2022

i recommend to keep using it until it completely stops working. why would you voluntarily disable a security feature unless you have to?

@nknize
Collaborator

nknize commented Apr 20, 2022

Do we have a decision on whether OpenSearch will deprecate SecurityManager

It's already deprecated in the jdk and can be found in the build logs: WARNING: System::setSecurityManager will be removed in a future release.

will there be a replacement?

This is still being worked and there are already some great suggestions on this issue. In the meantime, we planned to keep using it until it stops working and will converge on a plan before upgrading to a jdk that removes it completely.

peternied added a commit to peternied/security that referenced this issue May 6, 2022
Use of the SecurityManager and AccessController have been deprecated and
will be removed in java versions after 17.  While this is an issue it's
also one that will take a concerted effort to resolve.  These warning
messages make discovering build errors and other warnings more
difficult; hence adding this suppression logic.

For tracking the effort to replace these components look into opensearch-project/OpenSearch#1687

Signed-off-by: Peter Nied <[email protected]>
@dblock dblock changed the title [RFC] Consider alternatives to SecurityManager moving forward [RFC] Consider alternatives to SecurityManager (JSM) moving forward Nov 30, 2022
@dblock dblock changed the title [RFC] Consider alternatives to SecurityManager (JSM) moving forward [RFC] Remove Java Security Manager (JSM) Nov 30, 2022
@dblock dblock changed the title [RFC] Remove Java Security Manager (JSM) [RFC] Replace Java Security Manager (JSM) Nov 30, 2022
@kumargu
Contributor

kumargu commented Nov 19, 2024

@reta just an update from my conversation with GraalVM folks on their slack channel. Sandboxing in GraalVM is not yet supported for Java. It is on their roadmap, but we don't have any dates for when/if it will be delivered.

slack discussion (GraalVM public channel): https://graalvm.slack.com/archives/CPSD12R71/p1731769241953729

@pfirmstone

In case anyone is wondering:

SM API Compatibility across all Java Platforms:

We can no longer call System::getSecurityManager or System::setSecurityManager, many permission checks call System::getSecurityManager, but don't have to:

SecurityManager security = System.getSecurityManager();
if (security != null) {
    security.checkPermission(new RuntimePermission("closeClassLoader"));
}

Use checkGuard instead:

new RuntimePermission("closeClassLoader").checkGuard(null);
Alternatively save the new permission to a static field:

private static Guard CLOSE_CLASS_LOADER = new RuntimePermission("closeClassLoader");
Then call:

CLOSE_CLASS_LOADER.checkGuard(null);
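
Put together as a compilable unit, the pattern above might look like the following sketch. The class and method names are hypothetical; the only platform API used is `java.security.Guard`, which `Permission` implements.

```java
import java.security.Guard;

// Hypothetical wrapper showing the checkGuard pattern without any direct
// reference to System.getSecurityManager().
final class GuardCheckDemo {
    // Permission implements Guard, so the permission can be held as a Guard.
    private static final Guard CLOSE_CLASS_LOADER = new RuntimePermission("closeClassLoader");

    private GuardCheckDemo() {}

    static void checkCloseClassLoader() {
        // Can only throw SecurityException when a security manager is installed
        // and denies the permission; otherwise it returns normally.
        CLOSE_CLASS_LOADER.checkGuard(null);
    }

    public static void main(String[] args) {
        checkCloseClassLoader();
        System.out.println("closeClassLoader check passed");
    }
}
```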

Continue using AccessController::doPrivileged and Subject::doAs methods.

Use -Djava.security.manager=default to set a SecurityManager on supported platforms.

This will allow your software to support all Java platforms.

@reta
Collaborator Author

reta commented Nov 27, 2024

Thanks @pfirmstone, very correct (and the same applies to Policy), so we are exploring a number of options under [1]. I've just submitted the POC for Java agent instrumentation [2]. The custom class loader is next on the list to explore, thank you for the update.

[1] #16634
[2] #16731

@pfirmstone

One possible strategy might be to update OpenSearch to provide binary compatibility with Java platforms prior to and following 24, while security restoration options are explored, with the caveat that anyone running without SM do so at their own risk, this way, developers can commence testing on Java 24, with a view to support readiness at some later point once new security mechanisms are in place.

@sam-herman
Contributor

Hi folks, one follow-up I would like to have is whether there is any point in maintaining the effort with the Java SM. Perhaps it’s better to just remove it before upgrading to future Java versions.
As mentioned in the comments, while we were operating the service I haven’t seen any value from it: when most plugins are enabled, the policy is so permissive that it doesn’t really add any security.
OS ACLs and the network cage were the true underlying protections, and the SM was just a cumbersome thing that added complexity and overhead in code.
I would also like to bring to attention the note from the community on deprecating it:

The Security Manager cannot address 19 of the 25 most dangerous issues identified by industry leaders in 2020, so issues such as XML external entity reference (XXE) injection and improper input validation have required direct countermeasures in the Java class libraries.

They also make additional important points, but the one above is aligned with our experience as well. As we enabled plugins, our main concerns were really around the areas described above, which the SM didn’t help prevent in any way.
https://openjdk.org/jeps/411

@pfirmstone

I'm sure you mean well, and it's always good to explore options and try to consider other perspectives. In this case OpenSearch is attempting to address the dangerous issues SM did address, and has been investigating all options, including using agents as per OpenJDK advice. Even if there was only one dangerous issue, that would be justification enough. OpenJDK just doesn't want the expense of maintaining a feature that's not commonly used and is delegating that burden back onto developers. Criticisms in JEP411 apply to implementation code in OpenJDK, but many of those issues are easily addressed and had been addressed outside of OpenJDK for a long time. Perhaps if 20 years ago, we had good tooling to manage policy, things would be different today...

I've been working on making significant improvements to SM, to increase the number of vulnerabilities it can intercept:

  1. Principle of Least Privilege Policy Generation Tool to generate policy files -Djava.security.manager=polpAudit This addresses the issue of permissive policies and simplifies policy creation and maintenance.
  2. LoadClassPermission - allows policy to restrict loading untrusted code, eg only allow loading of signed jar files to prevent code injection attacks.
  3. SerialObjectPermission - whitelists serialized classes, allows policy to prohibit serialization (note this won't protect against flaws in java.base classes, if only privileged code is on the stack, however serial filters can do that). This highlights a design issue with Serialization, rather than SecurityManager though.
  4. Removed static permissions granted by ClassLoader's, eg URL's passed to URLClassLoader that allowed for injection attacks using URL Strings. Now controlled by policy, unauthorized URL's are intercepted.
  5. System::setSecurityManager throws exception when sm parameter argument is null. Historically, gadget attacks targeted privileged context to set SecurityManager null, disabling it. Now once an SM has been enabled, it cannot be set null, it cannot easily be replaced by injected code as that would make the context unprivileged.
  6. By removing AllPermission grants from <<JAVA_HOME>>/lib/security/java.policy the size of the trusted platform is reduced to java.base.
  7. High scaling policy and non-blocking security manager cache, to prevent repeated permission checks, to address performance issues, now there is no limit to the number of domains on the call stack and policy files can be thousands of lines long without issue.
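
Item 3 mentions serial filters as a complementary mechanism. For readers unfamiliar with them, here is a minimal sketch using the standard `ObjectInputFilter` API (JDK 9+). The class name and the particular allow-list pattern are assumptions chosen for the example, not something proposed in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

// Hypothetical sketch of an allow-list serial filter.
final class SerialFilterSketch {
    // Allow ArrayList and classes in java.lang; "!*" rejects every other class.
    static final ObjectInputFilter ALLOW_LIST =
        ObjectInputFilter.Config.createFilter("java.util.ArrayList;java.lang.*;!*");

    private SerialFilterSketch() {}

    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            // The filter is consulted for each class before it is instantiated.
            in.setObjectInputFilter(ALLOW_LIST);
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(deserialize(serialize(new ArrayList<>(List.of("a", "b")))));
        try {
            deserialize(serialize(new HashMap<String, String>()));
        } catch (InvalidClassException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```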

Discussion on OpenJDK lists revealed that Oracle company policy didn't allow public collaboration on security issues, however OpenJDK had no objection to the community maintaining it, which is what I'm doing, although obtaining a TCK license doesn't appear likely, unless one of the existing licensees are willing to assist.

@pfirmstone

Actually, what's interesting, since OpenJDK removed Authorization, it's up 6 points from 24 to 18, improper privilege management is up 7 from 22 to 15 and code injection is up 12 points from 23 to 11, exposure of sensitive information to an unauthorized actor is up 13, from 30 to 17.

https://cwe.mitre.org/top25/archive/2024/2024_cwe_top25.html

@rmuir
Contributor

rmuir commented Dec 13, 2024

As mentioned in the comments, as we were operating the service I haven’t seen any value from it

Really funny, since base test class here OpenSearchTestCase subclasses LuceneTestCase and uses randomized-runner and test setup is similar. When you have thousands of tests you need such isolation just to maintain test suite.

SecurityManager stops the problems before you see them at "operating the service" and allows your tests to safely run in parallel without stomping on each other's files, binding to each other's ports, etc. Fails on such problems before they get merged. Fails on shenanigans from third-party libraries at test-time before they get merged and cause chaos in CI or maybe elsewhere.

I remember how much "fun" CI builds were before this was there: tests doing exactly these things and meddling with each other. You can't even fix the tests as fast as developers add new ones doing new crazy things. And developers might use Windows or MacOS, not some AWS environment with no multicast, etc. It's important to fail on them early in development lifecycle (e.g. on their machine), and to be able to reproduce failures from CI.

If you are a security guy looking at this like "oh I've never seen this thing stop me from getting owned", you are looking at the problem wrong.

Sure, security manager sucks, security guys don't understand it, developers don't understand it, it's this complex beast in no-man's land. But the guarantees that it gives in the test process alone are not easily replaced.

@rmuir
Contributor

rmuir commented Dec 13, 2024

Link to very simple policy used by lucene to keep 16000+ tests in order: https://github.com/apache/lucene/blob/main/gradle/testing/randomization/policies/tests.policy

Similar stuff is happening here in opensearch; the setup is just more complex, so going through the much simpler lucene tests policy file is an easy way to reason about sandboxing the test suite and preventing trouble from entering the codebase in the first place.

You can consider using systemd sandboxing for test VM execution, it may help contain the filesystem at least. might not be so terrible now that IDEs have widespread devcontainer support. But you have to implement such devcontainer setup and force everyone to use it and adjust gradle test execution to run each jvm with separate namespaces and so on.

Maybe even with some fancy seccomp setup you can prevent the tests from binding to anything except localhost ephemeral ports, too.

But fancy devcontainer setup still won't solve problems such as preventing tests from messing with things like environment variables and system properties, these have side effects for other tests, for stuff like that, security manager is good. If you don't stop it, developers will do it.

@kumargu
Contributor

kumargu commented Dec 13, 2024

I think one follow up I would like to have is whether there is any point in maintaining the effort with the Java SM

Security manager is not flawed for what it does, it's flawed for how it does its job. If you look at GraalVM (Oracle) polyglot sandboxing policies, the concepts are similar to what security manager does, but it does it in a more modern and cleaner way.

The Security Manager cannot address 19 of the 25 most dangerous issues identified by industry leaders in 2020,

I can't remember any recent attack as nasty as the Log4j remote code execution vulnerability. OpenSearch was protected from that attack, thanks to security manager.

Security is built in layers; no single protection mechanism can fully protect against all sorts of attack vectors. A simple example: is IAM sufficient for security in the cloud? The answer is a big NO. We (OpenSearch) may not be able to find a full alternative, nor do we really want to find a full replacement (to keep some things simple); but we are clear that we need to strengthen OpenSearch in the absence of SM.

@pfirmstone

We use SM for Authorization, but we don't just use it for code as JEP411 authors assume, we use it to grant permission to users using specific code, often the code or user alone doesn't have the permission, so the user can't use the permission with foreign code and the code doesn't have permission with a different user. Parsed data comes from users, who should be authenticated. OpenJDK itself bases permissions around code and often uses AllPermission, end points often don't run with the authenticated user's Subject, eg RMI doesn't. It is unfortunate that SM has roots that go deep into the JDK, support for permission is also implemented in C++ code, not just Java code.
openjdk/jdk@59d4e28

Currently OpenJDK doesn't prevent loading of untrusted code and has no mechanism to do so, anyone who can find a way to inject a URL into a string that's passed to URLClassLoader will be capable of injecting code. This will be blamed on the code that didn't parse input properly, also there's a lot of library code and no one audits everything. OpenJDK developers are assuming that server code is static, audited and external data input is properly checked during parsing. This assumption eliminates the possibility of using dynamic class loading safely.

https://www.exploit-db.com/papers/45517

One lesson from history is, attackers use privileged context to set SecurityManager null to disable it, this was the last step in many gadget chain attacks. This could have been easily addressed simply by throwing an IllegalArgumentException in System::setSecurityManager if sm is null. Injection attacks always focused on obtaining privileged context, so we limit privileged context, but now OpenJDK has made everything privileged context, it's going to be much harder to defend against gadget attacks.

Historically, attacks on Java's sandbox have done a lot of good in hardening Java, it's an arms race, now OpenJDK has given up that arms race, they've lost the client market, thanks to flaws in Java Serialization's design, this occurred during Sun Microsystems final days, prior to Oracle when funding was limited. I reimplemented Java Serialization over a decade ago, when I needed to secure it. I had to give up circular object graphs, used a standard constructor signature and isolated parameters from each class within their inheritance hierarchy, it reads ahead to ensure parameter types are correct before instantiation and has limit checks to defend against billion laugh style attacks. When I presented it to OpenJDK and offered to donate it...

https://github.com/pfirmstone/JGDMS/wiki#atomic-serialization-example

I think too much emphasis was placed on backward compatibility over security and too little too late was done to fix Java Serialization, it's the gift that keeps giving.

Jdk-with-authorization is more than just preserving SecurityManager, it's about improving security, making it simpler to reason about and taking advantage of the historical security hardening developed over decades, while taking advantage of modern features in recent Java releases.

Recently I refactored Permission for immutability and PermissionCollection classes to use generics. I addressed race conditions in Permission implementations as their specification requires them to be immutable and threadsafe, but many weren't. Permission and PermissionCollection are no longer Serializable, changes in implementation and support for old serial form meant the implementations couldn't be immutable and support Serialization. OpenJDK chose to sacrifice the safety and security provided by immutability and thread safety, to preserve backward compatibility with Serialization. I suspect this is why the default SecurityManager and Policy provider didn't perform, had OpenJDK developers made them non-blocking and performant, they would have had to deal with the race conditions. In JGDMS we called methods that initialized fields in Permission instances before publishing them to other threads.

@kumargu
Contributor

kumargu commented Dec 28, 2024

Summarizing our next steps and plan of action for 3.0 release.

Goal

We try to answer the following meta questions:

  1. Do we need a replacement for both open-source distribution?
  2. What are the known alternatives of security manager?

Ideally, we want the latest and greatest version of Java to be used in OpenSearch. We would like to use JDK-24 for the 3.0 release of OpenSearch, expected to land in April 2025. Based on the known alternatives and their protection domains, we will decide which options are sufficient to put us in a confident state to live without security manager.

Do we need a replacement?

The open-source distribution heavily depends on security manager acting as a first line of security defense. Hence we must find a replacement for security manager. Again, we will not look for a full replacement. Until we are convinced by the newly available security posture, we cannot upgrade OpenSearch core and plugins to JDK-24. Obviously we don’t want to remain pinned to an older JDK version while a new (better) version is available.

Requirements

Before diving into alternatives to the Security manager, let’s first examine the types of protections it currently provides in OpenSearch. These will serve as the baseline requirements for identifying suitable alternatives.
We will categorize these requirements by priority:

  • Priority A: These are critical requirements that must be addressed by one or more alternative solutions.
  • Lower Priorities: These are less critical and considered "nice-to-have".

Priority A

  1. Controlled read, write, and execute permissions for specific files and directories. Restricted hard or symbolic link creation.
  2. Controlled access to specific IPs, ports, or protocols.
  3. Disallow system calls (some examples):
    1. subprocess creation by plugins,
    2. reboot
    3. system exit
  4. Disallow native access
  5. Controlled access to system properties and environment variables.
  6. Controlled Class Loading and Reflection:
    1. prevented unauthorized access to private fields or methods
    2. controlled dynamic class loading. Use pre-approved class loaders for dynamic class loading.
    3. restricted use of reflection
    4. Disallows implementation of arbitrary host interface
  7. Controlled ability to load and access key stores containing private keys and certificates.
  8. Restricted operations for creating or signing crypto keys.
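
To make requirement 1 concrete, the sketch below shows the kind of in-process path check that an agent- or wrapper-based replacement would have to perform. It is purely illustrative: `PathAllowList`, its methods, and the directory names are hypothetical and not taken from any OpenSearch POC.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: permit file access only under a fixed set of root directories.
final class PathAllowList {
    private final List<Path> roots;

    PathAllowList(List<Path> roots) {
        // Normalize once so that later comparisons are component-wise and stable.
        this.roots = roots.stream()
            .map(root -> root.toAbsolutePath().normalize())
            .collect(Collectors.toList());
    }

    boolean permits(Path candidate) {
        // normalize() collapses "." and ".." segments, which defeats simple
        // directory traversal. It does NOT resolve symbolic links; a real check
        // would also need Path.toRealPath() or OS-level enforcement.
        Path normalized = candidate.toAbsolutePath().normalize();
        return roots.stream().anyMatch(normalized::startsWith);
    }

    public static void main(String[] args) {
        PathAllowList allowList = new PathAllowList(List.of(Path.of("/var/lib/opensearch/data")));
        System.out.println(allowList.permits(Path.of("/var/lib/opensearch/data/nodes/0")));
        System.out.println(allowList.permits(Path.of("/var/lib/opensearch/data/../../../../etc/passwd")));
    }
}
```

Because `Path.startsWith` compares whole path components, a sibling such as `/var/lib/opensearch/data-evil` is not treated as being inside `/var/lib/opensearch/data`.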

Priority B

(not a blocker for 3.0)

  1. Reduce or completely avoid shared memory between plugins and core.
  2. Restrict the maximum number of stack frames that plugins can push on the stack, to protect against unbounded recursion.
  3. Limit the size of the output that plugin code writes to standard output / error.
  4. Prevent plugins from monitoring SSL sessions established with SSL peers.
  5. Prevent plugins from invalidating sessions, which may slow down performance.

Alternatives

1 Systemd sandboxing

[GH issue: https://github.com//issues/16729]

Systemd provides security features that can be used to isolate processes from each other as well as from the underlying operating system. In other words, it allows you to set up privilege separation between the different components of the OS.

Today, there already exists a systemd setup which you can optionally use to start your OpenSearch process. Going forward, we will recommend starting your OpenSearch process with systemd as the preferred and most secure way. Most importantly, it requires no extra infrastructure to set up on Linux systems, which makes distribution and usage straightforward.

While there are a whole lot of configs out there for building a highly secure sandboxed environment, we will discuss the ones that address our requirements, along with their usage (for clarity). In fact, some of the available configs could bring more protection than security manager.

  1. File System Restrictions:
    1. ReadOnlyDirectories=, InaccessiblePaths=, ReadWritePaths=: these grant a service read-write access to specific paths while making the rest of the file system read-only or inaccessible.
  2. Network Access Control: Restrict and control network communication by allowlisting / deny-listing the IP addresses with which the process can communicate. Similarly, restrict the sockets to which a process can bind, e.g. OpenSearch core can be allowed to bind to (9200, 9200..)
  3. System Call Filtering: Systemd uses seccomp filters to allow or block specific system calls, heavily reducing the exposed kernel attack surface.
  4. Capability Bounding: Limits, in a relatively fine-grained fashion, which kernel capabilities a service retains once started.
    1. e.g. CapabilityBoundingSet=CAP_CHOWN CAP_KILL (the list is space-separated). This would allow the service to use only the "CAP_CHOWN" (change ownership) and "CAP_KILL" (terminate processes) capabilities.
  5. Controlled ability to load and access key stores containing private keys and certificates

Overall this option does a great job of securing the OpenSearch process against common side effects of vulnerabilities and untrusted code disrupting the OpenSearch process.

Limitations —

  1. In the current model, both OpenSearch core and its plugins execute under a single process. This design raises a concern: if a plugin, such as the security plugin, requires elevated privileges (for example, to access a trust store), those elevated permissions must be configured at the global systemd service level and are therefore applied uniformly across all plugins. A more secure and ideal approach would involve implementing fine-grained, plugin-specific systemd configurations to enforce the principle of least privilege which we c
  2. Controlled class loading and access via reflection: today, classloader filtering integrates with the security manager merely as a convenient way to provide a list of allowable classes, but it doesn't have to work this way. It can be changed to get its list of allowed classes through a new custom classloader implementation. Moving this logic out of the security manager is a good abstraction in any case.
  3. Disallow Native access
  4. No direct replacement in Windows.
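
For limitation 2, a custom classloader with an explicit allowlist could look roughly like the sketch below. This is an assumption about the shape of such a loader, not existing OpenSearch code; `AllowlistClassLoader` and its prefix-based matching are hypothetical.

```java
import java.util.Set;

/** Hypothetical loader that resolves only classes whose names match an allowlist. */
final class AllowlistClassLoader extends ClassLoader {
    private final Set<String> allowedPrefixes;

    AllowlistClassLoader(ClassLoader parent, Set<String> allowedPrefixes) {
        super(parent);
        this.allowedPrefixes = Set.copyOf(allowedPrefixes);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        // Reject the lookup before delegating, so the parent is never consulted
        // for names that are not on the allowlist.
        boolean allowed = allowedPrefixes.stream().anyMatch(name::startsWith);
        if (!allowed) {
            throw new ClassNotFoundException("Class not on allowlist: " + name);
        }
        return super.loadClass(name, resolve);
    }

    public static void main(String[] args) throws Exception {
        AllowlistClassLoader loader = new AllowlistClassLoader(
            ClassLoader.getSystemClassLoader(), Set.of("java.lang.", "java.util."));
        System.out.println(loader.loadClass("java.util.ArrayList").getName());
        try {
            loader.loadClass("java.io.File");
        } catch (ClassNotFoundException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that this filters only lookups that go through this loader. It does not by itself restrict reflection on classes that code has already obtained, so it covers the class lookup half of the limitation only.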

2 GraalVM sandboxing

[GH issue :https://github.com//issues/16861]

Oracle GraalVM is a high-performance JDK that enhances Java and JVM-based applications through its Ahead-Of-Time (AOT) compiler.

Beyond performance improvements, GraalVM also offers a sandboxing mechanism, which is particularly relevant for securely executing guest code within a host application.
The sandboxing feature establishes an isolation boundary between host and guest code, comparable to the separation between user mode and kernel mode in operating systems. In this context:

  • Host: OpenSearch Core.
  • Guest: OpenSearch Plugins.

This isolation ensures that guest code executes in a restricted and controlled environment, separate from the host's privileges. However, as of now, GraalVM supports JavaScript as a guest language; full support for Java as a guest language is a work in progress (refer to [GR-49729] [Espresso] Support running without native access).

While full guest Java support is still under development, GraalVM’s existing features (Espresso) can be used to:

  1. Isolate and Execute Legacy Code: GraalVM allows running an older JVM version (guest JVM) in a sandboxed environment while the host JVM operates with a newer version.
  2. No Compilation Target Changes: Both the OpenSearch Core and plugins can continue to run as JIT-compiled code without modifications to their compilation targets.

The overall idea is to spawn a guest GraalVM JVM with the security manager enabled, with guest and host sharing objects via the low-level GraalVM interoperability API. Next, let’s see some high-level steps to achieve this. You can also refer to the PoC for a better understanding: #16863

Proposal

Host Environment:

  • OpenSearch Core and trusted components (e.g., Lucene and trusted plugins) run on a modern JVM version supported by GraalVM (e.g., JDK 24).
  • The Security Manager is disabled in this environment, as it is deprecated in newer Java versions.

Guest Environment:

  • Non-trusted components (e.g., plugins) run on an older JDK version (up to JDK 23) where the Security Manager is still available.
  • The Security Manager is configured with the same security policies currently used by OpenSearch Core.

GraalVM Engine:

  • A GraalVM Engine is initialized to provide runtime support for interaction between host and guest environments. This engine facilitates secure communication using GraalVM’s low-level APIs.
  • The guest environment runs with the Security Manager enabled, ensuring that untrusted code is executed in a controlled context with appropriate restrictions.

Limitations

  1. No support for GraalVM/Espresso on Windows.
  2. Slow boot-up time of the spawned JVM (but that is ideally a one-time cost)
  3. Debugging is hard; most errors surface as very low-level GraalVM implementation details.
  4. Communication between the host and the spawned JVM/context is currently very limited. At least one major bug fix is known, but there has been no full confirmation that it will be picked up in the upcoming Jan release. This is a major blocker for running further experiments. We have, however, requested the GraalVM team to try to get the bug fix into the Jan release.

Performance: While we don’t expect any performance impact, it is yet to be benchmarked and published.

Take-aways: Overall, this approach allows us to move forward with Java versions (JDK-24 and beyond) while preserving usage of the security manager as it is used today.

  1. While this looks hacky, this work will lay the groundwork for fully utilising the sandboxing capabilities of GraalVM in the future, allowing us to build isolation boundaries between trusted and untrusted code with fine-grained sandboxing policies.
  2. This would also allow plugin authors to write plugins in different languages, such as JS, Rust etc.

3 Plugin level systemd

[GH issue: https://github.com//issues/16753]

Earlier we proposed to strengthen the OpenSearch core security model via additional systemd configs, such as limiting access to sockets and files. An extension of such sandboxing would be to run (some) plugins as separate systemd units (i.e. separate processes), each with its own restrictive systemd config. This is akin to the security manager having plugin-level security policies. It will also allow some plugins to run with elevated privileges without elevating the privileges of core. The overall idea would be to expose a secure REST server within OpenSearch core, where plugin ↔ core interactions happen over secure, fast, bidirectional IPC. Such IPC could run over Unix domain sockets, which are fast and lightweight and can use POSIX permissions to lock down access to the file descriptor (FD).

This idea overlaps with the work proposed as part of Project Extensions, which is currently on hold because of
a. ambiguity around the added performance impact from ser/de when running plugins outside of core and
b. the large amount of work involved, requiring a rewrite of plugins which are tightly coupled with core.
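
As a rough illustration of the Unix domain socket idea, the sketch below runs a single request/reply exchange between a "core" side and a "plugin" side over a socket file that lives in a directory only the owning user can enter. It assumes JDK 16+ (for `UnixDomainSocketAddress`) and a POSIX file system; the class name, the `ack:` reply format and the temp-directory layout are hypothetical, and both sides run in one JVM here only to keep the example self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

final class UdsIpcSketch {

    /** Creates a directory accessible only to the owner and returns a socket path inside it. */
    static Path privateSocketPath() throws IOException {
        Path dir = Files.createTempDirectory("os-ipc",
            PosixFilePermissions.asFileAttribute(PosixFilePermissions.fromString("rwx------")));
        return dir.resolve("core.sock");
    }

    /** Reads one small message; a real protocol would need explicit framing. */
    static String readMessage(SocketChannel channel) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(256);
        channel.read(buf);
        buf.flip();
        return StandardCharsets.UTF_8.decode(buf).toString();
    }

    /** Starts a one-shot "core" listener, sends one request as the "plugin", returns the reply. */
    static String roundTrip(Path socketFile, String request) throws IOException, InterruptedException {
        try (ServerSocketChannel server = ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
            server.bind(UnixDomainSocketAddress.of(socketFile));
            Thread core = new Thread(() -> {
                try (SocketChannel peer = server.accept()) {
                    peer.write(StandardCharsets.UTF_8.encode("ack:" + readMessage(peer)));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            core.start();
            try (SocketChannel plugin = SocketChannel.open(UnixDomainSocketAddress.of(socketFile))) {
                plugin.write(StandardCharsets.UTF_8.encode(request));
                String reply = readMessage(plugin);
                core.join();
                return reply;
            }
        } finally {
            // Closing the channel does not remove the socket file.
            Files.deleteIfExists(socketFile);
        }
    }

    public static void main(String[] args) throws Exception {
        Path socketFile = privateSocketPath();
        System.out.println(roundTrip(socketFile, "ping"));
        Files.deleteIfExists(socketFile.getParent());
    }
}
```

Placing the socket in an owner-only directory avoids the window between `bind` and a later permission change on the socket file itself. In the proposed setup the two ends would be separate systemd units, so the directory ownership and the unit configs together would decide which plugin may connect.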

4 JDK fork (not preferred)

The idea is to maintain a fork of JDK preserving the security manager in JDK-24 and beyond.

However, this approach is not ideal, as it would introduce significant overhead in maintaining the fork, particularly in porting bug fixes and updates from the upstream JDK. This solution should only be considered as a last resort if none of the previously discussed alternatives prove to be viable.

Conclusion

Assuming 3.0 lands in April 2025 with JDK-24, we are left with around three months to pick the alternatives that make us comfortable living without the security manager. While this doc discussed multiple overlapping alternatives, not all of them necessarily need to be implemented for the 3.0 release.

[1] Systemd sandboxing alone is very powerful and covers a lot of what the security manager does today. It will protect OpenSearch from most security risks. This will become our first line of defence. I would say we are 90% covered with just [1]. It's low-hanging fruit, and even if we are not able to ship 3.0 with JDK-24, we would still like to ship [1].

When it comes to [2] GraalVM sandboxing, it essentially means continuing usage of security manager even with JDK-24. The hardest part of the integration with GraalVM was already done by Andriy in his POC (#16861) and we would now assume that the integration could be delivered by March 2025.

Callouts for [2]:

[1] Plugins which are run in the sandbox JVM can only be upgraded to JDK-23. Once full sandboxing is available in Graal (oracle/graal#10239), these plugins can be upgraded to JDK-24 or beyond.

[2] Not all plugins actually need to be instantiated within the Graal-based forked sandbox; plugins which are tightly coupled with OpenSearch Core, trusted, or performance-critical can continue to run in the host JVM without sandboxing on >=JDK-24.

We believe that [1] and [2] provide enough confidence to proceed with upgrading to JDK 24, with delivery expected by mid-March 2025. Once [2] evolves into a fully developed sandboxing environment (anticipated in Q2 2025), we plan to treat [1] and [2] collectively as a replacement for the Security Manager.

We are temporarily setting aside [3], as it represents a significant amount of work, and meeting the April deadline seems unlikely. If GraalVM sandboxing integration proves problematic (e.g., harder debugging, unexpected bugs, perf issues etc.) within our ecosystem, we will revisit [3]. However, the GraalVM community is very supportive and it has been smooth working with them. On the other hand, if GraalVM integration aligns well with our needs, we may reconsider using Extensions on GraalVM. This presents a major potential advantage, making the risk of GraalVM integration worthwhile.

@reta
Collaborator Author

reta commented Dec 29, 2024

Thanks @kumargu , I think the Java agent is also on the table, right? [1] Or it was excluded on purpose?

[1] #16731

@pfirmstone

pfirmstone commented Dec 30, 2024

A few thoughts / questions:

Is there a way to avoid needing SecurityManager in the Graal guest environment?
If the guest environment is process isolated and that process can be restricted by systemd, then each plugin can be isolated within its own process. The problem then becomes one of establishing communications between the Host and Guest processes. I'm concerned that Serialization might be a requirement of communications between processes, or is this concern unfounded?

In JGDMS there's a declared @AtomicSerial API for serialization / deserialization, for use with any protocol, I was working on support for ASN.1, but halted work after JEP411, until a solution was found for SM. This API is hardened against gadget attacks by failure atomicity and provides utility methods for input validation.

JGDMS also has JERI (Jini Extensible Remote Invocation), which was designed by the people who designed RMI to address the pitfalls with RMI.

If someone wanted, these features could be copied from JGDMS (AL2.0 license), and stripped down to their bare minimum, to use for communications between Host and Guests. I can provide guidance on how it works.

As an aside, the fork of OpenJDK I'm currently maintaining with SM, contains significant performance enhancements and security improvements, if people would like to test and provide performance comparisons and feedback, that would be greatly appreciated. The maintenance cost has been less than expected and I've been able to make significant SM improvements in a short space of time. Whether I continue to maintain a fork is dependent on community interest and viability of other possible solutions.

Recent build artifacts based on fork of OpenJDK 25, master branch:

Linux x64: https://github.com/pfirmstone/jdk-with-authorization/actions/runs/12497991476/artifacts/2362229379
MacOS x64: https://github.com/pfirmstone/jdk-with-authorization/actions/runs/12497991476/artifacts/2362228554
Windows x64: https://github.com/pfirmstone/jdk-with-authorization/actions/runs/12497991476/artifacts/2362245599

There's also an OpenJDK 24 fork branch here:
https://github.com/pfirmstone/jdk-with-authorization/tree/jdk24-with-authorization-trunk

The use of a hybrid Graal / systemd solution is compelling. If the guest is to use encryption over network connections, I think that might need to be performed by the host on behalf of the guest, as it's not safe for the guest to have access to encryption keys, etc. On second thought, maybe independent truststores / keystores could be provided for each guest?

@pfirmstone

pfirmstone commented Dec 30, 2024

Just documenting my forking strategy here in case it has been misunderstood:

  1. Weekly merge of OpenJDK master, into jdk-with-authorization master that contains reversions of SM removal, tests are run manually following merging. Only minor changes are made to master copy, to address any merge conflicts or test failures. This is not intended for release. Weekends are quiet, not many commits occur over the weekend, I've found this is a good time to merge.
  2. Weekly merge of master copy into trunk.
  3. Trunk is the development branch.

There were a large number of merge conflicts during JEP 486, not unexpected.
Now that JEP 486 has completed, merge conflicts have been rare.
All merge conflicts are dealt with in the merge between OpenJDK master and master copy.
There are no merge conflicts from master copy into trunk.
Interestingly the OpenJDK team were maintaining permission checks right up until JEP 486.
Additional Permissions have been added to trunk.

Release branches follow the same strategy, so that all upstream fixes and patches are included with weekly merges.

Permission checks were like shotgun surgery, as they were spread throughout OpenJDK, it was a big job to remove them.

We have a discord channel if anyone wants to become involved, let me know.

The largest maintenance task isn't merging from upstream; it's looking at new JEP features and determining how they need to be protected by new permission checks.

Some recent fixes:
pfirmstone/jdk-with-authorization#44
pfirmstone/jdk-with-authorization#40
pfirmstone/jdk-with-authorization#41
pfirmstone/jdk-with-authorization#28
pfirmstone/jdk-with-authorization#22
pfirmstone/jdk-with-authorization#32
pfirmstone/jdk-with-authorization#5

@kumargu
Contributor

kumargu commented Dec 30, 2024

Thanks @kumargu , I think the Java agent is also on the table, right? [1] Or it was excluded on purpose?

[1] #16731

I wanted to sync with you on the outcome of the PoC before including it here. I was not clear if the PoC was finally working end-to-end. Secondly, I wanted an opinion if we'd need it if we had the Graal integration.

@kumargu
Contributor

kumargu commented Dec 30, 2024

@pfirmstone (going to answer some of your comments and will come back to others later)

Is there a way to avoid needing SecurityManager in the Graal guest environment?

this is a temporary hack. It won't be needed once oracle/graal#10239 is addressed.

I'm concerned that Serialization might be a requirement of communications between processes, or is this concern unfounded?

that's the biggest concern for in-proc communication between plugins and core (discussed as con in Option 3).

Just documenting my forking strategy here in case it has been misunderstood:

I don't think we/I misunderstood the intentions here. We understand the dedication and amount of work you have put in to get this working. The challenge with a fork is not only maintainability. A. This is not a long-term solution; if we have a long-term solution (GraalVM), we would like to pursue it. B. Cloud providers (such as AWS) or other organizations consuming a fork have to be convinced to use a forked JDK, given that OpenJDK states that the security manager is not the right tooling for securing Java applications (although we know how useful the security manager is).

In general, we want to move away from what is deprecated and use more modern tools (if available). If an alternative is not available, we will stick with it. GraalVM usage with the security manager is a small step to help us migrate to JDK-24. When Java sandboxing is available in GraalVM, we will remove usage of the security manager. That's the long-term goal. That step is risky too, because GraalVM is very new, so we don't want to overcommit and prefer to take baby steps.

@pfirmstone

I'm concerned that Serialization might be a requirement of communications between processes, or is this concern unfounded?

that's the biggest concern for in-proc communication between plugins and core (discussed as con in Option 3).

I think I may know a solution for that, but it requires modification to suit your use case. Currently it depends on SecurityManager for authentication and authorization, but I don't think you need encryption, authorization and authentication for inter-process communications. It implements a subset of Java serialization (using a common constructor signature), without support for circular object graphs (million laugh attacks). It has defensive mechanisms that expect periodic stream resets, plus array and stream size limits. It doesn't serialize collections; instead it uses serializers that serialize an unmodifiable copy (not entirely true, as it is array based, so it could be modified in the stream), and it has API tooling to assist developers in performing type and input validation, such as checking that collections contain the correct types before copying their contents to a new collection. The API also allows invariant checks between subclasses and superclasses prior to calling a superclass, and each class in an object has its own namespace for constructor arguments.
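
To illustrate the kind of input validation described above (checking that a collection contains the correct types, and bounding its size, before copying it), here is a small generic sketch. It is not the JGDMS API; `ValidatingCopy.copyChecked` and its parameters are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

/** Hypothetical helper for validating untrusted, deserialized collections. */
final class ValidatingCopy {

    /**
     * Copies an untrusted collection into a new unmodifiable list, rejecting it
     * if it is too large or contains a null or wrongly typed element.
     */
    static <T> List<T> copyChecked(Collection<?> untrusted, Class<T> elementType, int maxSize) {
        if (untrusted == null) {
            throw new IllegalArgumentException("collection is null");
        }
        if (untrusted.size() > maxSize) {
            throw new IllegalArgumentException("collection exceeds limit of " + maxSize);
        }
        List<T> copy = new ArrayList<>(untrusted.size());
        for (Object element : untrusted) {
            // isInstance is false for null, so null elements are rejected too.
            if (!elementType.isInstance(element)) {
                throw new IllegalArgumentException("unexpected element: " + element);
            }
            copy.add(elementType.cast(element));
        }
        return List.copyOf(copy);
    }

    public static void main(String[] args) {
        System.out.println(copyChecked(Arrays.asList("a", "b"), String.class, 10));
        try {
            copyChecked(Arrays.asList("a", 1), String.class, 10);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The caller only ever keeps the validated copy, so later mutation of the original untrusted collection cannot violate the invariants that were checked.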

https://github.com/pfirmstone/JGDMS/tree/trunk/JGDMS/jgdms-jeri
https://github.com/pfirmstone/JGDMS/tree/trunk/JGDMS/jgdms-platform/src/main/java/org/apache/river/api/io

IMHO Java serialization vulnerabilities destroyed the client Java market. A lot more could have been done sooner to address it, but I think timing and limited resources had a lot to do with it.

SM is battle hardened, so I'm basically just leveraging that and addressing well-documented issues published by security researchers (low-hanging fruit). I have made some breaking changes: Permissions are no longer Serializable, it's no longer possible to set SM to null (usually the last trick in a gadget attack), static permissions granted by code have been removed (prevents URL injection attacks), and the size of the trusted platform has been reduced to the java.base module. But it's also possibly an interim measure until something better comes along. It's also possible nothing better will come along, as security needs to be designed in at a language level, so it could become a long-term interim measure. OpenJDK was very fast moving from deprecation to removal. It seems they've bet the farm on virtual threads. The asynchronous concurrency features hide valuable debugging information, so it makes sense they want to address that; however, these aren't needed for high scalability: immutability, thread confinement, garbage collection, safe publication and NIO are more than sufficient for most. I suspect virtual threads will be a fizzer. I could be wrong, but I think they're trying to find a solution for a non-problem. Then again, there are some very promising features, like the foreign function API, and future possibilities such as reified generics. I still use primitive types, bit-shift operations etc. when I need performance and nothing else will cut it. One of the tricks used in pooling threads in the past was to reduce their assigned memory; smaller object headers, there's plenty of good stuff in the pipeline.

@pfirmstone

@kumargu I would like to see your efforts succeed.

@reta
Collaborator Author

reta commented Dec 30, 2024

I wanted to sync with you on the outcome of the PoC before including it here. I was not clear if the PoC was finally working end-to-end. Secondly, I wanted an opinion if we'd need it if we had the Graal integration.

Yes, it is working end-to-end (for the socket connection as PoC), thanks @kumargu

@pfirmstone

@kumargu It appears Graal doesn't use marshalling, it appears to be using memory access to java object structures...

@kumargu
Contributor

kumargu commented Jan 3, 2025

@kumargu It appears Graal doesn't use marshalling, it appears to be using memory access to java object structures...

I think that is true only if you use GraalVM to build a native image. We are not going to use native image; we just leverage sandboxing.
