[RFC] Follow idea of immutable /usr vs. mutable overrides in /etc #5
Comments
Sounds good to me. 👍 |
I'm uncomfortable with putting executables in /etc, and I strongly think users shouldn't reuse the same provider+agent name when modifying an agent, as it greatly complicates troubleshooting. The currently recommended approach for modifying resource agents is to create a new, custom provider under /usr/lib/ocf/resource.d. I could see extending the standard to allow providers in an alternate location, such as /usr/local, /opt, or /srv (followed by ocf/resource.d), or even allowing an OCF_RA_PATH environment variable. I'm not convinced it's a good idea though, as custom OCF scripts are not any more mutable than the commonly distributed ones. In production, few users are going to modify custom scripts directly; they are going to have a development environment, and then push changes to all production nodes (comparable to updating the resource-agents package). |
On 21/11/17 22:00 +0000, Ken Gaillot wrote:
I'm uncomfortable with putting executables in /etc,
There's a bunch of executable glue scripts already (including
/etc/rc.d/init.d ones for non-systemd systems), which is exactly
what resource agents are meant to be. I see no conflict here.
and I strongly think users shouldn't reuse the same provider+agent
name when modifying an agent, as it greatly complicates
troubleshooting.
Resource managers would need to identify the particular file clearly,
true. If the idea is deemed good enough, the specification should perhaps
also provide for explicitly remapping an agent specification to
a particular path, e.g. upon a SIGHUP signal sent to the resource
manager. Until then, the manager would keep the initially resolved path.
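A minimal sketch of that "pin at start, remap only on request" behaviour, assuming a hypothetical search order (`/etc/lib/ocf/resource.d` before `/usr/lib/ocf/resource.d`) and a hypothetical map file; none of these names come from the OCF standard or from pacemaker, and paths containing whitespace are not handled:

```shell
#!/bin/sh
# Assumed search order: admin overrides first, then the distro location.
OCF_SEARCH_ROOTS="${OCF_SEARCH_ROOTS:-/etc/lib/ocf/resource.d /usr/lib/ocf/resource.d}"
# Hypothetical file in which the manager pins "provider:agent path" pairs.
AGENT_MAP="${AGENT_MAP:-/tmp/ocf-agent-map}"

# Rebuild the pinned mapping from scratch.
rescan_agents() {
    : > "$AGENT_MAP.new"
    for root in $OCF_SEARCH_ROOTS; do
        for path in "$root"/*/*; do
            [ -f "$path" ] && [ -x "$path" ] || continue
            agent=${path##*/}
            provider=${path%/*}
            provider=${provider##*/}
            # Keep only the first (highest-priority) hit per provider:agent.
            awk -v k="$provider:$agent" '$1 == k { found = 1 } END { exit !found }' \
                "$AGENT_MAP.new" ||
                printf '%s:%s %s\n' "$provider" "$agent" "$path" >> "$AGENT_MAP.new"
        done
    done
    mv "$AGENT_MAP.new" "$AGENT_MAP"
}

# Print the pinned path; the directories are not searched again here.
pinned_agent() {
    awk -v k="$1:$2" '$1 == k { print $2; exit }' "$AGENT_MAP"
}

# Re-resolve only when explicitly told to.
trap rescan_agents HUP
```

A manager built along these lines would call `rescan_agents` once at start-up and afterwards only consult `pinned_agent heartbeat apache`, so a newly dropped override takes effect at a defined moment rather than at an arbitrary one.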
The currently recommended approach for modifying resource agents is
to create a new, custom provider under /usr/lib/ocf/resource.d.
I could see extending the standard to allow providers in an alternate
location, such as /usr/local, /opt, or /srv (followed by
ocf/resource.d), or even allowing an OCF_RA_PATH environment
variable. I'm not convinced it's a good idea though, as custom OCF
scripts are not any more mutable than the commonly distributed ones.
In production, few users are going to modify custom scripts
directly; they are going to have a development environment, and then
push changes to all production nodes (comparable to updating the
resource-agents package).
That's not the use case I had in mind, going that deep as to
also change the actual configuration.
Rather something like:
http://oss.clusterlabs.org/pipermail/users/2017-August/006303.html
Anyway, having bulk synchronization of /etc across the nodes can be
appealing (also for systemd unit files, that can likewise be employed
with a resource manager if there's support for that).
--
Jan (Poki)
|
I'd rather not. Doesn't the provider concept offer enough
flexibility? As Ken said, it would also be quite difficult to
figure out which RA is being run if the resource manager is
allowed to look at more than one place for the same resource
configuration.
|
And you can already make custom or similarly named directories in /usr/lib/ocf/resource.d/heartbeat to avoid clashing with the agents provided by the distro. |
On 22/11/17 14:27 +0000, Dejan Muhamedagic wrote:
I'd rather not. Doesn't the provider concept offer enough
flexibility? As Ken said, it would also be quite difficult to
figure out which RA is being run if the resource manager is
allowed to look at more than one place for the same resource
configuration.
Additional idea sketched above would make the flip only at defined
moments (initial start, being told to rescan the agent mapping). Not
at arbitrary points, which would indeed make the situation hard to
follow.
On 22/11/17 14:36 +0000, Oyvind Albrigtsen wrote:
And you can already make custom or similarly named directories in
/usr/lib/ocf/resource.d/heartbeat to avoid clashing with the agents
provided by the distro.
Naturally, but I thought it was clear by now that I have somewhat
different use cases in mind.
--
Jan (Poki)
|
On 22/11/17 17:11 +0100, Jan Pokorný wrote:
Additional idea sketched above would make the flip only at defined
moments (initial start, being told to rescan the agent mapping). Not
at arbitrary points, which would indeed make the situation hard to
follow.
On the other hand, let's not fall into the fallacy that the current
situation is a breeze in the "which agent variant was run, exactly"
matter, at least with pacemaker in particular:
- respective agent files are not locked for the pacemaker's lifespan,
so can be edited anytime
- ditto agents are not copied to a private temporary location first
(or even copied into the memory to be executed from, which would
be doable for hashbang/non-binary executables)
- checksums of the agents are not logged/remembered+rechecked
(or ditto on timestamp comparison basis)
Which already makes it rather difficult to tell which variant of the
agent was run in any particular moment in the past unless you can
testify nothing has intervened (and even then it's not 100%).
So I don't see any remarkable regression; pros and cons summed
together IMHO yield a positive result here when the mentioned
additional idea of explicit rescans is mixed in.
--
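The "checksums logged/remembered+rechecked" item from the list above could look roughly like the following sketch. It assumes GNU coreutils `sha256sum` and a hypothetical audit log location; pacemaker does not do this today, which is the point being made, and paths containing whitespace are not handled:

```shell
#!/bin/sh
# Hypothetical log of "timestamp checksum path" records, one per invocation.
AGENT_AUDIT_LOG="${AGENT_AUDIT_LOG:-/tmp/ocf-agent-audit.log}"

# Print the SHA-256 of a file (sha256sum is part of GNU coreutils).
agent_checksum() {
    sha256sum "$1" | cut -d' ' -f1
}

# Append a record for the agent that is about to be executed.
record_agent() {
    printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        "$(agent_checksum "$1")" "$1" >> "$AGENT_AUDIT_LOG"
}

# Succeed only if the file still matches its most recent record.
agent_unchanged() {
    last=$(awk -v p="$1" '$3 == p { sum = $2 } END { print sum }' "$AGENT_AUDIT_LOG")
    [ -n "$last" ] && [ "$last" = "$(agent_checksum "$1")" ]
}
```

With `record_agent` called before every agent execution, the log answers "which variant ran at that moment" after the fact, independent of where the agent file lives.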
Jan (Poki)
|
The main benefit as I see it would be enabling the sysadmin to add their own agents on top of a read-only `/usr` file system delivered by a transactional update mechanism. |
On 23/11/17 09:01 +0000, Kristoffer Grönlund wrote:
The main benefit as I see it would be enabling the sysadmin to add
their own agents on top of a read-only `/usr` file system delivered
by a transactional update mechanism.
The other practical value is that the administrator would (one wants to
say, finally) gain the power to defuse OCF-based resources that are not,
by any means, desired in the projected cluster, out of the set of agents
that get installed unconditionally through the common distribution
channels, sometimes including ocf:heartbeat:anything, which may be
unsettling on its own:
http://lists.clusterlabs.org/pipermail/users/2016-January/002178.html
This is very similar to, and directly inspired by, systemd's masking
approach.
So when the cluster should only ever serve a minimalistic
httpd + virtual IP combo, the solution would be to run this
upon each install/update of resource-agents in an RPM-based distro
(as written, the `echo` makes it a dry run that only prints the `ln`
commands; drop it to actually create the symlinks):
```
# mkdir -p /etc/lib/ocf/resource.d/heartbeat
# rpm -ql resource-agents \
| grep '/usr/lib/ocf/resource.d/heartbeat/[^.].*' \
| grep -vE 'apache|IPaddr2' \
| sed "s|/usr|/etc|" | xargs -I{} echo ln -s /dev/null {}
```
For this to work harmoniously, resource managers should additionally
recognize the zero size of agents discovered like this and exclude
them from "try running" attempts (incl. at the system location,
indeed).
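How a resource manager might honour such masks can be sketched as follows. The two root variables and both function names are hypothetical, chosen to mirror the `/etc/lib/ocf/resource.d` layout from the command above; this is not existing pacemaker behaviour:

```shell
#!/bin/sh
# Hypothetical override and system roots for agent lookup.
OCF_OVERRIDE_ROOT="${OCF_OVERRIDE_ROOT:-/etc/lib/ocf/resource.d}"
OCF_SYSTEM_ROOT="${OCF_SYSTEM_ROOT:-/usr/lib/ocf/resource.d}"

# An override entry that exists but has zero size (for example a symlink
# to /dev/null) means "masked": the agent must not run from anywhere.
agent_is_masked() {
    entry="$OCF_OVERRIDE_ROOT/$1/$2"
    [ -e "$entry" ] && [ ! -s "$entry" ]
}

# Print the path that would be executed, or fail if masked or missing.
runnable_agent() {
    if agent_is_masked "$1" "$2"; then
        return 1
    fi
    for root in "$OCF_OVERRIDE_ROOT" "$OCF_SYSTEM_ROOT"; do
        if [ -f "$root/$1/$2" ] && [ -x "$root/$1/$2" ]; then
            printf '%s\n' "$root/$1/$2"
            return 0
        fi
    done
    return 1
}
```

The mask check runs before any search, so a masked agent is refused even though an executable copy still sits in the system location.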
For pacemaker in particular and putting fence-agents aside
(preferably, there would be a convergence towards OCF in some aspects,
plus the agents are separated in discrete subpackages in el7, giving
administrator at least some say to what's available), the only way to
run an unrestricted command from cluster configuration would then be
"lsb:<script>" variety of agent specification...
--
Jan (Poki)
|
Thinking about that, |
I'm sorry, but I don't understand this argument at all. Why is the administrator trying to prevent the administrator from configuring resources? I also don't recall any actual argument for why the `anything` agent is problematic... |
On 27/11/17 11:23 +0000, Kristoffer Grönlund wrote:
> The other practical value is that administrator would (one wants to
> say, finally) gain power to defuse OCF-based resources
I'm sorry, but I don't understand this argument at all. Why is the
administrator trying to prevent the administrator from configuring
resources?
Why administratively prevent _selected_ (this needs stressing)
OCF resources (out of all those delivered en masse with the
resource-agents project, for instance)?
Mostly to allow one (on an opt-in basis) to follow the least-privilege
principle, with the intention of keeping any kind of intrusion as limited
as possible, especially for static cluster deployments where additional
resource agents are irrelevant at any rate.
Not to mention one such intrusion enabler from recent
times:
http://oss.clusterlabs.org/pipermail/users/2016-November/004432.html
Sure, one might emulate something like that using acls:
```
<acls>
  <acl_target id="bob">
    <role id="admin"/>
  </acl_target>
  <acl_role id="admin">
    <acl_permission id="admin-deny-1" kind="deny"
      xpath="//primitive[@class='ocf' and @provider='heartbeat' and @type!='apache' and @type!='IPaddr2']"/>
    <acl_permission id="admin-write-1" kind="write" xpath="/cib/resources"/>
    <acl_permission id="admin-read-1" kind="read" xpath="/cib"/>
  </acl_role>
</acls>
```
But besides being rather clumsy, it doesn't cover the fully privileged users
(e.g. the hacluster user) -- it cannot, by design.
Does it answer your question?
I also don't recall any actual argument for why the `anything` agent
is problematic...
This one together with Stateful are perfectly fine for code-less
experimenting with how pacemaker works and kicking off custom agents,
but quite an antipattern for production use, where you want a launcher
as tightly fitting as possible, getting the monitoring right, covering
the corner cases, etc. O:-)
Plus add the above security aspect into the equation. It's not
a security measure per se, but onion-like approach to security hardening
(just as SELinux is, for instance) makes sense when it doesn't impose
new pains.
Voluntary constraining of the agents' repertoire is IMHO one of the easy
wins on this front.
--
Jan (Poki)
|
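Yeah, I think I follow what you're saying. Of course the `apache` agent might not be the best example to allow when trying to avoid privilege escalation, since it can be trivially configured to execute arbitrary executables. Though that might be an argument for fixing `apache`. ;) |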
While there are common existing cases of executables under /etc, they are exceptions, not the rule. System administrators expect /etc to contain configuration, and executables to be located elsewhere, except in unusual cases. I believe this is recommended in the LSB, with good reason. An example is that resource agents do not necessarily need to be scripts, they can be compiled, but /etc is architecture-independent.
The main goal is whether enterprise support personnel can reasonably determine whether a particular agent is supported, not the exact agent code used. If the user can override an OS-provided agent, extra steps must be taken with every support case to check whether that has happened. The current recommendation of using a different provider name makes it immediately clear.
Also, the provider name is intended to indicate who provided the agent. If a custom script reuses a provider name, it obscures that indicator. The current recommendation of using a different provider name when modifying a script makes it clear where the agent came from.
I don't believe this accomplishes that. When users modify or create resource agents, they typically get them working, then rarely or never touch them again. They tend to change less frequently than OS-supplied resource agents. Custom agents don't prevent /usr from being read-only any more than OS-supplied ones do. In either case, there has to be a mechanism to temporarily make /usr writeable during updates. Even if a non-/usr location is perceived to be desirable, I would argue for using a custom provider name, and have the non-/usr location be where to look for additional providers.
I agree. Disabling particular resource agents is no different than disabling particular binaries provided with any other package. If someone wishes to disable unused resource agents, likely they want to disable unused binaries from other packages as well, and already have a generic mechanism for doing so.
Also, this is a security risk, not a mitigation. Being able to write a script into /etc that is automatically run as root without having to touch the pacemaker configuration destroys any security gained by mounting /usr read-only. And I can't imagine any scenario where a security compromise that allows an unused OCF agent as a vector doesn't have an easier vector elsewhere. Pacemaker runs as root and can run arbitrary executables. They don't have to be in the OCF agent directory.
Regarding non-production agents such as Dummy, anything, etc., it is up to each distribution to decide which agents are installed by which packages. For example, RHEL already removes some agents distributed upstream. Any distribution could move such agents to a resource-agents-testing package, for example, or create a separate package for each resource agent, allowing users to install only the ones they need. Similarly users who compile their own can build packages as they like.
Bottom line, I could see some value in having alternate locations for providers, but I think users should be shepherded into using a unique provider name if they modify or create an agent. |
On 27/11/17 18:18 +0000, Kristoffer Grönlund wrote:
Yeah, I think I follow what you're saying. Of course the `apache`
agent might not be the best example to allow when trying to avoid
privilege escalation, since it can be trivially configured to
execute arbitrary executables. Though that might be an argument for
fixing `apache`. ;)
To be honest, I didn't even start considering these trivial
bypasses, I vaguely remember I observed a nasty injection
(ClusterLabs/resource-agents#878 (comment)
could be related), which is an inherent risk with
execute-based-on-parameter unless there is a targeted
scrutiny.
Back to your point, when the daemon executable is deemed an absolutely
necessary parameter, there can always be a (preferably infloop-free)
check that all elements of the traversal path down to the binary are
owned by root, and at least the binary is non-writable by others.
That would be a good start.
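A rough sketch of such a check, assuming GNU `stat -c` and `readlink -f` are available (neither is POSIX) and using a hypothetical function name. The loop cannot spin forever because every iteration shortens the path until it reaches `/`:

```shell
#!/bin/sh
# Succeed only if the binary and every directory above it are owned by
# root (uid 0) and the binary itself is not writable by "others".
path_is_trusted() {
    # Only absolute paths are accepted.
    case $1 in
        /*) ;;
        *) return 1 ;;
    esac
    # Resolve symlinks so that the real traversal path gets checked.
    p=$(readlink -f "$1") || return 1
    [ -f "$p" ] || return 1
    # The last octal digit of the mode holds the bits for "others".
    mode=$(stat -c %a "$p") || return 1
    case ${mode#"${mode%?}"} in
        2|3|6|7) return 1 ;;
    esac
    while :; do
        [ "$(stat -c %u "$p")" = 0 ] || return 1
        [ "$p" = / ] && return 0
        p=$(dirname "$p")
    done
}
```

An agent could then refuse to start a user-supplied daemon path unless, say, `path_is_trusted "$OCF_RESKEY_binfile"` succeeds; the parameter name here is only an illustration.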
--
Jan (Poki)
|
On 27/11/17 18:52 +0000, Ken Gaillot wrote:
> There's a bunch of executable glue scripts already (including
> /etc/rc.d/init.d ones for non-systemd systems), which is exactly
> what resource agents are meant to be. I see no conflict here.
While there are common existing cases of executables under /etc,
they are exceptions, not the rule.
This is then a subjectively inferred rule, not a given fact.
And I am not cheered when that's used as a base to naysay what
I believe is a good, versatile mechanism.
System administrators expect /etc to contain configuration, and
executables to be located elsewhere, except in unusual cases.
Ditto, plus resource agents are mostly configuration-dealing glue
to semi-supervise actual heavylifters. And initscripts were not
different in this aspect, while also present in /etc.
I believe this is recommended in the LSB, with good reason.
Ditto, plus a brief look at
https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/etc.html
/etc/cron.daily
A directory containing _shell scripts to be executed_ once a day;
An example is that resource agents do not necessarily need to be
scripts, they can be compiled, but /etc is architecture-independent.
That's up to careful consideration, symlinks from /etc are always
an option.
> On the other hand, let's not fall into the fallacy that current
> situation is a breeze in the "which agent variant was run, exactly"
> matter, at least with pacemaker in particular:
The main goal is whether enterprise support personnel can reasonably
determine whether a particular agent is supported, not the exact
agent code used. If the user can override an OS-provided agent,
extra steps must be taken with every support case to check whether
that has happened. The current recommendation of using a different
provider name makes it immediately clear.
You are talking about the happy cases (users caring about what was
written on the topic, etc.) while I am talking about the pessimistic scenarios.
And for these, there's next to no difference, except that with proper
tooling support, it may be immediately clear that /etc override is
what's in use (cf. hidden in-situ changes in the agents).
And a different provider can be used also with /etc location.
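The "proper tooling support" mentioned above could be as small as the following sketch, which classifies everything found in the override tree. The root variables, the function name, and the three labels are assumptions made for illustration only:

```shell
#!/bin/sh
# Hypothetical override and system roots for agent lookup.
OCF_OVERRIDE_ROOT="${OCF_OVERRIDE_ROOT:-/etc/lib/ocf/resource.d}"
OCF_SYSTEM_ROOT="${OCF_SYSTEM_ROOT:-/usr/lib/ocf/resource.d}"

# Print one line per override entry:
#   masked   <provider>/<agent>  zero-size entry, agent disabled
#   override <provider>/<agent>  shadows a system agent of the same name
#   custom   <provider>/<agent>  no counterpart in the system location
list_overrides() {
    for entry in "$OCF_OVERRIDE_ROOT"/*/*; do
        [ -e "$entry" ] || continue
        rel=${entry#"$OCF_OVERRIDE_ROOT"/}
        if [ ! -s "$entry" ]; then
            kind=masked
        elif [ -e "$OCF_SYSTEM_ROOT/$rel" ]; then
            kind=override
        else
            kind=custom
        fi
        printf '%s %s\n' "$kind" "$rel"
    done
}
```

Attaching the output of `list_overrides` to a support case would show at a glance whether any distribution-provided agent has been shadowed, without diffing files.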
Also, the provider name is intended to indicate who provided the
agent. If a custom script reuses a provider name, it obscures that
indicator. The current recommendation of using a different provider
name when modifying a script makes it clear where the agent came
from.
See above.
> The main benefit as I see it would be enabling the sysadmin to add
> their own agents on top of a read-only /usr file system delivered
> by a transactional update mechanism.
I don't believe this accomplishes that. When users modify or create
resource agents, they typically get them working, then rarely or
never touch them again. They tend to change *less* frequently than
OS-supplied resource agents. Custom agents don't prevent /usr from
being read-only any more than OS-supplied ones do.
What I had in mind: say, a bunch of VMs could share the same /usr so
as to save space. Or systemd stateless approach could be used (it's
on topic for HA quite a lot, actually, after fencing, reboot to the
perfectly known state of the system, as an extension of rebooting the
machine to -- perhaps not so perfectly known, e.g., that one corrupted
file blocking further progress may still be present -- known state).
In these scenarios, one could treat /usr as the distro-updates driven,
verifiable, reproducible part of the machine data and, in contrast,
/etc/ as the dynamic, localized, customized factor, sharable
only selectively (csync2 seems like a really neat tool for that)
if at all. In my view, there's really no other intuitive alternative
for an overriding location than under /etc.
And even in regular uses, you don't want to bet resource-agents or
other packages will happen to acquire the same provider label you
chose for your custom agents, do you? If you placed the provider
in /etc, you'd have the benefits of:
- no potential loss of the code for the agents
(packages are really not expected to ship anything under /etc/ocf)
-> good
- your agents will have priority even if the stated provider clash
happens -> good
In either case, there has to be a mechanism to temporarily make /usr
writeable during updates.
That's out of scope here.
Even if a non-/usr location is perceived to be desirable, I would
argue for using a custom provider name, and have the non-/usr
location be where to look for additional providers.
Sure the custom providers would be applicable in /etc as well.
> I'm sorry, but I don't understand this argument at all. Why is the
> administrator trying to prevent the administrator from configuring
> resources?
>
> I also don't recall any actual argument for why the anything agent
> is problematic...
I agree. Disabling particular resource agents is no different than
disabling particular binaries provided with any other package.
Except that with heartbeat's anything and the like, you can run
arbitrary executables with arbitrary arguments, i.e., whatever you
want? Whereas the surface is substantially limited with proper ones,
especially when hardenings like suggested in the previous comment
get applied.
If someone wishes to disable unused resource agents, likely they
want to disable unused binaries from other packages as well, and
already have a generic mechanism for doing so.
There must really be a misunderstanding, this part of the proposal
is about agents that pose an unnecessary attack surface in cases where the
(possibly cluster-wide) execution of the agents becomes possible as
a matter of a breach or compromise of some kind (like the referred
CVE). It has nothing to do with unused binaries across the systems
(which furthermore regular users cannot run as root unless there's
some other exploit or security weakness), even if some agents allow
that (could be tightened further...).
This might even go a full circle:
- distribution-provided resource-agents only support distro-delivered
binaries where applicable (apache -> system's version of httpd),
with no possibility to override the executable through cluster
configuration, similarly, custom built resource-agents would
either configure-time figure out the correct paths or provide
respective toggles to preset the paths to executables accordingly
- if a different httpd is required, one is free to either override
the same agent using /etc location and/or use a custom provider,
but then is clearly on her own
And when agents are defaults-injection ready (like when using
": ${OCF_RESKEY_foo_default=bar}"), the customization could easily
be as short as three lines: shebang, export of customized
OCF_RESKEY_foo_default, source of the original.
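The three-line override described here can be tried out in a scratch directory. In the sketch below, `original` is a stand-in for a distribution agent and `foo` is the placeholder parameter name from the text; real agents would live under `/usr/lib/ocf/resource.d/<provider>/`, and the `/etc` counterpart is the location proposed in this RFC, not an existing one:

```shell
#!/bin/sh
# Self-contained demo; nothing outside the scratch directory is touched.
demo=$(mktemp -d)

# Stand-in for a "defaults-injection ready" agent: the default is only
# assigned when the variable is not already set in the environment.
cat > "$demo/original" <<'EOF'
#!/bin/sh
: ${OCF_RESKEY_foo_default=bar}
: ${OCF_RESKEY_foo=${OCF_RESKEY_foo_default}}
echo "foo=$OCF_RESKEY_foo"
EOF

# The override: shebang, export of the customized default, source of the
# original agent ($demo is expanded while this file is written).
cat > "$demo/override" <<EOF
#!/bin/sh
export OCF_RESKEY_foo_default=baz
. "$demo/original"
EOF

original_out=$(sh "$demo/original")
override_out=$(sh "$demo/override")
echo "$original_out"   # foo=bar
echo "$override_out"   # foo=baz
```

Because the original is sourced rather than executed, the action argument (`start`, `monitor`, ...) passed to the override stays visible to the original as `$1`.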
The main benefit is that by default, you'll get a tailored setup without
the possibility to override and run the binary at the cluster configuration
level in said accident or similar scenarios.
Also, this is a security risk, not a mitigation. Being able to write
a script into /etc that is automatically run as root without having
to touch the pacemaker configuration destroys any security gained by
mounting /usr read-only.
I don't follow. How is a normal user privileged to write to /etc?
And I can't imagine any scenario where a security compromise that
allows an unused OCF agent as a vector doesn't have an easier vector
elsewhere. Pacemaker runs as root and can run arbitrary executables.
They don't have to be in the OCF agent directory.
Yes, in pacemaker, we should be looking at least at restricting lsb
agents not to allow trivial "parent directory" escapes, which is what
I actually used to translate rgmanager's "script" resources to CIB
equivalent in "clufter ccs2pcs*" :-)
(information about what needs to be symlinked where could be enough
on clufter side)
Regarding non-production agents such as Dummy, anything, etc., it is
up to each distribution to decide which agents are installed by
which packages. For example, RHEL already removes some agents
distributed upstream. Any distribution could move such agents to
a resource-agents-testing package, for example, or create a separate
package for each resource agent, allowing users to install only the
ones they need.
Wow, case of telepathy, just discussed this idea today with pcs folks
:)
Similarly users who compile their own can build packages as they
like.
Bottom line, I could see some value in having alternate locations
for providers, but I think users should be shepherded into using a
unique provider name if they modify or create an agent.
Unless they want to defuse particular agents or keep unified CIB
in cases prompting the "adapting" overrides (perhaps along with
/usr being downright locked-down).
--
Jan (Poki)
|
From http://refspecs.linuxbase.org/FHS_2.3/fhs-2.3.html#PURPOSE6 : "The /etc hierarchy contains configuration files. A 'configuration file' is a local file used to control the operation of a program; it must be static and cannot be an executable binary." The existence of exceptions to this is simply a result of decades of organic growth, before any standards existed (even POSIX). The subjectiveness of system administrators' expectation that /etc does not normally contain executables does not reduce the legitimacy of the expectation. Following common expectations, even loosely subjective ones, helps system administrators do their jobs.
If a user directly modifies a script deployed by an OS package, the next OS update of that package will overwrite it. That's an effective enforcement mechanism that quickly educates anyone who didn't pay attention to the documentation. If an administrator or someone else troubleshooting a cluster problem wants to look at the resource agent code, they're going to go to the standard location first. If the behavior doesn't fit the code they see, they'll just get confused. There won't be any obvious indication that there's an override.
That's feasible regardless of where custom agents are, and regardless of whether users can override an existing provider or require a unique provider name.
The point of mounting /usr read-only is to disallow root from writing to it. The vulnerability is to exploits that allow only writing files as root, as opposed to full shell access. If the attacker can replace a common command with a trojan, it will end up being executed. Allowing that same attacker to write an OCF override to /etc, and having pacemaker automatically run it without any configuration change required, provides a way around a read-only /usr.
Existing scripts under /etc could be attacked in the same way, which is a good reason why they shouldn't be there, and are there only for historical reasons.
From a security standpoint, mounting /usr read-only is stronger when paired with all other filesystems being mounted ro and/or noexec. As an example, Gentoo recommends mounting /etc read-only as well, with symlinks for files that need to be updated: https://wiki.gentoo.org/wiki/Filesystem/Security#Mount_options
The bottom line from a security standpoint is that all executables should be on read-only partitions, otherwise the protection is only partial. (This is one reason this is not a common setup.) |
On 28/11/17 00:17 +0000, Ken Gaillot wrote:
> Also, this is a security risk, not a mitigation. Being able to write
> a script into /etc that is automatically run as root without having
> to touch the pacemaker configuration destroys any security gained by
> mounting /usr read-only.
> I don't follow. How is a normal user privileged to write to /etc?
The point of mounting /usr read-only is to disallow *root* from
writing to it
One of them, not all!
The vulnerability is to exploits that allow only writing files as
root, as opposed to full shell access. If the attacker can replace
a common command with a trojan, it will end up being executed.
Allowing that same attacker to write an OCF override to /etc,
But when you can write /etc/{passwd,shadow} amongst others, it's
a case lost already!!!
and having pacemaker automatically run it without any configuration
change required, provides a way around a read-only /usr.
Where did I say that doing that to /usr is primarily for security against
intruders? The main idea is to separate domains of distributor-provided
files and admin-delivered, prioritized ones, and this applies
regardless if /usr is set immutable or not. But this scheme comes in
useful there just as well. And your linked, dated FHS (as opposed to
originally mentioned LSB), also makes it clear that /etc is unshareable
(localized) companion of /usr that is shareable (for that happening,
it needs to be "locked-down" in some way), i.e. one of the other
use cases I have in mind.
Existing scripts under /etc could be attacked in the same way, which
is a good reason why they shouldn't be there
Viz /etc/{passwd,shadow}...
and are there only for historical reasons.
Speculation.
From a security standpoint, mounting /usr read-only is stronger when
paired with all other filesystems being mounted ro and/or noexec. As
an example, Gentoo recommends mounting /etc read-only as well, with
symlinks for files that need to be updated:
https://wiki.gentoo.org/wiki/Filesystem/Security#Mount_options
But it doesn't say to mount /etc as noexec, likely for a reason,
and I don't see it coming. And the same solution -- symlinks --
would apply here as well; it really depends on how paranoid the
administrators want to go, but then, they would likely avoid
resource-agents altogether because in case of some enabling
vulnerability regarding resource manager, they will currently
rather assist with arbitrary execution. This is in part covered
by allowing one to narrow that surface by only allowing those
really employed in the proposed dualism.
The bottom line from a security standpoint is that *all* executables
should be on read-only partitions, otherwise the protection is only
partial. (This is one reason this is not a common setup.)
Ok, if that's your opinion, let's also add an explicit provision that
OCF scripts are not necessarily executable, in which case the resource
manager is responsible to parse and interpret shebang on its own.
(Just teasing your fascination around "being executable" while it
is more or less just a syntactical sugar, easily paralleled in user
space, provided by the kernel in case of non-binaries. It's just
a formalist's game if you realize that.)
--
Jan (Poki)
|
There are many practical reasons why we want to copy this increasingly
popular scheme, while enabling users to modify the agents per their
needs, for instance:
- having solely static data in `/usr` allows one to share that as a
  read-only (or sparsely utilized copy-on-write) mount point with
  their VMs and containers so as to save space
- no conflict-on-update issue
Hence my expectation is that the OCF standard will address this,
presumably in `resource-agent-api.md`, by replacing [...] with
something like [...]