Tracee Roadmap update March 23 #2890
itaysk
started this conversation in
Development
Over the past few months Tracee has been undergoing some fundamental improvements. I wanted to add some context to this, explain the motivation behind it, and lay out the plan for what's to come.
TL;DR: here's the high level plan:
Use case
Tracee was designed around a use case that looked like this: Tracee detects suspicious or malicious behavior at runtime. You install it on your server or cluster, and when it detects "something bad" it alerts you.
While Tracee did deliver on that promise, there were still some gaps relative to users' expectations:
Given the above, we've modified Tracee's primary use case and mission statement: Tracee uses eBPF to tap into your system and expose hundreds of events that help you understand how your system behaves.
Note that the focus on behavioral detections is replaced with access to many events which can be useful in different scenarios. The goal is to give you tools that help you understand how your system behaves, not to do the understanding for you.
Architecture
Tracee was made of two distinct components (binaries) that worked together: tracee-ebpf generates low level tracing data, which is passed to tracee-rules to find "bad" patterns in it. To support the previously discussed use case, when you "ran Tracee" (i.e. docker run tracee / helm install tracee), it would orchestrate the two components internally so that the output is only security detections.
The motivations for this design were:
Over time we experienced some hardships in maintaining this architecture:
These architectural challenges add to the previously discussed user experience challenges, which led us to move toward a "unified architecture". The new architecture has a single binary: tracee. When executed it immediately emits events, including simple observability and security insights. It has a straightforward configuration that speaks to the use case described. Internally the code is split in two libraries for code hygiene, but to the user it's just one tool.
User-friendly observability events
Tracee's new experience allows and encourages the user to trace any system activity events (regardless of security insights). You could always trace system calls with Tracee, which is a good starting point, but we believe syscalls are not ideal:
For these reasons, Tracee's signature detections never used syscalls, and instead relied on an alternative kprobe-based framework that produced more robust events for security detections. We want not only to expose these previously internal raw events, but to make them a spotlight feature that shapes Tracee's new experience.
To address this, we are introducing a higher-level set of events that represents commonly useful system activity events in a user-friendly, reliable and consistent way. These will address all common observability questions while abstracting away the details. This will include:
These new events will be the foundation for tracing with Tracee, as well as for building security detections on top of them.
Everything is an event
In the previous experience tracee-ebpf emitted structured tracing data, while tracee-rules emitted semi-structured security alerts. To support the new desired experience we are moving to what we refer to as "everything is an event". With this approach, every kind of information emitted by Tracee will be logged as an event entity that follows a common data model.
This includes syscalls, system observability events, internal security events which previously were not exposed to the user, and most importantly, tracee-rules's behavioral signature detections. All of the above will now follow the same event model and be consumed in the same way. In addition, user-defined custom events follow the same pattern, and forensic capturing will too.
To support this notion, the event schema might evolve and the UX options around it might change as we implement this approach.
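As a rough illustration of what a common data model could look like, here is a minimal Go sketch. The `Event` type, its field names, and the `Encode` helper are hypothetical and chosen only for this example; they are not Tracee's actual schema, which, as noted above, may still evolve. The point is only that a syscall and a signature detection share one envelope and one serialization path.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event is a hypothetical common envelope. The field names are illustrative
// assumptions, not Tracee's real event schema.
type Event struct {
	Kind string            `json:"kind"` // e.g. "syscall", "observability", "detection"
	Name string            `json:"name"`
	PID  int               `json:"pid"`
	Args map[string]string `json:"args,omitempty"`
}

// Encode serializes any event the same way, regardless of its kind.
func Encode(e Event) (string, error) {
	b, err := json.Marshal(e)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	events := []Event{
		{Kind: "syscall", Name: "openat", PID: 42, Args: map[string]string{"pathname": "/etc/passwd"}},
		{Kind: "detection", Name: "example_signature", PID: 42},
	}
	// A consumer handles both kinds through one code path.
	for _, e := range events {
		line, err := Encode(e)
		if err != nil {
			panic(err)
		}
		fmt.Println(line)
	}
}
```

A consumer written against such an envelope would not need a separate parser for detections versus raw tracing data, which is the practical benefit the "everything is an event" approach is after.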
Scopes
Tracee has a very rich and powerful filtering mechanism that lets you target specific workloads and scenarios, but filters could not be applied granularly for specific workloads or events. For example, you could not ignore some events only for specific workloads, or filter by an event argument that is shared with other events. This kind of requirement is especially important for performance and efficiency optimizations.
As part of the new experience we are also introducing the "multi-scope" architecture for Tracee. This essentially lets you group a bunch of filters into a "scope", and there can be multiple scopes handled by the same Tracee instance. For example, if I want to ignore some events for some workloads but not others, I could create a scope which applies only to the relevant workloads (by using Tracee's existing powerful filters), and set the selected events for that scope (by using Tracee's event selection). A scope is like a bubble of tracing configuration that gives you fine control over what is being traced, and it is critical for real-world advanced applications of runtime security.
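To make the "bubble" idea concrete, here is a small Go sketch of several scopes evaluated by one instance. Everything in it is an assumption made for illustration: the `Scope` type, the `WorkloadPrefix` field (a stand-in for Tracee's much richer filters), and the `MatchingScopes` helper are hypothetical and do not reflect Tracee's internal implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// Scope is a hypothetical grouping of a workload filter with an event selection.
type Scope struct {
	Name           string
	WorkloadPrefix string          // simplified stand-in for Tracee's filters
	Events         map[string]bool // events selected for this scope
}

// Selects reports whether this scope wants the given event from the given workload.
func (s Scope) Selects(workload, event string) bool {
	return strings.HasPrefix(workload, s.WorkloadPrefix) && s.Events[event]
}

// MatchingScopes returns the names of all scopes that select the event.
func MatchingScopes(scopes []Scope, workload, event string) []string {
	var names []string
	for _, s := range scopes {
		if s.Selects(workload, event) {
			names = append(names, s.Name)
		}
	}
	return names
}

func main() {
	scopes := []Scope{
		{Name: "web", WorkloadPrefix: "web-", Events: map[string]bool{"openat": true, "execve": true}},
		{Name: "db", WorkloadPrefix: "db-", Events: map[string]bool{"execve": true}},
	}
	fmt.Println(MatchingScopes(scopes, "web-1", "openat")) // [web]
	fmt.Println(MatchingScopes(scopes, "db-1", "openat"))  // []
}
```

Here `openat` is traced for the web workloads but ignored for the database ones, which is the kind of per-workload event selection a single global filter set could not express.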
Policies
While Tracee already has an impressive feature set, which keeps expanding, when we considered advanced applications of runtime security in production servers or clusters, we understood that the mechanisms used to employ these features affect the experience as well. The Command Line Interface (CLI) has its place as a basic experience, but for those advanced scenarios we want to provide a more scalable model for thinking about runtime security, which is based on declarative policies.
A policy is a declarative configuration that binds "scope" with "events". The policy tells the story of "for these workloads (scope), collect this information (events), in this way (event filters, outputs)".
In addition, it leaves room to define an "action" to take when a policy condition is matched, opening the door to more interesting use cases in runtime security such as enforcement.
Being a declarative model, the policy also ties in well with the Kubernetes deployment story, where we expect policies to take a central role in the UX, perhaps as a Kubernetes CRD.
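The following Go sketch shows one way a declarative policy binding scope, events, and an action might be represented and evaluated. The JSON layout, the `Policy` type, and the `ParsePolicy` and `ActionFor` helpers are all hypothetical and invented for this example; the actual policy format is not defined in this post.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Policy is a hypothetical declarative document: for these workloads (Scope),
// collect these Events, and take this Action on a match.
type Policy struct {
	Name   string   `json:"name"`
	Scope  []string `json:"scope"`
	Events []string `json:"events"`
	Action string   `json:"action"`
}

// ParsePolicy decodes a policy from its declarative JSON form.
func ParsePolicy(doc string) (Policy, error) {
	var p Policy
	err := json.Unmarshal([]byte(doc), &p)
	return p, err
}

func contains(list []string, v string) bool {
	for _, item := range list {
		if item == v {
			return true
		}
	}
	return false
}

// ActionFor returns the policy's action if the workload and event both match.
func (p Policy) ActionFor(workload, event string) (string, bool) {
	if contains(p.Scope, workload) && contains(p.Events, event) {
		return p.Action, true
	}
	return "", false
}

func main() {
	doc := `{"name":"web-files","scope":["web"],"events":["openat"],"action":"log"}`
	p, err := ParsePolicy(doc)
	if err != nil {
		panic(err)
	}
	action, ok := p.ActionFor("web", "openat")
	fmt.Println(action, ok) // log true
}
```

Because the policy is plain data rather than a set of command-line flags, the same document could be stored, reviewed, and applied by an orchestrator, which is why the declarative form fits the Kubernetes story described above.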
Summary
We are reimagining Tracee based on your feedback and experience. Here's the high level plan:
This is a high level roadmap that will be broken into deliverable work items and delivered incrementally over the next releases (some of it has already made it out by now).
We hope this excites you as much as it excites us, and we welcome any feedback, either in this discussion or in relevant GitHub issues.