Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(misconf): Add optional start/end_line config values for ignorefile misconf #7339

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

michael-a-shelton
Copy link

Description

The proposal is to add the ability to define the start/end lines as optional config fields in the ignore file.

Current:

  - id: KSV048
    paths:
      - templates/test-template.yaml

Proposed:

  - id: KSV048
    start_line: 9
    end_line: 11
    paths:
      - templates/test-template.yaml

This is a relatively non-invasive small change, adding two fields to the IgnoreFinding struct, passing the values from the calling functions into the Match method, where a conditional then does the line comparsions. The value add here is to allow more granular control over the explicit rules being ignored, instead of ignoring the rule across the entire rendered output, but over specific lines in the output.

Related issues

None - Discussion 7325

Related PRs

None

Checklist

  • I've read the guidelines for contributing to this repository.
  • I've followed the conventions in the PR title.
  • I've added tests that prove my fix is effective or that my feature works.
  • I've updated the documentation with the relevant information (if needed).
  • I've added usage information (if the PR introduces new options)
  • I've included a "before" and "after" example to the description (if the PR is a user interface change).

@CLAassistant
Copy link

CLAassistant commented Aug 13, 2024

CLA assistant check
All committers have signed the CLA.

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

continue
}

log.Debug("Ignored", log.String("id", id), log.String("target", path))
return &finding
if finding.StartLine != 0 && finding.EndLine != 0 &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should only those finds that exactly match the start and end of the lines be ignored? Or is it better to ignore everything between the passed position? For example, I want to ignore a misconfig that is found inside a Terraform block, but I want to specify the position of the block.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back and forth here and thought it made more sense to be explicit to match the lines of the finding itself instead of a range.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michael-a-shelton could you give an example of why passing a range isn't ideal? Just trying to understand. To me it feels passing an explicit range would have more flexibility.

Copy link
Author

@michael-a-shelton michael-a-shelton Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For instance, someone typos the "end_line" in their ignore file. Maybe they meant to do:

start_line: 5
end_line: 10

but instead, fat-fingered the number on end_line so it was something like:

start_line: 5
end_line: 100

This would still pass if using a range, but should actually instead fail as now you're ignoring more than intended.

The start and end line numbers are actually how the finding itself is captured as well, so those fields are there mirroring the CauseMetadata struct for the misconf: https://github.com/aquasecurity/trivy/blob/main/pkg/fanal/types/misconf.go#L37-L38

pkg/result/ignore.go Outdated Show resolved Hide resolved
@@ -83,19 +92,26 @@ func (i *IgnoreFinding) UnmarshalYAML(value *yaml.Node) error {

type IgnoreFindings []IgnoreFinding

func (f *IgnoreFindings) Match(id, path string, pkg *packageurl.PackageURL) *IgnoreFinding {
func (f *IgnoreFindings) Match(id string, results *FindingsResults) *IgnoreFinding {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id already exists in FindingsResults.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also debated on this quite a bit. The only scanner passing multiple IDs is the misconf scanner. I decided to avoid refactoring all the other scanner functions up the stack to support a slice of strings for IDs, when they only had one, and instead just pass the single string down.

I could just as well refactor those and move the for loop from the misconf method into the Match method, but felt that a bit clunky to put a for loop over all other scanners, when only one uses it (even though it would just be one iteration for the others).

I'm open to making that change if we want to make the method signature friendlier. Open to feedback here.

@michael-a-shelton
Copy link
Author

I left a comment on the discussion but will leave it here in hopes to get some feedback. Can we use a CCLA instead of ICLA? This contribution is on behalf of JPMorgan

@brooklynrob
Copy link

brooklynrob commented Aug 28, 2024

CLA assistant check Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.You have signed the CLA already but the status is still pending? Let us recheck it.

I am the CLA at approver JPMC and @michael-a-shelton is approved to contribute on our behalf. It appears you have an ICLA that you use as a form of CCLA? Is that correct?

@simar7
Copy link
Member

simar7 commented Aug 28, 2024

CLA assistant check Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.You have signed the CLA already but the status is still pending? Let us recheck it.

I am the CLA at approver JPMC and @michael-a-shelton is approved to contribute on our behalf. It appears you have an ICLA that you use as a form of CCLA? Is that correct?

cc @itaysk

@itaysk
Copy link
Contributor

itaysk commented Sep 6, 2024

@brooklynrob if you want to do CCLA please send an email to [email protected]

@michael-a-shelton
Copy link
Author

@knqyf263 Any chance for a review for approval on this? Would like to either close/merge this, or adjust any final feedback you may have.

@simar7 simar7 self-requested a review September 10, 2024 21:33
Copy link
Member

@simar7 simar7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm but we probably need to hear back on the CCLA proceedings prior to merge.

@knqyf263
Copy link
Collaborator

Sorry, I thought it was about misconfigurations, so I left the review to @simar7 and @nikpivkin and didn't look at it.

IMHO, it should be done by Rego. There is some discussion about whether it should be an exact match or a range, for example, but this depends on the user. Also, other users may want to filter by other fields in the CauseMetadata. I think that it is not practical to cover all of those use cases in the ignorefile.

@michael-a-shelton
Copy link
Author

Rego is a syntax that a lot of engineers are not familiar with. We are intending to drop this into dozens and dozens of repositories and asking 50-75 engineers to use it. Adding lines to a config file is a far smaller learning curve than adding in/maintaining rego, project to project, engineer to engineer.

By not having line configurations in an ignore file, you're actually producing false findings by ignoring entire rulesets for an entire scan, when it should be more explicit (or at least allow that level of granularity). Specifically, for helm template configuration checks example; it could have two deployment manifests, one that has privileged containers and one that does not. Not having a line enforcement on the ignore file would ignore that rule across both entirely when it should not, and allow privileged containers where it was not intended on a configuration check.

@knqyf263
Copy link
Collaborator

knqyf263 commented Sep 11, 2024

Rego is a syntax that a lot of engineers are not familiar with. We are intending to drop this into dozens and dozens of repositories and asking 50-75 engineers to use it.

Once someone writes a simple example, others in your organization can copy and tweak it. Actually, the line-based filtering can be written within 10 lines.

package trivy

default ignore = false

# Ignore AVD-AWS-0086 in line 3-5
ignore {
    input.ID == "AVD-AWS-0086"
    input.CauseMetadata.StartLine == 3
    input.CauseMetadata.EndLine == 5
}

You can even write the line range as below.

ignore {
    input.ID == "AVD-AWS-0086"
    input.CauseMetadata.StartLine > 2
    input.CauseMetadata.EndLine < 6
}

As described above, line-based filtering is not expensive to learn and ultimately gives you more flexibility. For example, engineers in other teams may want to filter by the service name.

ignore {
    input.ID == "AVD-AWS-0086"
    input.CauseMetadata.Service == "s3"
}

While I agree that the learning cost of Rego is high (in fact, I'm always struggling with Rego), I believe it is low for this case as it simply accesses each field of the JSON, like jq. Adding richer examples here may help.

@simar7 @nikpivkin, what do you think? The above Rego is simple enough for me, but if you both consider adding this support to .trivyignore makes sense, I'm open to it. I just left my thoughts.

By not having line configurations in an ignore file, you're actually producing false findings by ignoring entire rulesets for an entire scan, when it should be more explicit (or at least allow that level of granularity). Specifically, for helm template configuration checks example; it could have two deployment manifests, one that has privileged containers and one that does not. Not having a line enforcement on the ignore file would ignore that rule across both entirely when it should not, and allow privileged containers where it was not intended on a configuration check.

Yes, I understand the use case, but many demands exist, and it is difficult to address them all in the ignore file. .trivyignore is assumed to be used for simple filtering only.

@simar7
Copy link
Member

simar7 commented Sep 11, 2024

@knqyf263 your examples have made me rethink this. To be honest @michael-a-shelton if you still think we should add your changes for any reasons not addressed above, please do let us know. I do think to @knqyf263's point that in this particular case Rego is in fact not that hard to write plus is more flexible. At the same time, I don't want to be a gatekeeper for a change if it has a valid reason to exist.

Also to @knqyf263 's point about adding examples such as this is a great idea. This isn't the first time we've received a request to have more granular control over ignores and I feel with Rego's flexibility, we could add more insight into how to tackle such cases. As an example, we've re-written our own checks in Rego so there's a lot more examples to learn from. See here.

@nikpivkin is on leave at the moment but once he's back, I'd like to hear from him as well on what he has to say.

@michael-a-shelton
Copy link
Author

michael-a-shelton commented Sep 11, 2024

I do not think rego solves my issue. Rego is inherently hard to get engineers to learn, especially those who do not work with it regularly. It would be difficult to scale to some of our needs. For example, one chart has 200 ignore rules in the ignore file that we run, when using the granular control. This is a CNCF OSS chart and not our own. Doing that in rego would be unwieldly and unfriendly to those not familiar with rego itself. That is one chart out of the 40-50 we're pushing.

The config file approach is in line with other linters (Which is essentially what the misconf is for us, here, for helm templating). It is important to us to try to avoid additional cognitive load on engineers, and staying to the same pattern of maintaining config that is the standard across most tools in the space is ideal. Already managing linters for go, python, terraform, markdown, etc, it is a pattern that has no learning curve.

I also think the ignorefile concept for misconf is inherently broken without the use of explicit lines. So, it is not so much a feature as it is a potential bugfix.

For example; running the misconf scan against a template that has clusterroles. ksv048 flags over privileged cluster roles, however there is a need for the CRUD verbs on one of the admin roles. By simply adding ksv048 to the ignore file, it is now essentially disabling that rule for the admin role and other possibly read-only roles in the same file. This is a security risk. By stating we only want to ignore ksv048 for lines 9-14, it will still be enforced on the read-only clusterrole.

We could split this into separate files for this scenario, however we have seen this same pattern a lot in upstream OSS components and we try to avoid manipulating those and deviating from upstream.

I realize the yaml ignorefile is experimental, but it is something we would like to use. The other alternative would be creating a rego wrapper and loading the yaml config ourselves and parsing it in rego, which I would like to avoid doing.

@knqyf263
Copy link
Collaborator

This isn't the first time we've received a request to have more granular control over ignores and I feel with Rego's flexibility, we could add more insight into how to tackle such cases

Exactly. Ignoring by line number exact matching is necessary in this case, but other users have other needs.

@nikpivkin is on leave at the moment but once he's back, I'd like to hear from him as well on what he has to say.

Ah, I forgot about it. I'd like to hear his thoughts as well.

Rego is inherently hard to get engineers to learn, especially those who do not work with it regularly. It would be difficult to scale to some of our needs.

If you are limiting yourself to a specific application (e.g., line-based filtering), there is no need to learn Rego syntax. Simply modify the line numbers in the sample I wrote.

These two files are almost the same.

  - id: AVD-AWS-0086
    start_line: 3
    end_line: 5
ignore {
    input.ID == "AVD-AWS-0086"
    input.CauseMetadata.StartLine == 3
    input.CauseMetadata.EndLine == 5
}

What is the difference for you? Debugging Rego can be difficult if it does not work as expected, but so is YAML. Without reading the source code, it is impossible to understand how .trivyignore.yaml is processed.
Like @simar7 said, I am not opposed to adding the feature if there is a good reason. I'm just trying to understand what is preventing you from using Rego.

@michael-a-shelton
Copy link
Author

It is managing code versus managing config. Code has significantly more overhead with it in regards to TLM. It is also quite a bit more verbose and more prone to error. For a one off scenario, I agree the rego approach is fine. At scale and longevity is where issues may start popping up.

OPA v1 changes are an example here; this does not fit that syntax and would break when required/moved to it. Also, if CauseMetadata or any part of that struct changes, all of the rego now becomes invalidated.

One chart out of the many we work with has over 200 entries for an ignore file. ID, start_line, end_line, statement. That is 800 lines, which is a lot in it's own right.

However, moving that to rego now becomes 1200 lines (adding the lines for the ignore block and the spaces before/after). Sure, we could write more advanced rego to eliminate duplication of ignore blocks per ID and handle the sets of lines, but that gets back into managing and requiring engineers now to understand sets/comprehension etc in rego. Let's now take that number and multiple it times the amount of charts we're managing of 40-50, and you see how quickly that becomes unmanageable. Now, not every repo has 200 entries, but lets use that number and extrapolate... thats 48k lines of rego that now have to be managed.

Should a change to CauseMetadata struct happen, or the flip to OPAv1 syntax, a config file is abstracted away from those changes and would not require any uplift. The rego approach would require alterations to 48k lines of uplift.

@knqyf263
Copy link
Collaborator

At scale and longevity is where issues may start popping up.

This sounds like the opposite argument to me. As the operation scales, YAML becomes inadequate, and more flexible filtering is needed. This is actually what we have seen over the years.

However, moving that to rego now becomes 1200 lines (adding the lines for the ignore block and the spaces before/after).

A comparison with regard to the line number would help the transition to Rego more, although I don't think it is meaningful. As you also mentioned, rules can be written in short as follows.

package trivy

default ignore = false

ignore {
    rule = [
        {"id": "AVD-AWS-0086", "start": 1, "end": 3},
        {"id": "AVD-AWS-0087", "start": 1, "end": 6},
        {"id": "AVD-AWS-0088", "start": 10, "end": 20}
        {"id": "AVD-AWS-0089", "start": 20, "end": 30}
    ][_]
    input.ID == rule.id
    input.CauseMetadata.StartLine == rule.start
    input.CauseMetadata.EndLine == rule.end
}

If there are 200 rules, YAML will result in 800 lines, whereas Rego will only require 210 lines. If there are 50 repositories, with YAML it is 40k lines, but with Rego it is about 10k lines. However, again, a comparison of line numbers doesn't make sense to me.

OPA v1 changes make sense, but it can also happen in Trivy's ignorefile. As you probably know, Trivy used to have .trivyignore, but needed to introduce .trivyignore.yaml for more flexibility. Furthermore, .trivyignore.yaml is still an experimental feature that is subject to breaking changes. We try to minimize such change, but sometimes it is unavoidable. It may not be correct to assume that a change in .trivyignore.yaml is less disruptive than one in Rego.

I assume that OPA is not yet stable as it is below v1, which is why breaking changes will be made at the time of becoming v1, but it is expected to be stable after reaching v1. Also, for migration to the new syntax, migration methods are supported at language level, such as opa fmt --refactor-local-variables. If changes are made to .trivyignore.yaml, it is difficult to provide such functionality.

In conclusion, our policy is to recommend Rego for more advanced filtering and implement it in .trivyignore.yaml for the basic filtering that most people need. Otherwise, we end up implementing very complex logic in .trivyignore.yaml. @simar7 should know more about how much need there is, so I'd defer to him.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants