grype image scan results non-deterministic #522

Open
Dentrax opened this issue Dec 8, 2021 · 10 comments
Labels
bug Something isn't working I/O Describes bug or enhancement around application input or output

Comments

@Dentrax

Dentrax commented Dec 8, 2021

What happened:

grype generates different output content for the same image, which breaks reproducibility.

The motivation comes from in-toto/attestation#58, which proposes putting the output result digest in the vuln spec. cc: @developer-guy

Not sure whether this is intentional or time/map object related.

What you expected to happen:

All output results for the exact same IMAGE@sha256:digest should produce the same digest.

How to reproduce it (as minimally and precisely as possible):

$ docker image pull golang:1.17
$ grype golang:1.17 --output cyclonedx --file result1
$ docker image rm golang:1.17
$ grype golang:1.17 --output cyclonedx --file result2
$ sha256sum result1
5ce609c4b26876f6394d57182346bbf46bc863753010b990b862d49caa9874ee result1
$ sha256sum result2
d1929364ecb727d14615d1b813f3c9b499ada7ef4979ff1c933c73444fdff84d result2
$ grype golang:1.17 --output json --file json1
$ grype golang:1.17 --output json --file json2
$ sha256sum json1
c9883e5a2c64631448145beaf20af51a1d085f3d47bba3db6555d28839a82072  json1
$ sha256sum json2
c0e2090127d7ba33f3b6f09732a26789f5c65f9c7068063507813793b35d44a3  json2

Anything else we need to know?:

I tried the same commands with trivy, and both the SARIF and JSON output formats produced the same digest:

$ trivy image --format template --template "@./sarif.tpl" -o report1.sarif golang:1.17
$ trivy image --format template --template "@./sarif.tpl" -o report2.sarif golang:1.17
$ sha256sum report1.sarif
2fcf23e25debe08f3b5490f077bd926d85afc67c94df28f173bd0aa766e4ce24  report1.sarif
$ sha256sum report2.sarif
2fcf23e25debe08f3b5490f077bd926d85afc67c94df28f173bd0aa766e4ce24  report2.sarif

Maybe we can get help from the trivy team so cc'ing @knqyf263.

trivy: 0.21.2

Environment:

  • Output of grype version: 0.26.1
  • OS (e.g: cat /etc/os-release or similar): macOS 11
@Dentrax Dentrax added the bug Something isn't working label Dec 8, 2021
@luhring
Contributor

luhring commented Dec 8, 2021

Hi @Dentrax, thanks for the issue!

I saw you ran this command:

grype golang:1.17 --output cyclonedx --file result1

The CycloneDX output contains data that's known to be nondeterministic, like a timestamp. Because of this, there's no way to expect the digests of two scans to be identical.

I see you ran Trivy with a template specified. You can do the same thing with Grype, and this gives you enough control of Grype's output to ensure that results are reproducible (and that you'd get the same digest between multiple scans).

Does that make sense?

@Dentrax Dentrax changed the title from "grype image scan results aren't reproducible" to "grype image scan results non-deterministic" Dec 8, 2021
@Dentrax
Author

Dentrax commented Dec 8, 2021

I tried passing the --output json flag, as you can see in the issue, but it produces non-deterministic digests too. I think it's related to what you said about cyclonedx (timestamps etc.).

@luhring Ability to pass custom templates would make sense!

@luhring
Contributor

luhring commented Dec 8, 2021

Cool!

For how to use templates with Grype, see: https://github.com/anchore/grype#using-templates

For the JSON output format (and possibly others), I think it's worth discussing whether we want to modify the format to become deterministic. This would mean that we lose metadata like timestamps, but maybe that's okay. 🤔

@luhring luhring added the I/O Describes bug or enhancement around application input or output label Dec 8, 2021
@luhring
Contributor

luhring commented Dec 8, 2021

Another thought... in the name of reproducible results, even with code changes to Grype's output formats, I think we should document the additional steps the user needs to perform to guarantee a reproducible result, such as:

  • obtaining the vulnerability database ahead of time, and telling Grype not to update the database at execution time
  • ensuring that the scan target itself is referenced in a deterministic way (e.g. an image digest)
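
The second point can be checked mechanically before a scan starts. A minimal sketch, assuming a hypothetical helper `is_digest_pinned` (not part of Grype) and the common `name@sha256:<64 hex characters>` reference form; the digest used in the demo is a placeholder, not a real image digest:

```python
import re

# Hypothetical helper, not part of Grype: accepts only references that end in
# an explicit sha256 digest, so a mutable tag such as golang:1.17 is rejected.
_DIGEST_REF = re.compile(r"[^@\s]+@sha256:[0-9a-f]{64}")


def is_digest_pinned(image_ref: str) -> bool:
    """Return True if image_ref pins the image by sha256 digest."""
    return _DIGEST_REF.fullmatch(image_ref) is not None


if __name__ == "__main__":
    placeholder_digest = "0" * 64  # stand-in value, not a real digest
    print(is_digest_pinned("golang:1.17"))                          # tag only
    print(is_digest_pinned(f"golang@sha256:{placeholder_digest}"))  # pinned
```

A tag can be re-pointed to a different image at any time, which is why the check rejects tag-only references.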

@Dentrax
Author

Dentrax commented Dec 8, 2021

  • obtaining the vulnerability database ahead of time, and telling Grype not to update the database at execution time
  • ensuring that the scan target itself is referenced in a deterministic way (e.g. an image digest)

Sounds so cool! Moreover, by performing these actions, maybe we can upload the deterministic scan result digest to fulcio Rekor. 🤔

So we can ensure any image foo@sha256:bar in this case, produces exactly baz scan result digest. I'm not sure what we could do with it later, but it would be a cool idea. cc: @dlorenc

@luhring
Contributor

luhring commented Dec 8, 2021

That's interesting. Would we want to upload the scan signature+digest to Rekor? I'm not familiar with how this would fit into Fulcio yet.


we can ensure any image foo@sha256:bar in this case, produces exactly baz scan result digest

There's another important point about reproducibility here: A given fixed image digest should be scanned frequently, and with the latest vulnerability data available at the time, because new vulnerabilities are discovered every day (and, even previously discovered vulnerabilities have their data in upstream data sources updated from time to time).

With this recommended approach of scanning repeatedly, with new vulnerability data, we wouldn't want to assert that all scan results have the same digest. We'd want to allow for new vulnerability matches to be discovered, reported, and used as input to policies wherever appropriate.

^ This point might be obvious, but I wanted to make it explicit just in case, since we're talking about having an image scan produce consistent results. 😃

@Dentrax
Author

Dentrax commented Dec 8, 2021

I'm not familiar with how this would fit into Fulcio yet.

My bad, I meant Rekor. 🙈

we wouldn't want to assert that all scan results have the same digest.

Oh, now I clearly see the concern and why we should not assert the digests. But what if we are using the same vuln-db version? Let's assume we have the vuln-db versioned v1. And 2 same images with the same digests. In this case, would it make sense to assert that all scan results have the same digest?

So we can push a tlog to Rekor such as: I scanned the image foo@sha256:bar against vuln-db v1 using grype v0.26.1 and I expect a JSON output that has digest qux.

But still not so sure whether it makes sense since we update the vuln-db every X hour. 🤷

@luhring
Contributor

luhring commented Dec 8, 2021

But what if we are using the same vuln-db version? Let's assume we have the vuln-db versioned v1. And 2 same images with the same digests. In this case, would it make sense to assert that all scan results have the same digest?

Yup, exactly! We would be able to expect reproducible scan results in this particular scenario.

So we can push a tlog to Rekor such as: I scanned the image foo@sha256:bar against vuln-db v1 using grype v0.26.1 and I expect a JSON output that has digest qux.

Yeah, I like this. And IMHO we should also provide more information about the vulnerability database, including its digest.

But still not so sure whether it makes sense since we update the vuln-db every X hour.

I think we should strive for reproducibility 💯 under the right circumstances. And we should think about how people will consume these kinds of vulnerability scan attestations and Rekor entries to make informed decisions about the security of their artifacts.
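
To make the proposed statement concrete, here is a minimal sketch of such a record. The field names and the `scan_statement` helper are assumptions made for illustration; this is neither the Rekor entry schema nor the in-toto vuln spec. Keys are sorted and separators are fixed so that the same inputs always serialize to the same bytes.

```python
import hashlib
import json


def scan_statement(image: str, db_checksum: str, scanner: str,
                   scanner_version: str, result_digest: str) -> bytes:
    """Serialize a scan claim deterministically (hypothetical field names)."""
    record = {
        "image": image,
        "scanner": {"name": scanner, "version": scanner_version},
        "vulnerabilityDB": {"checksum": db_checksum},
        "result": {"digest": result_digest},
    }
    # sort_keys plus fixed separators yield identical bytes for identical input.
    text = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


if __name__ == "__main__":
    # Placeholder values echoing the discussion, not real digests.
    statement = scan_statement(
        image="foo@sha256:bar",
        db_checksum="sha256:<db checksum>",
        scanner="grype",
        scanner_version="0.26.1",
        result_digest="sha256:qux",
    )
    print(statement.decode("utf-8"))
    print(hashlib.sha256(statement).hexdigest())
```

Including the database checksum in the record ties the expected result digest to one specific database build, which is the scenario in which reproducible results can be expected.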

@Dentrax
Author

Dentrax commented Feb 7, 2022

How should we proceed here? :)

@tgerla tgerla assigned tgerla and unassigned luhring and tgerla Aug 11, 2022
@wagoodman
Contributor

Not all output formats are guaranteed to be reproducible. For instance, CycloneDX can never be reproducible given that IDs are recommended to be random.

That being said, there is a chance to make grype JSON documents reproducible:

❯ grype golang:1.17 --output json --file result1.json
 ✔ Vulnerability DB                [no update available]
 ✔ Loaded image                                                                                                                                golang:1.17
 ✔ Parsed image                                                                    sha256:8685b3216ef4a80742c4d5f29f547838997cc0c7cca68222cfdab7c6821ccf5b
 ✔ Scanned for vulnerabilities     [1130 vulnerability matches]
   ├── by severity: 36 critical, 288 high, 308 medium, 32 low, 448 negligible (18 unknown)
   └── by status:   443 fixed, 687 not-fixed, 0 ignored
A newer version of grype is available for download: 0.74.2 (installed version is 0.74.0)

❯ grype golang:1.17 --output json --file result2.json
 ✔ Vulnerability DB                [no update available]
 ✔ Loaded image                                                                                                                                golang:1.17
 ✔ Parsed image                                                                    sha256:8685b3216ef4a80742c4d5f29f547838997cc0c7cca68222cfdab7c6821ccf5b
 ✔ Scanned for vulnerabilities     [1130 vulnerability matches]
   ├── by severity: 36 critical, 288 high, 308 medium, 32 low, 448 negligible (18 unknown)
   └── by status:   443 fixed, 687 not-fixed, 0 ignored
A newer version of grype is available for download: 0.74.2 (installed version is 0.74.0)
❯ diff result1.json result2.json
134982c134982
<    "file": "result1.json",
---
>    "file": "result2.json",
135062c135062
<   "timestamp": "2024-01-25T16:31:22.174899-05:00"
---
>   "timestamp": "2024-01-25T16:31:36.511252-05:00"

Keeping a time element is critical to vulnerability scans, but there are two time elements in the JSON output:

❯ cat result2.json | jq '.descriptor'
{
  "name": "grype",
  "version": "0.74.0",
  "configuration": {
    ...
  },
  "db": {
    "built": "2024-01-25T01:27:56Z",
    "schemaVersion": 5,
    "location": ".../Library/Caches/grype/db/5",
    "checksum": "sha256:0e70dc967985e5a56678500b60aefb9442183c03301261252c7abd7dfae92784",
    "error": null
  },
  "timestamp": "2024-01-25T16:31:36.511252-05:00"
}

Note:

  • .descriptor.timestamp: when grype was invoked
  • .descriptor.db.built: the time the data was sourced and built into the DB
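
To confirm programmatically which fields differ between two scans, a structural comparison can complement the line-based diff shown earlier. A minimal sketch, assuming a hypothetical helper `diff_paths` (not part of grype or jq) and tiny stand-in documents rather than real grype output:

```python
import json
from typing import Any, Iterator


def diff_paths(a: Any, b: Any, path: str = "") -> Iterator[str]:
    """Yield jq-style paths at which two parsed JSON values differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(set(a) | set(b)):
            child = f"{path}.{key}"
            if key not in a or key not in b:
                yield child  # key present on one side only
            else:
                yield from diff_paths(a[key], b[key], child)
    elif isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        for i, (x, y) in enumerate(zip(a, b)):
            yield from diff_paths(x, y, f"{path}[{i}]")
    elif type(a) is not type(b) or a != b:
        yield path or "."  # differing leaf, type, or list length


if __name__ == "__main__":
    # Stand-in documents; real ones would be loaded with json.load().
    doc1 = json.loads('{"matches": [], "descriptor": '
                      '{"timestamp": "2024-01-25T16:31:22.174899-05:00"}}')
    doc2 = json.loads('{"matches": [], "descriptor": '
                      '{"timestamp": "2024-01-25T16:31:36.511252-05:00"}}')
    for differing_path in diff_paths(doc1, doc2):
        print(differing_path)
```

Lists are compared position by position, so this sketch assumes both documents list their entries in the same order.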

We could add an option that would remove the .descriptor.timestamp from the grype output, which would make results reproducible when the same configuration/DB is being used. For use cases where you are using different DBs or configurations, you need to select the part of the grype document you want to compare, for example:

❯ cat result1.json | jq '.matches' | sha256sum
d149e542ee35687266abd6cef70b0038131ee854eb0750d98244acf2c3d760b6  -

❯ cat result2.json | jq '.matches' | sha256sum
d149e542ee35687266abd6cef70b0038131ee854eb0750d98244acf2c3d760b6  -

The option to drop the timestamp could be something like GRYPE_TIMESTAMP=false (env), but probably not a CLI flag.
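
Until such an option exists, both ideas can be approximated outside of grype. The sketch below is illustrative only: `stable_digest` and `matches_digest` are hypothetical helpers, and the only field paths assumed are `.descriptor.timestamp` and `.matches` as shown in this comment. The digests will not equal the `jq ... | sha256sum` values above because the JSON is re-serialized differently. Note also that the diff above shows the output file name recorded in the document, so full-document digests only match when both runs write to the same file name.

```python
import copy
import hashlib
import json
from typing import Any


def _canonical_sha256(value: Any) -> str:
    """Hash a parsed JSON value using a fixed, sorted serialization."""
    data = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def stable_digest(document: dict) -> str:
    """Digest of the whole document with .descriptor.timestamp removed."""
    doc = copy.deepcopy(document)  # leave the caller's document untouched
    descriptor = doc.get("descriptor")
    if isinstance(descriptor, dict):
        descriptor.pop("timestamp", None)
    return _canonical_sha256(doc)


def matches_digest(document: dict) -> str:
    """Digest of the .matches array only, ignoring all other metadata."""
    return _canonical_sha256(document.get("matches", []))


if __name__ == "__main__":
    # Tiny stand-ins for two scans; real documents come from json.load().
    scan1 = {"matches": [], "descriptor": {
        "version": "0.74.0", "timestamp": "2024-01-25T16:31:22.174899-05:00"}}
    scan2 = {"matches": [], "descriptor": {
        "version": "0.74.0", "timestamp": "2024-01-25T16:31:36.511252-05:00"}}
    print(stable_digest(scan1) == stable_digest(scan2))
    print(matches_digest(scan1) == matches_digest(scan2))
```

Hashing only `.matches` is the more robust choice when the DB or configuration may differ between runs, since those details live outside the matches array.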
