grype image scan results non-deterministic #522
Comments
Hi @Dentrax, thanks for the issue! I saw you ran this command:
The CycloneDX output contains data that's known to be nondeterministic, like a timestamp. Because of this, there's no way to expect the digests of two scans to be identical. I see you ran Trivy with a template specified. You can do the same thing with Grype, and this gives you enough control of Grype's output to ensure that results are reproducible (and that you'd get the same digest between multiple scans). Does that make sense? |
I tried to pass @luhring Ability to pass custom templates would make sense! |
Cool! For how to use templates with Grype, see: https://github.com/anchore/grype#using-templates For the JSON output format (and possibly others), I think it's worth discussing whether we want to make the format deterministic. This would mean that we lose metadata like timestamps, but maybe that's okay. 🤔 |
Another thought... in the name of reproducible results, even with code changes to Grype's output formats, I think we should document the additional steps needed to be performed by the user in order to guarantee a reproducible result, such as:
|
Sounds so cool! Moreover, by performing these actions, maybe we can upload the deterministic scan result digest to So we can ensure any image |
That's interesting. Would we want to upload the scan signature+digest to Rekor? I'm not familiar with how this would fit into Fulcio yet.
There's another important point about reproducibility here: A given fixed image digest should be scanned frequently, and with the latest vulnerability data available at the time, because new vulnerabilities are discovered every day (and, even previously discovered vulnerabilities have their data in upstream data sources updated from time to time). With this recommended approach of scanning repeatedly, with new vulnerability data, we wouldn't want to assert that all scan results have the same digest. We'd want to allow for new vulnerability matches to be discovered, reported, and used as input to policies wherever appropriate. ^ This point might be obvious, but I wanted to make it explicit just in case, since we're talking about having an image scan produce consistent results. 😃 |
My bad, I meant Rekor. 🙈
Oh, now I clearly see the concern and why we should not assert the digests. But what if we are using the same vuln-db version? Let's assume we have the vuln-db versioned So we can push a tlog to Rekor such as: I scanned the image But I'm still not sure whether it makes sense, since we update the vuln-db every X hours. 🤷 |
Yup, exactly! We would be able to expect reproducible scan results in this particular scenario.
Yeah, I like this. And IMHO we should also provide more information about the vulnerability database, including its digest.
I think we should strive for reproducibility 💯 under the right circumstances. And we should think about how people will consume these kinds of vulnerability scan attestations and Rekor entries to make informed decisions about the security of their artifacts. |
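To make the "same image, same vuln-db" scenario above concrete, here is a minimal Python sketch of a record that ties a result digest to the inputs that determined it. It is an illustration only: `ScanRecord`, `make_record`, and `expect_same_result` are hypothetical names, the record layout is not an in-toto or Rekor format, and the result bytes are assumed to have been normalized already (no run timestamps).

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanRecord:
    """Hypothetical record layout; not an in-toto or Rekor schema."""
    image_digest: str   # digest of the scanned image, e.g. "sha256:..."
    db_checksum: str    # checksum of the vulnerability database used
    result_digest: str  # digest of the (already normalized) scan output


def make_record(image_digest: str, db_checksum: str, result: bytes) -> ScanRecord:
    """Hash the scan output and bind it to the image and database identifiers."""
    result_digest = "sha256:" + hashlib.sha256(result).hexdigest()
    return ScanRecord(image_digest, db_checksum, result_digest)


def expect_same_result(a: ScanRecord, b: ScanRecord) -> bool:
    """Result digests are only comparable for the same image and the same database."""
    return a.image_digest == b.image_digest and a.db_checksum == b.db_checksum
```

With this shape, a consumer would compare `result_digest` values only when `expect_same_result` returns true; a scan made with a newer database is expected to differ. A real record would probably also need to pin the scanner version, which this sketch leaves out.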
How should we proceed here? :) |
Not all output formats are guaranteed to be reproducible. For instance, CycloneDX can never be reproducible given that IDs are recommended to be random. That being said, there is a chance to make grype JSON documents reproducible:
# $ diff result1.json result2.json
134982c134982
< "file": "result1.json",
---
> "file": "result2.json",
135062c135062
< "timestamp": "2024-01-25T16:31:22.174899-05:00"
---
> "timestamp": "2024-01-25T16:31:36.511252-05:00"
Keeping a time element is critical to vulnerability scans, but there are two time elements in the JSON output:
{
"name": "grype",
"version": "0.74.0",
"configuration": {
...
},
"db": {
"built": "2024-01-25T01:27:56Z",
"schemaVersion": 5,
"location": ".../Library/Caches/grype/db/5",
"checksum": "sha256:0e70dc967985e5a56678500b60aefb9442183c03301261252c7abd7dfae92784",
"error": null
},
"timestamp": "2024-01-25T16:31:36.511252-05:00"
}
Note:
We could add an option that would remove the
This could be something like |
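Until such an option exists, a similar effect can be approximated outside of grype by dropping the volatile fields before hashing. The following is a minimal Python sketch, not a grype feature: `drop_paths` and `stable_digest` are hypothetical helper names, and the key paths `descriptor.timestamp` and `descriptor.configuration.file` are assumptions based on the diff above, so check them against your own output.

```python
import copy
import hashlib
import json

# Assumed locations of the two fields that differed in the diff above.
# These paths are assumptions; adjust them to match your grype JSON output.
VOLATILE_PATHS = (
    ("descriptor", "timestamp"),
    ("descriptor", "configuration", "file"),
)


def drop_paths(doc, paths=VOLATILE_PATHS):
    """Return a deep copy of doc with each key path removed, if present."""
    result = copy.deepcopy(doc)
    for path in paths:
        node = result
        # Walk down to the parent of the key to remove.
        for key in path[:-1]:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict):
            node.pop(path[-1], None)
    return result


def stable_digest(doc, paths=VOLATILE_PATHS):
    """SHA-256 over a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(drop_paths(doc, paths), sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Loading `result1.json` and `result2.json` with `json.load` and passing each to `stable_digest` should then yield the same value when only those two fields differ. The database `built` time and `checksum` are deliberately left in, so the digest still changes when the vulnerability data changes.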
What happened:
grype generates different output content for the same image, which breaks reproducibility.
The motivation comes from in-toto/attestation#58, to put the output result digest in the vuln spec. cc: @developer-guy
I'm not sure whether this is intentional or related to time/map objects.
What you expected to happen:
All output results for exactly the same IMAGE@sha256:digest should generate the same digest.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
I tried the same commands with Trivy, and both the SARIF and JSON output formats produced the same digest:
Maybe we can get help from the Trivy team, so cc'ing @knqyf263.
trivy: 0.21.2
Environment:
grype version: 0.26.1
OS (cat /etc/os-release or similar): macOS 11