
Defining a generalized predicate format for "human reviews" of artifacts #77

Open
adityasaky opened this issue Dec 5, 2021 · 27 comments · May be fixed by #151
Comments

@adityasaky
Member

adityasaky commented Dec 5, 2021

We started discussing what code review attestations should look like, and @iamwillbar suggested checking if we could define a generalized predicate for reviews. We could then derive code review and other formats (perhaps vuln scans like #58) from this general predicate. As a starting point, we've potentially got the following sections in the predicate.

Edit: the general consensus is that we should only handle attestations for human reviews here, so that's what we're going to be focusing on.

Meta Information

In the case of code reviews, this could identify the person performing the review, but in the case of a vuln scan, it could identify the scanner used. One thing to note is that this may be unnecessary in the case of some reviews, because we could use the signature to identify the functionary as we do with links. Further, we could capture temporal information about when the review was performed.

Artifact Information

This section essentially identifies the artifacts the review applies to. We can incorporate ITE-4 here as well. One project we're looking to work with is crev, so we could use something like purl as well to identify package-level reviews as a whole. One open question may be reviews for diffs, and how they'd chain together to apply to a file as a whole.

Edit: we can probably lean on VCS semantics via ITE-4 and tie CRs to the underlying VCS changes / commits they correspond to. Also, as noted below, this would be part of the statement's subject rather than the predicate, but it was included here to nail down exactly what we're likely to be capturing in a review.

Review Result

This is pretty self explanatory for both CRs and vuln scans. I'm not sure if a negative result CR should exist, except maybe for package-results as they're used in crev.


These are all early thoughts in defining a generalized review format, and I'm curious to hear what people think about this. Also open to hearing about other projects like crev we should be paying attention to when working on this.

@dn-scribe

I suggest first creating a list of possible policies; some examples that are in line with your thoughts follow. Additions are welcome. (In parentheses: requirements for the code review predicate.)
At the statement level:

  1. Do not push code without a code review. (The statement level should answer that)
  2. Code reviews shall be done by authorized personnel (The statement level should answer that)

At the predicate level:

  1. Do not push code that was not approved (predicate should include a status that can have the value 'approved')
  2. Every code review must have artifacts (code fixes, document, comment) (predicate should have a resulting_artifacts field with links to such artifacts).
  3. If the reviewer did not approve, an update should be issued within a certain time. (predicate should include status with the value 'not approved' + time stamp).
  4. If the reviewer approved conditionally, require the developer to attest to complying with the conditions (predicate should include status 'conditionally approved').
  5. Code review should include reviewing tests. (the predicate should include a field of reviewed_artifacts with a list of such artifacts with links).
  6. A policy for safety related products: Code review requires participation of a set of participants (QA, DevOps, etc.) (calls for a field 'participants').
  7. Code reviewer should attest to going through some checklist (predicate should include a field 'tests' which could be a list of objects)
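The predicate fields implied by the list above could be sketched roughly as follows. This is only an illustration: every field name here (status, resulting_artifacts, reviewed_artifacts, participants, checklist) is a hypothetical choice, not part of any defined spec.

```python
# Hypothetical code-review predicate sketch derived from the policy list
# above. All field names are illustrative assumptions, not a spec.
ALLOWED_STATUSES = {"approved", "not approved", "conditionally approved"}

review_predicate = {
    "status": "conditionally approved",       # policies 1, 3, 4
    "timestamp": "2021-12-05T12:00:00Z",      # policy 3: deadline tracking
    "resulting_artifacts": [                  # policy 2: fixes, docs, comments
        {"uri": "https://example.com/review-comment/1"}
    ],
    "reviewed_artifacts": [                   # policy 5: tests were reviewed
        {"uri": "git+https://example.com/repo@abc123:tests/test_api.py"}
    ],
    "participants": [                         # policy 6: required roles
        {"id": "mailto:qa@example.com", "role": "QA"}
    ],
    "checklist": [                            # policy 7
        {"item": "error handling reviewed", "passed": True}
    ],
}

def predicate_is_well_formed(pred):
    """Minimal structural check against the sketch above."""
    return pred.get("status") in ALLOWED_STATUSES and "timestamp" in pred

print(predicate_is_well_formed(review_predicate))  # True
```

A real predicate would need to pin down which fields are required vs optional per policy; the sketch only shows that each listed policy maps onto a concrete field.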

Regarding using the same attestation for code reviews and for code-scanning:
I suggest not.

  1. Vuln and SAST scans are automated testing steps; they do not replace a code review, which has the value of being done by another developer and naturally produces different results (structural suggestions, backward compatibility issues, integration issues - things that are not produced by a code scanning tool).
  2. Vuln and SAST results fit into well-defined formats (like SARIF, which is supported by many SAST tools), while code review has no clear need for a similar format.

Regarding the artifact information:
Isn't this the goal of the subject (at the statement level)?
Since the attestation format allows for multiple subjects, the code review attestation could require a subject at the project level (project name or PR ID) and additional subjects for the artifacts.

@adityasaky
Member Author

Regarding the artifact information:
Isn't this the goal of the subject (at the statement level)?
Since the attestation format allows for multiple subjects, the code review attestation could require a subject at the project level (project name or PR ID) and additional subjects for the artifacts.

It is! In my mind, I was trying to separate out the types of artifacts. IMO files and packages have some reasonable answers, but I'm more curious about diffs/patches/changes. I think we can lean on VCS semantics here, either service-specific (ITE-4 resolvers for GitHub PRs) or agnostic (git commit ranges).
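To make the two options concrete, a statement scoping a review to a change might carry both a change-level subject and per-file subjects. Both URI forms and all digests below are made-up placeholders; ITE-4 does not mandate these exact schemes.

```python
# Hypothetical multi-subject statement for a code review. URI schemes
# and digests are illustrative only.
statement = {
    "_type": "https://in-toto.io/Statement/v0.1",
    "subject": [
        # Service-specific form (an ITE-4 resolver for GitHub PRs)...
        {"name": "github://org/repo/pull/42", "digest": {"sha1": "0" * 40}},
        # ...or a VCS-agnostic form (a git commit range):
        {"name": "git+https://example.com/repo@abc123..def456",
         "digest": {"sha1": "1" * 40}},
        # Plus the individual files touched by the change:
        {"name": "src/main.py", "digest": {"sha256": "a" * 64}},
    ],
    "predicateType": "https://example.com/code-review/v0.1",  # placeholder
    "predicate": {"status": "approved"},
}

# A policy could require at least one change-level subject:
change_subjects = [s for s in statement["subject"]
                   if "/pull/" in s["name"] or ".." in s["name"]]
print(len(change_subjects))  # 2
```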

At the statement level

I think this makes sense, and we can likely bundle all the meta-information here.

Every code review must have artifacts (code fixes, document, comment) (predicate should have a resulting_artifacts field with links to such artifacts).

Hmm, this is interesting. I was visualizing something more along the lines of "this is approved" after several iterations have addressed reviewer points. I do wonder how this would work when commit-history is rewritten (say squashed) pre-merge, and if we could still connect the different attestations in such cases.

Code reviewer should attest to going through some checklist

This makes sense, and something to attach to the policy for reviews as well :)

Vuln and SAST scans are automated testing steps; they do not replace a code review, which has the value of being done by another developer and naturally produces different results (structural suggestions, backward compatibility issues, integration issues - things that are not produced by a code scanning tool).

I think I agree, given the type of CR attestation you have in mind vs what I was thinking of. I'm interested to hear what other folks have in mind too.

@TomHennen
Contributor

I agree with @dn-scribe that it probably doesn't make sense to combine vuln scans and code reviews in a single type.

I also agree that leaning on the VCS (or code review system) semantics for code review makes sense. In my mind each could have been separate 'types'. Is the desire to have a generic way to determine if a given commit was reviewed regardless of the VCS/review system used?

Where do you imagine the code review attestations being used?

Would they be used:

a) when looking at a source control system to see what was reviewed and what wasn't?
b) when looking at artifacts compiled from source in a VCS?
c) other?

If 'b' then how does the evaluator ever know that all the source was reviewed vs just bits and pieces?

I think there may be some overlap with #47 which aims to help solve 'b' but doesn't do much for 'a'. (I should update that with our proposal today, it's been too long)

@iamwillbar

I wouldn't combine vuln scans (since those are automated scans); however, I think there's value in a base "human reviewed artifact and the outcome was X" attestation that can be used as-is or specialized when there is a need for X to be more structured. For example, we may want to capture that a threat model was reviewed; this may be a code review, but it may also be outside one (such as a document or presentation). There are also reviews of people's identification, organizations' certifications, services' terms of service, etc. that organizations may want to be able to capture as attestations.

Code review attestations can then specialize this by constraining the artifact and providing a code-review specific structure to capture the outcome.

@adityasaky
Member Author

Given the consensus about handling vuln scans and the like independently, I'm going to generalize this to handle "human reviews", as @iamwillbar suggested. I'll rewrite the original issue to reflect this.

Is the desire to have a generic way to determine if a given commit was reviewed regardless of the VCS/review system used?

In my mind, I think we can lean on ITE-4 semantics here, and talk in abstract, VCS-agnostic terms when defining how CRs for a particular change work. I've primarily used git, and I'm guessing mercurial is a system we want to consider (and others?). A quick look at the git-mercurial rosetta seemed to indicate that this could work for these systems at least, what do you think?

If 'b' then how does the evaluator ever know that all the source was reviewed vs just bits and pieces?

Would VCS semantics help here? We could capture the source being built against a checkpoint in the VCS, and cross reference with code reviews available? We'd have to be smart about connecting differential CRs. We'd also have to be sure that connecting a series of CRs mapped to changes is sufficient as a review of the codebase as a whole at that checkpoint. I suspect it is, and matches the typical workflow.
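The cross-referencing idea above could, under strong simplifying assumptions (linear history, one attestation per commit), look roughly like the sketch below. Real histories with merges, rebases, and squashes need far more care, as noted.

```python
# Toy sketch: verify every change up to a checkpoint has a review
# attestation. Assumes linear history and one attestation per commit;
# merge/rebase/squash handling is deliberately out of scope.
def checkpoint_fully_reviewed(history, review_attestations):
    """history: commit ids oldest-first, ending at the checkpoint.
    review_attestations: dict mapping commit id -> review status."""
    return all(
        review_attestations.get(commit) == "approved" for commit in history
    )

history = ["abc1", "def2", "ghi3"]
reviews = {"abc1": "approved", "def2": "approved", "ghi3": "approved"}
print(checkpoint_fully_reviewed(history, reviews))  # True

reviews_missing = {"abc1": "approved", "ghi3": "approved"}
print(checkpoint_fully_reviewed(history, reviews_missing))  # False
```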

Where do you imagine the code review attestations being used?

I'd love broader community input here, specifically for corporate use cases. I personally see this as fulfilling the SLSA two-party review requirements, using a threshold of CR attestations to sign off on changes.

@adityasaky adityasaky changed the title Defining a generalized predicate format for "reviews" Defining a generalized predicate format for "human reviews" of artifacts Dec 12, 2021
@TomHennen
Contributor

Is the desire to have a generic way to determine if a given commit was reviewed regardless of the VCS/review system used?

In my mind, I think we can lean on ITE-4 semantics here, and talk in abstract, VCS-agnostic terms when defining how CRs for a particular change work. I've primarily used git, and I'm guessing mercurial is a system we want to consider (and others?). A quick look at the git-mercurial rosetta seemed to indicate that this could work for these systems at least, what do you think?

I suspect we might need to consider both VCS semantics as well as the code review system semantics. My understanding is that Gerrit and GitHub reviews work differently. Is that right?

If 'b' then how does the evaluator ever know that all the source was reviewed vs just bits and pieces?

Would VCS semantics help here? We could capture the source being built against a checkpoint in the VCS, and cross reference with code reviews available? We'd have to be smart about connecting differential CRs. We'd also have to be sure that connecting a series of CRs mapped to changes is sufficient as a review of the codebase as a whole at that checkpoint. I suspect it is, and matches the typical workflow.

I'd be wary of just taking this approach. For repos that are old and have had lots of reviews, that could result in thousands of attestations per repo that need to be evaluated at time of use. The verifier would also need access to all those attestations, and propagating them all through CI/CD would probably be infeasible.

That being said, I do think this could be used very effectively to audit a repo and make sure it's living up to what it says it's doing. E.g. using this approach (an attestation for each review) in combination with #47 (comment) (which only has the repo claim "hey everything in this repo requires 2 party review") could be very effective.

With #47 (comment) systems can easily and quickly make decisions about artifacts without having to communicate thousands of attestations, while the attestation-per-review (as discussed here) approach allows for automated auditing of repos and much more information about the individual reviews.

WDYT?

@adityasaky
Member Author

My understanding is that Gerrit and GitHub reviews work differently.

I actually plan to use some time this coming break to review some of the popular systems / services to understand how much we can abstract. I should have a better answer then.

For repos that are old and have had lots of reviews, that could result in thousands of attestations per repo that need to be evaluated at time of use.

That's a great point. I do wonder if we can use something akin to a summary link, but instead to replace some significant number of CR attestations. The summary would identify the attestations it has replaced as well as the checkpoint it applies to, and without getting too much into the implementation of this, I could see the replaced attestations being fetched from a Rekor instance or a Grafeas server perhaps for further validation when necessary.
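A summary attestation of this kind might carry fields like the ones below. Every field name here is hypothetical, and fetching the replaced attestations from a transparency log (e.g. Rekor) for later audit is only gestured at, not implemented.

```python
# Hypothetical "summary" review attestation that stands in for many
# individual code-review attestations up to a checkpoint. All field
# names are invented for illustration.
summary = {
    "predicateType": "https://example.com/review-summary/v0.1",  # placeholder
    "predicate": {
        "checkpoint": "git+https://example.com/repo@def456",
        "replaced_attestations": [
            {"digest": {"sha256": "1" * 64}},
            {"digest": {"sha256": "2" * 64}},
        ],
        "verdict": "approved",
    },
}

# A verifier at time-of-use only needs the summary; an auditor can
# later resolve replaced_attestations against a log for validation.
print(len(summary["predicate"]["replaced_attestations"]))  # 2
```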

With #47 (comment) systems can easily and quickly make decisions about artifacts without having to communicate thousands of attestations, while the attestation-per-review (as discussed here) approach allows for automated auditing of repos and much more information about the individual reviews.

I think this is sane.

@adityasaky
Member Author

I actually plan to use some time this coming break to review some of the popular systems / services to understand how much we can abstract. I should have a better answer then.

I currently think we can use VCS semantics to identify the snippets of code a review applies to. We could maybe define a boolean value that abstracts an approval / rejection in a review across systems like GitHub ("Approve") and Gerrit ("+2") with the actual CR system specific information also in the predicate, but this may be unnecessary considering the corresponding policies for an attestation will also be CR-specific. What do you think? I plan to start working on the specifics of the attestation format taking these as examples, I suspect it'll be easier to discuss them then.
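The boolean abstraction could be sketched as a normalization layer that keeps the system-specific verdict alongside the normalized value. The verdict strings and mappings below are assumptions for illustration, not a survey of how GitHub or Gerrit actually encode reviews.

```python
# Toy normalization of system-specific review verdicts to a boolean
# approval, keeping the raw verdict for system-specific policies.
# The verdict strings and mappings are illustrative assumptions.
APPROVAL_MAP = {
    ("github", "APPROVE"): True,
    ("github", "REQUEST_CHANGES"): False,
    ("gerrit", "+2"): True,
    ("gerrit", "-1"): False,
}

def normalize(system, verdict):
    return {
        "system": system,
        "raw_verdict": verdict,   # kept in the predicate for CR-specific policy
        "approved": APPROVAL_MAP.get((system, verdict), False),
    }

print(normalize("gerrit", "+2")["approved"])  # True
```

An unknown (system, verdict) pair defaults to not-approved here, which keeps the abstraction fail-closed.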

@TomHennen
Contributor

Would it make any sense to start with some definition of the policies we'd like to evaluate on these predicates?

@adityasaky
Member Author

Makes sense, starting with #77 (comment)?

@TomHennen
Contributor

That is a pretty exhaustive list of things that could go into an attestation. Do we have any thoughts about who the audience for the attestations would be? Would the goal be for it to cross organizational boundaries and still be understandable? E.g. would a list of participants that maps to DevOps, QA, etc... make sense outside an organization where people have different policies for code review?

I'll admit that my primary goal for some code review attestation is just to be able to tell if a given built artifact meets the SLSA source requirements. Beyond that, it doesn't matter to me exactly what procedures were followed. It could be counter-productive in some sense if, from SLSA's perspective, we had to care about the difference between 'approved conditionally', 'approved', and 'reviewed'.

But I'm sure there are audiences beyond SLSA!

@adityasaky
Member Author

audiences

That's a good question. I'd like to hear everyone's thoughts, but going back to @dn-scribe's comment above, I can see a couple of use cases where source control attestations may not suffice. For example, here we could verify that a person defined in a CODEOWNERS file reviewed a particular PR (as of a particular commit), allowing more granularity than just the set of people who have the commit/merge bit set in a repository. This also ties in with the roles @dn-scribe mentioned above, such as dedicated QA teams and the like. I'm also curious about the scope for defining policies around conditional approval, because pre-merge there should be a full approval anyway? I'd also like to better understand some sample policies for review-specific artifacts like the documents and comments.

Going back to crev-like attestations, the use cases are probably more clearly defined. The subject would be a package + version, perhaps identified using pURL, and the policy is a straightforward "was this dependency version audited by a trusted reviewer".

@colek42
Member

colek42 commented Jan 24, 2022

Relevant: https://paragonie.com/blog/2022/01/solving-open-source-supply-chain-security-for-php-ecosystem

We are looking to create a "code review" attestation for witness. The prior work by the PHP team looks like it could fit our customers' needs.

We plan on using some sort of user auth flow (OIDC) to sign the attestation collection we create.

witness sign commit --key-provider=fulcio --review="sec-audit" --comment="LGTM"

Witness supports custom types, which is our current way forward, but we would love to adopt a community standard.

vote-against: "This is malicious; don't install it."
reproduced: "We were able to reproduce this update from the source code." Requires open source software.
spot-check: "We quickly looked at the code and didn't identify anything obviously malicious, like an infinite loop or a cryptocurrency miner. It's probably not terrible."
code-review: "We thoroughly reviewed this software update, and we have some concept of how the changes fit into the larger project, and we think it's fine."
sec-audit: "We're security experts and took the time to deliberately audit this update, its dependencies, how its dependencies are used, etc."

@TomHennen
Contributor

That makes sense to me. One thing that could be clarified is what scope each of those statements apply to.

E.g. does 'code-review' apply to just a single change while 'sec-audit' applies to the entire repo?

@joshuagl
Contributor

That makes sense to me. One thing that could be clarified is what scope each of those statements apply to.

The scope in Gossamer is a versioned release (see their docs). It's all fairly loose at the moment, though.

@TomHennen
Contributor

TomHennen commented Jan 26, 2022

Gotcha. So I think it would be pretty easy to create a predicate type that indicates the various items mentioned above. It should probably be clear about what the scope is. Something like...

{
  "_type": "https://in-toto.io/Statement/v0.1",
  // Assuming the release is a file and we can just hash it.
  "subject": [{"name": "_", "digest": {"sha256": "5678..."}}],
  "predicateType": "https://slsa.dev/review/versionedRelease/v0.1",
  "predicate": {
    "reviews": [
      "code-review", // Indicates all the code in the release was reviewed.
      "sec-audit"    // Indicates all the code in the release was security audited.
    ],
    "attestor": { "id": "mailto:person@example.com" }
  }
}
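A policy over a predicate shaped like that example could be a simple set check. The sketch below reuses the field names from the example above; the required review types are just one possible policy choice.

```python
# Minimal policy sketch over a versionedRelease-style predicate:
# require that the release was both code-reviewed and security-audited.
def meets_policy(predicate, required=frozenset({"code-review", "sec-audit"})):
    return required.issubset(set(predicate.get("reviews", [])))

predicate = {
    "reviews": ["code-review", "sec-audit"],
    "attestor": {"id": "mailto:person@example.com"},
}
print(meets_policy(predicate))                  # True
print(meets_policy({"reviews": ["spot-check"]}))  # False
```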

@joshuagl
Contributor

+1

Reasoning out loud: we would include the attestor in the object because we want that id to be explicit and in authenticated data, rather than relying on an undefined (outside of the scope of in-toto attestations and the recommended DSSE envelope) mechanism to map the unauthenticated keyid in the envelope to an attesting entity.

Should the attestor object support multiple `id`s, such that a single company could sign an attestation for multiple reviewers?

@TomHennen
Contributor

TomHennen commented Jan 26, 2022

we would include the attestor in the object because we want that id to be explicit and in authenticated data

Yes exactly. We take the same approach with builder.id in slsa.dev/provenance.
We have an internal map that says something like "keyid: [a, b, c] can sign things for builder.id = 'foo'". This lets us write fairly straightforward policies referencing builder.id instead of all the keys that could be used to sign an attestation.
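That keyid-to-identity map might look like the sketch below. All keyids and identities are made up for illustration.

```python
# Sketch of the internal map described above: which signing keys are
# trusted to sign attestations for a given attestor/builder identity.
# Keyids and identity strings are hypothetical.
TRUSTED_KEYS = {
    "mailto:person@example.com": {"keyid-a", "keyid-b"},
    "https://builder.example.com": {"keyid-c"},
}

def signature_matches_identity(keyid, claimed_id):
    """Policy references the identity; this resolves it to trusted keys."""
    return keyid in TRUSTED_KEYS.get(claimed_id, set())

print(signature_matches_identity("keyid-a", "mailto:person@example.com"))  # True
print(signature_matches_identity("keyid-c", "mailto:person@example.com"))  # False
```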

Should the attestor object support multiple `id`s, such that a single company could sign an attestation for multiple reviewers?

I think that would make things like the above description harder, so I'd advise against it?

@colek42
Member

colek42 commented Jan 27, 2022

In witness we solve the problem of multiple attestations with an attestationCollection; this lets us scope attestation code narrowly, but bind the attestations together for purposes of trust.

For example, an attestationCollection may contain the following attestations for this use case ["git", "oidc", "review", "tpm"] - this would bind the trust from OIDC auth, the review data, the hardware identity of the machine used, and the commit status of the code being reviewed. I don't think we need to couple the identity data with the actual review as it limits the ability for users to use custom auth patterns. I am also a bit hesitant in coupling the actual commit hash to the review data. I would like to support cases in which git is not used, i.e. data sets, tarballs, artwork, etc.

@adityasaky
Member Author

vote-against: "This is malicious; don't install it."
reproduced: "We were able to reproduce this update from the source code." Requires open source software.
spot-check: "We quickly looked at the code and didn't identify anything obviously malicious, like an infinite loop or a cryptocurrency miner. It's probably not terrible."
code-review: "We thoroughly reviewed this software update, and we have some concept of how the changes fit into the larger project, and we think it's fine."
sec-audit: "We're security experts and took the time to deliberately audit this update, its dependencies, how its dependencies are used, etc."

I took a quick look at how this maps to crev's fields, which are more granular, and tried to map them to Gossamer's. For spot-check, code-review, and sec-audit, we should probably map each to one value for rating, and then think about whether other combinations of thoroughness + rating are important to consider as well. Also, in the crev mapping, code-review seems to be a subset of a sec-audit, while that's not necessarily the case with the Gossamer options.

| Gossamer | crev |
| --- | --- |
| vote-against | rating = dangerous |
| reproduced | I'm not certain this should be part of this attestation type |
| spot-check | thoroughness = low, rating = neutral / positive / strong? |
| code-review | thoroughness = medium, rating = neutral / positive / strong? |
| sec-audit | thoroughness = high, rating = neutral / positive / strong? |
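The mapping above could be encoded as a simple lookup. Where the rating is left open ("neutral / positive / strong?"), "neutral" is used below purely as a placeholder, not a recommendation.

```python
# Tentative Gossamer -> crev field mapping from the table above.
# Open rating choices are filled with "neutral" as a placeholder only.
GOSSAMER_TO_CREV = {
    "vote-against": {"rating": "dangerous"},
    "spot-check": {"thoroughness": "low", "rating": "neutral"},
    "code-review": {"thoroughness": "medium", "rating": "neutral"},
    "sec-audit": {"thoroughness": "high", "rating": "neutral"},
    # "reproduced" is intentionally unmapped; it may not belong in this
    # attestation type at all.
}

print(GOSSAMER_TO_CREV["sec-audit"]["thoroughness"])  # high
```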

Gotcha. So I think it would be pretty easy to create a predicate type that indicates the various items mentioned above. It should probably be clear about what the scope is. Something like...

Using subjects to scope the review sounds good to me; we can use ITE-4 semantics there for non-file artifacts as well. I'm less certain about using different predicate types for versioned releases vs reviews not necessarily attached to a versioned release. Should this only remain the purview of the subject field?

@MarkLodato
Contributor

Note: I recommend against negative votes because it violates the monotonic principle: if someone deletes the attestation, it would cause a policy to go from DENY to ALLOW. Instead, it's better to consider only positive votes so that you don't have to worry about deletion as an attack vector.

As discussed today, I think the next step would be to create an ITE to describe the proposal in detail and then map that to specific use cases. In particular, it would be great to hear from specific customers (e.g. @TomHennen) on how they would use the attestation and if it would work for their use case.

@SantiagoTorres
Member

Bump. I wonder if we'd like to bring this into the discussion.

@jnaulty

jnaulty commented Feb 17, 2022

Hey @SantiagoTorres I was thinking of this issue too when I started reviewing the Sig v2 design: https://gist.github.com/lrvick/d4b87c600cc074dfcd00a01ee6275420?permalink_comment_id=4065601#gistcomment-4065601

I brought this gist up this morning at the OpenSSF Supply Chain Integrity meeting, I'm glad it might gain traction in this particular issue thread :)

@lrvick, the author of the gist and I are always down to chat on matrix.
I'm very interested in exploring the intersection of these two problems (e.g., the roots behind Sig v2 and human-reviews in in-toto).

Are any of you all active in #in-toto:libera.chat? It would be great to have more 'real-time' chats for people who don't use Slack ;)

@adityasaky
Member Author

@MarkLodato do you have any thoughts on how to avoid negative votes for dependency reviews?

@adityasaky
Member Author

The Sig v2 spec looks great! I've looked at crev a bit but haven't come across https://github.com/git-wotr/spec/blob/master/design.org before. Taking a look...

@MarkLodato
Contributor

Note: I recommend against negative votes because it violates the monotonic principle: if someone deletes the attestation, it would cause a policy to go from DENY to ALLOW. Instead, it's better to consider only positive votes so that you don't have to worry about deletion as an attack vector.

Let me add more nuance here. I consider two types of policies:

  • "Baseline" policies set the minimum standard for acceptance. If an artifact does not meet the baseline policy, it is not allowed at all. For these, the monotonic principle applies.
  • "Best effort" policies provide additional checks with lower guarantees. These can be non-monotonic.

Both types of policies can be continuously verified, but I think only baseline policies should be enforced, for reliability reasons.

Example for vulnerability scans:

  • Enforced "baseline" policy: require there to exist at least one attestation saying that the image was free of known vulnerabilities at some recent point in time. This gives us confidence that the image was deployed with some minimum standard for security.
  • Continuously verified "best effort" policy: alert if any known high severity vulnerabilities exist. This allows the team to take action to address the vulnerabilities, if needed.

@MarkLodato do you have any thoughts on how to avoid negative votes for dependency reviews?

To clarify, I caution against individual negative vote attestations because they cannot be used in monotonic policies. Some ideas on how to make it monotonic:

  • A single attestation showing the complete collection of votes (positive or negative) at some point in time (e.g. submission). Then a monotonic policy could require there to exist an attestation showing no negative votes, similar to the vulnerability scan example above.
  • One attestation per vote. Then a monotonic policy could require there to exist N positive vote attestations.

If desired, a best effort policy could still support detecting negative votes after the fact, but you want to at least support the "baseline" monotonic use case.
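The second idea (one attestation per vote, require N positives) is straightforwardly monotonic. A minimal sketch:

```python
# Monotonic policy sketch: require at least N positive-vote attestations.
# Deleting attestations can only flip ALLOW -> DENY, never DENY -> ALLOW,
# which is the monotonic property described above: negative votes are
# simply ignored rather than counted against the artifact.
def allow(attestations, threshold=2):
    positives = sum(1 for a in attestations if a.get("vote") == "positive")
    return positives >= threshold

votes = [{"vote": "positive"}, {"vote": "positive"}, {"vote": "negative"}]
print(allow(votes))       # True: two positives meet the threshold
print(allow(votes[:1]))   # False: removing attestations only removes approvals
```

A best-effort (non-monotonic) layer could still scan for negative votes and alert, without being part of the enforced baseline.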

(I hope that made sense.)

@adityasaky
Member Author

@MarkLodato thanks for the clarifications!
