Make Version Control required for SLSA Level1 #127

Closed
tograla opened this issue Aug 9, 2021 · 26 comments · Fixed by #164
Labels
slsa 1 Applies to a SLSA 1 requirement spec-change Modification to the spec (requirements, schema, etc.)

Comments

@tograla

tograla commented Aug 9, 2021

I vote for making version control system a requirement for SLSA Level1.

In the Source requirements section, none of the items is needed for Level1, whereas we could pick and choose from:

  • using a source-control management system,
  • ensuring peer-review for pull requests (no direct pushes to the main/master branch),
  • keeping the record of who did what by collecting logs
  • ensuring MFA when logging to the SCM

Of course I'm not proposing all these should go to L1, but instead could be spread between L1-L4 in a meaningful way.

@dlorenc

dlorenc commented Aug 9, 2021

+1 on requiring source control for L1.

@trishankatdatadog
Member

+1

@joshuagl
Member

joshuagl commented Aug 9, 2021

Agreed. While I recognise that there are projects which don't use revision control, they really should and it seems like a reasonable requirement for L1.

@bobcatfish

A couple quick thoughts:

  • It does feel weird to be able to make any kind of security declaration about something which you can't track down the source of (i.e. if there is no version control required, depending on the language, you may never be able to see the source from which it was built)
  • Would it make sense to include some kind of "lower trust" slsa level, e.g. something like "SLSA level 0"? If the goal is to gradually lead people from an unsafe state to a safer one, maybe it does make sense to include a path from no version control to version control that doesn't leave folks in that state out entirely
  • I looked at the definitions of each level to get a sense of what "level 1" is supposed to mean and I guess that "Documentation of the build process" could be taken to mean: the build process is documented, but the source is ... not?

@TomHennen
Contributor

TomHennen commented Aug 9, 2021

I'd like to understand the reason folks would like to require source control at L1. @bobcatfish has provided some thoughts, does anyone else care to expand? [edit] I'd be especially interested in knowing what specific source control properties people are hoping for? As @tograla mentioned there are a number of properties that could be spread out among the levels.

I think we may want to consider products like AWS Lambda and Cloud Run, which both let you deploy directly from source on disk. These products could fairly easily meet SLSA L1 and have authenticated provenance even if they didn't fetch the source directly from source control themselves (the provenance could include an AWS bucket path & hash). Is the suggestion that they not qualify for SLSA L1, or that they could qualify and it's just important that the user assert that the source they deployed came from source control[1] (even if there's no way to verify that)? If we think this deployment method shouldn't qualify at L1, what is the benefit vs the reduced visibility into how the artifacts were built?

Personally I'm rather wary of adding more requirements at L1 because it can make adoption harder and thus make tracking what improvements need to be made harder. One of the use cases I have for SLSA is to be able to look at all the artifacts that are in use, understand their supply chain security situation, and to use that information to figure out where investments need to be made. The easier it is to adopt SLSA L1, the more people will adopt it (hopefully), which via the provenance, can tell anyone who cares where the build came from (modulo the security properties of an un-authenticated provenance) so they can then see what extra work needs to be done to increase the SLSA level. So SLSA L1 is less about making any security statements, but rather about being the starting point for improving the security of your supply chain.

Contrast having L1 provenance (which lists builder, some materials, some build parameters, etc...) with having no provenance for an artifact. It seems like it would be harder to track down where improvements need to be made.

1. Their CI/CD system could fetch the source from source control, then execute the deployment command. The builder won't know this happened.

@TomHennen
Contributor

Oh, I should also add that we do have some people that are working on meeting SLSA L1 and L2 (with the goal of getting to L3/L4 in the long term). In general adding requirements to lower levels can cause trouble for implementors who are trying to plan work. So I think it's also worth weighing the benefit of adding things now, with the cost to implementors, especially if it winds up burning good-will with the SLSA project. Maybe that's not a good way for me to think about it?

@dlorenc

dlorenc commented Aug 9, 2021

I don't think it's onerous at all to require people to use source control for their code. I think you're getting at something different though, which is how the build system accesses the source code (directly from the SCM repo, vs. as a static export).

That's a separate concern IMO, and should live in the provenance or build system requirements. I don't think this requirement should have any implication on any build systems. If it's meant to be interpreted that way, then it's very unclear.

@MarkLodato added the "spec-change" and "slsa 1" labels on Aug 10, 2021
@MarkLodato
Member

There are two underlying issues here:

  1. #111 (Clarify the goals / use-cases for each level), to @bobcatfish's third point.
  2. #129 (Better define "source"), to @dlorenc's point.

@TomHennen and I were using "source" to mean the build configuration, which in many cases does not live in source control. Azure DevOps Pipelines and Google Cloud Build are two examples where the common case is to configure via GUI rather than config-as-code (#115). Therefore, adding a requirement that the build configuration is version controlled would either (a) prevent these systems from reaching SLSA 1, which is undesirable; (b) force us to redefine version control to include these GUI-based systems, which may be OK; or (c) force these systems to reimplement themselves on top of version control, which indeed is onerous.

If we're talking about the "top-level source code", then the issue is the ability to automatically verify the requirement. An unstated SLSA principle is that all of these requirements are automatically verifiable; they're not just guidelines. Thus, if we add a requirement, we need a technical means to verify it.

  • In some cases, such as LUCI, the system that generates the provenance doesn't know where the sources came from. Plumbing that information through the system may be a fair amount of work.
  • In other cases, the actual sources look like regular dependencies. For example, all Debian packages are built from "source" packages (dsc files) rather than from version control. In these cases, how do we verify the requirement?

So I'm cautious about adding the requirement to SLSA 1 until we figure out how we'll handle these cases. What do you all think?

@TomHennen
Contributor

I'm sorry this is so long...

Clarifications

As @MarkLodato pointed out, it might be worth clarifying what is actually meant by 'source' (more discussion in #129).

There are at least [1] "build configuration", [2] "primary source code input", and [3] "all other dependencies". I'm not sure all systems have a way to differentiate between 2 and 3.

Currently the SLSA provenance only requires 1 at L1 ("The provenance identifies the source containing the top-level build script, via an immutable reference.") and #115 proposes removing that requirement at lower levels and instead just documenting the build command. There are currently no requirements to record 2 & 3 until level 4 ("The provenance includes all transitive dependencies listed in Dependencies Complete").

I suspect most folks in this thread are talking about 2, the "primary source code input", is that right? If nothing else, it seems this is something that could be clarified in the source requirements section...

Options

1. 'Weak' source control requirement at L1

For this option we make source control required (maybe with fewer required features than L2+), but we explicitly do not require the builder to know about or attest to the source control system used. Projects would simply state "we use source control" and be able to meet this requirement. The provenance would not always be able to contain a reference to the source control system used.

This option would make it clear that we (SLSA) think source control is a good thing. It would not, however, allow users to trace a binary back to its source code (which I believe is something @bobcatfish was suggesting would be helpful in this comment).

2. 'Strong' source control requirement at L1

As in 1 we make source control required at L1, but we explicitly do require the builder to attest to the source control system used.

This option would make it clear that we think source control is a good thing and allow all SLSA artifacts at L1 to be traced back to source code with some degree of confidence. It would, however, push popular workflows (e.g. Lambda and Cloud Run 'from source' flows) out of SLSA and into SLSA L0 (which we'd meant to mean 'things with no provenance', though I can't seem to find this documented). Debian may even have this problem due to the fact that things are (IIUC) built from source packages rather than directly from a source repo.

3. No source control requirement at L1.

This is the current state. Users can get to L1 even if they don't use source control or if the builder doesn't fetch the source from the source control system. The builder may still identify the source used by including the path & hash of the artifact in the provenance under materials.
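As a rough illustration of what "path & hash under materials" could look like, here is a minimal Python sketch. The field names are simplified and only loosely modeled on the provenance format; the bucket path, builder id, and helper names are hypothetical placeholders, not anything a real builder emits.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()


def make_provenance(artifact: bytes, source_uri: str,
                    source_blob: bytes, builder_id: str) -> dict:
    """Build a simplified provenance record (hypothetical field layout)."""
    return {
        "subject": [{"digest": {"sha256": sha256_hex(artifact)}}],
        "builder": {"id": builder_id},
        # No VCS reference here: only where the builder read the source
        # from, plus the hash of exactly what it built.
        "materials": [
            {"uri": source_uri, "digest": {"sha256": sha256_hex(source_blob)}}
        ],
    }


# Placeholder inputs standing in for an uploaded tarball and a build output.
source_tarball = b"example source tarball bytes"
artifact = b"example build output bytes"

provenance = make_provenance(
    artifact,
    "s3://example-bucket/src.tar.gz",  # assumed upload location
    source_tarball,
    "https://builder.example/v1",      # assumed builder identifier
)
print(json.dumps(provenance, indent=2))
```

A record like this pins down which bytes were built, but nothing in it says whether those bytes ever lived in version control, which is the gap options 1 and 2 try to address.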

This option doesn't send as strong a message about source control, nor does it allow someone to trace all L1 artifacts back to the source repo used. It does allow popular workflows to be L1 and still provides some traceability in the provenance.

Open questions

  • Which type of source (see clarifications above) are we actually talking about?
  • Do we want users that are mechanically evaluating provenance to be able to determine if a given artifact meets the 'source' requirement? (Not doing so would violate the unstated principle Mark mentioned).
    • Would we need to differentiate between source and binary dependencies?
    • How could we handle cases like Debian binaries where the input to the build came from a 'Source Package' and not directly from source control? (E.g. how can you tell that yes the primary source is in source control).
  • What value does adding a source control requirement at L1 have given workflows that don’t do this just won’t have any SLSA level? Is requiring source control at higher levels not sufficient indication that SLSA thinks source control is important?

Other stuff

I think there's a separate question of governance (am I using this word right?) regarding how to determine when it's OK to add requirements to previously defined levels. There can certainly be advantages to adding reqs at lower levels, but it has costs as well. Teams that are working on building features, documentation, etc... around SLSA would need to adapt their plans. One of the advantages of 'levels' is that they give people a shortcut to talk about the requirements. If these requirements are changed often (which isn't necessarily what we're doing here!), then the labels become much less valuable since their meaning can change. Note that we do say "Reminder: SLSA is in alpha. The definitions below are not yet finalized and subject to change, particularly SLSA 3-4.", so there is still wiggle room to change things now.

Perhaps this should be discussed in another issue or at the bi-weekly SLSA meeting (this Wednesday!)?

@dlorenc

dlorenc commented Aug 10, 2021

An unstated SLSA principle is that all of these requirements are automatically verifiable; they're not just guidelines.

I think this is worth stating somewhere up front, because many of these other guidelines are not automatically verifiable either. Retention history, superuser access, etc. all come to mind as much harder to verify.

@TomHennen
Contributor

I think this is worth stating somewhere up front, because many of these other guidelines are not automatically verifiable either. Retention history, superuser access, etc. all come to mind as much harder to verify.

Yes, agreed. What would be the best way to do that? I could send a PR documenting how that could work as well as documenting open questions that need to be resolved...

@MarkLodato
Member

I filed #130 about the automatic verification principle. Let's follow up there.

@tograla
Author

tograla commented Aug 11, 2021

Thanks everyone for all your thoughts! This discussion has been quite informative and I do agree more clarification about the term 'source' is needed. My intention behind proposing SCM for L1 was to drive good software development practices (and to enable the traceability of changes that SCM systems provide) and pave the way for an automated build process.

Thinking about supply-chain security from a broader perspective, with the Lambda/Cloud Run example in mind, I ask myself about the integrity of code in such a design. What guarantee do end-users have that the code uploaded to Lambda is actually the same code that sits in the corresponding repository declared as 'the source', if the code was first downloaded locally?

Definitely one of the desired properties of an end-to-end pipeline is to reduce (or even eliminate) stages in the process where malicious or accidental tampering could occur, hence the process should be automated (chain-of-custody analogy). With that said, I wouldn't mind keeping pipeline designs that lack this capability at rather low SLSA levels ('zero', or one at best).

@xiaowen

xiaowen commented Aug 12, 2021

RE whether it's onerous to require people to use source control: it depends on what your current process is.

I've heard of a lot of Google Cloud Functions (GCF) users that write some "glue" code in the GCF web UI and deploy by pressing the "deploy" button. Will that meet SLSA 1? What exactly is the definition of "source control" and does that GCF flow meet that? The GCF web UI doesn't support having a "change description/justification", but it does save each version of the source code. If we must force users to completely change how they work today to meet SLSA 1, then it could be onerous for those users.

This all comes down to what we want to achieve with each SLSA level. I could see a good story here by saying that GCF can help beginner users enter SLSA by having this basic flow meet SLSA 1, and they get some benefits right away like basic provenance info. Then as users get more advanced, they can switch to a different flow, use a compliant source control system, and get SLSA 2+.

@06kellyjac
Contributor

There's no reason someone couldn't get started following SLSA 1 or SLSA 2 and in their scenario accept they're not using VCS in order to get started on their supply chain security journey.
And it's not too difficult to either develop in VCS then copy to the GCF UI or work on it in the GCF UI and copy into VCS.

Either way, as mentioned above, SLSA is planned to be automatically verifiable and I can't see that being easy with just storing your code in the GCF UI. At the very least you'd have to put in extra work to pull the source out of GCF using the API yourself; at that point it'd be easier to just use modern IaC practices.

If we must force users to completely change how they work today to meet SLSA 1, then it could be onerous for those users.

Supply chain security is a difficult problem; I don't think lvl 1 should be so easy that everyone gets it for free.
As mentioned you can still have SLSA goals if you're going to skip a criterion here and there for the time being, the guidance is still solid.

I've heard of a lot of Google Cloud Functions (GCF) users that write some "glue" code in the GCF web UI and deploy by pressing the "deploy" button.

You could also argue that using the UI doesn't quite meet Build - Scripted build, and I'm not sure how you'd get provenance info for Provenance - Available; those are the two currently required criteria.
At this point I'd say having X users that are able to put code in the UI for cloud functions doesn't constitute much of a supply chain.

You need code -> ... -> production but it looks more like code -> production or even code -> GCF Black box -> production here. There are interesting questions as to how GCF itself could provide provenance and other aspects of SLSA to give confidence that the code you submit makes it to production (the "GCF Black box" in the flow mentioned previously).


I'm not too familiar with GCF and the little work I've done with it has used terraform so LMK if I missed the mark anywhere here. Maybe if you have any parallels with AWS Lambda that'd help as I have more experience with that ecosystem.

@TomHennen
Contributor

Either way, as mentioned above SLSA is planned to be automatically verifiable and I can't see that being easy with just storing your code in the GCF UI

When I think of the 'automatically verifiable' goal I'm thinking of how we can verify the requirements at each level. Clarifying that is the goal of #130.

If we mean verifying the requirements of the levels, then including a source requirement at Level 1 would mean that we'd need to figure out some way to verify the source code the builder built from was stored in a VCS. Under that interpretation how could a process verify that a given artifact was built from source stored in a VCS if the builder didn't fetch the code from VCS itself? There's also the question of what is meant by 'source' in this suggestion. Do we mean the 'build entrypoint' was stored in VCS? That could be pretty easily determined (but wouldn't allow some builders like Tekton to onboard today since they don't support config as code, see #115 for more).

If we don't mean build entrypoint but rather 'the primary source', lots of builds have binary dependencies that aren't stored in VCS but would eventually be covered by transitive SLSA (which is a problem that's been deferred for now). When inspecting provenance how could a verifier know that some blob downloaded from an HTTP endpoint in materials isn't source code and thus didn't need to be fetched via git/hg/...?

#129 hopes to get clarity on what we mean by 'source' (the bottom line is we probably need to be more specific).

You could also argue that using the UI doesn't quite meet Build - Scripted build

This would be an interesting discussion to have, would you care to start a new issue to talk about it?

I'm not sure how you'd get provenance info for Provenance - Available

The Provenance Available requirement is meant to indicate that whatever provenance is generated is accessible to the consumer of the artifact.

There are interesting questions as to how GCF itself could provide provenance

My thought is that GCF (or whatever it's using under the hood) would be classified as a builder and need to meet the Build Requirements.

@MarkLodato
Member

I'll be on vacation for the next four weeks and wanted to jot down some thoughts before leaving.

Before making this decision, we really need to resolve #111 and clarify the benefit of requiring version control at this level, more than just "it's a good idea." For example, the reason might be that identifying the source revision allows one to join with static analysis on the source or to perform age-based checks (e.g. no sources older than 1 month). We just need to spell this out and compare this to the increased cost of adoption.
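To make the age-based check concrete, here is a small Python sketch. It assumes the commit timestamp for the identified source revision has already been looked up (how that lookup happens is out of scope), and the 30-day limit and function name are illustrative choices rather than anything SLSA defines.

```python
from datetime import datetime, timedelta, timezone

# Assumed policy: reject sources older than roughly one month.
MAX_SOURCE_AGE = timedelta(days=30)


def source_is_fresh(commit_time: datetime, now: datetime,
                    max_age: timedelta = MAX_SOURCE_AGE) -> bool:
    """Return True if the source revision is no older than max_age."""
    return now - commit_time <= max_age


# Fixed example timestamps so the outcome does not depend on the clock.
now = datetime(2021, 8, 13, tzinfo=timezone.utc)
recent_commit = datetime(2021, 8, 1, tzinfo=timezone.utc)
stale_commit = datetime(2021, 5, 1, tzinfo=timezone.utc)

print(source_is_fresh(recent_commit, now))  # 12 days old, within the limit
print(source_is_fresh(stale_commit, now))   # over three months old
```

A check like this only works if the provenance names a revision whose timestamp can be resolved, which is the kind of concrete benefit that would need weighing against the adoption cost.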

Assuming we agree that it is desirable, one option is to require version control for "primary source" but not "build configs" at SLSA 1 & 2 (#129). This might be a decent middle ground:

  • Version control on the source is a common sense thing that almost everyone likely does anyway, so it likely is not onerous for the user. (Verification can be tricky; see below.)
  • Build configuration, on the other hand, is less likely to be version controlled, as mentioned above. Removing this requirement from SLSA 2 would solve #115 ("SLSA 1 & 2 should not require config-as-code").

The question then becomes how to verify the source requirement. Here's one idea:

  • The provenance MUST include both the "build config" and "primary source" inputs.
    • It's OK for these two to be the same, as is the case with GitHub Actions.
    • This assumes the builder has some notion of primary source. In GCB, for example, this would be the commit that triggered the build. I believe this is true for most builders, but it would be an impediment for ones that don't.
  • If the build system knows the source location, it records it in the provenance and the verifier checks that the URI is a VCS, e.g. git+https.
  • Otherwise, the build system just records the hash of the source artifact, and a separate attestation maps this artifact to the original VCS. The verifier then chains the two together.
    • For example, consider Google Cloud Build (GCB) where the source code from the working directory is packaged up into a tarball and sent to the server.
      • The client generates a "source attestation"(*) with subject = hash of the tarball and predicate identifying the git commit and, if known, upstream URI. The client would only generate this for a "clean" work tree with no uncommitted changes.
      • The builder generates a provenance attestation with subject = output artifact and materials[?].digest = hash of the tarball.
      • The verifier requires both attestations, checking that the two hashes match.
    • This does have non-trivial cost:
      • At L2, we'd need some trusted way to verify the source attestation. Client could no longer be trusted.
      • We now have to propagate an additional source attestation, which is more plumbing work.
      • Verification is now much more complex since you have to chain attestations. We were hoping to defer this problem until later.

(*) I'm using "attestation" here liberally, since it wouldn't be signed at L1.
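The verification idea above can be sketched in Python as follows. This is only an illustration under stated assumptions: the record layout (`primary_source`, `subject_sha256`, `vcs_uri`, `commit`), the scheme allow-list, and the example URIs are all hypothetical, and no signature checking is modeled, so it corresponds to the unsigned L1 case.

```python
# Assumed allow-list of URI schemes that count as version control.
VCS_SCHEMES = {"git+https", "git+ssh", "hg+https"}


def is_vcs_uri(uri):
    """Return True if the URI scheme names a version control system."""
    scheme, sep, _ = uri.partition("://")
    return bool(sep) and scheme.lower() in VCS_SCHEMES


def source_in_vcs(provenance, source_attestations):
    """Check the (hypothetical) 'primary_source' entry of a provenance record."""
    source = provenance["primary_source"]
    # Case 1: the builder fetched the source itself and recorded a VCS URI.
    if is_vcs_uri(source.get("uri", "")):
        return True
    # Case 2: the builder only recorded a hash. Chain to a source attestation
    # whose subject is the same tarball and which names a VCS commit.
    for att in source_attestations:
        if (att["subject_sha256"] == source["sha256"]
                and is_vcs_uri(att["vcs_uri"])
                and att.get("commit")):
            return True
    return False


tarball_hash = "ab" * 32  # placeholder digest, not a real hash
provenance = {
    "primary_source": {"uri": "gs://example-bucket/src.tgz",
                       "sha256": tarball_hash},
}
attestation = {
    "subject_sha256": tarball_hash,
    "vcs_uri": "git+https://example.com/org/repo",
    "commit": "0123abcd",
}

print(source_in_vcs(provenance, [attestation]))  # chained via matching hash
print(source_in_vcs(provenance, []))             # nothing to chain to
```

Even in this toy form the verifier needs two documents and a join on the tarball hash, which is the extra plumbing and complexity listed in the costs above.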

Another option that would work at L1 but not L2 would be to have the client just send the VCS metadata to the builder, who blindly records it in the provenance. For example, in the GCB case, the client could include the git commit in the tarball, and GCB outputs that instead of the hash of the tarball.

@trishankatdatadog
Member

I guess the question is: how many software projects don't have source control, but use CI/CD?

@TomHennen
Contributor

I think the question is "how many projects have CI/CD that isn't aware of source control being used". The AWS Lambda and Cloud Run cases mentioned in this comment are examples where the builder doesn't necessarily know if source control is used, so it can't be included in the provenance.

@trishankatdatadog
Member

Right, not to mention that any CI/CD system could be used to pull source from anywhere outside of RCS.

@mlieberman85
Member

I might be misunderstanding, and maybe it's just semantics, but does the builder itself need to be aware of where the code came from? The scenario described in the linked issue above is a fairly common one, generalized as something like:

  1. CI fetches code from source code control (or wherever) and puts on storage A
  2. CI triggers the builder to compile, package, and build code from storage A, and puts the artifact(s) on storage B
  3. CI publishes artifact(s) from storage B to artifact repo.

So in this case I think the provenance is still traceable because you still know where you pulled the source from, with some caveats. If it's a push model where CI is supposed to just pick up whatever is pushed to some storage bucket or something, I would say provenance starts there, and I think it also highlights that's probably an anti-pattern.

@TomHennen
Contributor

I think it depends on if the CI is happening in one integrated location and has visibility into past steps.

In the AWS Lambda/Cloud Run scenario the builder can be completely disconnected from the steps that fetch the source. The entrypoint is literally "here's a source tarball, please build/run it". This isn't necessarily bad. A very nice property that they have is that the build is taking place on a centrally managed service and not on some developer's laptop. What's unfortunate is that they can't say what source repo that code came from.

I'm also not sure if that flow would match the Debian workflow, where binary packages are built from source packages, which at some point in the past were pulled from a source repo.

At higher levels (especially once we have a resolution to this issue [I have something in mind, just haven't had time to send a PR]) we'll actually be able to join a build provenance (which lists the commit hash) with a source attestation, which will let us say what source requirements the source used met. This + chained verification (something else that needs to get worked out) would let us solve both of these use cases if necessary (provenance for the tarball/Debian package could be provided that traces the tarball back to the source repo).

Given that there are a number of ~common identified use cases where the builder doesn't necessarily fetch the source itself, and these other issues are unresolved (and require a lot more work, so probably a higher SLSA level), my gut tells me that a source requirement at L1 would leave too many people unable to start adopting SLSA without making major changes to their infrastructure. That seems like it would be demotivating. So why not leave L1 as is, not require source control, and just make it clear that L1 is a starting point? Another, fairly easy, option is to explicitly add the recommended ○ for source control at L1.

What do folks think?

(Tagging the committee to make sure we have a breadth of opinions: @brunodom @david-a-wheeler @joshuagl @MarkLodato @mlieberman85 @trishankatdatadog @zakgreant)

@trishankatdatadog
Member

Now that I think about it, where source code came from is not nearly as important as who signed the source code. With the Datadog Agent integrations, our developers sign attestations about the hashes of source code, so where source code came from is irrelevant, as long as what developers signed matches what went into the builder, which I think is what Tom means by chained verification.

If this is too strong a requirement for L1, which I can very well believe is the case, then requiring the builder to simply record the Merkle tree root of the source code or the hash of the source tarball or some such should be a good enough start. In any case, a source control management system is not required, and can be reserved for L2.
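For readers unfamiliar with the idea, here is one way a builder might compute a Merkle root over its source inputs, sketched in Python. The construction (sorted paths, domain-separated leaf and node hashes, duplicating the last node on odd levels) is an assumption made for illustration; the thread does not name a specific scheme, and a real builder could just as well record a plain SHA-256 of the source tarball.

```python
import hashlib


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(files: dict) -> str:
    """Compute a hex Merkle root over a mapping of path -> file bytes."""
    if not files:
        raise ValueError("no source files given")
    # Leaves are ordered by path so the root does not depend on input order.
    # The 0x00 / 0x01 prefixes keep leaf and inner-node hashes distinct.
    level = [
        _sha256(b"\x00" + path.encode("utf-8") + b"\x00" + content)
        for path, content in sorted(files.items())
    ]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            _sha256(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


# Hypothetical in-memory source tree standing in for files on disk.
source_tree = {
    "main.py": b"print('hello')\n",
    "README.md": b"# example\n",
    "lib/util.py": b"def f():\n    return 1\n",
}
root = merkle_root(source_tree)
print(root)
```

Recording such a root (or a tarball hash) lets a later check confirm that what developers signed matches what went into the builder, without the builder knowing anything about a source control system.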

Does this help to clarify the matter?

@inferno-chromium
Contributor

As per @trishankatdatadog ("In any case, a source control management system is not required, and can be reserved for L2."), it looks like we are keeping things as-is (or adding a hash/Merkle tree root of the source code?) and keeping the SCS requirement at L2.

Does anyone else have thoughts on changing this? It would be good to have consensus on this before v0.1 is cut this week.

@tograla
Author

tograla commented Sep 15, 2021

Indeed, it seems so. Even though we seem to have gravitated to where we were in the first place, the discussion and take-aways have prompted other issues and useful wording clarification.

@joshuagl
Member

Fascinating discussion. Keeping things as-is, with perhaps @TomHennen's suggestion to "explicitly add the recommended ○ for source control at L1", seems like the appropriate path forward.
