
Add TUF schema files #246

Draft · fridex wants to merge 37 commits into master
Conversation


@fridex fridex commented Aug 25, 2022

Add TUF JSON schema files. These schema files were produced as part of the Datadog Agent integrations test suite (additionally adjusted to remove the Datadog-specific parts) and have not been reviewed yet. I'm new to TUF/in-toto, so please review for any inconsistencies, and sorry in advance for them.

Closes: #242

@trishankatdatadog (Member) left a comment

This is a great start, thanks, Fridolin; it will really help the community!

I have two comments in general:

  1. Some of the assumptions (e.g., about version numbers and keyids) are incorrect with respect to the spec.
  2. The other assumptions are not incorrect, but they are specific to the Datadog Agent integrations, which will not be useful in general here.

Could we please make these changes and try again? Thanks!

Review comments were left on: schemas/in-toto-metadata-signer-x.schema.json, schemas/in-toto-metadata-signer.schema.json, schemas/wheels-signer-x.schema.json, schemas/wheels-signer.schema.json, schemas/timestamp.schema.json, schemas/snapshot.schema.json, schemas/root.schema.json, and schemas/targets.schema.json.
@fridex (Author) commented Sep 1, 2022

Thanks for the review, Trishank. The comments raised have been addressed.

Please see also additional changes outside of the PR review comments (starting from d6731dc).

A concern I have is the listing of hashes under .signed.meta.snapshot\.json in snapshots.json. The proposed schema requires both sha256 and sha512, but the example in the specification lists only sha256 (as does, for example, targets.json). To my understanding, the spec does not enforce a particular hash type.

Also, for completeness, should we include a schema for mirrors.json?

@trishankatdatadog (Member)

> Also, for completeness, should we include a schema for mirrors.json?

Sorry, no need for mirrors, which is being replaced by TAP 4. A schema for the latter would be nice!

@trishankatdatadog (Member)

> A concern I have is the listing of hashes under .signed.meta.snapshot\.json in snapshots.json. The proposed schema requires both sha256 and sha512, but the example in the specification lists only sha256 (as does, for example, targets.json). To my understanding, the spec does not enforce a particular hash type.

The spec doesn't mandate using any particular hash algo, so I think having a list of algos and requiring at least one of them makes the most sense.
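
For illustration, a minimal sketch of that idea, assuming JSON Schema draft-04 and the Python jsonschema package (the subschema shape here is illustrative, not the PR's actual schema):

```python
import jsonschema

# Illustrative "hashes" subschema: both known algorithms are described,
# but only one of them is required to be present (draft-04 "anyOf").
HASHES_SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "properties": {
        "sha256": {"type": "string"},
        "sha512": {"type": "string"},
    },
    "anyOf": [
        {"required": ["sha256"]},
        {"required": ["sha512"]},
    ],
}

jsonschema.validate({"sha256": "a" * 64}, HASHES_SCHEMA)  # passes: one algo is enough
# jsonschema.validate({}, HASHES_SCHEMA)  # would raise: no known algo present
```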

@trishankatdatadog (Member)

Thanks, @fridex! If it's not too much to ask, do you think we could do some sort of integration testing against the Datadog Agent Integrations and/or Sigstore TUF metadata?

Cc @asraa @patflynn

@fridex marked this pull request as ready for review on September 13, 2022
@trishankatdatadog (Member)

Sigstore's current TUF repository is here: https://github.com/sigstore/root-signing/tree/main/repository/repository

@fridex (Author)

The schema validation passes except for the date-time string in expires:

```
jsonschema.exceptions.ValidationError: '2021-12-18T13:28:12.99008-06:00' does not match '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$'

Failed validating 'pattern' in schema['properties']['signed']['properties']['expires']:
    {'pattern': '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$',
     'type': 'string'}

On instance['signed']['expires']:
    '2021-12-18T13:28:12.99008-06:00'
```

The spec says in the General principles section:

> Metadata date-time follows the ISO 8601 standard. The expected format of the combined date and time string is "YYYY-MM-DDTHH:MM:SSZ". Time is always in UTC, and the "Z" time zone designator is attached to indicate a zero UTC offset. An example date-time string is "1985-10-21T01:21:00Z".

In bcc2c17 I've put back a date-time format that also covers the forms used in the root-signing repository. If the date-time paragraph in the spec is applicable here, my guess is this needs a clarification (in the TUF spec/sigstore/root-signing repo/TUF implementation).
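
For illustration, the two patterns under discussion might look like this (a sketch; the relaxed pattern is an assumption about what bcc2c17 does, not a copy of it):

```python
import re

# Strict pattern from the spec's General principles section: "Z" suffix,
# no fractional seconds, always UTC.
SPEC_PATTERN = r"^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$"

# Relaxed pattern that also admits the older root-signing timestamps with
# fractional seconds and a numeric UTC offset.
RELAXED_PATTERN = (
    r"^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}"
    r"(\.[0-9]+)?(Z|[+-][0-9]{2}:[0-9]{2})$"
)

assert re.match(SPEC_PATTERN, "1985-10-21T01:21:00Z")
assert not re.match(SPEC_PATTERN, "2021-12-18T13:28:12.99008-06:00")
assert re.match(RELAXED_PATTERN, "2021-12-18T13:28:12.99008-06:00")
```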

@trishankatdatadog (Member)

I'd say the original sigstore metadata was incorrect in not following the TUF spec. Best to use the Z suffix specified in the spec. WDYT @asraa?

@trishankatdatadog (Member) left a comment

Well done, Fridolin, I can't think of any more complaints! 🎉

@fridex (Author) commented Sep 14, 2022

> Well done, Fridolin, I can't think of any more complaints! 🎉

Thanks for your patience and thorough review, Trishank 👍🏻

@asraa (Contributor) commented Sep 15, 2022

> I'd say the original sigstore metadata was incorrect in not following the TUF spec. Best to use the Z suffix specified in the spec. WDYT @asraa?

And yes! We only got the expiration format TUF-compliant in later roots, so the early roots are not expected to be compliant.

@trishankatdatadog (Member)

@patflynn would you be interested in reviewing this PR? 🙂

@patflynn

Hi there! Sorry I missed all the pings on this thread! I'm really grateful for @fridex's contribution here. As long as the current historical set of metadata resources parses correctly, I think we're doing as well as we could expect. LGTM.

For some context: we discussed at KubeCon potentially moving the sigstore TUF schema definition to proto at a future date, but I don't think that should block this.

@trishankatdatadog (Member)

Can we please get an amen from @lukpueh, @joshuagl, or @mnm678? 🙏🏽

@trishankatdatadog (Member) commented Oct 28, 2022

I think somewhere we should also document that editors should update these JSON schema files whenever they update the metadata formats in the spec. Ideally, there should be code in GitHub Workflows that checks for this, but documentation will do for now.

@trishankatdatadog (Member)

@fridex could we please resolve conflicts in the meantime?

@lukpueh (Member) commented Nov 17, 2022

Great work, @fridex! What do you think about defining subschemas for recurring parts? The TUF spec has many of those, and JSON Schema seems to support it. It would make review and maintenance a lot easier.
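
For illustration, a sketch of what that could look like with draft-04 "definitions" and "$ref" (the KEYID pattern here is an assumption based on the spec's SHA-256 hex digests):

```python
import jsonschema

SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "definitions": {
        # Defined once, referenced wherever a KEYID appears.
        "keyid": {"type": "string", "pattern": "^[0-9a-f]{64}$"},
    },
    "type": "object",
    "properties": {
        "keyids": {
            "type": "array",
            "items": {"$ref": "#/definitions/keyid"},
        },
    },
}

jsonschema.validate({"keyids": ["0" * 64]}, SCHEMA)
```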

@jku (Member) commented Nov 20, 2022

I have made a few attempts at reviewing (and I can post some detailed questions if you'd like them before I finish), but an "LGTM" without seeing results from a public validation test suite, or a real commitment from an open source project to use this data as part of a test suite, feels like a big ask of reviewers.

Basically, is there a plan/commitment to use these schemas somewhere? If not, is there value in adding 600 lines of hard-to-validate schema? I think it's unlikely to match the implementations without continuous testing.

Would it make more sense to add a test suite (wherever it's planned) first, and then propose that the spec repo should maintain the schemas?

Obviously, I'm not a spec maintainer, so if they are ok with this then my reservations are moot...

@trishankatdatadog (Member)

> I have made a few attempts at reviewing (and I can post some detailed questions if you'd like them before I finish), but an "LGTM" without seeing results from a public validation test suite, or a real commitment from an open source project to use this data as part of a test suite, feels like a big ask of reviewers.
>
> Basically, is there a plan/commitment to use these schemas somewhere? If not, is there value in adding 600 lines of hard-to-validate schema? I think it's unlikely to match the implementations without continuous testing.
>
> Would it make more sense to add a test suite (wherever it's planned) first, and then propose that the spec repo should maintain the schemas?
>
> Obviously, I'm not a spec maintainer, so if they are ok with this then my reservations are moot...

These are good reservations, and we have thought about them. We do have an internal app that uses and tests against these schemas. A public test suite would be nice, but demands a fairly representative dataset. @fridex WDYT?

@lukpueh (Member) commented Nov 22, 2022

IMO it makes sense not to tie the schemas hosted in this repo to the schemas used in an implementation's test suite. As @jku says, they might not even match. But wouldn't it be a good thing if there were reference schemas that made such a deviation apparent?

I agree that they are prone to becoming outdated if we don't automatically check spec changes against the schemas, which is hard to do because the format definitions don't use a standard language. Btw, the same is true for the example metadata in the spec. I'm not sure how to solve this problem without changing the format definition language. Maybe we could at least run the schemas against the example metadata?

@trishankatdatadog (Member)

> Maybe we could at least run the schemas against the example metadata?

Here's what we can do, but it seems like a fair amount of nontrivial work. We could do something like runnable docstrings: we could decorate the example metadata written in the spec in such a way that when you compile the spec into a readable format, tests would run the example metadata against the schemas. Do I make sense?

@jku (Member) left a comment

After reading through the whole thing I'm more convinced that the schema needs to be tested against metadata from multiple actual implementations. If it's not, the schema will contain bugs: they are really hard for humans to spot in this format.

I don't mean that there needs to be an automated test suite, and I don't mean the tests have to pass (disagreements about correctness are quite likely)... but some reviewed results from using the schema to validate metadata seem necessary to have trust in the schema.

```
  },
  "sig": {
    "type": "string",
    "minLength": 1
```

I would allow any length here (see the sketch after this list):

  • signatures with empty sigs are a pattern that's useful to accommodate merging signature changes during threshold signing (see sigstore root-signing for an example)
  • a signature of length 0 is no more pointless than one of length 1, for example
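
A minimal sketch of the relaxed field (illustrative, not the PR's schema):

```python
import jsonschema

# "sig" as a plain string with no minLength, so partially signed metadata
# produced during threshold signing still validates.
SIGNATURE_SCHEMA = {
    "type": "object",
    "properties": {
        "keyid": {"type": "string"},
        "sig": {"type": "string"},
    },
}

jsonschema.validate({"keyid": "abc", "sig": ""}, SIGNATURE_SCHEMA)  # empty sig accepted
```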

"additionalProperties": true,
"properties": {
"keytype": {
"enum": ["rsa", "ed25519", "ecdsa-sha2-nistp256"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean only these three values are allowed? That seems incorrect.

The correct way to handle the spec requirements seems to be (sketched below):

  • allow any key with keytype, scheme, and keyval
  • if keytype is in ["rsa", "ed25519", "ecdsa-sha2-nistp256"], then check that keyval contains a "public" field

Is the described check useful for anyone? Not sure, but this is what the spec says, if I read it correctly.
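
Since draft-04 has no if/then, this conditional could be encoded with oneOf and "not"; a sketch under that assumption:

```python
import jsonschema

KEY_SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "required": ["keytype", "scheme", "keyval"],
    "properties": {
        "keytype": {"type": "string"},
        "scheme": {"type": "string"},
        "keyval": {"type": "object"},
    },
    "oneOf": [
        {   # one of the three defined keytypes: keyval must carry "public"
            "properties": {
                "keytype": {"enum": ["rsa", "ed25519", "ecdsa-sha2-nistp256"]},
                "keyval": {"required": ["public"]},
            }
        },
        {   # any other keytype: no constraint on keyval
            "not": {
                "properties": {
                    "keytype": {"enum": ["rsa", "ed25519", "ecdsa-sha2-nistp256"]}
                }
            }
        },
    ],
}

# Passes: unknown keytype without keyval["public"].
jsonschema.validate(
    {"keytype": "custom", "scheme": "custom", "keyval": {}}, KEY_SCHEMA
)
# Fails: ed25519 without keyval["public"].
# jsonschema.validate(
#     {"keytype": "ed25519", "scheme": "ed25519", "keyval": {}}, KEY_SCHEMA
# )
```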

```
},
"keyval": {
  "type": "object",
  "required": ["public"],
```

I mentioned this in another comment: "public" is only required for the three defined keytypes, not for others.

```
  }
},
"scheme": {
  "enum": ["rsassa-pss-sha256", "ed25519", "ecdsa-sha2-nistp256"]
```

Does this mean only these three values are allowed?

"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"required": ["signatures", "signed"],
"additionalProperties": false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean no extra properties are allowed? I think this is incorrect, not just here but (almost) everywhere in the spec: there are only two or three places where extra properties would make the content non-compliant.

```
},
"length": {
  "type": "integer",
  "minimum": 1
```

0 should be allowed IMO; it's not the job of the spec to say a repository can't contain empty files (and I believe the spec doesn't do that).

The same check exists for targets meta, where it makes a little more sense (but not much): at least it's clear that 0-length metadata is not valid. But it's just as clear that anything below a length of 100 or so is automatically invalid, so why set the limit at 1?

"type": "object",
"additionalProperties": true,
"properties": {
"targets.json": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect you could match all items in the object with patternProperties, not just the "targets.json" field?
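
A sketch of what that might look like (the value subschema here is illustrative):

```python
import jsonschema

# Constrain every "*.json" entry in the "meta" object, not only "targets.json".
META_SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "patternProperties": {
        r"^.+\.json$": {
            "type": "object",
            "required": ["version"],
            "properties": {"version": {"type": "integer", "minimum": 1}},
        }
    },
}

jsonschema.validate(
    {"targets.json": {"version": 3}, "registry.json": {"version": 1}},
    META_SCHEMA,
)
```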

"minimum": 1
},
"hashes": {
"type": "object"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is defined in the targets schema but not here.

"type": "boolean"
}
},
"required": ["name", "keyids", "threshold", "paths", "terminating"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

paths is not required

@lukpueh (Member) commented Nov 29, 2022

> We could do something like runnable docstrings: we could decorate the example metadata written in the spec in such a way that when you compile the spec into a readable format, tests would run the example metadata against the schemas. Do I make sense?

You do, @trishankatdatadog! Here's how I did something similar for an in-toto demo: a script running in CI regex-matches snippets from a markdown doc and asserts an expected output.
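
Applied to this repo, that could be a small CI script along these lines (a sketch; the file paths and the fenced-block convention are assumptions about the spec source):

````python
import json
import re

import jsonschema

SPEC = open("tuf-spec.md").read()                      # assumed spec source file
SCHEMA = json.load(open("schemas/root.schema.json"))   # assumed schema path

# Pull fenced code blocks out of the markdown, keep those that parse as
# JSON objects, and validate root examples against the root schema.
for block in re.findall(r"```(?:json)?\n(.*?)```", SPEC, flags=re.DOTALL):
    try:
        example = json.loads(block)
    except ValueError:
        continue  # not a JSON snippet
    if isinstance(example, dict) and example.get("signed", {}).get("_type") == "root":
        jsonschema.validate(example, SCHEMA)
````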

@lukpueh (Member) commented Nov 29, 2022

So we have two suggestions here:

  1. Run the schemas against real metadata from different implementations at least once and reference those static results, e.g. in this PR, so that we can be more certain about the correctness of the schemas.
  2. Set up a CI job to "run schemas against the spec", so the schemas don't become outdated when the spec changes.
    -> This is less trivial, because we can't actually just run schemas against the spec. We could run them against example metadata as described above, but then we'd still have to rely on spec changes being synced with either the examples or the schemas to detect things going out of sync.

Maybe we should start with 1 and file a ticket for 2.

@joshuagl (Member)

I am all for including the schemas in this organisation as a starting point for implementers (the context in which this came up, IIRC). I would first like to see them updated to use subschemas for recurring parts, as @lukpueh suggested, and it is important to see them validated against real data, as @jku suggested.

Given that we will not, at least to start, have a mechanism to test them over time, I think it's worth putting them in a subdirectory which indicates they are reference material/a starting point, such as contrib. We could even consider including them in the reference implementation (python-tuf) and adding a test that the schemas remain valid against metadata files generated by python-tuf? (A sketch of such a test follows.)
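
For illustration, such a test could be as small as this (a sketch; the directory of python-tuf-generated metadata is an assumption):

```python
import json
import pathlib

import jsonschema

SCHEMA_DIR = pathlib.Path("schemas")
METADATA_DIR = pathlib.Path("generated")  # assumed: metadata written by python-tuf

def test_schemas_against_python_tuf_metadata():
    for role in ("root", "timestamp", "snapshot", "targets"):
        schema = json.loads((SCHEMA_DIR / f"{role}.schema.json").read_text())
        metadata = json.loads((METADATA_DIR / f"{role}.json").read_text())
        # Raises jsonschema.ValidationError on the first mismatch.
        jsonschema.validate(metadata, schema)
```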

Thanks for working on this @fridex!

@trishankatdatadog (Member) commented Nov 29, 2022

> Given that we will not, at least to start, have a mechanism to test them over time, I think it's worth putting them in a subdirectory which indicates they are reference material/a starting point, such as contrib. We could even consider including them in the reference implementation (python-tuf) and adding a test that the schemas remain valid against metadata files generated by python-tuf?

I think it's reasonable to start with this, and include the specification as a submodule you could use to run schema tests against python-tuf. Then we file issues to include more real-world tests, split the schema into reusable subschemas, and so on. WDYT?

@fridex (Author) commented Dec 5, 2022

Thanks all for the review comments. I'll incorporate the suggestions.

> Given that we will not, at least to start, have a mechanism to test them over time, I think it's worth putting them in a subdirectory which indicates they are reference material/a starting point, such as contrib. We could even consider including them in the reference implementation (python-tuf) and adding a test that the schemas remain valid against metadata files generated by python-tuf?
>
> I think it's reasonable to start with this, and include the specification as a submodule you could use to run schema tests against python-tuf. Then we file issues to include more real-world tests, split the schema into reusable subschemas, and so on. WDYT?

IMHO it is very reasonable to test the JSON schema files as discussed. If python-tuf is the reference implementation, it might be a good idea to have the JSON schema tests included there (and later let other implementations use them as well). I'm personally not a big fan of Git submodules; however, if we could create automation (a GitHub Action, maybe?) to keep the submodule up to date, that would be great. Semantically, the specification repo could be the right place for the schema files. WDYT?

EDIT: Are signed commits required in the repo?

@lukpueh (Member) commented Dec 13, 2022

> IMHO it is very reasonable to test the JSON schema files as discussed. If python-tuf is the reference implementation, it might be a good idea to have the JSON schema tests included there (and later let other implementations use them as well). I'm personally not a big fan of Git submodules; however, if we could create automation (a GitHub Action, maybe?) to keep the submodule up to date, that would be great. Semantically, the specification repo could be the right place for the schema files. WDYT?

Let's take it one step at a time:

  1. DRY the schemas (subschemas)
  2. Run the schemas against an example set of metadata (e.g. from the spec's example metadata, python-tuf, or go-tuf) for initial anecdotal validation
  3. Auto-run the schemas against spec metadata (see the "regex snippets" suggestion above)
  4. Auto-run the schemas against implementation metadata (see the submodule suggestion above)

IMO 1 and 2 are blockers for this PR, and 3 and 4 are future work.

> EDIT: Are signed commits required in the repo?

Does not look like it :)

@fridex marked this pull request as draft on May 2, 2023