Add TUF schema files #246
Conversation
This is a great start, thanks, Fridolin, will really help the community!
I have two comments in general:
- Some of the assumptions (e.g., about version numbers and keyids) are incorrect with respect to the spec.
- The other assumptions are not incorrect, but they are specific to the Datadog Agent integrations, which will not be useful in general here.
Could we please make these changes and try again? Thanks!
Thanks for the review, Trishank. The comments raised were addressed. Please also see the additional changes outside of the PR review comments (starting from d6731dc). A concern I have is the listing of hashes in … Also, for completeness, should we include a schema for mirrors.json?
Sorry, no need for mirrors, which is being replaced by TAP 4. A schema for the latter would be nice!
The spec doesn't mandate using any particular hash algo, so I think having a list of algos and requiring at least one of them makes the most sense.
I'd say the original sigstore metadata was incorrect in not following the TUF spec. Best to use the …
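For illustration, a minimal sketch of a `hashes` subschema along those lines (draft-04 keywords; the exact layout is my assumption, not taken from the PR):

```json
"hashes": {
  "type": "object",
  "minProperties": 1,
  "additionalProperties": {
    "type": "string",
    "minLength": 1
  }
}
```

`minProperties: 1` requires at least one algorithm entry, while `additionalProperties` keeps the set of algorithm names open.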
Well done, Fridolin, I can't think of any more complaints! 🎉
Thanks for your patience and thorough review, Trishank 👍🏻
And yes! We only got expiration TUF-compliant in later roots, so early roots should not be expected to be compliant.
@patflynn would you be interested in reviewing this PR? 🙂
Hi there! Sorry I missed all the pings on this thread! I'm really grateful for @fridex's contribution here. As long as the current historical set of metadata resources parses correctly, I think we're doing as well as we could expect! LGTM. For some context: we discussed at KubeCon potentially moving the sigstore TUF schema definition to proto at a future date, but I don't think that should block this.
I think somewhere we should also document that editors should update these JSON schema files whenever they update the metadata formats in the spec. Ideally, there should be code in GitHub Workflows that checks for this, but documentation will do for now.
@fridex could we please resolve conflicts in the meantime?
Great work, @fridex! What do you think about defining subschemas for recurring parts? The TUF spec has many of those, and JSON Schema seems to support it. It would make review and maintenance a lot easier.
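For example, a sketch of how such factoring could look with `definitions` and `$ref` (the subschema names here are hypothetical):

```json
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "definitions": {
    "keyid": { "type": "string", "minLength": 1 },
    "hashes": {
      "type": "object",
      "minProperties": 1,
      "additionalProperties": { "type": "string" }
    }
  },
  "properties": {
    "keyids": {
      "type": "array",
      "items": { "$ref": "#/definitions/keyid" }
    },
    "hashes": { "$ref": "#/definitions/hashes" }
  }
}
```

Each recurring structure is then defined once and referenced everywhere it appears, so a spec change only has to be applied in one place.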
I have done a few attempts at reviewing (and I can post some detail questions if you'd like them before I finish), but I feel like an "LGTM" without seeing results from a public validation test suite, or a real commitment from an open source project to use this data as part of a test suite, seems like a big ask from reviewers. Basically, is there a plan/commitment to use these schemas somewhere? If not, is there value in adding 600 lines of hard-to-validate schema? I think it's unlikely to match the implementations without continuous testing. Would it make more sense to add a test suite (wherever it's planned) first, and then propose that the spec repo should maintain the schemas? Obviously, I'm not a spec maintainer, so if they are ok with this then my reservations are moot...
These are good reservations, and we have thought about them. We do have an internal app that uses and tests against these schemas. A public test suite would be nice, but demands a fairly representative dataset. @fridex WDYT?
IMO it makes sense not to tie schemas hosted in the repo to schemas used in a test suite of an implementation. As @jku says, they might not even match. But wouldn't it be a good thing if there were reference schemas that made such a deviation apparent? I agree that they are prone to become outdated if we don't automatically check spec changes against the schemas, which is hard to do because the format definitions don't use a standard language. Btw, the same is true for the example metadata in the spec. Not sure how to solve this problem without changing the format definition language. Maybe we could at least run the schemas against the example metadata?
Here's what we can do, but it seems like a fair amount of nontrivial work. We could do something like runnable docstrings: we could decorate the example metadata written in the spec in such a way that when you compile the spec into a readable format, tests would run the example metadata against the schemas. Do I make sense? |
After reading through the whole thing I'm more convinced that the schema needs to be tested against metadata from multiple actual implementations. If it's not, the schema will contain bugs: they are really hard for humans to spot in this format.
I don't mean that there needs to be an automated test suite, and I don't mean the tests should pass (disagreements about correctness are quite likely)... but some reviewed results from using the schema to validate metadata seem necessary to have trust in the schema.
},
"sig": {
    "type": "string",
    "minLength": 1
I would allow any length here:
- signatures with empty sigs are a pattern that's useful to accommodate merging signature changes during threshold signing (see sigstore root-signing for an example)
- a signature with length 0 is no more pointless than length 1, for example
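That is, something like this sketch (the surrounding `signatures` layout is my assumption, not taken from the PR); an empty string still satisfies it:

```json
"signatures": {
  "type": "array",
  "items": {
    "type": "object",
    "required": ["keyid", "sig"],
    "properties": {
      "keyid": { "type": "string" },
      "sig": { "type": "string" }
    }
  }
}
```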
"additionalProperties": true, | ||
"properties": { | ||
"keytype": { | ||
"enum": ["rsa", "ed25519", "ecdsa-sha2-nistp256"] |
Does this mean only these three values are allowed? That seems incorrect.
The correct way to handle the spec requirements seems to be:
- allow any key with keytype, scheme and keyval
- if keytype is in ["rsa", "ed25519", "ecdsa-sha2-nistp256"], then check that keyval contains a "public"
Is the described check useful for anyone? Not sure, but this is what the spec says, if I read it correctly.
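For illustration, a sketch of one way to encode that conditional in draft-04, which has no `if`/`then` (the overall layout is my assumption):

```json
{
  "type": "object",
  "required": ["keytype", "scheme", "keyval"],
  "properties": {
    "keytype": { "type": "string" },
    "scheme": { "type": "string" },
    "keyval": { "type": "object" }
  },
  "anyOf": [
    {
      "not": {
        "properties": {
          "keytype": { "enum": ["rsa", "ed25519", "ecdsa-sha2-nistp256"] }
        }
      }
    },
    {
      "properties": {
        "keyval": { "required": ["public"] }
      }
    }
  ]
}
```

The first `anyOf` branch accepts any key whose keytype is outside the three defined values; keys with one of the defined keytypes fail that branch and must instead satisfy the second, which requires `"public"` in keyval.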
},
"keyval": {
    "type": "object",
    "required": ["public"],
I mentioned this in another comment: "public" is only required for the three defined keytypes, not for others.
}
},
"scheme": {
    "enum": ["rsassa-pss-sha256", "ed25519", "ecdsa-sha2-nistp256"]
Does this mean only these three values are allowed?
"$schema": "http://json-schema.org/draft-04/schema#", | ||
"type": "object", | ||
"required": ["signatures", "signed"], | ||
"additionalProperties": false, |
Does this mean no extra properties are allowed? I think this is incorrect not just here but (almost) everywhere in the spec: there are only two or three places where extra properties would make the content non-compliant.
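For instance, the top-level envelope could stay permissive (a sketch, not from the PR):

```json
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "required": ["signatures", "signed"],
  "additionalProperties": true
}
```

Since `additionalProperties` defaults to true, the keyword could also simply be omitted.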
},
"length": {
    "type": "integer",
    "minimum": 1
0 should be allowed IMO, it's not the job of the spec to say a repository can't contain empty files (and the spec doesn't do that, I believe).
The same check exists for targets meta: there it makes a little more sense (but not much): at least it's clear that 0-length metadata is not valid. But it's just as clear that anything below a length of 100 or so is automatically invalid, so why set the limit at 1?
"type": "object", | ||
"additionalProperties": true, | ||
"properties": { | ||
"targets.json": { |
I would expect you could match all items in the object with patternProperties, not just the "targets.json" field?
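For example, a sketch of that approach (the `#/definitions/metafile` reference is hypothetical):

```json
"meta": {
  "type": "object",
  "patternProperties": {
    "^.+\\.json$": { "$ref": "#/definitions/metafile" }
  },
  "additionalProperties": false
}
```

This way every `ROLE.json` entry is validated against the same subschema, rather than only the hard-coded "targets.json" key.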
"minimum": 1 | ||
}, | ||
"hashes": { | ||
"type": "object" |
This is defined in the targets schema, but not here.
"type": "boolean" | ||
} | ||
}, | ||
"required": ["name", "keyids", "threshold", "paths", "terminating"] |
paths is not required
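If I read the spec correctly, a delegated role may specify either `paths` or `path_hash_prefixes`, but not both; a sketch of how that could be encoded (assuming that reading, with the rest of the layout mine):

```json
{
  "type": "object",
  "required": ["name", "keyids", "threshold", "terminating"],
  "oneOf": [
    { "required": ["paths"] },
    { "required": ["path_hash_prefixes"] }
  ],
  "properties": {
    "paths": { "type": "array", "items": { "type": "string" } },
    "path_hash_prefixes": { "type": "array", "items": { "type": "string" } }
  }
}
```

The `oneOf` makes the two attributes mutually exclusive while still demanding one of them.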
You do, @trishankatdatadog! Here's how I did something similar for an in-toto demo: a script running in CI regex-matches snippets from a markdown doc and asserts an expected output.
So we have two suggestions here:
1. run the schemas against the example metadata in the spec
2. automate that check (runnable docstrings / a CI script, as described above)
Maybe we should start with 1. and ticketize 2.
I am all for including the schemas in this organisation as a starting point for implementers (the context in which this came up, iirc). I would like to see them updated to use subschemas for recurring parts first, as @lukpueh suggested, and it is important to see them validated against real data, as @jku suggested. Given that we will not, at least to start, have a mechanism to test them over time, I think it's worth putting them in a subdirectory which indicates they are reference material/a starting point, such as … Thanks for working on this, @fridex!
I think it's reasonable to start with this, and include …
Thanks all for review comments. I'll incorporate suggestions.
IMHO it is very reasonable to test the JSON schema files as discussed. If python-tuf is the reference implementation, it might be a good idea to have tests with the JSON schemas included there (and later let other implementations use them as well). I'm personally not a big fan of Git submodules; however, if we could create an automation (a GitHub Action maybe?) that would keep the submodule up-to-date, that would be great. Semantically, the specification repo could be the right place for the schema files. WDYT? EDIT: Are signed commits required in the repo?
Let's take one step at a time:
IMO 1 and 2 are blockers for this PR, and 3 and 4 are future work.
Does not look like it :)
Add TUF JSON schema files. These schema files were produced as part of the Datadog Agent integrations test suite (additionally adjusted to remove Datadog-specific parts) and have not been reviewed yet. I'm new to TUF/in-toto, so please review for any possible inconsistencies, and sorry for them in advance.
Closes: #242