Add TUF schema files #246
Conversation
This is a great start, thanks, Fridolin, will really help the community!
I have two comments in general:
- Some of the assumptions (e.g., about version numbers and keyids) are incorrect with respect to the spec.
- The other assumptions are not incorrect, but they are specific to the Datadog Agent integrations, which will not be useful in general here.
Could we please make these changes and try again? Thanks!
Thanks for the review, Trishank. The comments raised were addressed. Please also see the additional changes outside of the PR review comments (starting from d6731dc). A concern I have is the listing of hashes in … Also, for completeness, should we include a schema for mirrors.json?
Sorry, no need for mirrors, which is being replaced by TAP 4. A schema for the latter would be nice!
The spec doesn't mandate using any particular hash algo, so I think having a list of algos and requiring at least one of them makes the most sense.
I'd say the original sigstore metadata was incorrect in not following the TUF spec. Best to use the …
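For illustration, a minimal sketch of a `hashes` subschema along those lines (draft-04 keywords; the exact layout is my assumption, not taken from the PR):

```json
"hashes": {
  "type": "object",
  "minProperties": 1,
  "additionalProperties": {
    "type": "string",
    "minLength": 1
  }
}
```

`minProperties: 1` requires at least one algorithm entry, while `additionalProperties` keeps the set of algorithm names open.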
Well done, Fridolin, I can't think of any more complaints! 🎉
Thanks for your patience and thorough review, Trishank 👍🏻
And yes! We only got expiration TUF-compliant in later roots, so early roots should not be expected to be compliant.
@patflynn would you be interested in reviewing this PR? 🙂
Hi there! Sorry I missed all the pings on this thread! I'm really grateful for @fridex's contribution here. As long as the current historical set of metadata resources parses correctly, I think we're doing as well as we could expect! LGTM. For some context: we discussed at KubeCon potentially moving the sigstore TUF schema definition to proto at a future date, but I don't think that should block this.
I think somewhere we should also document that editors should update these JSON schema files whenever they update the metadata formats in the spec. Ideally, there should be code in GitHub Workflows that checks for this, but documentation will do for now.
@fridex could we please resolve conflicts in the meantime?
Great work, @fridex! What do you think about defining subschemas for recurring parts? The TUF spec has many of those, and JSON Schema seems to support it. It would make review and maintenance a lot easier.
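For example, a sketch of how such factoring could look with `definitions` and `$ref` (the subschema names here are hypothetical):

```json
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "definitions": {
    "keyid": { "type": "string", "minLength": 1 },
    "hashes": {
      "type": "object",
      "minProperties": 1,
      "additionalProperties": { "type": "string" }
    }
  },
  "properties": {
    "keyids": {
      "type": "array",
      "items": { "$ref": "#/definitions/keyid" }
    },
    "hashes": { "$ref": "#/definitions/hashes" }
  }
}
```

Each recurring structure is then defined once and referenced everywhere it appears, so a spec change only has to be applied in one place.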
I have done a few attempts at reviewing (and I can post some detail questions if you'd like them before I finish), but I feel like an "LGTM" without seeing results from a public validation test suite, or a real commitment from an open source project to use this data as part of a test suite, seems like a big ask from reviewers. Basically, is there a plan/commitment to use these schemas somewhere? If not, is there value in adding 600 lines of hard-to-validate schema? I think it's unlikely to match the implementations without continuous testing. Would it make more sense to add a test suite (wherever it's planned) first, and then propose that the spec repo should maintain the schemas? Obviously, I'm not a spec maintainer, so if they are ok with this then my reservations are moot...
These are good reservations, and we have thought about them. We do have an internal app that uses and tests against these schemas. A public test suite would be nice, but demands a fairly representative dataset. @fridex WDYT?
IMO it makes sense not to tie schemas hosted in the repo to schemas used in a test suite of an implementation. As @jku says, they might not even match. But wouldn't it be a good thing if there were reference schemas that made such a deviation apparent? I agree that they are prone to become outdated if we don't automatically check spec changes against the schemas, which is hard to do because the format definitions don't use a standard language. Btw, the same is true for the example metadata in the spec. Not sure how to solve this problem without changing the format definition language. Maybe we could at least run the schemas against the example metadata?
Here's what we can do, but it seems like a fair amount of nontrivial work. We could do something like runnable docstrings: we could decorate the example metadata written in the spec in such a way that when you compile the spec into a readable format, tests would run the example metadata against the schemas. Do I make sense? |
After reading through the whole thing I'm more convinced that the schema needs to be tested against metadata from multiple actual implementations. If it's not, the schema will contain bugs: they are really hard for humans to spot in this format.
I don't mean that there needs to be an automated test suite, and I don't mean the tests should pass (disagreements about correctness are quite likely)... but some reviewed results from using the schema to validate metadata seem necessary to have trust in the schema.
},
"sig": {
    "type": "string",
    "minLength": 1
I would allow any length here:
- signatures with empty sigs are a pattern that's useful to accommodate merging signature changes during threshold signing (see sigstore root-signing for an example)
- a signature with length 0 is no more pointless than length 1, for example
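That is, something like this sketch (the surrounding `signatures` layout is my assumption, not taken from the PR); an empty string still satisfies it:

```json
"signatures": {
  "type": "array",
  "items": {
    "type": "object",
    "required": ["keyid", "sig"],
    "properties": {
      "keyid": { "type": "string" },
      "sig": { "type": "string" }
    }
  }
}
```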
"additionalProperties": true, | ||
"properties": { | ||
"keytype": { | ||
"enum": ["rsa", "ed25519", "ecdsa-sha2-nistp256"] |
Does this mean only these three values are allowed? That seems incorrect.
The correct way to handle the spec requirements seems to be:
- allow any key with keytype, scheme and keyval
- if keytype is in ["rsa", "ed25519", "ecdsa-sha2-nistp256"], then check that keyval contains a "public"
Is the described check useful for anyone? Not sure, but this is what the spec says, if I read it correctly.
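For illustration, a sketch of one way to encode that conditional in draft-04, which has no `if`/`then` (the overall layout is my assumption):

```json
{
  "type": "object",
  "required": ["keytype", "scheme", "keyval"],
  "properties": {
    "keytype": { "type": "string" },
    "scheme": { "type": "string" },
    "keyval": { "type": "object" }
  },
  "anyOf": [
    {
      "not": {
        "properties": {
          "keytype": { "enum": ["rsa", "ed25519", "ecdsa-sha2-nistp256"] }
        }
      }
    },
    {
      "properties": {
        "keyval": { "required": ["public"] }
      }
    }
  ]
}
```

The first `anyOf` branch accepts any key whose keytype is outside the three defined values; keys with one of the defined keytypes fail that branch and must instead satisfy the second, which requires `"public"` in keyval.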
},
"keyval": {
    "type": "object",
    "required": ["public"],
I mentioned this in another comment: "public" is only required for the three defined keytypes, not for others.
}
},
"scheme": {
    "enum": ["rsassa-pss-sha256", "ed25519", "ecdsa-sha2-nistp256"]
Does this mean only these three values are allowed?
"$schema": "http://json-schema.org/draft-04/schema#", | ||
"type": "object", | ||
"required": ["signatures", "signed"], | ||
"additionalProperties": false, |
Does this mean no extra properties are allowed? I think this is incorrect not just here but (almost) everywhere in the spec: there are only two or three places where extra properties would make the content non-compliant.
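For instance, the top-level envelope could stay permissive (a sketch, not from the PR):

```json
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "required": ["signatures", "signed"],
  "additionalProperties": true
}
```

Since `additionalProperties` defaults to true, the keyword could also simply be omitted.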
},
"length": {
    "type": "integer",
    "minimum": 1
0 should be allowed IMO, it's not the job of the spec to say a repository can't contain empty files (and the spec doesn't do that, I believe).
The same check exists for targets meta: there it makes a little more sense (but not much): at least it's clear that 0-length metadata is not valid. But it's just as clear that anything below a length of 100 or so is automatically invalid, so why set the limit at 1?
"type": "object", | ||
"additionalProperties": true, | ||
"properties": { | ||
"targets.json": { |
I would expect you could match all items in the object with patternProperties, not just the "targets.json" field?
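For example, a sketch of that approach (the `#/definitions/metafile` reference is hypothetical):

```json
"meta": {
  "type": "object",
  "patternProperties": {
    "^.+\\.json$": { "$ref": "#/definitions/metafile" }
  },
  "additionalProperties": false
}
```

This way every `ROLE.json` entry is validated against the same subschema, rather than only the hard-coded "targets.json" key.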
"minimum": 1 | ||
}, | ||
"hashes": { | ||
"type": "object" |
This is defined in the targets schema, but not here.
"type": "boolean" | ||
} | ||
}, | ||
"required": ["name", "keyids", "threshold", "paths", "terminating"] |
paths is not required
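If I read the spec correctly, a delegated role may specify either `paths` or `path_hash_prefixes`, but not both; a sketch of how that could be encoded (assuming that reading, with the rest of the layout mine):

```json
{
  "type": "object",
  "required": ["name", "keyids", "threshold", "terminating"],
  "oneOf": [
    { "required": ["paths"] },
    { "required": ["path_hash_prefixes"] }
  ],
  "properties": {
    "paths": { "type": "array", "items": { "type": "string" } },
    "path_hash_prefixes": { "type": "array", "items": { "type": "string" } }
  }
}
```

The `oneOf` makes the two attributes mutually exclusive while still demanding one of them.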
You do, @trishankatdatadog! Here's how I did something similar for an in-toto demo: a script running in CI regex-matches snippets from a markdown doc and asserts an expected output.
So we have two suggestions here:
1. run the schemas against the example metadata in the spec
2. automate that check (runnable docstrings / a CI script, as described above)
Maybe we should start with 1. and ticketize 2.
I am all for including the schemas in this organisation as a starting point for implementers (the context in which this came up, iirc). I would like to see them updated to use subschemas for recurring parts first, as @lukpueh suggested, and it is important to see them validated against real data, as @jku suggested. Given that we will not, at least to start, have a mechanism to test them over time, I think it's worth putting them in a subdirectory which indicates they are reference material/a starting point, such as … Thanks for working on this, @fridex!
I think it's reasonable to start with this, and include …
Thanks all for review comments. I'll incorporate suggestions.
IMHO it is very reasonable to test the JSON schema files as discussed. If python-tuf is the reference implementation, it might be a good idea to have tests with the JSON schemas included there (and later let other implementations use them as well). I'm personally not a big fan of Git submodules; however, if we could create an automation (a GitHub Action maybe?) that would keep the submodule up-to-date, that would be great. Semantically, the specification repo could be the right place for the schema files. WDYT? EDIT: Are signed commits required in the repo?
Let's take one step at a time:
IMO 1 and 2 are blockers for this PR, and 3 and 4 are future work.
Does not look like it :)
Add TUF JSON schema files. These schema files were produced as part of the Datadog Agent integrations test suite (additionally adjusted to remove Datadog-specific parts) and have not been reviewed yet. I'm new to TUF/in-toto, so please review for any possible inconsistencies, and sorry for them in advance.
Closes: #242