What does "Artifact tree singularity" mean? #37

Open
jsgf opened this issue Apr 5, 2022 · 4 comments

Comments

@jsgf
Contributor

jsgf commented Apr 5, 2022

In glossary/artifact_tree/#artifact-tree-singularity it says:

> Artifact tree singularity
>
> An artifact should have precisely one artifact tree. All equivalent artifacts should have the same artifact tree.

I'm not sure how to parse this. My immediate interpretation is that every (derived?) artifact must have a unique graph ("tree") for creating it. But this doesn't make sense in practice - it's quite likely that you could get the same output from two different input sources if the difference between the sources is not material (eg a small comment change). It also suggests that there's precisely one way to create an empty file (or any other common bit pattern).

Perhaps this can be reconciled if the artifact's identity also incorporates the bom document it was derived from. But that's in conflict with "Artifact Equivalency":

> Two artifacts are equivalent if and only if []byte(artifact1) == []byte(artifact2)

which seems to preclude considering the origins. (And later on it implies - but doesn't say explicitly - that the identifier is a function of the content alone.)
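A minimal Go sketch of the tension described above. It assumes the identifier is a git-blob-style SHA-1 over the content alone; the names `equivalent` and `artifactID` are hypothetical and not taken from the GitBOM spec. Note that Go cannot compare slices with `==`, so the glossary's equivalence test is spelled `bytes.Equal` here.

```go
package main

import (
	"bytes"
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// equivalent reports whether two artifacts are byte-for-byte identical,
// which is how "Artifact Equivalency" reads in the glossary.
func equivalent(a, b []byte) bool {
	return bytes.Equal(a, b)
}

// artifactID is a hypothetical content-only identifier, computed the way
// git hashes a blob: SHA-1 over "blob <len>\x00" followed by the content.
func artifactID(content []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "blob %d\x00", len(content))
	h.Write(content)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// Two empty outputs produced by entirely unrelated builds.
	fromBuildA := []byte{}
	fromBuildB := []byte{}

	fmt.Println(equivalent(fromBuildA, fromBuildB))               // true
	fmt.Println(artifactID(fromBuildA) == artifactID(fromBuildB)) // true
}
```

Under these assumptions the two empty files are equivalent and share one identifier, so nothing in the content can tell the two builds apart.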

However this language is a bit ambiguous in that it doesn't clarify how the artifact and the artifact tree are related. It's possible to read "an artifact should have precisely one artifact tree" as meaning that a non-derived artifact can only be used as input to one specific build. (Though that's assuming "should" is actually "must" or "shall", or it raises the question of "what if it isn't?")

@edwarnicke
Contributor

edwarnicke commented Apr 6, 2022

> My immediate interpretation is that every (derived?) artifact must have a unique graph ("tree") for creating it.

Yes.

> But this doesn't make sense in practice - it's quite likely that you could get the same output from two different input sources if the difference between the sources is not material (eg a small comment change).

If you make a small comment change, the artifact id of that source file will change, and the GitBOM document for every artifact that has the new source file as an input will change, which will change the GitBOM identifier embedded in the output artifact.
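The chain described above can be sketched in Go as a toy model. Everything here is an assumption made for illustration: `gitoid`, `bomDocument`, `compile`, and `build` are hypothetical names, the "compiler" just drops comment lines, and the BOM document format and the trailing `bom:` line are simplifications rather than the actual GitBOM encoding.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// gitoid is a hypothetical helper: a git-blob-style SHA-1 of the content.
func gitoid(content []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "blob %d\x00", len(content))
	h.Write(content)
	return hex.EncodeToString(h.Sum(nil))
}

// bomDocument lists the identifiers of all inputs, sorted, one per line.
// The format is simplified for this sketch.
func bomDocument(inputs ...[]byte) []byte {
	ids := make([]string, 0, len(inputs))
	for _, in := range inputs {
		ids = append(ids, gitoid(in))
	}
	sort.Strings(ids)
	return []byte(strings.Join(ids, "\n") + "\n")
}

// compile is a toy build step that drops comment lines from the source.
func compile(source []byte) []byte {
	var kept []string
	for _, line := range strings.Split(string(source), "\n") {
		if !strings.HasPrefix(strings.TrimSpace(line), "//") {
			kept = append(kept, line)
		}
	}
	return []byte(strings.Join(kept, "\n"))
}

// build compiles the source and embeds the identifier of its BOM document.
func build(source []byte) []byte {
	bomID := gitoid(bomDocument(source))
	return []byte(string(compile(source)) + "\nbom:" + bomID + "\n")
}

func main() {
	v1 := []byte("// old comment\nx = 1")
	v2 := []byte("// new comment\nx = 1")

	fmt.Println(string(compile(v1)) == string(compile(v2))) // true
	fmt.Println(gitoid(v1) == gitoid(v2))                   // false
	fmt.Println(gitoid(build(v1)) == gitoid(build(v2)))     // false
}
```

In this model the compiled bytes are identical, but the comment change alters the source identifier, hence the BOM document, hence the embedded identifier, so the two output artifacts differ.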

> Perhaps this can be reconciled if the artifact's identity also incorporates the bom document it was derived from.

Exactly... which is what embedding the GitBOM identifier for the artifact into the artifact itself effectively does.

What you would like to avoid is a scenario where every producer of an artifact, even if run through exactly the same process with exactly the same input, produces a distinct artifact tree (this is something that can and does happen with SBOMs). That's the crux of 'artifact tree singularity'.

@jsgf
Contributor Author

jsgf commented Apr 6, 2022

> which will change the GitBOM identifier embedded in the output artifact

So that assumes (requires) that every file format will have a way of embedding the BOM info into the file itself?

It seems unfortunate that we'd end up with cases where the only difference between two artifacts is the GitBOM metadata itself - ie, they would be identical without the presence of the BOM info.

> What you would like to avoid is a scenario where every producer of an artifact, even if run through exactly the same process with exactly the same input, produces a distinct artifact tree (this is something that can and does happen with SBOMs). That's the crux of 'artifact tree singularity'.

That's the other way around though - that's the question of preventing identical builds from having divergent graphs. What this is adding is the constraint that different builds may never have convergent artifacts.

@edwarnicke
Contributor

> So that assumes (requires) that every file format will have a way of embedding the BOM info into the file itself?

@jsgf
Contributor Author

jsgf commented Apr 6, 2022

Should the definition of Derived Artifact be updated to say that it must include BOM info?

Likewise, should the build tool definition mention the importance of embedding BOM info in the tool's outputs?
