Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with artifacts tree hash #71

Open
maleadt opened this issue Sep 22, 2022 · 5 comments
Open

Issue with artifacts tree hash #71

maleadt opened this issue Sep 22, 2022 · 5 comments

Comments

@maleadt
Copy link

maleadt commented Sep 22, 2022

I think there's something wrong with some of the artifacts of this package:

Both are a problem because the artifact store is supposed to be content-addressable, so the tree hash should match.

@dpo
Copy link
Member

dpo commented Sep 25, 2022

The hash was generated by calling artifact_hash though: https://github.com/JuliaSmoothOptimizers/BundleAdjustmentModels.jl/blob/main/utils/gen_artifacts_toml.jl#L34 ?!

The string "problem-49-7776-pre.txt.bz2.sha256" doesn't appear in Artifacts.toml. so I'm not understanding that comment either.

@maleadt
Copy link
Author

maleadt commented Sep 26, 2022

The string "problem-49-7776-pre.txt.bz2.sha256" doesn't appear in Artifacts.toml. so I'm not understanding that comment either.

The file was somehow created after the test suite ran, which "polluted" the artifact directory.

@AntoninKns
Copy link
Member

AntoninKns commented Sep 26, 2022

I am not entirely sure to understand the problem you are facing but I will try to explain how I used the Julia Artifact technology in my code.

I originally wanted to use the code from PlatformEngines.jl but if I remember well I couldn't because at that time it didn't accept ".bz2" files which are directly given by the website from where I download the archives.

To be more precise it only exported the download_verify_unpack function and I wanted to directly access the download_verify function.

I created a workaround function here which itself calls this problematic function.

As you pointed it out above, the tree hash doesn't match what is expected. This problem is pointed out both in this function I overwrited and in the original function.

In the current version of my code I am downloading the artifact in a temporary file, then I do a verfication on the content of the file I am downloading and after that i'm forcing it to be moved to the directory corresponding to the hash git-tree-sha1 as given below.

["ladybug/problem-49-7776-pre.txt.bz2"]
git-tree-sha1 = "dd2da5f94014b5f9086a2b38a87f8c1bc171b9c2"
lazy = true

    [["ladybug/problem-49-7776-pre.txt.bz2".download]]
    sha256 = "1ccb15701a92a8ad909d30860a0108cd3f2d7916c1ecf2851e59a6198b9de6b0"
    url = "https://grail.cs.washington.edu/projects/bal/data/ladybug/problem-49-7776-pre.txt.bz2"

From my understing of the Julia Artifacts, this issue is not really dangerous since the tree hashs we use are generated here so it shouldn't be possible for them to overwrite other existing artifacts.

That being said I didn't think of it much until now but it can and should be fixed for Unix OS and Mac OS and it might lead to other issues I didn't think of.

For windows as it is pointed it in the most recent version of the code here we still need to fix some other errors first.

@maleadt
Copy link
Author

maleadt commented Sep 27, 2022

Let me illustrate the problem instead:

pkg> add BundleAdjustmentModels
pkg> test BundleAdjustmentModels
shell> ls ~/.julia/artifacts/dd2da5f94014b5f9086a2b38a87f8c1bc171b9c2
problem-49-7776-pre.txt.bz2  problem-49-7776-pre.txt.bz2.sha256

Problem 1: there should not be a .sha256 file in the artifacts directory. This indicates that this package somehow mutates the ~/.julia/artifacts directory, which is not OK, because it breaks the ability to verify the tree hash:

julia> Base.SHA1(Pkg.GitTools.tree_hash("/home/pkgeval/.julia/artifacts/dd2da5f94014b5f9086a2b38a87f8c1bc171b9c2"))
SHA1("fdceeb22d110bfeec92a2dd01ff7e81791bccc8c")

... which is different from the dd2da5f94014b5f9086a2b38a87f8c1bc171b9c2 hash that's listed in the Artifacts.toml.

Problem 2: even after removing this extraneous file, the hash does not match:

shell> rm /home/pkgeval/.julia/artifacts/dd2da5f94014b5f9086a2b38a87f8c1bc171b9c2/problem-49-7776-pre.txt.bz2.sha256
rm: remove write-protected regular file '/home/pkgeval/.julia/artifacts/dd2da5f94014b5f9086a2b38a87f8c1bc171b9c2/problem-49-7776-pre.txt.bz2.sha256'? y

julia> Base.SHA1(Pkg.GitTools.tree_hash("/home/pkgeval/.julia/artifacts/dd2da5f94014b5f9086a2b38a87f8c1bc171b9c2"))
SHA1("95a7106e3af85ad6e027e7e581eb5e7554c328ad")

Compare all this to another artifact I have around:

julia> Base.SHA1(Pkg.GitTools.tree_hash("/home/pkgeval/.julia/artifacts/7661e5a9aa217ce3c468389d834a4fb43b0911e8"))
SHA1("7661e5a9aa217ce3c468389d834a4fb43b0911e8")

i.e. the directory name (which is the tree hash) should match the contents.

@AntoninKns
Copy link
Member

Thank you for this further explanation.

Problem 1: Concerning the .sha256 file, it is created by the verify function from julia that verify if the archive downloaded is safe. It is only created in order to have faster further verifications so I can simply remove it.

Problem 2: This problem is more complex. What I clumsily tried to explain in my previous answer is that I removed every check that the tree hash truly corresponds to the content of the folder. The reason behind that is that since for some OS versions this verification does not work I bypassed it for all.

Since I already do a verification on the content of the archive when I download it and I generate a tree hash that should not overlap with other artifacts I didn't think it was much of a problem to have a tree hash that is different from the content of the directory until now.

Thank you for pointing it out and I am sorry if this raises problems in your code, I will try to solve it, I guess I made a mistake while generating these tree hashs.

If you want to solve this problem in other packages, my guess is that the problem comes from this download_artifact function. Since it does not verify tree hash for Windows and for some versions of Unix and Mac it might be the root of the problems in other packages.

If you have further questions or remarks I would be glad to help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants