False validation error: invalid checksum #468

Open
diamondap opened this issue Jun 28, 2021 · 6 comments

Comments

@diamondap
Member

From PTSEM:

I recently uploaded a few dozen objects to our production repo using DART at the command line. Most completed successfully, but for one of them, DART returned an error message:

error: validate/completed - Operation completed with errors. Bad md5 digest for 'data/0056.mets.xml': manifest says 'c6daf78cdbb8129c59b0c672', file digest is 'f89314b6c6daf78cdbb8129c59b0c672'.

The odd thing about this is that the manifest doesn't actually say "c6da..." at all. It has "f893...":

f89314b6c6daf78cdbb8129c59b0c672 data/0056.mets.xml

I expanded the tar file and ran md5 on that file and got this:

MD5 (.dart/bags/ptsem.edu.theocom.0056/data/0056.mets.xml) = f89314b6c6daf78cdbb8129c59b0c672

I can't tell where "c6da..." is coming from. Any ideas?
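The manual check described above (expand the tar file, run md5, compare with the manifest) can be repeated for every payload file with a tar reader that is independent of DART. The following is only a sketch using Python's standard library. The function names are hypothetical, and it assumes an uncompressed or gzip-compressed tarred bag with a single top-level bag directory and a manifest named manifest-md5.txt.

```python
import hashlib
import tarfile


def parse_manifest(text):
    """Map payload path -> expected hex digest for each manifest line."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, path = line.split(None, 1)
        entries[path.strip()] = digest.lower()
    return entries


def verify_tarred_bag(tar_path, manifest_name="manifest-md5.txt"):
    """Return a list of problems; an empty list means every digest matched."""
    problems = []
    with tarfile.open(tar_path) as tar:
        # Locate the manifest and infer the bag's root directory from its path.
        manifest_member = next(
            (m for m in tar.getmembers()
             if m.isfile() and m.name.split("/")[-1] == manifest_name),
            None,
        )
        if manifest_member is None:
            return [f"{manifest_name} not found in {tar_path}"]
        bag_root = manifest_member.name[: -len(manifest_name)]
        text = tar.extractfile(manifest_member).read().decode("utf-8")

        for rel_path, expected in parse_manifest(text).items():
            try:
                member = tar.getmember(bag_root + rel_path)
            except KeyError:
                problems.append(f"{rel_path} listed in manifest but missing from tar")
                continue
            md5 = hashlib.md5()
            with tar.extractfile(member) as fh:
                # Hash in 1 MiB chunks so large payload files are not loaded at once.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    md5.update(chunk)
            if md5.hexdigest() != expected:
                problems.append(
                    f"{rel_path}: manifest says {expected}, file digest is {md5.hexdigest()}"
                )
    return problems
```

If this reports no problems for a bag that DART rejects, that points at how the manifest is being read during validation rather than at the contents of the tarball.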

@diamondap
Member Author

diamondap commented Jun 28, 2021

Looks like it's reading the manifest incorrectly, skipping over the first few bytes. I wonder if this is a bug in the tar stream library.

The actual file md5 and the manifest md5 match. Both are "f89314b6c6daf78cdbb8129c59b0c672".

However, the error message reports the manifest md5 as "c6daf78cdbb8129c59b0c672", which omits the first 8 characters of the digest (that is, the first 8 bytes of the manifest text).

f89314b6c6daf78cdbb8129c59b0c672
--------c6daf78cdbb8129c59b0c672
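The alignment above can be checked mechanically. This is a small illustrative helper (the function name is hypothetical) that reports how many leading characters were lost, assuming the reported digest is a pure suffix of the real one.

```python
def dropped_prefix_length(actual, reported):
    """Return the number of leading characters missing from `reported`
    if it is a suffix of `actual`; otherwise return None."""
    if reported and actual.endswith(reported):
        return len(actual) - len(reported)
    return None


actual = "f89314b6c6daf78cdbb8129c59b0c672"    # digest in the manifest and of the file
reported = "c6daf78cdbb8129c59b0c672"          # digest quoted in the error message
print(dropped_prefix_length(actual, reported))  # 8
```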

The bag in question has the following payload, amounting to 946 MB:

f89314b6c6daf78cdbb8129c59b0c672 data/0056.mets.xml
655211972a23f666b8533bbadab3c311 data/0056.mods.xml
3612e74091b8566cde10793744577a24 data/0056_archival_master_a.wav
e23d1aa6b4a1a1ede702fb7d6af1daec data/0056_access.mp3
393c16e6ed6bad4b92ed90ef8eb8bf3c data/0056.xml
dfcd367d4c3ab5c77937322fc9be7d0b data/0056_archival_master_AssetFront.JPG

The md5 manifest is added at the end of the bagging process, which means it is preceded in the tar archive by the files in this list. Is there something off in the tar headers that makes the reader start reading the md5 manifest at byte 8 instead of byte zero?
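One way to test the tar-header question is to look at where a second, independent tar reader thinks the manifest data starts, and then read the raw bytes at that position. The sketch below uses Python's tarfile module, whose TarInfo objects expose offset (start of the header) and offset_data (start of the file data). The function name is hypothetical, and it assumes an uncompressed tar file, since raw offsets are meaningless in a compressed one.

```python
import tarfile


def manifest_head(tar_path, manifest_suffix="manifest-md5.txt", length=32):
    """Return header/data offsets for the manifest and the first raw bytes
    stored at its data offset in an uncompressed tar file."""
    with tarfile.open(tar_path, "r:") as tar:  # "r:" refuses compressed archives
        for member in tar:
            if member.isfile() and member.name.endswith(manifest_suffix):
                break
        else:
            raise FileNotFoundError(f"no {manifest_suffix} in {tar_path}")

    # Bypass the tar reader entirely and read straight from the recorded offset.
    with open(tar_path, "rb") as raw:
        raw.seek(member.offset_data)
        head = raw.read(min(length, member.size))

    return {
        "name": member.name,
        "offset": member.offset,            # where the tar header begins
        "offset_data": member.offset_data,  # where the manifest bytes begin
        "size": member.size,
        "head": head,
    }
```

If head begins with the full digest of the first manifest line, the archive itself is laid out correctly and the missing bytes are being lost on the reading side.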

@diamondap
Member Author

From Greg at PTSEM:

I tried using DART on a different computer to package and upload the same files. That worked. My laptop has version 2.0.11.1795 whereas my desktop, where the failure occurred, has 2.0.11.1925. That difference may be irrelevant -- just letting you know.

@diamondap
Member Author

diamondap commented Jun 29, 2021

From Greg:

I've uploaded dozens of objects successfully using DART at the command line, but I encountered a second error message -- which is similar but not identical to the one we corresponded about yesterday:

error: validate/completed - Operation completed with errors. Payload file data/01103.mets.xml not found in manifest-md5.txt

Actually that file is listed on the first line of manifest-md5.txt. So whereas yesterday the problem seemed to be skipping the first several bytes of manifest-md5.txt, today it seems to be skipping the first line. I've attached the manifest-md5.txt file and the log lines pertaining to this object.

If you'd like me to try anything in particular, let me know. Otherwise I'll try it with DART on my laptop computer, as I did yesterday successfully.

manifest-md5.txt

ff9df139371d90c0c28b73a6eda6f78d data/01103.mets.xml
7815e889302b28a354bde2b28b3f4be3 data/01103.mods.xml
6d89ec8b70cec1ed3598de8129d07f5d data/01103_archival_master.wav
5292e30e9aab1064e9eb2d951ab66cc3 data/01103_access.mp3
0ff840182c7cc384fe69e141db788d8d data/01103.xml
e2108bc4f8d5860807eec7d20bd25f15 data/01103_BoxFront.JPG
786433938aa46d775b47177ad8b9d21d data/01103_ReelFront.JPG
dd7dbb7763339f36e6fa3ab03df8bbd6 data/01103.full-text.xml
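Both reported symptoms are consistent with leading bytes of the manifest being lost before it is parsed: losing 8 bytes truncates the first digest, while losing the whole first line makes the first payload file disappear from the manifest. The sketch below simulates that with two lines from the earlier 0056 manifest. It is not DART's validation code; the function names and error wording are illustrative only.

```python
def parse_manifest(text):
    """Map payload path -> digest for each non-empty manifest line."""
    entries = {}
    for line in text.splitlines():
        if line.strip():
            digest, path = line.split(None, 1)
            entries[path.strip()] = digest
    return entries


def check(manifest_text, file_digests):
    """Compare computed file digests against the parsed manifest."""
    entries = parse_manifest(manifest_text)
    errors = []
    for path, actual in file_digests.items():
        if path not in entries:
            errors.append(f"{path} not found in manifest")
        elif entries[path] != actual:
            errors.append(f"bad digest for {path}: manifest says {entries[path]}")
    return errors


manifest = (
    "f89314b6c6daf78cdbb8129c59b0c672 data/0056.mets.xml\n"
    "655211972a23f666b8533bbadab3c311 data/0056.mods.xml\n"
)
digests = {
    "data/0056.mets.xml": "f89314b6c6daf78cdbb8129c59b0c672",
    "data/0056.mods.xml": "655211972a23f666b8533bbadab3c311",
}

print(check(manifest, digests))       # intact manifest: []
print(check(manifest[8:], digests))   # first 8 bytes lost: bad digest for the first file
first_line = manifest.index("\n") + 1
print(check(manifest[first_line:], digests))  # first line lost: first file not found
```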

@diamondap
Member Author

Specs on the two DART versions. Note that the validation fails on the desktop machine with the newer version of Node, but it works on the laptop with the older version.

Desktop - (fails)
macOS 10.15.7
DART 2.0.11 with Node.js v12.18.3 for darwin-x64-19.6.0.

Laptop - (succeeds)
macOS 10.15.7
DART 2.0.11 with Node.js v12.13.0 for darwin-x64-19.6.0.

@diamondap
Member Author

This error occurs in two versions of DART on two different Macs. DART v2.0.11.1795 with Node.js 12.13.0 and DART v2.0.11.1925 with Node.js 12.18.3.

@diamondap
Member Author

The bags in which these errors occur do not contain files over 8GB, so this issue is not related to the tar-stream library's occasional corruption of tarballs containing files >8GB.
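The size check can be scripted for any suspect bag. The sketch below assumes that the 8GB limit mentioned here corresponds to the classic ustar header's size field, which holds 11 octal digits and therefore tops out at 8 GiB minus one byte. The constant and function names are hypothetical.

```python
import tarfile

# Largest size representable in an 11-digit octal ustar size field.
USTAR_MAX_SIZE = 8 ** 11 - 1  # 8,589,934,591 bytes


def members_over_ustar_limit(tar_path):
    """Return (name, size) for every regular file larger than the ustar limit."""
    with tarfile.open(tar_path) as tar:
        return [
            (m.name, m.size)
            for m in tar
            if m.isfile() and m.size > USTAR_MAX_SIZE
        ]
```

An empty result for the failing bags would confirm that the large-file problem is not in play.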
