hash for each prefix of a file (for verifying input) #266

lrvideckis · 2024-09-16T17:33:08Z

Related to #72, but with a hash for each prefix of the file. Then you can binary search over the lines to find your typo.
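Here is a minimal Python sketch of the idea. It assumes each prefix hash is computed the way the head | tr -d '[:space:]' | md5sum | cut -c-6 pipeline later in this thread does it: whitespace removed, md5, first 6 hex characters. The names prefix_hashes and first_mismatch are hypothetical. The binary search relies on the fact that once one prefix differs, every longer prefix differs too (up to hash collisions).

```python
from __future__ import annotations

import hashlib


def prefix_hashes(text: str, width: int = 6) -> list[str]:
    """Return one short hash per line; entry i covers lines 1..i+1.

    Whitespace is dropped before hashing (like tr -d '[:space:]'), so
    indentation or spacing differences do not change any hash.
    """
    h = hashlib.md5()
    out = []
    for line in text.splitlines():
        # Feed this line's non-whitespace characters into the running hash.
        h.update("".join(line.split()).encode())
        # hexdigest() does not finalize h, so we can keep updating it.
        out.append(h.hexdigest()[:width])
    return out


def first_mismatch(ref: list[str], typed: list[str]) -> int | None:
    """1-based number of the first line whose prefix hash differs, or None.

    Binary search is valid because "prefix i matches" is true up to the
    typo and false from then on (barring collisions).
    """
    n = min(len(ref), len(typed))
    lo, hi = 0, n  # find the smallest index in [lo, hi) where hashes differ
    while lo < hi:
        mid = (lo + hi) // 2
        if ref[mid] == typed[mid]:
            lo = mid + 1  # lines 1..mid+1 are fine; the typo is further down
        else:
            hi = mid  # the typo is on line mid+1 or earlier
    if lo < n:
        return lo + 1
    # All shared prefixes match; a length difference means missing/extra lines.
    return None if len(ref) == len(typed) else n + 1
```

In a contest, ref would be the hashes read off the printed notebook, and you would do the same halving by hand with head -n.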

lrvideckis · 2024-09-16T17:34:09Z

Idea communicated to me by https://codeforces.com/profile/enwask and https://codeforces.com/profile/Alg01

lrvideckis · 2024-09-19T08:38:08Z

I implemented this for my lib https://github.com/programming-team-code/programming_team_code/releases/download/ptc/ptc.pdf

lrvideckis · 2024-11-22T18:39:49Z

After some of the North American ICPC regionals, I received feedback that these hashes were useful (for example, by running head -n 5 a.cpp | tr -d '[:space:]' | md5sum | cut -c-6). I would like to add this style to KACTL, but it would take some work:

I have a script which adds in these comments:

shopt -s globstar  # let ** match nested directories (off by default in bash)
for header in ../library/**/*.hpp; do
	echo "adding hash codes for $header"
	for i in $(seq "$(wc --lines <"$header")" -5 1); do
		hash=$(head --lines "$i" "$header" | sed '/^#include/d' | cpp -dD -P -fpreprocessed | ./../library/contest/hash.sh)
		line_length=$(sed --quiet "${i}p" "$header" | wc --chars)
		# PDF wraps at 68 chars, and hash comment takes 8 chars total
		padding_length=$((68 - 8 - line_length))
		padding_length=$((padding_length > 0 ? padding_length : 0))
		padding=$(printf '%*s' "$padding_length" '')
		sed --in-place "${i}s/$/$padding\/\/${hash}/" "$header"
	done
done

But there's one problem: the script passes prefixes of each file to cpp, which fails when a prefix ends in the middle of a multi-line comment. The easiest fix would be to convert all the docs to single-line comments (I'm willing to do this).

The script also assumes the code is formatted so that every line is at least 8 characters shorter than the PDF's line-wrap length, because it appends a comment like //cbe787 to the end of the line, and you don't want that to wrap if the PDF is to keep looking nice.
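For comparison, here is a rough Python sketch of the same annotation step. Like the script, it assumes a 68-column PDF and an 8-character //xxxxxx comment, but it hashes the raw lines with whitespace removed instead of running them through cpp and hash.sh, so comments in the source count toward the hash. All hashes are taken from the unannotated lines. annotate and short_hash are hypothetical names.

```python
from __future__ import annotations

import hashlib

PDF_WIDTH = 68  # assumed PDF wrap width, as in the script above
COMMENT_LEN = 2 + 6  # "//" followed by a 6-character hash


def short_hash(lines: list[str]) -> str:
    """md5 of the lines with all whitespace removed, cut to 6 hex chars."""
    data = "".join("".join(line.split()) for line in lines)
    return hashlib.md5(data.encode()).hexdigest()[:6]


def annotate(lines: list[str], step: int = 5) -> list[str]:
    """Append a padded //hash comment to every `step`-th line, counting
    back from the last line (like `seq N -5 1` in the shell script)."""
    out = list(lines)
    for i in range(len(lines), 0, -step):  # 1-based line numbers N, N-5, ...
        h = short_hash(lines[:i])  # hash of the ORIGINAL, unannotated prefix
        pad = max(PDF_WIDTH - COMMENT_LEN - len(lines[i - 1]), 0)
        out[i - 1] = lines[i - 1] + " " * pad + "//" + h
    return out
```

Counting back from the last line, as the script does, guarantees that the final line always carries a hash.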

simonlindholm · 2024-11-23T16:57:41Z

In theory, one hash character per line would be enough (with a ~1/16 chance that the typo is actually on the line before the first mismatching one, or ~1/256 that it's two lines up). That requires a more complex hashing script, but maybe it's fine if you combine it with a single hash of the whole file to cover the common case of no mistakes? I don't know; there's definitely value in making the hashing operation as easy to use as possible.
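To make the ~1/16 and ~1/256 figures concrete, assume each one-character hash behaves like an independent, uniformly random hex digit. A typo on line k changes every prefix from line k on, but each of those characters still matches by chance with probability 1/16. If D is the distance from the typo down to the first mismatching line, then:

```latex
\Pr[D = d] = \left(\tfrac{1}{16}\right)^{d} \cdot \tfrac{15}{16},
\qquad
\Pr[D \ge d] = \left(\tfrac{1}{16}\right)^{d}
```

So the typo lies at least one line above the first mismatch with probability 1/16, and at least two lines above with probability 1/256.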

I feel like adding //cbe787 to each line of the PDF in the regular font looks too ugly, but with a smaller font it could work (maybe to the left of the code with a separator in between, similar to how most editors show line numbers?). tex/preprocessor.py is a better place for this than a wrapper shell script; Python should also make it simpler to add logic for handling other types of comments, and to improve performance by avoiding multiple shell invocations per line.
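As an illustration of doing this in Python, here is a sketch (not the actual tex/preprocessor.py) that strips comments from the whole file once and then computes every prefix hash in a single pass with a running md5. The stripper keeps newlines so line numbers stay aligned, and it treats an unterminated /* as running to the end of the input instead of failing, so stripping a truncated prefix gives the same lines as truncating the stripped file. It is a hand-written approximation of cpp -fpreprocessed: raw strings, digit separators, and backslash-newline continuations are not handled. strip_comments and prefix_hashes are hypothetical names.

```python
from __future__ import annotations

import hashlib


def strip_comments(src: str) -> str:
    """Drop // and /* */ comments but keep every newline, so line i of the
    result corresponds to line i of the input. An unterminated /* (as in a
    truncated prefix) runs to the end of the input instead of failing.
    String/char literals are copied as-is; raw strings, digit separators
    and backslash-newline continuations are not handled."""
    out = []
    i, n = 0, len(src)
    while i < n:
        if src.startswith("//", i):
            j = src.find("\n", i)
            i = n if j == -1 else j  # the newline itself is kept
        elif src.startswith("/*", i):
            j = src.find("*/", i + 2)
            end = n if j == -1 else j + 2
            out.append("\n" * src.count("\n", i, end))  # keep line count
            i = end
        elif src[i] in "\"'":
            quote, j = src[i], i + 1
            while j < n and src[j] not in (quote, "\n"):
                j += 2 if src[j] == "\\" else 1  # skip escaped characters
            j = min(j + 1, n)  # include the closing quote if present
            out.append(src[i:j])
            i = j
        else:
            out.append(src[i])
            i += 1
    return "".join(out)


def prefix_hashes(src: str, width: int = 6) -> list[str]:
    """Hash of every line prefix of the comment-stripped file, in one pass."""
    h = hashlib.md5()
    out = []
    for line in strip_comments(src).splitlines():
        h.update("".join(line.split()).encode())  # ignore all whitespace
        out.append(h.hexdigest()[:width])
    return out
```

Because the comment handling never looks past the end of the current line except inside a block comment, one pass over the whole file replaces one cpp invocation per prefix.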

fishy15 · 2024-11-24T08:24:13Z

I've been using a version of this in my implementation of kactl for a while; check out the preprocessor in my repo for more information (link). It's still a bit ugly, but it was mostly intended as a quick hack to use until I have the time to fix it properly.

Additionally, for the overflow issue, my plan was to make the preprocessor raise an error if any line is too long, and to manually edit the input files to fix it. Some places in kactl already overflow even without the added comment, so erroring on any wraparound should catch those issues as well. (Normally wraparound is not a problem, but we want the line counts to match exactly so that the hash every five lines stays consistent.)
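A sketch of that check follows, assuming the same 68-column width and 8-character hash comment as the shell script earlier in the thread. LineTooLong, check_widths, and the default tab width of 2 are assumptions for illustration, not kactl's actual settings. It reserves the 8 columns on every line, which is stricter than needed if only every fifth line gets a hash.

```python
from __future__ import annotations

PDF_WIDTH = 68  # assumed PDF wrap width, as in the shell script above
HASH_COMMENT = 8  # "//" plus a 6-character hash


class LineTooLong(Exception):
    """Raised when a line would wrap in the PDF once a hash is appended."""


def check_widths(path: str, lines: list[str], tab_width: int = 2) -> None:
    """Fail loudly instead of letting the PDF wrap a line, since a wrapped
    line would break the 'hash every five lines' numbering."""
    limit = PDF_WIDTH - HASH_COMMENT
    too_long = []
    for number, line in enumerate(lines, start=1):
        width = len(line.expandtabs(tab_width))  # tabs as rendered columns
        if width > limit:
            too_long.append(f"line {number} ({width} chars)")
    if too_long:
        raise LineTooLong(f"{path}: {', '.join(too_long)} exceed {limit} columns")
```

Reporting every offending line at once, rather than stopping at the first, makes it easier to fix a whole file in one editing pass.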

simonlindholm · 2024-11-24T12:33:04Z

Here's my one-hash-char-per-line idea:

# Hashes a file, ignoring all whitespace and comments. Use for
# verifying that code was correctly typed.
cpp -dD -P -fpreprocessed | python3 -c 'import sys, hashlib
y,z = b"",[b"b"]*5
x = [y.join(l.split()) for l in sys.stdin.buffer if l.strip()]
print("".join(hashlib.md5(y:=y+l).hexdigest()[0]for l in x+z))'

It's not ideal: the command is too long, and the output hash is 89 characters long for geometry/FastDelaunay.h.
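For readability, here is the same computation written out as a Python function. It should produce the same output as the one-liner for the same input, since it splits on newlines exactly as iterating sys.stdin.buffer does. per_line_hash and PAD_LINES are names made up for this sketch.

```python
import hashlib

PAD_LINES = 5  # the one-liner appends [b"b"]*5 after the real lines


def per_line_hash(data: bytes) -> str:
    """One hex digit per non-blank line, then PAD_LINES more.

    Digit k is the first hex digit of md5(the whitespace-free text of
    non-blank lines 1..k). The trailing digits hash the whole text plus
    dummy b"b" lines, presumably so a typo near the end is caught by
    more than a single hex digit.
    """
    # Non-blank lines with every whitespace byte removed (y.join(l.split())).
    lines = [b"".join(line.split()) for line in data.split(b"\n") if line.strip()]
    prefix = b""
    digits = []
    for line in lines + [b"b"] * PAD_LINES:
        prefix += line  # the one-liner's y := y + l
        digits.append(hashlib.md5(prefix).hexdigest()[0])
    return "".join(digits)
```

To use it like the original, feed it the output of cpp -dD -P -fpreprocessed, for example with print(per_line_hash(sys.stdin.buffer.read())) in a small script. The output length is the number of non-blank lines plus 5, which is where the 89 characters for FastDelaunay.h come from.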
