hash for each prefix of a file (for verifying input) #266

Open
lrvideckis opened this issue Sep 16, 2024 · 6 comments

Comments

@lrvideckis
Contributor

Related to #72, but with a hash for each prefix of the file, so that you can binary search over lines to find your typo.
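The binary search could look roughly like the sketch below. It assumes each printed hash is the first six hex digits of the MD5 of the whitespace-stripped prefix; `prefix_hash` and `first_bad_line` are hypothetical names, and comment stripping is left out for brevity.

```python
import hashlib

def prefix_hash(lines, n, digits=6):
    """Short MD5 of the first n lines with all whitespace removed."""
    text = "".join("".join(line.split()) for line in lines[:n])
    return hashlib.md5(text.encode()).hexdigest()[:digits]

def first_bad_line(typed, printed):
    """Return the 1-based number of the first line whose prefix hash differs.

    printed[n - 1] is the reference hash of the first n lines.
    Returns None if the longest comparable prefix matches.
    """
    n = min(len(typed), len(printed))
    if n == 0 or prefix_hash(typed, n) == printed[n - 1]:
        return None
    lo, hi = 1, n  # the first mismatching prefix length lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix_hash(typed, mid) == printed[mid - 1]:
            lo = mid + 1  # typo is after line mid
        else:
            hi = mid      # typo is at or before line mid
    return lo
```

The search relies on the fact that once a prefix contains the typo, every longer prefix mismatches too (barring hash collisions), so about log2(n) hash checks are enough.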

@lrvideckis
Contributor Author

After some of the North American ICPC regionals, I received feedback that these hashes were useful (for example, by running head -n 5 a.cpp | tr -d '[:space:]' | md5sum | cut -c-6). I would like to add this style to KACTL, but it would take some work:

I have a script that adds these comments:

for header in ../library/**/*.hpp; do
	echo "adding hash codes for $header"
	for i in $(seq "$(wc --lines <"$header")" -5 1); do
		hash=$(head --lines "$i" "$header" | sed '/^#include/d' | cpp -dD -P -fpreprocessed | ./../library/contest/hash.sh)
		line_length=$(sed --quiet "${i}p" "$header" | wc --chars)
		# PDF wraps at 68 chars, and hash comment takes 8 chars total
		padding_length=$((68 - 8 - line_length))
		padding_length=$((padding_length > 0 ? padding_length : 0))
		padding=$(printf '%*s' "$padding_length" '')
		sed --in-place "${i}s/$/$padding\/\/${hash}/" "$header"
	done
done

But there's one problem: the script passes prefixes of each file to the cpp command, which fails if a prefix is cut off in the middle of a multi-line comment. The easiest fix would be to convert all the docs to single-line comments (I'm willing to do this).
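A different way around the truncated-comment problem would be to strip comments from the whole file once, keeping every newline, and only then hash prefixes. The Python sketch below illustrates that idea. `strip_comments` is a hypothetical helper, not part of KACTL or hash.sh. It does not handle raw string literals or C++14 digit separators, and its output is not guaranteed to hash identically to what cpp produces.

```python
def strip_comments(source):
    """Remove // and /* */ comments, keeping newlines so line numbers survive."""
    out = []
    i, n = 0, len(source)
    while i < n:
        c = source[i]
        if source.startswith("//", i):
            # Line comment: skip up to (not past) the end of the line.
            while i < n and source[i] != "\n":
                i += 1
        elif source.startswith("/*", i):
            end = source.find("*/", i + 2)
            end = n if end == -1 else end + 2
            # Keep the newlines inside the comment so later lines do not shift.
            out.append("\n" * source.count("\n", i, end))
            i = end
        elif c in "\"'":
            # String or character literal: copy verbatim, honouring escapes.
            j = i + 1
            while j < n and source[j] != c:
                j += 2 if source[j] == "\\" else 1
            j = min(j + 1, n)
            out.append(source[i:j])
            i = j
        else:
            out.append(c)
            i += 1
    return "".join(out)
```

Because the stripped text has the same number of lines as the input, taking the first i lines of the result is always well formed, even when line i of the original sits inside a block comment.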

The script also assumes that the code is formatted so that each line is at least 8 characters shorter than the PDF's line-wrap length: the script appends a comment such as //cbe787 to the end of the line, and that comment must not wrap if the result is to look nice.

@simonlindholm
Member

In theory, one hash character per line would be enough (with a ~1/16 chance that the typo is actually on the line before the first mismatching one, or ~1/256 that it's two lines up). That requires a more complex hashing script, but maybe that's fine if you combine it with a single complete hash to cover the majority case of no mistakes? I don't know; there's definitely value in having the hashing operation be as easy to use as possible.
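The rough odds above can be worked out under the assumption that, once a prefix contains the typo, each later hex character behaves like an independent uniform draw from 16 values. The first mismatch then appears $k$ lines after the typo with probability:

```latex
\Pr[\text{first mismatch is } k \text{ lines after the typo}]
  = \left(\tfrac{1}{16}\right)^{k} \cdot \tfrac{15}{16},
\qquad
k = 0:\ \tfrac{15}{16} \approx 0.94,\quad
k = 1:\ \tfrac{15}{256} \approx 0.059,\quad
k = 2:\ \tfrac{15}{4096} \approx 0.0037 .
```

These values agree with the quoted ~1/16 and ~1/256 up to the factor 15/16.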

I feel like adding //cbe787 to each line in the pdf in the regular font looks too ugly, but maybe with a smaller font it could work. (Maybe to the left of the code with a separator in between, similar to how most editors show line numbers?) tex/preprocessor.py is a better place to add this than a wrapper shell script; Python should also make it simpler to add logic to handle other types of comments and improve performance by avoiding the need for multiple shell invocations for each line.

@fishy15

fishy15 commented Nov 24, 2024

I've been using a version of this in my implementation of kactl for a while; check out the preprocessor in my repo for more information (link). It's still a bit ugly, but it was mostly intended as a quick hack to use until I have the time to fix it.

Additionally, for the overflow issue, my plan was to make the preprocessor raise an error if any line is too long, and to manually edit the input files to fix this. There are some other places in kactl that overflow even without the added comment, so erroring on any wraparound should help catch those issues as well. (Normally wraparound is not an issue, but we want the exact line counts to line up so that the hash-every-five-lines is consistent.)
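A check like the one described could be sketched as follows. The 68-column wrap width and 8-character hash comment come from the script earlier in this thread. The function names and the tab width of 2 are assumptions for illustration, not taken from any existing preprocessor.

```python
WRAP_WIDTH = 68          # PDF wrap column quoted in the shell script
HASH_COMMENT_WIDTH = 8   # "//" plus six hex digits

def find_overlong_lines(lines, limit=WRAP_WIDTH - HASH_COMMENT_WIDTH, tab_width=2):
    """Return (line_number, width) for every line wider than limit."""
    bad = []
    for number, line in enumerate(lines, start=1):
        # tab_width is an assumed rendering width for tabs in the PDF.
        width = len(line.rstrip("\n").expandtabs(tab_width))
        if width > limit:
            bad.append((number, width))
    return bad

def check_line_lengths(name, lines):
    """Raise an error naming every overlong line instead of letting it wrap."""
    bad = find_overlong_lines(lines)
    if bad:
        details = ", ".join(f"line {n} ({w} cols)" for n, w in bad)
        raise ValueError(f"{name}: lines too long for hash comment: {details}")
```

Failing the build, rather than silently wrapping, keeps the printed line count equal to the source line count, which is what the every-five-lines hashes depend on.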

@simonlindholm
Member

Here's my one-hash-char-per-line idea:

# Hashes a file, ignoring all whitespace and comments. Use for
# verifying that code was correctly typed.
cpp -dD -P -fpreprocessed | python3 -c 'import sys, hashlib
y,z = b"",[b"b"]*5
x = [y.join(l.split()) for l in sys.stdin.buffer if l.strip()]
print("".join(hashlib.md5(y:=y+l).hexdigest()[0]for l in x+z))'

It's not ideal: the script is too long, and the output hash is 89 chars long for geometry/FastDelaunay.h.
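For readability, here is an unrolled sketch of the same per-line idea, plus a helper that compares a typed hash string against the printed one. It assumes comments have already been removed (the cpp step is omitted). `per_line_hash` and `first_mismatch` are hypothetical names. The five padding entries presumably exist so that a typo near the end of the file still has several characters in which to show up.

```python
import hashlib

def per_line_hash(text: bytes, pad: int = 5) -> str:
    """One hex char per non-blank line, taken from the MD5 of the
    whitespace-stripped prefix, followed by pad extra chars."""
    lines = [b"".join(l.split()) for l in text.split(b"\n") if l.strip()]
    acc, out = b"", []
    for line in lines + [b"b"] * pad:
        acc += line  # the prefix grows by one stripped line per step
        out.append(hashlib.md5(acc).hexdigest()[0])
    return "".join(out)

def first_mismatch(typed: str, printed: str):
    """1-based position of the first differing character, or None."""
    for i, (a, b) in enumerate(zip(typed, printed), start=1):
        if a != b:
            return i
    if len(typed) == len(printed):
        return None
    return min(len(typed), len(printed)) + 1
```

The position returned by `first_mismatch` counts non-blank lines, so the typo is most likely on that line and occasionally one or two lines earlier.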
