Should tartufo insist on valid base64 encodings? #278
Unanswered
rbailey-godaddy asked this question in Q&A
Replies: 0 comments
Historically, we did a spotty job of this. The original `get_strings_of_set()` allowed `=` to appear anywhere in a "base64" string, although technically it is only permitted as padding at the end. There's a PR in review that replaces this function with a regex-based `find_string_encodings()`
that does a better (but still imperfect) job of matching only legal base64 encodings.

However, the reason I bring this up is that we have a request to add support for base64url encodings. By way of quick recap, base64url is just like base64 except it replaces `+` and `/` in the encoding alphabet with `-` and `_`. I am considering speed-vs-accuracy tradeoffs in possible implementations. I see three options:

You do not see "option 4 - make the user tell us with a command line flag" because there is nothing that says a repository -- or even a single file -- might not have both base64 and base64url encodings in it.
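To make the alphabet recap above concrete, here is a minimal sketch using Python's standard `base64` module. The two input bytes are chosen only because they happen to encode to the characters that differ between the two alphabets; nothing here is taken from tartufo itself.

```python
import base64

# Two bytes whose encoding uses alphabet positions 62 and 63,
# the only positions where base64 and base64url differ.
raw = bytes([0xFB, 0xFF])

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")

print(standard)  # +/8=
print(url_safe)  # -_8=

# The url-safe form is the standard form with "+" -> "-" and "/" -> "_".
translated = standard.translate(str.maketrans("+/", "-_"))
print(translated == url_safe)  # True
```

Everything else, including the `=` padding, is identical, which is why a scanner cannot tell the two encodings apart unless one of those four characters shows up.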
Given that we are looking for entropy and not sanity-checking validity, I am very drawn to option 2 due to its efficiency and simplicity, but I am looking for feedback on the real-world consequences of this strategy. There are two corner cases I have thought of:
1. A string containing both `+` (from base64) and `_` (from base64url). Do we really care if this string is examined for high entropy, even if it is not actually a valid known encoding? History suggests that we do not. (Note previous mishandling of `=` and lack of valid-length check.)
2. A string such as `onething-anotherthing`. Previously this would be considered as two separate base64 strings (`onething` and `anotherthing`), but now we would see one `onething-anotherthing` string. Previous findings that might be excluded by signature would no longer match, because the detected string had changed. A special case of this would be something like `_onething`, which previously would return `onething` and now would return `_onething`.

As you consider the second point, think about the likelihood of an otherwise-valid base64 encoding appearing in text immediately adjacent to either a `-` or `_` (the two new characters). This seems to me to be unlikely -- but if I thought it was totally safe, I wouldn't be asking.