hashquines/README.html

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="pandoc" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
  <title>README</title>
  <style type="text/css">
      code{white-space: pre-wrap;}
      span.smallcaps{font-variant: small-caps;}
      span.underline{text-decoration: underline;}
      div.column{display: inline-block; vertical-align: top; width: 50%;}
  </style>
  <style type="text/css">
a.sourceLine { display: inline-block; line-height: 1.25; }
a.sourceLine { pointer-events: none; color: inherit; text-decoration: inherit; }
a.sourceLine:empty { height: 1.2em; position: absolute; }
.sourceCode { overflow: visible; }
code.sourceCode { white-space: pre; position: relative; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
code.sourceCode { white-space: pre-wrap; }
a.sourceLine { text-indent: -1em; padding-left: 1em; }
}
pre.numberSource a.sourceLine
  { position: relative; }
pre.numberSource a.sourceLine:empty
  { position: absolute; }
pre.numberSource a.sourceLine::before
  { content: attr(data-line-number);
    position: absolute; left: -5em; text-align: right; vertical-align: baseline;
    border: none; pointer-events: all;
    -webkit-touch-callout: none; -webkit-user-select: none;
    -khtml-user-select: none; -moz-user-select: none;
    -ms-user-select: none; user-select: none;
    padding: 0 4px; width: 4em;
    color: #aaaaaa;
  }
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa;  padding-left: 4px; }
div.sourceCode
  {  }
@media screen {
a.sourceLine::before { text-decoration: underline; }
}
code span.al { color: #ff0000; font-weight: bold; } /* Alert */
code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #7d9029; } /* Attribute */
code span.bn { color: #40a070; } /* BaseN */
code span.bu { } /* BuiltIn */
code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4070a0; } /* Char */
code span.cn { color: #880000; } /* Constant */
code span.co { color: #60a0b0; font-style: italic; } /* Comment */
code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #ba2121; font-style: italic; } /* Documentation */
code span.dt { color: #902000; } /* DataType */
code span.dv { color: #40a070; } /* DecVal */
code span.er { color: #ff0000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #40a070; } /* Float */
code span.fu { color: #06287e; } /* Function */
code span.im { } /* Import */
code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #007020; font-weight: bold; } /* Keyword */
code span.op { color: #666666; } /* Operator */
code span.ot { color: #007020; } /* Other */
code span.pp { color: #bc7a00; } /* Preprocessor */
code span.sc { color: #4070a0; } /* SpecialChar */
code span.ss { color: #bb6688; } /* SpecialString */
code span.st { color: #4070a0; } /* String */
code span.va { color: #19177c; } /* Variable */
code span.vs { color: #4070a0; } /* VerbatimString */
code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
  </style>
  <!--[if lt IE 9]>
    <script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
  <![endif]-->
</head>
<body>
<h1 id="hashquines">Hashquines</h1>
<p>Hashquines are files showing their own MD5.</p>
<p>It seems like an impossible magic trick because modifying the file's contents changes the hash, therefore the hash can't be known in advance. So it's the opposite strategy: make it possible to display <em>any</em> value, included the value of the actual hash of the file, without changing the overall hash of the file.</p>
<p>While the security risk they represent is debatable, they certainly show that MD5 is now just a fun toy.</p>
<h1 id="strategies">Strategies</h1>
<h2 id="self-check">Self-check</h2>
<p>'Cheating' by just computing the hash value and displaying it. They don't rely on hash collisions.</p>
<p>In Python:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode python"><code class="sourceCode python"><a class="sourceLine" id="cb1-1" data-line-number="1"><span class="im">import</span> hashlib</a>
<a class="sourceLine" id="cb1-2" data-line-number="2"><span class="im">import</span> sys</a>
<a class="sourceLine" id="cb1-3" data-line-number="3"><span class="bu">print</span>(hashlib.md5(<span class="bu">open</span>(sys.argv[<span class="dv">0</span>],<span class="st">&quot;rb&quot;</span>).read()).hexdigest())</a></code></pre></div>
<p>In Batch:</p>
<pre class="batch"><code>md5sum %0
</code></pre>
<h2 id="read-an-encoded-value">Read an encoded value</h2>
<p>Chain enough fast collisions to be able to encode a hash value without modifying the final hash value, then some code is <em>executed</em> to check and display that value.</p>
<p>Some fixed offsets of FastColl collision blocks will be xored with <code>0x80</code>, so that bit can be checked reliably - in this case, one collision equals 1 bit of information.</p>
<p>An MD5 is 128 bits, so 128 Fastcoll blocks are needed.</p>
<h2 id="abuse-format-parsing">Abuse format parsing</h2>
<p>Abuse the format structure to make the parser display a digit or another while keeping the same hash value, repeat for each digit and for every value to be displayed (<code>32*16=512</code>).</p>
<p>Like colliding 2 images or documents, but instead colliding the contents of many different elements at various positions, to display the right value of the actual hash file without changing the overall file hash.</p>
<p>Depending on the format, a possible way is to use a collision as a switch to enable one character or another. In this case, <code>N</code> chained collisions display <code>N+1</code> different objects in the same position.</p>
<p>To display the hex value of an MD5, 16 characters are required for 32 digits, so <code>480</code> (=<code>32*15</code>) collisions if there's the ability to select between different characters.</p>
<p>If it's only possible to turn on/off decoded sequences, then <code>512</code> (=<code>32*16</code>) collisions are needed.</p>
<p>A way to reduce this amount of collision is to display the characters via 7-segments display, in which case each segments needs to be toggled on or off (like bits). So in this case, <code>224</code> (<code>=32*7</code>) collisions are required.</p>
<pre><code> -
| |
 -
| |
 -
</code></pre>
<h1 id="examples">Examples</h1>
<ul>
<li>a <a href="md5.ps">PostScript</a> 'encoding' hashquine by <em>Gregor Kopf</em> - cf <a href="https://github.com/angea/pocorgtfo#0x14">Poc||GTFO 0x14:09</a> (<a href="https://github.com/angea/pocorgtfo/blob/master/contents/articles/14-09.pdf">article 9</a>) (<a href="scripts/ps.py">script</a>)</li>
</ul>
<p><img src=pics/postscript.png height=200></img></p>
<ul>
<li>2 PDF hashquines: one via <a href="pocs/md5jpg.pdf">JPG</a>, one via <a href="pocs/md5text.pdf">text</a>, by <em>Mako</em> - cf <a href="https://github.com/angea/pocorgtfo#0x14">Poc||GTFO 0x14:10</a> (<a href="https://github.com/angea/pocorgtfo/blob/master/contents/articles/14-10.pdf">article 10</a>) (scripts: <a href="scripts/pdf_jpg.py">jpg</a>, <a href="scripts/pdf_txt.py">txt</a>)</li>
</ul>
<p><img src=pics/jpgpdf.png height=100></img></p>
<pre><code>$ pdftotext -q md5text.pdf -
66DA5E07C0FD4C921679A65931FF8393

$ md5sum md5text.pdf
66da5e07c0fd4c921679a65931ff8393 *md5text.pdf
</code></pre>
<ul>
<li>2 GIF hashquines by <em>spq</em> - cf <a href="https://github.com/angea/pocorgtfo#0x14">Poc||GTFO 0x14:11</a> (<a href="https://github.com/angea/pocorgtfo/blob/master/contents/articles/14-11.pdf">article 11</a>) (scripts: <a href="scripts/gif_seg.py">segments</a>, <a href="scripts/gif_avp.py">avp</a>)</li>
</ul>
<p><img src=pocs/md5.gif width=400></img> <img src=pocs/md5_avp_loop.gif width=300></img></p>
<ul>
<li>a NES 'encoding' hashquine, by Evan Sultanik and Evan Teran, in PoC||GTFO 0x14 (here as a <a href="pocs/hashquine.nes">standalone file</a>) - cf <a href="https://github.com/angea/pocorgtfo#0x14">Poc||GTFO 0x14:12</a> (<a href="https://github.com/angea/pocorgtfo/blob/master/contents/articles/14-12.pdf">article 12</a>) (<a href="scripts/nes.py">script</a>)</li>
</ul>
<p><img src=pics/nes.png height=200></img></p>
<ul>
<li><a href="https://github.com/angea/pocorgtfo#0x14">Poc||GTFO 0x14</a> is a polyglot file: simultaneously a JPG in PDF hashquine and a NES hashquine, with also a hidden cover while keeping the same MD5 - a classic collision, albeit the JPG picture has a lot of custom scans. (<a href="scripts/pocorgtfo14.py">script</a>)</li>
</ul>
<p><img src=pics/pocorgtfo14.png height=400></img></p>
<ul>
<li>a GIF hashquine by <em>Rogdham</em> - cf <a href="https://www.rogdham.net/2017/03/12/gif-md5-hashquine.en">blog post</a> (<a href="scripts/gif_rog.py">script</a>)</li>
</ul>
<p><img src=pocs/rogdham_gif_md5_hashquine.gif height=150></img></p>
<ul>
<li>a TIFF <a href="pocs/rogdham_tiff_md5_hashquine_4Go.zip">hashquine (4 gb)</a> by <em>Rogdham</em></li>
</ul>
<p>Cf self-descriptive image (here as PNG):</p>
<p><img src=pics/tiff_preview.png height=600></img></p>
<ul>
<li>a PNG hashquine by <em>David 'Retr0id' Buchanan</em></li>
</ul>
<p><img src=pocs/hashquine_by_retr0id.png height=600></img></p>
<h2 id="archive-hashquines">Archive hashquines</h2>
<p>Decompressing these archives gives the hash of the whole archive.</p>
<ul>
<li>a <a href="pocs/hashquine.gz">GZIP hashquine</a> by <em>Ange Albertini</em> (<a href="scripts/gzip.py">script</a>)</li>
</ul>
<pre><code>$ md5sum hashquine.gz
ad5de2581f4bd8f35051b789df379d36 *hashquine.gz

$ gzip -dck hashquine.gz
ad5de2581f4bd8f35051b789df379d36 is the hash of this archive.

This is a Gzip partial hashquine, made of many Gzip members chained together.
It&#39;s using 192 Unicoll MD5 collisions to display optionally each of the
16 hexadecimal characters &#39;0123456789abcdef&#39; 12 times (in that order).
This is not enough to display any MD5 digest (32 nibbles),
so an extra small member with bruteforced contents was appended to give the file
an MD5 that can be displayed with this limited input.

Thanks to Marc Stevens for creating Unicoll.
Thanks to David Buchanan for the constructive discussions.

Ange Albertini 2023
</code></pre>
<ul>
<li>an <a href="pocs/hashquine.lz4">LZ4 hashquine</a> by <em>Ange Albertini</em>, with 160 collisions. (<a href="scripts/lz4.py">script</a>)</li>
</ul>
<pre><code>$ md5sum hashquine.lz4
1690738ac079d914645ade5693ab019b *hashquine.lz4
$ lz4 -c hashquine.lz4
1690738ac079d914645ade5693ab019b
</code></pre>
<h3 id="retroids-zstandard-file">Retroid's Zstandard file</h3>
<p>A clever Zstandard hashquine++ by <em>David 'Retr0id' Buchanan</em>. (scripts: <a href="scripts/zst.py">zst</a>, <a href="scripts/tar_zst.py">tar.zst</a>)</p>
<p>As a pure <a href="pocs/hashquine.zst">ZStandard</a> file:</p>
<pre><code>$ md5sum hashquine.zst
720ca7f6842f1a608fcb924f5811ebb9 *hashquine.zst

$ zstd -cd hashquine.zst
The MD5 of hashquine.zst is:
720ca7f6842f1a608fcb924f5811ebb9
</code></pre>
<p>As a <a href="pocs/hashquine.tar.zst">Zstandard(tar)</a> file:</p>
<pre><code>$ md5sum hashquine.tar.zst
703911cf9e409965cebd05392acc1503 *hashquine.tar.zst

$ tar -Oxf hashquine.tar.zst hash.md5
The MD5 of hashquine.tar.zst is:
703911cf9e409965cebd05392acc1503
</code></pre>
<p>As a self-checked 'auto-manifest' <a href="pocs/self.tar.zst">Tar.zst</a> (this is generic: make <a href="scripts/tar_zst.py">your own</a> from any given Tar!):</p>
<pre><code>$ md5sum self.tar.zst
f068d54fabb12dbb1b359745a80d78fc *self.tar.zst
</code></pre>
<pre><code>$ tar -xvf self.tar.zst
x hash.md5
x hello.txt
</code></pre>
<pre><code>$ cat hash.md5
f068d54fabb12dbb1b359745a80d78fc *self.tar.zst
ed076287532e86365e841e92bfc50d8c *hello.txt
</code></pre>
<pre><code>$ md5sum -c hash.md5
self.tar.zst: OK
hello.txt: OK
</code></pre>
<p>All these files are Zstandard-streams via the same prefix with 653 collisions (!):</p>
<ol>
<li><p>Tar Header (entirely optional and generic for <code>hash.md5</code> contents):</p>
<ol>
<li><code>1</code> for the Tar header start <code>hash.md5 [...] 0000644 0000000 0000000</code> (constant)</li>
<li><code>8*11</code> for the <code>hash.md5</code> file size, in octal.</li>
<li><code>1</code> for a Tar timestamp: <code>14412572240</code> (constant)</li>
<li><code>8*6</code> tar header checksum, in octal</li>
<li><code>1</code> for the rest of the Tar header <code>0 [...] ustar 00root [...] root [...] 0000000 0000000 [...]</code> (constant)</li>
</ol></li>
<li><p>File contents:</p></li>
</ol>
<p>Optional text prefixes:</p>
<ol start="6">
<li><code>1</code> for the prefix &quot;The MD5 of hashquine.tar.zst is&quot; (constant)</li>
<li><code>1</code> for the prefix &quot;The MD5 of hashquine.zst is&quot; (constant)</li>
</ol>
<p>Hashquine:</p>
<ol start="8">
<li><code>32*16</code> MD5 hash (all nibble possibilities)</li>
</ol>
<p>So this file archive prefix can be set to be decompressed as a tar archive or not, as the tar header is entirely optional, and the tar header checksum can be adjusted. Since all the collisions make each compressed frame optional, they can be set to decompress as a 292kB <a href="pocs/empty.zst">empty Zstandard archive</a>:</p>
<pre><code>$ zstd -d empty.zst
empty.zst           : 0 bytes
</code></pre>
<h1 id="notes">Notes</h1>
<h2 id="custom-fastcoll">Custom FastColl</h2>
<p>For these hashquines, new forms of hash collisions were introduced by <em>Mako</em>:</p>
<ul>
<li><p>Rather than relying on UniColl, FastColl was modified to force the creation of a JPEG comment <code>FF FE</code> right before the collision difference. Also useable for standard JPEG <a href="../examples/free">collisions</a>.</p></li>
<li><p>Similarly, in the PDF hashquine with text, 32b of the FastColl are forced to be <code>Do(...)</code> which is a valid PDF operator. It's very nice to have such a short text operator that can be abused in the middle of a collision block!</p></li>
</ul>
<h2 id="hiding-collision-blocks">Hiding collision blocks</h2>
<p>As usual, hiding collision blocks can be tricky. Here are some introduced techniques.</p>
<ul>
<li><em>Mako</em> abused PDF name conventions, forcing the start of the collision block to define a PDF name - the last character before the collision block is a <code>/</code>, defining a PDF name. The name has to be reused later in the file.</li>
</ul>
<p>For example, the first collision block defines the atrocious but working name <code>/öÃÝüúá.3A¢å¦e»ñæÀ¿Wæ‚b—»ò´óûàÊÄÃ’\q±*,ÆýH™æSùÞsp</code>.</p>
<ul>
<li><em>Retr0id</em> put the collision blocks in plain sight: a custom palette is used to hide all colors but 0, and collision blocks containing null values are rejected.</li>
</ul>
<p>Here's the a crop of the picture around the <code>8</code> digit with a more revealing palette (black and red color unmodified).</p>
<p><img src=pics/8blue.png height=100></img></p>
<p>These are unicoll blocks to turn on/off the red pixels (color <code>0</code> or <code>1</code>)</p>
<h2 id="misc">Misc</h2>
<ul>
<li><p><a href="https://github.com/corkami/collisions#detection"><code>detectcoll</code></a> in <em>unsafe mode</em> can enumerate all Fastcoll and Unicoll occurences are present in the file - cf <a href="logs/">logs</a>.</p>
<ul>
<li>Knowing the offsets and the types of the collision just from one file, it's then possible to modify or reset the collisions and alter the display yourself of the files without changing their MD5 or recomputing any collision, whether it's an encoding-based, format-based or archive hashquine.</li>
</ul></li>
<li><p>Since Unicoll only changes one byte, Retr0id's PNG font is made of one pixel per line:</p></li>
</ul>
<p><img src=pics/8.png height=100></img></p>
<pre><code>  *
     *
*
       *
*
       *
  *
     *
*
       *
*
       *
  *
     *
</code></pre>
<ul>
<li>Since correct values of Adler32 are required for Zlib chunks and CRC32 are required for PNG chunks, <em>Retr0id</em> had to forge these values while keeping the same MD5 hash: to do that, he added many fastcoll blocks and swapped them in a clever meet-in-a-middle attack to reach a suitable value - cf this <a href="https://gist.github.com/DavidBuchanan314/a15e93eeaaad977a0fec3a6232c0b8ae">gist</a>.</li>
</ul>
<p>The Fastcoll blocks to forge the CRCs are 'visible' at the bottom of the picture (here, with a revealing palette).</p>
<p><img src=pics/retr0id_blue.png height=500></img></p>
<ul>
<li><p>If the whole file has to be parsed, and each hex character can be switched on/off, an MD5 might be displayable by less characters than the whole range of 16 characters: in practice, <code>192</code> (=<code>12*16</code>) collisions are enough to display a bruteforced MD5 after roughly a few 1000s tries.</p>
<p>For example, <code>ad5de2581f4bd8f35051b789df379d36</code> is a bruteforced MD5 made of 12 groups of hex characters in increasing order:</p>
<p><code>ad</code>, <code>5de</code>, <code>258</code>, <code>1f</code>, <code>4bd</code>, <code>8f</code>, <code>35</code>, <code>05</code>, <code>1b</code>, <code>789df</code>, <code>379d</code>, <code>36</code>.</p>
<p>It's possible with even fewer collisions (for example with only 10 characters sets), but it requires more luck:</p>
<p><code>169</code>, <code>07</code>, <code>38ac</code>, <code>079d</code>, <code>9</code>, <code>146</code>, <code>45ade</code>, <code>569</code>, <code>3ab</code>, <code>019b</code> fits in 10 groups.</p></li>
</ul>
<h1 id="more-than-hash-values">More than hash values</h1>
<p>Many collisions can be used to encode or display a hash value, but they can be used to encode anything else, even bigger.</p>
<p><em>Retr0id</em> <a href="https://github.com/DavidBuchanan314/monomorph">combined</a> parallelized Fastcoll with a linux 4kb shellcode loader, generating a hashquine (benign), a rickroll (fun) or a meterpreter (evil) or anything you want with the same final hash.</p>
<pre><code>$ python3 monomorph.py bin/monomorph.linux.x86-64.benign benign
[...]
$ python3 monomorph.py bin/monomorph.linux.x86-64.benign hashquine sample_payloads/bin/hashquine.bin
[...]
$ benign
$ hashquine
My MD5 is: 3cebbe60d91ce760409bbe513593e401
$ md5sum benign hashquine
3cebbe60d91ce760409bbe513593e401  benign
3cebbe60d91ce760409bbe513593e401  hashquine
</code></pre>
<!-- pandoc -s -f gfm -t html README.md -o README.html -->
</body>
</html>