fixdiff

Copyright (C) 2025 Andy Green. Licensed under the MIT license; see LICENSE.

$ cat llm-patch.diff | fixdiff | patch -p1

or, with a redirect:

$ cat llm-patch.diff | fixdiff > llm-patch-fixed.diff

If you run it from a different directory, you can optionally pass a path on the command line. fixdiff changes its working directory to that path so it can find the sources mentioned in the patch.

$ cat llm-patch.diff | fixdiff /path/to/sources | patch -p1

fixdiff is designed to clean up diffs generated by LLMs (e.g., Gemini 2.5).

LLMs find it hard to generate diff headers with correct line counts, or even correct line offsets, although some are smart enough to produce otherwise legible diffs. Often the content, or just the context lines around the changes, is not quite right.

This utility adjusts the diff stanzas sent to it on stdin and produces new stanza headers with accurate line counts on stdout.

It silently repairs:

  1. Added (+) lines that contain only whitespace are rewritten as empty added lines.

Example:

diff --git a/deaddrop.js b/deaddrop.js
index 8f804f0..7913254 100644
--- a/deaddrop.js
+++ b/deaddrop.js
@@ -165,13 +165,21 @@
                    ts = d.getFullYear() + '-' + pad(d.getMonth() + 1) + '-' +
                         pad(d.getDate()) + '_' + pad(d.getHours()) + '-' +
                         pad(d.getMinutes()) + '-' + pad(d.getSeconds()),
+<tab>
                    formData = new FormData(), blob;
...

The added line is rewritten to be an empty blank line.
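
The rule above can be sketched in C roughly as follows. This is a minimal illustration, not code taken from the fixdiff sources; the name `blank_added_line` is hypothetical, and the line is assumed to arrive without its trailing newline.

```c
#include <ctype.h>

/*
 * Hypothetical helper, not from the fixdiff sources: if `line` is an
 * added line ('+') whose remaining characters are all whitespace,
 * truncate it in place to a bare "+". Returns 1 if the line was
 * changed, else 0. `line` must not include the trailing newline.
 */
int
blank_added_line(char *line)
{
	const char *p;

	if (line[0] != '+' || line[1] == '\0')
		return 0; /* not an added line, or already a bare "+" */

	for (p = line + 1; *p; p++)
		if (!isspace((unsigned char)*p))
			return 0; /* real content: leave it alone */

	line[1] = '\0';

	return 1;
}
```

A `+++ b/deaddrop.js` file header is left untouched by this check, since the text after the first `+` is not all whitespace.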

  2. Diff stanzas that do not contain any + or - lines are removed.

Example:

--- a/deaddrop.js
+++ b/deaddrop.js
@@ -3,6 +3,11 @@
        var server_max_size = 0, username = "", ws;

        function san(s)
        {
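
A stanza like the one above has nothing to apply. As a sketch, the check could look like this; `stanza_has_changes` is a hypothetical name, and the body is assumed to be the lines after the @@ header only.

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if any body line of a stanza adds or
 * removes something, 0 if the stanza is context only. `body` holds the
 * lines after the @@ header, so "---" / "+++" file headers are not
 * expected here.
 */
int
stanza_has_changes(const char *const *body, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (body[i][0] == '+' || body[i][0] == '-')
			return 1;

	return 0;
}
```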

  3. Original lines in the diff that differ from the real line in the file only by whitespace are rewritten to contain the correct whitespace.

Example: file contains <tab><tab>abc

--- a/deaddrop.js
+++ b/deaddrop.js
@@ -3,6 +3,11 @@
 <space><space>abc
...

The line in the output patch is rewritten with the whitespace that is actually in the file at that line, so the output patch contains <tab><tab>abc
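
One way to detect a "whitespace-only" difference is to compare two lines while skipping every whitespace character. The sketch below assumes that approach; the README does not say how fixdiff itself compares lines, and `same_ignoring_whitespace` is a hypothetical name. Note that it also ignores whitespace inside the line, not only leading indentation.

```c
#include <ctype.h>

/*
 * Hypothetical helper: returns 1 if `a` and `b` are identical once
 * every whitespace character is skipped, else 0.
 */
int
same_ignoring_whitespace(const char *a, const char *b)
{
	for (;;) {
		while (isspace((unsigned char)*a))
			a++;
		while (isspace((unsigned char)*b))
			b++;
		if (*a != *b)
			return 0; /* differing non-whitespace character */
		if (*a == '\0')
			return 1; /* both strings ended together */
		a++;
		b++;
	}
}
```

When the comparison succeeds but the strings are not byte-identical, the line from the file would be emitted in place of the line from the patch.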

  4. All stanza header line offsets and counts are recomputed from the actual match in the original source and from counting the before and after lines in the diff. The incoming @@ line is completely ignored and rewritten with the actual values.

Example: incoming patch stanza headers can be nonsense

--- a/deaddrop.js
+++ b/deaddrop.js
@@ -123,16 +345,5 @@
 <space><space>abc
...

The correct headers will be rewritten in place of the wrong ones.
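
The counting part can be illustrated with a short C sketch. It assumes the matching line in the original file is already known; `format_stanza_header`, `old_start` and `delta` are hypothetical names, and the special cases of the unified format (such as a zero line count) are left out.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: builds a "@@ -a,b +c,d @@" header from the
 * stanza body. `old_start` is the 1-based line where the stanza matched
 * in the original file; `delta` is the net number of lines added by
 * earlier stanzas for the same file (negative if they removed more than
 * they added). Returns what snprintf() returns.
 */
int
format_stanza_header(char *out, size_t len, long old_start, long delta,
		     const char *const *body, size_t count)
{
	long old_count = 0, new_count = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		switch (body[i][0]) {
		case '-':
			old_count++; /* only in the original file */
			break;
		case '+':
			new_count++; /* only in the modified file */
			break;
		default:
			old_count++; /* context exists in both */
			new_count++;
			break;
		}
	}

	return snprintf(out, len, "@@ -%ld,%ld +%ld,%ld @@",
			old_start, old_count, old_start + delta, new_count);
}
```

Context lines count on both sides, `-` lines only on the old side and `+` lines only on the new side, which is why the numbers in the incoming header are not needed.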

  5. Extra lead-in context lines at the start of a stanza are removed until only 3 remain.

Example:

--- a/deaddrop.js
+++ b/deaddrop.js
@@ -3,6 +3,11 @@
                                                "<tr><th>User</th><th>IP Address</th>" +
                                                "<th>Platform</th><th>Client</th></tr>";

                                        for (n = 0; n < j.connected_users.length; n++) {
                                                var u = j.connected_users[n];
                                                s_users += "<tr><td>" + san(u.user) +
                                                        "</td><td>" + san(u.ip) +
                                                        "</td><td>" + san(u.platform) +
                                                        "</td><td>" + san(u.browser) +
                                                        "</td></tr>";
                                        }
                                        s_users += "</table>";
                                        t_users.innerHTML = s_users;
                                }
+                       };
+
+                       ws.onclose = function() {
...

This will be rewritten to reduce the lead-in to the normal 3 lines:

--- a/deaddrop.js
+++ b/deaddrop.js
@@ -14,6 +14,11 @@
                                        s_users += "</table>";
                                        t_users.innerHTML = s_users;
                                }
+                       };
+
+                       ws.onclose = function() {
...
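
A sketch of the lead-in trimming, under the assumption that prefix-less empty lines count as context; `excess_lead_in` and `CONTEXT_LINES` are hypothetical names.

```c
#include <stddef.h>

#define CONTEXT_LINES 3 /* conventional unified-diff context size */

/*
 * Hypothetical helper: returns how many leading context lines should be
 * dropped so that at most CONTEXT_LINES remain before the first +/-
 * line. Lines starting with a space, and empty lines, are counted as
 * context. The caller would skip that many body lines and add the same
 * number to the stanza's starting line.
 */
size_t
excess_lead_in(const char *const *body, size_t count)
{
	size_t lead = 0;

	while (lead < count && body[lead][0] != '+' && body[lead][0] != '-')
		lead++;

	return lead > CONTEXT_LINES ? lead - CONTEXT_LINES : 0;
}
```

In the example above the stanza has 14 lead-in lines, so 11 are dropped, and a stanza that matched at line 3 then starts at line 14.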

  6. Excessive lead-out context is removed, and missing lead-out context is added. Diffs that add to EOF with missing or wrong context, caused by the LLM losing blank lines at the original EOF, are repaired by checking the original source file for extra lines and adding them as needed.

Example 1: excessive lead-out removed

...
                                        s_users += "</table>";
                                        t_users.innerHTML = s_users;
                                }
+                       };
+
+                       ws.onclose = function() {
                                var u = j.connected_users[n];
                                s_users += "<tr><td>" + san(u.user) +
                                        "</td><td>" + san(u.ip) +
                                        "</td><td>" + san(u.platform) +
                                        "</td><td>" + san(u.browser) +
                                        "</td></tr>";

This will be trimmed to

...
                                        s_users += "</table>";
                                        t_users.innerHTML = s_users;
                                }
+                       };
+
+                       ws.onclose = function() {
                                var u = j.connected_users[n];
                                s_users += "<tr><td>" + san(u.user) +
                                        "</td><td>" + san(u.ip) +

Sometimes the LLM does not know exactly what is at the end of the file, which leads to missing lead-out context.

Example 2: missing EOF context

Actual file ending

...
A
B
<cr>
<cr>

patch:

--- a/deaddrop.js
+++ b/deaddrop.js
@@ -14,6 +14,11 @@
 A
 B
+C
+D
...

fixdiff will realize the situation and fix the stanza by fetching the extra lines from the original file and adding them as context at the end.

--- a/deaddrop.js
+++ b/deaddrop.js
@@ -14,6 +14,11 @@
 A
 B
+C
+D
 <cr>
 <cr>
...
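
Both lead-out cases can be expressed as one adjustment, sketched below. This is an assumption about how such a check could be written, not fixdiff's actual code; `lead_out_adjustment` and its parameters are hypothetical names.

```c
#include <stddef.h>

#define CONTEXT_LINES 3 /* conventional unified-diff context size */

/*
 * Hypothetical helper: `have` is the number of trailing context lines
 * in the stanza, `old_end` the 1-based last line of the original file
 * covered by the stanza, `file_lines` the length of the original file.
 * Returns a negative number of lines to trim, or a positive number of
 * lines to fetch from the file and append as context (never past EOF).
 */
long
lead_out_adjustment(size_t have, size_t old_end, size_t file_lines)
{
	size_t remaining, missing;

	if (have >= CONTEXT_LINES)
		return -(long)(have - CONTEXT_LINES);

	remaining = file_lines > old_end ? file_lines - old_end : 0;
	missing = CONTEXT_LINES - have;

	return (long)(missing < remaining ? missing : remaining);
}
```

For Example 1, six trailing context lines give -3. For Example 2, no trailing context with two lines left in the file gives 2.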

  7. Unexpected blank lines in a stanza (with no leading space, + or -) are ignored if they occur at the end of the stanza. If the normal diff resumes after them, they are rewritten as context by adding a space at the beginning.
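
As a rough sketch of this rule, assuming the stanza body is held as an array of line pointers without trailing newlines (`fix_bare_blank_lines` is a hypothetical name):

```c
#include <stddef.h>

/*
 * Hypothetical helper: repairs prefix-less empty lines in a stanza
 * body. Trailing empty lines are dropped; an empty line followed by
 * more diff content is replaced by " " (an empty context line).
 * Returns the new number of body lines.
 */
size_t
fix_bare_blank_lines(const char **body, size_t count)
{
	size_t i;

	/* empty lines at the very end of the stanza are ignored */
	while (count > 0 && body[count - 1][0] == '\0')
		count--;

	/* any remaining empty line precedes real content: make it context */
	for (i = 0; i < count; i++)
		if (body[i][0] == '\0')
			body[i] = " ";

	return count;
}
```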

fixdiff finds and scans the source files the patches apply to. For each stanza, it uses the stanza's own content to locate the original line it applies to, along with the original line count. Taking earlier stanzas into account, it then works out the line where the stanza appears in the modified file and the new line count. The incoming, possibly broken, stanza header is therefore not used at all: every @@ line is replaced with correct numbers derived from the changed and unchanged sources and from the actual diff content inside the stanza.
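
The search step can be pictured as looking for the stanza's old-side lines as a consecutive run in the source file. The sketch below assumes exact string comparison and lines held in memory as arrays; `find_stanza` is a hypothetical name, and the real tool may match differently (for example, it tolerates whitespace differences as described above).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns the 1-based line number in `file` where
 * the `want` lines (the stanza's context and '-' lines with the prefix
 * character stripped) occur consecutively, or 0 if they do not occur.
 */
size_t
find_stanza(const char *const *file, size_t file_lines,
	    const char *const *want, size_t want_lines)
{
	size_t start, i;

	if (want_lines == 0 || want_lines > file_lines)
		return 0;

	for (start = 0; start + want_lines <= file_lines; start++) {
		for (i = 0; i < want_lines; i++)
			if (strcmp(file[start + i], want[i]) != 0)
				break;
		if (i == want_lines)
			return start + 1; /* first matching position */
	}

	return 0;
}
```

The returned line number is what would go into the old side of the rewritten @@ header.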

It handles diffs that start directly with ---, as produced by most LLMs, as well as those with diff and index headers. It also supports any combination of concatenated diffs targeting different files in one step.

Building

  • There are no dependencies other than libc.
  • It's pure C99.
  • It's valgrind-clean.
  • It just produces a small executable with no data files.
  • There are no switches.
  • It runs as part of a pipe into patch or standalone with redirects.

You can build it like:

$ mkdir build
$ cd build
$ cmake ..
$ make && sudo make install

Selftests can be run after build with ctest --output-on-failure.
