fixdiff

$ cat llm-patch.diff | fixdiff | patch -p1

or with a redirect

$ cat llm-patch.diff | fixdiff > llm-patch-fixed.diff

or, if it is run from elsewhere, give it a path on the command line; fixdiff changes its working directory to that path so it can find the sources mentioned in the patch.

$ cat llm-patch.diff | fixdiff /path/to/sources | patch -p1

fixdiff is designed to clean up diffs generated by LLMs (e.g., Gemini 2.5).

LLMs find it hard to generate diff headers with correct line counts, or even correct line offsets, although some LLMs are smart enough to produce otherwise legible diffs. Often the content, or just the context lines around the changes, is not quite right.

This utility adjusts the diff stanzas sent to it on stdin and produces new stanza headers with accurate line counts on stdout.

It silently repairs:

New + lines that contain only whitespace are rewritten to be empty blank lines

Example:

diff --git a/deaddrop.js b/deaddrop.js
index 8f804f0..7913254 100644
--- a/deaddrop.js
+++ b/deaddrop.js
@@ -165,13 +165,21 @@
                    ts = d.getFullYear() + '-' + pad(d.getMonth() + 1) + '-' +
                         pad(d.getDate()) + '_' + pad(d.getHours()) + '-' +
                         pad(d.getMinutes()) + '-' + pad(d.getSeconds()),
+<tab>
                    formData = new FormData(), blob;
...

The added line is rewritten to be an empty blank line.
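
A minimal sketch of this repair, not taken from fixdiff.c. The helper name blank_added_line is hypothetical, and it assumes the intended result is a + line with nothing after the marker.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (assumption, not fixdiff's own code): if `line` is
 * an added line whose payload is only whitespace, truncate it in place to
 * "+" plus its newline. Returns 1 if the line was changed, 0 otherwise.
 */
int blank_added_line(char *line)
{
	size_t i;

	assert(line != NULL);

	if (line[0] != '+')
		return 0;

	/* scan the payload up to the line ending */
	for (i = 1; line[i] && line[i] != '\n'; i++)
		if (!isspace((unsigned char)line[i]))
			return 0; /* real content, leave it alone */

	if (i == 1)
		return 0; /* already a bare "+" */

	/* keep the '+' and re-attach the newline if there was one */
	if (line[i] == '\n') {
		line[1] = '\n';
		line[2] = '\0';
	} else
		line[1] = '\0';

	return 1;
}
```

A +++ file header is left untouched, since its second character is not whitespace.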

Diff stanzas that do not contain any +/- lines are removed

Example:

--- a/deaddrop.js
+++ b/deaddrop.js
@@ -3,6 +3,11 @@
        var server_max_size = 0, username = "", ws;

        function san(s)
        {
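
The check behind this rule can be sketched as below. This is an illustration only: stanza_has_changes is a hypothetical name, and the stanza body is assumed to be available as an array of lines.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (assumption): returns 1 if any of the `n` body
 * lines of a stanza (the lines after the @@ header) adds or removes
 * something, 0 if the stanza is context only and can be dropped.
 */
int stanza_has_changes(const char *const *body, size_t n)
{
	size_t i;

	assert(body != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (body[i][0] == '+' || body[i][0] == '-')
			return 1;

	return 0;
}
```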

Original lines in the diff that differ from the real line in the file only by whitespace are rewritten to contain the correct whitespace

Example: file contains <tab><tab>abc

--- a/deaddrop.js
+++ b/deaddrop.js
@@ -3,6 +3,11 @@
 <space><space>abc
...

The output patch is rewritten to match the whitespace already in the file at that line, so it contains <tab><tab>abc

All stanza header line offsets and counts are recomputed from the actual match in the original source and by counting the before and after lines in the diff. The incoming @@ line is completely ignored and rewritten with the actual information

Example: incoming patch stanza headers can be nonsense

--- a/deaddrop.js
+++ b/deaddrop.js
@@ -123,16 +345,5 @@
 <space><space>abc
...

The correct headers will be rewritten in place of the wrong ones.
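
The counting half of this follows directly from the unified diff format: context lines count on both sides, - lines only on the old side, + lines only on the new side. A sketch, with hypothetical names and the start lines assumed to have been found already:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct stanza_counts {
	unsigned old_count; /* lines covered in the original file */
	unsigned new_count; /* lines covered in the patched file */
};

/* Hypothetical helper (assumption): count both sides of a stanza body. */
struct stanza_counts count_stanza_lines(const char *const *body, size_t n)
{
	struct stanza_counts c = { 0, 0 };
	size_t i;

	assert(body != NULL || n == 0);

	for (i = 0; i < n; i++)
		switch (body[i][0]) {
		case '-':
			c.old_count++;
			break;
		case '+':
			c.new_count++;
			break;
		case '\\': /* "\ No newline at end of file" marker */
			break;
		default: /* context line, present on both sides */
			c.old_count++;
			c.new_count++;
			break;
		}

	return c;
}

/* Writes the @@ line into `out`; returns what snprintf() returns. */
int format_stanza_header(char *out, size_t out_len, unsigned old_start,
			 unsigned new_start, struct stanza_counts c)
{
	assert(out != NULL);

	return snprintf(out, out_len, "@@ -%u,%u +%u,%u @@", old_start,
			c.old_count, new_start, c.new_count);
}
```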

Extra lead-in context lines are removed from the stanza until only 3 remain

--- a/deaddrop.js
+++ b/deaddrop.js
@@ -3,6 +3,11 @@
                                                "<tr><th>User</th><th>IP Address</th>" +
                                                "<th>Platform</th><th>Client</th></tr>";

                                        for (n = 0; n < j.connected_users.length; n++) {
                                                var u = j.connected_users[n];
                                                s_users += "<tr><td>" + san(u.user) +
                                                        "</td><td>" + san(u.ip) +
                                                        "</td><td>" + san(u.platform) +
                                                        "</td><td>" + san(u.browser) +
                                                        "</td></tr>";
                                        }
                                        s_users += "</table>";
                                        t_users.innerHTML = s_users;
                                }
+                       };
+
+                       ws.onclose = function() {
...

This will be rewritten to reduce the lead-in to the normal 3

--- a/deaddrop.js
+++ b/deaddrop.js
@@ -14,6 +14,11 @@
                                        s_users += "</table>";
                                        t_users.innerHTML = s_users;
                                }
+                       };
+
+                       ws.onclose = function() {
...
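
The trimming above can be sketched as follows. The names excess_lead_in and MAX_CONTEXT are hypothetical; the value 3 comes from the text.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CONTEXT 3 /* context lines kept around the changes */

/*
 * Hypothetical helper (assumption): returns how many context lines
 * should be dropped from the top of a stanza body so that at most
 * MAX_CONTEXT remain before the first '+' or '-' line.
 */
size_t excess_lead_in(const char *const *body, size_t n)
{
	size_t lead = 0;

	assert(body != NULL || n == 0);

	/* count context lines before the first change */
	while (lead < n && body[lead][0] != '+' && body[lead][0] != '-')
		lead++;

	return lead > MAX_CONTEXT ? lead - MAX_CONTEXT : 0;
}
```

Each dropped line moves the stanza start down by one. In the example above the lead-in has 14 lines, so 11 are dropped and the start moves from line 3 to line 14.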

Excessive lead-out context is removed, and missing lead-out context is added. Diffs that add to EOF with missing or wrong context, caused by the LLM losing blank lines at the original EOF, are rewritten by checking the original source file for extra lines and adding them as needed.

Example 1: excessive lead-out removed

...
                                        s_users += "</table>";
                                        t_users.innerHTML = s_users;
                                }
+                       };
+
+                       ws.onclose = function() {
                                var u = j.connected_users[n];
                                s_users += "<tr><td>" + san(u.user) +
                                        "</td><td>" + san(u.ip) +
                                        "</td><td>" + san(u.platform) +
                                        "</td><td>" + san(u.browser) +
                                        "</td></tr>";

This will be trimmed to

...
                                        s_users += "</table>";
                                        t_users.innerHTML = s_users;
                                }
+                       };
+
+                       ws.onclose = function() {
                                var u = j.connected_users[n];
                                s_users += "<tr><td>" + san(u.user) +
                                        "</td><td>" + san(u.ip) +

Sometimes the LLM does not know exactly what is at the end of the file, which leads to missing lead-out context.

Example 2: missing EOF context

Actual file ending

...
A
B
<cr>
<cr>

patch:

--- a/deaddrop.js
+++ b/deaddrop.js
@@ -14,6 +14,11 @@
 A
 B
+C
+D
...

fixdiff will realize the situation and fix the stanza by fetching the extra lines from the original file and adding them as context at the end.

--- a/deaddrop.js
+++ b/deaddrop.js
@@ -14,6 +14,11 @@
 A
 B
+C
+D
 <cr>
 <cr>
...
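
Both lead-out cases reduce to comparing the context the stanza has with the context it should have. A sketch under stated assumptions: lead_out_adjustment is a hypothetical name, and left_in_file is the number of lines in the original file after the last line the stanza changes or removes.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CONTEXT 3 /* context lines kept around the changes */

/*
 * Hypothetical helper (assumption): returns the lead-out adjustment.
 * Negative: that many trailing context lines should be dropped.
 * Positive: that many lines should be fetched from the source file.
 */
long lead_out_adjustment(const char *const *body, size_t n,
			 size_t left_in_file)
{
	size_t have = 0, want;

	assert(body != NULL || n == 0);

	/* count context lines after the last '+' or '-' line */
	while (have < n && body[n - 1 - have][0] != '+' &&
	       body[n - 1 - have][0] != '-')
		have++;

	/* near EOF the file may not have MAX_CONTEXT lines left */
	want = left_in_file < MAX_CONTEXT ? left_in_file : MAX_CONTEXT;

	return (long)want - (long)have;
}
```

For Example 1 the result is -3 (six trailing lines, three wanted). For Example 2 it is 2, the two blank lines at the end of the file.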

Unexpected blank lines in a stanza (with no leading space, + or -) are ignored if they occur at the end of the stanza, or rewritten as context by adding a leading space if the normal diff resumes after them.
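
The decision for a bare blank line can be sketched like this. All names are hypothetical, and body is assumed to hold only the lines of one stanza, up to the next stanza or file header.

```c
#include <assert.h>
#include <stddef.h>

enum blank_fix {
	BLANK_NOT_BLANK,  /* ordinary diff line, nothing to do */
	BLANK_DROP,       /* trailing blank, ignore it */
	BLANK_TO_CONTEXT  /* diff resumes later, prefix a space */
};

/* A line with no marker column at all: "", "\n" or "\r\n". */
static int is_bare_blank(const char *l)
{
	return l[0] == '\0' || l[0] == '\n' ||
	       (l[0] == '\r' && l[1] == '\n');
}

/* Hypothetical helper (assumption): classify body line `i`. */
enum blank_fix classify_blank(const char *const *body, size_t n, size_t i)
{
	size_t j;

	assert(body != NULL && i < n);

	if (!is_bare_blank(body[i]))
		return BLANK_NOT_BLANK;

	/* if any real diff line follows, the blank sits inside the stanza */
	for (j = i + 1; j < n; j++)
		if (!is_bare_blank(body[j]))
			return BLANK_TO_CONTEXT;

	return BLANK_DROP;
}
```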

It finds and scans the sources the patches apply to, and uses the body of each diff stanza to locate the original line it applies to, along with the original line count. Taking earlier stanzas into account, it also works out the line the stanza appears at in the modified file and the new line count. It therefore does not use the incoming, broken stanza header information at all: every @@ line is replaced with correct numbers derived from the changed and unchanged sources and the actual diff contents inside the stanza.
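
A sketch of the two steps described here, with hypothetical names. It uses exact string comparison and a simple top-down search; how fixdiff actually searches, and how it tolerates whitespace differences while matching, is not covered by this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (assumption): find where the stanza's "old" lines
 * (context and '-' lines with the marker column stripped) occur in the
 * source file. Returns the 1-based line number of the first match at or
 * after line `from` (also 1-based), or 0 if there is no match.
 */
size_t find_old_lines(const char *const *file, size_t nfile,
		      const char *const *old, size_t nold, size_t from)
{
	size_t start, i;

	assert(file != NULL && old != NULL && from >= 1);

	if (!nold || nold > nfile)
		return 0;

	for (start = from - 1; start + nold <= nfile; start++) {
		for (i = 0; i < nold; i++)
			if (strcmp(file[start + i], old[i]))
				break;
		if (i == nold)
			return start + 1;
	}

	return 0;
}

/*
 * The start line in the modified file is the original start shifted by
 * the lines earlier stanzas added (positive) or removed (negative).
 */
long new_file_start(size_t old_start, long delta_so_far)
{
	return (long)old_start + delta_so_far;
}
```

After each stanza, delta_so_far would grow by that stanza's new line count minus its old line count.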

It handles diffs starting with ---, as produced by most LLMs, as well as those with diff and index headers, and supports any combination of concatenated diffs targeting different files in one step.
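
Recognising a --- line and recovering the source path from it might look like the sketch below. The helper name is hypothetical, and dropping the first path component (as patch -p1 does) is an assumption about how the a/ prefix is treated; /dev/null is not handled specially here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (assumption): if `line` is a "--- path" header,
 * copy the path with its first component removed into `out`.
 * Returns 0 on success, -1 if the line is not a usable header.
 */
int path_from_old_header(const char *line, char *out, size_t out_len)
{
	const char *p, *slash;
	size_t len;

	assert(line != NULL && out != NULL && out_len > 0);

	if (strncmp(line, "--- ", 4))
		return -1;

	p = line + 4;
	/* the path ends at a tab (a timestamp may follow) or the line end */
	len = strcspn(p, "\t\r\n");

	/* drop the first component, such as "a/" */
	slash = memchr(p, '/', len);
	if (slash) {
		len -= (size_t)(slash + 1 - p);
		p = slash + 1;
	}

	if (!len || len >= out_len)
		return -1;

	memcpy(out, p, len);
	out[len] = '\0';

	return 0;
}
```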

Building

There are no dependencies other than libc.
It's pure C99.
It's valgrind-clean.
It just produces a small executable with no data files.
There are no switches.
It runs as part of a pipe into patch or standalone with redirects.

You can build it like:

$ mkdir build
$ cd build
$ cmake ..
$ make && sudo make install

Selftests can be run after build with ctest --output-on-failure.
