[Feature] A more useful diffing tool #2172

Open
1 task
xroberx opened this issue Mar 17, 2025 · 7 comments

Comments

@xroberx

xroberx commented Mar 17, 2025

What feature would you like to see?

Hi there, and thank you once more for this excellent program. It is helping me a lot!

I think the current diffing view could be much more useful if it could detect insertions between identical data chunks, the way Scooter Software's Beyond Compare does. When Beyond Compare detects data inserted between two chunks that are present in both files, it shows dashes in place of the missing bytes in the view of the file that lacks them, and shifts the next common chunk so that its position matches visually in both views.

Please see the attached screenshot.

Best regards,
Robert.

Image

How will this feature be useful to you and others?

This would make finding differences a lot better and faster.

Request Type

  • I can provide a PoC for this feature or am willing to work on it myself and submit a PR

Additional context?

No response

@paxcut
Contributor

paxcut commented Mar 17, 2025

Beyond Compare is not free and open-source software, so the algorithms it uses to compute differences are not something that ImHex can necessarily (or easily) implement. In fact, in order to obtain certain types of differences, you must tweak the options of BC to be the right ones. The general problem of finding inserted blocks in binary files is not easy to implement so that it always works as expected. I agree that the request would be nice to have, but I doubt there is an easy way to add it to the current diffing code. As far as I know Beyond Compare is the only diffing tool that implements this feature in a usable form so it must be hard to get it done.

@C3pa

C3pa commented Mar 18, 2025

IIRC WinMerge is FOSS and has per-bit and per-byte comparison. Maybe it does something similar to Beyond Compare.

@xroberx
Author

xroberx commented Mar 20, 2025

Beyond Compare is not free and open-source software, so the algorithms it uses to compute differences are not something that ImHex can necessarily (or easily) implement. In fact, in order to obtain certain types of differences, you must tweak the options of BC to be the right ones.

You don't have to tweak anything, it works out of the box. Also, it does not matter how hard it is to implement, if it can be done for sure it will be done, because the current diffing tool is useless if there are insertions/deletions in one of the files to compare and forces me to switch back and forth between ImHex and Beyond Compare.

As far as I know Beyond Compare is the only diffing tool that implements this feature in a usable form so it must be hard to get it done.

The 010 Editor also has this capability (insertions in file B highlighted in yellow):

Image

@paxcut
Contributor

paxcut commented Mar 20, 2025

You don't have to tweak anything, it works out of the box.

You need to set the alignment. The three settings (none, fast, complete) will yield different answers when selected. You don't always want to detect insertions.

Also, it does not matter how hard it is to implement, if it can be done for sure it will be done,...

I have tested all the binary diff programs I can get my hands on, and BC is the one that does it best, although it still has problems. Binary diffing is not a well-posed problem in general. I am saying it is unlikely that somebody will step forward and implement this in a usable form.

The 010 Editor also has this capability (insertions in file B highlighted in yellow):

Yes, but as with many features 010 claims to have, its diffing tool leaves much to be desired. A proper diffing tool classifies all entries in both files as either common, insertions, deletions or replacements. If blue is the color assigned to common parts, then the entries with values 0x10 and 0x20 should also be blue, because the yellow values were assigned as insertions. Beyond Compare gives the right answer by coloring only the differences and inserting blank spaces (without changing the offsets) where an insertion occurred.
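
The four-way classification described above, together with the blank-space alignment, can be sketched in a few lines of Python. This is only an illustration built on the standard library's `difflib.SequenceMatcher`; it is not the algorithm used by Beyond Compare, 010 Editor or ImHex, and the names `classify`, `aligned_columns` and `GAP`, as well as the sample bytes, are made up for this example.

```python
from difflib import SequenceMatcher

GAP = "--"  # hypothetical placeholder shown where one file has no byte


def classify(a: bytes, b: bytes):
    """Split both inputs into spans tagged equal/insert/delete/replace."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [(tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


def aligned_columns(a: bytes, b: bytes):
    """Pad the shorter side of every span with GAP so common chunks line up."""
    left, right = [], []
    for _tag, chunk_a, chunk_b in classify(a, b):
        width = max(len(chunk_a), len(chunk_b))
        left += [f"{x:02X}" for x in chunk_a] + [GAP] * (width - len(chunk_a))
        right += [f"{x:02X}" for x in chunk_b] + [GAP] * (width - len(chunk_b))
    return left, right


# Made-up sample data: file B has two extra bytes in the middle.
file_a = bytes([0x01, 0x02, 0x03, 0x04])
file_b = bytes([0x01, 0x02, 0xFF, 0xFF, 0x03, 0x04])

left, right = aligned_columns(file_a, file_b)
print(" ".join(left))
print(" ".join(right))
```

`SequenceMatcher` is a general-purpose matcher with quadratic worst-case behaviour, so this sketch is suitable for small buffers only; a hex editor would need something far more scalable.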

Image

That brings out the difficulty of this kind of operation. Another possible answer is that the 0xAA entries were modifications, followed by common values until offset 0x40, where 8 values were modified, followed by common values until offset 0x60, where 8 values were inserted. You may not want that as an answer, but somebody else would, and the diffing program should provide both. ImHex does provide this answer, and changing the options in Beyond Compare gives us that answer too. Note that now the blank spaces are at the end, so all changes are replacements except the last one.

Image
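
To make the "other valid answer" concrete, here is a minimal sketch of a purely positional comparison, where byte N of one file is always compared with byte N of the other and any length difference shows up at the end. The function name `positional_diff` and the sample bytes are assumptions for illustration; this is not ImHex's actual diffing code.

```python
from itertools import zip_longest


def positional_diff(a: bytes, b: bytes):
    """Compare offset by offset, with no alignment.

    Returns (offset, byte_in_a, byte_in_b) for every differing offset;
    None marks an offset past the end of the shorter input.
    """
    return [(offset, x, y)
            for offset, (x, y) in enumerate(zip_longest(a, b))
            if x != y]


# Made-up sample data: file B has two extra bytes in the middle.
file_a = bytes([0x01, 0x02, 0x03, 0x04])
file_b = bytes([0x01, 0x02, 0xFF, 0xFF, 0x03, 0x04])

for offset, x, y in positional_diff(file_a, file_b):
    kind = "replaced" if x is not None and y is not None else "only in one file"
    print(f"0x{offset:02X}: {kind}")
```

For this input the positional reading reports two replacements followed by two trailing bytes that exist only in the second file, whereas an alignment-based reading would report a single two-byte insertion. Both descriptions transform one file into the other; they simply answer different questions.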

@xroberx
Author

xroberx commented Mar 20, 2025

You seem to know more about this subject than I do. As soon as I have time I will try to replicate the behavior of Beyond Compare. Thanks for your response.

@xroberx
Author

xroberx commented Mar 21, 2025

Hey @paxcut, I have been doing some tests with Beyond Compare and I have found out that it is also broken. Please see the attached image.

Image

@paxcut
Contributor

paxcut commented Mar 21, 2025

Let me put it this way. Any algorithm that uses traditional programming to detect insertions, deletions and substitutions can be given inputs for which it will not produce some of the equally valid answers. Human brains can do that to a certain extent, but it requires high-order pattern recognition, and it breaks down as the complexity of the difference increases (consider files within files within files...). Consider the transformation of a file into a compressed format. It can also be described in terms of insertions, deletions and so forth, but no software or human can accomplish it. It is theoretically possible because the mathematics of sequence transformations guarantees that an answer exists.
Beyond Compare aims to find the smallest set of insertions, deletions and replacements that transforms one file into the other, given the alignment constraints set in the options. The answer may not be unique (or the one you expect), but these tools are only helpers that do the bulk of the work and let you decide how to handle fringe cases. The fact that you can say that BC is also broken testifies to that role.
Modern approaches to the problem involve defining metrics that measure the distance between files in terms of basic operations (insertion, deletion and replacement). They allow you to arrange a large number of versions of one file in an order that helps researchers trace the evolution of the code. This is a pretty profound topic that is currently being researched in many fields of computer science.
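
A distance counted in insertions, deletions and replacements is commonly formalized as the Levenshtein edit distance. The sketch below is the textbook dynamic-programming version, shown only to illustrate the idea of such a metric; it makes no claim about what Beyond Compare or any other tool computes internally, and the function name `edit_distance` is chosen for this example.

```python
def edit_distance(a: bytes, b: bytes) -> int:
    """Minimum number of single-byte insertions, deletions and
    replacements needed to turn a into b (Levenshtein distance)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty prefix takes i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # replace x with y, or keep if equal
            ))
        prev = curr
    return prev[-1]


# Made-up sample data: file B has two extra bytes in the middle.
file_a = bytes([0x01, 0x02, 0x03, 0x04])
file_b = bytes([0x01, 0x02, 0xFF, 0xFF, 0x03, 0x04])
print(edit_distance(file_a, file_b))
```

The running time grows with the product of the two lengths, which is one reason exact minimal diffs are impractical for large binary files. The distance is also a single number: several different edit scripts can achieve it, which matches the point above that the answer may not be unique.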

3 participants