Commit

Update structuring section and readme
mahaloz committed Apr 29, 2024
1 parent edbd618 commit 4fe5bd6
Showing 6 changed files with 54 additions and 12 deletions.
10 changes: 5 additions & 5 deletions README.md
@@ -1,11 +1,11 @@
# The Decompilation Wiki

<p align="center">
<img src="/static/img/logo.png" style="width: 30%;" alt="Dec Wiki Logo"/>
<img src="/docs/static/img/logo.png" style="width: 30%;" alt="Dec Wiki Logo"/>
</p>

The Decompilation Wiki is a collection of categorized information on all things decompilation.
From real-world applications to cutting-edge research papers, the Decompilation Wiki has it all! Join our Discord below for active community engagement. To get involved, see our [contribution guide](./docs/contributing.md).
From real-world applications to cutting-edge research papers, the Decompilation Wiki has it all! Join our Discord below for active community engagement. To get involved, see our [contribution guide](/docs/contributing.md).

[![Discord](https://dcbadge.vercel.app/api/server/hE7prXNt7t)](https://discord.gg/hE7prXNt7t)

@@ -24,9 +24,9 @@ Decompilation has wide applications across cyber security, including:
- vulnerability discovery (the understanding of program flaws)
- malware classification
- program repair
- [and much more...](./docs/applications/overview.md)
- [much more...](/applications/introduction/).

## Wiki Goals?
## Wiki Goals
This wiki has two main goals:

1. Making decompilation knowledge more accessible to newcomers to the field
@@ -45,7 +45,7 @@ The Decompilation Wiki was started by [Zion Leonahenahe Basque](https://zionbasq
The wiki is highly inspired by the following sources:

- [Program-Transformation.org](https://www.program-transformation.org/): a wiki on program transformations, including some decompilation.
- [CTF Wiki](https://ctf-wiki.org/): a wiki for Capture the Flag, inspiring this layout and design.
- [CTF Wiki](https://ctf-wiki.org/): a wiki for Capture the Flag, inspiring this layout and design
- ["30 Years into Scientific Binary Decompilation"](https://www.youtube.com/watch?v=XasallkPQIA), Dr. Ruoyu (Fish) Wang: a source of information on decompilers.


14 changes: 13 additions & 1 deletion docs/fundamentals/cf_structuring/gotoless.md
@@ -1 +1,13 @@
# Gotoless Structuring
# Gotoless Structuring
## Introduction
Gotoless structuring is a type of structuring that ignores some compiler patterns in order to produce code that contains no gotos[^1].
According to this line of work, gotos are a sign of unstructured code, which hurts readability[^1][^2].

The best-known of these works, and the one that founded the area, is the DREAM decompiler[^1].
DREAM computes a reaching condition for each statement, that is, the condition under which control reaches that statement, and uses these conditions to condense the decompiled code.
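
As a rough illustration of what a reaching condition is, the sketch below enumerates every path from a region header to each node of a small acyclic graph and records the conjunction of edge conditions along that path. It is a minimal sketch and not DREAM's implementation: the graph encoding, the function names, and the example region are all assumptions made for this example.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical acyclic region: node -> [(successor, edge condition)].
# An empty condition string marks an unconditional edge.
Region = Dict[str, List[Tuple[str, str]]]
Dnf = Set[FrozenSet[str]]


def reaching_conditions(region: Region, header: str) -> Dict[str, Dnf]:
    """Collect, per node, one conjunction of edge conditions for each path from the header."""
    conds: Dict[str, Dnf] = {}

    def walk(node: str, path: FrozenSet[str]) -> None:
        conds.setdefault(node, set()).add(path)
        for succ, cond in region.get(node, []):
            # Unconditional edges add nothing to the conjunction.
            walk(succ, (path | {cond}) if cond else path)

    walk(header, frozenset())
    return conds


def format_dnf(dnf: Dnf) -> str:
    """Render a set of conjunctions as a C-like boolean expression."""
    terms = sorted(" && ".join(sorted(term)) or "true" for term in dnf)
    return " || ".join(terms)


# A branches on x; B branches on y; C falls through to E.
example_region: Region = {
    "A": [("B", "x"), ("C", "!x")],
    "B": [("D", "y"), ("E", "!y")],
    "C": [("E", "")],
}
example_conds = reaching_conditions(example_region, "A")
```

Here node `E` is reachable along two paths, so its reaching condition is a disjunction of two conjunctions. Path enumeration is exponential in the worst case and no boolean simplification is attempted, so this only conveys the idea.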

The follow-up to this work is the rev.ng[^2] decompiler, which is a hybrid of gotoless and schema-based methods.
Its authors used a method called "Combing", which duplicates nodes in the original graph to get rid of unstructured regions.
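
To show what duplicating nodes does to a graph, here is a deliberately naive sketch that gives every incoming edge its own copy of the target node, turning an acyclic CFG into a tree. This is not the Combing algorithm, which duplicates far more selectively; the function name, the `_dupN` naming scheme, and the example graph are assumptions for illustration only.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]


def duplicate_shared_nodes(cfg: Graph, entry: str) -> Graph:
    """Unfold an acyclic CFG into a tree by cloning a node once per incoming edge."""
    tree: Graph = {}
    copies: Dict[str, int] = {}

    def clone(node: str) -> str:
        seen = copies.get(node, 0)
        copies[node] = seen + 1
        # The first visit keeps the original name; later visits get a suffix.
        name = node if seen == 0 else f"{node}_dup{seen}"
        tree[name] = [clone(succ) for succ in cfg.get(node, [])]
        return name

    clone(entry)
    return tree


# Unstructured region: C is reached both from A and from the middle of B.
unstructured: Graph = {
    "A": ["B", "C"],
    "B": ["C", "E"],
    "C": ["E"],
    "E": [],
}
combed_like = duplicate_shared_nodes(unstructured, "A")
```

After the transformation every node has at most one predecessor, so the region nests cleanly into `if` scopes, at the cost of repeated code. The input must be acyclic, and the blow-up can be exponential, which is one reason practical approaches limit how much they duplicate.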

[^1]: Yakdan, Khaled, et al. "No More Gotos: Decompilation Using Pattern-Independent Control-Flow Structuring and Semantic-Preserving Transformations." NDSS. 2015.
[^2]: Gussoni, Andrea, et al. "A comb for decompiled C code." Proceedings of the 15th ACM Asia Conference on Computer and Communications Security. 2020.
7 changes: 4 additions & 3 deletions docs/fundamentals/cf_structuring/overview.md
@@ -1,4 +1,4 @@
# Overview
# Structuring Overview
## Introduction

Academically introduced in Dr. Cifuentes' 1994 Dissertation[^1], decompilation control flow structuring is the process used to turn a control flow graph (CFG) into a structured high-level language.
@@ -54,7 +54,7 @@ There are multiple ways of turning the graph into linear C code [^4].
For instance, the first condition on `x` can be flipped, changing where the `goto` appears and how many `if` scopes exist in the program.

## Types of Structuring
There are two dominant types of structuring algorithms[^4]:
There are two dominant types of structuring algorithms[^8]:

1. [Schema-based](/fundamentals/cf_structuring/schema-based.md): Algorithms that construct code based on pre-known graph patterns that are emitted by compilers. These algorithms attempt to produce structured code only when they know of a direct mapping to a source structure.
2. [Gotoless](/fundamentals/cf_structuring/gotoless.md): Algorithms that prioritize removing all unstructured regions from code. These algorithms may use schema-based methods initially but are unique in that they match structures that may not exist in the original source.
@@ -76,4 +76,5 @@ Additionally, many of the ideas for eliminating gotos, which were often a byprod
[^4]: Basque, Zion Leonahenahe. “30 Years of Decompilation and the Unsolved Structuring Problem: Part 1.” Mahaloz.Re, 2 Jan. 2024, https://mahaloz.re/dec-history-pt1. Accessed 11 Apr. 2024.
[^5]: Yakdan, Khaled, et al. "No More Gotos: Decompilation Using Pattern-Independent Control-Flow Structuring and Semantic-Preserving Transformations." NDSS. 2015.
[^6]: Williams, M. Howard, and G. Chen. "Restructuring pascal programs containing goto statements." The Computer Journal 28.2 (1985): 134-137.
[^7]: Erosa, Ana M., and Laurie J. Hendren. "Taming control flow: A structured approach to eliminating goto statements." Proceedings of 1994 IEEE International Conference on Computer Languages (ICCL'94). IEEE, 1994.
[^7]: Erosa, Ana M., and Laurie J. Hendren. "Taming control flow: A structured approach to eliminating goto statements." Proceedings of 1994 IEEE International Conference on Computer Languages (ICCL'94). IEEE, 1994.
[^8]: Basque, Zion Leonahenahe, et al. "Ahoy SAILR! There is no need to DREAM of C: A compiler-aware structuring algorithm for binary decompilation." 33rd USENIX Security Symposium (USENIX Security 24). 2024.
27 changes: 26 additions & 1 deletion docs/fundamentals/cf_structuring/schema-based.md
@@ -1,2 +1,27 @@
# Schema-based Structuring
pass
## Introduction
Schema-based structuring is a type of structuring that depends on a set of known compiler graph patterns[^1].
With this in mind, a decompiler must know all of the compiler graph patterns to generate code that is structured[^3] (contains no gotos).
An example of this type of structuring can be found in the [overview](/fundamentals/cf_structuring/overview) section.
Schema-based structuring techniques are the most popular techniques among decompilers[^1][^2][^4][^5][^6].
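
To make the idea of a known graph pattern concrete, the sketch below checks whether a node heads the classic if-then-else "diamond" shape. It is a minimal sketch of a single schema under assumed names and an assumed adjacency-list encoding; it is not taken from any of the cited decompilers, which match many more patterns.

```python
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[str]]


def predecessors(cfg: Graph, node: str) -> List[str]:
    """Return every node that has an edge into `node`."""
    return [src for src, succs in cfg.items() if node in succs]


def match_if_else(cfg: Graph, head: str) -> Optional[Tuple[str, str, str]]:
    """Return (then_node, else_node, join_node) if `head` starts an if-else diamond."""
    succs = cfg.get(head, [])
    if len(succs) != 2 or succs[0] == succs[1]:
        return None
    then_node, else_node = succs
    for arm in (then_node, else_node):
        # Each arm must be entered only from the head and leave through one edge.
        if predecessors(cfg, arm) != [head] or len(cfg.get(arm, [])) != 1:
            return None
    if cfg[then_node][0] != cfg[else_node][0]:
        return None
    return then_node, else_node, cfg[then_node][0]


diamond: Graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
crossed: Graph = {"A": ["B", "C"], "B": ["C", "E"], "C": ["E"], "E": []}

diamond_match = match_if_else(diamond, "A")
crossed_match = match_if_else(crossed, "A")
```

The first graph matches and could be emitted as `if (...) { B } else { C }` followed by `D`. The second does not match this schema, which is the kind of region where a schema-based structurer without a suitable pattern falls back to a `goto`.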

## Example Graph Patterns
Example graph patterns, from Cifuentes' 1994 dissertation[^1], can be seen below:

![](/static/img/dcc_schema.png)

## Approaches
After the foundational dissertation from Cifuentes, the Phoenix[^2] work improved on structuring by adding more condition patterns.
These patterns allowed for more correct structures (like loop reductions).

Follow-ups to this work all used gotoless[^3][^4] structuring methods until the SAILR[^5] work in 2024.
The SAILR work improved on the gotoless algorithms by introducing a new type of schema that "reverts compiler optimizations."


[^1]: Cifuentes, Cristina. Reverse compilation techniques. Queensland University of Technology, Brisbane, 1994.
[^2]: Brumley, David, et al. "Native x86 decompilation using Semantics-Preserving structural analysis and iterative Control-Flow structuring." 22nd USENIX Security Symposium (USENIX Security 13). 2013.
[^3]: Yakdan, Khaled, et al. "No More Gotos: Decompilation Using Pattern-Independent Control-Flow Structuring and Semantic-Preserving Transformations." NDSS. 2015.
[^4]: Gussoni, Andrea, et al. "A comb for decompiled C code." Proceedings of the 15th ACM Asia Conference on Computer and Communications Security. 2020.
[^5]: Basque, Zion Leonahenahe, et al. "Ahoy SAILR! There is no need to DREAM of C: A compiler-aware structuring algorithm for binary decompilation." 33rd USENIX Security Symposium (USENIX Security 24). 2024.
[^6]: Ďurfina, Lukáš, et al. "Design of a retargetable decompiler for a static platform-independent malware analysis." Information Security and Assurance: International Conference, ISA 2011, Brno, Czech Republic, August 15-17, 2011. Proceedings. Springer Berlin Heidelberg, 2011.

8 changes: 6 additions & 2 deletions docs/fundamentals/evaluation.md
@@ -51,8 +51,12 @@ Some work has constructed a taxonomy of decompiler-to-source errors[^13].
That work categorized and identified many errors that occur when decompiling a binary.

Other work has identified general methods by which we may test the _correctness_ of decompilation[^14].
Relatadely, some structuring work has attempted to measure this through recompiliability[^2].
Relatedly, some structuring work has attempted to measure this through recompilability[^2].

## Dataset Generation
The question of which dataset to evaluate on has also been an open issue in decompilation.
Additionally, compiled binaries can sometimes be hard to obtain.
Some related work has explored ways to generate bigger binary datasets for the evaluation of binary tools like decompilers[^17].


[^1]: Cifuentes, Cristina. Reverse compilation techniques. Queensland University of Technology, Brisbane, 1994.
@@ -71,4 +75,4 @@ Relatadely, some structuring work has attempted to measure this through recompil
[^14]: How far we have come: Testing decompilation correctness of C decompilers
[^15]: Nosco, Timothy, et al. "The industrial age of hacking." 29th USENIX Security Symposium (USENIX Security 20). 2020.
[^16]: Mantovani, Alessandro, et al. "{RE-Mind}: a First Look Inside the Mind of a Reverse Engineer." 31st USENIX Security Symposium (USENIX Security 22). 2022.

[^17]: Singhal, Vidush, et al. "Cornucopia: A Framework for Feedback Guided Generation of Binaries." Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering. 2022.
Binary file added docs/static/img/dcc_schema.png