From edbd618e9a54b70e1915b8bf9d62fb9e8675e385 Mon Sep 17 00:00:00 2001 From: mahaloz Date: Sun, 28 Apr 2024 18:56:36 -0700 Subject: [PATCH] Add applied research section --- README.md | 14 +++---- docs/applications/overview.md | 12 ++++++ docs/applications/program_reconstruction.md | 5 ++- docs/applied_research/code_sim.md | 25 +++++++++++++ .../library_identification.md | 0 docs/applied_research/overview.md | 33 +++++++++++++++++ docs/applied_research/symbol_recovery.md | 37 +++++++++++++++++++ docs/applied_research/vuln_discovery.md | 14 +++++++ docs/decompilers/tools.md | 11 ++++++ docs/fundamentals/cfg_recovery/overview.md | 6 ++- docs/index.md | 12 +++--- mkdocs.yml | 4 +- 12 files changed, 152 insertions(+), 21 deletions(-) create mode 100644 docs/applied_research/code_sim.md delete mode 100644 docs/applied_research/library_identification.md create mode 100644 docs/applied_research/vuln_discovery.md diff --git a/README.md b/README.md index 8fdf8df..0ed2a41 100644 --- a/README.md +++ b/README.md @@ -1,11 +1,11 @@ # The Decompilation Wiki

- Dec Wiki Logo + Dec Wiki Logo

The Decompilation Wiki is a collection of categorized information on all things decompilation. -From real-world applications to cutting-edge research papers, the Decompilation Wiki has it all! Join our Discord below for active community engagement. To get involved, see our [contribution guide](/docs/contributing.md). +From real-world applications to cutting-edge research papers, the Decompilation Wiki has it all! Join our Discord below for active community engagement. To get involved, see our [contribution guide](./docs/contributing.md). [![Discord](https://dcbadge.vercel.app/api/server/hE7prXNt7t)](https://discord.gg/hE7prXNt7t) @@ -24,9 +24,9 @@ Decompilation has wide applications across cyber security, including: - vulnerability discovery (the understanding of program flaws) - malware classification - program repair -- [much more...](/applications/introduction/). +- [and much more...](./docs/applications/overview.md) -## Wiki Goals +## Wiki Goals? This wiki has two main goals: 1. Making decompilation knowledge more accessible to new-comers in the field @@ -45,10 +45,8 @@ The Decompilation Wiki was started by [Zion Leonahenahe Basque](https://zionbasq The wiki is highly inspired by the following sources: - [Program-Transformation.org](https://www.program-transformation.org/): a wiki on program transformations, including some decompilation. -- [CTF Wiki](https://ctf-wiki.org/): a wiki for Capture the Flag, inspiring this layout and design -- ["30 Years into Scientific Binary Decompilation", Dr. Ruoyu (Fish) Wang](https://www.youtube.com/watch?v=XasallkPQIA) - +- [CTF Wiki](https://ctf-wiki.org/): a wiki for Capture the Flag, inspiring this layout and design. +- ["30 Years into Scientific Binary Decompilation"](https://www.youtube.com/watch?v=XasallkPQIA), Dr. Ruoyu (Fish) Wang: a source of information on decompilers. [^1]: Yakdan, Khaled, et al. 
["Helping johnny to analyze malware: A usability-optimized decompiler and malware analysis user study."](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7546501&casa_token=Pl69lA763yoAAAAA:0rH6AIEbiBhbUGGaSvJvhaYeFEaWPnIifVHceQTGkd_k4NQK6EDH_zcytY-I-W6OE5oHbdU) 2016 IEEE Symposium on Security and Privacy (SP). IEEE, 2016. - diff --git a/docs/applications/overview.md b/docs/applications/overview.md index e69de29..443b64a 100644 --- a/docs/applications/overview.md +++ b/docs/applications/overview.md @@ -0,0 +1,12 @@ +# Decompiler Applications +Across the internet, there are many ways people have used decompilers in the wild. +In this section, you can find a collection of some of those use cases. + +As an example, some decompilation uses include: + +- Program Reversing: used to understand how a program works and how to interact with it. +- [Program Reconstruction](/applications/program_reconstruction): used to completely (or partially) recompile the targeted binary +- Automated Program Repair: used for patching faulty programs +- Manual & Automated Vuln Discovery: used for finding vulnerabilities + +For links to full decompilers, see the [Decompilers](/decompilers/directory) section. \ No newline at end of file diff --git a/docs/applications/program_reconstruction.md b/docs/applications/program_reconstruction.md index aa21603..9a08f38 100644 --- a/docs/applications/program_reconstruction.md +++ b/docs/applications/program_reconstruction.md @@ -1,9 +1,10 @@ # Program Reconstruction When source code is unavailable for a compiled program, users may want to recover the source code so they can make edits to it and recompile it. -In the video game scene, this is referred to as modding. +In the video game scene, this can be useful for modding or full program recovery.
## Video Games -A list of video games that are being reversed with a decompiler to recompile either the entire game or individual functions: +A list of video games that are being reversed with a decompiler to recompile either the entire game or individual functions. +Most projects report an estimated completion percentage for the recompilation: - [Halo: Combat Evolved (halo-re)](https://github.com/halo-re/halo) - [Lego Island (1997)](https://github.com/isledecomp/isle) diff --git a/docs/applied_research/code_sim.md b/docs/applied_research/code_sim.md new file mode 100644 index 0000000..b36fcf1 --- /dev/null +++ b/docs/applied_research/code_sim.md @@ -0,0 +1,25 @@ +# Code Similarity +## Introduction +In cases such as malware identification, the ability to estimate code similarity among binaries is critical[^1]. +Research in this area generally looks at ways to improve the reliability of similarity detection among binaries. + +There is little work on the direct use of decompilation for code similarity; however, general work on code similarity in binary analysis is common. +These works are included here since they often touch on or improve fundamental components in decompilation. + +The most direct research in this area has used Ghidra decompilation to identify inlined functions[^2]. + +## Related Works +Many works that do not explicitly use decompilation have advanced binary-based code similarity[^1][^3][^4][^5][^6]. +Most of these works have improved code similarity techniques indirectly by improving them for their specific use cases. +These uses have included malware identification[^1], duplicated bug hunting[^3][^4], and code reuse[^5]. + +Recent work has suggested that machine learning has made significant strides in this area[^6]. + + +[^1]: Hu, Xin, Tzi-cker Chiueh, and Kang G. Shin. "Large-scale malware indexing using function-call graphs." Proceedings of the 16th ACM conference on Computer and communications security. 2009.
+[^2]: Ahmed, Toufique, Premkumar Devanbu, and Anand Ashok Sawant. "Finding Inlined Functions in Optimized Binaries." arXiv preprint arXiv:2103.05221 (2021). +[^3]: Feng, Qian, et al. "Scalable graph-based bug search for firmware images." Proceedings of the 2016 ACM SIGSAC conference on computer and communications security. 2016. +[^4]: Eschweiler, Sebastian, Khaled Yakdan, and Elmar Gerhards-Padilla. "Discovre: Efficient cross-architecture identification of bugs in binary code." Ndss. Vol. 52. 2016. +[^5]: Mirzaei, Omid, et al. "Scrutinizer: Detecting code reuse in malware via decompilation and machine learning." Detection of Intrusions and Malware, and Vulnerability Assessment: 18th International Conference, DIMVA 2021, Virtual Event, July 14–16, 2021, Proceedings 18. Springer International Publishing, 2021. +[^6]: Marcelli, Andrea, et al. "How machine learning is solving the binary function similarity problem." 31st USENIX Security Symposium (USENIX Security 22). 2022. + diff --git a/docs/applied_research/library_identification.md b/docs/applied_research/library_identification.md deleted file mode 100644 index e69de29..0000000 diff --git a/docs/applied_research/overview.md b/docs/applied_research/overview.md index e69de29..bdf1d9f 100644 --- a/docs/applied_research/overview.md +++ b/docs/applied_research/overview.md @@ -0,0 +1,33 @@ +# Applied Research Overview +Decompiler research that does not neatly fit into one of the [fundamental](/fundamentals/overview) areas is defined here as applied research. +Research in this area contributes to a specific use-case of decompilation that may not necessarily improve base decompilation. + +As an example, most researchers would agree that variable name prediction in stripped binaries is an important research area[^1]. +However, as it stands, variable name prediction does not improve any fundamental research area (except neural decompilation). 
+As such, we consider it an applied research area, with that target being human-comprehensible decompilation. + +This section is ever-growing as new research areas are explored in decompilation. +Currently, the following areas exist: + +- [Symbol Recovery](/applied_research/symbol_recovery): recovering the names or high-level symbols that are associated with a function or variable +- [Code Similarity](/applied_research/code_sim): measuring how similar (for various uses) some binary is to another +- [Vulnerability Discovery](/applied_research/vuln_discovery): tuning decompilation to be better used for vulnerability discovery + + +## Other Research +Some research areas don't have enough work to define a label for them. +The following works are listed here: + +- Byte-exact recompilable decompilation[^4] +- Patchable decompilation[^2] +- Verifiable decompilation[^3] +- Higher abstraction support[^5][^6][^7] + + +[^1]: Pal, Kuntal Kumar, et al. ""Len or index or count, anything but v1": Predicting Variable Names in Decompilation Output with Transfer Learning." 2024 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, 2024. +[^2]: Reiter, Pemma, et al. "Automatically mitigating vulnerabilities in x86 binary programs via partially recompilable decompilation." arXiv preprint arXiv:2202.12336 (2022). +[^3]: Verbeek, Freek, Pierre Olivier, and Binoy Ravindran. "Sound C Code Decompilation for a subset of x86-64 Binaries." Software Engineering and Formal Methods: 18th International Conference, SEFM 2020, Amsterdam, The Netherlands, September 14–18, 2020, Proceedings 18. Springer International Publishing, 2020. +[^4]: Schulte, Eric, et al. "Evolving exact decompilation." Workshop on Binary Analysis Research (BAR). 2018. +[^5]: Fokin, Alexander, et al. "SmartDec: approaching C++ decompilation." 2011 18th Working Conference on Reverse Engineering. IEEE, 2011. +[^6]: Wu, Ruoyu, et al. "{DnD}: A {Cross-Architecture} deep neural network decompiler." 
31st USENIX Security Symposium (USENIX Security 22). 2022. +[^7]: Liu, Zhibo, et al. "Decompiling x86 deep neural network executables." 32nd USENIX Security Symposium (USENIX Security 23). 2023. \ No newline at end of file diff --git a/docs/applied_research/symbol_recovery.md b/docs/applied_research/symbol_recovery.md index e69de29..01a5b74 100644 --- a/docs/applied_research/symbol_recovery.md +++ b/docs/applied_research/symbol_recovery.md @@ -0,0 +1,37 @@ +# Symbol Recovery +## Introduction +A symbol, in the context of binaries, is a name associated with an object. +In most cases, this is either a function name or a variable name. +It is often useful for reverse engineering to have the original symbols to more quickly understand the purpose of an object. + +## Symbol Recovery Example +Below is a snippet of a C program: +```c +int mode; +char* name; +long long timezone; +``` + +After compiling and [stripping](https://en.wikipedia.org/wiki/Strip_(Unix)), a common developer practice, the binary will be decompiled to something like: +```c +int v1; +char* v2; +long long v3; +``` + +Even if the types are recovered perfectly (which is itself hard), it is still difficult to understand what these variables do. + + +## Previous Work +Research in this area has been concerned with the recovery of both variable names[^1][^2][^4][^5][^6][^7] and function names[^3][^5]. +Approaches have included neural networks[^2][^3][^6][^7], machine translation[^4], probabilistic methods[^5], and BERT-based language models[^1]. +In many cases, the bottleneck of this work has been dataset generation[^1]. + + +[^1]: Pal, Kuntal Kumar, et al. ""Len or index or count, anything but v1": Predicting Variable Names in Decompilation Output with Transfer Learning." 2024 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, 2024. +[^2]: Dramko, Luke, et al. "DIRE and its data: Neural decompiled variable renamings with respect to software class."
ACM Transactions on Software Engineering and Methodology 32.2 (2023): 1-34. +[^3]: Artuso, Fiorella, et al. "Function naming in stripped binaries using neural networks." arXiv preprint arXiv:1912.07946 (2019). +[^4]: Jaffe, Alan, et al. "Meaningful variable names for decompiled code: A machine translation approach." Proceedings of the 26th Conference on Program Comprehension. 2018. +[^5]: He, Jingxuan, et al. "Debin: Predicting debug information in stripped binaries." Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. 2018. +[^6]: Lacomis, Jeremy, et al. "DIRE: A neural approach to decompiled identifier naming." 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2019. +[^7]: Chen, Qibin, et al. "Augmenting decompiler output with learned variable names and types." 31st USENIX Security Symposium (USENIX Security 22). 2022. \ No newline at end of file diff --git a/docs/applied_research/vuln_discovery.md b/docs/applied_research/vuln_discovery.md new file mode 100644 index 0000000..ec739f7 --- /dev/null +++ b/docs/applied_research/vuln_discovery.md @@ -0,0 +1,14 @@ +# Vulnerability Discovery +In many uses of decompilation, humans or machines aim to understand whether a program is safe. +To verify that a program is safe, they attempt the opposite: finding vulnerabilities in it. +Some decompiler projects, and their associated research, have attempted to tune decompilation for this task[^1]. +There has also been work on evaluating decompilers by how well their output works with source-level tools[^2]. + +Most research in this area has focused on static analysis[^1][^2][^3] and [symbolic execution](https://en.wikipedia.org/wiki/Symbolic_execution)[^4] applied to decompilation. +Since these techniques have often been researched on source code, they have been applied to binaries through decompilation. + + +[^1]: Botacin, Marcus, et al. "Revenge is a dish served cold: Debug-oriented malware decompilation and reassembly." Proceedings of the 3rd Reversing and Offensive-oriented Trends Symposium. 2019.
+[^2]: Mantovani, Alessandro, et al. "The Convergence of Source Code and Binary Vulnerability Discovery--A Case Study." Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security. 2022. +[^3]: Park, Jihee, et al. "Static Analysis of JNI Programs via Binary Decompilation." IEEE Transactions on Software Engineering (2023). +[^4]: Han, HyungSeok, et al. "QueryX: Symbolic Query on Decompiled Code for Finding Bugs in COTS Binaries." 2023 IEEE Symposium on Security and Privacy (SP). IEEE, 2023. \ No newline at end of file diff --git a/docs/decompilers/tools.md b/docs/decompilers/tools.md index e69de29..1153459 100644 --- a/docs/decompilers/tools.md +++ b/docs/decompilers/tools.md @@ -0,0 +1,11 @@ +# Tools +Beyond fundamental improvements, the community continues to extend decompilers through plugins and tools. +Here you can find a listing of tools and plugins used by the community. + +### Generic Tools +Tools that work in _most_ popular decompilers. + +- [BinDiff](https://github.com/google/bindiff): A decompiler-based diffing tool for binaries. +- [BinSync](https://binsync.net): A Git-based collaboration framework for decompilers. Supports IDA, Ghidra, Binja, and angr decompiler. +- [DogBolt](https://dogbolt.org/): A web-based tool for comparing the output of popular decompilers. +- [RevSync](https://github.com/lunixbochs/revsync): A synchronization tool for decompilers. Supports IDA & Binja. diff --git a/docs/fundamentals/cfg_recovery/overview.md b/docs/fundamentals/cfg_recovery/overview.md index b80ba6a..08c6215 100644 --- a/docs/fundamentals/cfg_recovery/overview.md +++ b/docs/fundamentals/cfg_recovery/overview.md @@ -17,7 +17,8 @@ Most decompilers will use an IL to make their later analyses more widely applica Methods for evaluating improvements in this field are also of note but have had very limited work. The most recent of these works has focused on instrumenting and comparing to the compile-generated CFG[^2][^3].
-There has been little work in replacing CFG recovery algorithms with a machine learning model[^4]. +There has been little work in replacing CFG recovery algorithms with a machine-learning model[^4]. +Related work in binary analysis has looked at how these recovered CFGs may be instrumentable[^5]. ## Graph Recovery Example An example C program is shown below: @@ -67,4 +68,5 @@ The structure of the graph will also look the same if lifted to an IL. [^1]: Allen, Frances E. "Control flow analysis." ACM Sigplan Notices 5.7 (1970): 1-19. [^2]: Pang, Chengbin, et al. "Ground truth for binary disassembly is not easy." 31st USENIX Security Symposium (USENIX Security 22). 2022. [^3]: Pang, Chengbin, et al. "Sok: All you ever wanted to know about x86/x64 binary disassembly but were afraid to ask." 2021 IEEE symposium on security and privacy (SP). IEEE, 2021. -[^4]: Yu, Shih-Yuan, et al. "Cfg2vec: Hierarchical graph neural network for cross-architectural software reverse engineering." 2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 2023. \ No newline at end of file +[^4]: Yu, Shih-Yuan, et al. "Cfg2vec: Hierarchical graph neural network for cross-architectural software reverse engineering." 2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 2023. +[^5]: Di Bartolomeo, Luca, Hossein Moghaddas, and Mathias Payer. "{ARMore}: Pushing Love Back Into Binaries." 32nd USENIX Security Symposium (USENIX Security 23). 2023. \ No newline at end of file diff --git a/docs/index.md b/docs/index.md index c7a45f8..8b9d1be 100644 --- a/docs/index.md +++ b/docs/index.md @@ -5,7 +5,7 @@

The Decompilation Wiki is a collection of categorized information on all things decompilation. -From real-world applications to cutting-edge research papers, the Decompilation Wiki has it all! Join our Discord below for active community engagement. To get involved, see our [contribution guide](/docs/contributing). +From real-world applications to cutting-edge research papers, the Decompilation Wiki has it all! Join our Discord below for active community engagement. To get involved, see our [contribution guide](/contributing). [![Discord](https://dcbadge.vercel.app/api/server/hE7prXNt7t)](https://discord.gg/hE7prXNt7t) @@ -24,9 +24,9 @@ Decompilation has wide applications across cyber security, including: - vulnerability discovery (the understanding of program flaws) - malware classification - program repair -- [much more...](/docs/applications/overview/). +- [and much more...](/applications/overview/) -## Wiki Goals +## Wiki Goals? This wiki has two main goals: 1. Making decompilation knowledge more accessible to new-comers in the field @@ -45,10 +45,8 @@ The Decompilation Wiki was started by [Zion Leonahenahe Basque](https://zionbasq The wiki is highly inspired by the following sources: - [Program-Transformation.org](https://www.program-transformation.org/): a wiki on program transformations, including some decompilation. -- [CTF Wiki](https://ctf-wiki.org/): a wiki for Capture the Flag, inspiring this layout and design -- ["30 Years into Scientific Binary Decompilation", Dr. Ruoyu (Fish) Wang](https://www.youtube.com/watch?v=XasallkPQIA) - - +- [CTF Wiki](https://ctf-wiki.org/): a wiki for Capture the Flag, inspiring this layout and design. +- ["30 Years into Scientific Binary Decompilation"](https://www.youtube.com/watch?v=XasallkPQIA), Dr. Ruoyu (Fish) Wang: a source of information on decompilers. [^1]: Yakdan, Khaled, et al. 
["Helping johnny to analyze malware: A usability-optimized decompiler and malware analysis user study."](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7546501&casa_token=Pl69lA763yoAAAAA:0rH6AIEbiBhbUGGaSvJvhaYeFEaWPnIifVHceQTGkd_k4NQK6EDH_zcytY-I-W6OE5oHbdU) 2016 IEEE Symposium on Security and Privacy (SP). IEEE, 2016. diff --git a/mkdocs.yml b/mkdocs.yml index 0605e9e..449ab22 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -32,7 +32,6 @@ nav: - Decompilers 💾: - decompilers/directory.md - decompilers/tools.md - - decompilers/history.md - Fundamentals 🔍: - fundamentals/overview.md - Control Flow Graph Recovery: @@ -51,7 +50,8 @@ nav: - Applied Research ⚙️: - applied_research/overview.md - applied_research/symbol_recovery.md - - applied_research/library_identification.md + - applied_research/code_sim.md + - applied_research/vuln_discovery.md - Applications 🌍: - applications/overview.md - applications/program_reconstruction.md