Add public release of VarBERT code from @Atipriya (#24)
* Add initial local model for variable renaming code

* update images

* try again

* Update README

* rename varmodel to varbert
mahaloz authored Dec 6, 2023
1 parent acd43d0 commit f7d841d
Showing 7 changed files with 75 additions and 28 deletions.
24 changes: 18 additions & 6 deletions README.md
@@ -1,20 +1,25 @@
# DAILA
The Decompiler Artificial Intelligence Language Assistant (DAILA) is a unified interface for AI systems to be used in decompilers.
Using DAILA, you can utilize various AI systems, like local and remote LLMs, all in the same scripting and GUI interfaces.
Power up your decompilation experience with AI!

<img src="./assets/ida_daila.png" style="width: 50%;" alt="DAILA context menu"/>
![](./assets/ida_daila.png)

DAILA's main purpose is to provide a unified interface for AI systems to be used in decompilers.
To accomplish this, DAILA provides a lifted interface, relying on the BinSync library [LibBS](https://github.com/binsync/libbs) to abstract away the decompiler.
DAILA provides a lifted interface, relying on the BinSync library [LibBS](https://github.com/binsync/libbs) to abstract away the decompiler.
**All decompilers supported in LibBS are supported in DAILA, which currently includes IDA, Ghidra, Binja, and angr-management.**
Currently, there are two AI systems supported in DAILA: [OpenAI](https://openai.com/) and [VarBERT](https://github.com/binsync/varbert_api),
the latter of which is a local model for renaming variables in decompilation published in S&P 2024.


## Installation
Install our library backend through pip and our decompiler plugin through our installer:
```bash
pip3 install dailalib && daila --install
```

### Ghidra Extras
This will also download the VarBERT models for you through the [VarBERT API](https://github.com/binsync/varbert_api).

### Ghidra Extra Steps
You need to do a few extra steps to get Ghidra working.
Next, enable the DAILA plugin:
1. Start Ghidra and open a binary
@@ -38,7 +43,7 @@ DAILA is designed to be used in two ways:
With the exception of Ghidra (see below), when you start your decompiler you will have a new context menu
which you can access when you right-click anywhere in a function:

<img src="./assets/ida_show_menu_daila.png" style="width: 50%;" alt="DAILA context menu"/>
<img src="./assets/ida_daila.png" style="width: 50%;" alt="DAILA context menu"/>

If you are using Ghidra, go to `Tools->DAILA->Start DAILA Backend` to start the backend server.
After you've done this, you can use the context menu as shown above.
@@ -58,7 +63,7 @@ for function in deci.functions:


## Supported AI Backends
### OpenAI
### OpenAI (ChatGPT)
DAILA supports the OpenAI API. To use the OpenAI API, you must have an OpenAI API key.
If your decompiler does not have access to the `OPENAI_API_KEY` environment variable, then you must use the decompiler option from
DAILA to set the API key.
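
As a rough illustration of the key lookup described here, the sketch below prefers a key supplied through the decompiler option and otherwise falls back to the environment. It is not DAILA's implementation: `resolve_api_key` and its parameters are hypothetical names, and only the `OPENAI_API_KEY` variable name comes from the README.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(explicit_key: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Hypothetical helper: return an API key, or None if the user must be asked."""
    env = os.environ if env is None else env
    # A key supplied explicitly (e.g. through the decompiler option) wins.
    if explicit_key:
        return explicit_key
    # Otherwise fall back to the environment; an empty value counts as missing.
    return env.get("OPENAI_API_KEY") or None


from_option = resolve_api_key("sk-from-option", env={})
from_env = resolve_api_key(env={"OPENAI_API_KEY": "sk-from-env"})
missing = resolve_api_key(env={})
```

Passing `env` explicitly keeps the lookup testable without touching the real process environment.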
@@ -69,6 +74,13 @@ Currently, DAILA supports the following prompts:
- Rename function
- Identify the source of a function

### VarBERT
VarBERT is a local BERT model from the S&P 2024 paper ["'Len or index or count, anything but v1': Predicting Variable Names in Decompilation Output with Transfer Learning"]().
VarBERT renames variables (stack variables, register variables, and arguments) in decompilation.
To understand how to use VarBERT as a library, please see the [VarBERT API](https://github.com/binsync/varbert_api) documentation.
Using it in DAILA is as simple as using the GUI context menu when right-clicking on a function.


## Supported Decompilers
- IDA
![](./assets/ida_daila.png)
Binary file modified assets/ida_daila.png
Binary file removed assets/ida_show_menu_daila.png
Binary file not shown.
33 changes: 26 additions & 7 deletions dailalib/__init__.py
@@ -1,21 +1,37 @@
-__version__ = "2.0.0"
+__version__ = "2.1.0"

 from .api import AIAPI, OpenAIAPI
 from libbs.api import DecompilerInterface


 def create_plugin(*args, **kwargs):

-    ai_api = OpenAIAPI(delay_init=True)
+    #
+    # OpenAI API (ChatGPT)
+    #
+
+    openai_api = OpenAIAPI(delay_init=True)
     # create context menus for prompts
     gui_ctx_menu_actions = {
-        f"DAILA/{prompt_name}": (prompt.desc, getattr(ai_api, prompt_name))
-        for prompt_name, prompt in ai_api.prompts_by_name.items()
+        f"DAILA/OpenAI/{prompt_name}": (prompt.desc, getattr(openai_api, prompt_name))
+        for prompt_name, prompt in openai_api.prompts_by_name.items()
     }
     # create context menus for others
-    gui_ctx_menu_actions["DAILA/Update API Key"] = ("Update API Key", ai_api.ask_api_key)
+    gui_ctx_menu_actions["DAILA/OpenAI/update_api_key"] = ("Update API Key", openai_api.ask_api_key)
+
+    #
+    # VarModel API (local variable renaming)
+    #
+
+    from varbert import VariableRenamingAPI
+    var_api = VariableRenamingAPI(delay_init=True)
+    # add single interface, which is to rename variables
+    gui_ctx_menu_actions["DAILA/VarBERT/varbert_rename_vars"] = ("Suggest new variable names", var_api.query_model)
+
+    #
+    # Decompiler Plugin Registration
+    #

-    # create decompiler interface
     force_decompiler = kwargs.pop("force_decompiler", None)
     deci = DecompilerInterface.discover_interface(
         force_decompiler=force_decompiler,
@@ -26,6 +42,9 @@ def create_plugin(*args, **kwargs):
         ui_init_args=args,
         ui_init_kwargs=kwargs
     )
-    ai_api.init_decompiler_interface(decompiler_interface=deci)
+
+    openai_api.init_decompiler_interface(decompiler_interface=deci)
+    if var_api is not None:
+        var_api.init_decompiler_interface(decompiler_interface=deci)

     return deci.gui_plugin
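
The change above moves each backend's context-menu entries under its own path prefix. The pattern can be sketched in isolation as follows. This is an illustration with stand-in classes, not DAILA code: `FakePrompt`, `FakeBackend`, `build_menu`, and the prompt names `summarize` and `rename_function` are all hypothetical.

```python
from typing import Callable, Dict, Tuple

MenuActions = Dict[str, Tuple[str, Callable]]


class FakePrompt:
    """Stand-in for a prompt object; only the `desc` attribute is modeled."""
    def __init__(self, desc: str):
        self.desc = desc


class FakeBackend:
    """Stand-in for an AI backend exposing one method per prompt name."""
    def __init__(self):
        self.prompts_by_name = {
            "summarize": FakePrompt("Summarize function"),
            "rename_function": FakePrompt("Rename function"),
        }

    def summarize(self, *args, **kwargs):
        return "summary"

    def rename_function(self, *args, **kwargs):
        return "new_name"


def build_menu(backend, namespace: str) -> MenuActions:
    """Map 'DAILA/<namespace>/<prompt>' to a (description, callback) pair."""
    return {
        f"DAILA/{namespace}/{name}": (prompt.desc, getattr(backend, name))
        for name, prompt in backend.prompts_by_name.items()
    }


menu = build_menu(FakeBackend(), "OpenAI")
# A second backend adds its single entry under a separate namespace.
menu["DAILA/VarBERT/varbert_rename_vars"] = ("Suggest new variable names", lambda *a, **k: None)
```

Namespacing by backend keeps entries from colliding when two backends expose similarly named actions.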
9 changes: 7 additions & 2 deletions dailalib/api/ai_api.py
@@ -19,7 +19,8 @@ def __init__(
         # useful for initing after the creation of a decompiler interface
         self._dec_interface: Optional[DecompilerInterface] = None
         self._dec_name = None
-        if not delay_init:
+        self._delay_init = delay_init
+        if not self._delay_init:
             self.init_decompiler_interface(decompiler_interface, decompiler_name, use_decompiler)

         self._min_func_size = min_func_size
@@ -75,6 +76,10 @@ def _requires_function(*args, ai_api: "AIAPI" = None, **kwargs):
             function = kwargs.pop("function", None)
             dec_text = kwargs.pop("dec_text", None)
             use_dec = kwargs.pop("use_dec", True)
+            has_self = kwargs.pop("has_self", True)
+            # make the self object the new AI API, should only be used inside an AIAPI class
+            if not ai_api and has_self:
+                ai_api = args[0]

             if not dec_text and not use_dec:
                 raise ValueError("You must provide decompile text if you are not using a dec backend")
@@ -87,7 +92,7 @@ def _requires_function(*args, ai_api: "AIAPI" = None, **kwargs):

             # we must have a UI if we have no func
             if function is None:
-                function = ai_api._dec_interface.active_context()
+                function = ai_api._dec_interface.functions[ai_api._dec_interface.active_context().addr]

             # get new text with the function that is present
             if dec_text is None:
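
The two behavioral changes in this file are that the wrapper falls back to `args[0]` as the API object, and that it resolves the active context to a function through its address. Both can be sketched with a self-contained decorator. This is a simplified illustration, not the code in `ai_api.py`: `FakeInterface`, `FakeAPI`, the `interface` attribute, and the address `0x401000` are hypothetical, and the decompilation-text handling is left out.

```python
import functools
from types import SimpleNamespace


def requires_function(f):
    """Sketch of a decorator that fills in a missing `function` keyword argument."""
    @functools.wraps(f)
    def wrapper(*args, ai_api=None, **kwargs):
        function = kwargs.pop("function", None)
        has_self = kwargs.pop("has_self", True)
        # When decorating a method, the first positional argument is the API object.
        if ai_api is None and has_self:
            ai_api = args[0]
        # No function supplied: look up the one under the cursor by its address.
        if function is None:
            ctx = ai_api.interface.active_context()
            function = ai_api.interface.functions[ctx.addr]
        return f(*args, function=function, **kwargs)
    return wrapper


class FakeInterface:
    """Stand-in for a decompiler interface with one known function."""
    def __init__(self):
        self.functions = {0x401000: "main"}

    def active_context(self):
        return SimpleNamespace(addr=0x401000)


class FakeAPI:
    def __init__(self):
        self.interface = FakeInterface()

    @requires_function
    def describe(self, function=None):
        return f"working on {function}"


implicit = FakeAPI().describe()
explicit = FakeAPI().describe(function="helper")
```

Indexing `functions` by the context's address hands the wrapped method a function object rather than a bare UI context, which is what the one-line change in the last hunk achieves.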
34 changes: 22 additions & 12 deletions dailalib/installer.py
@@ -7,7 +7,7 @@

 class DAILAInstaller(PluginInstaller):
     def __init__(self):
-        super().__init__(targets=("ida", "ghidra", "binja"))
+        super().__init__(targets=("ida", "ghidra", "binja", "angr"))
         self.pkg_path = Path(str(importlib.resources.files("dailalib"))).absolute()

     def _copy_plugin_to_path(self, path):
@@ -18,17 +18,12 @@ def _copy_plugin_to_path(self, path):
def display_prologue(self):
print(textwrap.dedent("""
Now installing...
▄▄▄▄▄▄▄▄▄▄ ▄▄▄▄▄▄▄▄▄▄▄ ▄▄▄▄▄▄▄▄▄▄▄ ▄ ▄▄▄▄▄▄▄▄▄▄▄
▐░░░░░░░░░░▌ ▐░░░░░░░░░░░▌▐░░░░░░░░░░░▌▐░▌ ▐░░░░░░░░░░░▌
▐░█▀▀▀▀▀▀▀█░▌▐░█▀▀▀▀▀▀▀█░▌ ▀▀▀▀█░█▀▀▀▀ ▐░▌ ▐░█▀▀▀▀▀▀▀█░▌
▐░▌ ▐░▌▐░▌ ▐░▌ ▐░▌ ▐░▌ ▐░▌ ▐░▌
▐░▌ ▐░▌▐░█▄▄▄▄▄▄▄█░▌ ▐░▌ ▐░▌ ▐░█▄▄▄▄▄▄▄█░▌
▐░▌ ▐░▌▐░░░░░░░░░░░▌ ▐░▌ ▐░▌ ▐░░░░░░░░░░░▌
▐░▌ ▐░▌▐░█▀▀▀▀▀▀▀█░▌ ▐░▌ ▐░▌ ▐░█▀▀▀▀▀▀▀█░▌
▐░▌ ▐░▌▐░▌ ▐░▌ ▐░▌ ▐░▌ ▐░▌ ▐░▌
▐░█▄▄▄▄▄▄▄█░▌▐░▌ ▐░▌ ▄▄▄▄█░█▄▄▄▄ ▐░█▄▄▄▄▄▄▄▄▄ ▐░▌ ▐░▌
▐░░░░░░░░░░▌ ▐░▌ ▐░▌▐░░░░░░░░░░░▌▐░░░░░░░░░░░▌▐░▌ ▐░▌
▀▀▀▀▀▀▀▀▀▀ ▀ ▀ ▀▀▀▀▀▀▀▀▀▀▀ ▀▀▀▀▀▀▀▀▀▀▀ ▀ ▀
██████ █████ ██ ██ █████
██ ██ ██ ██ ██ ██ ██ ██
██ ██ ███████ ██ ██ ███████
██ ██ ██ ██ ██ ██ ██ ██
██████ ██ ██ ██ ███████ ██ ██
The Decompiler AI Language Assistant
"""))
@@ -64,3 +59,18 @@ def install_angr(self, path=None, interactive=True):

         self._copy_plugin_to_path(path)
         return path
+
+    def display_epilogue(self):
+        super().display_epilogue()
+        print("")
+        self.install_local_models()
+
+    def install_local_models(self):
+        self.info("We will now download local models for each decompiler you've installed. Ctrl+C to cancel.")
+        self.install_varmodel_models()
+
+    def install_varmodel_models(self):
+        self.info("Installing VarBERT models...")
+        from varbert import install_model as install_varbert_model
+        for target in self._successful_installs:
+            install_varbert_model(target, opt_level="O0")
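
The installer change hooks model downloads into the epilogue step: the subclass calls the base epilogue first, then iterates over the targets that installed successfully. A minimal sketch of that hook pattern follows, using stand-ins rather than the real installer classes. `BaseInstaller`, `ModelInstaller`, the injected `download` callable, and the example paths are hypothetical, and `_successful_installs` is assumed here to map target names to install paths.

```python
class BaseInstaller:
    """Stand-in for the plugin installer base class (hypothetical)."""
    def __init__(self):
        self._successful_installs = {}
        self.log = []

    def display_epilogue(self):
        self.log.append("base epilogue")


class ModelInstaller(BaseInstaller):
    def __init__(self, download):
        super().__init__()
        # callable(target, opt_level=...) that performs the actual fetch
        self._download = download

    def display_epilogue(self):
        # Keep the base behavior, then run the extra post-install step.
        super().display_epilogue()
        self.install_local_models()

    def install_local_models(self):
        # Only fetch models for decompilers that were actually installed.
        for target in self._successful_installs:
            self._download(target, opt_level="O0")
            self.log.append(f"model for {target}")


fetched = []
installer = ModelInstaller(lambda target, opt_level: fetched.append((target, opt_level)))
installer._successful_installs = {"ida": "/path/a", "ghidra": "/path/b"}
installer.display_epilogue()
```

Injecting the download callable keeps the sketch runnable offline; the diff above imports `install_model` from `varbert` inside the method instead, which defers that import until models are actually installed.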
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -10,13 +10,14 @@ classifiers = [
     "Programming Language :: Python :: 3.8",
 ]
 license = {text = "BSD 2 Clause"}
-description = "Decompiler Artificial Intelligence Language Assistant"
+description = "The Decompiler Artificial Intelligence Language Assistant (DAILA) is a tool for adding AI to decompilers."
 urls = {Homepage = "https://github.com/mahaloz/DAILA"}
 requires-python = ">= 3.8"
 dependencies = [
     "openai>=1.0.0",
     "libbs",
     "tiktoken",
+    "varbert>=2.0.1"
 ]
 dynamic = ["version"]
