python-idb is a library for accessing the contents of IDA Pro databases (.idb files). It provides read-only access to internal structures such as the B-tree (ID0 section), name address index (NAM section), and flags index (ID1 section). The library also analyzes B-tree entries to expose logical structures like functions, cross references, bytes, and disassembly (via Capstone). One example use for python-idb is running IDA scripts in a pure-Python environment.
Willem Hengeveld provided the initial research into the low-level structures in his projects pyidbutil and idbutil. Willem deserves substantial credit for reversing the .idb file format and publishing his results online. This project heavily borrows from his knowledge, though there is little code overlap.
In this example, we list the effective addresses and names of functions:
In [4]: import idb
   ...: with idb.from_file('./data/kernel32/kernel32.idb') as db:
   ...:     api = idb.IDAPython(db)
   ...:     for ea in api.idautils.Functions():
   ...:         print('%x: %s' % (ea, api.idc.GetFunctionName(ea)))
Out [4]: 68901010: GetStartupInfoA
....: 689011df: Sleep
....: 68901200: MulDiv
....: 68901320: SwitchToFiber
....: 6890142c: GetTickCount
....: 6890143a: ReleaseMutex
....: 68901445: WaitForSingleObject
....: 68901450: GetCurrentThreadId
...
Note that we create an emulated instance of the IDAPython scripting interface, and use this to invoke idc and idautils routines to fetch data.
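Because the emulated interface is reached through attributes on the `api` object, the same listing logic can be written once and handed any object that exposes `idautils.Functions()` and `idc.GetFunctionName(ea)`. The following is a minimal sketch of that idea, not part of python-idb: `list_functions`, `format_functions`, and `fake_api` are hypothetical names, and `fake_api` is only a stand-in so the snippet runs without an .idb file on disk.

```python
from types import SimpleNamespace


def list_functions(api):
    """Return (ea, name) pairs using the idautils/idc routines shown above."""
    return [(ea, api.idc.GetFunctionName(ea)) for ea in api.idautils.Functions()]


def format_functions(pairs):
    """Render the pairs in the same '%x: %s' layout as the example output."""
    return ['%x: %s' % (ea, name) for ea, name in pairs]


# Hypothetical stand-in for idb.IDAPython(db); it mimics only the two
# routines used here, with two entries copied from the output above.
_names = {0x68901010: 'GetStartupInfoA', 0x689011df: 'Sleep'}
fake_api = SimpleNamespace(
    idautils=SimpleNamespace(Functions=lambda: sorted(_names)),
    idc=SimpleNamespace(GetFunctionName=lambda ea: _names[ea]),
)

for line in format_functions(list_functions(fake_api)):
    print(line)
```

With a real database, the object returned by `idb.IDAPython(db)` inside the `with idb.from_file(...)` block would presumably be passed in place of `fake_api`.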
In this example, we run the yara_fn.py IDAPython script to generate a YARA rule for the function at effective address 0x68901695 in kernel32.idb:
The target script yara_fn.py has only been slightly modified:
- to make it Python 3.x compatible, and
- to use the modern IDAPython modules, such as ida_bytes.GetManyBytes rather than idc.GetManyBytes.
- ~250 unit tests that demonstrate functionality including file format, B-tree, analysis, and idaapi features.
- read-only parsing of .idb and .i64 files from IDA Pro v6.95 and v7.0
- extraction of file sections
- B-tree lookups and queries (ID0 section)
- flag enumeration (ID1 section)
- named address listing (NAM section)
- analysis of artifacts that reconstructs logical elements, including:
  - root metadata
  - loader metadata
  - entry points
  - functions
  - structures
  - cross references
  - fixups
  - segments
- partial implementation of the IDAPython API, including:
  - Names
  - Heads
  - Segs
  - GetMnem (via Capstone)
  - Functions
  - FlowChart (basic blocks)
  - lots and lots of flags
- Python 2.7 & 3.x compatibility
- zlib-packed idb/i64 files
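To make the .idb versus .i64 distinction in the list above concrete, here is a small sketch that inspects the first four bytes of a database. It assumes the file begins with the signature `IDA1` for 32-bit .idb databases and `IDA2` for 64-bit .i64 databases; that detail is not stated in this README and should be checked against the idb.fileformat module. `guess_wordsize` and `WORDSIZE_BY_SIGNATURE` are hypothetical names, not python-idb APIs.

```python
# Assumed mapping from file signature to address word size in bytes.
# This is an assumption for illustration; verify it against idb.fileformat.
WORDSIZE_BY_SIGNATURE = {b'IDA1': 4, b'IDA2': 8}


def guess_wordsize(header_bytes):
    """Return 4 or 8 based on the leading signature, or raise ValueError."""
    signature = bytes(header_bytes[:4])
    if signature not in WORDSIZE_BY_SIGNATURE:
        raise ValueError('unrecognized signature: %r' % (signature,))
    return WORDSIZE_BY_SIGNATURE[signature]
```

A caller might read the first four bytes of a file opened in binary mode and pass them to `guess_wordsize` before deciding how to interpret addresses.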
Support for the following features is feasible and planned, but not yet implemented:
- databases from versions other than v6.95 and v7.0b
- parsing TIL section
- write access
python-idb is a pure-Python library, with the exception of Capstone (required only when calling disassembly APIs).
You can install it via pip or setup.py install, both of which should handle dependency resolution:
$ cd ~/Downloads/python-idb/
$ python setup.py install
$ python scripts/run_ida_script.py ~/tools/yara_fn.py ~/Downloads/kernel32.idb
... profit! ...
While most python-idb functions have meaningful docstrings, there is not yet a comprehensive documentation website. However, the unit tests demonstrate functionality that you'll probably find useful.
Someone interested in learning the file format and contributing to the project should review the idb.fileformat module & tests.
Those who want to extract meaningful information from existing .idb files should probably look at the idb.analysis and idb.idapython modules & tests.
Please report issues or feature requests through the GitHub issue tracker associated with the project.
python-idb is licensed under the Apache License, Version 2.0. This means it is freely available for use and modification in a personal and professional capacity.