Skip to content

Internal representation

John Källén edited this page Apr 27, 2017 · 1 revision

Reko maintains the analyzed binary program in a format called IR. It's useful for developers and users to see this format to gain an understanding of the internals of the analysis. A sample below will serve as a starting point for the definition of the text format of the internal representation.

This identifies the architecture and platform Reko used to load the original binary

target "x86-Win32"

The function factorial is being defined. It was found at address 00123400. Nothing is known about its type ("signature")

define factorial 
    addr "00123400"
{

The 32-bit identifiers ecx,eax, and esp are stored at bit offset 0 in the x86 registers ECX, EAX, and ESP, respectively at bit offset 0

    def ecx:word32 reg(ecx 0) 
    def eax:word32 reg(eax 0)
    def esp:word32 reg(esp 0)

The id edx_eax is a sequence formed by treating the two x86 registers edx and eax as a single unit.

    def edx_eax:word64 seq(reg(edx 0) reg(eax 0)

SCZO is a group of flags stored in the x86 FLAGS register at the bit positions indicated by the bit mask 0xF.

    def SCZO:byte flags("flags", 0xF)

fp is a temporary introduced by Reko that doesn't map to an x86 register. It is used to represent the stack frame of the procedure. The id esp is initialized to fp, corresponding to the stack pointer pointing to the stack frame at the start of the procedure.

    def fp:ptr32
    esp = fp

The first basic block sets the SCZO flag set with the outcome of subtracting the 32-bit constant 1:word32 from register ecx (which clearly is an input parameter). The Test expresses which bits the branch instruction looks at to determine a LE condition.

l0:
    SCZO = cond(ecx - 1:word32)
    if Test(SZ,LE) branch l2

An x86 push instruction is lifted to a stack pointer adjustment and a store.

l1:
    esp = esp - 4:word32
    Mem0[esp:word32] = ecx

The recursive call to factorial pushes a 4-byte return value on the stack. Here a type indicator for the '4' is not needed because in the RtlCall instruction, it's always an integer.

    call factorial 4

A rewritten x86 pop instruction restores the old value of ecx before the call.

    ecx = Mem0[esp:word32]
    esp = esp + 4:word32

The imul instruction returns its result in a register pair.

    edx_eax = eax *s ecx
    SCZO = cond(edx_eax)
    goto l3

The base case of the induction sets eax to 1.

l2:
    eax = 1:word32

The return operation removes 4 bytes from the stack. Consider: make the stack adjustment explicit in the IR?

l3:
   return ; 4
}

Later stages of Reko will have modified this code by quite a bit. Here it is after SSA analysis, value propagation, and function signature analysis:

define word32 factorial(ecx:word32 reg(ecx 0))
    addr "00123400"
{
    def fp:ptr32
    esp_1 = fp
l0:
    SCZO_2 = cond(ecx - 1:word32)
    SZ_3 = SZCO_2
    if Test(ecx <= 1:word32) branch l2
l1:
    esp_4 = fp - 4:word32
    Mem5[fp - 4:word32] = ecx
    call factorial 4
        use: ecx
	def: eax_6 Mem7
    ecx_7 = Mem7[fp - 4:word32]
    esp_8 = fp
    edx_eax_9 = eax_6 *s ecx_7
    eax_10 = (word32) edx_eax_9
    SCZO_11 = cond(edx_eax_9)
    goto l3
l2:
    eax_12 = 1:word32
l3:
    eax_13 = PHI(eax_10, eax_13)
    return eax_13
}
Clone this wiki locally