E9Tool User's Guide

NOTE: This guide is a work-in-progress and still incomplete.

E9Tool is a frontend for E9Patch. Basically, E9Tool translates high-level patching commands (i.e., what instructions to patch, and how to patch them) into low-level commands for E9Patch. E9Patch is a very low-level tool and is not designed to be used directly.




The matching language specifies what instructions should be patched by the corresponding patch (see below). Matchings are specified using the (--match MATCH) or (-M MATCH) command-line option. The basic form of a matching (MATCH) is a Boolean expression of TESTs using the following high-level grammar:

    MATCH ::=   TEST
              | ( MATCH )
              | not MATCH
              | MATCH and MATCH
              | MATCH or MATCH

Alternatively, C-style Boolean operations (!, &&, and ||) can be used instead of (not, and, and or).

Each TEST queries some specific property/attribute of the underlying instruction, defined using the following grammar:

    TEST ::=   defined ( ATTRIBUTE )
             | VALUES in ATTRIBUTE
             | ATTRIBUTE [ CMP VALUES ]

    VALUES ::=   REGULAR-EXPRESSION
               | VALUE [ , VALUE ] *
               | BASENAME [ INTEGER ]

    CMP ::=   = | == | != | > | >= | < | <=

A TEST tests some underlying instruction ATTRIBUTE using an integer, string or set comparison operator CMP. The following comparison operators are supported:

| Comparison | Type | Description |
| --- | --- | --- |
| = or == | Integer or String | Equality |
| != | Integer or String | Disequality |
| > | Integer | Greater-than |
| >= | Integer | Greater-than-or-equal-to |
| < | Integer | Less-than |
| <= | Integer | Less-than-or-equal-to |
| in | Set | Set membership |

If the comparison operator and value are omitted, then the test is equivalent to (ATTRIBUTE != 0).

A VALUE can be either:

  • An integer constant, e.g., 123, 0x123, etc.
  • A string constant, e.g., "abc", etc.
  • An enumeration value such as register names (rax, eax, etc.), operand types (imm, reg, mem), etc.
  • A symbolic address of the form NAME, where NAME is any section or symbol name from the input ELF file. A symbolic address has type Integer.

For string attributes, the value can be a regular expression. This means that the corresponding attribute value must either match (for ==) or not match (for !=) the regular expression, depending on the comparison operator.


The following ATTRIBUTEs (with corresponding types) are supported:

| Attribute | Type | Description |
| --- | --- | --- |
| true | Boolean | True |
| false | Boolean | False |
| jump | Boolean | True for jump instructions, false otherwise |
| condjump | Boolean | True for conditional jump instructions, false otherwise |
| call | Boolean | True for call instructions, false otherwise |
| return | Boolean | True for return instructions, false otherwise |
| asm | String | The assembly string representation |
| mnemonic | String | The mnemonic |
| section | String | The section name |
| addr | Integer | The ELF virtual address |
| offset | Integer | The ELF file offset |
| size | Integer | The size of the instruction in bytes |
| random | Integer | A random value [0..RAND_MAX] |
| target | Integer | The jump/call target (if statically known) |
| x87 | Boolean | True for x87 instructions, false otherwise |
| mmx | Boolean | True for MMX instructions, false otherwise |
| sse | Boolean | True for SSE instructions, false otherwise |
| avx | Boolean | True for AVX instructions, false otherwise |
| avx2 | Boolean | True for AVX2 instructions, false otherwise |
| avx512 | Boolean | True for AVX512 instructions, false otherwise |
| op.size | Integer | The number of operands |
| src.size | Integer | The number of source operands |
| dst.size | Integer | The number of destination operands |
| imm.size | Integer | The number of immediate operands |
| reg.size | Integer | The number of register operands |
| mem.size | Integer | The number of memory operands |
| op[i] | Operand | The ith operand |
| src[i] | Operand | The ith source operand |
| dst[i] | Operand | The ith destination operand |
| imm[i] | Operand | The ith immediate operand |
| reg[i] | Operand | The ith register operand |
| mem[i] | Operand | The ith memory operand |
| op[i].type | {imm,reg,mem} | The ith operand type |
| src[i].type | {imm,reg,mem} | The ith source operand type |
| dst[i].type | {imm,reg,mem} | The ith destination operand type |
| op[i].access | {-,r,w,rw} | The ith operand access |
| src[i].access | {-,r,w,rw} | The ith source operand access |
| dst[i].access | {-,r,w,rw} | The ith destination operand access |
| reg[i].access | {-,r,w,rw} | The ith register operand access |
| mem[i].access | {-,r,w,rw} | The ith memory operand access |
| op[i].seg | Register | The ith operand segment register |
| src[i].seg | Register | The ith source operand segment register |
| dst[i].seg | Register | The ith destination operand segment register |
| mem[i].seg | Register | The ith memory operand segment register |
| op[i].disp | Integer | The ith operand displacement |
| src[i].disp | Integer | The ith source operand displacement |
| dst[i].disp | Integer | The ith destination operand displacement |
| mem[i].disp | Integer | The ith memory operand displacement |
| op[i].base | Register | The ith operand base register |
| src[i].base | Register | The ith source operand base register |
| dst[i].base | Register | The ith destination operand base register |
| mem[i].base | Register | The ith memory operand base register |
| op[i].index | Register | The ith operand index register |
| src[i].index | Register | The ith source operand index register |
| dst[i].index | Register | The ith destination operand index register |
| mem[i].index | Register | The ith memory operand index register |
| op[i].scale | Integer | The ith operand scale |
| src[i].scale | Integer | The ith source operand scale |
| dst[i].scale | Integer | The ith destination operand scale |
| mem[i].scale | Integer | The ith memory operand scale |
| regs | Set<Register> | The set of all accessed registers |
| reads | Set<Register> | The set of all read-from registers |
| writes | Set<Register> | The set of all written-to registers |
| plugin(NAME).match() | Integer | Value from NAME.so plugin |

Here Register is the set of all x86_64 register names defined as follows:

    Register = {
        rip, rflags,
        es, cs, ss, ds, fs, gs,
        ah, ch, dh, bh,
        al, cl, dl, bl, spl, bpl, sil, dil, r8b, ..., r15b,
        ax, cx, dx, bx, sp, bp, si, di, r8w, ..., r15w,
        eax, ecx, edx, ebx, esp, ebp, esi, edi, r8d, ..., r15d,
        rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi, r8, ..., r15,
        xmm0, ..., xmm31,
        ymm0, ..., ymm31,
        zmm0, ..., zmm31, ...}

An Operand can be one of three values:

  • An immediate value represented by an Integer
  • A register represented by a Register
  • A memory operand (not representable)

Thus the Operand type is the union of the Integer and Register types:

    Operand = Integer | Register

Not all attributes are defined for all instructions. For example, if the instruction has 3 operands, then only op[0], op[1], and op[2] will be defined, and op[3] and beyond will be undefined. Similarly, op[0].base will be undefined if the first operand of the instruction is not a memory operand.

Any test that uses an undefined value will fail. For example, both of the tests (op[3] == 0x1) and (op[3] != 0x1) will fail, despite each test being the negation of the other. The explicit Boolean operators (not, and, and or) treat failure due to undefinedness the same as false, thus the tests (op[3] != 0x1) and (not op[3] == 0x1) are not equivalent for undefined values.

The special defined(ATTRIBUTE) test can be used to determine if an attribute is defined or not.
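
The interaction between undefined values and the Boolean operators can be modelled in a few lines of C. The sketch below only illustrates the semantics described above and is not E9Tool's implementation; the attr_t type and the test_eq, test_ne, test_defined and match_not helpers are hypothetical names introduced for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an attribute value that may be undefined. */
typedef struct {
    bool defined;
    intptr_t value;
} attr_t;

/* A comparison on an undefined attribute fails, i.e., evaluates to false. */
static bool test_eq(attr_t a, intptr_t v) { return a.defined && a.value == v; }
static bool test_ne(attr_t a, intptr_t v) { return a.defined && a.value != v; }

/* Models the defined(ATTRIBUTE) test. */
static bool test_defined(attr_t a) { return a.defined; }

/* The explicit `not` operator only sees true/false, so a test that failed
 * because of undefinedness is turned into true. */
static bool match_not(bool t) { return !t; }
```

With an undefined op[3], test_ne(op3, 0x1) is false whereas match_not(test_eq(op3, 0x1)) is true, which mirrors the difference between (op[3] != 0x1) and (not op[3] == 0x1).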


  • (true): match every instruction.
  • (false): do not match any instruction.
  • (asm == jmp.*%r.*): match all instructions whose assembly representation matches the regular expression jmp.*%r.* (will match jump instructions that access a register).
  • (mnemonic == jmp): match all instructions whose mnemonic is jmp.
  • (addr == 0x4234a7): match the instruction at the virtual address 0x4234a7.
  • (addr == 0x4234a7,0x44bd6e,0x4514b4): match the instructions at the virtual addresses 0x4234a7, 0x44bd6e, and 0x4514b4.
  • (addr >= 0x4234a7 and addr <= 0x4514b4): match all instructions in the virtual address range 0x4234a7..0x4514b4.
  • (op.size > 1): match all instructions with more than one operand.
  • (reg.size == 2): match all instructions with exactly two register operands.
  • (op[0] == 0x1234): match all instructions where the first operand is the immediate value 0x1234.
  • (op[0] == rax): match all instructions where the first operand is the %rax register.
  • (op[0].type == mem): match all instructions where the first operand is a memory operand.
  • (reg[0] == rax and reg[1] == rbx): match all instructions where the first and second register operands are %rax and %rbx respectively.
  • (mem[0].base == rax and mem[0].index == rbx): match all instructions with a memory operand with %rax as the base and %rbx as the index.
  • (mem[0].base == nil): match all instructions with a memory operand that does not use a base register.
  • (rflags in reads): match all instructions that read the flags register.
  • (rflags in writes): match all instructions that modify the flags register.
  • (not rflags in regs): match all instructions that do not access the flags register.
  • defined(mem[0]): match all instructions that have at least one memory operand.
  • (call and target == &malloc): match all direct calls to malloc().

Exclusions are an additional method for controlling which instructions are patched. An exclusion is specified by the (--exclude RANGE) or (-E RANGE) command-line option, where RANGE specifies a range of addresses that should not be disassembled or rewritten. Exclusions are more low-level than the matching language since the RANGE will not even be disassembled. This can help solve some problems, such as the binary storing data inside the .text section.

The general syntax for RANGE is:

    RANGE ::=   ADDR [ .. ADDR ]
    ADDR  ::=   VALUE [ + INTEGER ]
    VALUE ::=   INTEGER
              | SYMBOL
              | SECTION [ . ( start | end ) ]

For example:

  • 0x12345 .. 0x45689: exclude a specific address range
  • .text..ChromeMain: exclude the .text section up to the symbol ChromeMain
  • .plt .. .text: exclude a range of sections
  • .plt.start .. .text.end: equivalent to the above
  • .plt .. .text.start: exclude all sections between .plt and the starting address of .text. The .text section itself will not be excluded.
  • malloc .. malloc+16: exclude the 16-byte PLT entry for malloc.
  • .text: exclude the entire .text section.

Note that a RANGE may include a lower and upper bound, i.e., LB .. UB. If the UB is omitted, then UB=LB is implied. The instruction at the address UB is not excluded, and disassembly will resume from this address. In other words, the syntax LB .. UB represents the address range [LB..UB), and E9Tool assumes that UB points to a valid instruction from which disassembly can resume.
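
The half-open interpretation can be made concrete with a small predicate. This is a sketch of the arithmetic only, with a hypothetical helper name (in_exclusion); it is not taken from the E9Tool sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if addr lies in the half-open range [lb..ub).
 * The lower bound is excluded from patching; the upper bound is not. */
static bool in_exclusion(uintptr_t addr, uintptr_t lb, uintptr_t ub)
{
    return addr >= lb && addr < ub;
}
```

For a range such as (malloc .. malloc+16), the 16 bytes starting at malloc are inside the exclusion, while the address malloc+16 itself is outside and is where disassembly resumes.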


The patch language specifies how to patch matching instructions from the input binary. Patches are specified using the (--patch PATCH) or (-P PATCH) command-line option, and must be paired with one or more matchings. The basic form of a patch (PATCH) uses the following high-level grammar:

    PATCH      ::= [ POSITION ] TRAMPOLINE
    POSITION   ::=   before
                   | replace
                   | after
    TRAMPOLINE ::=   empty
                   | break
                   | trap
                   | exit(CODE)
                   | print
                   | CALL
                   | if CALL break
                   | if CALL goto
                   | plugin(NAME).patch()

A patch is an optional position followed by a trampoline. The trampoline represents code that will be executed when control-flow reaches the matching instruction. The trampoline can be either a builtin trampoline, a call trampoline, or a trampoline defined by a plugin.


The builtin trampolines include:

| Patch | Description |
| --- | --- |
| empty | The empty trampoline |
| break | Immediately return from the trampoline |
| trap | Execute a TRAP (int3) instruction |
| exit(CODE) | Exit with CODE |
| print | Print the matching instruction |

Here:

  • empty is the empty trampoline with no instructions. Control-flow is still redirected to/from empty trampolines, and this can be used to establish a baseline for benchmarking.
  • break immediately returns from the trampoline back to the main program.
  • trap executes a single TRAP (int3) instruction.
  • exit(CODE) will immediately exit from the program with status CODE.
  • print will print the assembly representation of the matching instruction to stderr. This can be used for testing and debugging.

A call trampoline calls a user-defined function that can be implemented in a high-level programming language such as C or C++. Call trampolines are the main way of implementing custom patches using E9Tool. The syntax for a call trampoline is as follows:

    CALL ::= FUNCTION [ ABI ] ARGS @ BINARY
    ABI  ::= < clean | naked >
    ARGS ::= ( ARG , ... )

The call trampoline specifies that the trampoline should call function FUNCTION from the binary BINARY with the arguments ARGS.

To use a call trampoline:

  1. Implement the desired patch as a function using the C or C++ programming language.
  2. Compile the patch program using the special e9compile.sh script to generate a patch binary.
  3. Use E9Tool to call the patch function from the patch binary at the desired locations.

E9Tool will handle all of the low-level details, such as loading the patch binary into memory, passing the arguments to the function, and saving/restoring the CPU state.

For example, the following code defines a function that increments a counter. Once the counter reaches some predefined maximum value, the function will execute the int3 instruction, causing SIGTRAP to be sent to the program.

    static unsigned long counter = 0;
    static unsigned long max = 100000;
    void entry(void)
    {
        counter++;
        if (counter >= max)
            asm volatile ("int3");
    }

Once defined, the program can be compiled using the e9compile.sh script.

    ./e9compile.sh counter.c

The e9compile.sh script is a gcc wrapper that ensures the generated binary is compatible with E9Tool. In this case, the script will generate a counter binary if compilation is successful.

Finally, the counter binary can be used as a call trampoline. For example, to generate a SIGTRAP on the 100000th executed xor instruction (the value of max in the code above):

    ./e9tool -M 'mnemonic==xor' -P 'entry()@counter' ...

Call trampolines are primarily designed for ease-of-use and not for speed. For applications where speed is essential, it is recommended to design a custom trampoline using a plugin.


Call trampolines also support passing arguments to the called function. The syntax uses the C-style round brackets. For example:

    ./e9tool -M ... -P 'func(rip)@example' xterm

This specifies that the current value of the instruction pointer %rip should be passed as the first argument to the function func(). The called function can use this argument, e.g.:

    void func(const void *rip)
    {
        ...
    }

Call trampolines support up to eight arguments. The following arguments are supported:

| Argument | Type | Description |
| --- | --- | --- |
| Integer | intptr_t | An integer constant |
| String | const char * | A string constant |
| &Name | const void * | The runtime address of the named section/symbol/PLT/GOT entry |
| static &Name | const void * | The ELF address of the named section/symbol/PLT/GOT entry |
| asm | const char * | Assembly representation of the matching instruction |
| asm.size | size_t | The number of bytes in asm (including the nul character) |
| asm.len | size_t | The string length of asm (excluding the nul character) |
| base | const void * | The runtime base address of the binary |
| config | const void * | A pointer to the E9Patch configuration (see e9loader.h) |
| addr | const void * | The runtime address of the matching instruction |
| static addr | const void * | The ELF address of the matching instruction |
| id | intptr_t | A unique identifier (one per patch) |
| instr | const uint8_t * | The machine-code bytes of the matching instruction |
| next | const void * | The runtime address of the next executed instruction |
| static next | const void * | The ELF address of the next executed instruction |
| offset | off_t | The ELF file offset of the matching instruction |
| target | const void * | The runtime address of the jump/call/return target, else NULL |
| static target | const void * | The ELF address of the jump/call/return target, else NULL |
| trampoline | const void * | The runtime address of the trampoline |
| random | intptr_t | A (statically generated) random integer [0..RAND_MAX] |
| size | size_t | The size of instr in bytes |
| state | void * | A pointer to a structure containing all general purpose registers |
| ah,...,dh, al,...,r15b | int8_t | The corresponding 8bit register |
| ax,...,r15w | int16_t | The corresponding 16bit register |
| eax,...,r15d | int32_t | The corresponding 32bit register |
| rax,...,r15 | int64_t | The corresponding 64bit register |
| rflags | int16_t | The %rflags register with format SF:ZF:0:AF:0:PF:1:CF:0:0:0:0:0:0:0:OF |
| rip | const void * | The %rip register |
| &ah,...,&dh, &al,...,&r15b | int8_t * | The corresponding 8bit register (passed-by-pointer) |
| &ax,...,&r15w | int16_t * | The corresponding 16bit register (passed-by-pointer) |
| &eax,...,&r15d | int32_t * | The corresponding 32bit register (passed-by-pointer) |
| &rax,...,&r15 | int64_t * | The corresponding 64bit register (passed-by-pointer) |
| &rflags | int16_t * | The %rflags register (passed-by-pointer) |
| op[i] | int8/16/32/64_t | The matching instruction's ith operand |
| src[i] | int8/16/32/64_t | The matching instruction's ith source operand |
| dst[i] | int8/16/32/64_t | The matching instruction's ith destination operand |
| imm[i] | int8/16/32/64_t | The matching instruction's ith immediate operand |
| reg[i] | int8/16/32/64_t | The matching instruction's ith register operand |
| mem[i] | int8/16/32/64_t | The matching instruction's ith memory operand |
| &op[i] | (const) int8/16/32/64_t * | The matching instruction's ith operand (passed-by-pointer) |
| &src[i] | (const) int8/16/32/64_t * | The matching instruction's ith source operand (passed-by-pointer) |
| &dst[i] | int8/16/32/64_t * | The matching instruction's ith destination operand (passed-by-pointer) |
| &imm[i] | const int8/16/32/64_t * | The matching instruction's ith immediate operand (passed-by-pointer) |
| &reg[i] | (const) int8/16/32/64_t * | The matching instruction's ith register operand (passed-by-pointer) |
| &mem[i] | int8/16/32/64_t * | The matching instruction's ith memory operand (passed-by-pointer) |
| op[i].size | size_t | The matching instruction's ith operand size |
| src[i].size | size_t | The matching instruction's ith source operand size |
| dst[i].size | size_t | The matching instruction's ith destination operand size |
| imm[i].size | size_t | The matching instruction's ith immediate operand size |
| reg[i].size | size_t | The matching instruction's ith register operand size |
| mem[i].size | size_t | The matching instruction's ith memory operand size |
| op[i].type | int8_t | The matching instruction's ith operand type (1=immediate, 2=register, 3=memory operand) |
| src[i].type | int8_t | The matching instruction's ith source operand type |
| dst[i].type | int8_t | The matching instruction's ith destination operand type |
| imm[i].type | int8_t | The matching instruction's ith immediate operand type |
| reg[i].type | int8_t | The matching instruction's ith register operand type |
| mem[i].type | int8_t | The matching instruction's ith memory operand type |
| op[i].access | int8_t | The matching instruction's ith operand access (0x80 \| PROT_READ \| PROT_WRITE) |
| src[i].access | int8_t | The matching instruction's ith source operand access |
| dst[i].access | int8_t | The matching instruction's ith destination operand access |
| imm[i].access | int8_t | The matching instruction's ith immediate operand access |
| reg[i].access | int8_t | The matching instruction's ith register operand access |
| mem[i].access | int8_t | The matching instruction's ith memory operand access |
| op[i].disp | int32_t | The matching instruction's ith operand displacement |
| src[i].disp | int32_t | The matching instruction's ith source operand displacement |
| dst[i].disp | int32_t | The matching instruction's ith destination operand displacement |
| mem[i].disp | int32_t | The matching instruction's ith memory operand displacement |
| op[i].base | int32/64_t | The matching instruction's ith operand base register |
| src[i].base | int32/64_t | The matching instruction's ith source operand base register |
| dst[i].base | int32/64_t | The matching instruction's ith destination operand base register |
| mem[i].base | int32/64_t | The matching instruction's ith memory operand base register |
| &op[i].base | int32/64_t * | The matching instruction's ith operand base register (passed-by-pointer) |
| &src[i].base | int32/64_t * | The matching instruction's ith source operand base register (passed-by-pointer) |
| &dst[i].base | int32/64_t * | The matching instruction's ith destination operand base register (passed-by-pointer) |
| &mem[i].base | int32/64_t * | The matching instruction's ith memory operand base register (passed-by-pointer) |
| op[i].index | int32/64_t | The matching instruction's ith operand index register |
| src[i].index | int32/64_t | The matching instruction's ith source operand index register |
| dst[i].index | int32/64_t | The matching instruction's ith destination operand index register |
| mem[i].index | int32/64_t | The matching instruction's ith memory operand index register |
| &op[i].index | int32/64_t * | The matching instruction's ith operand index register (passed-by-pointer) |
| &src[i].index | int32/64_t * | The matching instruction's ith source operand index register (passed-by-pointer) |
| &dst[i].index | int32/64_t * | The matching instruction's ith destination operand index register (passed-by-pointer) |
| &mem[i].index | int32/64_t * | The matching instruction's ith memory operand index register (passed-by-pointer) |
| op[i].scale | int8_t | The matching instruction's ith operand scale |
| src[i].scale | int8_t | The matching instruction's ith source operand scale |
| dst[i].scale | int8_t | The matching instruction's ith destination operand scale |
| mem[i].scale | int8_t | The matching instruction's ith memory operand scale |
| mem8<MEMOP> | int8_t | An explicit 8-bit MEMOP |
| mem16<MEMOP> | int16_t | An explicit 16-bit MEMOP |
| mem32<MEMOP> | int32_t | An explicit 32-bit MEMOP |
| mem64<MEMOP> | int64_t | An explicit 64-bit MEMOP |
| &mem8<MEMOP> | int8_t * | An explicit 8-bit MEMOP (passed-by-pointer) |
| &mem16<MEMOP> | int16_t * | An explicit 16-bit MEMOP (passed-by-pointer) |
| &mem32<MEMOP> | int32_t * | An explicit 32-bit MEMOP (passed-by-pointer) |
| &mem64<MEMOP> | int64_t * | An explicit 64-bit MEMOP (passed-by-pointer) |

Notes:

  • The rflags argument differs from the native x86_64 layout in terms of the number of flags as well as the flag ordering. The modified layout is used for efficiency reasons since preserving the native layout is a relatively slow operation.
  • For technical reasons, the %rip register is considered constant and cannot be modified.
  • The state argument is a pointer to a structure containing all general-purpose registers, the flag register (%rflags), the stack register (%rsp) and the instruction pointer register (%rip). See the examples/state.c example for the structure layout. Except for %rip, the values in the structure can be modified, in which case the corresponding register will be updated accordingly.
  • The static version of some arguments gives the address relative to the ELF base, given by the formula: runtime address = ELF address + ELF base. This corresponds to the value used by the matching.
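
As an illustration of the modified rflags layout, the following sketch tests individual flags in the 16-bit value. It assumes that the format string SF:ZF:0:AF:0:PF:1:CF:0:0:0:0:0:0:0:OF is written from the most-significant bit (bit 15) down to the least-significant bit (bit 0); this bit ordering is an assumption and should be checked against the E9Tool sources. The E9_FLAG_* macros and flag_is_set() are hypothetical names, not part of E9Tool.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit masks, ASSUMING the documented format is listed from bit 15 (left)
 * down to bit 0 (right).  These names are hypothetical. */
#define E9_FLAG_SF (1u << 15)   /* sign flag */
#define E9_FLAG_ZF (1u << 14)   /* zero flag */
#define E9_FLAG_AF (1u << 12)   /* auxiliary carry flag */
#define E9_FLAG_PF (1u << 10)   /* parity flag */
#define E9_FLAG_CF (1u << 8)    /* carry flag */
#define E9_FLAG_OF (1u << 0)    /* overflow flag */

/* Returns true if the given flag is set in the rflags argument. */
static bool flag_is_set(int16_t rflags, unsigned mask)
{
    return ((uint16_t)rflags & mask) != 0;
}
```

A patch function declared as void func(int16_t rflags) could then call flag_is_set(rflags, E9_FLAG_ZF) to check the zero flag, again under the stated bit-ordering assumption.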

Some arguments can be passed by pointer. This allows the corresponding value to be modified (provided the corresponding type is not const), making it possible to manipulate the state of the program at runtime.

For example, consider the following simple function defined in example.c:

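    #include <stdint.h>

    /* Increment the value that ptr points to (e.g., a register passed as &rax). */
    void inc(int64_t *ptr)
    {
        *ptr += 1;
    }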

And the following patch:

    $ e9compile.sh example.c
    $ e9tool -M ... -P 'inc(&rax)@example' xterm

This patch will increment the %rax register when the inc() function is called for each matching instruction.

Attempting to write to a const pointer is undefined behavior. Typically, this will result in a crash or the written value will be silently ignored.

The passed pointer depends on the operand type:

  • For immediate operands (e.g., &imm[i]), the pointer will point to a constant value stored in read-only memory.
  • For register operands (e.g., &reg[i]), the pointer will point to a temporary location that holds the register value.
  • For memory operands (e.g., &mem[i]), the pointer will be exactly the runtime pointer value calculated by the operand itself. For example, consider the instruction (mov 0x33(%rax,%rbx,2),%rcx), then the value for &mem[0] will be (0x33+%rax+2*%rbx).

Generally, it is recommended to pass memory operands by pointer rather than by value. If passed by value, the memory operand pointer will be dereferenced, which may result in a crash for instructions such as (nop) and (lea) that do not access the operand.
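
The pointer value passed for a memory operand follows the usual x86_64 address computation for an AT&T-syntax operand disp(base,index,scale). The sketch below reproduces that arithmetic for the &mem[0] example above; effective_address() is a hypothetical helper, and segment registers and %rip-relative operands are deliberately not modelled.

```c
#include <stdint.h>

/* Effective address of disp(base,index,scale): disp + base + scale * index.
 * The displacement is sign-extended before being added. */
static uintptr_t effective_address(int32_t disp, uintptr_t base,
                                   uintptr_t index, uint8_t scale)
{
    return (uintptr_t)(intptr_t)disp + base + (uintptr_t)scale * index;
}
```

For (mov 0x33(%rax,%rbx,2),%rcx) with %rax = 0x1000 and %rbx = 0x10, effective_address(0x33, 0x1000, 0x10, 2) yields 0x1053, which is the pointer that &mem[0] would carry.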


Some arguments can have different types, depending on the instruction. For example, with:

    mov %rax,%rbx
    mov %eax,%ebx
    mov %ax,%bx
    mov %al,%bl

The corresponding types for &op[0] will be (int64_t *), (int32_t *), (int16_t *) and (int8_t *) respectively. If the function is defined in C, there is no way to know the type of the passed argument.

One solution is to implement the functions in C++ rather than C, and to use function overloading. For example, using C++, one can define:

    void func(int64_t *x) { ... }
    void func(int32_t *x) { ... }
    void func(int16_t *x) { ... }
    void func(int8_t *x)  { ... }

Next, the program can be rewritten as follows:

    $ e9compile.sh example.cpp
    $ e9tool -M ... -P 'func(&op[0])@example' xterm

E9Tool will automatically select the function instance that best matches the argument types, or generate an error if no appropriate match can be found.


It is possible to pass explicit memory operands as arguments. This is useful for reading/writing to known memory locations, such as stack memory. The syntax is:

    ( mem8 | mem16 | mem32 | mem64 ) < MEMOP >

Here, the mem8...mem64 token specifies the size of the memory operand, and MEMOP is the memory operand itself specified in AT&T syntax. For example, the following explicit memory operands access stack memory:

    mem64<(%rsp)>
    mem64<0x100(%rsp)>
    mem64<0x200(%rsp,%rax,8)>
    ...

Some arguments may be undefined, e.g., op[3] for a 2-operand instruction. In this case, the NULL pointer will be passed and the type will be std::nullptr_t. This can also be used for function overloading:

    void func(std::nullptr_t x) { ... }

Call trampolines support two Application Binary Interfaces (ABIs).

  • clean saves/restores the CPU state and is compatible with C/C++
  • naked saves/restores registers corresponding to arguments only

The ABI can be specified inside angled brackets (<...>) after the function name, e.g.:

    $ e9tool -M ... -P 'func<naked>(&op[0])@example' xterm

This will call func using the naked ABI.

The clean ABI is the default, which means E9Tool will automatically generate code for saving/restoring most of the CPU state, including all caller-saved registers %rax, %rdi, %rsi, %rdx, %rcx, %r8, %r9, %r10, and %r11. Note however that the clean ABI is different from the standard System V ABI in the following ways:

  • The x87/MMX/SSE/AVX/AVX2/AVX512 registers are not saved.
  • The stack pointer %rsp is not guaranteed to be aligned to a 16-byte boundary.

These differences exist for performance reasons, since saving/restoring the extended register state is an expensive operation. The differences are generally safe provided the patch code exclusively uses general-purpose registers. Patch binaries generated by the e9compile.sh script are guaranteed to be compatible with the clean ABI.

The naked ABI specifies that the function should be called directly, with saving/restoring limited to the registers used to pass arguments. Naked calls allow for more fine-grained control, and this can be used to improve performance. However, naked calls are generally incompatible with C/C++, and the function will usually need to be implemented directly in assembly. As such, the naked ABI is not recommended unless you know what you are doing.


Conditional call trampolines examine the return value of the called function, and change the control flow accordingly. There are two basic forms of conditional call trampolines:

  • if func(...) break: if the function returns a non-zero value, then immediately return from the trampoline back to the main program.
  • if func(...) goto: if the function returns a non-zero value interpreted as an address, then immediately jump to that address.

The first form allows for the conditional execution of the remainder of the trampoline, possibly including the matching instruction itself. For example, consider:

    $ e9tool -M 'mnemonic==syscall' -P 'if filter(...)@example break' ...

The patch is placed in the default before position, i.e., will be executed as instrumentation before the matching instruction. If the filter(...) function returns a non-zero value, the trampoline will immediately return, without executing the matching instruction.
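
A filter function for this kind of patch might look like the following sketch. It assumes the system call number is passed using the rax argument, e.g., with the patch 'if filter(rax)@example break'; the choice of argument and the blocked system call number (101, which is ptrace on Linux x86_64) are illustrative assumptions rather than anything prescribed by E9Tool.

```c
#include <stdint.h>

/* System call number to block; chosen purely for illustration. */
#define BLOCKED_SYSCALL 101

/* Returns non-zero if the trampoline should break, i.e., return to the main
 * program without executing the matching syscall instruction.  Returns zero
 * to let the trampoline continue and execute the syscall as normal. */
intptr_t filter(int64_t sysno)
{
    return sysno == BLOCKED_SYSCALL;
}
```

In this sketch the blocked call simply does not happen; what value the program subsequently observes in %rax is not addressed here.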

The second form allows for arbitrary jumps to be implemented. The (if func(...) goto) syntax can be thought of as shorthand for:

    if (addr = func(...)) { goto addr; }

The goto is only executed if the return value of func is non-NULL.
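
For instance, a function used with the goto form could return the address to continue from, or NULL to proceed normally. The sketch below is hypothetical: skip_if_zero() and the patch 'if skip_if_zero(rax,next)@example goto' are illustrative names, and it assumes that jumping to the next argument has the effect of skipping the matching instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the address to jump to, or NULL to continue with the remainder
 * of the trampoline.  Here the jump is taken only when value is zero. */
const void *skip_if_zero(int64_t value, const void *next)
{
    return (value == 0 ? next : NULL);
}
```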


The main limitation of call trampolines is that the patch code cannot use standard libraries directly, including glibc. This is because the instrumentation binary is directly injected into the rewritten binary rather than dynamically/statically linked.

A parallel implementation of common libc functions is provided by the examples/stdlib.c file. To use, simply include this file into the instrumentation code:

    #include "stdlib.c"

This version of libc is designed to be compatible with patch code. However, only a subset of libc is implemented, so it is WYSIWYG. That said, many common libc functions, including file I/O and memory allocation, have been implemented.

Unlike glibc, the parallel libc is designed to be compatible with the clean ABI and to handle problems, such as deadlocks, more gracefully.


It is possible to define an initialization function in the instrumentation code. For example:

    #include "stdlib.c"

    static int max = 1000;

    void init(int argc, char **argv, char **envp)
    {
        environ = envp;     // Init getenv()

        const char *MAX = getenv("MAX");
        if (MAX != NULL)
            max = atoi(MAX);
    }

The initialization function must be named init, and will be called once during the patched program's initialization. For patched executables, the command line arguments (argc and argv) and the environment pointer (envp) will be passed as arguments to the function.

In the example above, the initialization function searches for an environment variable MAX, and sets the max counter accordingly.

For dynamically linked binaries, it is also possible to define a finalization function that will be called during normal program exit. For example:

    #include "stdlib.c"

    void fini(void)
    {
        fflush(stdout);
    }

The finalization function must be named fini and takes no arguments. Note that the finalization function will not be called if the program exits abnormally, such as when it is terminated by a signal (e.g., SIGSEGV) or when it calls the "fast" exit function (_exit()).


The parallel libc also provides an optional implementation of the standard dynamic linker functions dlopen(), dlsym(), and dlclose(). These can be used to dynamically load shared objects at runtime, or access existing shared libraries that are already dynamically linked into the original program. To enable, define the LIBDL macro before including stdlib.c.

    #define LIBDL
    #include "stdlib.c"

The dlinit(dynamic) function must also be called in the init() routine, where dynamic is a secret fourth argument to the init() function:

    void init(int argc, char **argv, char **envp, void *dynamic)
    {
        int result = dlinit(dynamic);
        ...
    }

Once initialized, the dlopen(), dlsym(), and dlclose() functions can be used similarly to the standard libdl counterparts.

Note that function pointers returned by dlsym() should not be called directly unless you know what you are doing. This is because most libraries are compiled with the System V ABI, which is incompatible with the clean call ABI used by the instrumentation. To avoid ABI incompatibility, the external library code should be called using a special wrapper function dlcall():

    intptr_t dlcall(void *func, arg1, arg2, ...);

The dlcall() function will:

  • Align/restore the stack pointer to 16 bytes, as required by the System V ABI.
  • Save/restore the extended register state, including %xmm0, etc.
  • Save/restore the glibc version of errno.

Be aware that the dynamic loading API has several caveats:

  • The dlopen(), dlsym(), and dlclose() functions are wrappers for the glibc versions of these functions (__libc_dlopen, etc.). The glibc versions do not officially exist, so this functionality may change at any time. Also, the glibc versions lack some features, such as RTLD_NEXT, that are available with the standard libdl versions.
  • Since glibc is required, the original binary must be dynamically linked.
  • Many external library functions are not designed to be reentrant, and this may cause deadlocks if a signal interrupts such a function and the signal handler is also instrumented.
  • The dlcall() function supports a maximum of 16 arguments.
  • The dlcall() function is relatively slow, so ought to be used sparingly.

By design, call trampolines are very simple to use, but this also comes at the cost of efficiency. The problem is that call trampolines add an extra layer of indirection, namely, the control-flow will transfer from the main program, to the trampoline, and then to the called function. For optimal results, it is sometimes better to inline the functionality directly into the trampoline and avoid the extra level of indirection.

Very fine-grained control over the generated trampolines is possible using plugin trampolines, which allow the precise content of trampolines to be specified directly. The downside is that low-level details, such as the saving/restoring of CPU state, must be handled manually by the trampoline code, so this method is generally recommended for expert users only.

For more information, please see the E9Patch Programmer's Guide.


Depending on the --match/-M and --patch/-P options, more than one patch may match a given instruction. If this occurs, then all matching trampolines will be executed in an order determined by:

  • The explicit (or implicit) patch position annotation, then
  • The command-line order for tie-breaking.

The possible values for the patch position annotation are:

  • before: The trampoline will be executed before the matching instruction. That is, the trampoline is instrumentation.
  • replace: The trampoline replaces the matching instruction.
  • after: The trampoline is executed after the matching instruction.

If unspecified, the default patch position is assumed to be "before", meaning that the trampoline will be executed before the matching instruction (i.e., instrumentation).

Conceptually, the individual trampolines will be arranged into a "meta" trampoline that will be executed in place of the original matching instruction. The meta trampoline has the following basic form:

        BEFORE (instruction | REPLACE) AFTER break

Here BEFORE are all before trampolines in command-line order, instruction is the original matching instruction, REPLACE is the replacement trampoline, AFTER are all after trampolines in command-line order, and break returns control-flow back to the main program.

Notes:

  • There can be at most one replacement trampoline. If no replacement trampoline is specified, the meta trampoline will execute the original matching instruction.
  • For the after position, the trampoline will not be executed if the matching instruction transfers control flow (i.e., for jumps taken, calls or returns).
  • Similarly, if any component trampoline transfers control flow (via a break or goto), the rest of the "meta" trampoline will not be executed.

For example, consider the command:

    e9tool -M 'asm=xor.*' -P 'after trap' -P 'replace f(...)@bin' -P print -P 'before if g(...)@bin goto' ...

Then the following "meta" trampoline will be executed in place of each xor instruction:

    print; if g(...) goto; f(...)@bin; trap; break;

The print trampoline is implicitly in the before position, so is executed first. Next, the conditional call (if g(...) goto), also in the before position, will be executed. This conditional call will transfer control-flow if the g(...) function returns a non-NULL value, in which case the rest of the meta trampoline will not be executed. Otherwise, the call f(...)@bin trampoline will be executed next, which replaces the original matching xor instruction. Finally, the trap trampoline, in the after position, will be executed last.
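The ordering rules above can be modeled in a few lines of C. This is only an illustrative sketch, not E9Tool's implementation: the `struct patch` record and the `meta_trampoline()` and `example_meta()` functions are hypothetical names introduced here, and the trampoline text is used purely for display. Patches are grouped by position (before, replace, after), and within each position the command-line order is preserved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Patch positions, listed in the order they run in the meta trampoline. */
enum position { POS_BEFORE, POS_REPLACE, POS_AFTER };

/* Hypothetical record for one -P option (not an E9Tool data structure). */
struct patch {
    enum position pos;  /* explicit annotation, or POS_BEFORE by default */
    const char *text;   /* trampoline text, used for display only */
};

/* Append "text; " to out; returns -1 if the buffer is too small. */
static int append(char *out, size_t size, size_t *used, const char *text)
{
    int len = snprintf(out + *used, size - *used, "%s; ", text);
    if (len < 0 || (size_t)len >= size - *used)
        return -1;
    *used += (size_t)len;
    return 0;
}

/*
 * Render the meta trampoline for n patches given in command-line order.
 * Returns 0 on success, or -1 on overflow or multiple replacements.
 */
int meta_trampoline(const struct patch *patches, size_t n,
                    char *out, size_t size)
{
    size_t used = 0, replacements = 0;
    assert(out != NULL && size > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++)
        replacements += (patches[i].pos == POS_REPLACE);
    if (replacements > 1)
        return -1;      /* at most one replacement trampoline is allowed */

    for (int pos = POS_BEFORE; pos <= POS_AFTER; pos++) {
        /* With no replacement, the original instruction runs here. */
        if (pos == POS_REPLACE && replacements == 0 &&
                append(out, size, &used, "instruction") != 0)
            return -1;
        /* Scanning in index order keeps command-line order for ties. */
        for (size_t i = 0; i < n; i++)
            if (patches[i].pos == (enum position)pos &&
                    append(out, size, &used, patches[i].text) != 0)
                return -1;
    }
    int len = snprintf(out + used, size - used, "break;");
    return (len < 0 || (size_t)len >= size - used) ? -1 : 0;
}

/* The four -P options from the example command, in command-line order. */
int example_meta(char *out, size_t size)
{
    const struct patch patches[] = {
        { POS_AFTER,   "trap" },
        { POS_REPLACE, "f(...)@bin" },
        { POS_BEFORE,  "print" },          /* implicit position */
        { POS_BEFORE,  "if g(...) goto" },
    };
    return meta_trampoline(patches, sizeof patches / sizeof patches[0],
                           out, size);
}
```

Calling `example_meta()` with a sufficiently large buffer fills it with `print; if g(...) goto; f(...)@bin; trap; break;`, the same sequence as the meta trampoline shown above. The model does not capture early exits: a taken goto or break in any component still skips the remainder at run time.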

This design makes it possible to compose instrumentation schemas. For example, one could compose AFL fuzzing instrumentation with another instrumentation for detecting memory errors.