E9Tool User's Guide

NOTE: This guide is a work-in-progress and still incomplete.

E9Tool is a frontend for E9Patch. E9Tool translates high-level patching commands (i.e., which instructions to patch, and how to patch them) into low-level commands for E9Patch. E9Patch is a very low-level tool and is not designed to be used directly.

1. Matching Language

The matching language specifies what instructions should be patched by the corresponding patch (see below). Matchings are specified using the (--match MATCH) or (-M MATCH) command-line option. The basic form of a matching (MATCH) is a Boolean expression of TESTs using the following high-level grammar:

    MATCH ::=   TEST
              | ( MATCH )
              | not MATCH
              | MATCH and MATCH
              | MATCH or MATCH

Alternatively, C-style Boolean operations (!, &&, and ||) can be used instead of (not, and, and or).

Each TEST queries some specific property/attribute of the underlying instruction, defined using the following grammar:

    TEST ::=   defined ( ATTRIBUTE )
             | VALUES in ATTRIBUTE
             | ATTRIBUTE [ CMP VALUES ]

    VALUES ::=   REGULAR-EXPRESSION
               | VALUE [ , VALUE ] *
               | BASENAME [ INTEGER ]

    CMP ::=   = | == | != | > | >= | < | <=

A TEST tests some underlying instruction ATTRIBUTE using an integer, string or set comparison operator CMP. The following comparison operators are supported:

Comparison	Type	Description
`=` or `==`	`Integer` or `String`	Equality
`!=`	`Integer` or `String`	Disequality
`>`	`Integer`	Greater-than
`>=`	`Integer`	Greater-than-or-equal-to
`<`	`Integer`	Less-than
`<=`	`Integer`	Less-than-or-equal-to
`in`	`Set`	Set membership

If the comparison operator and value are omitted, then the test is equivalent to (ATTRIBUTE != 0).

A VALUE can be either:

An integer constant, e.g., 123, 0x123, etc.
A string constant, e.g., "abc", etc.
An enumeration value such as register names (rax, eax, etc.), operand types (imm, reg, mem), etc.
A symbolic address of the form &NAME, where NAME is any section or symbol name from the input ELF file. A symbolic address has type Integer.

For string attributes, the value can be a regular expression. This means that the corresponding attribute value must either match (for ==) or not match (for !=) the regular expression, depending on the comparison operator.

1.1 Attributes

The following ATTRIBUTEs (with corresponding types) are supported:

Attribute	Type	Description
`true`	`Boolean`	True
`false`	`Boolean`	False
`jump`	`Boolean`	True for jump instructions, false otherwise
`condjump`	`Boolean`	True for conditional jump instructions, false otherwise
`call`	`Boolean`	True for call instructions, false otherwise
`return`	`Boolean`	True for return instructions, false otherwise
`asm`	`String`	The assembly string representation
`mnemonic`	`String`	The mnemonic
`section`	`String`	The section name
`addr`	`Integer`	The ELF virtual address
`offset`	`Integer`	The ELF file offset
`size`	`Integer`	The size of the instruction in bytes
`random`	`Integer`	A random value [0..`RAND_MAX`]
`target`	`Integer`	The jump/call target (if statically known).
`x87`	`Boolean`	True for x87 instructions, false otherwise
`mmx`	`Boolean`	True for MMX instructions, false otherwise
`sse`	`Boolean`	True for SSE instructions, false otherwise
`avx`	`Boolean`	True for AVX instructions, false otherwise
`avx2`	`Boolean`	True for AVX2 instructions, false otherwise
`avx512`	`Boolean`	True for AVX512 instructions, false otherwise
`op.size`	`Integer`	The number of operands
`src.size`	`Integer`	The number of source operands
`dst.size`	`Integer`	The number of destination operands
`imm.size`	`Integer`	The number of immediate operands
`reg.size`	`Integer`	The number of register operands
`mem.size`	`Integer`	The number of memory operands
`op[i]`	`Operand`	The i^th operand
`src[i]`	`Operand`	The i^th source operand
`dst[i]`	`Operand`	The i^th destination operand
`imm[i]`	`Operand`	The i^th immediate operand
`reg[i]`	`Operand`	The i^th register operand
`mem[i]`	`Operand`	The i^th memory operand
`op[i].type`	`{imm,reg,mem}`	The i^th operand type
`src[i].type`	`{imm,reg,mem}`	The i^th source operand type
`dst[i].type`	`{imm,reg,mem}`	The i^th destination operand type
`op[i].access`	`{-,r,w,rw}`	The i^th operand access
`src[i].access`	`{-,r,w,rw}`	The i^th source operand access
`dst[i].access`	`{-,r,w,rw}`	The i^th destination operand access
`reg[i].access`	`{-,r,w,rw}`	The i^th register operand access
`mem[i].access`	`{-,r,w,rw}`	The i^th memory operand access
`op[i].seg`	`Register`	The i^th operand segment register
`src[i].seg`	`Register`	The i^th source operand segment register
`dst[i].seg`	`Register`	The i^th destination operand segment register
`mem[i].seg`	`Register`	The i^th memory operand segment register
`op[i].disp`	`Integer`	The i^th operand displacement
`src[i].disp`	`Integer`	The i^th source operand displacement
`dst[i].disp`	`Integer`	The i^th destination operand displacement
`mem[i].disp`	`Integer`	The i^th memory operand displacement
`op[i].base`	`Register`	The i^th operand base register
`src[i].base`	`Register`	The i^th source operand base register
`dst[i].base`	`Register`	The i^th destination operand base register
`mem[i].base`	`Register`	The i^th memory operand base register
`op[i].index`	`Register`	The i^th operand index register
`src[i].index`	`Register`	The i^th source operand index register
`dst[i].index`	`Register`	The i^th destination operand index register
`mem[i].index`	`Register`	The i^th memory operand index register
`op[i].scale`	`Integer`	The i^th operand scale
`src[i].scale`	`Integer`	The i^th source operand scale
`dst[i].scale`	`Integer`	The i^th destination operand scale
`mem[i].scale`	`Integer`	The i^th memory operand scale
`regs`	`Set<Register>`	The set of all accessed registers
`reads`	`Set<Register>`	The set of all read-from registers
`writes`	`Set<Register>`	The set of all written-to registers
`plugin(NAME).match()`	`Integer`	Value from `NAME.so` plugin

Here Register is the set of all x86_64 register names defined as follows:

    Register = {
        rip, rflags,
        es, cs, ss, ds, fs, gs,
        ah, ch, dh, bh,
        al, cl, dl, bl, spl, bpl, sil, dil, r8b, ..., r15b,
        ax, cx, dx, bx, sp, bp, si, di, r8w, ..., r15w,
        eax, ecx, edx, ebx, esp, ebp, esi, edi, r8d, ..., r15d,
        rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi, r8, ..., r15,
        xmm0, ..., xmm31,
        ymm0, ..., ymm31,
        zmm0, ..., zmm31, ...}

An Operand can be one of three values:

An immediate value represented by an Integer
A register represented by a Register
A memory operand (not representable)

Thus the Operand type is the union of the Integer and Register types:

    Operand = Integer | Register

1.2 Definedness

Not all attributes are defined for all instructions. For example, if the instruction has 3 operands, then only op[0], op[1], and op[2] will be defined, and op[3] and beyond will be undefined. Similarly, op[0].base will be undefined if the first operand of the instruction is not a memory operand.

Any test that uses an undefined value will fail. For example, both of the tests (op[3] == 0x1) and (op[3] != 0x1) will fail, despite each test being the negation of the other. The explicit Boolean operators (not, and, and or) treat failure due to undefinedness the same as false, thus the tests (op[3] != 0x1) and (not op[3] == 0x1) are not equivalent for undefined values.

The special defined(ATTRIBUTE) test can be used to determine if an attribute is defined or not.
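
The definedness rules above can be illustrated with a small toy model in C. This is only a sketch of the semantics described in this section, not E9Tool's actual implementation; the names attr_t, test_eq, test_ne and test_not are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of an attribute: either defined with a value, or undefined. */
typedef struct
{
    bool defined;
    int64_t value;
} attr_t;

/* (ATTRIBUTE == v): fails if the attribute is undefined. */
static bool test_eq(attr_t a, int64_t v)
{
    return a.defined && a.value == v;
}

/* (ATTRIBUTE != v): also fails if the attribute is undefined. */
static bool test_ne(attr_t a, int64_t v)
{
    return a.defined && a.value != v;
}

/* (not TEST): a failed test is treated the same as false. */
static bool test_not(bool t)
{
    return !t;
}
```

In this model, for an undefined op[3], both test_eq and test_ne fail, whereas test_not applied to the failed equality test succeeds. This mirrors why (op[3] != 0x1) and (not op[3] == 0x1) are not equivalent.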

1.3 Examples

(true): match every instruction.
(false): do not match any instruction.
(asm == jmp.*%r.*): match all instructions whose assembly representation matches the regular expression jmp.*%r.* (will match jump instructions that access a register).
(mnemonic == jmp): match all instructions whose mnemonic is jmp.
(addr == 0x4234a7): match the instruction at the virtual address 0x4234a7.
(addr == 0x4234a7,0x44bd6e,0x4514b4): match the instructions at the virtual addresses 0x4234a7, 0x44bd6e, and 0x4514b4.
(addr >= 0x4234a7 and addr <= 0x4514b4): match all instructions in the virtual address range 0x4234a7..0x4514b4.
(op.size > 1): match all instructions with more than one operand.
(reg.size == 2): match all instructions with exactly two register operands.
(op[0] == 0x1234): match all instructions where the first operand is the immediate value 0x1234.
(op[0] == rax): match all instructions where the first operand is the %rax register.
(op[0].type == mem): match all instructions where the first operand is a memory operand.
(reg[0] == rax and reg[1] == rbx): match all instructions where the first and second register operands are %rax and %rbx respectively.
(mem[0].base == rax and mem[0].index == rbx): match all instructions with a memory operand with %rax as the base and %rbx as the index.
(mem[0].base == nil): match all instructions with a memory operand that does not use a base register.
(rflags in reads): match all instructions that read the flags register.
(rflags in writes): match all instructions that modify the flags register.
(not rflags in regs): match all instructions that do not access the flags register.
defined(mem[0]): match all instructions that have at least one memory operand.
(call and target == &malloc): match all direct calls to malloc().

1.4 Exclusions

Exclusions are an additional method for controlling which instructions are patched. An exclusion is specified by the (--exclude RANGE) or (-E RANGE) command-line option, where RANGE specifies a range of addresses that should not be disassembled or rewritten. Exclusions are more low-level than the matching language since the RANGE will not even be disassembled. This can help solve some problems, such as the binary storing data inside the .text section.

The general syntax for RANGE is:

    RANGE ::=   ADDR [ .. ADDR ]
    ADDR  ::=   VALUE [ + INTEGER ]
    VALUE ::=   INTEGER
              | SYMBOL
              | SECTION [ . ( start | end ) ]

For example:

0x12345 .. 0x45689: exclude a specific address range
.text..ChromeMain: exclude the .text section up to the symbol ChromeMain
.plt .. .text: exclude a range of sections
.plt.start .. .text.end: equivalent to the above
.plt .. .text.start: exclude all sections between .plt and the starting address of .text. The .text section itself will not be excluded.
malloc .. malloc+16: exclude the 16-byte PLT entry for malloc.
.text: exclude the entire .text section.

Note that a RANGE may include a lower and upper bound, i.e., LB .. UB. If the UB is omitted, then UB=LB is implied. The instruction at the address UB is not excluded, and disassembly will resume from this address. In other words, the syntax LB .. UB represents the address range [LB..UB), and E9Tool assumes that UB points to a valid instruction from which disassembly can resume.
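
The half-open interpretation of LB .. UB can be written down as a simple predicate. The following is a minimal sketch of the rule stated above, not E9Tool's code; range_t and is_excluded are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* A half-open address range [lb, ub), as described for LB .. UB. */
typedef struct
{
    uint64_t lb;
    uint64_t ub;
} range_t;

/* True if addr falls inside the excluded range.  The address ub itself
 * is NOT excluded: disassembly resumes from it. */
static bool is_excluded(range_t r, uint64_t addr)
{
    return addr >= r.lb && addr < r.ub;
}
```

For a 16-byte range such as (malloc .. malloc+16), the last excluded byte is at malloc+15, and malloc+16 is the first address that is disassembled again.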

2. Patch Language

The patch language specifies how to patch matching instructions from the input binary. Patches are specified using the (--patch PATCH) or (-P PATCH) command-line option, and must be paired with one or more matchings. The basic form of a patch (PATCH) uses the following high-level grammar:

    PATCH      ::= [ POSITION ] TRAMPOLINE
    POSITION   ::=   before
                   | replace
                   | after
    TRAMPOLINE ::=   empty
                   | break
                   | trap
                   | exit(CODE)
                   | print
                   | CALL
                   | if CALL break
                   | if CALL goto
                   | plugin(NAME).patch()

A patch is an optional position followed by a trampoline. The trampoline represents code that will be executed when control-flow reaches the matching instruction. The trampoline can be either a builtin trampoline, a call trampoline, or a trampoline defined by a plugin.

2.1 Builtin Trampolines

The builtin trampolines include:

Patch	Description
`empty`	The empty trampoline
`break`	Immediately return from trampoline
`trap`	Execute a TRAP (`int3`) instruction
`exit(CODE)`	Exit with `CODE`
`print`	Print the matching instruction

Here:

empty is the empty trampoline with no instructions. Control-flow is still redirected to/from empty trampolines, and this can be used to establish a baseline for benchmarking.
break immediately returns from the trampoline back to the main program.
trap executes a single TRAP (int3) instruction.
exit(CODE) will immediately exit from the program with status CODE.
print will print the assembly representation of the matching instruction to stderr. This can be used for testing and debugging.

2.2 Call Trampolines

A call trampoline calls a user-defined function that can be implemented in a high-level programming language such as C or C++. Call trampolines are the main way of implementing custom patches using E9Tool. The syntax for a call trampoline is as follows:

    CALL ::= FUNCTION [ ABI ] ARGS @ BINARY
    ABI  ::= < clean | naked >
    ARGS ::= ( ARG , ... )

The call trampoline specifies that the trampoline should call function FUNCTION from the binary BINARY with the arguments ARGS.

To use a call trampoline:

Implement the desired patch as a function using the C or C++ programming language.
Compile the patch program using the special e9compile.sh script to generate a patch binary.
Use E9Tool to call the patch function from the patch binary at the desired locations.

E9Tool will handle all of the low-level details, such as loading the patch binary into memory, passing the arguments to the function, and saving/restoring the CPU state.

For example, the following code defines a function that increments a counter. Once the counter exceeds some predefined maximum value, the function will execute the int3 instruction, causing SIGTRAP to be sent to the program.

    static unsigned long counter = 0;
    static unsigned long max = 100000;
    void entry(void)
    {
        counter++;
        if (counter >= max)
            asm volatile ("int3");
    }

Once defined, the program can be compiled using the e9compile.sh script.

    ./e9compile.sh counter.c

The e9compile.sh script is a gcc wrapper that ensures the generated binary is compatible with E9Tool. In this case, the script will generate a counter binary if compilation is successful.

Finally, the counter binary can be used as a call trampoline. For example, to generate a SIGTRAP after the 100000th xor instruction (the value of max in the code above):

    ./e9tool -M 'mnemonic==xor' -P 'entry()@counter' ...

Call trampolines are primarily designed for ease-of-use and not for speed. For applications where speed is essential, it is recommended to design a custom trampoline using a plugin.

2.2.1 Call Trampoline Arguments

Call trampolines also support passing arguments to the called function. The syntax uses the C-style round brackets. For example:

    ./e9tool -M ... -P 'func(rip)@example' xterm

This specifies that the current value of the instruction pointer %rip should be passed as the first argument to the function func(). The called function can use this argument, e.g.:

    void func(const void *rip)
    {
        ...
    }
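
As a slightly fuller sketch, a patch function can take several arguments at once. The example below assumes a patch of the form 'tally(instr,size)@example', using the instr and size arguments from the table that follows; the function name tally and the three counters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Running totals, updated once per matching instruction. */
static size_t total_instrs = 0;   /* Number of matching instructions seen */
static size_t total_bytes  = 0;   /* Sum of their encoded sizes */
static size_t rex_first    = 0;   /* Encodings whose first byte is 0x40..0x4f */

/* Hypothetical patch function.  `instr` points to the machine-code bytes
 * of the matching instruction and `size` is the number of bytes. */
void tally(const uint8_t *instr, size_t size)
{
    total_instrs++;
    total_bytes += size;
    if (size > 0 && instr[0] >= 0x40 && instr[0] <= 0x4f)
        rex_first++;
}
```

The arguments are passed in the order they appear in the patch, so instr becomes the first parameter and size the second.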

Call trampolines support up to eight arguments. The following arguments are supported:

Argument	Type	Description
Integer	`intptr_t`	An integer constant
String	`const char *`	A string constant
`&`Name	`const void *`	The runtime address of the named section/symbol/PLT/GOT entry
`static &`Name	`const void *`	The ELF address of the named section/symbol/PLT/GOT entry
`asm`	`const char *`	Assembly representation of the matching instruction
`asm.size`	`size_t`	The number of bytes in `asm` (including the nul character)
`asm.len`	`size_t`	The string length of `asm` (excluding the nul character)
`base`	`const void *`	The runtime base address of the binary
`config`	`const void *`	A pointer to the E9Patch configuration (see `e9loader.h`)
`addr`	`const void *`	The runtime address of the matching instruction
`static addr`	`const void *`	The ELF address of the matching instruction
`id`	`intptr_t`	A unique identifier (one per patch)
`instr`	`const uint8_t *`	The machine-code bytes of the matching instruction
`next`	`const void *`	The runtime address of the next executed instruction
`static next`	`const void *`	The ELF address of the next executed instruction
`offset`	`off_t`	The ELF file offset of the matching instruction
`target`	`const void *`	The runtime address of the jump/call/return target, else `NULL`
`static target`	`const void *`	The ELF address of the jump/call/return target, else `NULL`
`trampoline`	`const void *`	The runtime address of the trampoline
`random`	`intptr_t`	A (statically generated) random integer [0..`RAND_MAX`]
`size`	`size_t`	The size of `instr` in bytes
`state`	`void *`	A pointer to a structure containing all general purpose registers
`ah`,...,`dh`, `al`,...,`r15b`	`int8_t`	The corresponding 8bit register
`ax`,...,`r15w`	`int16_t`	The corresponding 16bit register
`eax`,...,`r15d`	`int32_t`	The corresponding 32bit register
`rax`,...,`r15`	`int64_t`	The corresponding 64bit register
`rflags`	`int16_t`	The `%rflags` register with format `SF:ZF:0:AF:0:PF:1:CF:0:0:0:0:0:0:0:OF`
`rip`	`const void *`	The `%rip` register
`&ah`,...,`&dh`, `&al`,...,`&r15b`	`int8_t *`	The corresponding 8bit register (passed-by-pointer)
`&ax`,...,`&r15w`	`int16_t *`	The corresponding 16bit register (passed-by-pointer)
`&eax`,...,`&r15d`	`int32_t *`	The corresponding 32bit register (passed-by-pointer)
`&rax`,...,`&r15`	`int64_t *`	The corresponding 64bit register (passed-by-pointer)
`&rflags`	`int16_t *`	The `%rflags` register (passed-by-pointer)
`op[i]`	`int8/16/32/64_t`	The matching instruction's i^th operand
`src[i]`	`int8/16/32/64_t`	The matching instruction's i^th source operand
`dst[i]`	`int8/16/32/64_t`	The matching instruction's i^th destination operand
`imm[i]`	`int8/16/32/64_t`	The matching instruction's i^th immediate operand
`reg[i]`	`int8/16/32/64_t`	The matching instruction's i^th register operand
`mem[i]`	`int8/16/32/64_t`	The matching instruction's i^th memory operand
`&op[i]`	`(const) int8/16/32/64_t *`	The matching instruction's i^th operand (passed-by-pointer)
`&src[i]`	`(const) int8/16/32/64_t *`	The matching instruction's i^th source operand (passed-by-pointer)
`&dst[i]`	`int8/16/32/64_t *`	The matching instruction's i^th destination operand (passed-by-pointer)
`&imm[i]`	`const int8/16/32/64_t *`	The matching instruction's i^th immediate operand (passed-by-pointer)
`&reg[i]`	`(const) int8/16/32/64_t *`	The matching instruction's i^th register operand (passed-by-pointer)
`&mem[i]`	`int8/16/32/64_t *`	The matching instruction's i^th memory operand (passed-by-pointer)
`op[i].size`	`size_t`	The matching instruction's i^th operand size
`src[i].size`	`size_t`	The matching instruction's i^th source operand size
`dst[i].size`	`size_t`	The matching instruction's i^th destination operand size
`imm[i].size`	`size_t`	The matching instruction's i^th immediate operand size
`reg[i].size`	`size_t`	The matching instruction's i^th register operand size
`mem[i].size`	`size_t`	The matching instruction's i^th memory operand size
`op[i].type`	`int8_t`	The matching instruction's i^th operand type (1=immediate, 2=register, 3=memory operand)
`src[i].type`	`int8_t`	The matching instruction's i^th source operand type
`dst[i].type`	`int8_t`	The matching instruction's i^th destination operand type
`imm[i].type`	`int8_t`	The matching instruction's i^th immediate operand type
`reg[i].type`	`int8_t`	The matching instruction's i^th register operand type
`mem[i].type`	`int8_t`	The matching instruction's i^th memory operand type
`op[i].access`	`int8_t`	The matching instruction's i^th operand access (`0x80 \| PROT_READ \| PROT_WRITE`)
`src[i].access`	`int8_t`	The matching instruction's i^th source operand access
`dst[i].access`	`int8_t`	The matching instruction's i^th destination operand access
`imm[i].access`	`int8_t`	The matching instruction's i^th immediate operand access
`reg[i].access`	`int8_t`	The matching instruction's i^th register operand access
`mem[i].access`	`int8_t`	The matching instruction's i^th memory operand access
`op[i].disp`	`int32_t`	The matching instruction's i^th operand displacement
`src[i].disp`	`int32_t`	The matching instruction's i^th source operand displacement
`dst[i].disp`	`int32_t`	The matching instruction's i^th destination operand displacement
`mem[i].disp`	`int32_t`	The matching instruction's i^th memory operand displacement
`op[i].base`	`int32/64_t`	The matching instruction's i^th operand base register
`src[i].base`	`int32/64_t`	The matching instruction's i^th source operand base register
`dst[i].base`	`int32/64_t`	The matching instruction's i^th destination operand base register
`mem[i].base`	`int32/64_t`	The matching instruction's i^th memory operand base register
`&op[i].base`	`int32/64_t *`	The matching instruction's i^th operand base register (passed-by-pointer)
`&src[i].base`	`int32/64_t *`	The matching instruction's i^th source operand base register (passed-by-pointer)
`&dst[i].base`	`int32/64_t *`	The matching instruction's i^th destination operand base register (passed-by-pointer)
`&mem[i].base`	`int32/64_t *`	The matching instruction's i^th memory operand base register (passed-by-pointer)
`op[i].index`	`int32/64_t`	The matching instruction's i^th operand index register
`src[i].index`	`int32/64_t`	The matching instruction's i^th source operand index register
`dst[i].index`	`int32/64_t`	The matching instruction's i^th destination operand index register
`mem[i].index`	`int32/64_t`	The matching instruction's i^th memory operand index register
`&op[i].index`	`int32/64_t *`	The matching instruction's i^th operand index register (passed-by-pointer)
`&src[i].index`	`int32/64_t *`	The matching instruction's i^th source operand index register (passed-by-pointer)
`&dst[i].index`	`int32/64_t *`	The matching instruction's i^th destination operand index register (passed-by-pointer)
`&mem[i].index`	`int32/64_t *`	The matching instruction's i^th memory operand index register (passed-by-pointer)
`op[i].scale`	`int8_t`	The matching instruction's i^th operand scale
`src[i].scale`	`int8_t`	The matching instruction's i^th source operand scale
`dst[i].scale`	`int8_t`	The matching instruction's i^th destination operand scale
`mem[i].scale`	`int8_t`	The matching instruction's i^th memory operand scale
`mem8<MEMOP>`	`int8_t`	An explicit 8-bit `MEMOP`
`mem16<MEMOP>`	`int16_t`	An explicit 16-bit `MEMOP`
`mem32<MEMOP>`	`int32_t`	An explicit 32-bit `MEMOP`
`mem64<MEMOP>`	`int64_t`	An explicit 64-bit `MEMOP`
`&mem8<MEMOP>`	`int8_t *`	An explicit 8-bit `MEMOP` (passed-by-pointer)
`&mem16<MEMOP>`	`int16_t *`	An explicit 16-bit `MEMOP` (passed-by-pointer)
`&mem32<MEMOP>`	`int32_t *`	An explicit 32-bit `MEMOP` (passed-by-pointer)
`&mem64<MEMOP>`	`int64_t *`	An explicit 64-bit `MEMOP` (passed-by-pointer)

Notes:

The rflags argument differs from the native x86_64 layout in terms of the number of flags as well as the flag ordering. The modified layout is used for efficiency reasons since preserving the native layout is a relatively slow operation.
For technical reasons, the %rip register is considered constant and cannot be modified.
The state argument is a pointer to a structure containing all general-purpose registers, the flag register (%rflags), the stack register (%rsp) and the instruction pointer register (%rip). See the examples/state.c example for the structure layout. Except for %rip, the values in the structure can be modified, in which case the corresponding register will be updated accordingly.
The static version of some arguments gives the address relative to the ELF base, given by the formula: runtime address = ELF address + ELF base. This corresponds to the value used by the matching.
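
To make the modified rflags layout concrete, the following sketch decodes individual flags from the 16-bit value. It assumes the format SF:ZF:0:AF:0:PF:1:CF:0:0:0:0:0:0:0:OF is listed from the most significant bit (bit 15) down to the least significant bit (bit 0). The macro names and the flag_set helper are hypothetical and are not part of E9Tool.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit masks for the modified layout (assumed most-significant-bit first):
 *   bit 15 = SF, bit 14 = ZF, bit 12 = AF, bit 10 = PF,
 *   bit  9 = always 1, bit 8 = CF, bit 0 = OF. */
#define E9_FLAG_SF  0x8000
#define E9_FLAG_ZF  0x4000
#define E9_FLAG_AF  0x1000
#define E9_FLAG_PF  0x0400
#define E9_FLAG_CF  0x0100
#define E9_FLAG_OF  0x0001

/* Test a single flag in an rflags argument (passed as int16_t). */
static bool flag_set(int16_t rflags, uint16_t mask)
{
    return ((uint16_t)rflags & mask) != 0;
}
```

For example, under this assumption the value 0x4201 has ZF and OF set (plus the constant 1 bit), and all other flags clear.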

2.2.1.1 Pass-by-pointer Arguments

Some arguments can be passed by pointer. This allows the corresponding value to be modified (provided the corresponding type is not const), making it possible to manipulate the state of the program at runtime.

For example, consider the following simple function defined in example.c:

    void inc(int64_t *ptr)
    {
        *ptr += 1;
    }

And the following patch:

    $ e9compile.sh example.c
    $ e9tool -M ... -P 'inc(&rax)@example' xterm

This patch will increment the %rax register when the inc() function is called for each matching instruction.

Attempting to write to a const pointer is undefined behavior. Typically, this will result in a crash or the written value will be silently ignored.

The passed pointer depends on the operand type:

For immediate operands (e.g., &imm[i]), the pointer will point to a constant value stored in read-only memory.
For register operands (e.g., &reg[i]), the pointer will point to a temporary location that holds the register value.
For memory operands (e.g., &mem[i]), the pointer will be exactly the runtime pointer value calculated by the operand itself. For example, consider the instruction (mov 0x33(%rax,%rbx,2),%rcx), then the value for &mem[0] will be (0x33+%rax+2*%rbx).

Generally, it is recommended to pass memory operands by pointer rather than by value. If passed by value, the memory operand pointer will be dereferenced, which may result in a crash for instructions such as (nop) and (lea) that do not access the operand.
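
The pointer value passed for a memory operand follows the usual x86_64 address calculation (displacement + base + scale*index). The sketch below reproduces this arithmetic in plain C for illustration only; effective_addr is a hypothetical helper, and E9Tool computes the address itself inside the trampoline.

```c
#include <stdint.h>

/* Compute disp + base + scale*index, as for the AT&T-syntax operand
 * disp(base,index,scale).  The displacement is signed. */
static uintptr_t effective_addr(int32_t disp, uintptr_t base,
    uintptr_t index, uint8_t scale)
{
    return base + index * (uintptr_t)scale + (uintptr_t)(intptr_t)disp;
}
```

For (mov 0x33(%rax,%rbx,2),%rcx) with %rax=0x1000 and %rbx=0x10, the pointer passed for &mem[0] would be 0x1053.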

2.2.1.2 Polymorphic Arguments

Some arguments can have different types, depending on the instruction. For example, with:

    mov %rax,%rbx
    mov %eax,%ebx
    mov %ax,%bx
    mov %al,%bl

The corresponding types for &op[0] will be (int64_t *), (int32_t *), (int16_t *) and (int8_t *) respectively. If the function is defined in C, there is no way to know the type of the passed argument.

One solution is to implement the functions in C++ rather than C, and to use function overloading. For example, using C++, one can define:

    void func(int64_t *x) { ... }
    void func(int32_t *x) { ... }
    void func(int16_t *x) { ... }
    void func(int8_t *x)  { ... }

Next, the program can be rewritten as follows:

    $ e9compile.sh example.cpp
    $ e9tool -M ... -P 'func(&op[0])@example' xterm

E9Tool will automatically select the function instance that best matches the argument types, or generate an error if no appropriate match can be found.
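
If the patch code must remain in C, one possible workaround is to pass the operand size explicitly alongside the pointer and to dispatch on it at runtime. The sketch below assumes a patch of the form 'func(&op[0],op[0].size)@example', and assumes that no type checking is applied to functions defined in C; the helper load_operand and the variable last_value are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Most recent operand value seen, sign-extended to 64 bits. */
static int64_t last_value = 0;

/* Read a 1/2/4/8-byte operand through an untyped pointer.
 * Returns 0 for a NULL pointer or an unexpected size. */
static int64_t load_operand(const void *ptr, size_t size)
{
    if (ptr == NULL)
        return 0;
    switch (size)
    {
        case 1: return *(const int8_t *)ptr;
        case 2: return *(const int16_t *)ptr;
        case 4: return *(const int32_t *)ptr;
        case 8: return *(const int64_t *)ptr;
        default: return 0;
    }
}

/* Patch function: called with (&op[0], op[0].size). */
void func(const void *ptr, size_t size)
{
    last_value = load_operand(ptr, size);
}
```

The C++ overloading approach is simpler when it is available, since the dispatch is then resolved statically rather than on every call.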

2.2.1.3 Explicit Memory Operand Arguments

It is possible to pass explicit memory operands as arguments. This is useful for reading/writing to known memory locations, such as stack memory. The syntax is:

    ( mem8 | mem16 | mem32 | mem64 ) < MEMOP >

Here, the mem8...mem64 token specifies the size of the memory operand, and MEMOP is the memory operand itself specified in AT&T syntax. For example, the following explicit memory operands access stack memory:

    mem64<(%rsp)>
    mem64<0x100(%rsp)>
    mem64<0x200(%rsp,%rax,8)>
    ...

2.2.1.4 Undefined Arguments

Some arguments may be undefined, e.g., op[3] for a 2-operand instruction. In this case, the NULL pointer will be passed and the type will be std::nullptr_t. This can also be used for function overloading:

    void func(std::nullptr_t x) { ... }

2.2.2 Call Trampoline ABI

Call trampolines support two Application Binary Interfaces (ABIs).

clean saves/restores the CPU state and is compatible with C/C++
naked saves/restores registers corresponding to arguments only

The ABI can be specified inside angled brackets (<...>) after the function name, e.g.:

    $ e9tool -M ... -P 'func<naked>(&op[0])@example' xterm

This will call func using the naked ABI.

The clean ABI is the default, which means E9Tool will automatically generate code for saving/restoring most of the CPU state, including all caller-saved registers %rax, %rdi, %rsi, %rdx, %rcx, %r8, %r9, %r10, and %r11. Note however that the clean ABI is different from the standard System V ABI in the following ways:

The x87/MMX/SSE/AVX/AVX2/AVX512 registers are not saved.
The stack pointer %rsp is not guaranteed to be aligned to a 16-byte boundary.

These differences exist for performance reasons, since saving/restoring the extended register state is an expensive operation. The differences are generally safe provided the patch code exclusively uses general-purpose registers. Patch binaries generated by the e9compile.sh script are guaranteed to be compatible with the clean ABI.

The naked ABI specifies that the function should be called directly, and limits the saving/restoring to the registers used to pass arguments. Naked calls allow for more fine-grained control, and this can be used to improve performance. However, naked calls are generally incompatible with C/C++, and the function will usually need to be implemented directly in assembly. As such, the naked ABI is not recommended unless you know what you are doing.

2.2.3 Conditional Call Trampolines

Conditional call trampolines examine the return value of the called function, and change the control flow accordingly. There are two basic forms of conditional call trampolines:

if func(...) break: if the function returns a non-zero value, then immediately return from the trampoline back to the main program.
if func(...) goto: if the function returns a non-zero value interpreted as an address, then immediately jump to that address.

The first form allows for the conditional execution of the remainder of the trampoline, possibly including the matching instruction itself. For example, consider:

    $ e9tool -M 'mnemonic==syscall' -P 'if filter(...)@example break' ...

The patch is placed in the default before position, i.e., will be executed as instrumentation before the matching instruction. If the filter(...) function returns a non-zero value, the trampoline will immediately return, without executing the matching instruction.
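
As an illustration, a filter function for the example above might look like the following sketch. It assumes the patch passes the system call number as an argument, i.e., 'if filter(rax)@example break', and relies on the Linux x86_64 convention that the system call number is held in %rax. The choice of blocked system calls is arbitrary.

```c
#include <stdint.h>

/* Return non-zero to skip the matching syscall instruction, or zero to
 * let it execute.  The numbers are Linux x86_64 system call numbers. */
intptr_t filter(int64_t rax)
{
    switch (rax)
    {
        case 57:        /* fork */
        case 59:        /* execve */
            return 1;   /* Skip the syscall */
        default:
            return 0;   /* Execute the syscall as normal */
    }
}
```

Note that a skipped system call leaves %rax unchanged, so the main program will see the system call number where it expects a return value. A more complete filter would likely also take &rax and store a suitable error code.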

The second form allows for arbitrary jumps to be implemented. The (if func(...) goto) syntax can be thought of as shorthand for:

    if (addr = func(...)) { goto addr; }

The goto is only executed if the return value of the func is non-NULL.
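
A function for the goto form simply returns the destination address, or NULL to continue as normal. The sketch below assumes a patch of the form 'if on_error(rax,&error_path)@example goto', where error_path is a hypothetical symbol in the input binary and on_error is a hypothetical function name.

```c
#include <stddef.h>
#include <stdint.h>

/* Return `handler` if %rax holds a negative value, else NULL.
 * A non-NULL return value causes the trampoline to jump to it. */
const void *on_error(int64_t rax, const void *handler)
{
    return (rax < 0 ? handler : NULL);
}
```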

2.2.4 Call Trampoline Standard Library

The main limitation of call trampolines is that the patch code cannot use standard libraries directly, including glibc. This is because the instrumentation binary is directly injected into the rewritten binary rather than dynamically/statically linked.

A parallel implementation of common libc functions is provided by the examples/stdlib.c file. To use, simply include this file into the instrumentation code:

    #include "stdlib.c"

This version of libc is designed to be compatible with patch code. However, only a subset of libc is implemented, so it is WYSIWYG. That said, many common libc functions, including file I/O and memory allocation, have been implemented.

Unlike glibc, the parallel libc is designed to be compatible with the clean ABI and to handle problems, such as deadlocks, more gracefully.

2.2.5 Call Trampoline Initialization and Finalization

It is possible to define an initialization function in the instrumentation code. For example:

    #include "stdlib.c"

    static int max = 1000;

    void init(int argc, char **argv, char **envp)
    {
        environ = envp;     // Init getenv()

        const char *MAX = getenv("MAX");
        if (MAX != NULL)
            max = atoi(MAX);
    }

The initialization function must be named init, and will be called once during the patched program's initialization. For patched executables, the command line arguments (argc and argv) and the environment pointer (envp) will be passed as arguments to the function.

In the example above, the initialization function searches for an environment variable MAX, and sets the max counter accordingly.

For dynamically linked binaries, it is also possible to define a finalization function that will be called during normal program exit. For example:

    #include "stdlib.c"

    void fini(void)
    {
        fflush(stdout);
    }

The finalization function must be named fini and takes no arguments. Note that the finalization function will not be called if the program exits abnormally, such as from a signal (SIGSEGV), or if the program calls a "fast" exit (_exit()).

2.2.6 Call Trampoline Dynamic Loading

The parallel libc also provides an optional implementation of the standard dynamic linker functions dlopen(), dlsym(), and dlclose(). These can be used to dynamically load shared objects at runtime, or access existing shared libraries that are already dynamically linked into the original program. To enable, define the LIBDL macro before including stdlib.c.

    #define LIBDL
    #include "stdlib.c"

The dlinit(dynamic) function must also be called in the init() routine, where dynamic is a secret fourth argument to the init() function:

    void init(int argc, char **argv, char **envp, void *dynamic)
    {
        int result = dlinit(dynamic);
        ...
    }

Once initialized, the dlopen(), dlsym(), and dlclose() functions can be used similarly to the standard libdl counterparts.

Note that function pointers returned by dlsym() should not be called directly unless you know what you are doing. This is because most libraries are compiled with the System V ABI, which is incompatible with the clean call ABI used by the instrumentation. To avoid ABI incompatibility, the external library code should be called using a special wrapper function dlcall():

    intptr_t dlcall(void *func, arg1, arg2, ...);

The dlcall() function will:

Align/restore the stack pointer to 16 bytes, as required by the System V ABI.
Save/restore the extended register state, including %xmm0, etc.
Save/restore the glibc version of errno.
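The alignment arithmetic behind the first step can be sketched as follows. This is an illustration only and not the actual dlcall() implementation; align_down is a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: round `sp` down to a multiple of `alignment`, which
 * must be a power of two.  The stack grows towards lower addresses on
 * x86_64, so rounding down leaves everything at or above the old stack
 * pointer intact.
 */
static uintptr_t align_down(uintptr_t sp, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return sp & ~(alignment - 1);
}
```

For example, align_down(0x7ffc1238, 16) gives 0x7ffc1230. The original value must be remembered so that it can be put back after the call, which is the "restore" half of the step.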

Be aware that the dynamic loading API has several caveats:

The dlopen(), dlsym(), and dlclose() functions are wrappers for the glibc versions of these functions (__libc_dlopen, etc.). The glibc versions do not officially exist, so this functionality may change at any time. Also, the glibc versions lack some features, such as RTLD_NEXT, that are available with the standard libdl versions.
Since glibc is required, the original binary must be dynamically linked.
Many external library functions are not designed to be reentrant, and this may cause deadlocks if a signal occurs when the signal handler is also instrumented.
The dlcall() function supports a maximum of 16 arguments.
The dlcall() function is relatively slow, so ought to be used sparingly.

2.3 Plugin Trampolines

By design, call trampolines are very simple to use, but this also comes at the cost of efficiency. The problem is that call trampolines add an extra layer of indirection, namely, the control-flow will transfer from the main program, to the trampoline, and then to the called function. For optimal results, it is sometimes better to inline the functionality directly into the trampoline and avoid the extra level of indirection.

Very fine-grained control over the generated trampolines is possible using plugin trampolines, which allow the precise content of trampolines to be specified directly. The downside is that low-level details, such as the saving/restoring of CPU state, must be handled manually by the trampoline code, so this method is generally recommended for expert users only.

For more information, please see the E9Patch Programmer's Guide.

2.4 Trampoline Composition

Depending on the --match/-M and --patch/-P options, more than one patch may match a given instruction. If this occurs, then all matching trampolines will be executed in an order determined by:

The explicit (or implicit) patch position annotation, then
The command-line order for tie-breaking.

The possible values for the patch position annotation are:

before: The trampoline will be executed before the matching instruction. That is, the trampoline is instrumentation.
replace: The trampoline replaces the matching instruction.
after: The trampoline is executed after the matching instruction.

If unspecified, the default patch position is assumed to be "before", meaning that the trampoline will be executed before the matching instruction (i.e., instrumentation).

Conceptually, the individual trampolines will be arranged into a "meta" trampoline that will be executed in place of the original matching instruction. The meta trampoline has the following basic form:

        BEFORE (instruction | REPLACE) AFTER break

Here BEFORE are all before trampolines in command-line order, instruction is the original matching instruction, REPLACE is the replacement trampoline, AFTER are all after trampolines in command-line order, and break returns control-flow back to the main program.

Notes:

There can be at most one replacement trampoline. If no replacement trampoline is specified, E9Tool will execute the original matching instruction.
For the after position, the trampoline will not be executed if the matching instruction transfers control flow (i.e., for taken jumps, calls, or returns).
Similarly, if any component trampoline transfers control flow (via a break or goto), the rest of the "meta" trampoline will not be executed.

For example, consider the command:

    e9tool -M 'asm=xor.*' -P 'after trap' -P 'replace f(...)@bin' -P print -P 'before if g(...)@bin goto' ...

Then the following "meta" trampoline will be executed in place of each xor instruction:

    print; if g(...) goto; f(...)@bin; trap; break;

The print trampoline is implicitly in the before position, so is executed first. Next, the conditional call (if g(...) goto), also in the before position, will be executed. This conditional call will transfer control-flow if the g(...) function returns a non-NULL value, in which case the rest of the meta trampoline will not be executed. Otherwise, the call f(...)@bin trampoline will be executed next, which replaces the original matching xor instruction. Finally, the trap trampoline, in the after position, will be executed last.
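The ordering rule (position first, then command-line order as the tie-breaker) can be modelled as a simple sort. The sketch below is a standalone C model that would be compiled as an ordinary program, not as patch code. The struct patch type and the function names are hypothetical and do not correspond to E9Tool's internal data structures. It also only models the ordering, not the early exit caused by a break or goto:

```c
#include <stdlib.h>

/* Patch positions, listed in execution order. */
enum position { BEFORE = 0, REPLACE = 1, AFTER = 2 };

/* Hypothetical model of one -P option; not an E9Tool data structure. */
struct patch
{
    enum position pos;      // Explicit or implicit position annotation
    int index;              // Command-line order (0 = first -P option)
    const char *name;       // Label for display purposes only
};

/* Order by position first, then by command-line order as the tie-breaker. */
static int patch_compare(const void *a, const void *b)
{
    const struct patch *p = a, *q = b;
    if (p->pos != q->pos)
        return (p->pos < q->pos ? -1 : 1);
    return (p->index > q->index) - (p->index < q->index);
}

/* Sort patches into the order they appear in the "meta" trampoline. */
static void patch_sort(struct patch *patches, size_t n)
{
    qsort(patches, n, sizeof(patches[0]), patch_compare);
}
```

For the command above, the four -P options would be entered as {AFTER, 0, "trap"}, {REPLACE, 1, "f(...)@bin"}, {BEFORE, 2, "print"}, and {BEFORE, 3, "if g(...) goto"}. After patch_sort(), the order is print, if g(...) goto, f(...)@bin, trap, which agrees with the meta trampoline shown above.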

This design makes it possible to compose instrumentation schemas. For example, one could compose AFL fuzzing instrumentation with another instrumentation for detecting memory errors.
