NOTE: This guide is a work-in-progress and still incomplete.
E9Tool is a frontend for E9Patch. Basically, E9Tool translates high-level patching commands (i.e., what instructions to patch, and how to patch them) into low-level commands for E9Patch. E9Patch is a very low-level tool and is not designed to be used directly.
- 1. Matching Language
- 2. Patch Language
- 2.1 Builtin Trampolines
- 2.2 Call Trampolines
- 2.3 Plugin Trampolines
- 2.4 Trampoline Composition
The matching language specifies what instructions should be patched by
the corresponding patch (see below).
Matchings are specified using the (--match MATCH) or
(-M MATCH) command-line option.
The basic form of a matching (MATCH) is a Boolean expression of
TESTs using the following high-level grammar:
MATCH ::= TEST
| ( MATCH )
| not MATCH
| MATCH and MATCH
| MATCH or MATCH
Alternatively, C-style Boolean operations (!, &&, and ||) can be used
instead of (not, and, and or).
Each TEST queries some specific property/attribute of the underlying
instruction, defined using the following grammar:
TEST ::= defined ( ATTRIBUTE )
| VALUES in ATTRIBUTE
| ATTRIBUTE [ CMP VALUES ]
VALUES ::= REGULAR-EXPRESSION
| VALUE [ , VALUE ] *
| BASENAME [ INTEGER ]
CMP ::= = | == | != | > | >= | < | <=
A TEST tests some underlying instruction ATTRIBUTE using an
integer, string or set comparison operator CMP.
The following comparison operators are supported:
| Comparison | Type | Description |
|---|---|---|
| = or == | Integer or String | Equality |
| != | Integer or String | Disequality |
| > | Integer | Greater-than |
| >= | Integer | Greater-than-or-equal-to |
| < | Integer | Less-than |
| <= | Integer | Less-than-or-equal-to |
| in | Set | Set membership |
If the comparison operator and value are omitted, then the test is
equivalent to (ATTRIBUTE != 0).
A VALUE can be either:
- An integer constant, e.g., 123, 0x123, etc.
- A string constant, e.g., "abc", etc.
- An enumeration value such as register names (rax, eax, etc.), operand types (imm, reg, mem), etc.
- A symbolic address of the form NAME, where NAME is any section or symbol name from the input ELF file. A symbolic address has type Integer.
For string attributes, the value can be a regular expression.
This means that the corresponding attribute value must either
match (for ==) or not match (for !=) the regular expression,
depending on the comparison operator.
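As a rough illustration of regular-expression matching on a string attribute, the sketch below uses POSIX extended regular expressions as a stand-in. The helper name `attr_matches`, the choice of regex dialect, and the requirement that the whole attribute value must match are all assumptions of this sketch, not confirmed E9Tool behavior.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: returns true if the whole string `value` matches
 * `pattern`.  Whole-string matching and the POSIX extended dialect are
 * assumptions made for this sketch. */
static bool attr_matches(const char *pattern, const char *value)
{
    char anchored[256];
    int n = snprintf(anchored, sizeof(anchored), "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof(anchored))
        return false;                   /* pattern too long for this sketch */

    regex_t re;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                   /* invalid pattern: treat as no match */

    bool ok = (regexec(&re, value, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

Under these assumptions, a test such as (asm == jmp.*%r.*) corresponds to `attr_matches("jmp.*%r.*", asm)`, and the (!=) form corresponds to its negation.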
The following ATTRIBUTEs (with corresponding types) are
supported:
| Attribute | Type | Description |
|---|---|---|
| true | Boolean | True |
| false | Boolean | False |
| jump | Boolean | True for jump instructions, false otherwise |
| condjump | Boolean | True for conditional jump instructions, false otherwise |
| call | Boolean | True for call instructions, false otherwise |
| return | Boolean | True for return instructions, false otherwise |
| asm | String | The assembly string representation |
| mnemonic | String | The mnemonic |
| section | String | The section name |
| addr | Integer | The ELF virtual address |
| offset | Integer | The ELF file offset |
| size | Integer | The size of the instruction in bytes |
| random | Integer | A random value [0..RAND_MAX] |
| target | Integer | The jump/call target (if statically known). |
| x87 | Boolean | True for x87 instructions, false otherwise |
| mmx | Boolean | True for MMX instructions, false otherwise |
| sse | Boolean | True for SSE instructions, false otherwise |
| avx | Boolean | True for AVX instructions, false otherwise |
| avx2 | Boolean | True for AVX2 instructions, false otherwise |
| avx512 | Boolean | True for AVX512 instructions, false otherwise |
| op.size | Integer | The number of operands |
| src.size | Integer | The number of source operands |
| dst.size | Integer | The number of destination operands |
| imm.size | Integer | The number of immediate operands |
| reg.size | Integer | The number of register operands |
| mem.size | Integer | The number of memory operands |
| op[i] | Operand | The ith operand |
| src[i] | Operand | The ith source operand |
| dst[i] | Operand | The ith destination operand |
| imm[i] | Operand | The ith immediate operand |
| reg[i] | Operand | The ith register operand |
| mem[i] | Operand | The ith memory operand |
| op[i].type | {imm,reg,mem} | The ith operand type |
| src[i].type | {imm,reg,mem} | The ith source operand type |
| dst[i].type | {imm,reg,mem} | The ith destination operand type |
| op[i].access | {-,r,w,rw} | The ith operand access |
| src[i].access | {-,r,w,rw} | The ith source operand access |
| dst[i].access | {-,r,w,rw} | The ith destination operand access |
| reg[i].access | {-,r,w,rw} | The ith register operand access |
| mem[i].access | {-,r,w,rw} | The ith memory operand access |
| op[i].seg | Register | The ith operand segment register |
| src[i].seg | Register | The ith source operand segment register |
| dst[i].seg | Register | The ith destination operand segment register |
| mem[i].seg | Register | The ith memory operand segment register |
| op[i].disp | Integer | The ith operand displacement |
| src[i].disp | Integer | The ith source operand displacement |
| dst[i].disp | Integer | The ith destination operand displacement |
| mem[i].disp | Integer | The ith memory operand displacement |
| op[i].base | Register | The ith operand base register |
| src[i].base | Register | The ith source operand base register |
| dst[i].base | Register | The ith destination operand base register |
| mem[i].base | Register | The ith memory operand base register |
| op[i].index | Register | The ith operand index register |
| src[i].index | Register | The ith source operand index register |
| dst[i].index | Register | The ith destination operand index register |
| mem[i].index | Register | The ith memory operand index register |
| op[i].scale | Integer | The ith operand scale |
| src[i].scale | Integer | The ith source operand scale |
| dst[i].scale | Integer | The ith destination operand scale |
| mem[i].scale | Integer | The ith memory operand scale |
| regs | Set<Register> | The set of all accessed registers |
| reads | Set<Register> | The set of all read-from registers |
| writes | Set<Register> | The set of all written-to registers |
| plugin(NAME).match() | Integer | Value from NAME.so plugin |
Here Register is the set of all x86_64 register names defined as
follows:
Register = {
rip, rflags,
es, cs, ss, ds, fs, gs,
ah, ch, dh, bh,
al, cl, dl, bl, spl, bpl, sil, dil, r8b, ..., r15b,
ax, cx, dx, bx, sp, bp, si, di, r8w, ..., r15w,
eax, ecx, edx, ebx, esp, ebp, esi, edi, r8d, ..., r15d,
rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi, r8, ..., r15,
xmm0, ..., xmm31,
ymm0, ..., ymm31,
zmm0, ..., zmm31, ...}
An Operand can be one of three values:
- An immediate value represented by an Integer
- A register represented by a Register
- A memory operand (not representable)
Thus the Operand type is the union of the Integer and Register types:
Operand = Integer | Register
Not all attributes are defined for all instructions.
For example, if the instruction has 3 operands, then only op[0], op[1],
and op[2] will be defined, and op[3] and beyond will be
undefined.
Similarly, op[0].base will be undefined if the first operand of the
instruction is not a memory operand.
Any test that uses an undefined value will fail.
For example, both of the tests (op[3] == 0x1) and (op[3] != 0x1) will
fail, despite each test being the negation of the other.
The explicit Boolean operators (not, and, and or) treat failure
due to undefinedness the same as false, thus the tests
(op[3] != 0x1) and (not op[3] == 0x1) are not equivalent
for undefined values.
The special defined(ATTRIBUTE) test can be used to determine if
an attribute is defined or not.
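The treatment of undefined values can be modeled with a small sketch. The `attr` type and the `test_*` helpers below are hypothetical names used only to illustrate the semantics described above; they are not part of E9Tool.

```c
#include <stdbool.h>

/* Hypothetical model of an integer attribute such as op[3]. */
typedef struct {
    bool defined;   /* false if the instruction lacks this attribute */
    long value;     /* meaningful only when defined is true */
} attr;

/* (ATTR == v): fails when ATTR is undefined. */
static bool test_eq(attr a, long v) { return a.defined && a.value == v; }

/* (ATTR != v): also fails when ATTR is undefined. */
static bool test_ne(attr a, long v) { return a.defined && a.value != v; }

/* defined(ATTR): succeeds only when ATTR is defined. */
static bool test_defined(attr a) { return a.defined; }
```

For an undefined attribute, `test_ne(a, 0x1)` is false while `!test_eq(a, 0x1)` is true, which mirrors the difference between (op[3] != 0x1) and (not op[3] == 0x1).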
- (true): match every instruction.
- (false): do not match any instruction.
- (asm == jmp.*%r.*): match all instructions whose assembly representation matches the regular expression jmp.*%r.* (will match jump instructions that access a register).
- (mnemonic == jmp): match all instructions whose mnemonic is jmp.
- (addr == 0x4234a7): match the instruction at the virtual address 0x4234a7.
- (addr == 0x4234a7,0x44bd6e,0x4514b4): match the instructions at the virtual addresses 0x4234a7, 0x44bd6e, and 0x4514b4.
- (addr >= 0x4234a7 and addr <= 0x4514b4): match all instructions in the virtual address range 0x4234a7..0x4514b4.
- (op.size > 1): match all instructions with more than one operand.
- (reg.size == 2): match all instructions with exactly two register operands.
- (op[0] == 0x1234): match all instructions where the first operand is the immediate value 0x1234.
- (op[0] == rax): match all instructions where the first operand is the %rax register.
- (op[0].type == mem): match all instructions where the first operand is a memory operand.
- (reg[0] == rax and reg[1] == rbx): match all instructions where the first and second register operands are %rax and %rbx respectively.
- (mem[0].base == rax and mem[0].index == rbx): match all instructions with a memory operand with %rax as the base and %rbx as the index.
- (mem[0].base == nil): match all instructions with a memory operand that does not use a base register.
- (rflags in reads): match all instructions that read the flags register.
- (rflags in writes): match all instructions that modify the flags register.
- (not rflags in regs): match all instructions that do not access the flags register.
- defined(mem[0]): match all instructions that have at least one memory operand.
- (call and target == &malloc): match all direct calls to malloc().
Exclusions are an additional method for controlling which instructions are
patched.
An exclusion is specified by the (--exclude RANGE) or (-E RANGE)
command line option, where RANGE specifies a range of addresses that
should not be disassembled or rewritten.
Exclusions are more low-level than the matching language since the RANGE
will not even be disassembled.
This can help solve some problems, such as the binary storing data
inside the .text section.
The general syntax for RANGE is:
RANGE ::= ADDR [ .. ADDR ]
ADDR ::= VALUE [ + INTEGER ]
VALUE ::= INTEGER
| SYMBOL
| SECTION [ . ( start | end ) ]
For example:
- 0x12345 .. 0x45689: exclude a specific address range.
- .text .. ChromeMain: exclude the .text section up to the symbol ChromeMain.
- .plt .. .text: exclude a range of sections.
- .plt.start .. .text.end: equivalent to the above.
- .plt .. .text.start: exclude all sections between .plt and the starting address of .text. The .text section itself will not be excluded.
- malloc .. malloc+16: exclude the 16-byte PLT entry for malloc.
- .text: exclude the entire .text section.
Note that a RANGE may include a lower and upper bound, i.e., LB .. UB.
If the UB is omitted, then UB=LB is implied.
The instruction at the address UB is not excluded, and disassembly will
resume from this address.
In other words, the syntax LB .. UB represents the address range [LB..UB),
and E9Tool assumes that UB points to a valid instruction from which
disassembly can resume.
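The half-open interpretation of LB .. UB can be sketched as a simple predicate. The function name `is_excluded` is hypothetical and only illustrates the [LB..UB) convention described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate: true if `addr` lies in the half-open range
 * [lb..ub).  The lower bound is excluded from disassembly; the upper
 * bound is where disassembly resumes. */
static bool is_excluded(uintptr_t addr, uintptr_t lb, uintptr_t ub)
{
    return addr >= lb && addr < ub;
}
```

With the range 0x12345 .. 0x45689, the address 0x45688 is excluded, but 0x45689 itself is not.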
The patch language specifies how to patch matching instructions
from the input binary.
Patches are specified using the (--patch PATCH) or
(-P PATCH) command-line option, and must be paired with one
or more matchings.
The basic form of a patch (PATCH) uses
the following high-level grammar:
PATCH ::= [ POSITION ] TRAMPOLINE
POSITION ::= before
| replace
| after
TRAMPOLINE ::= empty
| break
| trap
| exit(CODE)
| print
| CALL
| if CALL break
| if CALL goto
| plugin(NAME).patch()
A patch is an optional position followed by a trampoline. The trampoline represents code that will be executed when control-flow reaches the matching instruction. The trampoline can be either a builtin trampoline, a call trampoline, or a trampoline defined by a plugin.
The builtin trampolines include:
| Patch | Description |
|---|---|
| empty | The empty trampoline |
| break | Immediately return from trampoline |
| trap | Execute a TRAP (int3) instruction |
| exit(CODE) | Exit with CODE |
| print | Printing the matching instruction |
Here:
- empty is the empty trampoline with no instructions. Control-flow is still redirected to/from empty trampolines, and this can be used to establish a baseline for benchmarking.
- break immediately returns from the trampoline back to the main program.
- trap executes a single TRAP (int3) instruction.
- exit(CODE) will immediately exit from the program with status CODE.
- print will print the assembly representation of the matching instruction to stderr. This can be used for testing and debugging.
A call trampoline calls a user-defined function that can be implemented in a high-level programming language such as C or C++. Call trampolines are the main way of implementing custom patches using E9Tool. The syntax for a call trampoline is as follows:
CALL ::= FUNCTION [ ABI ] ARGS @ BINARY
ABI ::= < clean | naked >
ARGS ::= ( ARG , ... )
The call trampoline specifies that the trampoline should call function
FUNCTION from the binary BINARY with the arguments ARGS.
To use a call trampoline:
- Implement the desired patch as a function using the C or C++ programming language.
- Compile the patch program using the special e9compile.sh script to generate a patch binary.
- Use E9Tool to call the patch function from the patch binary at the desired locations.
E9Tool will handle all of the low-level details, such as loading the patch binary into memory, passing the arguments to the function, and saving/restoring the CPU state.
For example, the following code defines a function that increments a
counter.
Once the counter reaches some predefined maximum value, the function
will execute the int3 instruction, causing SIGTRAP to be sent to
the program.
static unsigned long counter = 0;
static unsigned long max = 100000;
void entry(void)
{
    counter++;
    if (counter >= max)
        asm volatile ("int3");
}
Once defined, the program can be compiled using the e9compile.sh
script.
./e9compile.sh counter.c
The e9compile.sh script is a gcc wrapper that ensures the
generated binary is compatible with E9Tool.
In this case, the script will generate a counter binary if
compilation is successful.
Finally, the counter binary can be used as a call trampoline.
For example, to generate a SIGTRAP once the 100000th xor
instruction is reached:
./e9tool -M 'mnemonic==xor' -P 'entry()@counter' ...
Call trampolines are primarily designed for ease-of-use and not for speed. For applications where speed is essential, it is recommended to design a custom trampoline using a plugin.
Call trampolines also support passing arguments to the called function.
The syntax uses the C-style round brackets.
For example:
./e9tool -M ... -P 'func(rip)@example' xterm
This specifies that the current value of the instruction pointer
%rip should be passed as the first argument to the function
func().
The called function can use this argument, e.g.:
void func(const void *rip)
{
...
}
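As one illustration of what func() might do with this argument, the sketch below records the most recent instruction addresses in a fixed-size ring buffer. The names `MAX_HITS`, `hits`, and `num_hits` are hypothetical, and the code deliberately avoids library calls. It is not thread-safe.

```c
#include <stddef.h>

#define MAX_HITS 64                 /* hypothetical buffer capacity */

static const void *hits[MAX_HITS];  /* most recent instruction addresses */
static size_t num_hits = 0;         /* total number of calls so far */

/* Called by the trampoline with the current %rip as its argument. */
void func(const void *rip)
{
    hits[num_hits % MAX_HITS] = rip;    /* overwrite the oldest entry when full */
    num_hits++;
}
```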
Call trampolines support up to eight arguments. The following arguments are supported:
| Argument | Type | Description |
|---|---|---|
| Integer | intptr_t | An integer constant |
| String | const char * | A string constant |
| &Name | const void * | The runtime address of the named section/symbol/PLT/GOT entry |
| static &Name | const void * | The ELF address of the named section/symbol/PLT/GOT entry |
| asm | const char * | Assembly representation of the matching instruction |
| asm.size | size_t | The number of bytes in asm (including the nul character) |
| asm.len | size_t | The string length of asm (excluding the nul character) |
| base | const void * | The runtime base address of the binary |
| config | const void * | A pointer to the E9Patch configuration (see e9loader.h) |
| addr | const void * | The runtime address of the matching instruction |
| static addr | const void * | The ELF address of the matching instruction |
| id | intptr_t | A unique identifier (one per patch) |
| instr | const uint8_t * | The machine-code bytes of the matching instruction |
| next | const void * | The runtime address of the next executed instruction |
| static next | const void * | The ELF address of the next executed instruction |
| offset | off_t | The ELF file offset of the matching instruction |
| target | const void * | The runtime address of the jump/call/return target, else NULL |
| static target | const void * | The ELF address of the jump/call/return target, else NULL |
| trampoline | const void * | The runtime address of the trampoline |
| random | intptr_t | A (statically generated) random integer [0..RAND_MAX] |
| size | size_t | The size of instr in bytes |
| state | void * | A pointer to a structure containing all general purpose registers |
| ah,...,dh, al,...,r15b | int8_t | The corresponding 8bit register |
| ax,...,r15w | int16_t | The corresponding 16bit register |
| eax,...,r15d | int32_t | The corresponding 32bit register |
| rax,...,r15 | int64_t | The corresponding 64bit register |
| rflags | int16_t | The %rflags register with format SF:ZF:0:AF:0:PF:1:CF:0:0:0:0:0:0:0:OF |
| rip | const void * | The %rip register |
| &ah,...,&dh, &al,...,&r15b | int8_t * | The corresponding 8bit register (passed-by-pointer) |
| &ax,...,&r15w | int16_t * | The corresponding 16bit register (passed-by-pointer) |
| &eax,...,&r15d | int32_t * | The corresponding 32bit register (passed-by-pointer) |
| &rax,...,&r15 | int64_t * | The corresponding 64bit register (passed-by-pointer) |
| &rflags | int16_t * | The %rflags register (passed-by-pointer) |
| op[i] | int8/16/32/64_t | The matching instruction's ith operand |
| src[i] | int8/16/32/64_t | The matching instruction's ith source operand |
| dst[i] | int8/16/32/64_t | The matching instruction's ith destination operand |
| imm[i] | int8/16/32/64_t | The matching instruction's ith immediate operand |
| reg[i] | int8/16/32/64_t | The matching instruction's ith register operand |
| mem[i] | int8/16/32/64_t | The matching instruction's ith memory operand |
| &op[i] | (const) int8/16/32/64_t * | The matching instruction's ith operand (passed-by-pointer) |
| &src[i] | (const) int8/16/32/64_t * | The matching instruction's ith source operand (passed-by-pointer) |
| &dst[i] | int8/16/32/64_t * | The matching instruction's ith destination operand (passed-by-pointer) |
| &imm[i] | const int8/16/32/64_t * | The matching instruction's ith immediate operand (passed-by-pointer) |
| &reg[i] | (const) int8/16/32/64_t * | The matching instruction's ith register operand (passed-by-pointer) |
| &mem[i] | int8/16/32/64_t * | The matching instruction's ith memory operand (passed-by-pointer) |
| op[i].size | size_t | The matching instruction's ith operand size |
| src[i].size | size_t | The matching instruction's ith source operand size |
| dst[i].size | size_t | The matching instruction's ith destination operand size |
| imm[i].size | size_t | The matching instruction's ith immediate operand size |
| reg[i].size | size_t | The matching instruction's ith register operand size |
| mem[i].size | size_t | The matching instruction's ith memory operand size |
| op[i].type | int8_t | The matching instruction's ith operand type (1=immediate, 2=register, 3=memory operand) |
| src[i].type | int8_t | The matching instruction's ith source operand type |
| dst[i].type | int8_t | The matching instruction's ith destination operand type |
| imm[i].type | int8_t | The matching instruction's ith immediate operand type |
| reg[i].type | int8_t | The matching instruction's ith register operand type |
| mem[i].type | int8_t | The matching instruction's ith memory operand type |
| op[i].access | int8_t | The matching instruction's ith operand access (0x80 | PROT_READ | PROT_WRITE) |
| src[i].access | int8_t | The matching instruction's ith source operand access |
| dst[i].access | int8_t | The matching instruction's ith destination operand access |
| imm[i].access | int8_t | The matching instruction's ith immediate operand access |
| reg[i].access | int8_t | The matching instruction's ith register operand access |
| mem[i].access | int8_t | The matching instruction's ith memory operand access |
| op[i].disp | int32_t | The matching instruction's ith operand displacement |
| src[i].disp | int32_t | The matching instruction's ith source operand displacement |
| dst[i].disp | int32_t | The matching instruction's ith destination operand displacement |
| mem[i].disp | int32_t | The matching instruction's ith memory operand displacement |
| op[i].base | int32/64_t | The matching instruction's ith operand base register |
| src[i].base | int32/64_t | The matching instruction's ith source operand base register |
| dst[i].base | int32/64_t | The matching instruction's ith destination operand base register |
| mem[i].base | int32/64_t | The matching instruction's ith memory operand base register |
| &op[i].base | int32/64_t * | The matching instruction's ith operand base register (passed-by-pointer) |
| &src[i].base | int32/64_t * | The matching instruction's ith source operand base register (passed-by-pointer) |
| &dst[i].base | int32/64_t * | The matching instruction's ith destination operand base register (passed-by-pointer) |
| &mem[i].base | int32/64_t * | The matching instruction's ith memory operand base register (passed-by-pointer) |
| op[i].index | int32/64_t | The matching instruction's ith operand index register |
| src[i].index | int32/64_t | The matching instruction's ith source operand index register |
| dst[i].index | int32/64_t | The matching instruction's ith destination operand index register |
| mem[i].index | int32/64_t | The matching instruction's ith memory operand index register |
| &op[i].index | int32/64_t * | The matching instruction's ith operand index register (passed-by-pointer) |
| &src[i].index | int32/64_t * | The matching instruction's ith source operand index register (passed-by-pointer) |
| &dst[i].index | int32/64_t * | The matching instruction's ith destination operand index register (passed-by-pointer) |
| &mem[i].index | int32/64_t * | The matching instruction's ith memory operand index register (passed-by-pointer) |
| op[i].scale | int8_t | The matching instruction's ith operand scale |
| src[i].scale | int8_t | The matching instruction's ith source operand scale |
| dst[i].scale | int8_t | The matching instruction's ith destination operand scale |
| mem[i].scale | int8_t | The matching instruction's ith memory operand scale |
| mem8<MEMOP> | int8_t | An explicit 8-bit MEMOP |
| mem16<MEMOP> | int16_t | An explicit 16-bit MEMOP |
| mem32<MEMOP> | int32_t | An explicit 32-bit MEMOP |
| mem64<MEMOP> | int64_t | An explicit 64-bit MEMOP |
| &mem8<MEMOP> | int8_t * | An explicit 8-bit MEMOP (passed-by-pointer) |
| &mem16<MEMOP> | int16_t * | An explicit 16-bit MEMOP (passed-by-pointer) |
| &mem32<MEMOP> | int32_t * | An explicit 32-bit MEMOP (passed-by-pointer) |
| &mem64<MEMOP> | int64_t * | An explicit 64-bit MEMOP (passed-by-pointer) |
Notes:
- The rflags argument differs from the native x86_64 layout in terms of the number of flags as well as the flag ordering. The modified layout is used for efficiency reasons since preserving the native layout is a relatively slow operation.
- For technical reasons, the %rip register is considered constant and cannot be modified.
- The state argument is a pointer to a structure containing all general-purpose registers, the flag register (%rflags), the stack register (%rsp) and the instruction pointer register (%rip). See the examples/state.c example for the structure layout. Except for %rip, the values in the structure can be modified, in which case the corresponding register will be updated accordingly.
- The static version of some arguments gives the address relative to the ELF base, given by the formula: runtime address = ELF address + ELF base. This corresponds to the value used by the matching.
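The modified rflags layout can be decoded with a few bit masks. The sketch below assumes that the format string SF:ZF:0:AF:0:PF:1:CF:0:0:0:0:0:0:0:OF lists the bits from most significant (bit 15) to least significant (bit 0); the `FLAG_*` names and the `flag_set` helper are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit masks for the rflags argument, assuming the documented format is
 * written from bit 15 (SF) down to bit 0 (OF). */
enum {
    FLAG_OF = 1 << 0,
    FLAG_CF = 1 << 8,
    FLAG_PF = 1 << 10,
    FLAG_AF = 1 << 12,
    FLAG_ZF = 1 << 14,
    FLAG_SF = 1 << 15
};

/* Hypothetical helper: true if `flag` is set in the rflags argument. */
static bool flag_set(int16_t rflags, int flag)
{
    return ((uint16_t)rflags & (uint16_t)flag) != 0;
}
```

A patch function declared as `void func(int16_t rflags)` could then test, for example, `flag_set(rflags, FLAG_ZF)`.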
Some arguments can be passed by pointer.
This allows the corresponding value to be modified (provided the
corresponding type is not const),
making it possible to manipulate the state of the program at
runtime.
For example, consider the following simple function defined in
example.c:
void inc(int64_t *ptr)
{
    *ptr += 1;
}
And the following patch:
$ e9compile.sh example.c
$ e9tool -M ... -P 'inc(&rax)@example' xterm
This patch will increment the %rax register when the inc() function
is called for each matching instruction.
Attempting to write to a const pointer is undefined behavior.
Typically, this will result in a crash or the written value will be
silently ignored.
The passed pointer depends on the operand type:
- For immediate operands (e.g., &imm[i]), the pointer will point to a constant value stored in read-only memory.
- For register operands (e.g., &reg[i]), the pointer will point to a temporary location that holds the register value.
- For memory operands (e.g., &mem[i]), the pointer will be exactly the runtime pointer value calculated by the operand itself. For example, consider the instruction (mov 0x33(%rax,%rbx,2),%rcx), then the value for &mem[0] will be (0x33+%rax+2*%rbx).
Generally, it is recommended to pass memory operands by pointer rather
than by value.
If passed by value, the memory operand pointer will be dereferenced, which
may result in a crash for instructions such as (nop) and (lea) that
do not access the operand.
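The pointer value passed for a memory operand follows the usual x86_64 address calculation. The sketch below spells out that arithmetic; the function name `effective_addr` is hypothetical, and any segment base is ignored for simplicity.

```c
#include <stdint.h>

/* Hypothetical helper: computes disp + base + scale*index, i.e., the
 * address denoted by an AT&T-syntax operand disp(base,index,scale).
 * Segment registers are ignored in this sketch. */
static uintptr_t effective_addr(int32_t disp, uintptr_t base,
                                uintptr_t index, uint8_t scale)
{
    /* The displacement is sign-extended before being added. */
    return base + (uintptr_t)scale * index + (uintptr_t)(intptr_t)disp;
}
```

For (mov 0x33(%rax,%rbx,2),%rcx), the value of &mem[0] corresponds to `effective_addr(0x33, rax, rbx, 2)`.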
Some arguments can have different types, depending on the instruction. For example, with:
mov %rax,%rbx
mov %eax,%ebx
mov %ax,%bx
mov %al,%bl
The corresponding types for &op[0] will be (int64_t *), (int32_t *),
(int16_t *) and (int8_t *) respectively.
If the function is defined in C, there is no way to know the type of
the passed argument.
One solution is to implement the functions in C++ rather than C,
and to use function overloading.
For example, using C++, one can define:
void func(int64_t *x) { ... }
void func(int32_t *x) { ... }
void func(int16_t *x) { ... }
void func(int8_t *x) { ... }
Next, the program can be rewritten as follows:
$ e9compile.sh example.cpp
$ e9tool -M ... -P 'func(&op[0])@example' xterm
E9Tool will automatically select the function instance that best matches the argument types, or generate an error if no appropriate match can be found.
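If the patch code must stay in C, one possible workaround is to pass the operand size as an additional argument and dispatch on it at runtime. The sketch below assumes that op[0].size is measured in bytes and that a patch such as 'func(&op[0],op[0].size)@example' is accepted; both are assumptions, and the helper `read_operand` and the variable `last_value` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reads an operand of `size` bytes through `ptr`
 * and sign-extends it to 64 bits.  Returns 0 for unexpected sizes. */
static int64_t read_operand(const void *ptr, size_t size)
{
    switch (size) {
        case sizeof(int8_t):  return *(const int8_t *)ptr;
        case sizeof(int16_t): return *(const int16_t *)ptr;
        case sizeof(int32_t): return *(const int32_t *)ptr;
        case sizeof(int64_t): return *(const int64_t *)ptr;
        default:              return 0;
    }
}

static int64_t last_value = 0;      /* last operand value observed */

/* Intended (hypothetical) patch: 'func(&op[0],op[0].size)@example' */
void func(const void *ptr, size_t size)
{
    if (ptr != NULL)                /* skip undefined operands */
        last_value = read_operand(ptr, size);
}
```

This trades the compile-time selection offered by C++ overloading for a runtime switch, at the cost of one extra argument.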
It is possible to pass explicit memory operands as arguments. This is useful for reading/writing to known memory locations, such as stack memory. The syntax is:
( mem8 | mem16 | mem32 | mem64 ) < MEMOP >
Here, the mem8...mem64 token specifies the size of the memory operand, and MEMOP is the memory operand itself specified in AT&T syntax. For example, the following explicit memory operands access stack memory:
mem64<(%rsp)>
mem64<0x100(%rsp)>
mem64<0x200(%rsp,%rax,8)>
...
Some arguments may be undefined, e.g., op[3] for a 2-operand instruction.
In this case, the NULL pointer will be passed and the type will
be std::nullptr_t.
This can also be used for function overloading:
void func(std::nullptr_t x) { ... }
Call trampolines support two Application Binary Interfaces (ABIs).
- clean saves/restores the CPU state and is compatible with C/C++
- naked saves/restores registers corresponding to arguments only
The ABI can be specified inside angled brackets (<...>) after the function
name, e.g.:
$ e9tool -M ... -P 'func<naked>(&op[0])@example' xterm
This will call func using the naked ABI.
The clean ABI is the default, which means E9Tool will automatically
generate code for saving/restoring most of the CPU state,
including all caller-saved registers
%rax, %rdi, %rsi, %rdx, %rcx, %r8, %r9, %r10, and %r11.
Note however that the clean ABI is different from the standard
System V ABI in the following ways:
- The x87/MMX/SSE/AVX/AVX2/AVX512 registers are not saved.
- The stack pointer %rsp is not guaranteed to be aligned to a 16-byte boundary.
These differences exist for performance reasons, since saving/restoring
the extended register state is an expensive operation.
The differences are generally safe provided the patch code exclusively
uses general-purpose registers.
Patch binaries generated by the e9compile.sh script are guaranteed to
be compatible with the clean ABI.
The naked ABI specifies that the function should be called
directly and to limit the saving/restoring to registers used to
pass arguments.
Naked calls allow for more fine-grained control, and this can be used to
improve performance.
However, naked calls are generally incompatible with C/C++, and
the function will usually need to be implemented directly in assembly.
As such, the naked ABI is not recommended unless you know what you are doing.
Conditional call trampolines examine the return value of the called function, and change the control flow accordingly. There are two basic forms of conditional call trampolines:
- if func(...) break: if the function returns a non-zero value, then immediately return from the trampoline back to the main program.
- if func(...) goto: if the function returns a non-zero value interpreted as an address, then immediately jump to that address.
The first form allows for the conditional execution of the remainder of the trampoline, possibly including the matching instruction itself. For example, consider:
$ e9tool -M 'mnemonic==syscall' -P 'if filter(...)@example break' ...
The patch is placed in the default before position, i.e., will be executed
as instrumentation before the matching instruction.
If the filter(...) function returns a non-zero value, the trampoline will
immediately return, without executing the matching instruction.
The second form allows for arbitrary jumps to be implemented.
The (if func(...) goto) syntax can be thought of as shorthand for:
if (addr = func(...)) { goto addr; }
The goto is only executed if the return value of func is non-NULL.
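A filter function for the first form might look like the following sketch. It assumes the patch passes the syscall number explicitly, e.g., 'if filter(rax)@example break', and that the return type is an integer; the blocked syscall number is chosen purely for illustration.

```c
#include <stdint.h>

#define BLOCKED_SYSCALL 59  /* execve on x86_64 Linux; illustration only */

/* Hypothetical filter: returns non-zero if the syscall number in %rax
 * is blocked, in which case the trampoline returns without executing
 * the matching syscall instruction. */
intptr_t filter(int64_t rax)
{
    return rax == BLOCKED_SYSCALL;
}
```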
The main limitation of call trampolines is that the patch code
cannot use standard libraries directly, including glibc.
This is because the instrumentation binary is directly injected
into the rewritten binary rather than dynamically/statically linked.
A parallel implementation of common libc functions is provided by the
examples/stdlib.c file.
To use, simply include this file into the instrumentation code:
#include "stdlib.c"
This version of libc is designed to be compatible with patch code. However, only a subset of libc is implemented, so it is WYSIWYG. That said, many common libc functions, including file I/O and memory allocation, have been implemented.
Unlike glibc the parallel libc is designed to be compatible with the clean
ABI and handle problems, such as deadlocks, more gracefully.
It is possible to define an initialization function in the instrumentation code. For example:
#include "stdlib.c"
static int max = 1000;
void init(int argc, char **argv, char **envp)
{
    environ = envp; // Init getenv()
    const char *MAX = getenv("MAX");
    if (MAX != NULL)
        max = atoi(MAX);
}
The initialization function must be named init, and will be called
once during the patched program's initialization.
For patched executables, the command line arguments (argc and argv) and
the environment pointer (envp) will be passed as arguments to the function.
In the example above, the initialization function searches for an
environment variable MAX, and sets the max counter accordingly.
For dynamically linked binaries, it is also possible to define a finalization function that will be called during normal program exit. For example:
#include "stdlib.c"
void fini(void)
{
    fflush(stdout);
}
The finalization function must be named fini and takes no arguments.
Note that the finalization function will not be called if the program exits
abnormally, such as from a signal (SIGSEGV), or if the program calls the "fast" exit
(_exit()).
The parallel libc also provides an optional implementation of the
standard dynamic linker functions dlopen(), dlsym(), and dlclose().
These can be used to dynamically load shared objects at runtime, or access
existing shared libraries that are already dynamically linked into the original
program.
To enable, define the LIBDL macro before including stdlib.c.
#define LIBDL
#include "stdlib.c"
The dlinit(dynamic) function must also be called in the init() routine,
where dynamic is a secret fourth argument to the init() function:
void init(int argc, char **argv, char **envp, void *dynamic)
{
int result = dlinit(dynamic);
...
}
Once initialized, the dlopen(), dlsym(), and dlclose() functions can be
used similarly to the standard libdl counterparts.
Note that function pointers returned by dlsym() should not be called
directly unless you know what you are doing.
This is because most libraries are compiled with the System V ABI, which is
incompatible with the clean call ABI used by the instrumentation.
To avoid ABI incompatibility, the external library code should be called using
a special wrapper function dlcall():
intptr_t dlcall(void *func, arg1, arg2, ...);
The dlcall() function will:
- Align/restore the stack pointer to 16 bytes, as required by the System V ABI.
- Save/restore the extended register state, including %xmm0, etc.
- Save/restore the glibc version of errno.
Be aware that the dynamic loading API has several caveats:
- The dlopen(), dlsym(), and dlclose() functions are wrappers for the glibc versions of these functions (__libc_dlopen, etc.). The glibc versions do not officially exist, so this functionality may change at any time. Also, the glibc versions lack some features, such as RTLD_NEXT, that are available with the standard libdl versions.
- Since glibc is required, the original binary must be dynamically linked.
- Many external library functions are not designed to be reentrant, and this may cause deadlocks if a signal occurs when the signal handler is also instrumented.
- The dlcall() function supports a maximum of 16 arguments.
- The dlcall() function is relatively slow, so ought to be used sparingly.
By design, call trampolines are very simple to use, but this also comes at the cost of efficiency. The problem is that call trampolines add an extra layer of indirection, namely, the control-flow will transfer from the main program, to the trampoline, and then to the called function. For optimal results, it is sometimes better to inline the functionality directly into the trampoline and avoid the extra level of indirection.
Very fine-grained control over the generated trampolines is possible using plugin trampolines, which allow the precise content of trampolines to be specified directly. The downside is that low-level details, such as the saving/restoring of CPU state, must be handled manually by the trampoline code, so this method is generally recommended for expert users only.
For more information, please see the E9Patch Programmer's Guide.
Depending on the --match/-M and --patch/-P options, more than
one patch may match a given instruction.
If this occurs, then all matching trampolines will be executed in an order
determined by:
- The explicit (or implicit) patch position annotation, then
- The command-line order for tie-breaking.
The possible values for the patch position annotation are:
- before: The trampoline will be executed before the matching instruction. That is, the trampoline is instrumentation.
- replace: The trampoline replaces the matching instruction.
- after: The trampoline is executed after the matching instruction.
If unspecified, the default patch position is assumed to be "before", meaning
that the trampoline will be executed before the matching instruction
(i.e., instrumentation).
Conceptually, the individual trampolines will be arranged into a "meta" trampoline that will be executed in place of the original matching instruction. The meta trampoline has the following basic form:
BEFORE (instruction | REPLACE) AFTER break
Here BEFORE are all before trampolines in command-line order,
instruction is the original matching instruction,
REPLACE is the replacement trampoline,
AFTER are all after trampolines in command-line order, and
break returns control-flow back to the main program.
Notes:
- There can be at most one replacement trampoline. If no replacement trampoline is specified, E9Tool will execute the original matching instruction.
- For the after position, the trampoline will not be executed if the matching instruction transfers control flow (i.e., for jumps taken, calls or returns).
- Similarly, if any component trampoline transfers control flow (via a break or goto), the rest of the "meta" trampoline will not be executed.
For example, consider the command:
e9tool -M 'asm=xor.*' -P 'after trap' -P 'replace f(...)@bin' -P print -P 'before if g(...)@bin goto' ...
Then the following "meta" trampoline will be executed in place of each xor
instruction:
print; if g(...) goto; f(...)@bin; trap; break;
The print trampoline is implicitly in the before position, so is executed
first.
Next, the conditional call (if g(...) goto), also in the before position,
will be executed.
This conditional call will transfer control-flow if the g(...) function
returns a non-NULL value, in which case the rest of the meta trampoline
will not be executed.
Otherwise, the call f(...)@bin trampoline will be executed next,
which replaces the original matching xor instruction.
Finally, the trap trampoline, in the after position, will be executed last.
This design makes it possible to compose instrumentation schemas. For example, one could compose AFL fuzzing instrumentation with another instrumentation for detecting memory errors.
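The ordering rules for the meta trampoline can be modelled in a few lines of C. The sketch below only illustrates the documented rules (position first, then command-line order, with at most one replacement); it is not E9Tool's actual implementation, and the names position, patch and meta_order are hypothetical.

```c
#include <stddef.h>

/* Patch positions, listed in meta trampoline order. */
enum position { BEFORE, REPLACE, AFTER };

/* Hypothetical model of a single -P option (not an E9Tool type). */
struct patch {
    enum position pos;  /* explicit annotation, or BEFORE by default */
    const char *text;   /* the trampoline, e.g. "print" */
};

/*
 * Arrange n patches, given in command-line order, into execution order.
 * Entries are written to out, which must have room for n + 1 pointers.
 * If no REPLACE patch exists, the original instruction takes that slot.
 * Returns the number of entries written, or 0 if there is more than one
 * REPLACE patch.
 */
size_t meta_order(const struct patch *in, size_t n,
                  const char *instruction, const char **out)
{
    size_t replaces = 0, k = 0;

    for (size_t i = 0; i < n; i++)
        if (in[i].pos == REPLACE)
            replaces++;
    if (replaces > 1)
        return 0;

    /* One pass per position keeps command-line order within a group. */
    for (int pos = BEFORE; pos <= AFTER; pos++) {
        if (pos == REPLACE && replaces == 0)
            out[k++] = instruction;
        for (size_t i = 0; i < n; i++)
            if ((int)in[i].pos == pos)
                out[k++] = in[i].text;
    }
    return k;
}
```

Applied to the four -P options from the example command (positions AFTER, REPLACE, BEFORE and BEFORE, in that order), the function yields print, if g(...)@bin goto, f(...)@bin, trap. The model covers ordering only; it does not represent the early-exit behaviour described in the notes.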