Commit 60c0593

Allow disassembly or analysis to be overridden.
This introduces two new options for E9Tool:

    --use-disasm disasm.csv
    --use-targets targets.csv

This allows the default E9Tool disassembler or control-flow analysis to be
overridden with information generated by other tools. Here:

* disasm.csv lists all instruction addresses
* targets.csv lists all jump/call targets
1 parent e458531 commit 60c0593

6 files changed

Lines changed: 202 additions & 2 deletions

doc/e9tool-user-guide.md

Lines changed: 34 additions & 0 deletions
@@ -7,6 +7,7 @@
     - [1.1 Optimization](#optimization)
     - [1.2 Compression](#compression)
     - [1.3 Rewriting Modes](#modes)
+    - [1.4 Disassembly and Analysis](#analysis)
 * [2. Matching Language](#matching)
     - [2.1 Attributes](#attributes)
     - [2.2 Definedness](#definedness)
@@ -123,6 +124,39 @@ possible rewriting errors.
 Thus, the CFR mode is not as robust as the default mode, although it should
 be compatible with most binaries.
 
+---
+### <a id="analysis">1.4 Disassembly and Analysis</a>
+
+For convenience, E9Tool comes with a built-in linear disassembler that should
+handle most binaries compiled with standard compilers, such as `gcc`.
+However, linear disassemblers have known limitations for some binaries that
+mix code and data.
+It is possible to override the default disassembler using the `--use-disasm`
+option, e.g.:
+
+    $ e9tool --use-disasm disasm.csv ...
+
+Here, `disasm.csv` is a single-column *comma-separated-value* (CSV) file that
+should contain all instruction addresses to be disassembled.
+The `disasm.csv` file can be generated by other disassemblers, and integrated
+into the E9Tool/E9Patch toolchain.
+
+Similarly, E9Tool's default *control-flow-recovery* analysis can be
+overridden by the `--use-targets` option, e.g.:
+
+    $ e9tool --use-targets targets.csv ...
+
+Here, `targets.csv` is a CSV file with one or two columns.
+The first column is the list of all jump/call target addresses in the binary.
+The second column, if present, is a Boolean value (either 0 or 1), where a
+value of 1 indicates that the target is a function entry, and 0 otherwise.
+Like disassembly, the `targets.csv` file can be generated by other binary
+analysis tools and integrated into the E9Tool/E9Patch toolchain.
+
+Note that the `--use-targets` option does not affect the internal
+control-flow-recovery analysis for E9Patch (for the `-X` option).
+Rather, the option only affects E9Tool's matching/patching operations.
+
 ---
 ## <a id="matching">2. Matching Language</a>
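The two file formats described in the new section can be produced by any external tool. The following is a minimal sketch of a generator, not part of E9Tool: `TargetRow`, `writeDisasmCsv` and `writeTargetsCsv` are hypothetical names, and it assumes that the E9Tool CSV parser accepts `0x`-prefixed hexadecimal integers (if it only accepts decimal, drop the `std::hex` manipulator).

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type for this sketch (not an E9Tool type).
struct TargetRow
{
    intptr_t addr;      // jump/call target address
    bool is_function;   // true if the target is a function entry
};

// Format an address as 0x-prefixed hex (assumed to be accepted by E9Tool).
static std::string hexAddr(intptr_t addr)
{
    std::ostringstream s;
    s << "0x" << std::hex << addr;
    return s.str();
}

// disasm.csv: a single column with one instruction address per row.
void writeDisasmCsv(std::ostream &out, const std::vector<intptr_t> &addrs)
{
    for (intptr_t addr: addrs)
        out << hexAddr(addr) << '\n';
}

// targets.csv: target address, then 1 for function entries and 0 otherwise.
void writeTargetsCsv(std::ostream &out, const std::vector<TargetRow> &rows)
{
    for (const TargetRow &row: rows)
        out << hexAddr(row.addr) << ',' << (row.is_function? 1: 0) << '\n';
}
```

Writing to a `std::ofstream` opened on `disasm.csv` or `targets.csv` would yield files in the shape the section describes.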

doc/e9tool.1

Lines changed: 10 additions & 0 deletions
@@ -203,6 +203,16 @@ The default syntax is "ATT".
 .IP "\fB\-\-trap\fR=\fI\,ADDR\/\fR, \fB\-\-trap\-all\fR" 4
 Insert a trap (int3) instruction at the corresponding
 trampoline entry. This can be used for debugging with gdb.
+.IP "\fB\-\-use\-disasm \fI\,FILE\/\fR" 4
+Use the instruction information in FILE rather than the default
+disassembler. Here, FILE is a CSV file with a single column
+representing instruction addresses.
+.IP "\fB\-\-use\-targets \fI\,FILE\/\fR" 4
+Use the jump/call target information in FILE rather than the
+default control-flow recovery analysis. Here, FILE is a CSV
+file where the first column is all jump/call targets, and an
+optional second column is 1 for call targets (functions), or
+0 otherwise (the default is 0).
 .IP "\fB\-\-version\fR" 4
 Print the version and exit.
 .IP "\fB\-X\fR" 4

src/e9tool/e9csv.cpp

Lines changed: 84 additions & 0 deletions
@@ -283,3 +283,87 @@ MatchVal getCSVValue(intptr_t addr, const char *basename, uint16_t idx)
     return record[idx];
 }
 
+/*
+ * Specialized parser for lists of addresses.
+ */
+void parseAddrs(const char *filename, std::vector<intptr_t> &As)
+{
+    FILE *stream = fopen(filename, "r");
+    if (stream == nullptr)
+        error("failed to open CSV file \"%s\" for reading: %s",
+            filename, strerror(errno));
+
+    Record record;
+    CSV csv = {stream, filename, /*length=*/1, 0};
+    while (true)
+    {
+        if (!parseRecord(csv, record))
+            break;
+        if ((unsigned)record.size() != (unsigned)csv.length)
+            error("failed to parse CSV file \"%s\" at line %u; record with "
+                "invalid length %zu (expected %u)", csv.filename,
+                csv.lineno, record.size(), csv.length);
+        MatchVal &addr = record[0];
+        if (addr.type != MATCH_TYPE_INTEGER)
+            error("failed to parse CSV file \"%s\" at line %u; first record "
+                "entry must be an address", csv.filename, csv.lineno);
+        As.push_back(addr.i);
+        record.clear();
+    }
+
+    fclose(stream);
+    std::sort(As.begin(), As.end());
+    As.erase(std::unique(As.begin(), As.end()), As.end());  // drop duplicates
+    As.shrink_to_fit();
+}
+
+/*
+ * Specialized parser for targets.
+ */
+void parseTargets(const char *filename, const Instr *Is, size_t size,
+    Targets &targets)
+{
+    FILE *stream = fopen(filename, "r");
+    if (stream == nullptr)
+        error("failed to open CSV file \"%s\" for reading: %s",
+            filename, strerror(errno));
+
+    Record record;
+    CSV csv = {stream, filename, /*length=*/2, 0};
+    while (true)
+    {
+        if (!parseRecord(csv, record))
+            break;
+        if (record.size() != 1 && record.size() != 2)
+            error("failed to parse CSV file \"%s\" at line %u; record with "
+                "invalid length %zu (expected 1 or 2)", csv.filename,
+                csv.lineno, record.size());
+        MatchVal &addr = record[0];
+        if (addr.type != MATCH_TYPE_INTEGER)
+            error("failed to parse CSV file \"%s\" at line %u; first record "
+                "entry must be an address", csv.filename, csv.lineno);
+        TargetKind kind = TARGET_DIRECT;
+        if (record.size() == 2)
+        {
+            MatchVal &func = record[1];
+            if (func.type != MATCH_TYPE_INTEGER || (func.i != 0 && func.i != 1))
+                error("failed to parse CSV file \"%s\" at line %u; second "
+                    "record entry must be Boolean value (0 or 1)",
+                    csv.filename, csv.lineno);
+            kind |= (func.i != 0? TARGET_FUNCTION: 0);
+        }
+        if (findInstr(Is, size, addr.i) < 0)
+        {
+            record.clear();
+            continue;
+        }
+        auto r = targets.insert({addr.i, kind});
+        if (!r.second)
+            error("failed to parse CSV file \"%s\" at line %u; duplicate "
+                "record with address 0x%lx", csv.filename, csv.lineno,
+                addr.i);
+        record.clear();
+    }
+    fclose(stream);
+}
+
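`parseAddrs` needs a sorted, duplicate-free address list, since the list is later searched by bisection. Note that `std::unique` on its own only shifts the unique elements to the front and returns an iterator to the new logical end; the container keeps its size until the tail is erased. A standalone sketch of the full sort/unique/erase idiom, where `sortUnique` is a hypothetical helper name:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Sort the addresses and remove duplicates in place.
void sortUnique(std::vector<intptr_t> &As)
{
    std::sort(As.begin(), As.end());
    // std::unique returns the new logical end; the tail must be erased.
    auto last = std::unique(As.begin(), As.end());
    As.erase(last, As.end());
    As.shrink_to_fit();
}
```

For example, the input `{3, 1, 3, 2, 1}` becomes `{1, 2, 3}`.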

src/e9tool/e9csv.h

Lines changed: 3 additions & 0 deletions
@@ -20,5 +20,8 @@
 #include "e9action.h"
 
 extern MatchVal getCSVValue(intptr_t addr, const char *basename, uint16_t idx);
+void parseAddrs(const char *filename, std::vector<intptr_t> &As);
+void parseTargets(const char *filename, const e9tool::Instr *Is, size_t size,
+    e9tool::Targets &targets);
 
 #endif

src/e9tool/e9misc.cpp

Lines changed: 12 additions & 0 deletions
@@ -330,6 +330,18 @@ void usage(FILE *stream, const char *progname)
         "\t\tInsert a trap (int3) instruction at the corresponding\n"
         "\t\ttrampoline entry. This can be used for debugging with gdb.\n"
         "\n"
+        "\t--use-disasm FILE\n"
+        "\t\tUse the instruction information in FILE rather than the default\n"
+        "\t\tdisassembler. Here, FILE is a CSV file with a single column\n"
+        "\t\trepresenting instruction addresses.\n"
+        "\n"
+        "\t--use-targets FILE\n"
+        "\t\tUse the jump/call target information in FILE rather than the\n"
+        "\t\tdefault control-flow recovery analysis. Here, FILE is a CSV\n"
+        "\t\tfile where the first column is all jump/call targets, and an\n"
+        "\t\toptional second column is 1 for call targets (functions), or\n"
+        "\t\t0 otherwise (the default is 0).\n"
+        "\n"
         "\t--version\n"
         "\t\tPrint the version and exit.\n"
         "\n"

src/e9tool/e9tool.cpp

Lines changed: 59 additions & 2 deletions
@@ -529,6 +529,29 @@ static size_t exclude(const std::vector<Exclude> &excludes, intptr_t addr)
     return 0;
 }
 
+/*
+ * Next instruction.
+ */
+static size_t nextInstr(const std::vector<intptr_t> &disasm, intptr_t addr)
+{
+    ssize_t max = (ssize_t)disasm.size() - 1;
+    if (disasm.size() == 0 || addr > disasm[max])
+        return INT32_MAX;
+    ssize_t lo = 0, hi = max;
+    while (lo <= hi)
+    {
+        ssize_t mid = (lo + hi) / 2;
+        intptr_t i = disasm[mid];
+        if (addr < i)
+            hi = mid-1;
+        else if (addr > i)
+            lo = mid+1;
+        else
+            return 0;
+    }
+    return disasm[lo] - addr;
+}
+
 /*
  * Metadata.
  */
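`nextInstr` is a hand-written binary search over the sorted address list: it returns 0 if `addr` is itself listed, otherwise the byte distance to the next listed address, or `INT32_MAX` if there is none. As a rough sketch of the same contract, the search can also be written with `std::lower_bound`; `distanceToNext` and `NO_NEXT_INSTR` are hypothetical names, and the input is assumed to be sorted and duplicate-free.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sentinel mirroring the INT32_MAX value returned by nextInstr.
static const std::size_t NO_NEXT_INSTR = INT32_MAX;

// Bytes to skip from addr to reach the next listed instruction address
// (0 if addr is listed). The vector must be sorted in ascending order.
std::size_t distanceToNext(const std::vector<intptr_t> &disasm, intptr_t addr)
{
    // First element that is >= addr.
    auto it = std::lower_bound(disasm.begin(), disasm.end(), addr);
    if (it == disasm.end())
        return NO_NEXT_INSTR;
    return (std::size_t)(*it - addr);
}
```

With the list `{0x1000, 0x1003, 0x1008}`, address `0x1001` yields 2 (skip to `0x1003`), and `0x1009` yields the sentinel.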
@@ -651,6 +674,8 @@ enum Option
     OPTION_SYNTAX,
     OPTION_TRAP,
     OPTION_TRAP_ALL,
+    OPTION_USE_DISASM,
+    OPTION_USE_TARGETS,
     OPTION_VERSION,
 };
 
@@ -725,6 +750,8 @@ int main_2(int argc, char **argv)
         {"syntax", req_arg, nullptr, OPTION_SYNTAX},
         {"trap", req_arg, nullptr, OPTION_TRAP},
         {"trap-all", no_arg, nullptr, OPTION_TRAP_ALL},
+        {"use-disasm", req_arg, nullptr, OPTION_USE_DISASM},
+        {"use-targets", req_arg, nullptr, OPTION_USE_TARGETS},
         {"version", no_arg, nullptr, OPTION_VERSION},
         {nullptr, no_arg, nullptr, 0}
     };
@@ -741,6 +768,10 @@ int main_2(int argc, char **argv)
     std::vector<std::string> option_patch;
     std::vector<ActionEntry> option_actions;
     std::vector<std::string> option_exclude;
+    std::string option_use_disasm("");
+    std::string option_use_targets("");
+    std::string option_use_funcs("");
+
     int option_sync = 64, option_threshold = 2;
     bool option_CFR = false;
     srand(0xe9e9e9e9);
@@ -898,6 +929,12 @@ int main_2(int argc, char **argv)
             case OPTION_TRAP_ALL:
                 option_trap_all = true;
                 break;
+            case OPTION_USE_DISASM:
+                option_use_disasm = optarg;
+                break;
+            case OPTION_USE_TARGETS:
+                option_use_targets = optarg;
+                break;
             case OPTION_VERSION:
                 puts("E9Tool " STRING(VERSION));
                 return EXIT_SUCCESS;
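The hunks above register the two long options and store their arguments in strings. A minimal standalone sketch of that pattern using `getopt_long` follows. It assumes a POSIX/glibc environment and that `req_arg` in the diff is shorthand for `required_argument`; `UseOptions`, `parseUseOptions` and the `OPT_*` values are hypothetical names for this example only.

```cpp
#include <getopt.h>
#include <string>

// Hypothetical container for the two option arguments ("" if absent).
struct UseOptions
{
    std::string disasm;     // argument of --use-disasm
    std::string targets;    // argument of --use-targets
};

enum
{
    OPT_USE_DISASM = 1000,  // arbitrary values outside the char range
    OPT_USE_TARGETS,
};

// Returns false on an unrecognized option or a missing argument.
bool parseUseOptions(int argc, char **argv, UseOptions &opts)
{
    static const struct option long_options[] =
    {
        {"use-disasm",  required_argument, nullptr, OPT_USE_DISASM},
        {"use-targets", required_argument, nullptr, OPT_USE_TARGETS},
        {nullptr,       no_argument,       nullptr, 0}
    };
    optind = 1;             // start scanning at the first argument
    int opt;
    while ((opt = getopt_long(argc, argv, "", long_options, nullptr)) != -1)
    {
        switch (opt)
        {
            case OPT_USE_DISASM:
                opts.disasm = optarg;
                break;
            case OPT_USE_TARGETS:
                opts.targets = optarg;
                break;
            default:
                return false;
        }
    }
    return true;
}
```

An empty string then signals that the option was not given, which is the same convention the commit uses when it tests `option_use_disasm != ""`.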
@@ -1252,6 +1289,13 @@ int main_2(int argc, char **argv)
     /*
      * Disassemble the ELF file.
      */
+    std::vector<intptr_t> disasm;
+    bool use_disasm = false;
+    if (option_use_disasm != "")
+    {
+        parseAddrs(option_use_disasm.c_str(), disasm);
+        use_disasm = true;
+    }
     initDisassembler();
     std::vector<Instr> Is;
     std::vector<Desync> desyncs;
@@ -1279,6 +1323,7 @@ int main_2(int argc, char **argv)
         while (true)
         {
             size_t skip = exclude(excludes, address);
+            skip += (use_disasm? nextInstr(disasm, address + skip): 0);
             if (skip > 0)
             {
                 address += skip;
@@ -1296,7 +1341,7 @@ int main_2(int argc, char **argv)
             I.first = first;
             first = false;
 
-            int score = suspiciousness(bytes, I.size);
+            int score = (use_disasm? 0: suspiciousness(bytes, I.size));
             if (option_debug && !I.data)
             {
                 InstrInfo J;
@@ -1311,6 +1356,11 @@ int main_2(int argc, char **argv)
                     (score >= option_threshold? " <data?>": ""));
             }
 
+            if (I.data && use_disasm)
+                error("failed to decode instruction at address 0x%lx; "
+                    "the \"%s\" disassembly file may be inaccurate",
+                    I.address, option_use_disasm.c_str());
+
             if (I.data || score >= option_threshold)
             {
                 // Data has been detected in the code segment. We attempt to
@@ -1352,13 +1402,20 @@ int main_2(int argc, char **argv)
             section, section_addr, section_addr + section_size,
             section_addr, section_addr + (code - start));
     }
+    disasm.clear();
     Is.shrink_to_fit();
     notifyPlugins(out, &elf, Is, EVENT_DISASSEMBLY_COMPLETE);
     size_t count = Is.size();
 
     // Step (1a): CFG Analysis (if necessary).
     if (option_targets)
-        buildTargets(&elf, Is.data(), Is.size(), elf.targets);
+    {
+        if (option_use_targets != "")
+            parseTargets(option_use_targets.c_str(), Is.data(), Is.size(),
+                elf.targets);
+        else
+            buildTargets(&elf, Is.data(), Is.size(), elf.targets);
+    }
     if (option_bbs)
         buildBBs(&elf, Is.data(), Is.size(), elf.targets, elf.bbs);
     if (option_fs)
