DERIVE: A tool that automatically reverse-engineers instruction encodings

Citation
D.R. Engler and W.C. Hsieh, DERIVE: A tool that automatically reverse-engineers instruction encodings, ACM SIGPLAN Notices, 35(7), 2000, pp. 12-22
Number of citations
22
Subject categories
Computer Science & Engineering
Journal title
ACM SIGPLAN NOTICES
Journal ISSN
1523-2867
Volume
35
Issue
7
Year of publication
2000
Pages
12 - 22
Database
ISI
SICI code
1523-2867(200007)35:7<12:DATTAR>2.0.ZU;2-F
Abstract
Many binary tools, such as disassemblers, dynamic code generation systems, and executable code rewriters, need to understand how machine instructions are encoded. Unfortunately, specifying such encodings is tedious and error-prone. Users must typically specify thousands of details of instruction layout, such as opcode and field locations and values, legal operands, and jump offset encodings. We have built a tool called DERIVE that extracts these details from existing software: the system assembler. Users need only provide the assembly syntax for the instructions for which they want encodings. DERIVE automatically reverse-engineers instruction encoding knowledge from the assembler by feeding it permutations of instructions and doing equation solving on the output. DERIVE is robust and general. It derives instruction encodings for SPARC, MIPS, Alpha, PowerPC, ARM, and x86. In the last case, it handles variable-sized instructions, large instructions, instruction encodings determined by operand size, and other CISC features. DERIVE is also remarkably simple: it is a factor of 6 smaller than equivalent, more traditional systems. Finally, its declarative specifications eliminate the mis-specification errors that plague previous approaches, such as illegal registers used as operands or incorrect field offsets and sizes. This paper discusses our current DERIVE prototype, explains how it computes instruction encodings, and also discusses the more general implications of the ability to extract functionality from installed software.
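
The core idea in the abstract, feeding an assembler permutations of an instruction and solving for the encoding from its output, can be illustrated with a minimal Python sketch. This is not DERIVE's actual algorithm or code. The function toy_assemble is a hypothetical stand-in for the system assembler with a made-up 32-bit encoding, and derive_fields and field_layout are illustrative names. The sketch replaces the paper's equation solving with a much simpler bit-difference test: vary one operand at a time, XOR each output against a baseline, and treat the bits that change as that operand's field.

```python
from typing import Callable, Dict, List, Tuple


def toy_assemble(rd: int, rs: int, imm: int) -> int:
    """Hypothetical black-box assembler for a made-up 'addi rd, rs, imm'.

    The layout below is an assumption for illustration only; the
    derivation code never looks inside this function.
    """
    opcode = 0x08
    return (opcode << 26) | (rs << 21) | (rd << 16) | (imm & 0xFFFF)


def derive_fields(
    assemble: Callable[..., int],
    operand_values: Dict[str, List[int]],
    word_bits: int = 32,
) -> Tuple[Dict[str, int], int, int]:
    """Vary one operand at a time and record which output bits change.

    Returns (mask per operand, mask of fixed bits, value of fixed bits).
    """
    # Baseline instruction: the first listed value of every operand.
    base = {name: values[0] for name, values in operand_values.items()}
    base_word = assemble(**base)

    masks: Dict[str, int] = {}
    for name, values in operand_values.items():
        mask = 0
        for value in values:
            probe = dict(base)
            probe[name] = value
            # Bits that differ from the baseline belong to this operand.
            mask |= assemble(**probe) ^ base_word
        masks[name] = mask

    # Bits that never changed are treated as fixed opcode bits.
    operand_bits = 0
    for mask in masks.values():
        operand_bits |= mask
    fixed_mask = ~operand_bits & ((1 << word_bits) - 1)
    return masks, fixed_mask, base_word & fixed_mask


def field_layout(mask: int) -> Tuple[int, int]:
    """Return (offset of lowest set bit, number of set bits) for a mask."""
    offset = (mask & -mask).bit_length() - 1
    width = bin(mask).count("1")
    return offset, width


# Legal operand values supplied by the user, as in a declarative spec.
operands = {
    "rd": list(range(32)),
    "rs": list(range(32)),
    "imm": [0] + [1 << i for i in range(16)],
}
masks, fixed_mask, fixed_bits = derive_fields(toy_assemble, operands)
for name, mask in masks.items():
    print(name, field_layout(mask))
print("fixed bits", hex(fixed_mask), "=", hex(fixed_bits))
```

A real tool would invoke the actual assembler on generated assembly text and would also have to handle cases this sketch ignores, such as variable-sized instructions, operands whose encoding is not a contiguous bit field, and jump offsets that are scaled or relative.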