The adaptive function cloning technique relocates function images so that they can be adaptively updated. A series of checks guards against potentially unsafe relocations and, where possible, handles them. Such cases include:
Backward branches. A function could contain backward branches that target the area occupied by the trampoline. Checking for such branches before cloning a function ensures that the trampoline is not installed over a possible branch target. For functions that contain such backward branches, the callers can instead be updated to invoke the newer version directly.
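The check can be sketched in C as follows. This is a minimal illustration, not DynAMOS code: it assumes the function's branches have already been decoded into (instruction offset, target offset) pairs, and the struct branch type, the backward_branch_hits_trampoline name and the 6-byte trampoline length used in the example are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one already-decoded branch. Both fields are byte
 * offsets from the start of the original function image. */
struct branch {
    uint32_t insn_off;   /* offset of the branch instruction itself */
    uint32_t target_off; /* offset the branch transfers control to */
};

/* Returns true if any backward branch targets the first tramp_len bytes,
 * i.e. the area the trampoline would overwrite. A caller could then fall
 * back to updating the function's callers instead of cloning it. */
static bool backward_branch_hits_trampoline(const struct branch *br, size_t n,
                                            uint32_t tramp_len)
{
    for (size_t i = 0; i < n; i++) {
        bool backward = br[i].target_off <= br[i].insn_off;
        if (backward && br[i].target_off < tramp_len)
            return true;
    }
    return false;
}
```

A loop whose head sits at offset 0x02 would be flagged, while a backward branch to offset 0x10 or any forward branch would not.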
Data-in-code. Dynamic instrumentation systems generally cannot handle self-modifying code, code-in-data or data-in-code without a significant slowdown that is unsuitable for an operating system kernel.
For example, Linux uses the custom BUG macro to produce code that raises an exception on a failed assertion. handle_BUG handles the exception by extracting the line number and a pointer to the name of the file containing the failed assertion. As shown in Figure 2-4, this information is stored as data in code right after the ud2 instruction that raises the exception.
Figure 2-4. Definition of BUG macro in Linux 2.4
#define BUG() \
    asm volatile( "ud2\n" \
        "\t.word %c0\n" \
        "\t.long %c1\n" \
        : : "i" (__LINE__), "i" (__FILE__))
The relocation logic may be misled into interpreting this data as outbound branch instructions and refuse to update the function. Such conservative handling of data-in-code cases does not compromise the correctness of the updating framework.
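To see why a relocator can be misled, the following C sketch decodes the byte layout that the BUG macro in Figure 2-4 emits on i386: the two-byte ud2 opcode (0x0f 0x0b), a 16-bit line number and a 32-bit pointer to the file name. The struct bug_site type and the decode_bug_site function are hypothetical names; the sketch mirrors the layout rather than reproducing handle_BUG. A linear disassembler unaware of this pattern would try to decode the six data bytes as instructions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fields recovered from one BUG() site (hypothetical type). */
struct bug_site {
    uint16_t line;     /* from .word __LINE__ */
    uint32_t file_ptr; /* from .long __FILE__: address of the file name string */
};

/* Decodes the i386 layout emitted by BUG() in Figure 2-4:
 *   0f 0b        ud2 (the only real instruction)
 *   xx xx        .word __LINE__  (data)
 *   xx xx xx xx  .long __FILE__  (data)
 * Returns false if p does not start with ud2 or is too short. */
static bool decode_bug_site(const uint8_t *p, size_t len, struct bug_site *out)
{
    if (len < 8 || p[0] != 0x0f || p[1] != 0x0b)
        return false;
    /* x86 is little-endian, so assemble each field from its low byte up. */
    out->line = (uint16_t)(p[2] | (p[3] << 8));
    out->file_ptr = (uint32_t)p[4] | ((uint32_t)p[5] << 8) |
                    ((uint32_t)p[6] << 16) | ((uint32_t)p[7] << 24);
    return true;
}
```

For the bytes 0f 0b 2a 01 78 56 34 c0, decode_bug_site reports line 298 and file pointer 0xc0345678.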
Absolute memory addresses. Kernel code is sometimes written to contain references to absolute memory addresses in the kernel image.
One example is the switch_to macro in Linux, which performs context switching. As shown in Figure 2-5, it stores the value of the program counter (EIP) from which a process will resume when it next receives the processor. The disassembly in Figure 2-6 shows that the macro produces an absolute memory address ($0xc0112806, the address of label 1) pointing into the original function image. A process switched out while executing a relocated copy would therefore resume in the original image, which breaks the redirection.
Figure 2-5. Definition of switch_to macro in Linux 2.4
#define switch_to(prev,next,last) do { \
    asm volatile(... \
        "movl $1f,%1\n\t" /* save EIP */ \
        "pushl %4\n\t" \
        "jmp __switch_to\n" \
        "1:\t" \
        "popl %%ebp\n\t" \
        ...);
Figure 2-6. Disassembled output of switch_to macro in Linux 2.4
0xc01127f1  movl   $0xc0112806,0x274(%eax)
0xc01127fb  pushl  0x274(%esi)
0xc0112801  jmp    0xc0107120 <__switch_to>
0xc0112806  pop    %ebp
DynAMOS inspects the original function and detects literals that happen to correspond to absolute memory addresses within the function's own memory image. It acts conservatively and warns the user about such uses; if the user permits it, DynAMOS adjusts the address for the relocated image.
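A conservative literal scan of this kind might look like the following C sketch. It is a simplification under stated assumptions: it checks every byte offset rather than only decoded instruction operands, so it can report false positives, which fits the policy of warning the user rather than silently patching. The names find_self_addresses and rebase_literal are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read and write 32-bit values in x86 (little-endian) byte order. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Conservative scan (hypothetical helper): record every byte offset whose
 * 4-byte little-endian value lies inside the function's own image
 * [start, start + len). Checking every offset, not just decoded operands,
 * may yield false positives, hence warning the user instead of patching. */
static size_t find_self_addresses(const uint8_t *code, size_t len, uint32_t start,
                                  size_t *hits, size_t max_hits)
{
    size_t n = 0;
    for (size_t off = 0; off + 4 <= len && n < max_hits; off++) {
        uint32_t v = load_le32(code + off);
        if (v >= start && (size_t)(v - start) < len)
            hits[n++] = off;
    }
    return n;
}

/* If the user permits it, rebase one literal so it points to the same
 * offset in the relocated image at new_start. */
static void rebase_literal(uint8_t *code, size_t off, uint32_t old_start,
                           uint32_t new_start)
{
    uint32_t v = load_le32(code + off);
    store_le32(code + off, v - old_start + new_start);
}
```

Run over the 22 bytes of Figure 2-6 (hand-assembled here, treating the fragment as an image starting at 0xc01127f1), the scan reports a single hit at offset 6, the imm32 operand 0xc0112806 of the movl. Rebasing to a hypothetical new image at 0xd0000000 rewrites it to 0xd0000015.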
Outbound branches. Kernel code is sometimes written to contain outbound branches from one function to a common section of code and a branch back that continues execution.
For example, Linux produces semaphore and locking code in this way. In the down semaphore acquire operation shown in Figure 2-7, an atomic decrement of the counter checks whether the semaphore is already in use. In the common case where it is not, execution falls through for improved performance. If the semaphore is in use, __down_failed is called. The LOCK_SECTION_START and LOCK_SECTION_END macros insert section directives that place the uncommon call to __down_failed in a separate memory area. Figure 2-8 shows the corresponding assembly produced for this sequence when down is used in pipe_release. On a failure to acquire the semaphore, execution jumps to code placed after the Letext label (Letext+0x3c). There, a wrapper call to __down_failed is issued, followed by a jmp back to the main pipe_release code.
Figure 2-7. down semaphore acquire operation in Linux 2.4
static inline void down(struct semaphore * sem)
{
    __asm__ __volatile__(
        "# atomic down operation\n\t"
        LOCK "decl %0\n\t"     /* --sem->count */
        "js 2f\n"
        "1:\n"
        LOCK_SECTION_START("")
        "2:\tcall __down_failed\n\t"
        "jmp 1b\n"
        LOCK_SECTION_END
        :"=m" (sem->count)
        :"c" (sem)
        :"memory");
}
Figure 2-8. Disassembled output of down semaphore acquire operation in Linux 2.4
    __asm__ __volatile__(
 281:   mov    %eax,%ecx
 283:   decl   0x6c(%esi)
 286:   js     158b <Letext+0x3c>
    down(PIPE_SEM(*inode));
    PIPE_READERS(*inode) -= decr;
 28c:   mov    0x108(%esi),%eax
 292:   sub    %edx,0x18(%eax)
 ...
0000154f <Letext>:
 ...
 158b:  call   158c <Letext+0x3d>
 1590:  jmp    28c <pipe_release_v2+0x1c>
 ...
If a function containing such outbound branches is relocated, the jmp back from the out-of-line code still targets the original function image, diverting execution flow from the cloned function back to its original. DynAMOS detects such outbound wrapper jumps and relocates their call/jmp pairs to the end of the new function image, adjusting their relative offsets.
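Adjusting the offsets amounts to recomputing x86 rel32 displacements, which are measured from the end of each branch instruction. The following C sketch assumes the clone is a byte-for-byte copy of the original body with the call/jmp pair appended at clone_start + orig_len, a 6-byte js rel32 in the body (as the addresses 0x286 to 0x28c in Figure 2-8 suggest), and 5-byte call and jmp rel32 instructions in the pair. The function names and the pair_fixup type are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* x86 rel32 operands are relative to the end of the branch instruction. */
static int32_t rel32_for(uint32_t insn_addr, uint32_t insn_len, uint32_t target)
{
    int64_t d = (int64_t)target - ((int64_t)insn_addr + insn_len);
    assert(d >= INT32_MIN && d <= INT32_MAX);
    return (int32_t)d;
}

/* Addresses inside the original image map to the same offset in the clone;
 * anything else (e.g. __down_failed) is left unchanged. */
static uint32_t map_to_clone(uint32_t addr, uint32_t orig_start,
                             uint32_t orig_len, uint32_t clone_start)
{
    if (addr >= orig_start && addr - orig_start < orig_len)
        return clone_start + (addr - orig_start);
    return addr;
}

/* New displacements for one relocated out-of-line call/jmp pair
 * (hypothetical type). */
struct pair_fixup {
    int32_t js_rel;   /* js in the cloned body -> relocated call */
    int32_t call_rel; /* call -> slow-path routine such as __down_failed */
    int32_t jmp_rel;  /* jmp -> continuation inside the clone */
};

/* Assumes: the pair is appended at clone_start + orig_len, the conditional
 * branch is a 6-byte "js rel32", and call/jmp are 5-byte rel32 forms. */
static struct pair_fixup relocate_pair(uint32_t orig_start, uint32_t orig_len,
                                       uint32_t clone_start, uint32_t js_orig,
                                       uint32_t slow_path, uint32_t cont_orig)
{
    uint32_t call_at = clone_start + orig_len;
    uint32_t jmp_at = call_at + 5;
    uint32_t js_at = map_to_clone(js_orig, orig_start, orig_len, clone_start);
    struct pair_fixup f;
    f.js_rel = rel32_for(js_at, 6, call_at);
    f.call_rel = rel32_for(call_at, 5, slow_path);
    f.jmp_rel = rel32_for(jmp_at, 5,
                          map_to_clone(cont_orig, orig_start, orig_len, clone_start));
    return f;
}
```

Taking pipe_release_v2 to start at 0x270 (since 0x28c is pipe_release_v2+0x1c) and assuming an original length of 0x100, a clone at 0x2000 and __down_failed at 0x5000, the js becomes +0xe4, the call +0x2efb and the jmp back -0xee. The jmp therefore lands at 0x201c inside the clone rather than at 0x28c in the original.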
Indirect outbound branches. Compilers sometimes produce code that uses an indirection table when C switch statements or multiple if statements are used.
For example, when gcc compiles a C function containing a switch statement with more than 4 case options, it produces code that uses an indirection table. The table is dereferenced by an indirect jump to determine the next value of the program counter. An example of such a table is found in do_signal in Linux. DynAMOS inspects indirect branches to detect indirection tables. The inspection of a table stops at the first 4-byte entry whose target address falls outside the range of the original function. The identified indirection tables are relocated to the end of the new function image.
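The table scan and relocation can be sketched in C as follows. The sketch assumes the table has already been located from the indirect jump and models it as an array of 32-bit absolute targets; max_entries is a hypothetical safety bound, and the function names are illustrative rather than taken from DynAMOS.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the leading table entries whose target lies inside the original
 * function [fn_start, fn_start + fn_len). The scan stops at the first
 * 4-byte entry that points outside. max_entries is a hypothetical bound. */
static size_t jump_table_entries(const uint32_t *table, size_t max_entries,
                                 uint32_t fn_start, uint32_t fn_len)
{
    size_t n = 0;
    while (n < max_entries && table[n] >= fn_start &&
           table[n] - fn_start < fn_len)
        n++;
    return n;
}

/* Writes the relocated copy of the table (placed at the end of the new
 * image), rebasing every target to the same offset in the new function. */
static void relocate_jump_table(const uint32_t *old_table, uint32_t *new_table,
                                size_t n, uint32_t fn_start, uint32_t new_start)
{
    for (size_t i = 0; i < n; i++)
        new_table[i] = old_table[i] - fn_start + new_start;
}
```

In a complete relocator, the indirect jump in the new image would presumably also need its table operand pointed at the relocated copy; that step is left out of this sketch.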
Multiple entrypoints. Compilers sometimes produce functions that contain multiple entrypoints as an optimization.
For example, icc produces multiple entrypoints for some functions as a result of an interprocedural constant propagation optimization. Such functions are split into a prologue and a core, for a total of two symbols per function. As shown in Figure 2-9, the prologue code <filp_open> is the safe entrypoint, which moves function arguments from the stack into registers. It does not contain a ret and falls through to the core <filp_open.>. Callers invoke either the prologue or the core accordingly.
Figure 2-9. Multiple entrypoints produced by icc
c01505f8 <filp_open>:
c01505f8:  8b 44 24 04    mov    0x4(%esp,1),%eax
c01505fc:  8b 54 24 08    mov    0x8(%esp,1),%edx
c0150600:  8b 4c 24 0c    mov    0xc(%esp,1),%ecx
c0150604 <filp_open.>:
c0150604:  55             push   %ebp
c0150605:  53             push   %ebx
c0150606:  83 ec 30       sub    $0x30,%esp
...
c015064f:  5d             pop    %ebp
c0150650:  c3             ret
For functions with multiple entrypoints, execution flow through the core (e.g. <filp_open.>) can still be redirected by applying the trampoline. However, the compiler must produce newer versions of the function that are again split into a prologue and a core. The prologue (e.g. <filp_open>) needs to be at least 6 bytes long for the trampoline to be safely applied to it. For a smaller prologue, the local bounce allocation technique outlined in GILK, which DynAMOS does not yet implement, can be applied.
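The resulting placement decision can be sketched in C. The sketch assumes the prologue spans exactly the bytes between the two symbols, as it does for filp_open in Figure 2-9, and uses the 6-byte minimum stated above; plan_prologue and its enum are hypothetical names. Since the core can be redirected with the trampoline as described, only the prologue needs a decision.

```c
#include <stdint.h>

#define TRAMPOLINE_MIN_LEN 6u /* minimum prologue size stated in the text */

enum prologue_plan {
    PATCH_PROLOGUE,     /* prologue is long enough for the trampoline */
    NEEDS_LOCAL_BOUNCE  /* too short: a GILK-style local bounce would be needed */
};

/* prologue_sym and core_sym are the two symbols of one icc function, e.g.
 * <filp_open> and <filp_open.>. The prologue has no ret and falls through
 * into the core, so its length is the distance between the symbols. */
static enum prologue_plan plan_prologue(uint32_t prologue_sym, uint32_t core_sym)
{
    uint32_t prologue_len = core_sym - prologue_sym;
    return prologue_len >= TRAMPOLINE_MIN_LEN ? PATCH_PROLOGUE
                                              : NEEDS_LOCAL_BOUNCE;
}
```

For filp_open the prologue is 0xc0150604 - 0xc01505f8 = 12 bytes, so under these assumptions the trampoline can be applied to it directly.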