2.4. Safe Execution Flow Redirection

Execution flow redirection is accomplished by installing a trampoline at the beginning of a function. The 6-byte trampoline redirects execution to a redirection handler that offers adaptive execution, as described in Section 2.3. Trampolines may only be installed on function images that are at least as large as the 6-byte trampoline. Local processor interrupts are disabled during trampoline installation, which guarantees that the installation is safe from interrupt handlers on that processor.
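As a sketch, assume the 6-byte trampoline is the 32-bit x86 indirect jump `jmp *abs32` (opcode FF 25 followed by a 4-byte absolute address). This is one 6-byte encoding consistent with the indirect branch addressing mentioned in the Note at the end of this section, not necessarily the one actually used. Under that assumption, installation amounts to a size check followed by writing 6 bytes over the start of the function image. The helper `install_trampoline` and the separate memory slot holding the handler's address are hypothetical. This version patches an ordinary byte buffer, so the interrupt masking is only indicated in a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed encoding: x86 "jmp *abs32" = opcode FF, ModRM 0x25 (FF /4 with a
 * 32-bit absolute displacement), followed by the 4-byte address. 6 bytes. */
#define TRAMPOLINE_SIZE 6

/* Hypothetical helper: write a 6-byte indirect-jump trampoline over the
 * first bytes of a function image. slot_addr is the 32-bit address of a
 * memory slot that holds the redirection handler's address.
 * Returns 0 on success, -1 if the image is too small to hold it. */
int install_trampoline(uint8_t *image, size_t image_size, uint32_t slot_addr)
{
    uint8_t tramp[TRAMPOLINE_SIZE];

    assert(image != NULL);
    if (image_size < TRAMPOLINE_SIZE)
        return -1;                          /* image cannot hold the trampoline */

    tramp[0] = 0xFF;                        /* opcode: JMP r/m32 (with /4) */
    tramp[1] = 0x25;                        /* ModRM: mod=00 reg=4 rm=101 -> disp32 */
    tramp[2] = (uint8_t)(slot_addr);        /* little-endian slot address */
    tramp[3] = (uint8_t)(slot_addr >> 8);
    tramp[4] = (uint8_t)(slot_addr >> 16);
    tramp[5] = (uint8_t)(slot_addr >> 24);

    /* In a kernel implementation this copy would be bracketed by
     * local_irq_save()/local_irq_restore() so that no interrupt handler on
     * this processor can execute a partially written trampoline. */
    memcpy(image, tramp, TRAMPOLINE_SIZE);
    return 0;
}
```

For example, `install_trampoline(img, sizeof img, 0x12345678u)` on a 16-byte image writes `FF 25 78 56 34 12` and leaves the remaining bytes untouched, while a 4-byte image is rejected.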

Installation is also guaranteed to be safe from sleeping processes: a process cannot block midway through the code that the trampoline overwrites. Processes in an operating system block when they explicitly call the kernel scheduler, so a sleeping process resumes at the return address of that call. Figure 2-1 shows an example of a routine that immediately calls the Linux scheduler on entry.

Figure 2-1. A routine that immediately blocks through a call to the scheduler.

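#include <linux/sched.h>  /* declares schedule() */

void functionB(void);     /* defined elsewhere */

void functionA(void)
{
  schedule();             /* blocks immediately on entry */
  functionB();
}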
	    
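In the disassembly in Figure 2-2, the first 6 bytes are dedicated to frame management and the next 5 bytes are consumed by the call to the scheduler. The total of 11 bytes exceeds the 6 bytes overwritten by the trampoline: the trampoline replaces only the frame-management instructions, and a process sleeping in schedule() resumes at offset 11 (address 0x5d), outside the overwritten region.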

Figure 2-2. Disassembled output of a routine that immediately blocks through a call to the scheduler.

00000052 <functionA>:
  52:   55                push   %ebp
  53:   89 e5             mov    %esp,%ebp
  55:   83 ec 08          sub    $0x8,%esp
  58:   e8 fc ff ff ff    call   59 <functionA+0x7>
  5d:   e8 fc ff ff ff    call   5e <functionA+0xc>
  6c:   c9                leave
  6d:   c3                ret
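The byte accounting for Figure 2-2 can be checked mechanically. The sketch below takes the instruction lengths read off the listing, computes the offset at which each instruction starts, and tests the condition the safety argument relies on: the return address of the call to schedule(), where a sleeping process resumes, must lie at or beyond the end of the bytes the trampoline overwrites. The names `fig22_lengths`, `insn_offset`, and `resume_point_is_safe` are hypothetical and only illustrate the arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TRAMPOLINE_SIZE 6

/* Lengths of functionA's instructions as listed in Figure 2-2:
 * push %ebp (1), mov %esp,%ebp (2), sub $0x8,%esp (3),
 * call schedule (5), call functionB (5). */
static const size_t fig22_lengths[] = { 1, 2, 3, 5, 5 };

/* Byte offset from the function entry at which instruction idx starts. */
size_t insn_offset(const size_t *lengths, size_t idx)
{
    size_t off = 0;

    assert(lengths != NULL);
    for (size_t i = 0; i < idx; i++)
        off += lengths[i];
    return off;
}

/* A process blocked inside a call resumes just past that call instruction.
 * Redirection is safe for it if that resume point lies at or beyond the end
 * of the overwritten bytes, i.e. on an instruction the trampoline left intact. */
bool resume_point_is_safe(size_t resume_offset, size_t tramp_size)
{
    return resume_offset >= tramp_size;
}
```

Here `insn_offset(fig22_lengths, 3)` is 6, so the trampoline ends exactly where the call to schedule() begins, and `insn_offset(fig22_lengths, 4)` is 11 (0x52 + 11 = 0x5d), which passes the check.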
	    
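Let's explore an extreme case, shown in Figure 2-3, where the routine is highly optimized. Here the call to the scheduler is the first instruction and consumes 5 bytes, so a process sleeping in schedule() resumes at offset 5. The 6-byte trampoline would also overwrite the first byte of the following instruction, but the call could still be safely replaced by a 5-byte trampoline that used direct addressing. In all of these cases, on a fixed instruction-length architecture (e.g. PowerPC) the trampoline overwrites the call to the scheduler as a single whole instruction, which ensures safe instrumentation in sleeping processes.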

Figure 2-3. Disassembled output of a highly optimized version of the routine using the gcc arguments -Os (optimize for size) and -fomit-frame-pointer (omit the frame pointer).

  24:   e8 fc ff ff ff    call   25 <functionA+0x1>
  29:   e8 fc ff ff ff    call   2a <functionA+0x6>
  33:   e9 fc ff ff ff    jmp    34 <function_D+0xb>
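A 5-byte trampoline with direct addressing could, on 32-bit x86 as in the figures, use the near jump `jmp rel32`: opcode E9 followed by a signed 32-bit displacement measured from the end of the 5-byte instruction. The sketch below is an assumption about how such a trampoline might be encoded, not the implementation the Note below refers to; `encode_direct_jmp` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x86 near jump with 32-bit displacement: opcode E9 + rel32 = 5 bytes. */
#define DIRECT_TRAMPOLINE_SIZE 5

/* Hypothetical helper: encode a direct "jmp rel32" that, placed at address
 * site, transfers control to target. The displacement is relative to the
 * address just past the instruction (site + 5). */
void encode_direct_jmp(uint8_t out[DIRECT_TRAMPOLINE_SIZE],
                       uint32_t site, uint32_t target)
{
    /* Unsigned subtraction wraps modulo 2^32, which yields the correct
     * two's-complement displacement for backward jumps as well. */
    uint32_t rel = target - (site + DIRECT_TRAMPOLINE_SIZE);

    assert(out != NULL);
    out[0] = 0xE9;                  /* opcode: JMP rel32 */
    out[1] = (uint8_t)(rel);        /* little-endian displacement */
    out[2] = (uint8_t)(rel >> 8);
    out[3] = (uint8_t)(rel >> 16);
    out[4] = (uint8_t)(rel >> 24);
}
```

With functionA at 0x24 in Figure 2-3, a process sleeping in schedule() resumes at 0x29, offset 5: exactly where a 5-byte `jmp rel32` ends, but inside the 6 bytes the current trampoline overwrites. For a hypothetical handler at 0x1000, the displacement is 0x1000 - 0x29 = 0xFD7, giving the bytes `E9 D7 0F 00 00`.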
	    

Note

There are plans to replace indirect branch addressing with direct branch addressing. This will reduce the performance overhead caused by the processor's poor trace-cache engine performance (branch mispredictions) and will also reduce the trampoline size to 5 bytes.

Note

There are plans to support safe execution flow redirection in multi-processor systems.