Secure, Precise, and Fast Digital Side Channel Defenses
Ashay Rane, Calvin Lin, and Mohit Tiwari

Introduction

Information leakage can cause great damage, ranging from weakening of national security, theft of intellectual property, and complete loss of privacy. While techniques like encryption, access control, and information flow analysis disable many information leaks, they do not prevent all information leaks. Specifically, side channels make it possible to infer secrets even when the system implements encryption, access control, and information flow analysis.

function square_and_multiply(m, s, n) {
z := 1
  for bit b in s from left to right {
    if b == 1 {
      z := (m * z * z) mod n
    } else {
      z := (z * z) mod n
    }
  }

  return z
}
Sample (square and multiply) code for computing m^s mod n.

To understand side channels, consider the above code snippet, which is found in old implementations of cryptographic libraries. The code loops over bits of the secret key, and performs one of two operations depending on the value of the bit. There are several ways in which the secret key in this code can be recovered using side channels. If the adversary observes that the instruction pointer landed in the Taken path of the branch, then she can infer that the secret bit is equal to one. Similarly, time difference in the execution of the Taken and Not Taken paths also revels the secret bit value, because the Taken path includes an extra multiplication. In fact, this block of code has been shown to leak the secret using not just time and instruction pointer, but also the cache, power draw, fault injection, branch predictor, electromagnetic radiation, and sound.

As defenses, architects have presented modified caches, branch predictors, instruction set architectures, memory controllers, scratchpad memories, and memory chip designs. Not only do these solutions require changes in several parts of the microarchitecture, they also may not compose well, thus leaving the system vulnerable if multiple such solutions are employed at the same time. For instance, consider a solution that normalizes execution time along two branch paths by adding NOP instructions. This solution closes the timing channel but introduces different instruction counts along the two branch paths. On the other hand, the addition of dummy instructions to normalize instruction counts will likely result in different execution time along the two branch paths, since NOP instructions may have a different execution latency than the multiply instruction. Solutions may also require disabling compiler and microarchitecture optimizations, thus discarding decades of prior research in computer architecture and compilers. As a result, previous solutions reflect a patchwork approach that cures the symptoms but not the root cause of the problem.

Our Solution: The Escort Compiler

We present the Escort compiler, which accepts non-secure source code and which generates secure, precise, and fast object code. Specifically, code transformed using the Escort compiler closes digital side channels i.e. side channels that carry information over discrete bits) without changing the program output, and while still ensuring compatibility with several compiler and micro-architectural optimizations.

Escort addresses the root cause of the problem (that source code create variations in program execution), thus enabling a single solution to close a broad class of side channels. The key insight behind our solution's ability to close a broad class of side-channel attacks is that in a compiler, control flows and data flows capture a broad range of variations in program execution. Thus, if a solution normalizes the adversary's view of the program's control flows and data flows, then that solution can close a wide range of side channels. As a bonus, a compiler-based solution enables a general-purpose defense that can be easily targeted to different hardware platforms. Escort thus enables the microarchitecture to focus on correctness and performance, while offloading the responsibility of targeted secure execution to the compiler.

Transforming Control Flow.

Transforming if-Statements. As shown in code snippet above, if-statements, which translate to conditional branch instructions, can create differences in program execution, thus revealing secrets. For each conditional branch instruction that evaluates a predicate p, the Escort compiler associates the predicate p with all basic blocks that execute if the predicate is true, and it associates the predicate not p with all basic blocks that execute if the predicate is false. These predicates are used to control the side effects of instruction execution. Here, side effects are modifications to the program state or any observable interaction, including memory accesses, exceptions, function calls, or I/O. Escort controls all side effects except for I/O statements. The Escort compiler then topologically sorts the basic blocks, thus preserving control dependences, before assembling the code into a single basic block with branch instructions removed. Assembling instructions into a single basic block ensures that the same sequence of instructions is executed, regardless of the branch predicate, while the basic block predicates ensure that the transformed program produces the same results as the original program.

Transforming Loops. Some loops introduce timing channels because their trip counts depend on secret values. The Escort compiler transforms such loops using predictive mitigation. The loop body executes as many times as the smallest power of 2 that is greater than or equal to the loop trip count. For instance, if the actual loop trip count is 10, then the loop body is executed 16 times. The basic block predicate ensures that dummy iterations do not cause side effects.

Unfortunately, this naive approach can still leak information. For instance, if two distinct inputs cause the loop to iterate 10 and 1000 times respectively, the transformed codes will iterate 16 and 1024 times respectively — a large difference that may create timing variations. To mitigate this problem, Escort allows the programmer to manually specify the minimum and maximum loop trip counts using programmer annotations.

void cond_store(uint8_t pred, uint32_t* ptr,
    uint32_t new_val) {
  uint32_t result, old_value = *ptr;

  __asm__ volatile (
    "mov    %2, %0;"
    "test   %1, %1;"
    "cmovz  %3, %0;"
    "test   %2, %2;"
    : "=r" (result)
    : "r" (pred), "r" (new_val),
    "r" (old_val)
    : "cc"
  );

  *ptr = result;
}
Conditional memory access operation that does not leak information over digital side channels.

Controlling Side Effects. To ensure proper memory access side effects, the Escort compiler replaces each store instruction with a conditional memory access operation (see above code snippet) that guards the memory access with the predicate, so memory is only updated when the predicate is true. Since pointer dereferences may also leak secrets, Escort replaces pointer dereferences with conditional memory accesses to all elements of the points-to set of each pointer. Escort handles side effects from function calls by propagating the predicate from the calling function to the callee. Thus, each user-defined function is given an additional argument that represents the predicate of the call site's basic block. The callee ensures correct handling of side effects by ANDing its own predicates with the caller's predicate. Finally, Escort requires that the programmer disable floating-point exceptions (e.g. using feclearexcept()). For integer exceptions, Escort replaces abnormal operands with benign operands (e.g. Escort prevents integer division-by-zero by replacing a zero divisor with a non-zero divisor).

Transforming Array Accesses

Array index values reveal secrets as well. For instance, if the adversary observes that accesses to array[0] and array[secret_index] result in accesses to locations 10 and 50, then the adversary knows that secret_index = 40. To eliminate such information leaks, the Escort compiler transforms each array access into a linear sweep over the entire array, which hides from the adversary the address of the program’s actual array index.

Indeed, the transformed code is expensive, but this approach is feasible because caches and prefetchers in modern processors dramatically reduce the cost of sweeping over arrays.

Transforming Variable-Latency Instructions

The latency of certain instructions on x86 processors depends on operand values, thus leaking operand information over the timing side channel. Among these instructions, floating-point multiplication, division, square-root, and type-conversion instructions exhibit wide variations in execution time. Our analysis of 94 x86 SSE and SSE2 instructions reveals that subnormal operands induce the longest latency.

Escort's fixed-time floating-point operations utilize SIMD lanes in x86 SSE and SSE2 instructions. Our solution (1) loads genuine and dummy (subnormal) inputs in spare SIMD lanes of the same input register, (2) invokes the desired SIMD instruction, and (3) retains only the result of the operation on the genuine inputs. Our tests confirm that the resulting SIMD instruction exhibits the worst-case latency, with negligible variation in running time (standard deviation is at most 1.5% of the mean).

Security Evaluation

We now evaluate Escort’s defense against the timing channel attack by Andrysco et al. on the Firefox web browser. The attack reconstructs a two-color image inside a victim web page using only the timing side channel in floating-point operations. The attack convolves the given secret image with a matrix of subnormal values — a step that is timed using high resolution Javascript timers. By comparing the measured time to a threshold, each pixel is classified as either black or white, effectively reconstructing the secret image. We integrate Escort into Firefox’s convolution code and re-run the timing attack. The results (see above figure) show that Escort successfully disables the timing attack.

Impact

Beyond making available better tools for defending against side channels, we believe that our research makes three important contributions that translate directly to long-term research impact in defending against side channels.

Protecting General-Purpose Programs. By embedding the defense into a compiler, our research demonstrates a way to close side channels in a wide variety of programs, unlike many prior solutions that target specific cryptographic hardware. At the same time, since the compiler-generated code does not contain any secret information, the transformed code can be released to the adversary without any impact on security. Our solution thus demonstrates a way to close digital side channels while adhering to Kerckhoff's principle that “a cryptosystem should be secure even if everything about the system, except the key, is public knowledge”.

Single Solution that Closes Multiple Side Channels. A large body of prior work (cache line locking, pre-loading cache entries, random caches, scratchpad memory, ORAM-based memory controllers, fuzzing timing information, private caches, static caches, and periodic cache cleansing) closes isolated side channels, sometimes even negating each others' defenses. Instead, by leveraging the knowledge of source code behavior creating differences in side-channel observations, our solution addresses the root cause instead of addressing the symptoms.

Simplified Microarchitecture Design. Microarchitectural defenses for closing side channels include a wide variety of changes, thus requiring substantial modifications to the processor. A unified software-level defense simplifies the microarchitecture design, while still guaranteeing security.

Compatibility of Security and Optimizations. Prior solutions rely on disabling various compiler and microarchitectural optimizations for ensuring security. Our research shows that security and optimizations are not completely in conflict, and does so by decoupling the optimization policies from secrets. Consequently, both transformed as well as non-transformed code can continue to reap benefits of decades of research in compiler and microarchitectural optimizations.