Cache Attacks and Countermeasures:


1 Cache Attacks and Countermeasures: the Case of AES
Dag Arne Osvik, Adi Shamir and Eran Tromer
Presented by Ophir Arbiv

2 Sources
[1] Dag Arne Osvik, Adi Shamir and Eran Tromer, Cache Attacks and Countermeasures: the Case of AES (Extended Version), 2005.
[2] theory.csail.mit.edu/~tromer/SKC2006/cache-skc06.ppt: Tromer's lecture at MIT.
[3] Adi Shamir's lecture at the Weizmann Institute.

3 AES – Advanced Encryption Standard
With DES becoming outdated, NIST announced a competition to design a successor.
Evaluation criteria:
  Security: effort required for cryptanalysis, mathematical basis of the algorithm, security issues raised by the public.
  Cost: licensing requirements, computational efficiency, memory requirements.
  Algorithm & implementation characteristics: flexibility, hardware & software suitability, simplicity.
21 algorithms were received. NIST selected Rijndael, proposed by Dr. Vincent Rijmen and Dr. Joan Daemen from Belgium, as the proposed AES algorithm.
Properties:
  Symmetric block cipher.
  Based on finite field mathematics.
  128-bit data blocks and key sizes of 128, 192 and 256 bits.
  Resistant to known attacks.
Note: Rijndael itself supports a wide range of both block sizes and key sizes (any multiple of 32 bits within its range). AES implements the Rijndael cipher with a slight change: the block size is fixed at 128 bits and three key sizes are defined, 128, 192 or 256 bits. Key expansion is performed by the Rijndael key schedule.

4 AES Algorithm
The mathematical description of the algorithm:
[Figure: mathematical description of AES; the image and its source are not preserved in this transcript.]

5 Efficient Implementation
Originally proposed in the Rijndael specification and now widely used.
Uses precomputed table lookups.
Tables, key and round implementation: [formulas not preserved in this transcript]
Each round uses table lookups, 16 XORs and 12 shifts.
The tables occupy 4 KB (×2).
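Since the slide's table formulas did not survive, here is a minimal Python sketch of the standard T-table formulation that this style of implementation is usually based on. It is an illustration under assumptions, not the code analyzed in the talk: the byte packing (most significant byte first) and the helper names `gf_mul`, `sbox_entry`, `rotr8` and `round_column` are choices made for this example.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def sbox_entry(x):
    """AES S-box: multiplicative inverse followed by the affine transform."""
    inv = 0
    if x:
        inv = 1
        for _ in range(254):          # x^254 is the inverse of x in GF(2^8)
            inv = gf_mul(inv, x)
    s = inv
    for shift in range(1, 5):         # XOR with the four left-rotations of inv
        s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return s ^ 0x63

SBOX = [sbox_entry(x) for x in range(256)]

def rotr8(word):
    """Rotate a 32-bit word right by one byte."""
    return ((word >> 8) | (word << 24)) & 0xFFFFFFFF

# T0[x] packs the MixColumns column (2*s, s, s, 3*s) for s = SBOX[x];
# T1..T3 are byte rotations of T0.
T0 = [(gf_mul(s, 2) << 24) | (s << 16) | (s << 8) | gf_mul(s, 3) for s in SBOX]
T1 = [rotr8(w) for w in T0]
T2 = [rotr8(w) for w in T1]
T3 = [rotr8(w) for w in T2]

# Four tables of 256 four-byte entries each.
TABLE_BYTES = 4 * 256 * 4

def round_column(x, j, round_key_word):
    """Column j of the next state, from state bytes x[0..15] in column-major order.

    Four key-dependent table lookups and four XORs per column.
    """
    return (T0[x[4 * j]] ^ T1[x[(4 * j + 5) % 16]]
            ^ T2[x[(4 * j + 10) % 16]] ^ T3[x[(4 * j + 15) % 16]]
            ^ round_key_word)
```

Every lookup index in `round_column` is a state byte, which in the first round is simply a plaintext byte XORed with a key byte. That data-dependent indexing is what the rest of the talk exploits.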

6 AES - summary
During the AES selection, only branch statements, arithmetic and data-dependent shifts were considered vulnerable to timing attacks.
The proposed algorithms were widely analyzed.
Apparently because Rijndael uses only table lookups, XORs and shifts, NIST declared it "not vulnerable to timing attacks".
The NSA declared that AES with any key length can protect US Government classified data up to Secret; Top Secret data requires AES-192 or AES-256.
No direct attacks are known as of today.
Expected to remain the standard for 20+ years.

7 Side Channels
A side channel is any observable information emitted as a byproduct of the physical implementation of the cryptosystem.
[Figure: a cipher with key K maps plaintext to ciphertext while leaking information through side channels; source not preserved.]

8 Side Channels
Examples of side channels:
Power consumption (simple, differential…)
Time
Heat
Acoustic noise (keyboards…)
Cache
Faults (power glitch, jitter…)
Electromagnetic radiation
Visual

9 Why Cache Analysis?
Annual speed increase: CPU core 60% (until recently); main memory 7-9%.
Typical latency: main memory 50-150 ns; CPU core 0.3 ns.
→ A timing gap between the CPU core and main memory, bridged by the cache.

10 Cache Attacks
The cache is a shared resource.
=> The cache state affects, and is affected by, all processes.
=> Crosstalk between processes is possible.
Process memory is usually protected, but information about the memory access patterns of other processes still leaks.
Cache attacks are pure software attacks, and therefore very cheap.
A process with no special privileges and (in some variants) no interaction with the cryptographic code can attack the cryptographic code.

11 How Does the Cache Work?
The cache holds copies of aligned blocks of B bytes of main memory (memory blocks).
When a memory access instruction is processed, the memory cell is looked up in the cache first.
On a cache miss, the full memory block is copied into one of the W cache lines of the appropriate cache set (out of S possible sets).
[Figure: a memory access going from DRAM memory blocks (B bytes each) into cache sets (W cache lines of B bytes each).]
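The mapping from an address to a cache set can be sketched as follows. This is a simplified model that assumes the set is selected by the memory block number modulo S. The parameter values (B = 64, S = 64) and the function names are assumptions chosen for this example, not figures from the talk.

```python
def memory_block(address, line_size=64):
    """Number of the aligned B-byte memory block that contains the address."""
    return address // line_size

def cache_set_index(address, line_size=64, num_sets=64):
    """Cache set that the address's memory block maps to (block number mod S)."""
    return memory_block(address, line_size) % num_sets

def table_cache_sets(base, entries=256, entry_size=4, line_size=64, num_sets=64):
    """Sorted list of cache sets covered by a lookup table stored at address base."""
    return sorted({cache_set_index(base + i * entry_size, line_size, num_sets)
                   for i in range(entries)})
```

For example, under these assumptions a 1 KB table (256 entries of 4 bytes) stored at address 0x1000 covers 16 consecutive cache sets, one per memory block of the table.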

12 What Does a Cached Table Look Like?
[Figure: the S-box table in DRAM and the cache sets its memory blocks map to.]

13 Notation
δ: the cache line size B divided by the size of each table entry (usually 64/4 = 16).
<y>: the memory block of index y in Tl. <y> = <z> iff, when used as lookup indices into the same table Tl, y and z would cause access to the same memory block.
Qk(p,l,y) = 1 iff the AES encryption of the plaintext p under the encryption key k accesses the memory block of index y in Tl at least once (during the 10 rounds).
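The notation can be expressed as a few small Python helpers. This is only a sketch: representing the accesses of one encryption as a list of (table number, index) pairs is an assumption made for illustration, since a real attacker only sees a noisy measurement of Q.

```python
DELTA = 16  # table entries per memory block: 64-byte cache line / 4-byte entry

def block(y, delta=DELTA):
    """<y>: the memory block of index y within its table."""
    return y // delta

def same_block(y, z, delta=DELTA):
    """True iff <y> = <z>, i.e. both indices fall in the same memory block."""
    return block(y, delta) == block(z, delta)

def Q(trace, l, y, delta=DELTA):
    """1 if the trace accesses the memory block of index y in table T_l, else 0.

    trace is a hypothetical list of (table number, index) lookups
    made during one encryption.
    """
    return int(any(t == l and same_block(i, y, delta) for t, i in trace))
```

With δ = 16, two indices share a memory block exactly when their high nibbles agree, so `same_block(0x40, 0x4F)` holds while `same_block(0x4F, 0x50)` does not.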

14 Cache Attacks on AES
The efficient implementation of the algorithm has a major weakness: the lookup addresses depend strongly on the encryption key (the secret).
Therefore, knowing which memory cells were accessed lets us extract the key (suppose, for example, a bus attack).
Usually the attacker has no access to the bus, and memory is partitioned and protected by the OS.
The solution: the cache is a shared resource through which we can learn about the memory access patterns of other processes.

15 Synchronous Attacks
The plaintext or ciphertext is known.
The attacker can operate synchronously with the encryption (on the same processor).
Examples: sending data packets through a secure channel in a VPN; Linux's dm-crypt and cryptoloop services.
The attack scheme:
1. Obtain a set of random samples Mk(p,l,y) of the predicate Qk(p,l,y).
2. Perform off-line cryptanalysis:
   Guess small parts of the key.
   Use the guess to predict memory accesses.
   Check whether the predictions are consistent with the collected data.

16 One Round Attack
Consider one of the memory accesses in the 1st round: T0[p0 ⊕ k0].
Given a candidate value k'0 and samples of Q(p,l,y), the useful samples are those that fulfill <p0 ⊕ k'0> = <y>.
If <k'0> = <k0>, then for all useful samples <p0 ⊕ k0> = <p0 ⊕ k'0> = <y>, so T0[p0 ⊕ k0] accesses the memory block of y => Q(p,l,y) = 1.
Otherwise <p0 ⊕ k0> ≠ <p0 ⊕ k'0> = <y>, so this lookup does not touch the memory block of y => Q(p,l,y) = 0. But there are 35 more "random" accesses to T0, so Q(p,l,y) = 0 is actually observed only with probability (1 - 1/16)^35 ≈ 0.104.
A few hundred (!) random samples suffice to eliminate all bad candidates.
The high nibble (log2(256/δ) = 4 bits) of every key byte is extracted (64 bits in total).
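The elimination argument can be tried out on an idealized model. The sketch below assumes a noise-free oracle that reports exactly which memory blocks of T0 one encryption touched: the first-round lookup plus 35 uniformly random ones, as in the slide's approximation. The oracle, the secret byte 0xA7 and all function names are hypothetical and exist only for this simulation.

```python
import random

DELTA = 16  # table entries per memory block (64-byte line / 4-byte entry)

def accessed_blocks(p0, k0, rng, extra_accesses=35):
    """Idealized oracle: the T0 memory blocks touched by one encryption."""
    blocks = {(p0 ^ k0) // DELTA}                    # first-round lookup T0[p0 ^ k0]
    blocks.update(rng.randrange(256) // DELTA        # later rounds, modeled as random
                  for _ in range(extra_accesses))
    return blocks

def recover_high_nibble(samples):
    """Keep the high-nibble candidates whose predicted block is accessed in every sample."""
    survivors = set(range(16))
    for p0, blocks in samples:
        survivors = {hi for hi in survivors
                     if (p0 ^ (hi << 4)) // DELTA in blocks}
    return survivors

def run_demo(secret_k0=0xA7, num_samples=300, seed=1):
    """Collect samples for random plaintext bytes and return the surviving candidates."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        p0 = rng.randrange(256)
        samples.append((p0, accessed_blocks(p0, secret_k0, rng)))
    return recover_high_nibble(samples)

# Chance that a wrongly predicted block is untouched by the 35 other accesses.
MISS_PROBABILITY = (1 - 1 / 16) ** 35
```

A wrong candidate survives a sample unless its predicted block happens to be untouched, which occurs with probability of about 0.104, so a few hundred samples leave only the true high nibble. The low nibble never changes the predicted block, which is why this round reveals only 4 bits per key byte.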

17 Full Key Extraction
With a straightforward method, we managed to narrow down each byte of the key to δ possibilities (in the common case this means extracting half the key: 64 bits).
This is all the information available from the 1st-round accesses.
By moving to the 2nd round and taking advantage of the non-linearity of the S-box, we can extract the full key!

18 Two Round Attack
The equations for the 2nd-round lookup indices are easily derived from the Rijndael specification, where s(·) denotes the Rijndael S-box function and • denotes multiplication over GF(256). For example,
x2^(1) = s(p0 ⊕ k0) ⊕ s(p5 ⊕ k5) ⊕ 2 • s(p10 ⊕ k10) ⊕ 3 • s(p15 ⊕ k15) ⊕ s(k15) ⊕ k2
is used as an index to T2.
The only relevant unknowns in the index are the low nibbles of k0, k5, k10 and k15 (2^16 candidates).
A candidate can be tested as before:
1. Predict this lookup according to the guess {k'0, k'5, k'10, k'15} (the low nibble of k2 is irrelevant).
2. Identify the useful samples, i.e., those where y is in the same memory block as the prediction.
3. Check whether Q(p,l,y) = 1 for all useful samples.
There are 3 more accesses of this special form, with disjoint sets of relevant low nibbles.
=> Full key recovery using ~2000 random samples.
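As a sanity check, the index of the 2nd-round T2 lookup discussed above, x2^(1) = s(p0 ⊕ k0) ⊕ s(p5 ⊕ k5) ⊕ 2 • s(p10 ⊕ k10) ⊕ 3 • s(p15 ⊕ k15) ⊕ s(k15) ⊕ k2, can be compared against a direct computation of the first AES-128 round. The sketch below follows the standard round structure (SubBytes, ShiftRows, MixColumns, AddRoundKey). The function names are chosen for this example, and the state is assumed to be a list of 16 bytes in column-major order.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def sbox_entry(x):
    """AES S-box: multiplicative inverse followed by the affine transform."""
    inv = 0
    if x:
        inv = 1
        for _ in range(254):
            inv = gf_mul(inv, x)
    s = inv
    for shift in range(1, 5):
        s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return s ^ 0x63

SBOX = [sbox_entry(x) for x in range(256)]
MIX = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]  # MixColumns matrix

def round_key_1(k):
    """First round key of AES-128, derived from the 16 key bytes k."""
    temp = [SBOX[k[13]] ^ 0x01, SBOX[k[14]], SBOX[k[15]], SBOX[k[12]]]
    rk = []
    for c in range(4):
        temp = [t ^ k[4 * c + r] for r, t in enumerate(temp)]
        rk.extend(temp)
    return rk

def first_round(p, k):
    """State after round 1, i.e. the lookup indices used in round 2."""
    s = [SBOX[a ^ b] for a, b in zip(p, k)]              # initial AddRoundKey + SubBytes
    shifted = [s[r + 4 * ((c + r) % 4)]                  # ShiftRows
               for c in range(4) for r in range(4)]
    rk = round_key_1(k)
    out = []
    for c in range(4):
        col = shifted[4 * c: 4 * c + 4]
        for r in range(4):
            acc = rk[4 * c + r]                          # AddRoundKey
            for j in range(4):
                acc ^= gf_mul(MIX[r][j], col[j])         # MixColumns
            out.append(acc)
    return out

def x2_formula(p, k):
    """Closed-form expression for byte 2 of the state after round 1."""
    return (SBOX[p[0] ^ k[0]] ^ SBOX[p[5] ^ k[5]]
            ^ gf_mul(2, SBOX[p[10] ^ k[10]]) ^ gf_mul(3, SBOX[p[15] ^ k[15]])
            ^ SBOX[k[15]] ^ k[2])
```

In the closed form, k2 enters only through a final XOR. Changing its low nibble therefore changes only the low nibble of the index and never the memory block, which is why the guess needs no low nibble for k2.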

19 Measurement Methods
How do we obtain the measurements Mk(p,l,y) of the predicate Qk(p,l,y)?
Inter-process crosstalk can be exploited in two ways:
1. The effect of the cache on the encryption (timing).
2. The effect of the encryption on the cache.

20 Measurement Method 1: Evict + Time
1. Make sure the tables are cached.
2. Evict one cache set.
3. Time an encryption and see if it is slow.
[Figure: attacker memory and table T0 in DRAM, mapped into the cache.]

21 Results
Weakness of this method: it relies on timing the triggered encryption => it is very sensitive to variations in the operation (noise due to scheduling, branches, cache contention, etc.).
The authors were able to extract the key only from an artificial service (using the OpenSSL libraries), but not from real services.

22 Measurement Method 2: Prime + Probe
Try to discover the set of memory blocks read by the encryption a posteriori, by examining the state of the cache after the encryption.
1. Completely evict the tables from the cache (by filling it with attacker memory).
2. Trigger a single encryption.
3. Access the attacker memory again and see which cache sets are slow.
[Figure: attacker memory and the S-box table in DRAM, mapped into the cache.]
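The three Prime + Probe steps can be sketched on a toy cache model. Everything here is an assumption made for illustration: the `ToyCache` class (set-associative with LRU replacement), the sizes, the addresses, and the fact that the simulator reports hits and misses directly. A real attacker would infer misses from access latency instead.

```python
from collections import OrderedDict

class ToyCache:
    """Set-associative cache with LRU replacement (simulation only)."""

    def __init__(self, num_sets=64, ways=4, line_size=64):
        self.num_sets, self.ways, self.line_size = num_sets, ways, line_size
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, address):
        """Touch an address; return True on a cache hit, False on a miss."""
        block = address // self.line_size
        lines = self.sets[block % self.num_sets]
        hit = block in lines
        if hit:
            lines.move_to_end(block)          # mark as most recently used
        else:
            if len(lines) >= self.ways:
                lines.popitem(last=False)     # evict the least recently used line
            lines[block] = True
        return hit

ATTACKER_BASE = 0x100000   # hypothetical attacker buffer, aligned to set 0
VICTIM_TABLE = 0x2000      # hypothetical address of the victim's lookup table
ENTRY_SIZE = 4

def attacker_address(cache, set_index, way):
    """Address in the attacker buffer that maps to the given cache set."""
    return ATTACKER_BASE + (way * cache.num_sets + set_index) * cache.line_size

def prime_and_probe(cache, victim_indices):
    """Return the cache sets in which the victim's lookups evicted attacker lines."""
    # 1. Prime: fill every cache set with attacker memory.
    for s in range(cache.num_sets):
        for w in range(cache.ways):
            cache.access(attacker_address(cache, s, w))
    # 2. Trigger: the victim performs its table lookups.
    for i in victim_indices:
        cache.access(VICTIM_TABLE + i * ENTRY_SIZE)
    # 3. Probe: re-access attacker memory (in reverse order) and count misses per set.
    touched = set()
    for s in range(cache.num_sets):
        misses = sum(not cache.access(attacker_address(cache, s, w))
                     for w in reversed(range(cache.ways)))
        if misses:
            touched.add(s)
    return touched
```

In this toy layout the victim table starts at cache set 0, so lookups at indices 0x00, 0x13, 0xA7 and 0xA8 show up as misses in sets 0, 1 and 10. Those are exactly the memory blocks of the accessed indices.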

23 Results
Yields more information (4 · 256/δ) from a single encryption.
Not a timing attack on the encryption! The attacker times a simple operation that it performs itself.
Insensitive to timing variance in the encryption code path (crucial for effective attacks on complicated systems).
No real need to trigger the encryption: the attacker can wait until it happens by itself.

24 Synchronous Attacks - summary
For known plaintext and a synchronous attacker, there are two measurement methods.
Results:
OpenSSL libraries on Athlon 64:
  Evict + Time: 500,000 encryptions. (why?)
  Prime + Probe: 300 encryptions (16K on P4E).
Real Linux dm-crypt:
  Prime + Probe: 800 write operations, 65 ms + 3 sec of offline analysis.
Variants …

25 Asynchronous Attack
Someone runs encryption computations using a secret key.
The attacker process runs on the same CPU at (roughly) the same time.
Assume the plaintext/ciphertext has a non-uniform (conditional) distribution: English text, formatted data, headers, ciphertext gleaned from the wire.
Examples: just about any use of crypto on a multi-user system.
Finding the key: compare two distributions:
1. Measured memory access statistics.
2. Predicted memory access statistics, under the given plaintext distribution and the key hypothesis.
Find the key that yields the best correlation.

26 Countermeasures
The authors consider numerous countermeasures, e.g.:
Avoiding memory accesses
Alternative lookup tables
Data-oblivious memory access pattern
Cache state normalization and process blocking
Disabling cache sharing
Static or disabled cache
Dynamic table storage
Hiding the timing
None of them solves the problem completely.
Some are architecture- or application-dependent, or require changes to the system.
None is at once secure, efficient (or cheap) and generic.
=> Case-specific solutions, probably a combination of the methods.

27 Thank you! Questions?

28 Homework
1. What is the difference between the Evict+Time and Prime+Probe measurement methods?
2. In the case of known ciphertext, how would the attack change? (Hint: it can be more efficient; see the paper.)
3. Why is a first-round synchronous attack able to extract only half of the key bits (on a δ = 16 platform)?
4. Does adding a random delay to the encryption algorithm improve immunity against synchronous attacks? Why?

