Architectures for Secure Processing Matt DeVuyst.


1 Architectures for Secure Processing Matt DeVuyst

2 Research Exam - Matt DeVuyst 2 Introduction. [Figure: system diagram with labels: pipeline and functional units, L1-I, L1-D, L2, L3, CPU, memory bus, main memory, line of trust, points of attack, and the Encryption/Decryption Unit (EDU) with its keys.]

3 Introduction. What kind of security? Protection of what? For whom? From whom or what? This work focuses on: protection of execution (process data and control flow); protection for users, copyright holders, and software companies; protection from all other processes (including the OS) and from physical attack. This work focuses on general-purpose security mechanisms for general-purpose computers.

4 Introduction. This research takes an architecture-centric approach. Cryptographic algorithms may be used, but their security will not be proven here. Focus is given to hardware support; software and the OS reap the benefits.

5 Goals. Execution privacy: process control flow and data are exposed only to the CPU. Execution integrity: process control flow and data cannot be tampered with without detection.

6 Outline. Execution Privacy; Execution Integrity; Proposed Architectures; Conclusions and Open Questions.

7 Outline. Execution Privacy (Naïve Encryption, One Time Pad (OTP) Encryption, Improved OTP Encryption); Execution Integrity; Proposed Architectures; Conclusions and Open Questions.

8 Naïve Encryption. [Figure: CPU, Encryption/Decryption Unit, memory bus, and memory, with the data labeled as plaintext or ciphertext at each stage.]

9 A Closer Look at the Encryption/Decryption Unit. AES in Cipher Block Chaining (CBC) mode.

10 Issues with Naïve Encryption. Encryption and decryption sit on the critical path, so performance suffers. The scheme is not secure against all attacks.

11 Why Naïve Encryption Is Not Secure. Encrypting the data only: the pattern of values in the ciphertext over time is identical to the pattern in the plaintext. [Figure: plaintext and ciphertext timelines.]

12 Why Naïve Encryption Is Not Secure. Encrypting the data together with its address: for writes to the same address, the pattern is still identical. [Figure: plaintext and ciphertext timelines.]

13 Why Naïve Encryption Has Poor Performance. Stores are effectively immune to encryption latency because of the store buffer. Loads that miss in the cache cost the time to bring in the data from memory plus the time to decrypt that data. [Figure: timeline of a load instruction, with memory latency followed by decryption latency.]

14 Outline. Execution Privacy (Naïve Encryption, One Time Pad (OTP) Encryption*, Improved OTP Encryption); Execution Integrity; Proposed Architectures; Conclusions and Open Questions. * Suh, et al. “Efficient Memory Integrity Verification and Encryption for Secure Processors” – MIT; Yang, et al. “Fast Secure Processor for Inhibiting Software Piracy and Tampering” – UC Riverside

15 How OTP Encryption/Decryption Works. [Figure: encryption and decryption datapaths.]

16 Why OTP Encryption Is Secure. The pad is produced by encrypting the address and a sequence number, so writes to the same address express no pattern in the ciphertext. [Figure: plaintext and ciphertext timelines.]
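
The idea on slides 15 and 16 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: HMAC-SHA256 stands in for the AES engine, and the names `KEY`, `make_pad`, `encrypt`, and `decrypt` are hypothetical, not taken from the cited papers.

```python
import hashlib
import hmac

KEY = b"on-chip-secret"  # hypothetical key; in the real design it never leaves the CPU


def make_pad(key: bytes, address: int, seq: int, length: int) -> bytes:
    """Derive a pad from (address, sequence number).

    Assumption: HMAC-SHA256 is used as a stand-in for the AES engine.
    """
    pad = b""
    chunk = 0
    while len(pad) < length:
        msg = address.to_bytes(8, "big") + seq.to_bytes(8, "big") + chunk.to_bytes(4, "big")
        pad += hmac.new(key, msg, hashlib.sha256).digest()
        chunk += 1
    return pad[:length]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, address: int, seq: int, plaintext: bytes) -> bytes:
    # The pad depends only on (key, address, seq), never on the data itself.
    return xor_bytes(plaintext, make_pad(key, address, seq, len(plaintext)))


decrypt = encrypt  # XOR with the same pad undoes the encryption

# Two writes of the same value to the same address, with different sequence numbers.
block = b"same value written twice........"
c1 = encrypt(KEY, 0x1000, seq=1, plaintext=block)
c2 = encrypt(KEY, 0x1000, seq=2, plaintext=block)
```

Because the sequence number changes on every write, `c1` and `c2` differ even though the plaintext and address are the same, which is the property slide 16 relies on.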

17 How OTP Encryption Solves the Performance Problem. Decryption is done in parallel with the load, which takes it off the critical path. The key to how it works: the pad computation cannot depend on the ciphertext. [Figure: timeline of a load instruction, with memory latency and decryption latency overlapping, followed by an XOR.]

18 The Achilles’ Heel of OTP Encryption. The sequence number must be available long before the memory access completes. A sequence number is associated with every cache-block-sized chunk of memory, so not all sequence numbers can be kept on chip. One solution: a sequence number cache. [Figure: timeline of a load instruction showing when the sequence number must be available, with memory latency and decryption latency overlapping, followed by an XOR.]
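
The timelines on slides 13, 17, and 18 amount to simple latency arithmetic, sketched below. The cycle counts and the function names are hypothetical, chosen only to make the comparison concrete.

```python
def naive_load_latency(mem: int, decrypt: int) -> int:
    """Naive scheme: decryption starts only after the ciphertext arrives."""
    return mem + decrypt


def otp_load_latency(mem: int, pad_gen: int, seq_wait: int = 0, xor: int = 1) -> int:
    """OTP scheme: pad generation overlaps the memory access, then one XOR.

    seq_wait models how long the CPU waits for the sequence number before
    pad generation can begin (0 if it is already on chip).
    """
    return max(mem, seq_wait + pad_gen) + xor


MEM, CRYPTO = 200, 80  # hypothetical cycle counts, for illustration only

print(naive_load_latency(MEM, CRYPTO))              # 280
print(otp_load_latency(MEM, CRYPTO))                # 201
print(otp_load_latency(MEM, CRYPTO, seq_wait=MEM))  # 281
```

The last line shows the Achilles' heel: if the sequence number itself has to come from memory, pad generation is back on the critical path and the benefit disappears.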

19 Outline. Execution Privacy (Naïve Encryption, One Time Pad (OTP) Encryption, Improved OTP Encryption*); Execution Integrity; Proposed Architectures; Conclusions and Open Questions. * Shi, et al. “High Efficiency Counter Mode Security Architecture Via Prediction and Precomputation” – Georgia Tech

20 Solutions to the OTP Problem. Prediction and precomputation: predict the sequence number; precompute the pad; when the memory access completes, compare the real sequence number with the predicted one. If they match, use the precomputed pad. If they don’t match, compute the real pad.

21 Prediction and Precomputation. [Figure: a page table entry in the TLB holds the page's root sequence number; each cache block in the page of memory has its own real sequence number.]

22 Prediction and Precomputation. Initially, all sequence numbers are set to the page’s root sequence number. [Figure: TLB page table entry, page of memory, cache block.]

23 Prediction and Precomputation. Writes increment the sequence numbers. [Figure: TLB page table entry, page of memory, cache block.]

24 Prediction and Precomputation. Predictions start with the root sequence number. Pads for several candidate sequence numbers are generated during the memory latency of the load instruction. [Figure: TLB page table entry, page of memory, cache block, and a load timeline with three pad generations.]
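
A minimal sketch of prediction and precomputation as described on slides 20 to 24. The assumptions are that HMAC-SHA256 stands in for the pad generator, that guesses run upward from the page's root sequence number, and that a depth of 4 is a hypothetical choice; none of the names come from Shi et al.

```python
import hashlib
import hmac

KEY = b"on-chip-secret"  # hypothetical on-chip key


def make_pad(key: bytes, address: int, seq: int) -> bytes:
    # Assumption: HMAC-SHA256 stands in for the AES-based pad generator.
    msg = address.to_bytes(8, "big") + seq.to_bytes(8, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()


def precompute_pads(key: bytes, address: int, root_seq: int, depth: int) -> dict:
    """Pads for root_seq, root_seq + 1, ..., generated while memory is busy."""
    return {seq: make_pad(key, address, seq) for seq in range(root_seq, root_seq + depth)}


def pad_for_load(key: bytes, address: int, root_seq: int, real_seq: int, depth: int = 4):
    """Return (pad, predicted); predicted is True if a precomputed pad was usable."""
    guesses = precompute_pads(key, address, root_seq, depth)  # overlaps the memory access
    if real_seq in guesses:
        return guesses[real_seq], True  # hit: only the XOR remains
    return make_pad(key, address, real_seq), False  # miss: pay the full pad latency


hit_pad, hit = pad_for_load(KEY, 0x2000, root_seq=10, real_seq=12)
miss_pad, miss = pad_for_load(KEY, 0x2000, root_seq=10, real_seq=20)
```

The second call illustrates the problem raised on the next slide: a block written often enough ends up with a sequence number beyond the prediction depth.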

25 Better Prediction and Precomputation. Problem: frequently updated data will have a sequence number beyond the prediction depth. One solution: reset the root sequence number. Use a prediction history for each page. This is called “adaptive prediction”. [Figure: TLB page table entry holding the root sequence number and a prediction history.]

26 Better Prediction and Precomputation. Problem: frequently updated data will have a sequence number beyond the prediction depth. Another solution: record the past difference (diff) between the root sequence number and the real sequence number. On a subsequent load, make predictions around root sequence number + diff. This is called “context-based prediction”. [Figure: TLB page table entry holding the root sequence number, and a diff register.]
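
Context-based prediction can be sketched as a single register that remembers the last observed offset. The class and method names are hypothetical, and starting the prediction window exactly at root + diff is an assumption; the slide only says predictions are made "around" that value.

```python
class ContextPredictor:
    """Keeps one diff register: the last observed (real_seq - root_seq)."""

    def __init__(self, depth: int = 4):
        self.depth = depth
        self.diff = 0

    def predict(self, root_seq: int) -> range:
        # Assumption: the window starts at root_seq + diff and extends upward.
        start = root_seq + self.diff
        return range(start, start + self.depth)

    def update(self, root_seq: int, real_seq: int) -> None:
        # Called once the real sequence number has arrived from memory.
        self.diff = real_seq - root_seq


predictor = ContextPredictor(depth=4)
before = list(predictor.predict(10))   # [10, 11, 12, 13]
predictor.update(root_seq=10, real_seq=57)
after = list(predictor.predict(10))    # [57, 58, 59, 60]
```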

27 Prediction and Precomputation. Accuracy: “adaptive prediction” is reported to be about 80% accurate*; “context-based prediction” is reported to be close to 100% accurate* (though this has not yet been verified by other researchers). Cost: a larger TLB; a slightly larger memory footprint and bandwidth requirement. Conclusion: using OTP with optimizations, decryption latency is almost completely hidden. * Shi, et al. “High Efficiency Counter Mode Security Architecture Via Prediction and Precomputation” – Georgia Tech

28 Outline. Execution Privacy; Execution Integrity (Basic Execution Integrity, Cached Hash Trees, Log Hashing); Proposed Architectures; Conclusions and Open Questions.

29 Execution Integrity – Basic Idea. On a write: a keyed hash is taken over the data and address; the data and hash are stored in memory. On a read: the data and hash are returned from memory; the hash is recomputed; the computed hash is compared with the returned hash. [Figure: the CPU and memory exchange Data and Hash(Key, Data, Address).]

30 Security Analysis of Basic Execution Integrity. Arbitrary data cannot be introduced because the hash is keyed and an attacker does not know the key. Data stored at one address cannot be substituted for data stored at another address because hashing the data along with the address binds the two. But a replay attack is possible because an attacker may replay stale data previously stored at the given address.
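
The basic scheme and its replay weakness can be shown with a small sketch. The assumptions are that HMAC-SHA256 serves as the keyed hash and that a Python dict models untrusted memory; the function names are hypothetical.

```python
import hashlib
import hmac

KEY = b"on-chip-secret"  # hypothetical on-chip key


def tag(key: bytes, address: int, data: bytes) -> bytes:
    # Keyed hash over (address, data); hashing the address binds data to its location.
    return hmac.new(key, address.to_bytes(8, "big") + data, hashlib.sha256).digest()


def store(memory: dict, key: bytes, address: int, data: bytes) -> None:
    memory[address] = (data, tag(key, address, data))


def load(memory: dict, key: bytes, address: int) -> bytes:
    data, stored_tag = memory[address]
    if not hmac.compare_digest(stored_tag, tag(key, address, data)):
        raise ValueError("integrity violation")
    return data


memory = {}  # models untrusted off-chip memory
store(memory, KEY, 0x10, b"balance=100")
stale = memory[0x10]                 # attacker records the old (data, hash) pair
store(memory, KEY, 0x10, b"balance=0")
memory[0x10] = stale                 # replay: attacker restores the old pair
replayed = load(memory, KEY, 0x10)   # check passes and stale data is returned
```

Copying the pair to a different address is caught, because the recomputed hash includes the new address. The replayed pair is accepted because nothing in the hash distinguishes old data from new.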

31 Outline. Execution Privacy; Execution Integrity (Basic Execution Integrity, Cached Hash Trees*, Log Hashing); Proposed Architectures; Conclusions and Open Questions. * Blum, et al. “Checking the Correctness of Memories” – UC Berkeley; Gassend, et al. “Caches and Hash Trees for Efficient Memory Integrity Verification” – MIT; Merkle, “Protocols for Public Key Cryptosystems”

32 Cached Hash Trees. Fundamental problem with basic hashing: hashes verify data integrity, but nothing verifies the integrity of the hashes. A solution, cached hash trees: keyed hashes are taken over data; keyed hashes are taken over those hashes, and so on. Problem: the memory requirement of the hashes. Solution: hashes are stored in memory and cached on-chip along with data.

33 Cached Hash Trees. How it works: a tree is built; leaf nodes contain data; intermediate nodes are hashes; the root hash is kept in a special register on-chip; hashes are only updated when necessary. [Figure: tree of data blocks and hashes.]
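
A minimal hash-tree sketch follows, without the caching. The assumptions are a binary tree, a power-of-two number of blocks, and HMAC-SHA256 as the keyed hash; the cited designs pack several hashes into each hash block, which is not modeled here. All names are hypothetical.

```python
import hashlib
import hmac

KEY = b"on-chip-secret"  # hypothetical on-chip key


def h(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def build_tree(key: bytes, blocks: list) -> list:
    """Return levels[0] = hashes of the data blocks, ..., levels[-1] = [root].

    Assumes len(blocks) is a power of two.
    """
    level = [h(key, b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        level = [h(key, level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def verify_block(key: bytes, blocks: list, levels: list, index: int, trusted_root: bytes) -> bool:
    """Recompute the path from one block up to the root.

    blocks and levels come from untrusted memory; only trusted_root is on chip.
    """
    node = h(key, blocks[index])
    for level in levels[:-1]:
        sibling = level[index ^ 1]
        node = h(key, node + sibling) if index % 2 == 0 else h(key, sibling + node)
        index //= 2
    return hmac.compare_digest(node, trusted_root)


blocks = [b"block0", b"block1", b"block2", b"block3"]
levels = build_tree(KEY, blocks)
root = levels[-1][0]  # kept in the special on-chip register
ok = verify_block(KEY, blocks, levels, 2, root)
```

A replayed block no longer verifies, because the root on chip reflects the most recent write. Caching intermediate hashes on chip shortens the walk: verification can stop at the first ancestor that is already in the cache.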

34 Cached Hash Tree Consistency. Invariant: if a node is in memory, then its parent hash is consistent with it (whether that hash is in the cache or in memory).

35 Cached Hash Tree Consistency. If data is written, hashes are not updated. [Figure: data, parent hash, and grandparent hash in cache and in memory, each hash marked as up-to-date or outdated.]

36 Cached Hash Tree Consistency. If dirty data is evicted, the parent hash in the cache is updated. [Figure: data, parent hash, and grandparent hash in cache and in memory, each hash marked as up-to-date or outdated.]

37 Cached Hash Tree Consistency. If a hash block is evicted, the parent hash in the cache is updated. [Figure: data, parent hash, and grandparent hash in cache and in memory, each hash marked as up-to-date or outdated.]

38 Cached Hash Tree Consistency. If data is loaded and the parent hash is not in the cache: 1. The parent is loaded and verified against the grandparent. 2. Then the data is verified against its parent. [Figure: data, parent hash, and grandparent hash in cache and in memory, each hash marked as up-to-date or outdated.]

39 Performance Analysis of Cached Hash Trees. Common case, hash nodes are in the cache: data evictions only require an update to a cached node; data loads only require one hash check against a cached node. Uncommon case, hash nodes are not in the cache: data evictions require hash node loads; data loads require hash node loads. Passing hash nodes across the memory bus cuts into the bandwidth available for data. Hash nodes occupy space in the cache.

40 Outline. Execution Privacy; Execution Integrity (Basic Execution Integrity, Cached Hash Trees, Log Hashing*); Proposed Architectures; Conclusions and Open Questions. * Suh, et al. “Efficient Memory Integrity Verification and Encryption for Secure Processors” – MIT

41 Log Hashing. Key insight: verification is not necessary at every load; verification is necessary before application results are produced. Implication: the constraint of constant, vigilant verification can be relaxed.

42 Log Hashing – Incremental Multiset Hashes*. Incremental: the keyed hash is not recomputed over all data, just over the additional data. Multiset: duplicate items are allowed; the multiplicity of items is significant; the order of items is not. [Figure: a hash engine and a comparison of the hashes of Set 1 and Set 2.] * Clarke, et al. “Incremental Multiset Hash Functions and Their Application to Memory Integrity Checking” – MIT

43 Log Hashing. Two incremental multiset hashes are used. WriteHash hashes everything evicted from the cache (written to memory). ReadHash hashes everything fetched from memory. Counters are associated with memory operations, and keyed hashes are taken over (data, counter, address).

44 Log Hashing. Three phases of operation. Initialization: all program data is written out to memory (and hashed into WriteHash). Run-time: the hash of every eviction is added to WriteHash; the hash of every fetch is added to ReadHash. Verification: all data not in the cache is brought in (and hashed into ReadHash); ReadHash is compared to WriteHash. If they are equal, integrity was maintained. Otherwise, integrity was violated.
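
The three phases can be sketched with a toy model. The assumptions: the multiset hash is a simplified additive construction (a sum of HMAC-SHA256 values modulo 2^256) in the spirit of Clarke et al., not their exact function; every eviction writes the block back, so each fetch is matched by exactly one write; a dict models untrusted memory. The class and method names are hypothetical.

```python
import hashlib
import hmac

KEY = b"on-chip-secret"  # hypothetical on-chip key
MOD = 2 ** 256


def item_hash(key: bytes, address: int, data: bytes, counter: int) -> int:
    msg = address.to_bytes(8, "big") + counter.to_bytes(8, "big") + data
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


class LogHashedMemory:
    def __init__(self, key: bytes, initial: dict):
        self.key = key
        self.memory = {}     # untrusted: address -> (data, counter)
        self.cached = set()  # addresses currently held on chip
        self.counter = 0
        self.write_hash = 0
        self.read_hash = 0
        for address, data in initial.items():  # initialization phase
            self._write(address, data)

    def _write(self, address: int, data: bytes) -> None:
        self.counter += 1
        self.memory[address] = (data, self.counter)
        self.write_hash = (self.write_hash + item_hash(self.key, address, data, self.counter)) % MOD

    def fetch(self, address: int) -> bytes:
        """Run-time: bring a block on chip (caller must not fetch a cached block)."""
        data, counter = self.memory[address]
        self.read_hash = (self.read_hash + item_hash(self.key, address, data, counter)) % MOD
        self.cached.add(address)
        return data

    def evict(self, address: int, data: bytes) -> None:
        """Run-time: write a block back with a fresh counter."""
        self.cached.remove(address)
        self._write(address, data)

    def verify(self) -> bool:
        """Verification: read everything not on chip, then compare the two hashes."""
        for address in list(self.memory):
            if address not in self.cached:
                self.fetch(address)
        return self.read_hash == self.write_hash


good = LogHashedMemory(KEY, {0: b"a", 1: b"b"})
good.fetch(0)
good.evict(0, b"a2")
good_ok = good.verify()

bad = LogHashedMemory(KEY, {0: b"a"})
stale = bad.memory[0]   # attacker records the old block
bad.fetch(0)
bad.evict(0, b"a2")
bad.memory[0] = stale   # replay attack
bad_ok = bad.verify()
```

In the replay case the stale item is read twice and the newest item is never read, so the two multisets differ and verification fails, but only at verification time.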

45 Log Hashing – Initialization. [Figure: WriteHash, ReadHash, memory, and cache during initialization.]

46 Log Hashing – Run-time. [Figure: WriteHash, ReadHash, memory, and cache at run-time.]

47 Log Hashing – Run-time. [Figure: WriteHash, ReadHash, memory, and cache at run-time.]

48 Log Hashing – Verification. [Figure: WriteHash, ReadHash, memory, and cache during verification; WriteHash is compared with ReadHash for equality.]

49 Log Hashing – Performance Analysis. Initialization and verification are very costly. We assume initialization and verification are rare occurrences. Run-time hashing itself adds no overhead, but loading and storing sequence numbers in memory incurs a small performance overhead and a small memory overhead.

50 Log Hashing – Security Analysis. If data is tampered with in memory, ReadHash will differ from WriteHash. If data is returned from memory more times than it was written (as in a replay attack), the multiplicity of hashed items will not match, so the hashes will not match. If data is returned from memory out of order, the hashes won’t match because different counter values will have been hashed in with the data.

51 Outline. Execution Privacy; Execution Integrity; Proposed Architectures (XOM, SP, AEGIS, SENSS); Conclusions and Open Questions.

52 Proposed Architectures. XOM*: first of its kind; uses naïve privacy and integrity mechanisms; slow and vulnerable to attack; keys for encryption and hashing are burned on chip. * Lie, et al. “Architectural Support for Copy and Tamper Resistant Software” – Stanford

53 Proposed Architectures. Secret-Protected (SP)*: based on XOM; uses naïve privacy and integrity mechanisms; decouples the secret from the device. The key is stored on chip only during a user session. User keys are separate from the device secret (hardware key) and are transferable. * Lee, et al. “Architecture for Protecting Critical Secrets in Microprocessors” – Princeton

54 Proposed Architectures. AEGIS*: uses OTP encryption for privacy, without performance optimizations such as prediction and precomputation; uses cached hash trees for integrity; hides device keys using Physical Random Functions (PUFs). The circuit timing characteristics of a particular chip are unique and practically impossible to measure, and PUFs exploit this to create device secrets. * Suh, et al. “Design and Implementation of the AEGIS Single-Chip Secure Processor Using Physical Random Functions” – MIT

55 Proposed Architectures. SENSS*: uses a simple OTP encryption scheme like AEGIS; uses a cached hash tree scheme like AEGIS; adds support for multiprocessor systems. Each device has its own key. A combination of Cipher Block Chaining and One Time Pad mode encryption is used for cache-to-cache transfers. * Zhang, et al. “SENSS: Security Enhancement to Symmetric Shared Memory Multiprocessors” – UTD

56 Outline. Execution Privacy; Execution Integrity; Proposed Architectures; Conclusions and Open Questions.

57 Conclusions – OTP. Execution privacy is solved by OTP encryption (with optimizations). It is secure against all system-level attacks and against physical attacks (outside the processor). It has almost no performance cost.

58 Conclusions – Cached Hash Trees. Cached hash trees are secure against all known attacks, but they have potentially poor performance. No research has been done to stress test them. Performance is bad when the hash tree is not in the cache, so a large working set or a pathological access pattern may result in poor performance.

59 Conclusions – Log Hashing. Log hashing is secure as long as verification is done before results are used. How do you ensure that results are not consumed by users or other applications before verification (e.g., through disk writes, network writes, shared memory, screen refresh, or OS interrupts)? Log hashing has good performance if verification is infrequent. But what if it is not? How many applications require frequent verification?

60 Conclusions – Keys. Execution privacy and integrity require keys. Keys must be protected even if the OS is compromised or the device is under physical attack. How should keys be protected? Are Physical Random Functions really resistant to physical attack? How should device public keys be used? Should the manufacturer publish them? How should revocation work? What happens if ownership of the device is transferred?

61 Architectures for Secure Processing Matt DeVuyst

62 Cached Hash Tree Consistency. If dirty data is evicted and the parent hash is not in the cache: 1. The parent is loaded and verified against the grandparent. 2. Then the parent is updated. [Figure: data, parent hash, and grandparent hash in cache and in memory, each hash marked as up-to-date or outdated.]

