Hardware Support for Isolation
Krste Asanovic, U.C. Berkeley
MURI “DHOSA” Site Visit, April 28, 2011
Project overview (figure): research areas labeled Transformation, Hardware, System Architectures, and Web-Based Architectures, covering SVA, binary translation and emulation, formal methods, hardware support for isolation, dealing with malicious hardware, cryptographic secure computation, data-centric security, a secure browser appliance, and secure servers. Example goals: enforce properties on a malicious OS; prevent data exfiltration; enable complex distributed systems with resilience to hostile OSs.
Target Scenario (figure): trusted hardware runs a trusted hypervisor that hosts a normal execution environment (untrusted OS, noncritical app) and a secure execution environment (untrusted OS, desirable app, valuable data). Approved information flows are permitted; undesirable information leaks must be prevented.
Hardware Isolation Techniques
Fine-grain memory protection, dynamic information flow tracking, secure messaging, and timing isolation.
Modern Multicore Systems
Many shared resources: last-level cache interconnect, last-level cache capacity, and DRAM and I/O interconnects. (Figure: five tiles, each showing a CPU, L1, L2 bank, and DRAM, connected by an L2 interconnect and a DRAM & I/O interconnect.) All of these shared hardware resources can be used to build high-bandwidth timing-based covert channels.
Timing-Based Covert Channel Using a Shared Interconnect
The sending core modulates its traffic on the shared interconnect (e.g., by writing to a given memory location over the bus) to signal a covert “1” or “0”. The receiving core attempts to saturate the bus with its own requests and observes how much bandwidth is available to it. (Figure: timeline of writes by the sending and receiving cores; one time unit is the time to send one message on the interconnect.)
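The channel above can be made concrete with a toy model. This is a minimal sketch, assuming a hypothetical bus with 8 slots per signaling window and a work-conserving round-robin arbiter (the names `arbitrate`, `send_bits`, `decode`, and the threshold are illustrative, not from the talk): the sender claims every slot to signal “1” and stays silent for “0”, and the receiver, which always tries to saturate the bus, decodes by counting how many slots it was granted.

```python
SLOTS_PER_WINDOW = 8  # hypothetical number of bus slots per signaling window


def arbitrate(demand_a: int, demand_b: int, slots: int) -> tuple[int, int]:
    """Work-conserving round-robin arbiter: returns slots granted to (a, b)."""
    granted_a = granted_b = 0
    for i in range(slots):
        a_first = i % 2 == 0          # priority alternates every slot
        want_a = granted_a < demand_a
        want_b = granted_b < demand_b
        # An idle requester's slot goes to the other one (work conserving).
        if want_a and (a_first or not want_b):
            granted_a += 1
        elif want_b:
            granted_b += 1
    return granted_a, granted_b


def send_bits(bits: list[int]) -> list[int]:
    """Return the bandwidth the receiver observes in each window."""
    observed = []
    for bit in bits:
        sender_demand = SLOTS_PER_WINDOW if bit else 0   # "1" = flood the bus
        _, receiver_got = arbitrate(sender_demand, SLOTS_PER_WINDOW, SLOTS_PER_WINDOW)
        observed.append(receiver_got)
    return observed


def decode(observed: list[int], threshold: int = SLOTS_PER_WINDOW * 3 // 4) -> list[int]:
    """Low observed bandwidth means the sender was active, i.e. a covert 1."""
    return [1 if got < threshold else 0 for got in observed]
```

In this model the receiver sees 4 slots during a “1” window and all 8 during a “0” window, so `decode(send_bits(bits))` recovers `bits`. The point is that any work-conserving arbiter leaks one requester's load into the other's observed bandwidth.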
Multicore & Timing-Based Attacks
Concurrent execution on different cores and a high-performance on-chip interconnect allow higher-bandwidth covert channels. An attacker can also be trained quickly using timing gathered while running on a subset of the machine: e.g., calibrate using two unsecured cores, then use the channel between secured and unsecured cores.
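The calibration step can be sketched as follows, assuming the attacker records receiver bandwidth samples for known “0” and “1” symbols on two unsecured cores and then picks a decision threshold. Placing the threshold at the midpoint of the two means is an illustrative choice, and `calibrate_threshold` is a hypothetical helper, not a method described in the talk.

```python
from statistics import mean


def calibrate_threshold(samples_for_0: list[float], samples_for_1: list[float]) -> float:
    """Midpoint between mean observed bandwidth for known 0s and known 1s.

    Samples are gathered on two unsecured cores, where the attacker
    controls both ends of the channel and therefore knows each symbol.
    """
    return (mean(samples_for_0) + mean(samples_for_1)) / 2


def decode(observed: list[float], threshold: float) -> list[int]:
    """Apply the calibrated threshold to traffic seen on the real target."""
    return [1 if got < threshold else 0 for got in observed]
```

For example, training samples `[8, 7, 8]` for “0” and `[4, 5, 4]` for “1” give a threshold of about 6, which can then be reused when the sender runs on a secured core.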
Hardware Partitioning for Timing Isolation
A partition can contain its own cores, L1 and L2 cache capacity, off-chip DRAM bandwidth, and on-chip interconnect bandwidth allocation. (Figure: the same five CPU/L1/L2-bank/DRAM tiles on the L2 interconnect and DRAM & I/O interconnect, divided into Partition 1 and Partition 2.) How can we isolate partitions while retaining high efficiency?
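One way to picture a partition is as a resource vector that must not over-commit the machine. The sketch below is hypothetical: the `Partition` fields, expressing cache capacity as cache ways and bandwidth as fractions of the total, and the `fits` check are illustrative assumptions, not an interface from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    # Hypothetical resource vector; field names and units are illustrative only.
    name: str
    cores: int
    cache_ways: int          # share of L1/L2 capacity, expressed as cache ways
    dram_bw_fraction: float  # share of off-chip DRAM bandwidth
    noc_bw_fraction: float   # share of on-chip interconnect bandwidth


def fits(partitions: list[Partition], total_cores: int, total_ways: int) -> bool:
    """True if the partitions together do not over-commit any resource."""
    return (
        sum(p.cores for p in partitions) <= total_cores
        and sum(p.cache_ways for p in partitions) <= total_ways
        and sum(p.dram_bw_fraction for p in partitions) <= 1.0
        and sum(p.noc_bw_fraction for p in partitions) <= 1.0
    )
```

For instance, a 2-core secure partition and a 3-core insecure partition, each with half of 16 ways and half of each bandwidth, fit on a 5-core machine; adding any further core-holding partition does not.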
Interconnect Partitioning
Off-chip DRAM bandwidth and on-chip interconnect bandwidth are among the most expensive resources in the system. Static partitioning would require dedicated, and hence under-utilized, interconnect. Multiplexing the interconnect among multiple requesters increases system efficiency, but enables timing attacks.
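The efficiency argument can be checked with a little arithmetic. Assuming demands and shares are hypothetical fractions of total interconnect bandwidth, a static split caps each requester at its share and wastes the rest, while work-conserving multiplexing serves any demand up to full capacity:

```python
def static_throughput(demands: list[float], shares: list[float]) -> float:
    """Each requester is capped at its fixed share; any unused share is wasted."""
    return sum(min(d, s) for d, s in zip(demands, shares))


def shared_throughput(demands: list[float], capacity: float = 1.0) -> float:
    """Work-conserving multiplexing: idle capacity goes to whoever has traffic."""
    return min(sum(demands), capacity)
```

With demands of 0.2 and 0.7 and an even static split, the static scheme delivers about 0.7 of capacity while multiplexing delivers about 0.9. The extra 0.2 exists only because one requester's idle time is visible as spare capacity to the other, which is exactly the timing leak.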
Secure Interconnect Multiplexing: Time-Division Multiplexing
Statically allocate bus time slots between different cores. (Figure: a fixed allocation that repeats within each frame; insecure traffic gets 1/3 of the slots and secure traffic 2/3.)
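A fixed TDM schedule is just a lookup from cycle number to slot owner. This minimal sketch assumes a hypothetical 3-slot frame ordered insecure, secure, secure to match the 1/3 and 2/3 split; the actual slot order in the figure may differ.

```python
# Hypothetical 3-slot frame: 1/3 of slots insecure, 2/3 secure.
FRAME = ("insecure", "secure", "secure")


def slot_owner(cycle: int) -> str:
    """The owner of each bus slot depends only on the cycle number, never on traffic."""
    return FRAME[cycle % len(FRAME)]
```

Because `slot_owner` ignores every core's traffic, the schedule itself cannot carry information between cores.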
Secure Interconnect Multiplexing: Time-Division Multiplexing
(Figure: insecure traffic allocation 1/3, secure traffic allocation 2/3, with some slots idle.) If one core cannot use its slot, the slot is left idle even if the other core has traffic to send. Cores cannot see each other’s traffic level. Secure, but wasteful.
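The waste and the isolation property can both be seen in a small simulation. This sketch assumes the same hypothetical 3-slot frame and models each core as a queue of pending requests; `run_tdm` is an illustrative name.

```python
FRAME = ("insecure", "secure", "secure")  # hypothetical frame, as in the TDM schedule


def run_tdm(pending: dict[str, int], cycles: int) -> tuple[dict[str, int], int]:
    """Strict TDM: returns (requests served per owner, number of idle slots)."""
    queue = dict(pending)
    served = {owner: 0 for owner in queue}
    idle = 0
    for cycle in range(cycles):
        owner = FRAME[cycle % len(FRAME)]
        if queue[owner] > 0:
            queue[owner] -= 1
            served[owner] += 1
        else:
            idle += 1  # slot is wasted even if the other core has traffic queued
    return served, idle
```

With no insecure traffic and a backlog of secure traffic, 2 of every 6 slots go idle. Conversely, the insecure core is served the same amount whether the secure core is idle or saturated, so it learns nothing from its own timing.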
Secure Interconnect Multiplexing: One-Sided Bandwidth Recycling
Allow secure traffic to use unclaimed insecure bus slots, but not vice versa. Because insecure traffic never receives extra slots, the insecure app cannot observe the timing of the secure app. (Figure: insecure traffic allocation 1/3, secure traffic allocation 2/3, with idle insecure slots recycled for secure traffic.)
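One-sided recycling changes only the idle-slot rule of a TDM arbiter. In this sketch (hypothetical 3-slot frame and queue model, as for plain TDM), an unclaimed insecure slot may be handed to the secure core, but an unclaimed secure slot stays idle:

```python
FRAME = ("insecure", "secure", "secure")  # hypothetical frame: 1/3 insecure, 2/3 secure


def run_recycling(pending: dict[str, int], cycles: int) -> tuple[dict[str, int], int]:
    """TDM with one-sided recycling: returns (served per owner, idle slots)."""
    queue = dict(pending)
    served = {"insecure": 0, "secure": 0}
    idle = 0
    for cycle in range(cycles):
        owner = FRAME[cycle % len(FRAME)]
        if queue[owner] > 0:
            queue[owner] -= 1
            served[owner] += 1
        elif owner == "insecure" and queue["secure"] > 0:
            # Secure traffic may reclaim an unclaimed insecure slot.
            queue["secure"] -= 1
            served["secure"] += 1
        else:
            # An unclaimed secure slot is never given to insecure traffic.
            idle += 1
    return served, idle
```

A backlogged secure core now uses every slot when the insecure core is quiet, while the insecure core's service is unchanged whatever the secure load, so the channel from secure to insecure stays closed.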
Real System Interconnects
Multihop interconnection networks (rings, meshes). Cache-coherence protocols, where a single load or store instruction can generate dozens of individual interconnect messages. Multiple interconnection networks for the memory system, with separate physical networks for initial requests, snoop traffic, responses, and data payloads.
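To see how one store can fan out into dozens of messages spread over separate networks, here is a back-of-the-envelope count under a hypothetical broadcast-snooping protocol: one request, a snoop to every other core, a response from each snooped core, and one data payload. Real protocols, including directory-based ones, differ; `store_miss_messages` and its message classes are illustrative only.

```python
def store_miss_messages(num_cores: int) -> dict[str, int]:
    """Messages per physical network for one store miss (hypothetical broadcast snoop)."""
    others = num_cores - 1
    return {
        "request": 1,        # initial request from the missing core
        "snoop": others,     # broadcast snoop to every other core
        "response": others,  # one snoop response per snooped core
        "data": 1,           # a single data payload back to the requester
    }
```

With 16 cores this model gives 1 + 15 + 15 + 1 = 32 messages for a single store, touching all four networks, which is why per-operation bandwidth accounting has to cover every network at once.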
Intel Sandy Bridge
Globally Synchronized Frames for the Memory System
Extending our earlier work on GSF for point-to-point networks: divide time into discrete “frames”; each core receives an allocation of credits each frame to perform memory operations; a credit is permission to cause worst-case traffic on every network for one memory operation; unused credits are reclaimed to boost bandwidth. For a secure system, only secure cores reclaim unused bandwidth.
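The per-frame credit accounting can be sketched as below. This is a simplification, not the GSF hardware: it assumes one credit per memory operation (abstracting away the worst-case traffic each credit stands for), and the hypothetical `run_frame` hands reclaimed credits to secure cores in sorted order, which is an arbitrary illustrative policy.

```python
def run_frame(allocation: dict[str, int], demand: dict[str, int],
              secure: set[str]) -> dict[str, int]:
    """Memory operations granted to each core in one frame.

    allocation: credits each core receives at the start of the frame
    demand:     memory operations each core wants to issue this frame
    secure:     cores allowed to reclaim other cores' unused credits
    """
    # Every core first spends its own credits.
    granted = {c: min(demand[c], allocation[c]) for c in allocation}
    spare = sum(allocation[c] - granted[c] for c in allocation)
    # Only secure cores may reclaim leftover credits (one-sided recycling).
    for c in sorted(secure):
        extra = min(spare, demand[c] - granted[c])
        granted[c] += extra
        spare -= extra
    return granted
```

With 4 credits each, a secure core `B` wanting 8 operations next to an insecure core `A` wanting 1 gets 7. If instead `A` is the busy one and only `B` is secure, `A` gets exactly 4 no matter how little `B` uses, so its throughput reveals nothing about the secure core.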
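FPGA Emulation of Hardware Concepts
RAMP Gold: the initial version models 64 SPARC v8 cores with a shared memory system on a $750 board. (Table, truncated in this copy: compares a software simulator and RAMP Gold by cost, performance (MIPS), and simulations per day; the surviving cost entries read “$2,” for the software simulator and “$2,000 + $” for RAMP Gold.)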
GSFm Status on RAMP Gold
Working: a GSF-style memory bandwidth reservation system on RAMP Gold. Working: the Tessellation OS, a partition-based operating system, can adjust allocations to control bandwidth partitioning; it also partitions cores and cache capacity. Ongoing: investigating the hardware cost and efficiency loss of asymmetric bandwidth recycling.
Adoption path
Hardware vendors are already considering partitioning support for performance isolation: real-time guarantees (e.g., media playback), service-level guarantees (e.g., cloud computing), and performance tuning (e.g., repeatable timing). A small tweak could also prevent timing channels.
Other Hardware Isolation Mechanisms in Progress
Fine-grained memory protection and protection domains; fine-grain dynamic information flow tracking; user-level protected message passing; direct protected communication between trusted app components and trusted services.
Questions?