Reconfigurable Hardware Security Ryan Kastner Department of Electrical and Computer Engineering University of California, Santa Barbara CISR Lecture Naval Postgraduate School August 2005
Outline Reconfigurable Hardware Security Issues Brief History of Reconfigurable Computing Benefits FPGA Architectures Configuration/Programming Security Issues General Hardware Security FPGA Attacks FPGA Virus FGPA Security
Reconfigurable Hardware Main Entry: re- Function: prefix 1 : again : anew <retell> 2 : back : backward <recall> Main Entry: con·fig·ure Pronunciation: k&n-'fi-gy&r Function: transitive verb : to set up for operation especially in a particular way CLB Block RAM IP Core (Multiplier) KEY ADVANTAGE: Performance of Hardware, Flexibility of Software
Origins of Reconfigurable Computing Fixed-Plus-Variable (F+V) Structure Computer Standard processor augmented with inventory of reconfigurable building blocks “… to permit computations which are beyond the capabilities of present systems by providing an inventory of high speed substructures and rules for interconnecting them such that the entire system may be temporarily distorted into a problem oriented special purpose computer.” - G. Estrin
History of Reconfigurable Computing PLA Early Years – PLA, PAL One time programmable Programmed by blowing fuses Complex Logic Devices FPGA, CPLD Reprogrammable (SRAM) Initially used for glue logic, ASIC prototyping “Moore” transistors = more complex devices
Modern Reconfigurable Devices Reconfigurable devices are extremely complicated, multiprocessing computing systems Mix of hardware and software components Microprocessors – RISC, DSP, network, … Logic level (FPGA) Reconfigurable logic Specs for Xilinx Virtex II 3K to 125K logic cells, Four PowerPC processor cores Complex memory hierarchy - 1,738 KB block RAM, external memory, local memory in CLBs Possibility of soft core processors – DSP Custom hardware - embedded multipliers, fast carry chain logic, etc. Can implement complex applications
Traditional Choice: Hardware vs. Software Properties of Hardware Fast: High performance Spatial execution Adaptable parallelism Compact: Silicon area efficient Operations tailored to application Simple control Direct wire connections between operations Source: DeHon/Wawrzynek Hardware Inflexible: Fixed at Fabrication
Traditional Choice: Hardware vs. Software Properties of Software Slow: Sequential execution Overhead of interpreting instructions Area inefficient: Fixed width, general operators Area overhead Instruction cache Control circuitry Souce: DeHon/Wawrzynek Software Flexible: Fixed at Runtime
Reconfigurable Hardware Fast: Spatial parallelism (like hardware) Application specific operators, control circuitry Flexible: Operators and interconnect programmable (like software) Source: DeHon/Wawrzynek
Reconfigurable Hardware Fast and flexible, but… Area overhead – switches, configuration bits Delay overhead – switches, logic Compilation Difficult - architecture “too” flexible Slow - based on hardware synthesis techniques Source: DeHon/Wawrzynek
Performance Benefits Up to 100x performance increase compared to processors (microprocessor, DSP) ~10x performance density advantage over processors Applications like: Pattern matching Data encryption Data compression Digital communications Video and image processing Boolean satisfiability Networking Cryptography
Classification of Reconfigurable Architectures Control Control ADD Register FU FU Memory Register Bank MUL Register Logic Level Instruction Level Function Level Programming Unit Bit Byte Operands Basic Unit of Computation Boolean Operation (and, or, xor) Arithmetic Operation Functional Operation Communication Wires, Flip Flops Bundles of Wires, Registers Bus, Memory Example Devices FPGA, CPLD PRISC, Chimaera, Garp NAPA, RAW, RaPiD Flexibility Performance Power/Energy Consumption Design Complexity
Processor vs FPGA Souce: DeHon
FPGA CLB Switchbox IOB Configuration Bit Routing Channel Channel
FPGA
Programmable Logic Each logic element outputs one data bit Tracks LE Each logic element outputs one data bit Interconnect programmable between elements Interconnect tracks grouped into channels
Lookup Table (LUT) 2-LUT Program configuration bits for required functionality Computes “any” 2-input function In Out 00 0 01 0 10 0 11 1 A C=A B B Configuration Bit 0 Configuration Bit 1 C Configuration Bit 2 Configuration Bit 3 A B
Lookup Table (LUT) K-LUT -- K input lookup table Any function of K inputs by programming table Load bits into table 2N bits to describe functions => different functions
Lookup Table (LUT) K-LUT (typical k=4) w/ optional output Flip-Flop
Lookup Table (LUT) Single configuration bit for each: LUT bit Interconnect point/option Flip-flop select
Configurable Logic Block (CLB)
Programmable Interconnect Interconnect architecture Fast local interconnect Horizontal and vertical lines of various lengths C L B Switch Matrix Switch Matrix
Switchbox Operation Before Programming After Programming 6 pass transistors per switchbox interconnect point Pass transistors act as programmable switches Pass transistor gates are driven by configuration memory cells
Programmable Interconnect
Programmable Interconnect 25
Embedded Functional Units Fixed, fast multipliers MAC, Shifters, counters Hard/soft processor cores PowerPC Nios Microblaze Memory Block RAM Various sizes and distributions
Embedded RAM Xilinx – Block SelectRAM Altera – TriMatrix Dual-Port RAM 18Kb dual-port RAM arranged in columns Altera – TriMatrix Dual-Port RAM M512 – 512 x 1 M4K – 4096 x 1 M-RAM – 64K x 8
Xilinx Virtex-II Pro 1 to 4 PowerPCs 4 to 16 multi-gigabit transceivers 12 to 216 multipliers 3,000 to 50,000 logic cells 200k to 4M bits RAM 204 to 852 I/Os Up to 16 serial transceivers 622 Mbps to 3.125 Gbps PowerPCs Logic cells
Altera Stratix
Programming the FPGA Configure logic – CLBs, LUTs, FFs Configure interconnect – channel, switchbox Large number of bits – 10s MB Configuration bit technology Antifuse (program once) SRAM Floating Gate (Flash)
Antifuses Opposite of fuse (open until blown) Make a connection with electrical signal The current melts a thin insulating layer to form a thin permanent and resistive link. More reliable than breaking a connection (avoids shrapnel) Permanently programmed (Non-volatile) Metal–metal antifuse. (b) A metal–metal antifuse in a three-level metal process that uses contact plugs. The conductive link usually forms at the corner of the via where the electric field is highest during programming.
Antifuses Source: Actel
SRAM SRAM Volatile 6 CMOS transistors Standard fabrication process Mode (Read/Write) Input Data SRAM Configuration Bit Output Volatile 6 CMOS transistors Standard fabrication process Configurable interconnect using a pass gate controlled by single SRAM configuration bit. If the SRAM bit is set, then A is electrically connected to B. Otherwise, A and B are not connected.
Floating Gate (EPROM/EEPROM/Flash) By applying proper programming voltages, electrons “jump” onto floating gate (a) With a high (>12V) programming voltage, VPP, applied to the drain, electrons gain enough energy to “jump” onto the floating gate (gate1) (b) Electrons stuck on gate1 raise the threshold voltage so that the transistor is always off for normal operating voltages (c) UV light provides enough energy for the electrons stuck on gate1 to “jump” back to the bulk, allowing the transistor to operate normally Electrons on gate raise threshold voltage so transistor always off Flash switch is comprised of two transistors which share a gate
SRAM Versus Flash Switch & Memory Size SRAM based PLD FLASH based PLD Switch & Routing Memory Cell Switch & Routing Memory Cell
Programming Technology SRAM Antifuse Flash Speed Medium Fast Slow Power Poor Good Relative Size 1 1/10 1/7 Reprogram Yes No Size Large Small Moderate Volatile ?? Security
Hardware Security Issues Overbuilding – buying standard parts on open market and making extra illegal products Cloning – Copying code and duplicating the IP Reverse Engineering – Reconstructing schematic or netlist to understand and possibly improve and/or disguise the design Denial of Service – Reprogramming a critical part of the system rendering the entire system inoperable Over-building is by far the most common security attack that takes place. The increasing reliance on standard products during embedded systems design makes it easy for unscrupulous manufacturers to over build. In addition to the inventory required to make systems as contracted by the embedded systems design house, it is common for additional unauthorized systems to be built with inventory of standard parts (e.g. FPGAs, Microprocessors and Memory) bought on the open market. The unauthorized over built inventory, which is identical to the genuine product, is sold for profit with none of the support and design overhead associated with developing the product. While over-building is a more prevalent and an opportunistic crime, both cloning and reverse engineering are serious problems. In cloning, a competitor makes a copy of a design by stealing part or all of the system's Intellectual Property (IP). This is accomplished by copying the FPGA and/or microcontroller boot code. The single point of failure or weak link in systems that can easily be cloned are nonvolatile memories used to store configuration data for FPGAs and/or microprocessors. Unless strong security schemes are used, even devices with security 'bits' can easily be copied. There are several companies that specialize in copying CPLDs, Flash MCUs, and similar logic products. In reverse engineering a competitor will not only extract the raw data from a device (as in cloned systems), but will also reverse engineer the programming code to extract explicit details on how the design works. This may be done by disassembling microcontroller codes and reconstructing assembly code from the raw hex file or from photographic analysis of an ASIC layout as each layer is stripped off the metalization. Reverse engineering techniques will give a competitor all of the IP used in a design and allow them to reuse it, improve on it and easily disguise it to thwart possible legal action. While Denial of Service (DoS) is not a theft technique, the consequences can be more damaging and easily preventable with secure embedded solutions. In DOS attacks, use in-system programming techniques to render the logic of an FPGA or the code of a microcontroller inoperable by configuring them to run 'bad code.' Distributed Denial of Service (or DDoS) attacks have crippled several high profile websites over the past several years using software techniques. Any unsecured in system programmable device is a point for DOS attack, whether it is an SRAM FPGA, ISP CPLD or Flash-based microcontroller running in a critical part of a network/telecom system.
Hardware Security Attacks Non-invasive: Monitoring by external means Brute force key generation Manipulating inputs and observing outputs/system response Probing/copying external code Invasive: Decapping and microprobing Focused ion beams Scanning electron microscopes Laser for removal of metalization layers In noninvasive attacks the attacker either simply copies an external nonvolatile memory containing the programming code, or attacks the device to extract the code. Simple (weak) security schemes are easily by-passed using noninvasive techniques. For example, on many microcontroller products it is possible to erase a secured device prior to reprogramming. However, because security was added as an afterthought, often starting the erase and then immediately powering down the device will, in fact, erase the security bit and not the program contents! When the part is powered up again, it is a trivial task to read all the program contents out of the part. Other techniques involve monitoring by external means, brute force key generation, and changing voltages to discover hidden test modes. After decapsulation, several invasive techniques are commonly used to break on chip security schemes. Microprobing of circuitry, focused ion beams, and various failure analysis techniques (localized laser removal of metalization and localized deposition of conducting traces) are common techniques used to try to break or bypass on-chip security schemes. Homogeneous Flash and antifuse FPGAs are very difficult to attack due to the large distributed nature of the security keys and the physical characteristics of the silicon. For Flash and antifuse FPGAs, security is deeply buried in the device and an inherent part of the fabric of the metalization (antifuse) or transistor structures (flash). Microprobing or reverse engineering these structures is extraordinarily difficult. c
Levels of Semiconductor Security Level I – Not secure, easily compromised with low cost tools (high school kid) Level II – Can be broken with time and expensive equipment (commercial enterprise) Level III – Can be broken with lost of resources (gov’t sponsored lab)
FPGA Attacks Black Box Attack Readback Attack Configuration Cloning Try all possible input combinations, observe outputs Becomes infeasible on complex devices Readback Attack Read configuration/state of the FPGA through JTAG or programming interfaces Disable interface, block bitstream readout, embed in secure environment (delete/destroy device if tampered) Configuration Cloning Eavesdrop on configuration data stored in external PROM Encrypt the configuration data, decrypt on-chip (store key on-chip)
FPGA Attacks Physical Attacks Invasive probing to find chip information Attacks are generally quite expensive (Level II/III) Example Attacks Look for physical changes (“burning”) in memory cells Sectioning silicon and SEM analysis Possible Workarounds Rotate/invert data to avoid burning Reprogram cells randomly initially The process for copying a Custom Circuit is similar to the Gate Array Many more layers are however involved and hence more cost An 80386 was reverse engineered in 2 weeks with this method “Chipworks” does this commercially inputting each photograph into a computer and then using software to generate a schematic
FPGA Attacks Side Channel Attacks Observing power consumption, timing, electromagnetic radiation and other unintended information about operation of the FPGA Other applications/cores on FPGA “snoop” Workarounds Vary location of key, encryption/decryption logic Physical separation – hardware guarantees the areas separate Logical separation – software checking for separation Separation Kernel
FPGA Virus Bitstream determines functionality of the circuit Can be exploited to hang or even destroy system Denial-of-service – generates signals from FPGAs to other devices that don’t make sense, hang the devices or even destroy them Maximum supply current Physical destruction Destroy FPGA by creating high currents through intentional logic conflicts Destroy system by creating high currents from FPGA I/Os Maximum supply current at Vdd=5V is 639mA (only need 50 logic conflicts) Bitstream Verification
Antifuse Security All data internal to chip Source: Actel All data internal to chip No optical change is visible in a programmed antifuse Requires invasive attack - sectioning silicon and SEM analysis The larger devices have millions of antifuses Larger devices have 50+ million antifuses Only a small number (< 5%) are typically used Attempts to determine their state physically would take years Actel Axcelerator AX2000 has over 53 million antifuses Antifuse FPGA: Level II devices
SRAM Security Mode (Read/Write) Input Data SRAM Configuration Bit Output SRAM is volatile - configuration data must be loaded each time power is cycled Configuration data stored in non-volatile memory external to the FPGA (PROM, FLASH, etc) External memories can be physically copied or probed Bitstream can be easily capture and duplicated SRAM FPGA: Level I device
SRAM Security Solution: Encrypt bitstream, store key on FPGA How to store the key? Non-volatile: On-chip Flash - requires non-standard manufacturing Fuses – unique hardware signatures, cannot be changed Volatile: Key register - must be maintained with a battery Must be sure to keep the key safe SRAM FPGA: Level II device?
Conclusions Reconfigurable hardware gives benefits of software and hardware Programmable, but hard to program Performance of hardware (spatial execution) Bitstream tells the device what to do – must be protected Hardware security issues – cloning, reverse engineering, overbuilding, denial of service, destroying device Security issues revolve around protecting bitstream Antifuse, flash – easier to protect (Level II+) SRAM – hard to protect (Level I)
ExPRESS Lab ExPRESS - Extensible, Programmable, Reconfigurable Embedded SystemS - http://express.ece.ucsb.edu/ Students PhD Students – Andrew Brown, Wenrui Gong, Anup Hosangadi, Shahnam Mirzaei, Yan Meng, Gang Wang Undergrad Researchers - Brian DeRenzi, Patrick Lai Sponsors: