A 3D Data Transformation Processor Dimitrios Megas, Kleber Pizolato, Timothy Levin, and Ted Huffmire WESS 2012 October 11, 2012.

1 A 3D Data Transformation Processor Dimitrios Megas, Kleber Pizolato, Timothy Levin, and Ted Huffmire WESS 2012 October 11, 2012

2 Disclaimer The views presented in this talk are those of the speaker and do not necessarily reflect the views of the United States Department of Defense or the National Science Foundation.

3 Split Manufacturing Face-to-Back (F2B) Bonding

4 Basic Idea
Combine using 3D integration:
– Processor
– Compression coprocessor
– Cryptographic coprocessor

5 Basic Idea CPU Layer + Coprocessor Layer

6 Basic Idea
Real-time trace collection
– Compress trace prior to transmission to off-chip storage for offline program analysis
Optional encryption step can protect the compressed data from interception
– High-performance stand-alone encryption service
– XTRec: Secure Real-time Execution Trace Recording on Commodity Platforms (CMU)
– Trusted computing: mitigate glitch attack against TPM (runtime hash of memory, capture sequence of instructions executed)

7 Basic Idea
Real-time trace collection
– The amount of data collected depends on the granularity of the collection and the speed of the system
– Monitoring and collecting more signals results in a larger data stream
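A rough way to see how granularity and system speed drive the size of the trace stream is to multiply them out. The sketch below is only illustrative: the field widths, the one-event-per-cycle rate, and the 100 MHz clock are assumptions chosen for the example, not figures from the talk.

```python
def trace_rate_bytes_per_s(bits_per_event: int, events_per_cycle: float, clock_hz: float) -> float:
    """Raw (uncompressed) trace bandwidth in bytes per second."""
    return bits_per_event * events_per_cycle * clock_hz / 8


# Hypothetical collection granularities: bits captured per traced event.
GRANULARITIES = {
    "pc_only": 32,
    "pc_and_data_address": 64,
    "pc_address_and_data": 96,
}

if __name__ == "__main__":
    for name, bits in GRANULARITIES.items():
        # Assumed: one traced event per cycle on a 100 MHz processor.
        rate = trace_rate_bytes_per_s(bits, events_per_cycle=1.0, clock_hz=100e6)
        print(f"{name}: {rate / 1e6:.0f} MB/s")
```

Doubling either the number of monitored bits or the clock rate doubles the stream, which is why the talk compresses the trace before it leaves the chip.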

8 Outline Motivation and Background Design Goals Design Choices System Architecture Conclusions and Future Work

9 Outline Motivation and Background Design Goals Design Choices System Architecture Conclusions and Future Work

10 Cryptographic Coprocessing 3D vs. 2D

11 Medical Image Processing [Cong 2011]

12 3D-MAPS V1 vs V2, Georgia Tech [Kim et al., ISSCC 2012]
                   3D-MAPS V1             3D-MAPS V2
# of tiers         2 (1 logic, 1 SRAM)    5 (2 logic, 3 DRAM)
# of cores         64                     128
Memory capacity    256KB SRAM             256MB DRAM & 512KB SRAM
Logic footprint    5mm X 5mm              10mm X 10mm
DRAM footprint     -                      20mm X 12mm
Bonding style      F2F                    F2F and F2B
TSV/F2F usage      ~50K / ~50K            ~150K / ~185K
Memory access*     2048 bit/cycle SRAM    1024 bit/cycle DRAM
freq / power       277MHz / 4.0W          175MHz / 10.4W
* Wide-I/O allows 512 bit/cycle DRAM access

13 Stack Up Comparison
TSV usage
– 3D-MAPS V1: For I/O (204 redundancy)
– 3D-MAPS V2: For I/O (204 redundancy) and DRAM access (9 redundancy)

14 What is 3Dsec?
Economics of High Assurance
– High NRE Cost, Low Volume
– Gap between DoD and Commercial
Disentangle security from the COTS
– Use a separate chip for security
– Use 3-D Integration to combine: Control Plane, Computation Plane
– Need to add posts to the COTS chip design
Dual use of computation plane

15 Pros and Cons
Why not use a co-processor? On-chip?
Pros
– High bandwidth and low latency
– Controlled lineage
– Direct access to internal structures
Cons
– Thermal and cooling
– Design and testing
– Manufacturing yield

16 Cost
Cost of fabricating systems with 3-D
– Fabricating and testing the security layer
– Bonding it to the host layer
– Fabricating the vias
– Testing the joined unit

17 Circuit-Level Modifications
Passive vs. Active Monitoring
– Tapping
– Re-routing
– Overriding
– Disabling

18 3-D Application Classes
Enhancement of native functions
Secure alternate service
Isolation and protection
Passive monitoring
– Information flow tracking
– Runtime correctness checks
– Runtime security auditing

19 Outline Motivation and Background Design Goals Design Choices System Architecture Conclusions and Future Work

20 Design Goals High Performance Ability to gather and compress architectural state of a processor at runtime

21 Outline Motivation and Background Design Goals Design Choices System Architecture Conclusions and Future Work

22 Design Choices
Manufacturing process
– Face-to-face (F2F)
Compression algorithm/hw
– Two stages: filtering + general-purpose
Crypto algorithm/hw
– AES-128, SHA-1, SHA-512
Interface between planes
– 128 F2F vias up, 32 down (direct connection)

23 Design Choices
Other Issues
– Coordination between planes: control words in special registers
– Interface within control plane: output of compression → input of crypto
– Delivery of I/O and power: use existing capability of computation plane
– Computation plane hardware: high-performance general-purpose processor
– Clock synchronization: tree network

24 Compression Study
Use TCgen to compress a set of trace files generated using Pin
– Traces capture memory access behavior of various Linux applications
Vary parameters of TCgen for each field
– TCgen is prediction-based compression
– Which algorithm is most effective?
Apply general-purpose compression in second stage (gzip)
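The two-stage idea (a prediction-based filter followed by a general-purpose compressor) can be sketched in a few lines. This is not TCgen: a simple last-value predictor stands in for stage one, and Python's zlib (DEFLATE, the algorithm gzip uses) stands in for stage two. The byte layout of the filtered stream is an assumption made for the example.

```python
import struct
import zlib


def encode(values):
    """Stage 1: last-value filter. A correct prediction costs 1 byte, a miss costs 5."""
    out = bytearray()
    last = 0
    for v in values:
        if v == last:
            out.append(0)                # hit: value equals the previous one
        else:
            out.append(1)                # miss: flag followed by the raw 32-bit value
            out += struct.pack("<I", v)
        last = v
    return bytes(out)


def decode(data):
    """Inverse of encode(): replay the same predictor to rebuild the values."""
    values = []
    last = 0
    i = 0
    while i < len(data):
        if data[i] == 0:
            i += 1
        else:
            (last,) = struct.unpack_from("<I", data, i + 1)
            i += 5
        values.append(last)
    return values


def compress_field(values):
    """Stage 2: general-purpose compression of the filtered stream."""
    return zlib.compress(encode(values), 9)


def decompress_field(blob):
    return decode(zlib.decompress(blob))


if __name__ == "__main__":
    pcs = [0x543CC6] * 50 + [0x52D6BB] * 2   # made-up PC field with long repeats
    blob = compress_field(pcs)
    assert decompress_field(blob) == pcs
    print(len(pcs) * 4, "raw bytes ->", len(encode(pcs)), "filtered ->", len(blob), "compressed")
```

The filter alone already shrinks predictable fields; the second stage then removes the redundancy left in the hit/miss flags.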

25 Trace Files (generated by Pin)
Instruction Count  PC        ADDRESS     Size
8                  0x52d70b  0x5913c000  4
25                 0x543cc6  0xbff10254  4
25                 0x543cc7  0xbff10258  4
33                 0x52d6bb  0xbff1025c  4
33                 0x52d6be  0xbff10260  4
33                 0x52d6c2  0xbff10264  4
33                 0x52d6c8  0xbff10268  4
33                 0x52d6c9  0xbff1026c  4
37                 0x9bcb44  0xa1a50800  4
40                 0x6eb126  0xbff10268  4
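A record with these four fields can be parsed and packed into a fixed-width word as follows. The whitespace-separated text layout, the 32-bit width of every field, and the 128-bit packed format are assumptions for illustration; 128 bits was picked only because it matches the 128 upward vias mentioned under Design Choices, not because the talk specifies this mapping.

```python
import struct
from typing import NamedTuple


class TraceRecord(NamedTuple):
    instruction_count: int
    pc: int
    address: int
    size: int


def parse_record(line: str) -> TraceRecord:
    """Parse one row: decimal count, hex PC, hex data address, decimal size."""
    count, pc, address, size = line.split()
    return TraceRecord(int(count), int(pc, 16), int(address, 16), int(size))


def pack_record(rec: TraceRecord) -> bytes:
    """Pack a record into 16 bytes (128 bits): four little-endian 32-bit words."""
    return struct.pack("<4I", *rec)


if __name__ == "__main__":
    rec = parse_record("25 0x543cc6 0xbff10254 4")
    print(rec, pack_record(rec).hex())
```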

26 PC Field Number of correct predictions (%) for each configuration of TCgen when compressing the PC field (average of all 5 trace files)

27 Data Address Field Number of correct predictions (%) for each configuration of TCgen when compressing address field (average of all 5 trace files)

28 PC Field Compression ratio for the PC field

29 Data Address Field Compression ratio for the data address field

30 Outline Motivation and Background Design Goals Design Choices System Architecture Conclusions and Future Work

31 Computation Plane CPU

32 Control Plane Compression coprocessor (DFCM + gzip)
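For readers unfamiliar with DFCM (differential finite context method), the software sketch below shows the general technique: the predictor hashes the most recent strides to look up the stride it expects next, then adds that to the last value. The order, table size, and hash function are arbitrary choices for the example and say nothing about the hardware design in the talk.

```python
class DFCMPredictor:
    """Order-`order` differential finite context method predictor (software sketch)."""

    def __init__(self, order: int = 2, table_bits: int = 10):
        self.mask = (1 << table_bits) - 1
        self.table = [0] * (1 << table_bits)  # context hash -> predicted next stride
        self.history = [0] * order            # most recent strides, oldest first
        self.last = 0

    def _index(self) -> int:
        h = 0
        for stride in self.history:
            h = (h * 31 + stride) & self.mask  # simple illustrative hash
        return h

    def predict(self) -> int:
        return (self.last + self.table[self._index()]) & 0xFFFFFFFF

    def update(self, value: int) -> None:
        stride = (value - self.last) & 0xFFFFFFFF
        self.table[self._index()] = stride     # learn: this context led to this stride
        self.history = self.history[1:] + [stride]
        self.last = value


def hit_rate(values, predictor) -> float:
    """Fraction of values the predictor guesses correctly, updating as it goes."""
    hits = 0
    for v in values:
        hits += predictor.predict() == v
        predictor.update(v)
    return hits / len(values)


if __name__ == "__main__":
    # Stack-like addresses stepping by 4, similar to the 0xbff102xx run on slide 25.
    addresses = [0xBFF10254 + 4 * i for i in range(100)]
    print(f"hit rate: {hit_rate(addresses, DFCMPredictor()):.2f}")
```

Because the table stores strides rather than absolute values, a regular access pattern is learned after a handful of misses, and the hits can then be encoded far more cheaply than raw addresses.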

33 Control Plane gzip unit (within compression coprocessor)

34 Control Plane AES/SHA
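As a software analogy for feeding the compressor's output into the SHA unit, the sketch below hashes a compressed stream incrementally with Python's hashlib. The chunked interface is an assumption for the example. AES is left out because the Python standard library does not provide it.

```python
import hashlib
import zlib


def digest_trace(chunks, algorithm="sha512"):
    """Compress chunks as they arrive and hash the compressed stream incrementally."""
    hasher = hashlib.new(algorithm)        # e.g. "sha1" or "sha512"
    compressor = zlib.compressobj()
    out = bytearray()
    for chunk in chunks:
        block = compressor.compress(chunk)  # may be empty until enough data is buffered
        hasher.update(block)
        out += block
    tail = compressor.flush()               # emit whatever the compressor still holds
    hasher.update(tail)
    out += tail
    return bytes(out), hasher.hexdigest()


if __name__ == "__main__":
    blob, digest = digest_trace([b"trace-block-1" * 64, b"trace-block-2" * 64], "sha1")
    print(len(blob), digest)
```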

35 Control Plane Microprocessor interface unit

36 Full 3D System 3D IC

37 Outline Motivation and Background Design Goals Design Choices System Architecture Conclusions and Future Work

38 Conclusions
Applications: trusted computing, reverse engineering of malicious software, post-mortem analysis of a system that has suffered an attack
Simple preprocessing can decrease bandwidth (also gives power advantages)
There is much to do before making silicon.
It is useful to quantify the high-level tradeoffs:
– Data to compress
– Sampling rate
– Number of TSVs
– Throughput
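The four quantities in the tradeoff list constrain each other, and a back-of-the-envelope model makes that concrete. The sketch assumes each via carries one bit per clock and that compression happens after the data has crossed the vias; the clock rate, sample width, and compression ratio in the example are placeholders, not measurements from the talk.

```python
def max_sampling_rate(num_vias: int, via_clock_hz: float, bits_per_sample: int) -> float:
    """Samples per second that fit across the vias, at one bit per via per clock."""
    return num_vias * via_clock_hz / bits_per_sample


def off_chip_bytes_per_s(bits_per_sample: int, samples_per_s: float, compression_ratio: float) -> float:
    """Bandwidth leaving the chip once the control plane has compressed the samples."""
    return bits_per_sample * samples_per_s / compression_ratio / 8


if __name__ == "__main__":
    # Assumed: 128 vias up, a 100 MHz via clock, 64-bit samples, 4:1 compression.
    rate = max_sampling_rate(128, 100e6, 64)
    print(f"max sampling rate: {rate / 1e6:.0f} Msamples/s")
    print(f"off-chip stream:   {off_chip_bytes_per_s(64, rate, 4.0) / 1e6:.0f} MB/s")
```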

39 Future Work
Independent I/O and power delivery
– How to share the I/O of computation plane?
Floor Planning
– How much logic/memory can you fit between the TSVs?
It would be helpful for the 3D chip to be pin-compatible with the 2D package.
– Use a network/share the TSVs?
Joining dissimilar technology nodes
– Use buffers, redundant hardware

40 Future Work
More types of trace files
– General-purpose interface, migration path
– Can you test/verify computation plane without knowing what the control plane will be?
– Characteristics of a “typical” trace file?
Hierarchy of compression, for power not just for compression ratio?
– Lossy compression?!
Trust issues
– Who generates the write signal?
– How to protect the key?
– Can monitored software turn off monitoring?
Hardware implementation
– Simulation
– FPGA prototype
– Tape-out

41 Split Manufacturing
Discussion Points
– Can we trust the result of split manufacturing?
– Could this approach harm security?
– Is it worth it? When is it worth it?
– Why not always use a trusted foundry?
– Are trusted foundries a band-aid solution to the offshoring trend?
– How to trust a trusted foundry?
– Why not use redundancy with majority vote?
– Can we do everything from scratch?

42 Split Manufacturing
Discussion Points
– How to raise alarm if network interface is controlled by adversary? Use challenge-response protocols?
– Security architecture
– Packaging considerations
– Distributed posts, policy state?
– If computation plane can perform AES, why perform AES in control plane?

43 Questions? faculty.nps.edu/tdhuffmi

