Corliss, Lewis, + Roth ISCA-30 DISE: A Programmable Macro Engine for Customizing Applications Marc Corliss, E Lewis, Amir Roth University of Pennsylvania.

Slides:



Advertisements
Similar presentations
Chapt.2 Machine Architecture Impact of languages –Support – faster, more secure Primitive Operations –e.g. nested subroutine calls »Subroutines implemented.
Advertisements

Computer Organization and Architecture
1 Lecture 11: Modern Superscalar Processor Models Generic Superscalar Models, Issue Queue-based Pipeline, Multiple-Issue Design.
1 ECE462/562 ISA and Datapath Review Ali Akoglu. 2 Instruction Set Architecture A very important abstraction –interface between hardware and low-level.
CPE 731 Advanced Computer Architecture ILP: Part V – Multiple Issue Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University of.
AMD OPTERON ARCHITECTURE Omar Aragon Abdel Salam Sayyad This presentation is missing the references used.
Combining Statistical and Symbolic Simulation Mark Oskin Fred Chong and Matthew Farrens Dept. of Computer Science University of California at Davis.
POLITECNICO DI MILANO Parallelism in wonderland: are you ready to see how deep the rabbit hole goes? ILP: VLIW Architectures Marco D. Santambrogio:
1 Advanced Computer Architecture Limits to ILP Lecture 3.
ELEN 468 Advanced Logic Design
Processor Overview Features Designed for consumer and wireless products RISC Processor with Harvard Architecture Vector Floating Point coprocessor Branch.
Programming Languages Marjan Sirjani 2 2. Language Design Issues Design to Run efficiently : early languages Easy to write correctly : new languages.
Sim-alpha: A Validated, Execution-Driven Alpha Simulator Rajagopalan Desikan, Doug Burger, Stephen Keckler, Todd Austin.
Term Project Overview Yong Wang. Introduction Goal –familiarize with the design and implementation of a simple pipelined RISC processor What to do –Build.
Memory Management (II)
1 DISE: A Programmable Macro Engine for Customizing Applications Marc Corliss, E Lewis, Amir Roth (U Penn), ISCA 2003 DISE (Dynamic Instruction Steam Editing)
Low Overhead Debugging with DISE Marc L. CorlissE Christopher LewisAmir Roth Department of Computer and Information Science University of Pennsylvania.
1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.
Using DISE to Protect Return Addresses from Attack Marc L. Corliss, E Christopher Lewis, Amir Roth University of Pennsylvania.
Chapter 4 Processor Technology and Architecture. Chapter goals Describe CPU instruction and execution cycles Explain how primitive CPU instructions are.
MemTracker Efficient and Programmable Support for Memory Access Monitoring and Debugging Guru Venkataramani, Brandyn Roemer, Yan Solihin, Milos Prvulovic.
ARM Core Architecture. Common ARM Cortex Core In the case of ARM-based microcontrollers a company named ARM Holdings designs the core and licenses it.
Flexicache: Software-based Instruction Caching for Embedded Processors Jason E Miller and Anant Agarwal Raw Group - MIT CSAIL.
Fast Dynamic Binary Translation for the Kernel Piyus Kedia and Sorav Bansal IIT Delhi.
1 Code Compression Motivations Data compression techniques Code compression options and methods Comparison.
Secure Virtual Architecture John Criswell, Arushi Aggarwal, Andrew Lenharth, Dinakar Dhurjati, and Vikram Adve University of Illinois at Urbana-Champaign.
IT253: Computer Organization Lecture 4: Instruction Set Architecture Tonga Institute of Higher Education.
© 2009, Renesas Technology America, Inc., All Rights Reserved 1 Course Introduction  Purpose:  This course provides an overview of the SH-2 32-bit RISC.
Computer architecture Lecture 11: Reduced Instruction Set Computers Piotr Bilski.
A Decompression Architecture for Low Power Embedded Systems Lekatsas, H.; Henkel, J.; Wolf, W.; Computer Design, Proceedings International.
Chapter 8 CPU and Memory: Design, Implementation, and Enhancement The Architecture of Computer Hardware and Systems Software: An Information Technology.
1 Instruction Set Architecture (ISA) Alexander Titov 10/20/2012.
Full and Para Virtualization
Lecture 12 Virtualization Overview 1 Dec. 1, 2015 Prof. Kyu Ho Park “Understanding Full Virtualization, Paravirtualization, and Hardware Assist”, White.
Addressing Instruction Fetch Bottlenecks by Using an Instruction Register File Stephen Hines, Gary Tyson, and David Whalley Computer Science Dept. Florida.
1  1998 Morgan Kaufmann Publishers Chapter Six. 2  1998 Morgan Kaufmann Publishers Pipelining Improve perfomance by increasing instruction throughput.
1 - CPRE 583 (Reconfigurable Computing): Reconfigurable Computing Architectures Iowa State University (Ames) CPRE 583 Reconfigurable Computing Lecture.
Efficient Software-Based Fault Isolation Authors: Robert Wahbe Steven Lucco Thomas E. Anderson Susan L. Graham Presenter: Gregory Netland.
G. Venkataramani, I. Doudalis, Y. Solihin, M. Prvulovic HPCA ’08 Reading Group Presentation 02/14/2008.
Memory Hierarchy— Five Ways to Reduce Miss Penalty.
Addressing modes, memory architecture, interrupt and exception handling, and external I/O. An ISA includes a specification of the set of opcodes (machine.
Dynamic and On-Line Design Space Exploration for Reconfigurable Architecture Fakhreddine Ghaffari, Michael Auguin, Mohamed Abid Nice Sophia Antipolis University.
Chapter Six.
Efficient Software-Based Fault Isolation
Advanced Architectures
Amir Roth and Gurindar S. Sohi University of Wisconsin-Madison
ELEN 468 Advanced Logic Design
5.2 Eleven Advanced Optimizations of Cache Performance
Improving Program Efficiency by Packing Instructions Into Registers
Flow Path Model of Superscalars
Introduction to Pentium Processor
Lecture 14 Virtual Memory and the Alpha Memory Hierarchy
CMSC 611: Advanced Computer Architecture
Lecture 4: MIPS Instruction Set
Lecture 8: ILP and Speculation Contd. Chapter 2, Sections 2. 6, 2
Comparison of Two Processors
Chapter Six.
Chapter Six.
Guest Lecturer TA: Shreyas Chand
Sampoorani, Sivakumar and Joshua
* From AMD 1996 Publication #18522 Revision E
Computer Architecture
CSE 451: Operating Systems Autumn 2003 Lecture 10 Paging & TLBs
CSE 451: Operating Systems Autumn 2003 Lecture 10 Paging & TLBs
Introduction to Virtual Machines
Introduction to Virtual Machines
Virtualization Dr. S. R. Ahmed.
Presentation transcript:

Corliss, Lewis, + Roth ISCA-30 DISE: A Programmable Macro Engine for Customizing Applications Marc Corliss, E Lewis, Amir Roth University of Pennsylvania

Corliss, Lewis, + Roth ISCA-30 Overview Application customization functions (ACFs) E.g., safety-checking, debugging, decompression, optimization … Useful when applied to other programs Why? new platforms  new requirements: power, safety … ACF implementation Binary rewriting (BR): + flexible functionality, – performance Hardware widget (HW): + little performance loss, – ACF specific Dynamic Instruction Stream Editing (DISE) Combo: a hardware widget that does rewriting + Rich functionality, + low cost

Corliss, Lewis, + Roth ISCA-30 DISE Programmable instruction macro-expander Like a hardware SED (DISE = dynamic instruction SED) Executes rewrite rules (productions) on fetch stream Looks for instructions that match patterns… replaces them with parameterized sequences (repseq) Orthogonal to  ops  ops: finer-granularity, same functionality DISE: same granularity, different functionality I$executeDISE app app+ACF

Corliss, Lewis, + Roth ISCA-30 Advantages of This Picture + Generality (wrt HW) Instruction-based ACFs That can modify (not just observe) application + Performance (wrt BR) Avoid costs of inserting ACF code into binary + Convenience (wrt both) Avoid pain of rewriting binary Avoid pain of implementing new widget + Non-subvertability (new) The last thing that can/will manipulate code, guaranteed I$executeDISE app app+ACF

Corliss, Lewis, + Roth ISCA-30 Talk Outline Introduction Basic Capabilities Implementation Formulating ACFs Numbers Related work, Status, Future

Corliss, Lewis, + Roth ISCA-30 What DISE Does Example customization functionality Compare upper store address bits to const, ERROR on mismatch DISE implementation Pattern store *, *(*) // store = match, * = don’t care Repseq srli T.RS,#4,R1 // template cmp R1,R2,R1 // T.* = instantiation directive bne R1, ERROR T.INSN store R4,8(R9) srli R9,#4,R1 cmp R1,R2,R1 bne R1,ERROR store R4,8(R9)

Corliss, Lewis, + Roth ISCA-30 More DISE Capabilities (And Restrictions) Capabilities Dedicated Registers Scratch storage (no need to scavenge) Inter-repseq communication (synthesize global behavior) DISEPC Precise state within repseq’s (repseq’s are atomic) Control-flow within repseq’s DISAWK? – Restriction: peephole transformations only

Corliss, Lewis, + Roth ISCA-30 How DISE Does It Three components (well-understood, resemble  op) I$execute DISE Engine RT Replacement Table (RT): bigger RAM holds repseq specs IL Instantiation Logic (IL): executes directives to produce actual insns PT Pattern Table (PT): small associative table holds pattern specs Logically: DISE then decode native decoder

Corliss, Lewis, + Roth ISCA-30 How DISE Really Does It Physically: interleave DISE/decode implementations I$execute native decoder PTRT IL DISE Engine native decoder PTRT IL RISC RISC: PT + decoded RT in parallel with native decoder simple dec.  ROM RT IL PT  SEL CISC (  OPs) CISC(+  ops): PT +  op RT in same physical structures Decoded/  op RT? PT/RT programmed via controller Abstracts PT/RT formats: external DISE is annotated native Virtualizes PT/RT sizes controller

Corliss, Lewis, + Roth ISCA-30 Talk Outline Introduction Basic Capabilities Implementation Formulating ACFs Numbers Related Work, Status, Future

Corliss, Lewis, + Roth ISCA-30 Formulating ACFs Two kinds: transparent and aware (wrt application) Transparent: unmodified applications work without DISE DISE redefines “naturally-occurring” instructions Examples: fault isolation (just saw), profiling, etc. Aware: modified applications rely on DISE DISE converts “planted” codewords to runtime code Examples: decompression (see soon), dynamic optimization, etc. ACFs can be composed

Corliss, Lewis, + Roth ISCA-30 Transparent ACF: Memory Fault Isolation [Wahbe+95]: multi-module safety in one address space Why? Extensible kernel, no VM… How? Check upper address bits of loads, stores, ijumps Have seen, but to recap… Binary rewriting R: move T.RS, R3 srli T.RS,#4,R1 cmpeq R1,R2,R1 bne R1,ERROR store T.RD,T.IMM(R3) – Scavenge R1,R2,R3 – Retarget surrounding branches – move thwarts malicious jumps DISE R: srli T.RS,#4,DR1 cmpeq DR1,DR2,DR1 bne DR1,ERROR store T.RD,T.IMM(T.RS) + DISE registers DR1,DR2 + No branch retargeting

Corliss, Lewis, + Roth ISCA-30 Aware ACF: Decoder Decompression [Lefurgy+97]: decoder + dictionary  compressed I$ How? Codewords = reserved opcode + dictionary index Uncompressed code add R2,R3,R2 load R4,8(R2)... add R1,R3,R1 load R4,8(R1) Hardware Widget D0: add R2,R3,R2 load R4,8(R2) D1: add R1,R3,R1 load R4,8(R1) Compressed code reserved #0... reserved #1 DISE: RT = dictionary Exploit parameterized replacement Compressed code reserved R2 #0... reserved R1 #0 Uncompressed code add R2,R3,R2 load R4,8(R2)... add R1,R3,R1 load R4,8(R1) + More efficient dictionary usage + Can compress branches R0: add T.RS,R3,T.RS load R4,8(T.RS)

Corliss, Lewis, + Roth ISCA-30 Composing ACFs Consider: applying both fault isolation and decompression Server compresses, client decompresses Fault isolate uncompressed code Who adds fault isolation? Server, before compression – How does server know to do this? Client, after decompression – Client needs two widgets DISE: client composes the two ACFs to form one Apply fault isolation productions to decompression repseq’s + Two ACFs, one widget (DISE)

Corliss, Lewis, + Roth ISCA-30 Talk Outline Introduction Basic Capabilities Implementation Formulating ACFs Numbers Related Work, Status, Future

Corliss, Lewis, + Roth ISCA-30 Experimental Setup Alpha ISA SimpleScalar-based timing simulator 4-way superscalar, 12-stage pipeline, 128 instruction window 32KB primary caches, 1MB L2 DISE configuration 32-entry PT (256 bytes), 512-entry RT (3KB) PT/RT miss handler: flush + 30 cycle stall on SPEC2K integer benchmarks Binary rewriting: perl script on.S files

Corliss, Lewis, + Roth ISCA-30 Fault Isolation: DISE vs. Binary Rewriting Point: DISE performs better than corresponding BR Metric: exec. time norm. to 4-wide, 32KB I$, unmodified Exp2: increase proc. width Dynamic cost disappears Static cost dominates + DISE adds no footprint Exp1: increase I$ size Footprint cost disappears Bandwidth cost dominates + DISE adds fewer instructions + I$ size effectively decreasing gcc twolf parser

Corliss, Lewis, + Roth ISCA-30 Decompression: DISE vs. Hardware Widget Point: DISE more flexible (?) than corresponding HW Metric: code size norm. to uncompressed Also in paper: RT sensitivity, composed ACF performance More about DISE decompression [Corliss+, LCTES03], Fri. 2:30 Exp1: parameterization + Parameterization (DISE) helps + DISE is general-purpose gccparsertwolf

Corliss, Lewis, + Roth ISCA-30 Related Work (not exhaustive) Translation/emulation Hardware: X86  OPs, AMD ROPs. DISE is native-to-native and programmable Software: FX!32, DAISY [Ebcioglu+], Crusoe ACF tools Hardware: PipeRench [Goldstein+], I-COP [Chou+] DISE is before execute Software (offline): ATOM [Eustace+], Etch [Romer+], EEL [Larus+] Software (online): DELI [Desoli+] Most similar in spirit (and name) to DISE

Corliss, Lewis, + Roth ISCA-30 To Summarize DISE: a HW/SW solution for customizing applications Like a hardware SED or AWK + More general than hardware + Faster than software + Has some other nice properties (e.g., non-subvertability) Near future: apps to exploit “other nice properties” Non-subvertability  security applications Painless code modification  runtime optimization Your questions