Graph-Based Procedural Abstraction
A. Dreweke, M. Wörlein, D. Schell, T. Meinl, I. Fischer, M. Philippsen
Programming Systems Group, Computer Science Department 2, University of Erlangen-Nuremberg, Germany
www2.cs.fau.de

Presentation transcript:

Slide 1: Graph-Based Procedural Abstraction
A. Dreweke, M. Wörlein, D. Schell, T. Meinl, I. Fischer, M. Philippsen
Programming Systems Group, Computer Science Department 2, University of Erlangen-Nuremberg, Germany
www2.cs.fau.de

© Alexander Dreweke, Computer Science Department 2 – Programming Systems Group, University of Erlangen-Nuremberg, Germany

Slide 2: embedded systems
• cost and energy consumption depend on the size of the built-in memory
• limited amount of memory
• more and more functionality is packed onto embedded systems
• memory must be used more efficiently
• procedural abstraction reduces code size by extracting duplicate code segments

Slide 3: procedural abstraction
post link-time optimization of static binaries:
+ whole program code, including all libraries
+ function prolog and epilog
+ constant address calculations
- precise control flow must be reconstructed
- offset tables
- register indirect jumps
pipeline: binary → preprocessor → duplicate search → candidate selection → extraction → postprocessor → optimized binary

Slide 4: procedural abstraction (suffix tree)
• textual matching of instruction sequences
• frequent instruction sequences are taken from the suffix tree
• various optimizations:
  – special treatment for labels, jumps, …
  – fingerprinting
  – canonical register mapping
  – …
• but the fundamental suffix tree matching problem persists

Slide 5: duplicate search (suffix tree)
    :add r2, r1, 0x
    :sub r2, r2, r3
2008:add r4, r2, 0x4
200c:load r3, 0x
    :sub r2, r2, r3
2014:load r3, 0x1071c
2018:add r4, r2, 0x

    :mul r2, r1, 0x5
2508:sub r2, r2, r3
250c:add r4, r2, 0x4
2510:load r3, 0x
    :sub r2, r2, r3
2518:load r3, 0x1071c
251c:add r4, r2, 0x

    :div r3, r2, r1
311c:sub r2, r2, r3
3120:add r4, r2, 0x4
3124:load r3, 0x
    :sub r2, r2, r3
312c:load r3, 0x1071c
3130:add r4, r2, 0x

   c:sub r3, r2, 0x
    :sub r2, r2, r3
4014:load r3, 0x
    :add r4, r2, 0x4
401c:sub r2, r2, r3
4020:add r4, r2, 0x4
4024:load r3, 0x1071c
...
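The textual duplicate search on this slide can be illustrated with a minimal Python sketch. It is a stand-in under stated assumptions, not the authors' tool: instead of a suffix tree it counts fixed-length instruction windows in a dictionary, which finds the same exact repeats for one length but without a suffix tree's efficiency. The name `repeated_sequences`, the sample program and the immediate `0x8` are hypothetical; the six-instruction body follows the example sequence used in the deck.

```python
from collections import defaultdict

def repeated_sequences(instructions, length):
    """Return every window of `length` instructions that occurs at least
    twice, mapped to the list of its start positions."""
    positions = defaultdict(list)
    for start in range(len(instructions) - length + 1):
        window = tuple(instructions[start:start + length])
        positions[window].append(start)
    return {seq: starts for seq, starts in positions.items() if len(starts) > 1}

# Example body taken from the deck's instruction sequence.
BODY = [
    "sub r2, r2, r3", "add r4, r2, 0x4", "load r3, 0x10710",
    "sub r2, r2, r3", "load r3, 0x1071c", "add r4, r2, 0x4",
]
# Hypothetical reordered variant: same instructions, different order.
REORDERED = [
    "sub r2, r2, r3", "load r3, 0x10710", "add r4, r2, 0x4",
    "sub r2, r2, r3", "add r4, r2, 0x4", "load r3, 0x1071c",
]
# Hypothetical program: two textual copies of BODY plus the reordered copy.
program = (["mul r2, r1, 0x5"] + BODY + ["div r3, r2, r1"] + BODY
           + ["sub r3, r2, 0x8"] + REORDERED)

found = repeated_sequences(program, len(BODY))
for seq, starts in found.items():
    print(len(seq), "instructions repeated at positions", starts)
```

Only the two textually identical copies are reported. The reordered copy is missed, which is the limitation of purely textual matching that the graph-based approach addresses.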

Slide 6: extraction (suffix tree)
    :add r2, r1, 0x
    :call 0x

    :mul r2, r1, 0x5
2508:call 0x

    :div r3, r2, r1
311c:call 0x

   c:sub r3, r2, 0x
    :sub r2, r2, r3
4014:load r3, 0x
    :add r4, r2, 0x4
401c:sub r2, r2, r3
4020:add r4, r2, 0x4
4024:load r3, 0x1071c

    :sub r2, r2, r3
5074:load r3, 0x
    :add r4, r2, 0x4
507c:sub r2, r2, r3
5080:add r4, r2, 0x4
5084:load r3, 0x1071c
5088:return
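The extraction step can be sketched in a few lines of Python. This is an illustration under assumptions, not the deck's implementation: instructions are plain strings, occurrences are matched textually, and the label `proc_0` as well as the function name `extract` are hypothetical.

```python
def extract(instructions, body, label):
    """Replace each non-overlapping textual occurrence of `body` by a call
    and return (rewritten code, extracted procedure)."""
    rewritten = []
    i = 0
    n = len(body)
    while i < len(instructions):
        if instructions[i:i + n] == body:
            rewritten.append(f"call {label}")  # one call per occurrence
            i += n
        else:
            rewritten.append(instructions[i])
            i += 1
    procedure = body + ["return"]  # the shared copy ends with a return
    return rewritten, procedure

BODY = [
    "sub r2, r2, r3", "add r4, r2, 0x4", "load r3, 0x10710",
    "sub r2, r2, r3", "load r3, 0x1071c", "add r4, r2, 0x4",
]
# Hypothetical program with two copies of BODY.
program = ["mul r2, r1, 0x5"] + BODY + ["div r3, r2, r1"] + BODY

rewritten, procedure = extract(program, BODY, "proc_0")
saved = len(program) - (len(rewritten) + len(procedure))
print(rewritten)
print("instructions saved:", saved)
```

With two occurrences of a six-instruction body, twelve instructions are replaced by two calls plus a seven-instruction procedure, so three instructions are saved.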

Slide 7: candidate selection (iterative greedy)
extraction benefit: L · (N – 1) – (N + 1) > 0
  L: code length (instructions)
  N: # of occurrences
  each of the N occurrences is replaced by one call, and the extracted procedure gains one ret
examples with N = 2:
  7 instructions: 7 · (2 – 1) – (2 + 1) = 4 > 0
  4 instructions: 4 · (2 – 1) – (2 + 1) = 1 > 0
  3 instructions: 3 · (2 – 1) – (2 + 1) = 0, no benefit
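The extraction benefit from this slide can be checked with a small Python helper. The function names are hypothetical; the formula is the slide's, counting one call per occurrence and one ret for the extracted procedure.

```python
def extraction_benefit(length, occurrences):
    """Instructions saved by extracting a fragment of `length` instructions
    that occurs `occurrences` times: L * (N - 1) - (N + 1)."""
    return length * (occurrences - 1) - (occurrences + 1)

def worth_extracting(length, occurrences):
    """A candidate is only worth extracting if the benefit is positive."""
    return extraction_benefit(length, occurrences) > 0

# The three examples from the slide, each with two occurrences.
for length in (7, 4, 3):
    print(length, extraction_benefit(length, 2), worth_extracting(length, 2))
```

A fragment of three instructions with two occurrences breaks even, so it is not extracted.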

Slide 8: saved instructions (absolute values)
really small input binaries: gcc -Os, dietlibc linked MiBench programs on ARM

Slide 9: saved instructions (relative values)
really small input binaries: gcc -Os, dietlibc linked MiBench programs on ARM
good savings, still not optimal

Slide 10: procedural abstraction (graph-based)
• transform instruction sequences into minimal data flow graphs (DFG)
• search for frequent subgraphs in DFGs
example instruction sequence:
sub r2, r2, r3
add r4, r2, 0x4
load r3, 0x10710
sub r2, r2, r3
load r3, 0x1071c
add r4, r2, 0x4
[figure: data flow graph of the sequence, with nodes labeled add, sub and load]
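How an instruction sequence becomes a data flow graph can be sketched as follows. This is a simplified illustration under several assumptions, not the authors' construction: only read-after-write dependencies between registers are tracked, the first operand is assumed to be the destination, and nodes are named by instruction text plus occurrence index so that two graphs can be compared by edge set rather than by a real isomorphism test. The names `parse`, `build_dfg` and the reordered sequence are hypothetical.

```python
def parse(instruction):
    """Split 'op dest, src1, src2' into opcode, destination register and
    source registers (immediates are ignored)."""
    opcode, operands = instruction.split(None, 1)
    dest, *sources = [op.strip() for op in operands.split(",")]
    return opcode, dest, [s for s in sources if s.startswith("r")]

def build_dfg(instructions):
    """Return the set of read-after-write edges (producer, consumer)."""
    last_writer = {}  # register -> node that last wrote it
    seen = {}         # instruction text -> how often it appeared so far
    edges = set()
    for text in instructions:
        _, dest, sources = parse(text)
        occurrence = seen.get(text, 0)
        seen[text] = occurrence + 1
        node = (text, occurrence)
        for reg in sources:
            if reg in last_writer:
                edges.add((last_writer[reg], node))
        last_writer[dest] = node
    return edges

# Sequence from the slide.
ORIGINAL = [
    "sub r2, r2, r3", "add r4, r2, 0x4", "load r3, 0x10710",
    "sub r2, r2, r3", "load r3, 0x1071c", "add r4, r2, 0x4",
]
# Hypothetical reordering that keeps all data dependencies intact.
REORDERED = [
    "sub r2, r2, r3", "load r3, 0x10710", "add r4, r2, 0x4",
    "sub r2, r2, r3", "add r4, r2, 0x4", "load r3, 0x1071c",
]

same_graph = build_dfg(ORIGINAL) == build_dfg(REORDERED)
print("edges:", len(build_dfg(ORIGINAL)), "same graph:", same_graph)
```

Both orderings yield the same four dependency edges, which is why a graph-based search can treat them as duplicates even though their text differs.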

Slide 11: duplicate search (graph-based)
    :add r2, r1, 0x
    :sub r2, r2, r3
2008:add r4, r2, 0x4
200c:load r3, 0x
    :sub r2, r2, r3
2014:load r3, 0x1071c
2018:add r4, r2, 0x

    :mul r2, r1, 0x5
2508:sub r2, r2, r3
250c:add r4, r2, 0x4
2510:load r3, 0x
    :sub r2, r2, r3
2518:load r3, 0x1071c
251c:add r4, r2, 0x

    :div r3, r2, r1
311c:sub r2, r2, r3
3120:add r4, r2, 0x4
3124:load r3, 0x
    :sub r2, r2, r3
312c:load r3, 0x1071c
3130:add r4, r2, 0x

   c:sub r3, r2, 0x
    :sub r2, r2, r3
4014:load r3, 0x
    :add r4, r2, 0x4
401c:sub r2, r2, r3
4020:add r4, r2, 0x4
4024:load r3, 0x1071c
...

Slide 12: extraction (graph-based)
    :add r2, r1, 0x
    :call 0x

    :mul r2, r1, 0x5
2508:call 0x

    :div r3, r2, r1
311c:call 0x

   c:sub r3, r2, 0x
    :call 0x

    :sub r2, r2, r3
5074:load r3, 0x
    :add r4, r2, 0x4
507c:sub r2, r2, r3
5080:add r4, r2, 0x4
5084:load r3, 0x1071c
5088:return

Slide 13: search lattice
[figure: search lattice rooted at *, growing from the single nodes sub, add and load to larger subgraphs that combine them]

Slide 14: graph miner (procedural abstraction extensions)
• pruning is necessary because of the size of the search lattice
• for pruning, the number of occurrences must not increase with growing subgraph size
• calculate the maximal independent set (MIS) of subgraphs to make pruning possible again
[figure: subgraphs built from load, sub and add with #occurrences: 1, #occurrences: 2, #occurrences: 1]
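Counting only occurrences that do not share nodes is what keeps the occurrence count from growing as a subgraph is extended. The sketch below illustrates the idea in Python under assumptions that are not from the deck: each occurrence (embedding) is represented as a set of node ids, and the largest group of pairwise disjoint embeddings is found by brute force, which is exponential and only suitable for tiny inputs. The function name is hypothetical.

```python
from itertools import combinations

def max_disjoint_occurrences(embeddings):
    """Size of the largest group of embeddings that share no node."""
    sets = [frozenset(e) for e in embeddings]
    # Try the largest group sizes first and stop at the first success.
    for size in range(len(sets), 0, -1):
        for group in combinations(sets, size):
            if all(a.isdisjoint(b) for a, b in combinations(group, 2)):
                return size
    return 0

# Three embeddings of a two-node subgraph along a chain of nodes 1-2-3-4.
chain = [{1, 2}, {2, 3}, {3, 4}]
count = max_disjoint_occurrences(chain)
print("embeddings:", len(chain), "non-overlapping occurrences:", count)
```

Here three embeddings overlap pairwise along the chain, so only two of them can be extracted at the same time.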

Slide 15: graph miner (procedural abstraction extensions)
invalid subgraph pruning during candidate selection
[figure: data flow graph with nodes labeled add, sub and load, and a candidate subgraph (load, add, load) replaced by a call]

Slide 16: candidate selection (optimal)
[figure: colliding candidates with their call and ret overhead; iterative greedy selection compared with the optimum]

Slide 17: procedural abstraction (graph-based)
Pro
• no special treatment of branches and labels
• resistant to instruction reordering
• can be used to extract general code fragments, not limited to basic blocks or single-entry single-exit regions
Con
• subgraph-isomorphism test is NP-complete
• extremely large search lattice (exponential in time and memory usage)

Slide 18: saved instructions (absolute values)
really small input binaries: gcc -Os, dietlibc linked MiBench programs on ARM

Slide 19: saved instructions (relative values)
really small input binaries: gcc -Os, dietlibc linked MiBench programs on ARM

Slide 20: optimization time (sec.)
4h 20m
really small input binaries: gcc -Os, dietlibc linked MiBench programs on ARM

Slide 21: future work
• increase number of identified duplicate candidates
  – extend search areas from basic blocks to functions and the whole program
  – canonical register mapping
• speed up duplicate search
  – further parallelize graph search
  – more procedural-abstraction-specific pruning rules to limit the search lattice

Slide 22: summary
• procedural abstraction with DFGs results in more compact code:
  – graph-based mining saves up to 2.6 times more instructions than the traditional approaches
• interesting for embedded systems (huge volumes)
  – long optimization times are affordable because of the price per piece
  – overnight or over-the-weekend optimization of code during the development process
  – every saved bit counts
