
Zhelong Pan [1], Rudolf Eigenmann [2]. This presentation as .pptx: (or scan QR code). The paper: [1]. 1


« This is a quotation from the paper. Note the dedicated quotation marks. » Any references are listed here. The paper: 3


[Table of compiler optimization options not reproduced in the transcript.] Choose optimization options from the table above to maximize program performance. Good luck. The table is taken from page 5 of the original paper. 5

« Given a set of compiler optimization options {F1, F2, ..., Fn}, find the combination that minimizes the program execution time. Do this efficiently, without the use of a priori knowledge of the optimizations and their interactions. » 6


« We present […] Combined Elimination (CE), which aims at picking the best set of compiler optimizations for a program. […] this algorithm takes the shortest tuning time, while achieving comparable or better performance than other algorithms. » 8


- Exhaustive Search (ES)*
- Batch Elimination (BE)
- Iterative Elimination (IE)
- Combined Elimination (CE)
- Optimization Space Exploration (OSE)
- Statistical Selection (SS)*
* Not covered in detail
10

« 1. Get all 2^n combinations of n options F1, F2, ..., Fn.
2. Measure application execution time of the optimized version compiled under every possible combination.
3. The best version is the one with the least execution time. »
« For 38 optimizations: It would take up to 2^38 program runs – a million years for a program that runs in two minutes. »
COMPLEXITY: O(2^n)
11
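To make the search loop concrete, here is a minimal Python sketch of ES. The measure() harness, the two-flag list, and the file names prog.c/prog are hypothetical stand-ins (the paper tunes 38 GCC options, not these two); the later sketches reuse any callable with the same signature.

```python
# Hypothetical measurement harness: compile with the enabled flags,
# run the binary once, and return its wall-clock time in seconds.
import itertools
import subprocess
import time

ALL_FLAGS = ["-funroll-loops", "-finline-functions"]  # tiny stand-in set

def measure(option_vector):
    flags = [f for f, on in zip(ALL_FLAGS, option_vector) if on]
    subprocess.run(["gcc", "-O1", *flags, "prog.c", "-o", "prog"], check=True)
    start = time.perf_counter()
    subprocess.run(["./prog"], check=True)
    return time.perf_counter() - start

def exhaustive_search(n, measure):
    """ES: try all 2**n on/off vectors -- O(2**n), infeasible for n = 38."""
    best_time, best = float("inf"), None
    for combo in itertools.product((1, 0), repeat=n):
        t = measure(list(combo))
        if t < best_time:
            best_time, best = t, combo
    return best
```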

RIP (Relative Improvement Percentage)*: a measure for the usefulness of an optimization.
RIP(Fi) > 0%: Fi is actually useful.
RIP(Fi) < 0%: Fi causes the program execution time to increase.
Not applying Fi increases the runtime by RIP(Fi) → RIP(Fi) = 100% means the program runs twice as long without Fi.
* Not to be confused with Rest In Peace.
12

RIP_B(Fi = 0) = (T(Fi = 0) − T_B) / T_B × 100% — a measure for the usefulness of an optimization.*
B: the baseline; a configuration of optimization options.
Fi: an optimization option.
T_B: execution time when compiled under B.
T(Fi = 0): execution time when compiled under B, but with Fi off.
* Not to be confused with Rest In Peace.
13

Example: Baseline B: F1 = 1, F2 = 1, F3 = 1. T_B: 80 ms. T(F1 = 0): 100 ms (F1 = 0, F2 = 1, F3 = 1). So RIP_B(F1 = 0) = (100 − 80) / 80 × 100% = 25%: without F1 the program runs 25% longer, so F1 is useful. 14
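In code the computation is one line; here is a tiny Python check using the example's numbers (the function name rip is ours, not the paper's):

```python
def rip(t_fi_off_ms, t_b_ms):
    """RIP_B(Fi = 0): relative runtime change when Fi is switched off."""
    return (t_fi_off_ms - t_b_ms) / t_b_ms * 100.0

print(rip(100.0, 80.0))  # 25.0 -> dropping F1 costs 25% runtime, so F1 helps
```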

« 1. Compile the application under the baseline B = {F1 = 1, F2 = 1, ..., Fn = 1}. Execute the generated code version to get the baseline execution time T_B.
2. For each optimization Fi, switch it off from B and compile the application. Execute the generated version to get T(Fi = 0), and compute the RIP_B(Fi = 0).
3. Disable all optimizations with negative RIPs to generate the final, tuned version. »
This would work well if the optimizations did not affect each other.
COMPLEXITY: O(n)
15

Flowchart of BE: given F1, F2, ..., Fn, compile with all options on and execute to get T_B. Then, for each Fi: compile with all on except Fi, execute to get T(Fi = 0), and compute RIP_B(Fi = 0). If RIP_B(Fi = 0) < 0, don't use Fi; otherwise, use Fi. This would work well if the optimizations did not affect each other. COMPLEXITY: O(n) 16
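A minimal Python sketch of BE, assuming a measure(option_vector) callable like the hypothetical harness in the ES sketch. Note that RIP_B(Fi = 0) < 0 is equivalent to T(Fi = 0) < T_B, so the code compares times directly:

```python
def batch_elimination(n, measure):
    """BE: rate each option against the all-on baseline, then drop every
    option with negative RIP in a single batch -- n + 1 measurements, O(n)."""
    baseline = [1] * n
    t_b = measure(baseline)
    final = list(baseline)
    for i in range(n):
        trial = list(baseline)
        trial[i] = 0                 # all on, except Fi
        if measure(trial) < t_b:     # RIP_B(Fi = 0) < 0: Fi hurts in isolation
            final[i] = 0             # don't use Fi
    return final
```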

Example (two options; combination 4 is the baseline, T_B = 200 ms):
Combination  F1   F2   Runtime  RIP_B
1            OFF  OFF  320 ms   60%
2            ON   OFF  160 ms   -20%
3            OFF  ON   180 ms   -10%
4 (= B)      ON   ON   200 ms   (0%)
Both F1 and F2 have negative RIP in isolation, so BE switches both off — yielding combination 1 at 320 ms, worse than the 200 ms baseline: the two options interact.
17

1. Initialize S = {F1, F2, ..., Fn} and B = {F1 = 1, F2 = 1, ..., Fn = 1}.
2. Determine the baseline T_B: compile the program with the options in B and measure its runtime.
3. For each optimization Fi in S, compute RIP_B(Fi) by compiling the program with the options in B, except Fi which is turned off, and measuring its runtime.
4. Find the optimization Fj with the most negative RIP_B, remove it from S, and set Fj = 0 in B (the baseline changes!).
5. Repeat from step 2 until all remaining optimizations have a positive RIP_B. B now contains the "optimal" options.
COMPLEXITY: O(n²)
« [...] IE achieves better program performance than BE, since it considers the interaction of optimizations. However, when the interactions have only small effects, BE may perform close to IE in a faster way. »
18

Flowchart of IE: initialize S = {F1, F2, ..., Fn} and B = {F1 = 1, ..., Fn = 1}; compile with B and execute to get T_B. For each Fi in S: compile under B but with Fi = 0, execute to get T(Fi = 0), and compute RIP_B(Fi = 0). If some Fk has RIP_B(Fk = 0) < 0: take the Fk with minimal RIP_B, set B.Fk = 0, S = S \ {Fk}, T_B = T(Fk = 0), and repeat. Otherwise, the result is in B. COMPLEXITY: O(n²) 19
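The corresponding Python sketch (same hypothetical measure() callable as before; T_B is updated to T(Fk = 0) exactly as in the flowchart):

```python
def iterative_elimination(n, measure):
    """IE: repeatedly drop the option with the most negative RIP and make
    the resulting configuration the new baseline -- O(n**2) measurements."""
    b = [1] * n                # current baseline configuration
    s = set(range(n))          # options still under consideration
    t_b = measure(b)
    while s:
        times = {}
        for i in s:            # measure T(Fi = 0) for each remaining option
            trial = list(b)
            trial[i] = 0
            times[i] = measure(trial)
        k = min(s, key=times.get)    # most negative RIP = smallest runtime
        if times[k] >= t_b:          # no negative RIP left: done
            break
        b[k] = 0                     # remove Fk; the baseline changes
        s.remove(k)
        t_b = times[k]               # T_B = T(Fk = 0)
    return b
```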

Step 1 — baseline B = (F1 = 1, F2 = 1), T_B = 200 ms:
Combination  F1   F2   Runtime  RIP_B
1            OFF  OFF  320 ms   60%
2            ON   OFF  160 ms   -20%
3            OFF  ON   180 ms   -10%
4 (= B)      ON   ON   200 ms   (0%)
F2 has the most negative RIP, so IE switches it off; combination 2 becomes the new baseline (T_B = 160 ms).
Step 2 — baseline B = (F1 = 1, F2 = 0):
Combination  F1   F2   Runtime  RIP_B
1            OFF  OFF  320 ms   100%
2 (= B)      ON   OFF  160 ms   (0%)
3            OFF  ON   180 ms   —
4            ON   ON   200 ms   —
Switching F1 off as well now has RIP_B = 100%, so F1 stays on: IE ends with F1 = 1, F2 = 0 at 160 ms — better than BE's 320 ms.
20

1. Initialize S = {F1, F2, ..., Fn} and B = {F1 = 1, F2 = 1, ..., Fn = 1}.
2. Determine the baseline T_B: compile the program with the options in B and measure its runtime.
3. For each optimization Fi in S, compute RIP_B(Fi) by compiling the program with the options in B, except Fi which is turned off, and measuring its runtime.
4. Find the optimization Fj with the most negative RIP_B, remove it from S, and set Fj = 0 in B (the baseline changes!).
5. For all remaining Fk whose RIP_B from step 3 was negative, recompute RIP_B(Fk) relative to the changed B. If still negative, remove Fk from S and set it to 0 in B.
6. Repeat from step 2 until all remaining optimizations have a positive RIP_B. B now contains the "optimal" options.
COMPLEXITY: O(n²)
« CE takes the advantages of both BE and IE. When the optimizations interact weakly, CE eliminates the optimizations with negative effects in one iteration, just like BE. Otherwise, CE eliminates them iteratively, like IE. »
21

Flowchart of CE: identical to IE, with one addition after the removal of the worst Fk: for all remaining Fj with negative RIP_B, check whether the RIP_B is still negative under the changed B; if so, remove Fj directly. COMPLEXITY: O(n²) 22
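A Python sketch of CE under the same assumptions as the BE and IE sketches; the extra BE-style sweep over the remaining negative-RIP options is what distinguishes it from IE:

```python
def combined_elimination(n, measure):
    """CE: drop the most harmful option (like IE), then re-check the other
    negative-RIP options against the changed baseline and drop those that
    stay harmful (like BE) -- O(n**2) measurements."""
    b = [1] * n
    s = set(range(n))
    t_b = measure(b)
    while True:
        times = {}
        for i in s:
            trial = list(b)
            trial[i] = 0
            times[i] = measure(trial)
        negatives = sorted((i for i in s if times[i] < t_b), key=times.get)
        if not negatives:
            return b                 # every remaining option helps
        k = negatives.pop(0)         # IE-style removal of the worst option
        b[k] = 0
        s.remove(k)
        t_b = times[k]
        for j in negatives:          # BE-style sweep over the rest
            trial = list(b)
            trial[j] = 0
            t = measure(trial)
            if t < t_b:              # RIP still negative under the new B
                b[j] = 0
                s.remove(j)
                t_b = t
```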

1. Construct a set Ω consisting of a default optimization combination (here: all on) and n combinations that each switch a single optimization off.
2. Measure the execution time under each combination in Ω. Keep only the m fastest combinations in Ω.
3. Construct a new Ω set consisting of all unions of two optimization combinations in the old Ω set.
4. Repeat steps 2 and 3 until no new combinations can be generated or the performance gain becomes insignificant.
5. The fastest version in the final Ω is the result.
COMPLEXITY: O(nm²) ~ O(n³)
Idea from S. Triantafyllis, M. Vachharajani, N. Vachharajani, and D. I. August. Compiler optimization-space exploration. In Proceedings of the International Symposium on Code Generation and Optimization, pages 204–215, 2003.
23
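A sketch of OSE under one plausible reading of the slide: a combination is represented by the set of options switched off, so the "union" of two combinations switches off both sets. The parameters m and min_gain are made-up knobs, and measure() is the same hypothetical callable as before.

```python
def ose(n, measure, m=4, min_gain=0.01):
    """OSE, following the slide's five steps; a combination is modeled as
    the frozenset of options switched OFF."""
    cache = {}
    def time_of(off):
        if off not in cache:
            cache[off] = measure([0 if i in off else 1 for i in range(n)])
        return cache[off]

    # Step 1: the default (all on) plus each single option switched off.
    omega = {frozenset()} | {frozenset({i}) for i in range(n)}
    best_time = float("inf")
    while True:
        # Step 2: measure all combinations; keep only the m fastest.
        kept = sorted(omega, key=time_of)[:m]
        if best_time - time_of(kept[0]) < min_gain * best_time:
            return kept[0]           # Step 4: the gain became insignificant
        best_time = time_of(kept[0])
        # Step 3: unions of all pairs of kept combinations.
        unions = {a | b for a in kept for b in kept if a != b}
        if unions <= set(kept):
            return kept[0]           # Step 4: no new combinations
        omega = set(kept) | unions   # Step 5: the fastest survivor wins
```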

[Table: k sample combinations over the options F1, F2, ..., Fn, each entry 0 or 1.] You wouldn't appreciate an in-depth explanation. COMPLEXITY: O(n²)
Shown in R. P. J. Pinkers, P. M. W. Knijnenburg, M. Haneda, and H. A. G. Wijshoff. Statistical selection of compiler options. In The IEEE Computer Society's 12th Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems (MASCOTS'04), pages 494–501, Volendam, The Netherlands, October 2004.
24

[Image slide: turtle vs. rabbit. Image credits: Turtle, Rabbit.] 25


Experimental setup: Pentium 4 and SPARC II machines, SPEC CPU2000 benchmarks, GCC Ver. (Logo image credits: Pentium IV, SPARC II, SPEC, GCC.) 27

Reference Set / Training Set. (Executable icon credit omitted; all other illustrations except the GCC logo are from Office.com.) 28

- Compression (2x)
- Game Playing: Chess
- Group Theory, Interpreter
- C Programming Language Compiler
- Combinatorial Optimization
- Word Processing
- PERL Programming Language
- Place and Route Simulator
- Object-oriented Database
- FPGA Circuit Placement and Routing
29




Effective average tuning time on 2.8 GHz (bars drawn to scale): CE: 2.96 h · OSE: 4.51 h · SS: 11.96 h. 33

Decorative code fragments from the closing slide:
#include
for(i = 0; i < 10; ++i) {
  //...
}
if(!over) {
  //...
}
while(true) {
  printf("%d", ++j);
  if(j > 2 * i) break;
}
(Image credit: iOS-style on/off switch.) 34