Optimization: How to Optimize Code

Presentation transcript:

Optimization

How to Optimize Code
Conventional wisdom:
1. Don't do it.
2. (For experts only) Don't do it yet.

Why Not?
– You can't optimize something that doesn't work.
– The biggest improvement is picking the right algorithm.
– 90/10 rule: 90% of the time is spent in 10% of the code.
– You need to be able to verify the speedup.
– Optimized code is usually:
  – Less readable/maintainable
  – More complex/brittle

Code Tricks
– Prefer ints to floats.
– Minimize float-to-int conversions.
– Use multiplication instead of division.
– Put common branches first in if/else sequences.
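
As an illustrative sketch (not from the original slides), division by a constant can often be replaced by multiplication with its reciprocal:

    /* Division is typically slower than multiplication. */
    float half_div(float x) { return x / 2.0f; }

    /* Multiplying by the reciprocal does the same job more cheaply;
       for divisors that are not powers of two the result may differ
       in the last bit because of rounding. */
    float half_mul(float x) { return x * 0.5f; }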

Fancy Tricks: Loop Unrolling
– Reduce the number of iterations by increasing the work done per step.
  – Less overhead
  – Gives the compiler/processor more to rearrange

Original loop:

    for (i = 1; i <= 30; i++)
        a[i] = a[i] + b[i] * c;

Unrolled by a factor of 3:

    for (i = 1; i <= 30; i += 3) {
        a[i]   = a[i]   + b[i]   * c;
        a[i+1] = a[i+1] + b[i+1] * c;
        a[i+2] = a[i+2] + b[i+2] * c;
    }

Fancy Tricks: Loop Fusion
– Combine loops doing unrelated jobs to avoid overhead.
– Will the fused bodies fight over the cache?

Separate loops:

    for (i = 0; i < N; i++)
        C[i] = A[i] + B[i];
    for (i = 0; i < N; i++)
        D[i] = E[i] + C[i];

Fused:

    for (i = 0; i < N; i++) {
        C[i] = A[i] + B[i];
        D[i] = E[i] + C[i];
    }

Fancy Tricks: Loop Peeling
– Break up loops to avoid branches within them.

With branches inside the loop:

    for (i = 1; i <= N; i++) {
        if (i == 1)
            A[i] = 0;
        else if (i == N)
            A[i] = N;
        else
            A[i] = A[i] + 8;
    }

Peeled:

    A[1] = 0;
    for (i = 2; i < N; i++)
        A[i] = A[i] + 8;
    A[N] = N;

Strength Reductions
– Avoid costlier operations (divide/multiply) by using cheaper ones (add/subtract/shift).

With a multiply every iteration:

    int total = 0;
    for (i = 0; i < 1000; i++)
        total += i * 3;

Reduced to additions:

    int total = 0;
    for (i = 0; i < 3000; i += 3)
        total += i;

Compiler Optimization
– Compilers can:
  – Eliminate unneeded code
  – Do strength reductions
  – Unroll loops
– The compiler likes specific information:
  – Use constants
  – Make variables as local as possible
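
A minimal sketch (not from the slides) of why keeping variables local matters: a running sum held in a local variable can live in a register, while a sum updated through a pointer may force a load and store on every iteration.

    /* Accumulating through a pointer: the compiler may reload and
       store *dest each iteration because it could alias a[]. */
    void sum_via_pointer(const int *a, int n, int *dest) {
        *dest = 0;
        for (int i = 0; i < n; i++)
            *dest += a[i];
    }

    /* Accumulating in a local: the sum stays in a register and is
       stored once at the end. */
    void sum_via_local(const int *a, int n, int *dest) {
        int sum = 0;
        for (int i = 0; i < n; i++)
            sum += a[i];
        *dest = sum;
    }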

Compiler Optimization
– gcc has multiple optimization levels:
  – 0 = no optimization
  – 1 = optimize, but don't spend too much time
  – 2 = spend lots of time optimizing
  – 3 = aggressively optimize for speed, allowing code size to increase if need be
– Activate with a command-line flag such as -O2 (a capital letter O, then the level)
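
For example, compiling with level-2 optimization looks like:

    gcc -O2 -o program program.c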

Samples: C++ code (the source listing from the original slide is not included in this transcript).

Samples
– O0: faithful (but inefficient) line-by-line translation
– O1:
  – Try to do work in registers instead of the stack frame
  – Remove obviously worthless code
– O2:
  – Aggressively do work ahead of time
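
A hedged sketch (not from the slides) of "doing work ahead of time": hoisting a loop-invariant computation so it runs once instead of on every iteration. An optimizer does this when it can prove the value never changes; here the programmer does it by hand, because the compiler cannot easily prove strlen(s) is invariant while s is being modified.

    #include <string.h>

    /* As written: strlen(s) may be re-evaluated every iteration. */
    void lower1(char *s) {
        for (size_t i = 0; i < strlen(s); i++)
            if (s[i] >= 'A' && s[i] <= 'Z')
                s[i] += 'a' - 'A';
    }

    /* Loop-invariant work hoisted out: strlen runs once, ahead of time. */
    void lower2(char *s) {
        size_t len = strlen(s);
        for (size_t i = 0; i < len; i++)
            if (s[i] >= 'A' && s[i] <= 'Z')
                s[i] += 'a' - 'A';
    }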

How To Optimize - Branching
– Pipelined processors HATE branch mispredictions.
  – 15+ cycle penalty

How To Optimize - Branching
– You may be able to rearrange data so branches go the same way consistently.
– Some compilers can use runtime profiles to reorder code.
– Some compilers allow "expect" hints that say which way a branch usually goes, so instructions can be ordered accordingly.
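
In gcc and clang such a hint is written with __builtin_expect; a small sketch (the function and macro names here are illustrative):

    /* Tell the compiler the error path is rare, so the common path
       is laid out as the straight-line fall-through case. */
    #define unlikely(x) __builtin_expect(!!(x), 0)

    int process(const int *p) {
        if (unlikely(p == NULL))
            return -1;        /* rare error path */
        return *p * 2;        /* common path */
    }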

How To Optimize - Branching
– Maybe you're better off without the branch at all:
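
The example from the original slides is not in the transcript; as a hedged illustration, an unpredictable branch can often be replaced by arithmetic on the comparison result:

    /* Branchy version: the if may mispredict on random data. */
    int count_negatives_branchy(const int *a, int n) {
        int count = 0;
        for (int i = 0; i < n; i++)
            if (a[i] < 0)
                count++;
        return count;
    }

    /* Branchless version: the comparison yields 0 or 1, which is
       simply added every iteration, with no branch to mispredict. */
    int count_negatives_branchless(const int *a, int n) {
        int count = 0;
        for (int i = 0; i < n; i++)
            count += (a[i] < 0);
        return count;
    }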

How To Optimize - Cache
– Modern CPUs live or die by the cache.
– Caches like contiguous data.
– Avoid repeatedly swapping between multiple items.

Bad Situations for Cache
– Data with poor locality
  – Complex object-oriented programming
– Large 2D arrays traversed in column-major order
  (slide figures contrast row-major access with column-major access)
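
A hedged sketch of the difference: C stores 2D arrays row by row, so keeping the column index in the inner loop walks memory contiguously, while swapping the loops strides across whole rows and touches a new cache line on almost every access.

    #define ROWS 1024
    #define COLS 1024

    /* Row-major traversal: the inner loop reads consecutive addresses. */
    long sum_row_major(int a[ROWS][COLS]) {
        long sum = 0;
        for (int i = 0; i < ROWS; i++)
            for (int j = 0; j < COLS; j++)
                sum += a[i][j];
        return sum;
    }

    /* Column-major traversal: each step jumps COLS ints ahead,
       defeating spatial locality. */
    long sum_col_major(int a[ROWS][COLS]) {
        long sum = 0;
        for (int j = 0; j < COLS; j++)
            for (int i = 0; i < ROWS; i++)
                sum += a[i][j];
        return sum;
    }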

How to Optimize Right
Rules:
1. Start with clean, working code and good tests.
2. Test any optimization you make.
3. If it doesn't help, take it out.
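
A minimal timing sketch for verifying that a change actually helps (the workload function here is a made-up stand-in for the code being tuned):

    #include <stdio.h>
    #include <time.h>

    /* Hypothetical stand-in for the code being optimized. */
    static long workload(void) {
        long total = 0;
        for (long i = 0; i < 100000000L; i++)
            total += i % 7;
        return total;
    }

    int main(void) {
        clock_t start = clock();
        long result = workload();
        clock_t end = clock();

        double seconds = (double)(end - start) / CLOCKS_PER_SEC;
        printf("result=%ld  time=%.3f s\n", result, seconds);
        return 0;
    }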