Profiling & Optimization, David Geldreich (DREAM)

Presentation transcript:

1 Profiling & Optimization. David Geldreich (DREAM)

2 Profiling & Optimization: Introduction; Analysis/Design; Build; Tests; Debug; Profiling; Project management; Documentation; Versioning system; IDE; GForge; Conclusion

3 Outline: Profiling; Tools; Optimization

4 Profiling: No optimization without profiling. Not everyone has a 3 GHz machine. “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” (Donald Knuth)

5 Profiling: when? To choose among several algorithms for a given problem. To check the expected behavior at runtime. To find the parts of the code to be optimized.

6 Profiling: how? On fully implemented (and tested) code. On a “release” version (optimized by the compiler, …). On representative data. The optimization cycle: (diagram not reproduced in the transcript)

7 What do we measure? Understand what is going on in the OS (scheduling, memory management, hard drives, network), the compiler (optimization), the hardware (CPU architecture, chipset, memory) and the libraries used. If an application is limited by its I/O, it is useless to improve the calculation part. Here we limit ourselves to CPU and memory performance.

8 Measurement methods: manual; source instrumentation; statistical measurement (sampling); simulation; hardware counters

9 Outline: Profiling; Tools; Optimization

10 Tools. Columns: Manual, Instr., Sampling, Simulation, Hardware Counter (each X marks one applicable column; the column positions are not preserved). system timers: X; gprof / gcc -pg: XX; valgrind (callgrind) / kcachegrind: X; IBM Rational Quantify: XX; OProfile: X; Intel VTune: XX; PAPI (Performance API): X; JVMPI (Java Virtual Machine Profiler Interface), covering runhprof, Eclipse TPTP (Test & Performance), OptimizeIT, JProbe, JMP (Memory Profiler): X; Shark (Mac OS X): XX

12 Using the tools: system timers. You have to know the timer’s resolution. Windows: QueryPerformanceCounter() / QueryPerformanceFrequency(). Linux/Unix: gettimeofday(), clock(). Java: System.currentTimeMillis(), System.nanoTime(). Intel CPU counter: RDTSC (ReaD Time Stamp Counter).
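A minimal C sketch of the Linux/Unix timers named on this slide. It wraps gettimeofday() and clock() and estimates the effective resolution of the wall-clock timer by spinning until the returned value changes. The helper names (wall_seconds, cpu_seconds, wall_resolution) are illustrative assumptions, not part of the original slides, and the code assumes a POSIX system.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>   /* gettimeofday (POSIX) */
#include <time.h>       /* clock, CLOCKS_PER_SEC */

/* Wall-clock time in seconds, read with gettimeofday(). */
double wall_seconds(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* CPU time consumed by the process so far, in seconds, read with clock(). */
double cpu_seconds(void) {
    return (double)clock() / (double)CLOCKS_PER_SEC;
}

/* Estimate the effective resolution of wall_seconds(): spin until the
 * reading changes and keep the smallest step seen over a few trials. */
double wall_resolution(int trials) {
    assert(trials > 0);
    double best = 1.0e9;
    for (int i = 0; i < trials; i++) {
        double t0 = wall_seconds();
        double t1 = t0;
        while (t1 <= t0)
            t1 = wall_seconds();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

A measurement is only meaningful if the timed region lasts much longer than the value returned by wall_resolution(); otherwise repeat the region in a loop and divide the total by the number of repetitions.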

14 Using the tools: gprof. gprof (compiler-generated instrumentation): the instrumentation counts the function calls, and time is attributed by temporal sampling. Compile with gcc -pg; a gmon.out file is created at runtime. Drawbacks: line information is not precise; a complete recompilation is needed; results are not always easy to analyze for large software.
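As an illustration of the workflow above, here is a small hypothetical C workload in which one function is deliberately much more expensive than the other, so that the difference should be visible in a gprof flat profile. The function names and the file name prog.c are assumptions made for this sketch; the commands in the comment are the standard gcc/gprof invocations.

```c
#include <assert.h>

/* Typical session, assuming this file is saved as prog.c and a main()
 * that calls run_workload() is added:
 *   gcc -pg prog.c -o prog     (build with gprof instrumentation)
 *   ./prog                     (writes gmon.out in the current directory)
 *   gprof prog gmon.out        (prints the flat profile and call graph)
 * At higher optimization levels the compiler may inline these small
 * functions, in which case they no longer appear separately. */

/* Deliberately expensive: O(n) loop on every call. */
static unsigned long slow_sum(unsigned long n) {
    unsigned long s = 0;
    for (unsigned long i = 1; i <= n; i++)
        s += i;
    return s;
}

/* Cheap: closed-form result for the same sum. */
static unsigned long fast_sum(unsigned long n) {
    return n * (n + 1) / 2;
}

/* Calls both functions reps times and returns how often they disagree. */
unsigned long run_workload(unsigned long n, int reps) {
    assert(reps > 0);
    unsigned long mismatches = 0;
    for (int r = 0; r < reps; r++)
        if (slow_sum(n) != fast_sum(n))
            mismatches++;
    return mismatches;
}
```

In the resulting profile both functions should show the same call count, while nearly all sampled time should fall in slow_sum, which is the kind of asymmetry the call counts and the sampling are meant to expose.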

15 Using the tools : gprof

16 Using the tools : gprof

18 Using the tools: callgrind. callgrind/kcachegrind: cache simulator on top of valgrind. CPU simulation: estimates CPU cycles for each line of code. The data is easier to analyze with kcachegrind. Drawbacks: time estimates can be inaccurate; only the user part of the code is measured; the analyzed software runs many times slower and uses a huge amount of memory.

19 Using the tools: callgrind. callgrind/kcachegrind usage: works on already compiled software: valgrind --tool=callgrind prog. Generates callgrind.out.xxx, analyzed with callgrind_annotate or kcachegrind. To keep it usable in spite of its slowness: do not simulate cache usage (--simulate-cache=no); start instrumentation only when needed (--instr-atstart=no, then callgrind_control -i on).

20 Using the tools : callgrind

21 Using the tools: massif. massif (heap profiler): another valgrind tool, to be used on compiled software: valgrind --tool=massif prog. Generates massif.xxx.ps (memory usage vs. time) and massif.xxx.txt (which part of the code uses what).

22 Using the tools : massif

24 Using the tools: runhprof. java/runhprof: Sun’s JVM extension. CPU: java -Xrunhprof:cpu=samples,depth=6,thread=y prog generates a java.hprof.txt file, analyzed with PerfAnal: java -jar PerfAnal.jar java.hprof.txt. Memory: java -Xrunhprof:heap=all prog. Drawbacks: coarse sampling; does not use all the possibilities of JVMPI.

25 Using the tools : runhprof

26 Using the tools: Eclipse Test & Performance Tools Platform. TPTP: tools integrated into Eclipse. Supports only Java (for the moment). Profiling of local or distributed software.

27 Using the tools : Eclipse Test & Performance Tools Platform

28 Using the tools : Eclipse Test & Performance Tools Platform

29 Outline: Profiling; Tools; Optimization

30 Optimization: No premature optimization. Keep the code maintainable. Do not over-optimize.

31 Optimization (continued): Find a better algorithm; the constant factor of the complexity can be significant. Memory access is the first cause of slowness. Use already optimized libraries. Limit the number of calls to expensive functions. Write performance benchmarks/tests, which allow one to check that the performance has not degraded. The bottleneck moves at each optimization step; for example, I/O can become the new bottleneck.
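The advice to write performance benchmarks/tests can be sketched in C as follows. This is one possible approach, not the method from the slides: it times a function several times with clock(), keeps the best run, and compares it against a stored baseline with a tolerance. All names (bench_fn, bench_min_seconds, within_budget, sum_workload) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Signature assumed for the code under test in this sketch. */
typedef void (*bench_fn)(void *ctx);

/* Run fn(ctx) `runs` times and return the smallest CPU time of a single
 * call, in seconds. Taking the minimum reduces noise from other activity. */
double bench_min_seconds(bench_fn fn, void *ctx, int runs) {
    assert(fn != NULL && runs > 0);
    double best = -1.0;
    for (int i = 0; i < runs; i++) {
        clock_t t0 = clock();
        fn(ctx);
        clock_t t1 = clock();
        double dt = (double)(t1 - t0) / (double)CLOCKS_PER_SEC;
        if (best < 0.0 || dt < best)
            best = dt;
    }
    return best;
}

/* Returns 1 if `measured` is at most `tolerance` (e.g. 0.10 for 10%)
 * above `baseline`, 0 if performance has degraded beyond that. */
int within_budget(double measured, double baseline, double tolerance) {
    return measured <= baseline * (1.0 + tolerance);
}

/* Example workload: sums the first million integers. The volatile
 * accumulator keeps the compiler from removing the loop. */
void sum_workload(void *ctx) {
    unsigned long *out = ctx;
    volatile unsigned long s = 0;
    for (unsigned long i = 0; i < 1000000UL; i++)
        s += i;
    if (out != NULL)
        *out = s;
}
```

A regression test would record bench_min_seconds(...) once as the baseline on a reference machine, then fail the build when within_budget(...) returns 0. The baseline is only comparable on the same hardware and compiler settings.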

32 Optimization example: image inversion (5000x5000). for (int x = 0; x < w; x++) for (int y = 0; y < h; y++) data[y][x] = 255 - data[y][x]; Without compiler optimization: 435 ms. Compiler optimization (-O3): 316 ms. Improve memory access locality: 107 ms. Suppress double dereference on data: 94 ms. Constant code outside of the loop: 63 ms (~7x). OpenMP parallelization: 38 ms. Using MMX assembly: 26 ms.
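Two of the steps listed on this slide, improving memory access locality and suppressing the double dereference on data, can be sketched in C as below. The sketch assumes 8-bit pixels stored as an array of row pointers and takes inversion to mean 255 minus the pixel value; both are assumptions, since the slide gives neither the pixel type nor the storage layout. The function names are hypothetical, and no timings are claimed for this code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Baseline loop order from the slide: x outer, y inner. Consecutive
 * accesses touch different rows, so they are far apart in memory. */
void invert_column_major(uint8_t **data, size_t w, size_t h) {
    for (size_t x = 0; x < w; x++)
        for (size_t y = 0; y < h; y++)
            data[y][x] = (uint8_t)(255 - data[y][x]);
}

/* Rows in the outer loop: each row is scanned contiguously, and the row
 * pointer is loaded once per row instead of once per pixel. */
void invert_row_major(uint8_t **data, size_t w, size_t h) {
    for (size_t y = 0; y < h; y++) {
        uint8_t *row = data[y];
        for (size_t x = 0; x < w; x++)
            row[x] = (uint8_t)(255 - row[x]);
    }
}

/* Checks that both variants produce the same image. Returns 1 on
 * agreement, 0 on mismatch or allocation failure. */
int variants_agree(size_t w, size_t h) {
    assert(w > 0 && h > 0);
    uint8_t *a = malloc(w * h), *b = malloc(w * h);
    uint8_t **ra = malloc(h * sizeof *ra), **rb = malloc(h * sizeof *rb);
    int ok = (a && b && ra && rb);
    if (ok) {
        for (size_t y = 0; y < h; y++) {
            ra[y] = a + y * w;
            rb[y] = b + y * w;
        }
        for (size_t i = 0; i < w * h; i++)
            a[i] = b[i] = (uint8_t)(i * 31u);   /* arbitrary test pattern */
        invert_column_major(ra, w, h);
        invert_row_major(rb, w, h);
        ok = (memcmp(a, b, w * h) == 0) && a[0] == 255;
    }
    free(a); free(b); free(ra); free(rb);
    return ok;
}
```

Checking that the optimized variant matches the baseline before timing it keeps each optimization step from silently changing the result.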

33 Conclusion: No premature optimization. Know your profiling tool. Keep the code maintainable. Do not over-optimize.

34 Questions?