Lecture 8. Profiling - for Performance Analysis - Prof. Taeweon Suh, Computer Science Education, Korea University, COM503 Parallel Computer Architecture & Programming

Performance Analysis
- Assuming that the performance of an application is satisfactory in single-threaded mode, the most likely performance question is: "Why does my application not get the expected speedup when running on multiple threads?"
- The performance of large-scale parallel applications depends on many factors, including:
  - Load imbalance
  - Parallelization overheads

Profiling
Several approaches can be used to obtain performance data:
- Sampling
  - Based on periodic OS interrupts (timer interrupts)
  - At each sampling point, performance data such as the program counter, call stacks, and hardware counter values are collected and recorded
  - Less numerically accurate than instrumentation, but lets the target program run at nearly full speed
  - Examples: Unix gprof (which also relies on compiler-inserted calls, enabled by -pg, to count function calls), Sun Performance Analyzer, OProfile
- Code instrumentation
  - Calls to a tracing library are inserted into the code by the programmer, the compiler, or a tool
  - These library calls write performance data to a file during program execution

Pertinent Performance Data
- Time spent in user-level and system-level routines
- Time spent in serial parts and in parallel regions
- Time spent in communication
  - Number of invalidations, number of cache-to-cache transfers
- Hardware performance counter information, such as CPU cycles and I$ (instruction cache) and D$ (data cache) misses
- The state of a thread at given times, such as waiting for work, synchronizing, forking, or joining

gprof
Use GNU gprof to get the profile information:
- Compile and link your code with the -pg option
- Run your code; gmon.out is generated
- Run gprof to interpret the information

Testrun Benchmarks
- Download a parallel benchmark from
  - Download the OpenMP version of NPB (NPB 3)
- Compile the BT benchmark
  - Read README.install for information on how to compile the code
  - Edit ‘make.def’ under /config/
    - Change ‘f77’ to ‘gfortran’
    - Add the ‘-pg’ option to FFLAGS and FLINKFLAGS:
        FFLAGS = -O -fopenmp -pg
        FLINKFLAGS = -O -fopenmp -pg
  - Compile BT with ‘make BT CLASS=A’
- Run the simulation with ./bin/BT.A
  - By default, it generates gmon.out in the directory where you run the program
- Use gprof to extract the profile information
  - gprof ./bin/BT.A > bt.txt
  - Open bt.txt with any text editor

Testrun Benchmarks
- Compile the DC benchmark
  - Read README.install for information on how to compile the code
  - Edit ‘make.def’ under /config/
    - Change ‘cc’ to ‘gcc’
    - Add the ‘-pg’ option to CFLAGS and to the link command CLINK:
        CFLAGS = -O -fopenmp -pg
        CLINK = $(CC) -fopenmp -pg
  - Compile DC with ‘make DC CLASS=A’
- Run the simulation with ./bin/dc.A.x
  - By default, it generates gmon.out in the directory where you run the program
- Use gprof to extract the profile information
  - gprof ./bin/dc.A.x > dc.txt
  - Open dc.txt with any text editor