Intel® performance analysis tools. Nikita Panov, Idrisov Renat.



Intel VTune™ Amplifier XE Performance Profiler
VTune Amplifier XE provides information on code performance for users developing serial and multithreaded applications on Windows* and Linux* operating systems. On Windows systems, VTune Amplifier XE integrates into Microsoft Visual Studio* software and is also available as a standalone GUI client. On Linux systems, VTune Amplifier XE works only as a standalone GUI client. On both Windows and Linux systems, you can benefit from using the command-line interface for collecting data remotely or for performing regression testing.
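For regression testing, the command-line collection can be driven from a script. The sketch below only assembles an `amplxe-cl` invocation (the VTune Amplifier XE command-line tool) for a Hotspots collection. The helper name, the application path `./my_app` and the result directory `r000hs` are hypothetical, and the command is printed rather than executed.

```python
import shlex


def build_hotspots_command(app_path, result_dir, app_args=()):
    """Assemble an amplxe-cl command line for a Hotspots collection.

    Only the argument list is built; nothing is executed here.
    """
    return [
        "amplxe-cl",
        "-collect", "hotspots",     # analysis type to run
        "-result-dir", result_dir,  # directory that receives the result
        "--",                       # everything after this is the target
        app_path,
        *app_args,
    ]


if __name__ == "__main__":
    # Hypothetical application and result directory names.
    command = build_hotspots_command("./my_app", "r000hs")
    print(shlex.join(command))
```

Keeping the command as a list makes it straightforward to pass to `subprocess.run` in a nightly job and to store each run in its own result directory.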

Intel VTune™ Amplifier XE Performance Profiler
Use the VTune Amplifier XE to locate or determine the following:
The most time-consuming (hot) functions in your application and/or on the whole system.
Sections of code that do not effectively utilize available processor time.
The best sections of code to optimize for sequential performance and for threaded performance.
Synchronization objects that affect the application performance.
Whether, where, and why your application spends time on input/output operations.
The performance impact of different synchronization methods, different numbers of threads, or different algorithms.
Thread activity and transitions.
Hardware-related bottlenecks in your code.

Hotspot analysis
1. Choose an analysis target. 2. Choose the Hotspots analysis type. 3. Run the Hotspots analysis to locate the most time-consuming functions in an application. 4. Analyze the function call flow and threads. 5. Analyze the source code to locate the most time-critical code lines. 6. Compare results before and after optimization.

Creating a project
Compiling symbolic debug information into the executable helps map the results to the right source lines. To analyze the real application workflow, however, it is recommended to compile with the normal optimization options.

Choose the Hotspots analysis type
On the left pane of the Analysis Type window, locate the analysis tree and select Algorithm Analysis > Hotspots.

Analysis results
Note the CPU Time reported for the sample application, given in seconds. It is the sum of CPU time for all application threads. Total Thread Count is 3, so the sample application is multithreaded. The Top Hotspots section lists the most time-consuming functions (hotspot functions), sorted by the CPU time spent on their execution.
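The summary figures described here can be reproduced on toy data. The following is a minimal sketch, not VTune output: the sample tuples and every function name except `initialize_2D_buffer` are hypothetical. It shows how CPU time summed over all threads, the thread count, and a hotspot list sorted by CPU time relate to each other.

```python
from collections import defaultdict

# Hypothetical samples: (thread id, function name, CPU seconds).
SAMPLES = [
    (1, "initialize_2D_buffer", 2.0),
    (1, "render", 1.5),
    (2, "initialize_2D_buffer", 1.0),
    (2, "render", 0.5),
    (3, "main", 0.25),
]


def summarize(samples):
    """Return (total CPU time, thread count, hotspots sorted by CPU time)."""
    per_function = defaultdict(float)
    threads = set()
    for thread_id, function, seconds in samples:
        per_function[function] += seconds  # CPU time is summed across threads
        threads.add(thread_id)
    total_cpu_time = sum(per_function.values())
    top_hotspots = sorted(
        per_function.items(), key=lambda item: item[1], reverse=True
    )
    return total_cpu_time, len(threads), top_hotspots


if __name__ == "__main__":
    total, thread_count, hotspots = summarize(SAMPLES)
    print(f"CPU Time: {total} s, Total Thread Count: {thread_count}")
    for function, seconds in hotspots:
        print(f"{function}: {seconds} s")
```

Because CPU time is added up over threads, it can exceed the elapsed wall-clock time of a multithreaded run.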

Call stack
Select the initialize_2D_buffer function in the grid and explore the data provided in the Call Stack pane on the right.

Analyzing the results

Analyzing the results
1. Timeline area. When you hover over a graph element, the timeline tooltip displays the time elapsed since the application was launched. 2. Threads area, which shows the distribution of CPU time utilization per thread. Hover over a bar to see the CPU time utilization in percent for this thread at each moment of time. Green zones show when threads are active. 3. CPU Usage area, which shows the distribution of CPU time utilization for the whole application. Hover over a bar to see the application-level CPU time utilization in percent at each moment of time.

Analyzing the code
1 – source code, 2 – assembler, 3 – processor time, 4 and 5 – useful markers and scroll controls to identify problem code

Comparing the results
Specify the Hotspots analysis results you want to compare and click the Compare Results button.

Comparing the results
1 – time difference, 2 – before the optimization (first version), 3 – after the optimization
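The three columns of the comparison view can be mimicked with a few lines of code. This is a rough sketch on hypothetical per-function CPU times, not an export of VTune data. It assumes the difference is computed as time before minus time after, so a positive value means the optimized version is faster.

```python
def compare_results(before, after):
    """Return rows of (function, difference, before, after), largest gain first.

    `before` and `after` map function names to CPU seconds.
    The difference is assumed to be before - after.
    """
    rows = []
    for function in sorted(set(before) | set(after)):
        old = before.get(function, 0.0)  # function may be absent in one run
        new = after.get(function, 0.0)
        rows.append((function, old - new, old, new))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    # Hypothetical measurements for the first and the optimized version.
    before = {"initialize_2D_buffer": 3.0, "render": 2.0}
    after = {"initialize_2D_buffer": 1.0, "render": 2.0}
    for function, diff, old, new in compare_results(before, after):
        print(f"{function}: difference {diff} s, before {old} s, after {new} s")
```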

Locks and waits analysis
Other kinds of analysis are performed in a similar way.

Performing the analysis
After the analysis, you are given information that corresponds to the analysis type chosen.

Analyzing the results
Results can also be viewed together with the program call stack. 1 – corresponding object, 2 – processor usage, 3 – wait cycles count
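What a per-object wait count and wait time mean can be shown with a hand-instrumented lock. The sketch below is only an illustration of the idea and not how VTune collects the data. The `TimedLock` class and the `demo` function are hypothetical names: the wrapper counts how often a thread had to block on the lock and for how long.

```python
import threading
import time


class TimedLock:
    """A lock wrapper that records how often and how long callers wait."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stats_guard = threading.Lock()  # protects the two counters
        self.wait_count = 0
        self.wait_time = 0.0

    def __enter__(self):
        # Fast path: the lock is free, so no wait is recorded.
        if not self._lock.acquire(blocking=False):
            start = time.perf_counter()
            self._lock.acquire()  # contended: block until released
            waited = time.perf_counter() - start
            with self._stats_guard:
                self.wait_count += 1
                self.wait_time += waited
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False


def demo(hold_seconds=0.05):
    """Run one thread that holds the lock and one that has to wait for it."""
    lock = TimedLock()
    holder_has_lock = threading.Event()

    def holder():
        with lock:
            holder_has_lock.set()
            time.sleep(hold_seconds)  # keep the lock busy

    def waiter():
        holder_has_lock.wait()  # start only once the lock is taken
        with lock:
            pass

    threads = [threading.Thread(target=holder), threading.Thread(target=waiter)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return lock.wait_count, lock.wait_time


if __name__ == "__main__":
    count, seconds = demo()
    print(f"wait count: {count}, wait time: {seconds:.3f} s")
```

A synchronization object with a high wait count and a long wait time while processor usage stays low is a natural first candidate for optimization.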

Analyzing the code
1 – source code, 2 – processor usage, 3 – wait loop count, 4 – navigation

Comparing the results
1 – wait loop difference, 2 – wait loop count before the optimizations, 3 – wait loop count after the optimizations, 4 – loop count difference, 5 and 6 – loop count

Useful events
CPU_CLK_UNHALTED.CORE – processor clock ticks.
INST_RETIRED.ANY – number of instructions executed (retired).
BUS_TRANS_ANY.ALL_AGENTS – bus transaction count.
L2_LINES_IN.SELF.DEMAND – L2 cache misses.
BR_INST_RETIRED.MISPRED – mispredicted branch count.
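Raw event counts are usually easier to read as ratios. The sketch below derives three such ratios from the events listed above; the counter values are made up for illustration, and the choice of per-1000-instruction rates is an assumption, not something the slides prescribe. Cycles per instruction (CPI) is clock ticks divided by retired instructions.

```python
def derived_metrics(counts):
    """Turn raw event counts into ratios that are easier to compare."""
    clocks = counts["CPU_CLK_UNHALTED.CORE"]
    instructions = counts["INST_RETIRED.ANY"]
    return {
        # Cycles per instruction: lower generally means better utilization.
        "CPI": clocks / instructions,
        "L2 misses per 1000 instructions":
            1000 * counts["L2_LINES_IN.SELF.DEMAND"] / instructions,
        "mispredicted branches per 1000 instructions":
            1000 * counts["BR_INST_RETIRED.MISPRED"] / instructions,
    }


if __name__ == "__main__":
    # Hypothetical counter values, not measured data.
    counts = {
        "CPU_CLK_UNHALTED.CORE": 3_000_000,
        "INST_RETIRED.ANY": 2_000_000,
        "L2_LINES_IN.SELF.DEMAND": 4_000,
        "BR_INST_RETIRED.MISPRED": 10_000,
    }
    for name, value in derived_metrics(counts).items():
        print(f"{name}: {value}")
```

Normalizing by retired instructions lets runs of different length, or versions before and after an optimization, be compared directly.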

Memory access time

Application performance analysis

Useful compiler options
/Od (-O0 on Linux): no optimizations (used for debugging).
/O2 (-O2 on Linux): the default optimization level.
/O3 (-O3 on Linux): an additional set of more aggressive optimizations.
/QxO (-xO on Linux): optimizations that also run on non-Intel architectures.
/Qipo (-ipo): interprocedural optimizations.
/Qparallel (-parallel): auto-parallelization.
/Qopt-report (-opt-report), /Qopt-report-file, /Qopt-report-phase, /Qopt-report-help, /Qopt-report-routine, /Qvec-report[1/2/3]: optimization and vectorization report options.
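When one build script serves both platforms, the Windows and Linux spellings have to be kept in sync. The following sketch uses a hypothetical helper and covers only the pairs named on this slide. It translates Windows-style options to their Linux counterparts and rejects anything it does not know instead of guessing.

```python
# Windows option -> Linux option, limited to the pairs listed on the slide.
WINDOWS_TO_LINUX = {
    "/Od": "-O0",
    "/O2": "-O2",
    "/O3": "-O3",
    "/Qipo": "-ipo",
    "/Qparallel": "-parallel",
    "/Qopt-report": "-opt-report",
}


def to_linux_options(windows_options):
    """Translate Windows-style compiler options to their Linux spelling."""
    translated = []
    for option in windows_options:
        if option not in WINDOWS_TO_LINUX:
            # Refuse to guess the spelling of an unlisted option.
            raise ValueError(f"no known Linux equivalent for {option!r}")
        translated.append(WINDOWS_TO_LINUX[option])
    return translated


if __name__ == "__main__":
    print(to_linux_options(["/O2", "/Qipo", "/Qopt-report"]))
```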

Cluster tools
Intel Trace Collector & Analyzer: cluster tools for enterprise applications with MPI.

ITAC
Similar analysis, but for different applications and different purposes.


Thank you