How to do Multithreading. First step: Sampling and Hotspot hunting. 2010. 3. 19, Myongji University, Sugwon Hong

Presentation transcript:

2 Benefits of Threads
Threads are intended to improve the performance and responsiveness of a program.
Quick turnaround time
–Completing a single job in the smallest amount of time possible
High throughput
–Finishing the most tasks in a fixed amount of time

3 Risks of Threads
But if threads are not used properly, they can lead to degraded performance and, sometimes, to unpredictable behavior and error conditions.
–Data race (race conditions)
–Deadlock
They also bring other burdens.
–Code complexity
–Portability issues
–Testing and debugging difficulty
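To make the data race risk concrete, here is a minimal Python sketch. It is not taken from the slides; the function name `count_with_lock` and its parameters are made up for illustration. Several threads increment one shared counter, and a lock guards the read-modify-write step so that no update is lost.

```python
import threading


def count_with_lock(num_threads: int, increments: int) -> int:
    """Increment a shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            # "counter += 1" is a read, an add, and a write. Without the lock,
            # two threads could interleave these steps and lose an update,
            # which is exactly a data race.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    # With the lock in place the result is always num_threads * increments.
    print(count_with_lock(num_threads=4, increments=10_000))
```

The lock removes the race but also serializes the increments, which is one way threads can degrade performance when shared data is touched too often.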

4 Common questions for multithreading
Where to thread?
Is it worth threading a selected region?
What should the expected speedup be?
Will we meet the expected performance?
Can we correct any errors while threading?
How long would it take to thread?
How much redesign and effort is required?
Will it scale as more threads and data are added?
Which threading model to use?
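One common way to get a first estimate for the "expected speedup" question is Amdahl's law. The slides do not name it, so treat the following as an illustrative sketch: `parallel_fraction` is the share of serial run time that can be threaded (something measurement would have to supply), and the function name is hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, num_threads: int) -> float:
    """Upper bound on speedup when only part of the run time is parallelized."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time; the parallel part is divided
    # evenly among the threads (an idealized, overhead-free assumption).
    return 1.0 / (serial_fraction + parallel_fraction / num_threads)


if __name__ == "__main__":
    # Assumed example: 90% of the run time is in the region to be threaded.
    for n in (1, 2, 4, 8):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

The estimate ignores threading overhead and load imbalance, so a measured speedup will normally fall below it. It is still useful for deciding whether a region is worth threading at all.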

5 Starting point: Measurement
Before answering all of these questions, we simply take the first step: measurement.
Measurement can give us plenty of information about where to start.
Collect data, while the program is running, that show the CPU hotspots, the I/O hotspots, and the degree of parallelism in your code.
Measure before, during, and after threading.
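As a rough illustration of hotspot hunting, the sketch below uses Python's standard `cProfile` and `pstats` modules. This is a swapped-in technique: `cProfile` instruments every function call, whereas VTune collects samples, but the goal is the same, namely finding where the time goes. The functions `heavy_work`, `light_work`, `program`, and `find_hotspot` are hypothetical names invented for this example.

```python
import cProfile
import pstats


def heavy_work() -> int:
    """A deliberately expensive loop: the intended hotspot."""
    total = 0
    for i in range(200_000):
        total += i * i
    return total


def light_work() -> int:
    """A cheap function that should not show up as the hotspot."""
    return sum(range(100))


def program() -> None:
    light_work()
    heavy_work()


def find_hotspot(func) -> str:
    """Profile func() and return the name of the function with the most own time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, function name) to a tuple whose third
    # field is the time spent inside the function itself, excluding callees.
    hottest = max(stats.stats.items(), key=lambda item: item[1][2])
    return hottest[0][2]


if __name__ == "__main__":
    print("hotspot:", find_hotspot(program))
```

Running the same measurement again after threading shows whether the hotspot has actually shrunk or just moved.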

6 Performance tools
To measure, we need proper performance tools.
The Intel VTune Performance Analyzer, along with the Thread Profiler,
–identifies "hot spots" of code that may benefit from threading
–locates thread performance bottlenecks
–estimates achievable/available performance
–shows the call graph to help identify threading candidates
The Intel Thread Checker allows you to quickly validate designs and create prototypes by locating deadlocks and race conditions.

7 Development cycle
Analysis
–Verify timings, verify dependencies
–Intel® VTune™ Performance Analyzer
Design (Introduce Threads)
–Use a threaded library, e.g. Intel® Performance libraries: IPP and MKL
–OpenMP* (Intel® Compiler)
–Explicit threading (Win32*, Pthreads*)
Analyze for correctness
–Intel® Thread Checker
–Intel Debugger
Tune performance
–Thread Profiler
–Intel® VTune™ Performance Analyzer
(source: Intel Academy program)

8 Today’s lab
Using the VTune Performance Analyzer, we take measurements for three cases.
–Measure the serial version of the Mandelbrot program and hunt for its hot spot.
–Multithread the program and observe any change by measurement.
–Tweak the code for load balancing and observe the result by measurement.
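The lab's Mandelbrot code and its specific tweak are not shown in the slides, so the following Python sketch only illustrates why load balancing matters for this kind of program. It assumes the image is split by rows among threads, counts escape-time iterations per row as a stand-in for run time, and compares two hypothetical row assignments: contiguous blocks and a cyclic (interleaved) distribution. All names and parameters are assumptions made for this example.

```python
def row_cost(y: float, width: int, max_iter: int) -> int:
    """Total escape-time iterations for one image row (a proxy for its run time)."""
    cost = 0
    for col in range(width):
        c = complex(-2.0 + 3.0 * col / (width - 1), y)
        z = 0j
        n = 0
        while n < max_iter and abs(z) <= 2.0:
            z = z * z + c
            n += 1
        cost += n
    return cost


def imbalance(costs, assignment, num_threads: int) -> float:
    """Ratio of the busiest thread's load to the average load (1.0 is perfect)."""
    loads = [0] * num_threads
    for row, cost in enumerate(costs):
        loads[assignment(row)] += cost
    return max(loads) / (sum(loads) / num_threads)


def compare_schedules(height: int = 64, width: int = 64,
                      max_iter: int = 50, num_threads: int = 4):
    """Return (block imbalance, cyclic imbalance) for a small Mandelbrot image."""
    costs = [row_cost(-1.5 + 3.0 * row / (height - 1), width, max_iter)
             for row in range(height)]
    rows_per_block = -(-height // num_threads)  # ceiling division
    # Block: each thread gets one contiguous band of rows.
    block = imbalance(costs, lambda r: r // rows_per_block, num_threads)
    # Cyclic: rows are dealt out to threads like cards, one at a time.
    cyclic = imbalance(costs, lambda r: r % num_threads, num_threads)
    return block, cyclic


if __name__ == "__main__":
    block, cyclic = compare_schedules()
    print(f"block imbalance:  {block:.2f}")
    print(f"cyclic imbalance: {cyclic:.2f}")
```

Rows that cross the middle of the set need many more iterations than rows near the edge, so contiguous bands leave some threads idle while others are still working. Interleaving rows spreads the expensive ones around. A profiler measurement before and after such a change is what confirms whether the balance really improved.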