Presentation is loading. Please wait.

Presentation is loading. Please wait.

Intel Software College Tuning Threading Code with Intel® Thread Profiler for Explicit Threads.

Similar presentations


Presentation on theme: "Intel Software College Tuning Threading Code with Intel® Thread Profiler for Explicit Threads."— Presentation transcript:

1 Intel Software College Tuning Threading Code with Intel® Thread Profiler for Explicit Threads

2 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 2 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Objectives After successful completion of this module you will be able to… Use Thread Profiler to recognize and fix common performance problems in applications using Windows* threads

3 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 3 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Agenda Look at Intel® Thread Profiler features Define Critical Path Analysis Examine Thread Profiler data views available Review common performance issues of multithreaded applications Focus on Load imbalance Focus on Synchronization contention Describe general optimizations to gain better performance

4 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 4 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Motivation Developing efficient multithreaded applications is hard New performance problems are caused by the interaction between concurrent threads Load imbalance Contention on synchronization objects Threading overhead

5 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 5 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Intel® Thread Profiler Plugs in to the VTune performance environment Instrumentation-based data collector in VTune Identifies performance issues in OpenMP* or threaded applications using the Win32* API, POSIX* threads, and Intel® Threading Building Blocks Pinpoints performance bottlenecks that directly affect execution time

6 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 6 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Intel® Thread Profiler Features Supports several different compilers Intel® C++ and Fortran Compilers, v7 and higher Microsoft* Visual* C++.NET* 2002, 2003 & 2005 Editions Integrated into Microsoft Visual Studio.NET* IDE Binary instrumentation of applications Different views and filters available to assist and organize analysis Uses critical path analysis

7 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 7 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads critical path is the longest execution flow The critical path is the longest execution flow What is the Critical Path? Threaded applications contain multiple execution flows A new flow is created when a thread is created or resumes Flow ends when a thread terminates or blocks on a synchronization primitive Thread 1 Thread 2 Thread 3 T0T0 T1T1 T2T2 T3T3 T4T4 T5T5 T6T6 T7T7 T8T8 T9T9 T 10 T 11 T 12 T 13 T 14 T 15 Acquire L Threads 2 & 3 Done Acquire L Wait for Threads 2 & 3 Release L Acquire lock L Wait for L Release LWait for L Thread 2 terminates Thread 3 terminates Thread 1 terminates

8 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 8 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Critical Path Analysis System Utilization Relative to the system executing the application Idle: no threads Serial: a single thread Under Utilized: more than one thread, less than cores Fully Utilized: # threads == # cores Over Utilized: # threads > # cores Thread interaction categories Cruise: threads running without interference Overhead: thread operation overhead Blocking: thread waiting on external event Impact: thread preventing some other thread from executing If the critical path is shortened, the application will run in less time

9 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 9 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Thread 1 Thread 2 Thread 3 T0T0 T1T1 T2T2 T3T3 T4T4 T5T5 T6T6 T7T7 T8T8 T9T9 T 10 T 11 T 12 T 13 T 14 T 15 Acquire lock L Wait for Threads 2 & 3 Wait for L Release LWait for L Release L Acquire L Threads 2 & 3Done System Utilization Examines processor utilization to determine concurrency level of the application Concurrency is the number of active threads Categorization shown for a system configuration with 2 processors IdleSerialFully UtilizedUnder UtilizedOver Utilized Concurrency Level 0 15 5 10 Time

10 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 10 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Execution Time Categories Analyze thread interaction and behavior along critical path Record objects that cause CP transitions Cruise timeOverheadBlocking timeImpact time Categorization shown for a system configuration with 2 processors Thread Interaction 0 15 5 10 Time Thread 1 Thread 2 Thread 3 T0T0 T1T1 T2T2 T3T3 T4T4 T5T5 T6T6 T7T7 T8T8 T9T9 T 10 T 11 T 12 T 13 T 14 T 15 Acquire lock L Wait for Threads 2 & 3 Wait for L Release LWait for L Release L Acquire L Threads 2 & 3 Done

11 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 11 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Merging Concurrency and Behavior Concurrency Level Critical Path Thread Behavior 0 15 5 10 Time Start with system utilization Further categorize by behavior

12 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 12 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Thread Profiler Views Critical Path View Shows breakdown of the critical path Profile View Shows the breakdown of selected critical paths User can select other views of the selected profile Concurrency level, threads, objects Timeline View Shows thread activity and critical path transitions for the entire application Source View Transition source view, creation source view

13 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 13 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Activity 1a Threaded version of potential code Is there a performance issue? Goal Run application through Thread Profiler Examine thread activities by reviewing different views

14 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 14 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Thread Profiler Profile View Profile Pane Timeline Pane

15 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 15 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Profile Pane – Concurrency Level View Concurrency Level View Two threads ran in parallel ~33% of the time Ran single threaded ~65% of the time Lets look at the Thread View

16 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 16 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Profile Pane – Thread View Time on the Critical Path Active time of the thread Lifetime of the thread Lets look at the Object View

17 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 17 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Profile Pane – Object View This object caused all of the impact Lets look at Timeline View

18 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 18 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Timeline Pane

19 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 19 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Source View

20 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 20 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Activity 1b Threaded version of potential code Is there a performance issue? Goal Examine thread activities by reviewing different views Determine system utilization Identify any performance issues

21 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 21 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Review Activity 1 Concurrency Level view can be used to determine system utilization by the application Timeline view enables you to understand the thread activity in your application Instrumentation time will be included in first run results; thus, for applications running in a short amount of time, a second run may produce more realistic timings.

22 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 22 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Common Performance Issues Load balance Improper distribution of parallel work Synchronization Excessive use of global data, contention for the same synchronization object Parallel Overhead Due to thread creation, scheduling.. Granularity Not sufficient parallel work

23 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 23 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Load Imbalance Unequal work loads lead to idle threads and wasted time Busy Idle Time Thread 0 Thread 1 Thread 2 Thread 3 Start threads Join threads

24 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 24 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Redistribute Work to Threads Static assignment Are the same number of tasks assigned to each thread? Do tasks take different processing time? Do tasks change in a predictable pattern? Rearrange (static) order of assignment to threads Use dynamic assignment of tasks

25 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 25 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Redistribute Work to Threads Dynamic assignment Is there one big task being assigned? Break up large task to smaller parts Are small computations agglomerated into larger task? Adjust number of computations in a task More small computations into single task? Fewer small computations into single task? Bin packing heuristics

26 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 26 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Unbalanced Workloads Threads are unbalanced Active Times not equal

27 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 27 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Activity 2 – Load Imbalance Threaded version of potential code with thread pools Has a load balance performance issue

28 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 28 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Review Activity 2 Threads view can be used to determine activity levels of each thread within the application Timeline view enables you to understand the thread activity in your application

29 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 29 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Synchronization By definition, synchronization serializes execution Lock contention means more idle time for threads Busy Idle In Critical Thread 0 Thread 1 Thread 2 Thread 3 Time

30 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 30 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Synchronization Fixes Eliminate synchronization Expensive but necessary evil Use storage local to threads Use local variable for partial results, update global after local computations Allocate space on thread stack ( alloca ) Use thread-local storage API (TlsAlloc) Use atomic updates whenever possible Some global data updates can use atomic operations (Interlocked API family)

31 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 31 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Atomic Updates Use Win32 Interlocked* intrinsics in place of synchronization object static long counter; // Fast InterlockedIncrement (&counter); // Slower EnterCriticalSection (&cs); counter++; LeaveCriticalSection (&cs);

32 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 32 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Synchronization Fixes Reduce size of critical regions protected by synchronization object Larger critical regions tie up sync objects longer; other threads sit idle longer waiting to acquire objects Only accesses to shared variables need to be protected

33 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 33 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Synchronization Fixes Use best synchronization object for job Critical Section Local object Available to threads within the same process Lower overhead (~8X faster than mutex) Mutex Kernel object Accessible to threads within different processes Deadlock safety (can only be released by owner) Other objects are available

34 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 34 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Object Contention This object caused all of the impact What is all this?These four threads… …are impacting threads by this object

35 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 35 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Activity 3 Threaded version of numerical integration Has serious performance issues Goal Understand thread activity Use the Thread Profiler groupings Examine synchronization and its effect on performance Fix performance issue

36 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 36 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Review Activity 3 Grouping objects and threads provides the information on which objects impact what threads Apply the heuristics from labs for locating bottlenecks in the source code For longer running applications, the difference in first and second run- times is negligible

37 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 37 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads General Optimizations Serial Optimizations Serial optimizations along the critical path should affect execution time Parallel Optimizations Reduce synchronization object contention Balance workload Functional parallelism Analyze benefit of increasing number of processors Analyze the effect of increasing the number of threads on scaling performance

38 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 38 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Intel® Thread Profiler for Explicit Threads Whats Been Covered Identifying performance issues can be time consuming without tools Tools are required to understand and to optimize parallel efficiency and hardware utilization Thread Profiler helps you understand your applications thread activity, system utilization, and scaling performance

39 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 39 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads


Download ppt "Intel Software College Tuning Threading Code with Intel® Thread Profiler for Explicit Threads."

Similar presentations


Ads by Google