
1 DEV490: Easy Multi-threading for Native .NET Apps with OpenMP™ and Intel® Threading Toolkit
Alex.Klimovitski@intel.com, Software Application Engineer, Intel EMEA

2 Agenda
Hyper-Threading technology and what to do about it
The answer is: multithreading!
Introduction to OpenMP™
Maximizing performance with OpenMP and Intel® Threading Tools

3 How HT Technology Works
[Diagram: physical processors vs. logical processors visible to the OS, and how physical processor resources (Resource 1-3) are allocated to Thread 1 and Thread 2 over time, without and with Hyper-Threading]
Higher resource utilization, higher output with two simultaneous threads

4 HT Technology, Not Magic
[Diagram: a multiprocessor duplicates the entire CPU (architecture state, front end, out-of-order execution engine, data caches); Hyper-Threading duplicates only the architecture state and shares the rest of one CPU]
HT Technology increases processor performance by improving resource utilization

5 HT Technology: What's Inside
[Diagram of the small front-end structures involved: instruction TLB, next instruction pointer, instruction streaming buffers, trace cache fill buffers, register alias tables, trace cache next-IP, return stack predictor]
HT Technology increases processor performance at extremely low cost

6 Taking Advantage of HT
HT is transparent to OS and apps: software simply "sees" multiple CPUs
Software usage scenarios to benefit from HT:
  Multithreading: inside one application
  Multitasking: among several applications
OS support enhances HT benefits:
  smart scheduling to account for logical CPUs
  halting logical CPUs in the idle loop
  implemented in: Windows* XP, Linux* 2.4.x
To take advantage of HT Technology, multithread your application!

7 Multithreading Your Application
What? Multithreading strategy
How? Multithreading implementation

8 Multithreading Strategy
Exploit task parallelism
Exploit data parallelism

9 Exploiting Task Parallelism
Partition work into disjoint tasks
Execute the tasks concurrently (on separate threads)
[Diagram: one data stream processed by a Compress thread and an Encrypt thread]

10 Exploiting Data Parallelism
Partition data into disjoint sets
Assign the sets to concurrent threads
[Diagram: the data set split across Thread A ... Thread N]

11 Multithreading Implementation
API / library: Win32* threading API, P-threads, MPI
Programming language mechanisms: Java*, C#
Programming language extension: OpenMP™

12 What is OpenMP™?
Compiler extension for easy multithreading, ideally WITHOUT CHANGING C/C++ CODE
Three components:
  #pragmas (compiler directives): the most important part
  API and runtime library
  Environment variables
Benefits:
  Easy way to exploit Hyper-Threading
  Portable and standardized
  Allows incremental parallelization
  Dedicated profiling tools
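A minimal sketch (not from the slides) showing all three components in one program: a pragma forks the team, the runtime library reports thread identity, and the OMP_NUM_THREADS environment variable set before the run determines the default team size. The file name is illustrative.

    // hello_omp.c (illustrative)
    #include <stdio.h>
    #include <omp.h>                // component 2: runtime library API

    int main(void)
    {
        // Component 3: OMP_NUM_THREADS, set in the environment before running,
        // controls how many threads the team gets by default.
        #pragma omp parallel        // component 1: compiler directive forks a team
        {
            printf("thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }
        return 0;                   // the team joins at the end of the parallel region
    }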

13 Programming Model
Fork-join parallelism:
  the main thread creates a team of additional threads
  team threads join at the end of the parallel region
All memory (variables) is shared by default
Parallelism can be added incrementally: a sequential program evolves into a parallel one
[Diagram: the main thread forking into a team of threads at each parallel region and joining afterwards]

14 Our First OpenMP™ Program
All OpenMP™ pragmas start with omp
Pragma parallel starts a team of n threads
Everything inside is executed n times!
All memory is shared!

    // serial version
    void func()
    {
        int a, b, c, d;
        a = a + b;
        c = c + d;
    }

    // version with an OpenMP parallel region
    void func()
    {
        int a, b, c, d;
        #pragma omp parallel
        {
            a = a + b;
            c = c + d;
        }
    }

15 Task Assignment to Threads
Work-sharing pragmas assign tasks to threads
Pragma sections assigns non-iterative tasks

    void func()
    {
        int a, b, c, d;
        #pragma omp parallel
        {
            #pragma omp sections
            {
                #pragma omp section
                a = a + b;      // one thread
                #pragma omp section
                c = c + d;      // another thread
            }
        }
    }

16 Work-sharing Pragmas
Work-sharing pragmas can be merged with pragma parallel

    void func()
    {
        int a, b, c, d;
        #pragma omp parallel sections
        {
            #pragma omp section
            a = a + b;      // one thread
            #pragma omp section
            c = c + d;      // another thread
        }
    }

    #pragma omp parallel sections
    {
        #pragma omp section
        do_compress();
        #pragma omp section
        do_encrypt();
    }

Two functions execute in parallel: #pragma omp parallel sections implements task-level parallelism!

17 Work-sharing for Loops
Pragma for assigns loop iterations to threads
There are many ways to do this…
By the way, what does the example do??

    void func()
    {
        int a[N], sum = 0;
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
        {
            sum += a[i];
        }
    }

#pragma omp parallel for implements data-level parallelism!

18 Variable Scope Rules
Implicit rule 1: all variables defined outside omp parallel are shared by all threads
Implicit rule 2: all variables defined inside omp parallel are local to every thread
Implicit exception: in omp for, the loop counter is always local to every thread
Explicit rule 1: variables listed in the shared() clause are shared by all threads
Explicit rule 2: variables listed in the private() clause are local to every thread
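A small self-contained sketch (not from the deck; variable names are arbitrary) that exercises each rule and avoids any data race by giving every iteration its own array element:

    #include <stdio.h>

    #define N 8

    int main(void)
    {
        int results[N];                // defined outside omp parallel: shared (implicit rule 1)
        int scratch = 0;               // would be shared, but listed in private() below

        #pragma omp parallel for private(scratch)   // explicit rule 2
        for (int i = 0; i < N; i++)    // loop counter i: private (implicit exception)
        {
            int square = i * i;        // defined inside the region: private (implicit rule 2)
            scratch = square;          // each thread writes only its own copy of scratch
            results[i] = scratch;      // results is shared; each iteration touches a distinct element
        }

        int total = 0;
        for (int i = 0; i < N; i++) total += results[i];
        printf("total = %d\n", total); // 0+1+4+...+49 = 140
        return 0;
    }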

19 What's private, what's shared?

    void func()
    {
        int a, i;
        #pragma omp parallel for \
            shared(c) private(d, e)
        for (i = 0; i < N; i++)
        {
            int b, c, d, e;
            a = a + b;
            c = c + d * e;
        }
    }

20 Synchronization Problem
The C statement sum++; compiles into three CPU instructions:

    mov eax, [sum]
    add eax, 1
    mov [sum], eax

When two threads execute them at the same time, the sequences can interleave: both threads load the same value s into their own eax, both add 1, and both store back s+1, so after two increments sum holds s+1 instead of s+2. Not what we expected!
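A sketch (not in the deck) that makes the lost update visible; the iteration count is arbitrary, and with more than one thread the printed result is usually short of the expected value:

    #include <stdio.h>

    int main(void)
    {
        const int N = 1000000;
        int sum = 0;                    // shared accumulator, no protection

        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            sum++;                      // the same read-modify-write as above

        // Concurrent increments overwrite each other, so sum is typically < N.
        printf("expected %d, got %d\n", N, sum);
        return 0;
    }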

21 Synchronization Pragmas
#pragma omp single: execute the next operator by one (random) thread only
#pragma omp barrier: hold threads here until all arrive at this point
#pragma omp atomic: execute the next memory access operation atomically (i.e. without interruption from other threads)
#pragma omp critical [name]: let only one thread at a time execute the next operation

    int a[N], sum = 0;
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
    {
        #pragma omp critical
        sum += a[i];    // one thread at a time
    }
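For a single memory update like this one, the omp atomic pragma listed above is normally a lighter-weight guard than a full critical section; a sketch of the same fragment with atomic (N and the array contents are assumed as on the slide):

    int a[N], sum = 0;
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
    {
        #pragma omp atomic      // protects only this one read-modify-write
        sum += a[i];
    }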

22 omp for schedule Schemes
The schedule clause defines how loop iterations are assigned to threads
A compromise between two opposing goals:
  best thread load balancing
  minimal controlling overhead
[Diagram: how N iterations are split for a single thread and under schedule(static), schedule(dynamic, c) with chunk size c, and schedule(guided, f)]
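A sketch of how the three schemes are written; N, even_work(), and uneven_work() are placeholders, and the chunk sizes are arbitrary:

    // Regular iterations: static gives each thread one contiguous block.
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; i++) even_work(i);

    // Irregular iterations: threads grab chunks of 4 as they become free.
    #pragma omp parallel for schedule(dynamic, 4)
    for (int i = 0; i < N; i++) uneven_work(i);

    // Guided: chunks start large and shrink, trading balance against overhead.
    #pragma omp parallel for schedule(guided, 4)
    for (int i = 0; i < N; i++) uneven_work(i);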

23 Reduction
Can we do better than synchronizing every access to one common accumulator?
Alternative: a private accumulator in every thread; to produce the final value, all private accumulators are summed up (reduced) at the end
Reduction ops: +, *, -, &, ^, |, &&, ||

    int a[N], sum = 0;
    #pragma omp parallel for reduction(+: sum)
    for (int i = 0; i < N; i++)
    {
        sum += a[i];    // no synchronization needed
    }
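A self-contained version of the same idea (array size and fill are illustrative) that checks the reduced result against a serial sum:

    #include <stdio.h>

    #define N 1000

    int main(void)
    {
        int a[N], serial = 0, sum = 0;
        for (int i = 0; i < N; i++) { a[i] = i; serial += a[i]; }

        #pragma omp parallel for reduction(+: sum)
        for (int i = 0; i < N; i++)
            sum += a[i];        // each thread adds into its private copy;
                                // the copies are combined with + at the join

        printf("serial = %d, reduction = %d\n", serial, sum);  // values match
        return 0;
    }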

24 OpenMP™ Quiz
What do these things do??
  omp parallel
  omp sections
  omp for
  omp parallel for
  private, shared, reduction, schedule
  omp critical
  omp barrier

25 OpenMP™ Pragma Cheatsheet
Fundamental pragma, starts a team of threads: omp parallel
Work-sharing pragmas (can be merged with parallel): omp sections, omp for
Clauses for work-sharing pragmas:
  private, shared, reduction: variable scope
  schedule: scheduling control
Synchronization pragmas: omp critical [name], omp barrier

26 What Else Is in OpenMP™?
Advanced variable scope and initialization
Advanced synchronization pragmas
OpenMP™ API and environment variables:
  control the number of threads
  manual low-level synchronization…
Task queue model of work-sharing: Intel-specific extension until standardized
Compatibility with Win32* and P-threads
www.openmp.org
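A sketch (not from the deck) of two items above: controlling the team size through the runtime API and using an explicit lock for manual low-level synchronization. The protected counter is just an example.

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        omp_set_num_threads(4);         // API call: request a team of 4 threads

        omp_lock_t lock;                // manual low-level synchronization
        omp_init_lock(&lock);

        int count = 0;
        #pragma omp parallel
        {
            omp_set_lock(&lock);        // one thread at a time past this point
            count++;
            omp_unset_lock(&lock);
        }
        omp_destroy_lock(&lock);

        printf("threads that ran: %d\n", count);
        return 0;
    }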

27 Task Queue Work-sharing
For more complex iteration models beyond counted for-loops

    struct Node { int data; Node* next; };
    Node* head;

    // serial traversal
    for (Node* p = head; p != NULL; p = p->next)
    {
        process(p->data);
    }

    // Intel task queue version
    Node* p;
    #pragma intel omp taskq shared(p)
    for (p = head; p != NULL; p = p->next)  // only one thread runs the loop
    {
        #pragma intel omp task
        process(p->data);   // queued task, executed on any available thread
    }
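The taskq/task pragmas above are the Intel-specific spelling; later OpenMP standards (3.0 onward) cover the same pattern with the standard task construct. A sketch of that form, assuming the same Node list and process() function:

    Node* p;
    #pragma omp parallel
    {
        #pragma omp single                      // one thread walks the list...
        for (p = head; p != NULL; p = p->next)
        {
            #pragma omp task firstprivate(p)    // ...queuing one task per node
            process(p->data);                   // tasks run on any idle thread
        }
    }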

28 Related TechEd Sessions
DEV490: Easy Multi-threading for Native Microsoft® .NET Apps With OpenMP™ and Intel® Threading Toolkit
HOLINT02: Multithreading for Microsoft® .NET Made Easy With OpenMP™ and Intel® Threading Toolkit
HOLINT01: Using Hyper-Threading Technology to enhance your native and managed Microsoft® .NET applications

29 Summary & Call to Action
Hyper-Threading Technology increases processor performance at very low cost
HT Technology is available on mass-consumer desktops today
To take advantage of HT Technology, multithread your application!
Use OpenMP™ to multithread your application incrementally and easily!
Use Intel Threading Tools to achieve maximum performance

30 Resources
developer.intel.com/software/products/compilers/
www.openmp.org
Intel® Compiler User Guide

31 THANK YOU! developer.intel.com Please remember to complete your online Evaluation Form!!

32 Reference

33 Quest to Exploit Parallelism

    Level of parallelism | Exploited by                                     | Technology                          | Problem
    Instruction          | Multiple, pipelined, out-of-order exec. units,   | NetBurst™ microarchitecture         | Utilization
                         | branch prediction                                | (pipelined, superscalar,            |
                         |                                                  | out-of-order, speculative)          |
    Data                 | SIMD                                             | MMX, SSE, SSE2                      | Scope
    Task, thread         | Multiple processors                              | SMP                                 | Price

34 Watch HT in Action!
[Animation panels: In-Order Pipeline, Out-of-Order Pipeline, In-Order]

35 Intel® Compiler Switches for Multithreading & OpenMP™
Automatic parallelization: /Qparallel
OpenMP™ support: /Qopenmp, /Qopenmp_report{0|1|2}
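A hedged example of how these switches might be combined on a Windows command line, assuming the Intel C++ compiler driver icl and an illustrative file name:

    rem Compile with OpenMP support and a verbose parallelization report
    icl /Qopenmp /Qopenmp_report2 app.cpp

    rem Let the compiler auto-parallelize loops on its own
    icl /Qparallel app.cpp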

36 Key Terms
Multiprocessing (MP): hardware technology to increase processor performance by increasing the number of CPUs
Hyper-Threading Technology (HT): hardware technology to increase processor performance by improving CPU utilization
Multithreading (MT): software technology to improve software functionality and increase software performance by utilizing multiple (logical) CPUs

37 Community Resources
http://www.microsoft.com/communities/default.mspx
Most Valuable Professional (MVP): http://www.mvp.support.microsoft.com/
Newsgroups: converse online with Microsoft newsgroups, including worldwide: http://www.microsoft.com/communities/newsgroups/default.mspx
User Groups: meet and learn with your peers: http://www.microsoft.com/communities/usergroups/default.mspx

38 Evaluations

39 © 2003 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.

