Phil Pennington Microsoft WSV317.


2 Phil Pennington philpenn@microsoft.com Microsoft WSV317

3 What will you look for? Overall Solution Scalability

4 Agenda Windows Server 2008 R2 New NUMA APIs New User-Mode Scheduling APIs New C++ Concurrency Runtime

5 Example NUMA Hardware Today: a 256-logical-processor system, the HP SuperDome (64 dual-core, hyper-threaded “Montvale” 1.6 GHz Itanium 2 processors), and a 64-logical-processor system, the Unisys ES7000 (32 dual-core, hyper-threaded “Tulsa” 3.4 GHz Xeon processors).

6 NUMA Hardware Tomorrow: 2, 4 or 8 cores per socket on "commodity" CPU architectures; expect systems with 128–256 logical processors. [Diagram: Nehalem processors connected to I/O hubs over PCI Express]

7 NUMA Node Groups (new with Windows 7 and Windows Server 2008 R2) [Diagram: the hierarchy group > NUMA node > socket > core > logical processor (LP)]

8 NUMA Node Groups Example: 2 groups, 4 NUMA nodes, 8 sockets, 32 cores, 4 LPs per core = 128 LPs [Diagram: each group contains 2 NUMA nodes, each node contains 2 sockets, and each socket contains 4 cores with their LPs]

9 Sample SQL Server Scaling, 64P to 128P [Chart: 64P vs. 128P results with scaling labels of 1.7X and 1.3X]

10 Bad Case Disk Write: Software and Hardware Locality NOT Optimal [Diagram: processors P1–P4 with their caches, memories Mem A and Mem B joined by a node interconnect, and disks A and B; numbered steps (0)–(7) show the I/O initiator, ISR, DPC and the I/O buffer's home memory spread across nodes, with some processors locked out for I/O initiation]

11 Windows Server 2008 R2 Optimization for NUMA Topology [Diagram: the same processors, caches, memories and disks, with the I/O initiator, ISR and DPC shown together in a shorter sequence of steps]

12 NUMA-Aware Applications (Non-Uniform Memory Access): minimize contention, maximize locality. Applications that scale beyond even 8–16 logical processors should be NUMA-aware. A process or thread can set a preferred NUMA node. Use the node group scheme for task or process partitioning, and performance-optimize within node groups.

13 NUMA APIs: “Minimize Contention and Maximize Locality”

14 Agenda Windows Server 2008 R2 New NUMA APIs New User-Mode Scheduling APIs New C++ Concurrency Runtime

15 Cooperative Scheduling Conceptual Model: avoiding lock contention gives the best scaling, and cooperative scheduling in user mode avoids both contention and context switches. [Diagram: left, threads 1–6 sharing cores 1 and 2, with the non-running threads waiting; right, user threads 1–6, each paired with its own kernel thread, on cores 1 and 2]

16 User Mode Scheduling (UMS): System Call Servicing [Diagram: primary threads KT(P1) and KT(P2) on cores 1 and 2 run user threads UT(1)–UT(4) taken from the user-mode scheduler's ready list; each UT has a UMS backing kernel thread, KT(1)–KT(4). When a UT makes a SYSCALL it is parked and the request migrates to the appropriate backing KT; while the UT is blocked, the primary is woken to regain the core, and the UT later comes back through the UMS completion list.]

17 User-Mode Context Switch. Benefit: a lower context-switch time means finer-grained items can be scheduled (UMS-based yield: 370 cycles; signal-and-wait: 2,600 cycles). Direct impact: synchronization-heavy, fine-grained work speeds up. Indirect impact: finer grains mean more workloads become candidates for parallelization.
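Here is a rough worked example of why cheaper switches allow finer grains. It assumes each work item of w cycles pays exactly one switch of s cycles and ignores all other costs. Requiring the switch overhead to stay at or below a fraction f gives a minimum item size, shown here for f = 10%:

```latex
\frac{s}{s+w} \le f \iff w \ge s\,\frac{1-f}{f}
\qquad
f = 0.1:\quad
w_{\text{UMS}} \ge 9 \times 370 = 3330 \text{ cycles},\qquad
w_{\text{signal-and-wait}} \ge 9 \times 2600 = 23400 \text{ cycles}
```

Under this simple model, work items roughly 2600/370 ≈ 7 times smaller become worth scheduling.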

18 Getting the Processor Back. Benefit: the scheduler keeps control of the processor when work blocks in the kernel. Direct impact: more deterministic scheduling and better use of a thread’s quantum. Indirect impact: better cache locality when algorithmic libraries take advantage of that determinism to manage available resources.

19 Agenda Windows Server 2008 R2 New NUMA APIs New User-Mode Scheduling New C++ Concurrency Runtime

20 Visual Studio 2010: Tools, Programming Models, Runtimes [Diagram: Tools: Parallel Debugger; Profiler and Concurrency Analyzer. Programming models: Task Parallel Library and PLINQ (managed library); Parallel Pattern Library and Agents Library (native library); data structures. Concurrency runtime: thread pool with task scheduler and resource manager (managed); task scheduler and resource manager (native). Operating system: threads/UMS.]

21 UMS Threads: ConcRT’s use of UMS is an enabler for finer-grained parallelism, more deterministic behavior and better cache locality. UMS allows ConcRT to boost performance in certain situations, such as applications that block frequently.

22 Task Parallelism, for Example. Key concepts: a task (task_handle) is a computation that may be internally decomposed into additional tasks; a task group (task_group) is a collection of tasks that form a logical computation or sub-computation.

23 Task Scheduling: tasks are run by worker threads, which the scheduler controls. [Diagram: timelines for worker threads WT 0–WT 3 without UMS (signal-and-wait), where a "dead zone" is marked, and with UMS (UMS yield)]

24 User-Mode Scheduling APIs and the C++ Concurrency Runtime: “Cooperative Thread-Scheduling”

25 Summary: Call to Action. Consider how your solution will scale on NUMA systems. Use the NUMA APIs to maximize node locality. Leverage UMS for custom user-mode thread scheduling. Use the C++ Concurrency Runtime for most native parallel computing scenarios and gain the benefits of NUMA/UMS implicitly.

26 Resources: MSDN Concurrency Dev Center: http://msdn.microsoft.com/concurrency; MSDN Channel9: http://channel9.msdn.com/tags/w2k8r2; MSDN Code Gallery: http://code.msdn.microsoft.com/w2k8r2; MSDN Server Dev Center: http://msdn.microsoft.com/en-us/windowsserver; 64+ LP and NUMA API support: http://code.msdn.microsoft.com/64plusLP and http://www.microsoft.com/whdc/system/Sysinternals/MoreThan64proc.mspx; dev team blogs: http://blogs.msdn.com/pfxteam and http://blogs.technet.com/winserverperformance

27 Resources: www.microsoft.com/teched (Sessions On-Demand & Community); http://microsoft.com/technet (Resources for IT Professionals); http://microsoft.com/msdn (Resources for Developers); www.microsoft.com/learning (Microsoft Certification and Training Resources). TechEd 2009 is not producing a DVD; attendees can access session recordings at TechEd Online.

28 Related Content: DTL203 "The Manycore Shift: Making Parallel Computing Mainstream", Monday 5/11, 2:45-4:00, Room 404, Stephen Toub; DTL310 "Parallel Computing with Native C++ in Microsoft Visual Studio 2010", Friday 5/15, 2:45-4:00, Room 515A, Josh Phillips; DTL403 "Microsoft Visual C++ Library, Language, and IDE: Now and Next", Thursday 5/14, 4:30-5:45, Room 408A, Kate Gregory; DTL06-INT "Task-Based Parallel Programming with the Microsoft .NET Framework 4", Thursday 5/14, 1:00-2:15, Blue Thr 2, Stephen Toub

29 Windows Server Resources: Make sure you pick up your copy of Windows Server 2008 R2 RC from the Materials Distribution Counter. Learn more about Windows Server 2008 R2: www.microsoft.com/WindowsServer2008R2. Technical Learning Center (Orange Section): highlighting Windows Server 2008 and R2 technologies, with over 15 booths and experts from Microsoft and our partners.

30 Complete an evaluation on CommNet and enter to win!

32 © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.


Download ppt "Phil Pennington Microsoft WSV317."

Similar presentations


Ads by Google