Don’t Dread Threads
Orion Granatir, Omar Rodriguez
GDC, 3/12/10



Agenda
- Threading is worthwhile
- Data decomposition is a good place to start
- Think tasks!!
- Intel tools help make things easy

Threading is important!!

Multi-core Needs Parallel Applications
Threading is required to maximize performance.
[Chart: app performance and platform potential performance over time, from the GHz era to the multi-core era, with "Serial" and "Parallel" curves. Callouts: 33 FPS in our demo, 104 FPS in our demo.]

Follow these steps to add threading…
1. Use data decomposition
2. Use tasks

Functional decomposition is limited
- Potential latency with pipelining
- Poor load balancing
- Doesn’t scale on varying core counts

Data decomposition can scale to n cores

Big loops are ideal cases for data decomposition

    // Loop through each AI
    for( int Index = 0; Index < g_NumAI; Index++ )
    {
        // Update each AI for this frame
        g_AI[ Index ].Update();
    }

Minimize interactions
[The slide repeats the AI update loop from "Big loops are ideal cases for data decomposition", with a diagram: AI 0, AI 1, "Set m_HP to 10".]

Avoid locking
[The slide repeats the AI update loop, with the same diagram: AI 0, AI 1, "Set m_HP to 10".]

Read global data, don’t write
[The slide repeats the AI update loop.]
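The three guidelines above (minimize interactions, avoid locking, read global data without writing it) can be combined in one pattern. The sketch below is not from the talk: it assumes a hypothetical AIState struct and a double-buffered update in which every AI reads a read-only snapshot of the previous frame and writes only its own slot in the next frame. Instead of AI 0 writing into AI 1's m_HP, AI 1 gathers the damage aimed at it, so no lock is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-AI state; m_Target is the index of the AI being attacked (-1 = none).
struct AIState
{
    int m_HP = 10;
    int m_Target = -1;
};

// Compute one AI's next state from a read-only snapshot of the previous frame.
AIState ComputeNext( const std::vector<AIState>& prev, std::size_t self )
{
    AIState next = prev[ self ];
    for( std::size_t other = 0; other < prev.size(); other++ )
    {
        // Gather: count the attackers aiming at us instead of letting them write to us
        if( prev[ other ].m_Target == static_cast<int>( self ) )
            next.m_HP -= 1;
    }
    return next;
}

// Update all AI on two threads. prev and next must be different vectors.
void UpdateFrame( const std::vector<AIState>& prev, std::vector<AIState>& next )
{
    next.resize( prev.size() );
    const std::size_t half = prev.size() / 2;

    // Each call writes only next[ start..end ), so the two ranges never overlap
    auto worker = [&prev, &next]( std::size_t start, std::size_t end )
    {
        for( std::size_t index = start; index < end; index++ )
            next[ index ] = ComputeNext( prev, index );
    };

    std::thread first( worker, std::size_t( 0 ), half );   // first half on a new thread
    worker( half, prev.size() );                           // second half on this thread
    first.join();
}
```

After UpdateFrame returns, the caller would swap the two buffers so that next becomes the snapshot for the following frame.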

OpenMP is a great way to get started

    // Loop through each AI
    #pragma omp parallel for
    for( int Index = 0; Index < g_NumAI; Index++ )
    {
        // Update each AI for this frame
        g_AI[ Index ].Update();
    }

    Serial    6 Core
    1.00x     2.31x

    Algorithm ~12.0x
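A compilable version of the OpenMP loop might look like the sketch below. The AI struct and its members are hypothetical stand-ins, not the demo's code. The second function shows one way to handle a value that every iteration contributes to: the OpenMP reduction clause gives each thread a private counter and combines them at the end, so the loop body still needs no lock. Built without OpenMP support, the pragmas are ignored and both loops simply run serially.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the talk's AI class.
struct AI
{
    float m_Position = 0.0f;
    int m_HP = 10;

    // Touches only this AI's own state, so iterations are independent
    void Update( float deltaTime ) { m_Position += deltaTime; }
};

void UpdateAllAI( std::vector<AI>& ai, float deltaTime )
{
    const int numAI = static_cast<int>( ai.size() );

    // OpenMP splits the iteration range across its worker threads
    #pragma omp parallel for
    for( int index = 0; index < numAI; index++ )
    {
        ai[ index ].Update( deltaTime );
    }
}

int CountAlive( const std::vector<AI>& ai )
{
    const int numAI = static_cast<int>( ai.size() );
    int alive = 0;

    // Each thread counts into a private copy of alive; the copies are summed at the end
    #pragma omp parallel for reduction( + : alive )
    for( int index = 0; index < numAI; index++ )
    {
        if( ai[ index ].m_HP > 0 )
            alive++;
    }
    return alive;
}
```

With GCC or Clang the pragmas take effect when compiling with -fopenmp.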

The next step is to use tasks
- Needed for load balancing (avoid oversubscription)
- Support large chunks of work
- Better utilization of cache

Tasks can be used to parallelize complex problems
[Diagram labels: Setup, Texture Lookup, Processing, Data Parallelism.]

Tasks can be arranged in a dependency graph
[Diagram labels: Setup, Texture Lookup, Processing, Data Parallelism.]

Dependency graph can be mapped to a thread pool
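One possible way to run a dependency graph on a fixed set of worker threads is sketched below. The TaskGraph class and all of its names are hypothetical, not an API from the talk or from Intel TBB. Each node counts its unfinished prerequisites; when a task completes, it decrements the counters of its dependents, and any node that reaches zero moves to the ready queue. The sketch assumes the graph is acyclic and that Run is called once.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class TaskGraph
{
public:
    // Returns the index of the new task
    std::size_t AddTask( std::function<void()> work )
    {
        Node node;
        node.m_Work = std::move( work );
        node.m_Unfinished = 0;
        m_Nodes.push_back( std::move( node ) );
        return m_Nodes.size() - 1;
    }

    // "after" may not start until "before" has completed
    void AddDependency( std::size_t before, std::size_t after )
    {
        m_Nodes[ before ].m_Dependents.push_back( after );
        m_Nodes[ after ].m_Unfinished++;
    }

    // Run every task on numThreads workers, then return
    void Run( unsigned int numThreads )
    {
        m_Remaining = m_Nodes.size();
        for( std::size_t i = 0; i < m_Nodes.size(); i++ )
            if( m_Nodes[ i ].m_Unfinished == 0 )
                m_Ready.push( i );

        std::vector<std::thread> workers;
        for( unsigned int t = 0; t < std::max( 1u, numThreads ); t++ )
            workers.emplace_back( [this]() { WorkerLoop(); } );
        for( std::thread& worker : workers )
            worker.join();
    }

private:
    struct Node
    {
        std::function<void()> m_Work;
        std::vector<std::size_t> m_Dependents;   // tasks waiting on this one
        int m_Unfinished;                        // prerequisites not yet completed
    };

    void WorkerLoop()
    {
        for( ;; )
        {
            std::size_t index;
            {
                std::unique_lock<std::mutex> lock( m_Mutex );
                m_Wake.wait( lock, [this]() { return m_Remaining == 0 || !m_Ready.empty(); } );
                if( m_Ready.empty() )
                    return;                      // every task has completed
                index = m_Ready.front();
                m_Ready.pop();
            }

            m_Nodes[ index ].m_Work();           // run the task outside the lock

            {
                std::lock_guard<std::mutex> lock( m_Mutex );
                m_Remaining--;
                for( std::size_t dependent : m_Nodes[ index ].m_Dependents )
                    if( --m_Nodes[ dependent ].m_Unfinished == 0 )
                        m_Ready.push( dependent );
            }
            m_Wake.notify_all();
        }
    }

    std::vector<Node> m_Nodes;
    std::queue<std::size_t> m_Ready;
    std::size_t m_Remaining = 0;
    std::mutex m_Mutex;
    std::condition_variable m_Wake;
};
```

A frame could, for example, add a setup task, make two independent tasks depend on it, and make a final task depend on both; the two middle tasks may then run on different workers.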

Think of a task as a unit of work
- A task is a unit of work
- It’s run on a thread pool
- It runs to completion
- It has heavy penalties for blocking
- It’s an efficient way to avoid oversubscription
- Tasks adapt to any number of threads/cores, regardless of CPU topology
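The properties listed above can be seen in a minimal thread pool. The sketch below is an illustration under assumptions, not the pool used in the demo: the ThreadPool class and the CountWithPool helper are hypothetical names. A fixed number of workers is created once (so the thread count never exceeds what was requested), each worker pulls one task at a time from a shared queue, and each task runs to completion before the worker takes the next one. A task that blocks would stall its worker, which is why blocking is so costly.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool
{
public:
    // Assumes numThreads >= 1; with zero workers no task would ever run
    explicit ThreadPool( unsigned int numThreads )
    {
        for( unsigned int i = 0; i < numThreads; i++ )
            m_Workers.emplace_back( [this]() { WorkerLoop(); } );
    }

    // Lets the workers drain the queue, then joins them
    ~ThreadPool()
    {
        {
            std::lock_guard<std::mutex> lock( m_Mutex );
            m_Stopping = true;
        }
        m_Wake.notify_all();
        for( std::thread& worker : m_Workers )
            worker.join();
    }

    void Submit( std::function<void()> task )
    {
        {
            std::lock_guard<std::mutex> lock( m_Mutex );
            m_Tasks.push( std::move( task ) );
        }
        m_Wake.notify_one();
    }

private:
    void WorkerLoop()
    {
        for( ;; )
        {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock( m_Mutex );
                m_Wake.wait( lock, [this]() { return m_Stopping || !m_Tasks.empty(); } );
                if( m_Tasks.empty() )
                    return;                 // stopping and nothing left to run
                task = std::move( m_Tasks.front() );
                m_Tasks.pop();
            }
            task();                         // runs to completion, outside the lock
        }
    }

    std::mutex m_Mutex;
    std::condition_variable m_Wake;
    std::queue<std::function<void()>> m_Tasks;
    bool m_Stopping = false;
    std::vector<std::thread> m_Workers;
};

// Usage: submit numTasks small tasks and return how many actually ran
int CountWithPool( int numTasks, unsigned int numThreads )
{
    std::atomic<int> counter( 0 );
    {
        ThreadPool pool( numThreads );
        for( int i = 0; i < numTasks; i++ )
            pool.Submit( [&counter]() { counter.fetch_add( 1 ); } );
    }   // the destructor waits for all queued tasks
    return counter.load();
}
```

The same tasks run unchanged whether the pool is created with one worker or with one per core.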

Data decomposition makes defining tasks easy

    // Update all AI
    void UpdateAI( float DeltaTime )
    {
        for( int Index = 0; Index < g_NumAI; Index++ )
        {
            // Update each AI for this frame
            g_AI[ Index ].Update();
        }
    }

Data decomposition makes defining tasks easy

    // Update all AI
    void UpdateAI( float DeltaTime )
    {
        // Determine the number of AI tasks we want to create.
        // Round up so a final, partially filled group is not skipped.
        unsigned int AIGroups = ( g_NumAI + MAX_AI_PER_GROUP - 1 ) / MAX_AI_PER_GROUP;

        for( unsigned int Index = 0; Index < AIGroups; Index++ )
        {
            // Build the task specific data
            AITaskData* pData = new AITaskData();
            pData->m_Start = Index * MAX_AI_PER_GROUP;
            pData->m_DeltaTime = DeltaTime;

            // Submit task
            SubmitTask( Task_UpdateAI, (void*)pData );
        }
    }

Individual tasks are run by the thread pool

    void Task_UpdateAI( void* pTaskData )
    {
        // Read data
        AITaskData* pData = (AITaskData*)pTaskData;
        unsigned int Start = pData->m_Start;
        unsigned int End = pData->m_Start + MAX_AI_PER_GROUP;

        // Cap End with max number of AI
        End = ( End > g_NumAI ) ? g_NumAI : End;

        // Loop through all of our AI and update
        for( unsigned int Index = Start; Index < End; Index++ )
        {
            g_AI[ Index ].Update();
        }

        // Cleanup
        delete pData;
    }
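The slides do not show SubmitTask or how the frame waits for its tasks to finish. The sketch below fills that gap under stated assumptions: std::async stands in for the talk's SubmitTask (it starts a thread per task, so a real engine would hand the work to a thread pool instead), the AI struct and the group size of 64 are hypothetical, and std::unique_ptr replaces the manual new/delete pair.

```cpp
#include <algorithm>
#include <cassert>
#include <future>
#include <memory>
#include <utility>
#include <vector>

const unsigned int MAX_AI_PER_GROUP = 64;   // assumed group size

// Hypothetical stand-in for the talk's AI class
struct AI
{
    float m_Age = 0.0f;
    void Update( float deltaTime ) { m_Age += deltaTime; }
};

std::vector<AI> g_AI;

struct AITaskData
{
    unsigned int m_Start;
    float m_DeltaTime;
};

// Update one group of AI; the task owns its data and frees it on return
void Task_UpdateAI( std::unique_ptr<AITaskData> pData )
{
    const unsigned int numAI = static_cast<unsigned int>( g_AI.size() );
    const unsigned int end = std::min( pData->m_Start + MAX_AI_PER_GROUP, numAI );

    for( unsigned int index = pData->m_Start; index < end; index++ )
        g_AI[ index ].Update( pData->m_DeltaTime );
}

// Update all AI and return only when every group has finished
void UpdateAI( float deltaTime )
{
    const unsigned int numAI = static_cast<unsigned int>( g_AI.size() );
    // Round up so the last, partially filled group is included
    const unsigned int groups = ( numAI + MAX_AI_PER_GROUP - 1 ) / MAX_AI_PER_GROUP;

    std::vector<std::future<void>> pending;
    for( unsigned int group = 0; group < groups; group++ )
    {
        std::unique_ptr<AITaskData> pData(
            new AITaskData{ group * MAX_AI_PER_GROUP, deltaTime } );

        // Stand-in for SubmitTask: run the task asynchronously, keep a handle to it
        pending.push_back(
            std::async( std::launch::async, Task_UpdateAI, std::move( pData ) ) );
    }

    // Wait for every task before the frame continues
    for( std::future<void>& task : pending )
        task.get();
}
```

With 150 AI and a group size of 64 this creates three tasks, the last of which covers indices 128 to 149.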

Intel Threading Building Blocks is good for tasks
Intel® Threading Building Blocks (Intel® TBB) has a low-level API to create and process trees of work; each node is a task.
[Diagram: task trees with Root, Task, More, and Callback nodes, labeled "Spawn & Wait", "Spawn", and "Wait".]
Blocking calls go down. Continuations go up.

Learn more about tasking…
Task-based Multithreading – How to Program for 100 Cores, presented by Ron Fosner. Friday, March 4:30PM South
… or get Game Engine Gems 1 * and read Brad Werth’s article.
* Other names and brands may be claimed as the property of others.

Time to look at our example…

Hotspots are good candidates for threading
Use tools like Intel® VTune™ and Intel® Parallel Studio to locate hotspots.
Intel® Parallel Studio inspector shows that Flock() is the main bottleneck. This is a good place to investigate threading.

Validate threading results with Parallel Amplifier

Use Parallel Amplifier to validate concurrency
We have “ideal” CPU utilization for Flocking. Now we can start looking for other hotspots to optimize.
There is still a lot of serial code…
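One way to see why the remaining serial code matters is Amdahl's law. The slides do not mention it, so it is brought in here as an assumption: if a fraction p of the frame time parallelizes perfectly across n cores and the rest stays serial, the best-case speedup S(n) is

```latex
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}},
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p}
```

As a purely illustrative case (the value of p is made up, not measured from the demo), p = 0.5 on n = 6 cores gives S(6) = 1 / (0.5 + 0.5/6) ≈ 1.71, and no number of cores can push the speedup past 2. Threading the next hotspot raises p, which raises that ceiling.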

Use Parallel Inspector to find threading errors
- Have a lot of system memory
- Use a reduced data set
- Workload should be repeatable
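A data race is a typical example of the kind of threading error such a tool is meant to catch. The sketch below is a hypothetical illustration, not output from Parallel Inspector and not code from the demo: several threads bump a shared counter. With a plain int the increments would race and some could be lost; making the counter a std::atomic<int> is one simple fix.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Racy version, shown only as a comment:
//     int kills = 0;
//     ... each thread runs: kills++;   // unsynchronized read-modify-write
//
// Fixed version: the counter is atomic, so concurrent increments are never lost.
int CountKills( unsigned int numThreads, int killsPerThread )
{
    std::atomic<int> kills( 0 );

    std::vector<std::thread> workers;
    for( unsigned int t = 0; t < numThreads; t++ )
    {
        workers.emplace_back( [&kills, killsPerThread]()
        {
            for( int i = 0; i < killsPerThread; i++ )
                kills.fetch_add( 1, std::memory_order_relaxed );
        } );
    }

    // join() makes every worker's increments visible to this thread
    for( std::thread& worker : workers )
        worker.join();

    return kills.load();
}
```

A shared atomic counter is still a point of contention; following the earlier guideline, per-thread counters combined at the end would avoid the shared write entirely.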

Use other tools as needed… I like Intel® GPA
Intel® Graphics Performance Analyzer is designed for games.
- System Analyzer gives a complete view of system resources (CPU, GPU, bus)
- Frame Analyzer allows you to dive into a DX frame
- Platform View allows you to instrument code to analyze workload balance and execution time

Conclusion
- Threading is required to maximize your game
- Use data decomposition to scale to n cores
- Use tasks for load balancing and to be platform independent
- Use Intel tools to make your life easier
- Attend: “Task-based Multithreading – How to Program for 100 Cores” this Friday

See Intel at GDC:
- Intel Booth at Expo, North Hall
- Intel Interactive Lounge
Contact Information

Other Sessions
- A Visual Guide to Game and Task Performance on Mass-market PC Game Platforms: Thursday, March 4:30PM North 122
- Building Games for Netbooks: Friday, March 9AM South 310
- Simpler Better Faster Vector: Friday, March 1:30PM North
- Tuning Your Game for Next Generation Intel Graphics: Friday, March 1:30PM South 302
- Task-based Multithreading – How to Program for 100 Cores: Friday, March 4:30PM South

Please fill out an evaluation form … it’ll help us win a bet.
Thank you

Legal Disclaimer
- INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.
- Intel may make changes to specifications and product descriptions at any time, without notice.
- All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
- Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.
- Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance.
- Intel, Intel Inside, and the Intel logo are trademarks of Intel Corporation in the United States and other countries.
- Any software source code reprinted in this document is furnished under a software license and may only be used or copied in accordance with the terms of that license.
- *Other names and brands may be claimed as the property of others.
- Copyright © 2010 Intel Corporation.