1 Scalable and transparent parallelization of multiplayer games. Bogdan Simion, MASc thesis, Department of Electrical and Computer Engineering.



2 Multiplayer games
- Captivating, highly popular
- Dynamic artifacts

3-6 Multiplayer games
- Long playing times; more than 100k concurrent players
1. World of Warcraft: “I've been playing this Mage for 3 and a half years now, and I've invested too much time, blood, sweat and tears to quit now.”
2. Halo2: “My longest playing streak was last summer, about 19 hours playing Halo2 on my XBox.”
- The game server is the bottleneck

7 Server scaling
Game code parallelization is hard:
- Complex and highly dynamic code
- Concurrency issues (data races) require conservative synchronization
- Deadlocks

8 State-of-the-art
Parallel programming paradigms:
- Lock-based (pthreads)
- Transactional memory
Previous parallelizations of Quake:
- Lock-based [Abdelkhalek et al. '04] shows that false sharing is a challenge

9-10 Transactional Memory vs. Locks
Advantages:
- Simpler programming task
- Transparently ensures correct execution: shared data accesses are tracked, conflicts are detected, and conflicting transactions are aborted
Disadvantages:
- Software (STM) access tracking overheads
- Never shown to be practical for real applications

11 Contributions
- Case study of parallelization for games: a synthetic version of Quake (SynQuake)
- We compare two approaches: lock-based and STM parallelizations
- We showcase the first realistic application where STM outperforms locks

12 Outline
- Application environment: SynQuake game (data structures, server architecture)
- Parallelization issues: false sharing, load balancing
- Experimental results
- Conclusions

13 Environment: SynQuake game
- Simplified version of Quake
- Entities: players, resources (apples), walls
- Emulated quests

14 SynQuake
- Players: can move and interact (eat, attack, flee, go to quest)
- Apples: food objects, increase life
- Walls: immutable, limit movement
- Contains all the features found in Quake

15 Game map representation
- Fast retrieval of game objects
- Spatial data structure: areanode tree
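The areanode idea can be sketched in a few lines. This is a minimal, illustrative model (class and field names are assumptions, not the SynQuake source): the map is split in half recursively, alternating axis, and an object whose bounding box straddles a split line is stored at that internal node, while other objects descend toward a leaf.

```python
# Minimal sketch of an areanode tree. The map is recursively halved,
# alternating the split axis; an object that straddles a split line
# stays at the internal node, otherwise it sinks into the matching child.
class AreaNode:
    def __init__(self, bounds, depth=0, max_depth=2):
        self.bounds = bounds              # (x0, y0, x1, y1)
        self.objects = []                 # objects stored at this node
        self.children = None
        if depth < max_depth:
            x0, y0, x1, y1 = bounds
            self.axis = depth % 2         # 0: split on x, 1: split on y
            self.split = (x0 + x1) / 2 if self.axis == 0 else (y0 + y1) / 2
            if self.axis == 0:
                lo_half = (x0, y0, self.split, y1)
                hi_half = (self.split, y0, x1, y1)
            else:
                lo_half = (x0, y0, x1, self.split)
                hi_half = (x0, self.split, x1, y1)
            self.children = (AreaNode(lo_half, depth + 1, max_depth),
                             AreaNode(hi_half, depth + 1, max_depth))

    def insert(self, obj, box):
        """box = (x0, y0, x1, y1), the object's bounding box.
        Returns the node the object was stored at."""
        if self.children is not None:
            lo, hi = (box[0], box[2]) if self.axis == 0 else (box[1], box[3])
            if hi <= self.split:
                return self.children[0].insert(obj, box)
            if lo >= self.split:
                return self.children[1].insert(obj, box)
        self.objects.append((obj, box))   # straddles the split, or leaf reached
        return self
```

A small object near a corner lands in a leaf; an object spanning the middle of the map stays at the root, which is exactly what makes retrieval of nearby objects fast.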

16-20 Areanode tree
(Figure sequence: the game map is split into halves A and B, then into quadrants A1, A2, B1, B2; correspondingly, the areanode tree grows from the root node to children A, B and leaves A1, A2, B1, B2.)

21-24 Server frame
(Figure sequence: within each server frame, client requests enter a Receive & Process Requests stage, followed by a barrier, then a Form & Send Replies stage that emits client updates; the admin stage runs in a single thread.)

25 Parallelization in games
- Quake: lock-based synchronization [Abdelkhalek et al. 2004]

26 Parallelization: request processing
(Figure: same server frame as before; parallelization is applied in the Receive & Process Requests stage, while Form & Send Replies remains in the single admin thread.)
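The frame structure above can be sketched as follows. This is an illustrative shared-memory model (function names and the request-chunking scheme are assumptions): workers process requests in parallel, a barrier marks the end of the stage, and a single thread performs the reply/admin stage.

```python
# Sketch of one server frame: parallel request processing, a barrier,
# then a single-threaded admin stage that forms and sends replies.
import threading

def run_frame(requests, num_threads, process, send_replies):
    barrier = threading.Barrier(num_threads)
    # Simple interleaved split of the request list across threads.
    chunks = [requests[i::num_threads] for i in range(num_threads)]

    def worker(tid):
        for req in chunks[tid]:          # Receive & Process Requests (parallel)
            process(req)
        barrier.wait()                   # end of the parallel stage
        if tid == 0:                     # Form & Send Replies (single thread)
            send_replies()

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

The barrier guarantees that every request of the frame has been processed before any reply is formed, matching the stage boundary in the figure.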

27 Outline
- Application environment: SynQuake game
- Parallelization issues: false sharing, load balancing
- Experimental results
- Conclusions

28 Parallelization overview
- Synchronization problems
- Synchronization algorithms
- Load balancing issues
- Load balancing policies

29 Collision detection
- Player actions: move, shoot, etc.
- Calculate the action bounding box

30-33 Action bounding box
(Figure sequence: players P1, P2, P3 on the map; P1 has a short-range action box and P2 a long-range one; the boxes of P1 and P2 overlap, as do those of P2 and P3.)

34 Player assignment
- Players are handled by threads (P1 by T1, P2 by T2, P3 by T3)
- If players P1, P2, P3 are assigned to distinct threads, synchronization is required
- Long-range actions have a higher probability of causing conflicts
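The conflict test implied by these slides is an axis-aligned bounding-box overlap. The sketch below is illustrative (the range constants and function names are assumptions): each action gets a box centred on the player, long-range actions get a larger box, and two players on different threads need synchronization only if their boxes overlap.

```python
# Overlap test for action bounding boxes: two actions can conflict
# only if their axis-aligned boxes intersect.
def action_box(x, y, action_range):
    """Axis-aligned bounding box of an action centred on the player."""
    return (x - action_range, y - action_range, x + action_range, y + action_range)

def boxes_overlap(a, b):
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1

MOVE_RANGE, SHOOT_RANGE = 1, 10   # long-range actions build bigger boxes

def needs_sync(p1, p2):
    """p = (x, y, range); True if the two actions can conflict."""
    return boxes_overlap(action_box(*p1), action_box(*p2))
```

Because the box grows with the action range, two shooters 15 units apart conflict while two movers at the same distance do not, which is why long-range actions cause conflicts more often.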

35-38 False sharing
(Figure sequence: a player's move range vs. its much larger shoot range; the action bounding box with locks covers the larger range, while the action bounding box with TM is smaller, illustrating reduced false sharing.)

39 Parallelization overview
- Synchronization problems
- Synchronization algorithms
- Load balancing issues
- Load balancing policies

40 Synchronization algorithm: Locks
- Hold locks on parents as little as possible
- Deadlock-free algorithm

41-44 Synchronization algorithm: Locks
(Figure sequence: an areanode tree with root, children A and B, and leaves A1, A2, B1, B2 holding players P1-P6; a player's area of interest determines the overlapped leaves, which are locked, while parents are locked only temporarily.)

45-46 Synchronization: Locks vs. STM
Locks:
1. Determine overlapping leaves (L)
2. LOCK(L)
3. Process L
4. For each node P in overlapping parents: LOCK(P); process P; UNLOCK(P)
5. UNLOCK(L)
STM:
1. BEGIN_TRANSACTION
2. Determine overlapping leaves (L)
3. Process L
4. For each node P in overlapping parents: process P
5. COMMIT_TRANSACTION
- STM acquires ownership gradually, reducing false sharing
- Consistency is ensured transparently by the STM
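The lock-based side of the algorithm above can be made concrete as follows. This is a hedged sketch, not the SynQuake code (the Node layout and process callback are assumptions): leaf locks are acquired in a fixed global order so that no two threads can deadlock, and each parent lock is held only while that parent is processed.

```python
# Sketch of the lock-based algorithm: lock all overlapped leaves up front
# (in a fixed global order, which makes the algorithm deadlock-free),
# then lock each overlapped parent only briefly while processing it.
import threading

class Node:
    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.objects = []

def process_action(leaves, parents, process):
    leaves = sorted(leaves, key=lambda n: n.name)   # fixed acquisition order
    for leaf in leaves:
        leaf.lock.acquire()                         # LOCK(L)
    try:
        for leaf in leaves:
            process(leaf)                           # Process L
        for parent in parents:
            with parent.lock:                       # LOCK(P) ... UNLOCK(P)
                process(parent)
    finally:
        for leaf in leaves:
            leaf.lock.release()                     # UNLOCK(L)
```

Under STM, the whole body would instead run inside one transaction: the runtime tracks reads and writes and aborts on conflict, so ownership is acquired gradually rather than pessimistically up front.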

47 Parallelization overview
- Synchronization problems
- Synchronization algorithms
- Load balancing issues
- Load balancing policies

48 Load balancing issues
- Assign tasks to threads
- Balance the workload
(Figure: players P1-P4 distributed over threads T1-T4.)

49 Load balancing issues
- Assign tasks to threads
- Cross-border conflicts require synchronization
(Figure: move and shoot actions of players P1-P4 crossing the borders between thread regions T1-T4.)

50 Load balancing goals
Tradeoff:
- Balance workload among threads
- Reduce synchronization/true sharing

51-53 Load balancing policies
a) Round-robin
b) Spread
c) Static locality-aware
(Figures: map regions colored by owning thread, Threads 1-4, for each policy.)

54 Locality-aware load balancing
- Dynamically detect player hotspots and adjust workload assignments
- Compromise between load balancing and reducing synchronization
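The contrast between the policies can be sketched in a few lines. This is an illustrative model (the quadrant split and function names are assumptions): round-robin spreads players over threads regardless of position, while a locality-aware policy keeps players in the same map region on the same thread, trading some balance for less cross-thread synchronization.

```python
# Two simple assignment policies: position-blind round-robin vs. a
# locality-aware policy that maps each map quadrant to one thread.
def round_robin(players, num_threads):
    """Assign player i to thread i mod num_threads."""
    return {i: i % num_threads for i in range(len(players))}

def locality_aware(players, map_size, num_threads):
    """Assign each player (x, y) to the thread owning its map quadrant
    (assumes a square map and num_threads == 4)."""
    half = map_size / 2
    assign = {}
    for i, (x, y) in enumerate(players):
        quad = (1 if x >= half else 0) + (2 if y >= half else 0)
        assign[i] = quad % num_threads
    return assign
```

With a hotspot of players in one quadrant, round-robin scatters them over all threads (maximizing cross-thread conflicts), while the locality-aware policy keeps them on a single thread.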

55-56 Dynamic locality-aware LB
(Figure sequence: the game map and its graph representation, used to detect hotspots and rebalance dynamically.)

57 Experimental results
- Test scenarios
- Scaling: with and without physics computation
- The effect of load balancing on scaling
- The influence of locality-awareness

58-59 Quest scenarios
(Figures: map layouts for Quest 1 through Quest 4 on X-Y axes.)

60 Scalability

61 Processing times – without physics

62 Processing times – with physics

63 Load balancing

64-66 Quest scenarios
(Figures: static vs. dynamic thread assignments, Threads 1-4, for the 4-quadrant, 4-split, and 1-quadrant quest scenarios.)

67 Locality-aware load balancing (locks)

68 Conclusions
First application where STM outperforms locks:
- Overall performance of STM is better at 4 threads in all scenarios
- Reduced false sharing through on-the-fly collision detection
- Locality-aware load balancing reduces true sharing, but only for STM

69 Thank you!

70 Splitting components (1 center quest)

71 Load balancing (short range actions)

72 Locality-aware load balancing (STM)