
Slide 1: Scalable and transparent parallelization of multiplayer games. Bogdan Simion, MASc thesis, Department of Electrical and Computer Engineering.

Slide 2: Multiplayer games. Captivating, highly popular; dynamic artifacts.

Slides 3-6 (build): Multiplayer games. Long playing times: 1. World of Warcraft: "I've been playing this Mage for 3 and a half years now, and I've invested too much time, blood, sweat and tears to quit now." 2. Halo 2: "My longest playing streak was last summer, about 19 hours playing Halo 2 on my Xbox." More than 100k concurrent players. The game server is the bottleneck.

Slide 7: Server scaling. Game code parallelization is hard: complex and highly dynamic code; concurrency issues (data races) require conservative synchronization; deadlocks.

Slide 8: State of the art. Parallel programming paradigms: lock-based (pthreads) and transactional memory. Previous parallelizations of Quake: a lock-based one [Abdelkhalek et al. '04] shows that false sharing is a challenge.

Slides 9-10 (build): Transactional memory vs. locks. Advantages: simpler programming task; transparently ensures correct execution (shared-data access tracking; detects conflicts and aborts conflicting transactions). Disadvantages: software (STM) access-tracking overheads; never shown to be practical for real applications.

Slide 11: Contributions. Case study of parallelization for games: a synthetic version of Quake (SynQuake). We compare two approaches: lock-based and STM parallelizations. We showcase the first realistic application where STM outperforms locks.

Slide 12: Outline. Application environment: the SynQuake game (data structures, server architecture); parallelization issues (false sharing, load balancing); experimental results; conclusions.

Slide 13: Environment: the SynQuake game. A simplified version of Quake. Entities: players, resources (apples), walls. Emulated quests.

Slide 14: SynQuake. Players can move and interact (eat, attack, flee, go to a quest). Apples: food objects, increase life. Walls: immutable, limit movement. Contains all the features found in Quake.

Slide 15: Game map representation. Fast retrieval of game objects. Spatial data structure: the areanode tree.

Slides 16-20 (build): Areanode tree. [Figure: the game map is recursively split, first into halves A and B, then into quadrants A1, A2, B1, B2; the areanode tree mirrors this hierarchy, with the root node covering the whole map and the leaves covering the quadrants.]
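
To make the structure concrete, here is a minimal C sketch of an areanode node and of linking an object into the tree, modeled on the classic Quake areanode scheme; all field and function names are illustrative assumptions, not the thesis code:

/* Minimal areanode sketch (illustrative; names are assumptions). */
typedef struct entity {
    float min[2], max[2];        /* entity bounding box on the 2D map */
    struct entity *next;         /* linked into its areanode's list   */
} entity_t;

typedef struct areanode {
    int   split_axis;            /* 0 = x, 1 = y; -1 marks a leaf     */
    float split_pos;             /* coordinate of the splitting line  */
    struct areanode *children[2];
    entity_t *entities;          /* objects stored at this node       */
} areanode_t;

/* Link an entity into the deepest node that fully contains its box;
 * objects that straddle a split line stay at the internal node. */
void areanode_insert(areanode_t *node, entity_t *e)
{
    while (node->split_axis >= 0) {
        if (e->max[node->split_axis] < node->split_pos)
            node = node->children[0];
        else if (e->min[node->split_axis] > node->split_pos)
            node = node->children[1];
        else
            break;
    }
    e->next = node->entities;
    node->entities = e;
}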

Slides 21-24 (build): Server frame. [Figure: the server frame is divided into three barrier-separated stages: receive & process requests (reading client requests), form & send replies (sending client updates), and an admin stage run by a single thread.]
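
A minimal sketch of one such barrier-separated frame from a worker thread's perspective, using pthread barriers; the stage ordering and function names here are assumptions for illustration, not the actual server code:

#include <pthread.h>

/* Placeholder stage functions (assumptions, not the thesis API). */
void receive_and_process_requests(int tid);
void run_admin_tasks(void);
void form_and_send_replies(int tid);

extern pthread_barrier_t frame_barrier;  /* shared by all worker threads */

void server_frame(int tid)
{
    receive_and_process_requests(tid);    /* parallel across threads */
    pthread_barrier_wait(&frame_barrier);

    if (tid == 0)
        run_admin_tasks();                /* admin stage: single thread */
    pthread_barrier_wait(&frame_barrier);

    form_and_send_replies(tid);           /* parallel across threads */
    pthread_barrier_wait(&frame_barrier); /* frame boundary */
}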

Slide 25: Parallelization in games. Quake: lock-based synchronization [Abdelkhalek et al. 2004].

Slide 26: Parallelization: request processing. [Figure: the server frame from slide 24; parallelization is applied in the receive & process requests stage.]

Slide 27: Outline. Application environment: the SynQuake game; parallelization issues (false sharing, load balancing); experimental results; conclusions.

Slide 28: Parallelization overview. Synchronization problems; synchronization algorithms; load balancing issues; load balancing policies.

Slide 29: Collision detection. Player actions: move, shoot, etc. Calculate the action bounding box.

Slides 30-33 (build): Action bounding box. [Figure: players P1, P2, P3 with short-range and long-range action bounding boxes; the regions where the boxes overlap are highlighted (overlap with P1, overlap with P2).]
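
The conflict test behind these figures is a standard axis-aligned bounding-box overlap check; a minimal sketch (the aabb_t type is an assumption):

/* 2D axis-aligned bounding-box overlap test (illustrative). */
typedef struct { float min[2], max[2]; } aabb_t;

int aabb_overlap(const aabb_t *a, const aabb_t *b)
{
    return a->min[0] <= b->max[0] && a->max[0] >= b->min[0] &&
           a->min[1] <= b->max[1] && a->max[1] >= b->min[1];
}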

Slide 34: Player assignment. [Figure: players P1, P2, P3 handled by threads T1, T2, T3.] If players P1, P2, P3 are assigned to distinct threads, synchronization is required. Long-range actions have a higher probability of causing conflicts.

Slides 35-38 (build): False sharing. [Figure: a player's move range vs. shoot range, and the resulting action bounding box as acquired with locks vs. with TM.]

Slide 39: Parallelization overview. Synchronization problems; synchronization algorithms; load balancing issues; load balancing policies.

Slide 40: Synchronization algorithm: locks. Hold locks on parent nodes for as short a time as possible. Deadlock-free algorithm.

Slides 41-44 (build): Synchronization algorithm: locks. [Figure: players P1-P6 on the game map, mapped into areanode tree leaves A1, A2, B1, B2 under parents A and B and the root; a player's area of interest determines the overlapped leaves, and parent nodes are locked only temporarily.]

Slides 45-46 (build): Synchronization: locks vs. STM.
Locks: 1. Determine overlapping leaves (L). 2. LOCK(L). 3. Process L. 4. For each node P in overlapping parents: LOCK(P), process P, UNLOCK(P). 5. UNLOCK(L).
STM: 1. BEGIN_TRANSACTION. 2. Determine overlapping leaves (L). 3. Process L. 4. For each node P in overlapping parents: process P. 5. COMMIT_TRANSACTION.
STM acquires ownership gradually, reducing false sharing; consistency is ensured transparently by the STM.
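
A C-style sketch of the two code paths above, assuming leaves are locked in a fixed global order (which is what keeps the lock path deadlock-free) and that BEGIN_TRANSACTION/COMMIT_TRANSACTION stand for a generic STM library's transaction delimiters; the remaining names are placeholders, not the thesis API:

/* Placeholders (assumptions): per-node locking and processing. */
typedef struct areanode areanode_t;
void lock_node(areanode_t *n);
void unlock_node(areanode_t *n);
void process_node(areanode_t *n);

/* Lock-based path: lock all overlapped leaves up front (leaves[] is
 * assumed pre-sorted in a global lock order), then lock each
 * overlapped parent only while it is being processed. */
void process_action_locks(areanode_t **leaves, int nl,
                          areanode_t **parents, int np)
{
    for (int i = 0; i < nl; i++) lock_node(leaves[i]);
    for (int i = 0; i < nl; i++) process_node(leaves[i]);
    for (int i = 0; i < np; i++) {
        lock_node(parents[i]);
        process_node(parents[i]);
        unlock_node(parents[i]);   /* parent lock held briefly */
    }
    for (int i = 0; i < nl; i++) unlock_node(leaves[i]);
}

/* STM path: the runtime tracks accesses and aborts on conflict, so
 * ownership is acquired gradually as nodes are actually touched. */
void process_action_stm(areanode_t **leaves, int nl,
                        areanode_t **parents, int np)
{
    BEGIN_TRANSACTION();
    for (int i = 0; i < nl; i++) process_node(leaves[i]);
    for (int i = 0; i < np; i++) process_node(parents[i]);
    COMMIT_TRANSACTION();
}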

Slide 47: Parallelization overview. Synchronization problems; synchronization algorithms; load balancing issues; load balancing policies.

Slide 48: Load balancing issues. Assign tasks to threads; balance the workload. [Figure: players P1-P4 spread over map regions owned by threads T1-T4.]

Slide 49: Load balancing issues. Assign tasks to threads; cross-border conflicts require synchronization. [Figure: move and shoot actions crossing region borders between threads T1-T4.]

Slide 50: Load balancing goals. Tradeoff: balance the workload among threads vs. reduce synchronization (true sharing).

Slides 51-53 (build): Load balancing policies: a) round-robin, b) spread, c) static locality-aware. [Figures: map regions colored by owning thread (threads 1-4) under each policy.]

Slide 54: Locality-aware load balancing. Dynamically detect player hotspots and adjust workload assignments. A compromise between load balancing and reducing synchronization.
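
One illustrative way to realize such a compromise (a sketch under assumptions, not the thesis algorithm): periodically recount the players in each leaf region, then split a row-major sequence of regions into contiguous runs of roughly equal load, so that neighboring regions, and hence most cross-border actions, stay on the same thread:

/* Illustrative sketch only. load[i] is the player count of region i;
 * owner[i] receives the thread assigned to region i. Contiguous runs
 * keep adjacent regions on the same thread, limiting true sharing. */
void rebalance(const int *load, int nregions, int nthreads, int *owner)
{
    int total = 0;
    for (int i = 0; i < nregions; i++) total += load[i];

    int t = 0, acc = 0;
    for (int i = 0; i < nregions; i++) {
        owner[i] = t;
        acc += load[i];
        /* move to the next thread once this one holds ~1/nthreads of the load */
        if (t < nthreads - 1 && acc * nthreads >= (t + 1) * total)
            t++;
    }
}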

Slides 55-56 (build): Dynamic locality-aware load balancing. [Figures: the game map and its graph representation.]

Slide 57: Experimental results. Test scenarios; scaling with and without physics computation; the effect of load balancing on scaling; the influence of locality awareness.

58 58 128 256 384 640 768 896 1024 512 01282563846407688961024512 - Quest 1 X Y Quest scenarios

59 59 Quest scenarios - Quest 3 128 256 384 640 768 896 1024 512 01282563846407688961024512 - Quest 4 X Y - Quest 1- Quest 2

Slide 60: Scalability. [Figure.]

Slide 61: Processing times – without physics. [Figure.]

Slide 62: Processing times – with physics. [Figure.]

Slide 63: Load balancing. [Figure.]

Slides 64-66: Quest scenarios. [Figures: static vs. dynamic thread-to-region assignments (threads 1-4) for the 4-quadrants, 4-splits, and 1-quadrant scenarios.]

Slide 67: Locality-aware load balancing (locks). [Figure.]

Slide 68: Conclusions. The first application where STM outperforms locks: overall STM performance is better at 4 threads in all scenarios; false sharing is reduced through on-the-fly collision detection. Locality-aware load balancing reduces true sharing, but only for STM.

Slide 69: Thank you!

Slide 70: Splitting components (1 center quest). [Figure.]

Slide 71: Load balancing (short-range actions). [Figure.]

Slide 72: Locality-aware load balancing (STM). [Figure.]

