
1 Cristiana Amza University of Toronto

2 Once Upon a Time … … locks were painful for dynamic and complex applications … e.g., Massively Multiplayer Games

3 Massively Multiplayer Games must support many concurrent players and a low update interval to players

4 So, game developers said … "Forget locks! We'll use our secret sauce!"

5 State-of-the-art in Game Code: ad-hoc parallelization into segments/shards, e.g., World of Warcraft / Ultima Online; sequential code with admission control, e.g., Quake III

6 Ad-hoc Partitioning (segments) Countries, rooms

7 Artificial Admission Control: gateways, e.g., airports, doors

8 But, gamers said … ”We want to interact, and we hate lag !”

9 Problem with State-of-the-art: Flocking. Players move to one area (e.g., quests) and overload the server hosting the hotspot.

10 So I said … Forget painful locks ! Transactional Memory will make game developers and players happy ! Story endorsed by Intel (fall of 2006).

11 Our Goals Parallelize server code into transactions Easy to thread any game Dynamic load balance of tx on any platform e.g., clusters, multi-cores, mobile devices … Beats locks any day !

12 Ideal solution: Contiguous world Seamless partition Players can “see” across partition boundaries Players can smoothly transfer Regardless of game map

13 Challenge: On Multi-core Inter-thread conflicts Mostly at the boundary

14 Roadmap The game Parallelization Using TM Compiler code transformations for TM Runtime TM design choices Dynamic load balancing of tx in game

15 Game Benchmark (SimMud): interactions are player-object and player-player; players can move and interact; food objects; terrain is fixed and restricts movement

16 Game Benchmark (SimMud): actions are move, eat, fight; a quest causes flocking of players to a spot on the game map

17 Flocking in SimMud [diagram: quest hotspot on a game map divided into S1–S4]

18 Parallelization of Server Code [diagram: server loop of Select, Process Requests, Form & Send Replies, with Rx/Tx steps; the read-only and read-write phases are marked]

19 Example: "Move" Request

Move() {
    region1->removePlayer( p );
    region2->addPlayer( p );
}

20 Parallelize Move Request: insert an "atomic" keyword in the code; the compiler makes it a transaction.

Ex: #pragma omp critical / __tm_atomic
{
    region1->removePlayer( p );
    region2->addPlayer( p );
}

21 Ex: SimMud Data Structure

struct Region {
    int x, y;
    int width, height;
    set_t* players;
    set_t* objects;
};

22 Example Code for Action Move

void movePlayer( Player* p, int new_x, int new_y )
{
    Region* r_old = getRegion( p->x, p->y );
    Region* r_new = getRegion( new_x, new_y );
    if( isVacant_position( r_new, new_x, new_y ) )
    {
        set_remove( r_old->players, p );
        set_insert( r_new->players, p );
        p->x = new_x; p->y = new_y;
    }
}

23 Manual Transformations (Locks)

void movePlayer( Player* p, int new_x, int new_y )
{
    lock_player( p );
    Region* r_old = getRegion( p->x, p->y );
    Region* r_new = getRegion( new_x, new_y );
    lock_regions( r_old, r_new );
    if( isVacant_position( r_new, new_x, new_y ) )
    {
        set_remove( r_old->players, p );
        set_insert( r_new->players, p );
        p->x = new_x; p->y = new_y;
    }
    unlock_regions( r_old, r_new );
    unlock_player( p );
}

24 Manual Transformations (TM)

void movePlayer( Player* p, int new_x, int new_y )
{
    #pragma omp critical
    {
        Region* r_old = getRegion( p->x, p->y );
        Region* r_new = getRegion( new_x, new_y );
        if( isVacant_position( r_new, new_x, new_y ) )
        {
            set_remove( r_old->players, p );
            set_insert( r_new->players, p );
            p->x = new_x; p->y = new_y;
        }
    }
}

25 My Story: TM will make game developers and players happy! So far, the developers should be!

26 It Gets Worse for Locks: a move may impact objects within a bounding box (short-range or long-range); locking all impacted objects requires searching for them. [diagram: top view of world with short-range and long-range objects]

27 Area Node Tree (e.g., Quake III): each region corresponds to a leaf. [diagram: top view of world and its area node tree]

28 Area Node Tree (e.g., Quake III): each region corresponds to a leaf; lock all leaf nodes in the bounding box atomically. [diagram: top view of world with overlapping regions]

29 Area Node Tree – Even Worse! Objects are linked to leaf nodes; if they cross a leaf boundary, they are linked to the parent node. [diagram: top view of world with non-overlapping regions, object lists on region leaves, and objects crossing boundaries]

30 Area Node Tree – Even Worse! Need to lock parent nodes; false sharing; the whole tree may end up locked. [diagram: as on the previous slide]

31 My Story: TM will make game developers and players happy! With locks you lock down a whole box/tree; with TM you just read/write what you need. Players should be happy too!

32 Compiler/Runtime TM Support: Compiler: automatic source transformations to tx. Runtime: track accesses, resolve conflicts between transactions, adapt to the application pattern.

33 Manual Transformations (TM)

void movePlayer( Player* p, int new_x, int new_y )
{
    #pragma omp critical
    {
        Region* r_old = getRegion( p->x, p->y );
        Region* r_new = getRegion( new_x, new_y );
        if( isVacant_position( r_new, new_x, new_y ) )
        {
            set_remove( r_old->players, p );
            set_insert( r_new->players, p );
            p->x = new_x; p->y = new_y;
        }
    }
}

34 Automatic Transformations (TM)

void tm_movePlayer( tm_Player* p, int new_x, int new_y )
{
    Begin_transaction;
    tm_Region* r_old = tm_getRegion( p->x, p->y );
    tm_Region* r_new = tm_getRegion( new_x, new_y );
    if( tm_isVacant_position( r_new, new_x, new_y ) )
    {
        tm_set_remove( r_old->players, p );
        tm_set_insert( r_new->players, p );
        p->x = new_x; p->y = new_y;
    }
    Commit_transaction;
}

35 Automatic Transformations (TM)

struct tm_Region {
    tm_int x, y;
    tm_int width, height;
    tm_set_t* players;   // recursively re-type
    tm_set_t* objects;   // nested structures
};

36 Compiler TM code translation: #pragma → begin/end transaction; re-type variables to tm_shared<> or tm_private<>

37 TM Runtime (libTM): access tracking via tm_type<>, using operator overloading to intercept reads and writes; access granularity: basic-type level. Conflict detection and resolution: several design choices.
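A minimal sketch of how access tracking through operator overloading could look; tm_shared, TxDescriptor, record_read and record_write are illustrative names chosen here, not libTM's actual API.

// Hypothetical per-thread descriptor that logs a transaction's accesses.
struct TxDescriptor {
    void record_read(const void* addr)  { /* add addr to the read set  */ }
    void record_write(void* addr)       { /* add addr to the write set */ }
};

thread_local TxDescriptor current_tx;   // assumed runtime state

// Access-tracked wrapper at basic-type granularity.
template <typename T>
class tm_shared {
    T value;
public:
    tm_shared(T v = T()) : value(v) {}

    // Reads go through the conversion operator, so they can be logged.
    operator T() const {
        current_tx.record_read(&value);
        return value;
    }
    // Writes go through assignment, so they can be logged as well.
    tm_shared& operator=(T v) {
        current_tx.record_write(&value);
        value = v;
        return *this;
    }
};

With such a wrapper, a re-typed field like tm_shared<int> x behaves like a plain int inside a transaction while every read and write is routed through the runtime for conflict detection.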

38 TM Conflict Resolution Choices: Pessimistic (reader/writer locks); Read Optimistic (only writer locks); Fully Optimistic (~no locks); Adaptive.

39 Pessimistic A transaction (tx) locks an object before use Waits for locks held by other tx Releases all locks at the end

40 Reader-writer locks: a reader lock excludes writers; a writer lock excludes both readers and writers. [diagram: locks held from BEGIN to END of the transaction]
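A minimal sketch of the pessimistic scheme using C++ std::shared_mutex. The shape (acquire a reader or writer lock before use, wait if another tx holds it, release everything at the end) follows the slides; the names and the omission of deadlock handling and re-acquisition checks are simplifications of my own.

#include <shared_mutex>
#include <vector>

// Pessimistic: take reader/writer locks before use, hold them for the
// whole transaction, then release them all (two-phase locking).
struct PessimisticTx {
    std::vector<std::shared_mutex*> read_locked;
    std::vector<std::shared_mutex*> write_locked;

    // Both calls block (wait) if another transaction holds a conflicting lock.
    void open_for_read(std::shared_mutex& m)  { m.lock_shared(); read_locked.push_back(&m); }
    void open_for_write(std::shared_mutex& m) { m.lock();        write_locked.push_back(&m); }

    void commit() {                       // release all locks at the end
        for (auto* m : write_locked) m->unlock();
        for (auto* m : read_locked)  m->unlock_shared();
        write_locked.clear();
        read_locked.clear();
    }
};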

41 Read Optimistic: writers take locks, readers do not; a write invalidates (aborts) all readers. a) Encounter-time: at the write. [timeline: T1: BEGIN_TRANSACTION ... WRITE A ... COMMIT_TRANSACTION; T2: BEGIN_TRANSACTION ... READ A ... INVALID; T3: BEGIN_TRANSACTION ... READ A ... INVALID]

42 Read Optimistic: writers take locks, readers do not; a write invalidates (aborts) all readers. b) Commit-time: at commit. [timeline: T1: BEGIN_TRANSACTION ... WRITE A ... COMMIT_TRANSACTION; T2: BEGIN_TRANSACTION ... READ A ... COMMIT_TRANSACTION; T3: BEGIN_TRANSACTION ... READ A ... INVALID]

43 Fully Optimistic: a write invalidates (aborts) all active readers, but multiple concurrent writers are supported. Commit-time: at commit. [timeline: T1: BEGIN_TRANSACTION ... WRITE A ... COMMIT_TRANSACTION; T2: BEGIN_TRANSACTION ... WRITE A ... COMMIT_TRANSACTION; T3: BEGIN_TRANSACTION ... READ A ... INVALID]
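A minimal sketch of the commit-time invalidation idea behind the optimistic schemes: readers register themselves in a visible-readers set, and a committing writer publishes its value and aborts every reader of the old one. SharedWord, Tx and commit_write are names assumed here for illustration; in the fully optimistic variant the writer's new value would live in a private write buffer until this commit step.

#include <atomic>
#include <mutex>
#include <set>

struct Tx { std::atomic<bool> aborted{false}; };

// One shared location plus the meta-data the runtime keeps with it.
struct SharedWord {
    int value = 0;
    std::mutex meta;                 // protects the visible-readers set
    std::set<Tx*> visible_readers;

    int read(Tx& tx) {
        std::lock_guard<std::mutex> g(meta);
        visible_readers.insert(&tx); // make the reader visible
        return value;
    }

    // Commit-time write: publish the new value and invalidate (abort)
    // every transaction that read the old one.
    void commit_write(Tx& writer, int new_value) {
        std::lock_guard<std::mutex> g(meta);
        value = new_value;
        for (Tx* r : visible_readers)
            if (r != &writer) r->aborted.store(true);
        visible_readers.clear();
    }
};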

44 Implementation Details Meta-data kept with tm_shared<> var Lock, visible-readers set

45 Implementation Details: validation of each read; recoverability via undo-logging or write-buffering; private thread data (needs to be searchable), necessary for fully optimistic.
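A sketch contrasting the two recoverability strategies named above; the interfaces are invented for illustration. Write-buffering keeps new values in searchable private thread data, as slide 45 requires, while undo-logging writes in place and saves the old value so an abort can roll back.

#include <unordered_map>

// Write-buffering: writes go to a private buffer keyed by address;
// reads must search the buffer first; commit copies it back; abort
// simply discards it (nothing to restore).
struct WriteBufferTx {
    std::unordered_map<int*, int> buffer;

    void write(int* addr, int v) { buffer[addr] = v; }
    int  read(int* addr) {
        auto it = buffer.find(addr);          // search private data on read
        return it != buffer.end() ? it->second : *addr;
    }
    void commit() { for (auto& e : buffer) *e.first = e.second; buffer.clear(); }
    void abort()  { buffer.clear(); }
};

// Undo-logging: writes go in place; the old value is logged so an
// abort can restore every modified location.
struct UndoLogTx {
    std::unordered_map<int*, int> undo;

    void write(int* addr, int v) { undo.emplace(addr, *addr); *addr = v; }
    int  read(int* addr) { return *addr; }    // no search needed on read
    void commit() { undo.clear(); }
    void abort()  { for (auto& e : undo) *e.first = e.second; undo.clear(); }
};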

46 Factors Determining Trade-offs: Conflict type: w-w conflicts favor fully optimistic. Conflict span: a long span → domino effect (no progress) for read optimistic.

47 Evaluation of Design Trade-offs No. of threads: 4

48 Roadmap The game Parallelization Using TM Compiler code transformations for TM Runtime TM design choices Dynamic load balancing of tx in game

49 Parallel Server Phase Types [diagram: server loop of Select, Process Requests (read-write phase), Form & Send Replies (read-only phase), with Rx/Tx steps and a load-balancing step]

50 Dynamic Load Management: region = grid unit; dynamic load balancing reassigns regions from one server/thread to another.

51 Conflicts vs Load Management: locality means fewer conflicts, so keep adjacent regions on the same thread. [diagram: global reshuffle vs block partition]

52 Overload due to Quest

53 Reassign Load & Minimize Conflicts

54 Locality-Aware Load Balancing: SimMud game map with the quest in the upper left; recorded dynamic load balancing.

55 Dynamic Load-balancing Algorithms: Lightest: shed regions to the lightest-loaded thread (see the sketch below); Spread: best load spread across all threads; Locality aware: keep nearby regions on the same thread.
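A minimal sketch of the "Lightest" policy under an assumed data layout: regions carry a player count, a thread's load is the sum over its regions, and the threshold is chosen by the balancer. None of these names come from the SimMud code itself.

#include <algorithm>
#include <vector>

struct Region { int id; int num_players; };

struct ThreadLoad {
    int thread_id;
    std::vector<Region> regions;
    int load() const {
        int sum = 0;
        for (const auto& r : regions) sum += r.num_players;
        return sum;
    }
};

// Lightest: shed regions from the most loaded thread to the least
// loaded one until the hotspot drops below the threshold.
void shed_to_lightest(std::vector<ThreadLoad>& threads, int threshold) {
    auto by_load = [](const ThreadLoad& a, const ThreadLoad& b) { return a.load() < b.load(); };
    auto heaviest = std::max_element(threads.begin(), threads.end(), by_load);
    auto lightest = std::min_element(threads.begin(), threads.end(), by_load);
    if (heaviest == lightest) return;   // nothing to shed to
    while (heaviest->load() > threshold && heaviest->regions.size() > 1) {
        lightest->regions.push_back(heaviest->regions.back());
        heaviest->regions.pop_back();
    }
}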

56 Locality-aware (Quad-tree): split a task when load > threshold; reassign tasks to reduce conflicts; can approximate!
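A sketch of the quad-tree splitting step described above. The node layout, the load model, and the owner_thread field are assumptions made for illustration; reassigning the resulting quadrants to threads is left to the balancer on the following slides.

#include <array>
#include <memory>

// A quad-tree node covers a square block of game regions. When its load
// passes the threshold it splits into four quadrants that can be
// reassigned independently, keeping adjacent regions together to limit
// boundary conflicts.
struct QuadNode {
    int x = 0, y = 0, size = 0;   // block of regions covered by this task
    int load = 0;                 // e.g., number of players in the block
    int owner_thread = 0;
    std::array<std::unique_ptr<QuadNode>, 4> children;

    bool is_leaf() const { return children[0] == nullptr; }

    void split_if_overloaded(int threshold) {
        if (!is_leaf() || load <= threshold || size <= 1) return;
        int half = size / 2;
        children[0].reset(new QuadNode{x,        y,        half});
        children[1].reset(new QuadNode{x + half, y,        half});
        children[2].reset(new QuadNode{x,        y + half, half});
        children[3].reset(new QuadNode{x + half, y + half, half});
        // Children start on the parent's thread; the balancer may later
        // move whole quadrants elsewhere.
        for (auto& c : children) c->owner_thread = owner_thread;
    }
};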

57 Task Splitting [diagram: regions A–J split into tasks, e.g., BCD, AEF, GHIJ]

58 Task Re-assignment: assign tasks to reduce conflicts while keeping load < threshold. [diagram: tasks assigned to threads T0, T1, T2]

59 Dynamic Load-balancing Algorithms: all algorithms implemented on a cluster (single thread on each node) and on a multi-core (with multiple threads).

60 Results on Multi-core: load-balancing algorithms: Static, Lightest, Spread, Locality (Quad-tree). Metrics: number of clients per thread, border conflicts, client update latency.

61 Thread Load on Multi-core

62 Border Conflicts on Multi-core

63 Client update latency on M-core

64 Conclusion Support for seamless world partitioning Compiler & Runtime parallelization support Tx much simpler than locks Locality aware dynamic load balancing Can apply in server clusters, P2P mobile environments and multi-cores

65 I need your help. "When TM first beat locks" is a good story; I need a more sophisticated game to make the story happen!

66 Backup Slides

67 Client Update Latency on Cluster [chart: STATIC vs LOCALITY, most loaded and least loaded threads]. All dynamic load-balancing algorithms perform similarly.

68 Number of Player Migrations: locality aware has the fewest migrations.

69 Average Execution Time / Request (when App changes access pattern)

70 Trade-offs Private thread data Per-thread data copy overhead (-) Search private data on read (-) No need to restore data on abort (+) Allows multiple concurrent writers (+)

71 Trade-offs (contd) Private thread data Per-thread data copy overhead (-) Search private data on read (-) No need to restore data on abort (+) Allows multiple concurrent writers (+) Locks Aborts due to deadlock (-) No other aborts (+)

72 A WAN distributed server system; the quest lasts from 0 to 1000 sec.

73 TM code translation (cont.) Based on Omni OpenMP compiler

74 Average Execution Time / Request

