Presentation is loading. Please wait.

Presentation is loading. Please wait.

Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocesor Operating System By: Ben Gamsa, Orran Krieger, Jonathan Appavoo, Michael.

Similar presentations


Presentation on theme: "Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocesor Operating System By: Ben Gamsa, Orran Krieger, Jonathan Appavoo, Michael."— Presentation transcript:

1 Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocesor Operating System By: Ben Gamsa, Orran Krieger, Jonathan Appavoo, Michael Stumm Presented by: Holly Grimes

2 Locality  In uniprocessors  Spatial Locality—access neighboring memory locations  Temporal Locality—access memory locations that where accessed recently  In multiprocessors, locality involves each processor using data from its own cache

3 Why is Locality Important?  Modern multiprocessors exhibit  Higher memory latency  Large write-sharing costs  Large cache lines (false sharing)  Larger system sizes  Large secondary caches  NUMA effects

4 Why is Locality Important?

5

6

7

8

9

10  Sharing the counter requires moving it back and forth between the CPU caches  Solution??  Split the counter into an array of integers  Each CPU gets its own counter

11 Achieving Locality

12

13

14  Using an array of counters seems to solve our problem…  But what happens if both array elements map to the same cache line?  False sharing  Solution:  Pad each array element  Different elements map to different cache lines

15 Comparing Counter Implementations

16 Why is Locality Important?  Modern multiprocessors exhibit  Higher memory latency  Large write-sharing costs  Large cache lines (false sharing)  Larger system sizes  Large secondary caches  NUMA effects

17 NUMA Effects

18

19 Achieving Locality in an OS  We’ve seen several ways to achieve locality in the implementation of a counter  Now we extend these concepts to see how locality can be achieved in an OS  Tornado’s Approach –  Built a new OS from the ground up  Make locality the primary design goal

20 Tornado’s Approach Locality Independence User Application Locality Independence OS Implementation Illustration by Philip Howard

21 Tornado’s Locality-Maximizing features  Object-oriented Design  Clustered Objects  A New Locking Strategy  Protected Procedure Calls

22 Object-oriented Design

23 Object-oriented Structure  Each OS resource is represented as a separate object  All locks and data structures are internal to the objects  localizes and encapsulates the locks and data  This structure allows different resources to be managed  without accessing shared data structures  without acquiring shared locks  Simplifies OS implementation  modular design

24 Example: Memory Management Objects HAT Process RegionFCM COR RegionFCM COR DRAM HATHardware Address Translation FCMFile Cache Manager CORCached Object Representative DRAMMemory manager Illustration by Philip Howard

25 Object-oriented Design  For each resource, this design provides one object to be shared by all CPUs  Comparable to having one copy of the counter shared among all CPUs  To maximize locality, something more needs to be done

26 Clustered Objects

27  Clustered objects are composed of a set of representative objects  There can be one rep for the system, one rep per processor, or one rep for a cluster of processors  Clients access a clustered object using a common reference to the object  Each call to an object using this reference is automatically directed to the appropriate rep  Clients do not need to know anything about the location or organization of the reps to use a clustered object  Impact on locality is similar to the effect of padded arrays on the counter

28 Clustered Objects

29 Keeping Clustered Objects Consistent  When a clustered object has multiple reps, there must be a way of keeping the reps consistent  Coordination between reps can happen via  Shared Memory  Protected Procedure Calls

30 Clustered Object Implementation  Each processor has a translation table  The table is located at the same virtual address in every processor  For each clustered object, the table contains a pointer to the rep that serves the given processor  A clustered object reference is just a pointer into this table  Reps for a clustered object are created dynamically when they are first accessed  Dynamic rep creation is dealt with by the global miss handler

31 Clustered Object Implementation

32 A New Locking Strategy

33 Synchronization  Two kinds of synchronization issues must be dealt with  Using locks to protect data structures  Ensuring the existence of needed data structures

34 Locks  Tornado uses spin-then-block locks to minimize the overhead of the lock/unlock instructions  Tornado limits lock contention by  Encapsulating the locks in objects to limit the scope of locks  Using clustered objects to provide multiple copies of a lock

35 Existence Guarantees  Traditional Approach: use locks to protect object references  Prevents races where one process destroys an object that another process is using  Tornado’s Approach: semi-automatic garbage collection  Garbage collection destroys a clustered object only when there are no more references to the object  When clients use an existing reference to access a clustered object, they have a guarantee that the referenced object still exists  References can be safely accessed without the use of (global) locks

36 Protected Procedure Calls

37 Interprocess Communication  Tornado uses Protected Procedure Calls (PPCs) to bring locality and concurrency to interprocess communication  A PPC is a call from a client object to a server object  Acts like a clustered object call that passes between the protection domains of the client and server processes  Advantages of PPC  Client requests are always serviced on their local processor  Clients and servers share the CPU in a manner similar to handoff scheduling  The server has one thread of control for each client request

38 Protected Procedure Calls

39 Performance

40 Conclusions  Tornado’s increased locality and concurrency has produced a scalable OS design for multiprocessors  This locality was provided by several key system features  An object-oriented design  Clustered objects  A new locking strategy  Protected procedure calls


Download ppt "Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocesor Operating System By: Ben Gamsa, Orran Krieger, Jonathan Appavoo, Michael."

Similar presentations


Ads by Google