March 24, 2005. University of Utah, CS 7698. Token Coherence: Decoupling Performance and Correctness. Article by: Martin, Hill & Wood. Presented by: Michael Tabet.


1 Token Coherence: Decoupling Performance and Correctness
Article by: Martin, Hill & Wood
Presented by: Michael Tabet
University of Utah, CS 7698, March 24, 2005

2 A Tale of Two Methods
- Snooping based
  - Uses totally ordered broadcasts to preserve correctness
  - Uses lots of bandwidth
  - Big (large busses) = BAD!
- Directory based
  - Uses indirection to preserve bandwidth
  - Indirection adds latency
  - Needs a directory controller

3 Potential workarounds
- Snooping: snooping is fast, but requires a bus. Big, fast busses are complex -> use a virtual bus for virtual broadcast!
- Directory: networks require lots of logic (especially big ones) -> use glueless networks!

4 Token Coherence
Supports both indirection and speedup through unordered broadcasts.
Two components:
- Correctness substrate
- Performance protocol

5 Correctness
Speed is good, correctness is better! Need to guarantee ordered reads/writes! Thus, use a correctness “substrate”.

6 Correctness Invariants
1. At all times, each block has T tokens
2. A processor can write a block only if it holds all T tokens
3. A processor can read a block only if it holds at least one token
4. If a coherence message contains one or more tokens, it must contain data
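The four invariants can be sketched as a tiny token-counting model. This is only an illustration under stated assumptions: the names `Holder`, `send_tokens`, and `tokens_conserved` are hypothetical, `T = 4` is an arbitrary choice, and messages are delivered instantly (no tokens in flight).

```python
from dataclasses import dataclass

T = 4  # assumed token count per block; the slides only call it T


@dataclass
class Holder:
    """One cache (or memory) and what it holds for a single block."""
    tokens: int = 0
    has_data: bool = False


def can_read(h: Holder) -> bool:
    # Invariant 3: at least one token is needed to read.
    return h.tokens >= 1


def can_write(h: Holder) -> bool:
    # Invariant 2: all T tokens are needed to write.
    return h.tokens == T


def send_tokens(src: Holder, dst: Holder, n: int) -> None:
    """Move n tokens from src to dst as one (instantly delivered) message."""
    if n < 1 or n > src.tokens:
        raise ValueError("cannot send tokens that are not held")
    src.tokens -= n
    dst.tokens += n
    # Invariant 4: a message carrying tokens also carries data.
    dst.has_data = True
    if src.tokens == 0:
        src.has_data = False  # no tokens left, so the copy is unusable


def tokens_conserved(holders) -> bool:
    # Invariant 1: the block always has exactly T tokens in total.
    return sum(h.tokens for h in holders) == T
```

Starting with all tokens at memory, one token sent to a cache makes that cache a reader; only after it collects the remaining tokens can it write, and the total never changes.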

7 Invariant 1 Implications
Allows for precise control of blocks of data.

8 Invariant 2 Implications
Enables a write control mechanism that allows in-order writes.

9 Invariant 3 Implications
Restricts reads.

10 Invariant 4 Implications
Provides a method to ensure cache coherence.

11 Starvation
The invariants allow for ordered reads/writes, but how do we prevent starvation?
Persistent requests:
1. A processor times out on transient requests
2. It raises a persistent request (only one per block)
3. All nodes must forward the block to the requesting node
But repeated and persistent requests make up only 1-3% of the messages.
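The escalation from transient to persistent requests might be sketched as follows. Everything here is an assumption made for illustration: the class `PersistentArbiter`, the function `issue_request`, the `max_attempts` limit standing in for a timeout, and the FIFO queue used to enforce "only one per block".

```python
from collections import deque


class PersistentArbiter:
    """Hypothetical arbiter allowing one active persistent request per block."""

    def __init__(self):
        self.active = {}   # block -> node whose persistent request is active
        self.waiting = {}  # block -> queue of nodes waiting their turn

    def request(self, block, node) -> bool:
        """Return True if the request becomes active now, False if queued."""
        if block not in self.active:
            self.active[block] = node
            return True
        self.waiting.setdefault(block, deque()).append(node)
        return False

    def release(self, block):
        """Finish the active request; return the next active node, if any."""
        queue = self.waiting.get(block)
        if queue:
            self.active[block] = queue.popleft()
        else:
            self.active.pop(block, None)
        return self.active.get(block)


def issue_request(block, node, try_transient, arbiter, max_attempts=3):
    """Try transient requests first; escalate after repeated failures."""
    for _ in range(max_attempts):
        if try_transient(block, node):  # True means the request succeeded
            return "transient"
    # All attempts "timed out": raise a persistent request (it may be queued).
    arbiter.request(block, node)
    return "persistent"
```

While a persistent request is active, step 3 on the slide applies: the other nodes forward the block to the node recorded in `active`.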

12 Persistent Request State Diagram

13 Performance protocol
But if you always follow the rules, it can get slow and tedious! Tokens allow for unordered responses to requests. This opens the door for all sorts of optimizations.

14 TokenB: A New Contender
Akin to the MSI snooping protocol:
- Requests are broadcast
- Data exists in one of three states:
  - Modified (all tokens)
  - Shared (some tokens)
  - Invalid (no tokens)
But: the performance protocol allows for better performance!
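The mapping from token count to MSI-like state can be written down directly. A minimal sketch; the function name `msi_state` and the single-letter return values are assumptions for illustration.

```python
def msi_state(tokens: int, total: int) -> str:
    """Classify a cache's state for one block from its token count."""
    if not 0 <= tokens <= total:
        raise ValueError("token count must be between 0 and total")
    if tokens == 0:
        return "I"  # Invalid: no tokens
    if tokens == total:
        return "M"  # Modified: all tokens
    return "S"      # Shared: some, but not all, tokens
```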

15 TokenB: Optimized Token Counting
MSI was a bit of a lie; token counting can be optimized by altering invariants 1, 3, and 4:
1. At all times, each block has T tokens, one of which is the owner token
3. A processor can read a block only if it holds at least one token for that block and has valid data
4. If a coherence message contains the owner token, it must contain data
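The effect of the revised invariants is that non-owner tokens may travel without data, while reads must check for valid data explicitly. A minimal sketch, assuming a hypothetical `Message` record and helper names that are not from the slides:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    """Hypothetical coherence message for one block."""
    tokens: int                    # number of tokens carried
    has_owner_token: bool = False  # whether one of them is the owner token
    data: Optional[bytes] = None   # block contents, if included


def message_is_valid(msg: Message) -> bool:
    if msg.tokens < 0:
        return False
    if msg.has_owner_token and msg.tokens < 1:
        return False  # the owner token is itself one of the tokens
    if msg.has_owner_token and msg.data is None:
        return False  # revised invariant 4: owner token implies data
    return True       # non-owner tokens may be sent without data


def can_read(tokens: int, has_valid_data: bool) -> bool:
    # Revised invariant 3: a token alone is not enough without valid data.
    return tokens >= 1 and has_valid_data
```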

16 TokenB Continued: The Good Stuff
Performance in: tokens allow replies to be sent unordered and point-to-point (no broadcast).
This means:
- 15-28% faster than snooping
- 17-54% faster than directory
- 21-25% less bandwidth than snooping

17 An Example
P1 reads, then P2 writes, then P1 reads.
Presume a 4-node system, where P1 has an invalid copy, P2 has a shared copy, and P3 is the “home/owner” node.

18 Example: The Snooping Way
[Figure: nodes P1 to P4 with messages labeled 1 to 5]
All messages broadcast!

19 Example: The Directory Way
[Figure: nodes P1 to P4 and a directory, with messages labeled 1 to 6]
The directory processes messages 1, 3, 4, and 5!

20 Example: The Token Way
[Figure: nodes P1 to P4 with messages labeled 1 to 6; messages 1, 3, and 5 are broadcasts]
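The read/write/read sequence of the example can be traced in terms of token counts. This is a sketch under assumptions the slides do not state: T = 4, an initial split of one token at P2 and three at P3, a single token sent in reply to each read, and hypothetical helper names (`transfer`, `read_miss`, `write_miss`). It is not meant to reproduce the exact messages in the figure.

```python
T = 4  # assumed total tokens for the block

# Assumed initial split: P1 invalid, P2 shared, P3 home/owner, P4 uninvolved.
tokens = {"P1": 0, "P2": 1, "P3": 3, "P4": 0}


def transfer(tokens, src, dst, n):
    """Move n tokens from src to dst and check that the total is unchanged."""
    if n < 1 or tokens[src] < n:
        raise ValueError("cannot send tokens that are not held")
    tokens[src] -= n
    tokens[dst] += n
    assert sum(tokens.values()) == T


def read_miss(tokens, requester, responder):
    # The request is broadcast; here one chosen responder replies with a token.
    transfer(tokens, responder, requester, 1)


def write_miss(tokens, requester):
    # The request is broadcast; every other holder gives up all its tokens.
    for node in tokens:
        if node != requester and tokens[node] > 0:
            transfer(tokens, node, requester, tokens[node])


read_miss(tokens, "P1", responder="P3")  # P1 reads
write_miss(tokens, "P2")                 # P2 writes
read_miss(tokens, "P1", responder="P2")  # P1 reads again
```

After the write, P2 holds all four tokens, so the second read has to be answered by P2; no directory lookup is involved at any step.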

21 Real world results
- Examined on a tree structure (virtual broadcast) and on a 2D torus
- Migratory optimization: a read request after a write is forwarded all tokens
- Benchmarked on OLTP, SPECjbb, Apache

22 Results
Token vs Snooping: TOKEN wins!

23 Results
Directory vs Token: Token mostly wins!

24 Conclusion
- TokenB offers good performance for small to mid-sized parallel systems
- Broadcast limits scalability past 16 nodes
- But other performance protocols could be scaled larger!

