A New Coherence Method Using A Multicast Address Network

1 Multicast Snooping: A New Coherence Method Using A Multicast Address Network

2 Outline
- Multicast Snooping: View from Above
- Multicast Snooping Details
- Experimental Implementation and Methodology
- Questions Posed by Multicast Snooping

3 View from Above
Multicast Snooping: A Hybrid of Snooping and Directories
- Performance benefits of snooping
- Scalability of directories
- Graceful degradation to broadcast as the system grows
Multicast groups à la networks
- Processor “guesses” the peer(s) that need to see a message, then multicasts to these targets
- Memory has a directory to evaluate the guesses and acts upon those that are wrong

4 View from Above, Continued
Advantages of Multicast Snooping
- When the multicast “guess” is right, the speed of a snooping protocol is achieved with better scaling than traditional snooping systems
  - “Guessing” right isn’t that hard
  - Net result: larger systems can be supported with snooping (better snooping scalability)
- When the multicast “guess” is wrong, directory-based mechanisms maintain correctness
  - As the system scales, behavior degrades toward that of a directory

6 Multicast Snooping Coherence
- Logically separate multicast address and data buses
  - Authors model physical separation for simplicity
- MOSI protocol
  - Why MOSI? Why not MSI, MESI, or MOESI?
MOSI appears to have been chosen so that a getx with an incomplete mask can move the requestor’s block to O instead of invalidating the transaction entirely. This lets the processors in the first mask act on the transaction at the time it is issued. An upgrade transaction with the proper mask can then be issued, which spares the processors in the first mask from processing two messages, the first of which the directory at memory would invalidate. An E state is unfavorable for reasons previously discussed in class (E generally needs a single shared line that every processor must monitor).
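The MOSI rationale above can be illustrated with a minimal sketch of the requestor’s side of a getx. It assumes the block starts in I and that the directory’s verdict arrives as one of the hypothetical strings `ack`, `semi-ack`, or `nack`. Mapping the incomplete-mask case onto a semi-ack that leaves the block in O is my reading of the notes, not the paper’s exact protocol; `requester_after_getx` and `requester_after_upgrade` are illustrative names.

```python
from enum import Enum
from typing import Optional, Tuple


class State(Enum):
    """MOSI stable states for one cache block."""
    M = "Modified"  # dirty, sole copy
    O = "Owned"     # dirty, may coexist with S copies; this cache supplies data
    S = "Shared"    # clean, read-only copy
    I = "Invalid"


def requester_after_getx(response: str) -> Tuple[State, Optional[str]]:
    """Hypothetical requestor reaction to the directory's verdict on a getx
    issued from state I. Returns (new state, follow-up transaction or None)."""
    if response == "ack":
        # Mask reached the owner and every sharer: write permission granted.
        return State.M, None
    if response == "semi-ack":
        # Assumed meaning: mask was incomplete, but the data still arrived.
        # Keep it as owner (O) and reissue an upgrade with the corrected mask.
        return State.O, "upgrade"
    if response == "nack":
        # Assumed meaning: the transaction failed outright; retry the getx.
        return State.I, "getx"
    raise ValueError(f"unknown response: {response!r}")


def requester_after_upgrade(state: State, response: str) -> State:
    """An acked upgrade issued from O completes the write (O -> M)."""
    if state is State.O and response == "ack":
        return State.M
    return state
```

For example, `requester_after_getx("semi-ack")` yields `(State.O, "upgrade")`, and an acked upgrade then moves the block from O to M. Without an O state, the requestor would have to drop the data it already received.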

7 Multicast Snooping Protocol
Broadcast-like, with three major differences:
- Coherence transactions carry a mask
  - Mask: specification of which processors should receive the transaction; always includes the source processor and the requested block’s home memory
- Memory carries a simplified directory entry
  - Verifies that the mask is correct and reacts appropriately
  - If the mask is incorrect, sends the correct mask back with a semi-ack or nack
- Processor actions carry additional complexity
  - Needed to support semi-acks/nacks on getx
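A minimal sketch of the directory-side mask check described on this slide, assuming masks are Python sets of node IDs, a directory entry records only an owner and a sharer set, a gets needs only the owner, and a getx needs the owner plus every sharer. Which error earns a nack versus a semi-ack is also an assumption (owner missed: nack; only sharers missed: semi-ack). `DirEntry` and `check_mask` are hypothetical names, not the authors’ code.

```python
from dataclasses import dataclass, field
from typing import Hashable, Optional, Set, Tuple


@dataclass
class DirEntry:
    """Simplified directory entry at the block's home memory (hypothetical layout)."""
    owner: Optional[int] = None                     # processor in M or O; None if memory owns it
    sharers: Set[int] = field(default_factory=set)  # processors holding S copies


def check_mask(entry: DirEntry, kind: str, src: int, home: Hashable,
               mask: Set[Hashable]) -> Tuple[str, Set[Hashable]]:
    """Verify a transaction's multicast mask; return (verdict, mask to use).

    kind is "gets" (read) or "getx" (read-for-write). A gets must reach the
    owner; a getx must reach the owner and every sharer.
    """
    required = {src, home}  # always present in a well-formed mask
    if entry.owner is not None:
        required.add(entry.owner)
    if kind == "getx":
        required |= entry.sharers
    if required <= mask:
        return "ack", mask
    corrected = mask | required
    # Assumption: missing the owner (the data supplier) means the request
    # cannot complete, so it is nacked; missing only sharers means the data
    # still arrives, so the requestor gets a semi-ack plus the corrected mask.
    owner_missed = entry.owner is not None and entry.owner not in mask
    return ("nack" if owner_missed else "semi-ack"), corrected
```

With `DirEntry(owner=3, sharers={1, 2})`, a getx from processor 0 with mask `{0, "M0", 3}` reaches the owner but misses both sharers, so it returns a semi-ack along with a corrected mask that adds processors 1 and 2.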

8 Multicast Snooping Mask Prediction
- Each processor maintains a local mini-directory
  - Tracks locality of block access and the last invalidator, arranged by block tag
- Builds the mask using the “StickySpatial(k)” predictor
  - ORs the mask for a block with the masks of its k nearest neighbors in the table
  - Nearest neighbors may not be related blocks
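The predictor can be sketched as below, under several assumptions that are mine rather than the paper’s: masks are integer bitmasks with one bit per node (each CC-NUMA node holding a processor and its memory, so the home is also a node bit), the mini-directory is direct-mapped by block number modulo table size, “sticky” means bits are only ever ORed in, and “k nearest neighbors” means k table entries on each side. `StickySpatialPredictor` is a hypothetical name.

```python
class StickySpatialPredictor:
    """Sketch of a StickySpatial(k)-style mask predictor (hypothetical details)."""

    def __init__(self, num_entries: int, k: int):
        self.table = [0] * num_entries  # one node bitmask per table entry
        self.k = k

    def _index(self, block: int) -> int:
        # Direct-mapped: unrelated blocks can alias to the same entry.
        return block % len(self.table)

    def observe(self, block: int, proc: int) -> None:
        """Record that `proc` requested or invalidated `block`; bits are sticky."""
        self.table[self._index(block)] |= 1 << proc

    def predict(self, block: int, src: int, home: int) -> int:
        """OR the block's entry with k neighboring entries on each side."""
        i = self._index(block)
        n = len(self.table)
        mask = (1 << src) | (1 << home)  # source and home are always included
        for d in range(-self.k, self.k + 1):
            mask |= self.table[(i + d) % n]
        return mask
```

In an 8-entry table with k = 1, observing processor 5 on block 10 and processor 6 on block 11 makes a prediction for block 10 include both. Because block 18 maps to the same entry as block 10, its prediction inherits the same bits, which is one way the nearest neighbors may not be related blocks.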

9 Multicast Address Networks
- For now, consider the network as a cloud
- Notable use of a fat-tree network (recall the CM-5)
- Important properties for supporting multicast networks include:
  - Total ordering, which need not be simultaneous at all destinations
  - Capability for multiple deliveries per cycle
  - Low latency: avoid bottlenecks, exploit locality
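The “total ordering, but not necessarily simultaneous” property can be illustrated with a global sequencer. This is a swapped-in technique, not how a fat-tree network would provide ordering. The sketch assumes the sequencer hands each destination consecutive slot numbers in one global order, so every destination that receives two messages sees them in the same relative order even when they arrive at different times or out of order. `Sequencer` and `Endpoint` are hypothetical names.

```python
import heapq
from collections import defaultdict


class Sequencer:
    """Serializes multicasts into one global order (stand-in for the network).

    Each destination gets consecutive slot numbers in that global order, so any
    two destinations that both receive messages m1 and m2 agree on their order.
    """

    def __init__(self):
        self.next_slot = defaultdict(int)  # destination -> next slot number

    def stamp(self, msg, dests):
        """Return {dest: (slot, msg)} for every destination in the mask."""
        stamped = {}
        for d in dests:
            stamped[d] = (self.next_slot[d], msg)
            self.next_slot[d] += 1
        return stamped


class Endpoint:
    """Delivers messages in slot order, buffering any that arrive early."""

    def __init__(self):
        self.expected = 0
        self.pending = []    # min-heap of (slot, msg)
        self.delivered = []

    def arrive(self, slot, msg):
        heapq.heappush(self.pending, (slot, msg))
        while self.pending and self.pending[0][0] == self.expected:
            _, m = heapq.heappop(self.pending)
            self.delivered.append(m)
            self.expected += 1
```

If a destination receives the third and second multicasts before the first, it holds them until the first arrives and then delivers all three in global order. Another destination that sees only the first and third delivers those two in the same relative order.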

11 Performance Evaluation
- Not the focus of the paper
- Preliminary evidence suggests that further detailed evaluation of multicast snooping is warranted
- Simulated a 32-processor CC-NUMA system
  - Used the Wisconsin Wind Tunnel II (WWT II) simulator
  - MSI only, which is pessimistic for multicast snooping
  - Benchmarks mainly derived from SPLASH-2

12 Evaluating The Pieces
- Generated traces fed through the mask predictor
  - Prediction accuracy range: 73-95%
  - Extra nodes predicted still leave the multicast group size much smaller than the system size
  - Would these results scale with smaller system sizes? What is affecting the number of extra nodes?
- Network results show that multiple messages per cycle are possible (~50% of optimal)
The implications in the paper suggest that multicast group size is more a function of the mean number of sharers encountered by a coherence transaction than of system size. This suggests that the multicast traffic ratio (the average number of multicast group members divided by the perfect number of multicast group members) would be nearly fixed across a range of related system sizes under similar workloads. This would start to change when the number of nodes drops below the perfect number, that is, when there is more parallelism present than there are processors to exploit it. The authors suggest that there is additional opportunity to improve the mask predictor and bring the multicast traffic ratio closer to the ideal (1.0).
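The two metrics discussed on this slide can be computed from per-transaction bitmasks as sketched below. The definitions are assumptions: a prediction counts as accurate when it covers every node in the perfect mask, and the multicast traffic ratio is the mean predicted group size over the mean perfect group size. `traffic_stats` is a hypothetical helper.

```python
from typing import List, Tuple


def popcount(mask: int) -> int:
    """Number of nodes in a multicast mask."""
    return bin(mask).count("1")


def traffic_stats(samples: List[Tuple[int, int]]) -> Tuple[float, float]:
    """samples holds one (predicted_mask, perfect_mask) pair per transaction.

    Returns (accuracy, traffic_ratio):
      accuracy      = fraction of predictions covering every node in the perfect mask
      traffic_ratio = mean predicted group size / mean perfect group size
    """
    hits = sum(1 for pred, perfect in samples if perfect & ~pred == 0)
    predicted_members = sum(popcount(pred) for pred, _ in samples)
    perfect_members = sum(popcount(perfect) for _, perfect in samples)
    # Both means are over the same transactions, so the ratio of sums equals
    # the ratio of means.
    return hits / len(samples), predicted_members / perfect_members
```

With samples `[(0b0111, 0b0011), (0b1111, 0b0011), (0b0011, 0b0111)]`, the accuracy is 2/3 and the traffic ratio is 9/7 (about 1.29). A perfect predictor would score 1.0 on both.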

14 Questions To Consider
- Is multicast snooping a good idea?
  - If so, would it be better in some scenarios than in others?
  - If not, why not?
- Is multicast snooping optimal?
- Are the noted drawbacks regarding the evaluation back-breakers?
In general, multicast snooping appears to be a good idea for classes of systems where both performance and scalability are important. Retaining snooping performance in larger-scale systems is attractive. It is unclear how much additional hardware multicast snooping would add to a snoop-based system and whether this would become a significant cost factor. For very large systems running workloads that share data across nearly all processors, the scheme would degrade to a directory-style system, which is reasonable. As for optimality, I believe this is an “it depends” issue, with dependencies on workload, system size, and the customer’s prioritization of performance, cost, and availability. The noted drawbacks, particularly the approximations and the lack of full timing simulation, need to be addressed in future work. In my opinion, however, they are not a valid reason to dismiss multicast snooping altogether.

