1 Department of Computer Sciences
Revisiting the Complexity of Hardware Cache Coherence and Some Implications
Rakesh Komuravelli, Sarita Adve, Ching-Tsun Chou
University of Illinois at Urbana-Champaign, Intel

2 Motivation
Today's shared memory systems are more complex than ever
  – Implementing cache coherence protocols is a major challenge
  – Tens of transient states; races are hard to test and optimizations are hard to add
Have we tamed the protocol complexity yet?
  – Verified a state-of-the-art implementation of MESI from GEMS
  – Found six bugs even after 4+ years of usage worldwide
Current verification techniques are insufficient
  – Scalable but hard to use or error-prone (e.g., parametric verification)
  – Protocols designed for verification impact performance (e.g., fractals)
Are there any alternatives?

3 An alternative approach
DeNovo: a h/w-s/w co-designed protocol [PACT 2011]
  – Assumes disciplined programming that eliminates data races
  – Simple protocol, yet provides performance and power advantages
Model checked DeNovo
  – Found three bugs: easy to fix; implementation errors
  – 15x fewer reachable states compared to MESI
Focus of the talk:
  – Understand what makes hardware protocols complex
  – Experiences with verifying MESI and DeNovo

4 Outline
  – Motivation
  – Understanding hardware protocol complexity
  – Verification model and findings
  – Conclusions

5 Why are hardware protocols complex?
Textbook protocol for MESI
[Figure: the four stable states: Invalid, Shared, Exclusive, Modified.]

6 Why are hardware protocols complex?
Textbook protocol for MESI
In reality, the actual implementation is a lot more complex
[Figure: state diagram over Invalid, Shared, Exclusive, and Modified. Edge labels: Read i [sharers exist], Read i [no sharers], Read i, Write i, Read i / Write i, Read k, Write k.]
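
The textbook diagram can be written down as a small transition function. The following is a minimal sketch of standard stable-state MESI as usually taught, not the GEMS implementation discussed in the talk; the names `State` and `next_state` and the event strings are illustrative assumptions, with `i` read as the local core and `k` as any other core.

```python
from enum import Enum


class State(Enum):
    I = "Invalid"
    S = "Shared"
    E = "Exclusive"
    M = "Modified"


def next_state(state, event, sharers_exist=False):
    """Textbook MESI transition for one cache line held by core i.

    event is one of "read_i", "write_i" (local core) or
    "read_k", "write_k" (another core). No transient states are modeled.
    """
    if event == "read_i":
        if state is State.I:
            # A read miss lands in Shared if other copies exist, else Exclusive.
            return State.S if sharers_exist else State.E
        return state  # read hit: no change
    if event == "write_i":
        return State.M  # a local write always ends in Modified
    if event == "read_k":
        # A remote read downgrades an exclusive or modified copy to Shared.
        return State.S if state in (State.E, State.M) else state
    if event == "write_k":
        return State.I  # a remote write invalidates the local copy
    raise ValueError(f"unknown event: {event}")
```

Every call here returns a stable state immediately. A real implementation has to cover the time between sending a request and receiving all responses, which is where the transient states discussed on the following slides come from.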

8 Example transition for MESI
[Figure: message diagram for a store at P1. Initial state at L1 (P1): Invalid (I); initial state at L2: Shared (SS). L1 at P1: Invalid (I), Transient_1, Transient_2, then Modified (M) on the last Ack. L2/Directory: Shared (SS), Transient_3, L1Modified (MT). L1s at P2 … Pn: Shared (S) to Invalid (I). Messages: GETX, Data, Invalidations, Acks, Exclusive_Unblock.]

9 Example transition for MESI
One transition requires three transient states (total 21)
Transient states ← Hardware races ← Software data races
[Figure: same message diagram as the previous slide.]

10 DeNovo with zero transient states
Assumes data-race-free software
  – Completely eliminates transient states from the protocol
Exploits s/w information for simple coherence enforcement
Invalidate stale copies in private caches
  – Caches selectively self-invalidate entries not written by self
  – No sharers list
Track up-to-date copy
  – Directory keeps track of one up-to-date copy
[Figure: state diagram over Invalid, Valid, and Registered. Edge labels: Read i, Write i, Read i, Write i (combined), Read k, Write k.]
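
The two mechanisms listed above (self-invalidation instead of a sharers list, and a directory that tracks a single up-to-date copy) can be sketched for one address as follows. This is a rough illustration built only from the slide text, not the DeNovo implementation: the class `DeNovoSketch`, its method names, and the `L2` marker are hypothetical, the treatment of the previous registrant on a remote write is an assumption, and the effect of a remote read on the registrant is deliberately not modeled.

```python
from enum import Enum


class St(Enum):
    INVALID = "I"
    VALID = "V"
    REGISTERED = "R"


class DeNovoSketch:
    """One address, n private L1 caches, one directory entry."""

    L2 = -1  # hypothetical marker: the shared L2 holds the up-to-date copy

    def __init__(self, n_cores):
        self.l1 = [St.INVALID] * n_cores
        self.registrant = self.L2  # the directory tracks exactly one owner

    def write(self, core):
        """Register `core` as the holder of the up-to-date copy."""
        prev = self.registrant
        if prev not in (self.L2, core):
            # Assumption: the previous registrant gives up its copy.
            self.l1[prev] = St.INVALID
        self.registrant = core
        self.l1[core] = St.REGISTERED
        # Valid copies in other caches are NOT touched: no sharers list.

    def read(self, core):
        """A read miss installs a Valid copy; hits change nothing."""
        if self.l1[core] is St.INVALID:
            self.l1[core] = St.VALID

    def self_invalidate(self, core):
        """Drop a copy this core did not write itself."""
        if self.l1[core] is St.VALID:
            self.l1[core] = St.INVALID
```

For example, after `p = DeNovoSketch(3); p.read(1); p.write(0)`, core 1 still holds a stale Valid copy until it calls `p.self_invalidate(1)`. The sketch relies on the data-race-free assumption to guarantee that core 1 does not read that copy before self-invalidating.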

11 Example transition for DeNovo
Zero transient states => a simplified protocol
[Figure: message diagram for a store at P1. Initial state at L1 (P1): Invalid (I); initial state at L2: Valid (V). L1 at P1: Invalid (I) to Registered (R). L2/Directory: Valid (V) to Registered (R). L1s at P2 … Pn: Valid (V) to Invalid (I) through self-invalidations. Messages: Registration request.]

12 Outline
  – Motivation
  – Understanding hardware protocol complexity
  – Verification model and findings
  – Conclusions

13 Verification model
Murφ model checking tool
Verified DeNovo and MESI protocols
  – State-of-the-art GEMS implementation
Abstract model
  – Single address, two data values
  – Two cores with private L1 and unified L2, unordered n/w
  – Data-race-free assumption for DeNovo
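
Explicit-state model checkers such as Murφ enumerate every state reachable from the initial state and check an invariant in each one. The sketch below shows that idea in miniature; it is not Murφ code and not the talk's model. The functions `explore`, `toy_successors`, and `single_writer` are illustrative assumptions, and the toy protocol is an atomic three-state (I/S/M) abstraction with no network or transient states.

```python
from collections import deque


def explore(initial, successors, invariant):
    """Breadth-first enumeration of reachable states.

    Returns (states discovered so far, first violating state or None).
    """
    seen = {initial}
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        if not invariant(state):
            return len(seen), state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen), None


def toy_successors(state):
    """All one-step moves of a toy atomic protocol; state[c] is core c's line."""
    for c in range(len(state)):
        if state[c] == "I":
            # Read miss at core c: any modified copy is downgraded to shared.
            nxt = ["S" if s == "M" else s for s in state]
            nxt[c] = "S"
            yield tuple(nxt)
        # Write at core c: every other copy is invalidated.
        nxt = ["I"] * len(state)
        nxt[c] = "M"
        yield tuple(nxt)
        if state[c] != "I":
            # Eviction at core c.
            nxt = list(state)
            nxt[c] = "I"
            yield tuple(nxt)


def single_writer(state):
    """At most one modified copy, and never alongside a shared copy."""
    return state.count("M") <= 1 and not ("M" in state and "S" in state)
```

Calling `explore(("I", "I"), toy_successors, single_writer)` visits the six reachable two-core states and reports no violation. The set `seen` holds every discovered state in memory, which is why the number of reachable states directly determines how far this style of verification scales.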

14 Results
Correctness
  – Six bugs in MESI protocol
      – Two deadlock scenarios
      – Unhandled races due to L1 writebacks
      – Several days to fix

15 A MESI bug
Complex to identify the cause and fix
Required adding multiple new state transitions
[Figure: message diagram of the bug, involving L2/Directory, L1 at P1, and L1 at P2. State labels: L1Modified (MT) modified at P1, L1Modified (MT) modified at P2, Modified (M), Invalid (I), Transient_1, Transient_2, Transient_3. Events and messages: Store, Replacement, PUTX, GETX, Fwd_GETX, DATA, Exclusive_Unblock. Annotation: "ERROR!! Dangling message!!"]

16 Results
Correctness
  – Six bugs in MESI protocol
      – Two deadlock scenarios
      – Unhandled races due to L1 writebacks
      – Several days to fix and needed more transient states
  – Three bugs in DeNovo protocol
      – Mistakes in translation from high-level specification
      – Simple to fix
Complexity
  – 15x fewer reachable states for DeNovo
  – 20x faster to verify for DeNovo
DeNovo is simpler and needs reduced verification effort

17 Scalability results
Extended the base model
  – Two addresses instead of one
DeNovo model finished without new bugs
MESI model ran out of system memory (32 GB)
Need more scalable tools for non-experts

18 Conclusions
Have we tamed the coherence protocol complexity yet? No!
  – 6 bugs in a state-of-the-art MESI protocol in use for 4+ years
  – Main source: transient states
DeNovo: an alternative h/w-s/w co-designed approach
  – 3 easy-to-fix bugs in an immature protocol
  – Zero transient states
MESI vs. DeNovo
  – DeNovo has 15x fewer reachable states, 20x faster to verify
Easy-to-use verification tools are not scalable
  – Need better tools for non-experts

