Presentation is loading. Please wait.

Presentation is loading. Please wait.

DRAIN: Distributed Recovery Architecture for Inaccessible Nodes in Multi-core Chips Andrew DeOrio †, Konstantinos Aisopos ‡§ Valeria Bertacco †, Li-Shiuan.

Similar presentations


Presentation on theme: "DRAIN: Distributed Recovery Architecture for Inaccessible Nodes in Multi-core Chips Andrew DeOrio †, Konstantinos Aisopos ‡§ Valeria Bertacco †, Li-Shiuan."— Presentation transcript:

1 DRAIN: Distributed Recovery Architecture for Inaccessible Nodes in Multi-core Chips Andrew DeOrio †, Konstantinos Aisopos ‡§ Valeria Bertacco †, Li-Shiuan Peh § DAC 2011 † University of Michigan ‡ Princeton University § Massachusetts Institute of Technology

2 Reliable Networks on Chip 2 R up $ processor cache router Detect if fault has occurred Diagnose what fault has occurred Recover and resume normal operation Reconfigure network to account for fault Drain fault-tolerant routing detectiondiagnosis recon- figuratio recovery nodes are disconnected, state is lost!

3 Recovery Approaches Checkpoint/recovery approaches Drain takes a reactive approach, incurring performance overhead only when errors occur 3 R uP $ checkpoint buffers data stuck in checkpoint buffer! R uP $ MEM high performance overhead!

4 Data Recovery with Drain Recover data lost during reconfiguration – Emergency links provide alternate path – Transfers cache contents and architectural state primary link Mem uP $ Router uP $ uP $ processor core local cache memory controller DRAIN emergency link 4

5 Drain Example up $ M $ $ $ 5

6 Drain Example up $ M $ $ $ X link failure 6

7 Drain Example up $ M $ $ $ 7 reconfigure interconnect

8 Drain Example up $ M $ $ $ X link failure 8

9 Drain Example up $ M $ $ $ 9 node isolated!

10 Drain Example up $ M $ $ $ drain connected nodes via primary links 10

11 Drain Example up $ M $ $ $ drain disconnected node via emergency link 11

12 Drain Example up $ M $ $ $ drain connected node again 12

13 Drain Example up $ M $ $ resume normal operation 13 up $

14 Drain Performance as Links Fail 14 increasing emergency link time decreasing functional network size

15 Memory Latency Before and After 15

16 Conclusions DRAIN is a lightweight recovery mechanism for CMPs – 5,000 gates per node Recoup cache data and architectural state from disconnected nodes Performance overhead only during recovery – ~3ms at 1GHz 16


Download ppt "DRAIN: Distributed Recovery Architecture for Inaccessible Nodes in Multi-core Chips Andrew DeOrio †, Konstantinos Aisopos ‡§ Valeria Bertacco †, Li-Shiuan."

Similar presentations


Ads by Google