UW-Madison Computer Sciences Vertical Research Group© 2010 A Unified Model for Timing Speculation: Evaluating the Impact of Technology Scaling, CMOS Design.


1 UW-Madison Computer Sciences Vertical Research Group © 2010
A Unified Model for Timing Speculation: Evaluating the Impact of Technology Scaling, CMOS Design Style, and Fault Recovery Mechanism
Marc de Kruijf, Shuou Nomura, Karu Sankaralingam

2 DSN 2010 - 2 From Hard to Harder
[Figure: technology nodes from Hard to Harder: 10000nm, 4000nm, 1500nm, 720nm, 360nm, 180nm, 90nm, 45nm & beyond]

3 What is the Problem?
- Non-ideal transistor scaling
- Transistor wear-out
- Process, voltage, and temperature (PVT) variations
- Errors due to particle interference
- Noise coupling & crosstalk

4 What is the Problem?
Reliability Toolbox: DMR, timing speculation, RMT, HW checkpoints, TMR, ECC, watchdog, dynamic verification
Performance Toolbox: multi-core, coherence & consistency, on-chip network, out-of-order, branch prediction
NEED HIGH-LEVEL ANALYSIS TOOLS

5 Our Contribution
A model for timing speculation
- Unifies hardware + system
- Small set of high-level inputs
[Figure label: processor designer]
Also:
Q. What is the impact of technology scaling?
A. Further benefits are small to none.
Q. What is the impact of CMOS design style?
A. Very low power designs benefit most.
Q. What is the impact of the fault recovery mechanism?
A. Fine-grained recovery is key to high efficiencies.

6 Outline
- Timing Speculation
- Model Overview
- Hardware Efficiency Model
- System Recovery Model
- Results
- Conclusion

7 Timing Speculation
[Figure labels: clock; circuit delay; clock period (= 1/frequency); variations; timing failure; slower clock: OK; detect & recover]

8 Outline
- Timing Speculation
- Model Overview
- Hardware Efficiency Model
- System Recovery Model
- Results
- Conclusion

9 Model Overview
[Figure: Hardware Efficiency (energy vs. error rate) combined with System Recovery (time vs. error rate) gives Overall Efficiency]
Model Inputs
1. A hardware path delay distribution
2. Effect of variations on path delay as N(μ,σ)
3. The time between recovery checkpoints
4. The time to restore a checkpoint

10 Hardware Efficiency Model
Input 1: Path delay distribution (# paths vs. path delay)
Input 2: Path delay variation (σ)
[Figure: plots of error prob. vs. clock period, error prob. vs. energy, and error rate vs. energy; e.g. frequency scaling]
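
The two hardware inputs can be combined into an error-probability curve roughly as sketched below. This is a minimal illustration, not the authors' implementation. It assumes that each path's delay varies independently as N(μ, σ) with σ given as a fraction of the nominal delay μ, and that every listed path is exercised each cycle. The histogram values and function names are made up for the example.

```python
import math

def path_failure_prob(nominal_delay, sigma_frac, clock_period):
    """P(delay > clock_period) for a path whose delay is N(mu, sigma_frac * mu)."""
    sigma = sigma_frac * nominal_delay
    z = (clock_period - nominal_delay) / sigma
    # Upper tail of the standard normal, written with erfc for accuracy.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def cycle_error_prob(path_histogram, sigma_frac, clock_period):
    """Probability that at least one path misses timing in a cycle.

    path_histogram maps nominal delay -> number of paths with that delay.
    Assumption (hypothetical): independent paths, all exercised every cycle.
    """
    log_ok = 0.0
    for delay, count in path_histogram.items():
        p_fail = path_failure_prob(delay, sigma_frac, clock_period)
        log_ok += count * math.log1p(-p_fail)
    return -math.expm1(log_ok)

# Hypothetical path delay distribution (delays normalized to the longest path).
histogram = {0.6: 500, 0.8: 200, 0.9: 50, 1.0: 10}

for period in (1.20, 1.10, 1.05, 1.00):
    p = cycle_error_prob(histogram, sigma_frac=0.046, clock_period=period)
    print(f"clock period {period:.2f}: error prob. per cycle = {p:.3e}")
```

Shortening the clock period raises the error probability, which is the trade-off the error prob. vs. clock period plot on this slide describes.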

11 System Recovery Model
System Recovery Model Inputs
1. The time between recovery checkpoints (cycles)
2. The time to restore a checkpoint (restore)
$\text{overhead}(\text{rate}) = \text{failures}(\text{rate}) \times \bigl(\text{waste}(\text{rate}) + \text{restore}\bigr)$
(applies to all backward error recovery systems)
[Figure: time vs. error rate]
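
The overhead relation on this slide can be made concrete under a simple assumption that the slides do not state: errors strike each cycle independently with probability p (the error rate), an error is detected in the cycle where it occurs, and a checkpoint interval of n cycles is re-executed until it completes without error. The symbols p, n (the `cycles` input), and r (the `restore` input) are introduced here for illustration only.

```latex
\begin{align*}
\text{failures}(p) &= \frac{1-(1-p)^{n}}{(1-p)^{n}}
  && \text{expected number of failed attempts before a success} \\
\text{waste}(p) &= \frac{\sum_{k=1}^{n} k\,p\,(1-p)^{k-1}}{1-(1-p)^{n}}
  && \text{expected cycles executed in a failed attempt} \\
\text{overhead}(p) &= \text{failures}(p)\,\bigl(\text{waste}(p) + r\bigr)
\end{align*}
```

For example, with n = 1 and r = 5 the waste term is exactly 1 cycle, so overhead(p) = 6p / (1 - p) cycles per checkpoint interval.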

12 Outline
- Timing Speculation
- Model Overview
- Hardware Efficiency Model
- System Recovery Model
- Results
- Conclusion

13 Results
Is the model useful? What can we learn?
CMOS Design Style: High Performance CMOS, Low Power CMOS, Ultra-low Power CMOS
Technology Node: 45nm, 11nm
Recovery System: Razor, Reunion, Paceline

14 Results
[Figure: Hardware Efficiency (energy vs. error rate), System Recovery (time vs. error rate), Overall Efficiency]

15 Hardware Model Inputs
1. Path delay distribution
   - Application: H.264 decoding
   - Hardware: OpenRISC processor
2. Effect of process variations as N(μ,σ) using ITRS data
   - High Performance CMOS
     - 45nm: σ = 0.046μ
     - 11nm: σ = 0.051μ
   - Low Power CMOS
     - 45nm: σ = 0.029μ
     - 11nm: σ = 0.042μ
   - Ultra-low Power CMOS
     - 45nm: σ = 0.196μ
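
To get a feel for what these σ values mean, the sketch below computes the chance that a single path misses timing when the clock period leaves a given guard band above the path's nominal delay μ. Only the σ values come from the slide. The 10% guard band, the single-path view, and the helper name are hypothetical choices made for illustration.

```python
import math

# sigma as a fraction of mu, taken from the slide.
SIGMA = {
    ("High Performance", "45nm"): 0.046,
    ("High Performance", "11nm"): 0.051,
    ("Low Power", "45nm"): 0.029,
    ("Low Power", "11nm"): 0.042,
    ("Ultra-low Power", "45nm"): 0.196,
}

def miss_probability(sigma_frac, guard_band):
    """P(delay > (1 + guard_band) * mu) when delay ~ N(mu, sigma_frac * mu)."""
    z = guard_band / sigma_frac
    return 0.5 * math.erfc(z / math.sqrt(2.0))

GUARD_BAND = 0.10  # hypothetical 10% timing margin

for (style, node), sigma in SIGMA.items():
    p = miss_probability(sigma, GUARD_BAND)
    print(f"{style:16s} {node}: P(miss) = {p:.2e}")
```

A larger σ relative to μ means the same guard band covers fewer standard deviations, so the miss probability grows quickly.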

16 Hardware Efficiency
Results for High Performance CMOS
Energy = Power × Time
EDP = Power × Time²
[Plots: energy vs. error rate; normalized EDP vs. error rate]

17 Recovery Model Inputs
1. The time between recovery checkpoints & 2. The time to restore a checkpoint
   - Razor
     - Latch-level detection + pipeline rollback
     - 1 cycle checkpoint size & 5 cycle recovery cost
   - Reunion
     - DMR detection + checkpoint
     - 100 cycle checkpoint size & 100 cycle recovery cost
   - Paceline
     - DMR detection + checkpoint + flush
     - 100 cycle checkpoint size & 1000 cycle recovery cost
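
The following sketch shows how these two inputs might translate into a slowdown for each recovery system. It is one possible reading of the recovery model, not the authors' code. It assumes independent per-cycle errors, detection in the cycle where the error occurs, and re-execution from the last checkpoint until the interval succeeds. The function names and the sampled error rates are hypothetical.

```python
def recovery_overhead(p, cycles, restore):
    """Expected extra cycles per checkpoint interval at per-cycle error rate p.

    Assumes independent errors, detection at the failing cycle, and
    re-execution from the last checkpoint until the interval succeeds.
    """
    success = (1.0 - p) ** cycles
    if success >= 1.0:
        return 0.0  # no errors, so no recovery overhead
    failures = (1.0 - success) / success  # expected failed attempts
    waste = sum(k * p * (1.0 - p) ** (k - 1) for k in range(1, cycles + 1))
    waste /= (1.0 - success)  # expected cycles lost per failed attempt
    return failures * (waste + restore)

def normalized_time(p, cycles, restore):
    """Execution time relative to error-free execution of the same interval."""
    return (cycles + recovery_overhead(p, cycles, restore)) / cycles

# (checkpoint size, recovery cost) in cycles, as listed on the slide.
SYSTEMS = {"Razor": (1, 5), "Reunion": (100, 100), "Paceline": (100, 1000)}

for rate in (1e-6, 1e-4, 1e-2):
    row = ", ".join(
        f"{name}: {normalized_time(rate, c, r):.4f}"
        for name, (c, r) in SYSTEMS.items()
    )
    print(f"error rate {rate:.0e} -> {row}")
```

Under these assumptions, the system with the smallest checkpoint interval and restore cost tolerates the highest error rates before its slowdown becomes significant.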

18 System Recovery
[Plots: time vs. error rate; normalized time vs. error rate]

19 Overall Efficiency (EDP vs. error rate)
1. High Performance CMOS
2. Low Power CMOS
3. Ultra-low Power CMOS

20 Overall Efficiency: High Performance CMOS
23% PEAK, 8-15% TYPICAL
[Plot: normalized EDP vs. error rate]

21 Overall Efficiency: Low Power CMOS
18% PEAK, 5-10% TYPICAL
[Plot: normalized EDP vs. error rate]

22 Overall Efficiency: Ultra-low Power CMOS
47% PEAK, 20-30% TYPICAL
[Plot: normalized EDP vs. error rate]

23 Outline
- Timing Speculation
- Model Overview
- Hardware Efficiency Model
- System Recovery Model
- Results
- Conclusion

24 Conclusions
- A High-level Model
- Results
  - Efficiency gains improve only minimally with scaling
  - Ultra-low power (sub-threshold) CMOS benefits most
  - Fine-grained recovery is key
- Future Work
  - Incorporate more sources of variation
  - A tool for processor designers?
    - Under development at http://www.cs.wisc.edu/vertical

25 Questions?
Performance Toolbox: timing speculation, multi-core, coherence & consistency, on-chip network, out-of-order, branch prediction


27 Timing Speculation
[Figure: Source of Timing Variation (Manufacturing Process, Runtime, Application) with the techniques Speed Binning, Online Timing Analysis, and Timing Speculation]
Figure adapted from Greskamp et al., Paceline: [...]. In PACT ’07.

28 System Recovery Model
System Recovery Model Inputs
1. The time between recovery checkpoints (cycles)
2. The time to restore a checkpoint (restore)
Equation annotations:
- failures: expected # failures before success
- waste: expected # cycles executed upon failure

29 Overall Inputs
1. Path delay distribution
   - Application: H.264 decoding
   - Hardware: OpenRISC processor
2. Effect of process variations on path delay as N(μ,σ) using ITRS data
   - High Performance CMOS @ 45nm: σ = 0.046μ
   - Low Power CMOS @ 45nm: σ = 0.029μ
   - Ultra-low Power CMOS @ 45nm: σ = 0.196μ
3. The time between recovery checkpoints & 4. The time to restore a checkpoint
   - Razor: latch-level detection + pipeline rollback (1 & 5 cycles)
   - Reunion: DMR detection + checkpoint (100 & 100 cycles)
   - Paceline: DMR detection + checkpoint + flush (100 & 1000 cycles)

