Process Variation in Near-threshold Wide SIMD Architectures


1 Process Variation in Near-threshold Wide SIMD Architectures
Sangwon Seo¹, Ronald G. Dreslinski¹, Mark Woh¹, Yongjun Park¹, Chaitali Chakrabarti², Scott Mahlke¹, David Blaauw¹, Trevor Mudge¹
¹University of Michigan, ²Arizona State University

2 Near Threshold Computing
- Super Threshold
  - high performance
  - high energy consumption
- Near Threshold
  - 10x energy reduction
  - 10x performance degradation
- Sub Threshold
  - exponentially decreasing performance
  - increasing leakage becomes dominant

3 Near-threshold Computing
- Advantage: high energy efficiency
- Disadvantage
  - Low performance throughput
    - Compensated with very wide SIMD architecture
  - Sensitive to variations in threshold voltage
    - More critical issues in wide SIMD architectures
      - Increased probability of timing errors
      - Expensive error recovery mechanisms

4 Near-threshold Computing (continued)
- How bad is the delay variation in wide SIMD architectures running at near-threshold voltages?
- How can the variation-induced timing errors be mitigated?

5 Delay Variations in 90nm
- Uncorrelated variations are averaged out over the chain.
[Figure: delay variation plots, annotated ~2.3x and ~1.6x]

6 Delay Variations – f(Vdd=0.55V, N)
- A long chain helps, but the effect diminishes as N increases.
- Variations are exacerbated with technology scaling.

7 Delay Variations – f(Vdd, N=50)
- Line edge roughness (LER) causes high variations in advanced technology nodes.
- Figure callouts: strict design rules; metal gates with high-k material or SOI; advanced lithography.

8 Delay Distribution – 90nm GP
- 1 critical path delay = delay of a chain of 50 FO4 inverters.
- 1-wide system delay = max(delays of 100 critical paths)
- 128-wide system delay = max(delays of 128 1-wide systems)
[Figure: delay distributions, annotated "Performance Drop"]
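The three definitions on this slide nest directly, so they can be sketched as a small Monte Carlo model. This is only an illustration: the per-inverter delay distribution (independent Gaussian, mean 1.0, sigma 0.1 in arbitrary units) and all function names are assumptions, not values or code from the presentation.

```python
import random
import statistics

def critical_path_delay(rng, stages=50, mean=1.0, sigma=0.1):
    """One critical path: sum of `stages` independent inverter delays.
    The Gaussian parameters are hypothetical, in arbitrary units."""
    return sum(max(0.0, rng.gauss(mean, sigma)) for _ in range(stages))

def lane_delay(rng, paths=100):
    """1-wide system delay: the slowest of `paths` critical paths."""
    return max(critical_path_delay(rng) for _ in range(paths))

def simd_delay(rng, width=128):
    """width-wide system delay: the slowest of `width` 1-wide systems."""
    return max(lane_delay(rng) for _ in range(width))

def mean_delay(sample, trials, seed=0):
    """Average a delay sampler over `trials` independent draws."""
    rng = random.Random(seed)
    return statistics.fmean(sample(rng) for _ in range(trials))
```

Calling `mean_delay(critical_path_delay, 200)`, `mean_delay(lane_delay, 20)` and `mean_delay(simd_delay, 3)` shows the expected delay growing at each level of the max, which is the performance drop the slide refers to. The size of the drop depends entirely on the assumed distribution.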

9 Variation Effects on 128-wide SIMD Architecture
- Structural duplication
- Voltage margining
- Frequency margining

10 Near-threshold Wide SIMD Architecture: Diet SODA
[Seo et al., ISLPED 2010]

11 Structural Duplication
- Increase the number of processing resources.
[Figure: 8-wide + 2-spare system. SIMD function units #0 to #9 connect through a crossbar to datapaths #0 to #7.]

12 Structural Duplication
- Use the spares if required.
[Figure: 8-wide + 2-spare system with SIMD function units #0 to #9, crossbar, and datapaths.]
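A minimal sketch of how spares might be switched in, assuming the crossbar can connect any datapath to any function unit and that each unit has been classified as usable or not at the target clock. The function name and the "first working units" policy are hypothetical and are not taken from the presentation.

```python
def assign_units(num_datapaths, unit_ok):
    """Map each datapath to a usable function unit, skipping unusable ones.

    unit_ok[i] is True if function unit i meets timing (an assumed input).
    Returns a list whose entry d is the unit index serving datapath d,
    or None if there are fewer usable units than datapaths.
    """
    working = [unit for unit, ok in enumerate(unit_ok) if ok]
    if len(working) < num_datapaths:
        return None
    return working[:num_datapaths]

# 8-wide + 2-spare system in which units #3 and #6 are too slow:
units = [True] * 10
units[3] = units[6] = False
mapping = assign_units(8, units)  # spares #8 and #9 take over
```

With two slow units the two spares are enough; a third slow unit would leave only seven usable units and the function returns `None`.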

13 Structural Duplication – 90nm GP
- 6 spares are required to match the chip delay of the baseline architecture.
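One way to see why spares reduce chip delay: if only the fastest lanes are used, the chip delay is set by the slowest lane that is kept rather than by the slowest lane fabricated. The sketch below assumes per-lane delays are known and that any lane can be swapped out; the function name and the sample numbers are illustrative, not data from the presentation.

```python
def chip_delay(lane_delays, width):
    """Delay of a `width`-lane system built from the fastest `width` lanes.

    lane_delays holds one delay per fabricated lane (width + spares).
    """
    if len(lane_delays) < width:
        raise ValueError("not enough lanes for the requested width")
    return sorted(lane_delays)[width - 1]

# 4-wide system, arbitrary delay units:
no_spares = chip_delay([5, 1, 9, 3], 4)          # slowest lane sets the delay
two_spares = chip_delay([5, 1, 9, 3, 7, 2], 4)   # the two slowest lanes are dropped
```

Adding spares can never increase this delay, and the number of spares needed to reach a target delay depends on how heavy the slow tail of the lane-delay distribution is.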

14 Voltage Margining
- Increase the supply voltage.
[Figure: delay distributions; the 45nm PTM model is used.]

15 Frequency Margining
- Increase the clock period
  - Applicable for applications with relaxed time constraints
  - For advanced technology nodes, this is impractical
- Caveat
  - Consider its impact on the system
    - SIMD subsystem clock period (Tclk@NTV)
    - Memory subsystem clock period (Tclk@FV)

16 Structural Duplication vs. Voltage Margining

17 Combination of two schemes – 45nm GP
128-wide system @ 0.6V; design points shown:
- 26 spares
- 17mV boost
- 5mV + 8 spares
- 10mV + 2 spares

18 Variation-Aware Diet SODA

19 Conclusions
- Near-threshold operation of a wide SIMD system can have timing problems due to process variations.
- Variation effects on a 128-wide SIMD architecture are marginal for the 90nm technology node, but could be non-negligible for current and future technology nodes.
- A combination of structural duplication and voltage margining provides a minimal-power-overhead solution to mitigate variation-induced timing problems in wide SIMD architectures.

20 Questions?
Thank you!

21 Backup Slides

22 Local Spares vs. Global Spares
- Local sparing, 1 out of 4 (2 spares): + small overhead; - cannot cover burst errors
- Global sparing (2 spares): + covers burst errors; - large overhead
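The burst-error trade-off can be made concrete with a small repairability check. This sketch assumes that a local spare can replace only a lane in its own group of four, that a global spare can replace any lane, and that the spares themselves work; the function names are hypothetical.

```python
def repairable_global(faulty_lanes, spares):
    """Global sparing: any spare can replace any faulty lane."""
    return len(set(faulty_lanes)) <= spares

def repairable_local(faulty_lanes, group_size=4, spares_per_group=1):
    """Local sparing: each group of lanes can use only its own spares."""
    per_group = {}
    for lane in set(faulty_lanes):
        group = lane // group_size
        per_group[group] = per_group.get(group, 0) + 1
    return all(count <= spares_per_group for count in per_group.values())

# 8-wide system with 2 spares in both schemes
burst = [4, 5]    # two adjacent faulty lanes in the same group
spread = [1, 6]   # one faulty lane in each group
```

For the `burst` pattern the global scheme can repair the system while the local scheme cannot, because both faults land in one group; for the `spread` pattern both schemes succeed.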

23 Local Spares vs. Global Spares
- Global sparing is better than local sparing.
- The XRAM crossbar supports global sparing.
[Figure: 128 + 8 global spares compared with 128 + 32 local spares (1 out of 4)]

24 Variation-Aware Diet SODA
- Delay variations can be addressed with little area and power overhead.

