Unified Adaptivity Optimization of Clock and Logic Signals Shiyan Hu and Jiang Hu Dept of Electrical and Computer Engineering Texas A&M University.


1 Unified Adaptivity Optimization of Clock and Logic Signals Shiyan Hu and Jiang Hu Dept of Electrical and Computer Engineering Texas A&M University

2 Outline of Post-Silicon Tuning
Introduction and Motivation
Problem Formulation
Algorithms
Experimental Results
Conclusion

3 Pre-Silicon Optimization
Pre-silicon (i.e., design-time) statistical optimization
–Determine the circuit parameters at design time
–Apply the resulting design to all dies
–Problems
  Hard to get an accurate statistical variation model
  Each die has its own specific parameter deviations, so the solution is not necessarily ideal for each die
  Large computation overhead
[Figure: deterministic design vs. statistical design, with a 50 ps annotation]

4 Post-Silicon Tuning
After fabrication, tune parameters such as Vdd and the body voltage of gates.
–Post-silicon tuning handles each die separately and compensates for the specific parameter deviations of each die.
At design time
–What are the tuning ranges of gates?
  Tunability/overhead tradeoff

5 Previous Works
Logic signal tuning: body voltage tuning
–Good tunability
–Large overhead: D/A converter and many control signals, applied to a circuit block
Clock signal tuning: tunable clock buffer
–Small tunability
–Small overhead: padding different loads to buffers

6 Logic Tuning and Clock Tuning
[Figure: flip-flops (FF) driven by a clock. Logic is tuned through the body voltage; clock buffers are tuned by padding different loads.]

7 Example For Unified Adaptivity Optimization
Target clock period = 10, yield target: 99%
It is a zero-skew design with the nominal delays shown
Each combinational path has 10% variation
[Figure: flip-flop (FF) circuit annotated with nominal path delays; extracted labels: 10, 99, 10]

8 Worst Delay Due To Variations
10% variations on each combinational logic
Target clock period: 10
[Figure: flip-flop (FF) circuit annotated with worst-case path delays; extracted labels: 11, 9.9, 11]

9 Logic Tuning
10% variations on each combinational logic
Target clock period: 10
Tuning body voltage of combinational logic blocks
[Figure: flip-flop (FF) circuit annotated with worst-case path delays; extracted labels: 11, 9.9, 11]

10 Clock Tuning
10% variations on each combinational logic
Target clock period: 10
Skewing cannot make both paths satisfy the timing constraint simultaneously:
–11 - skew at right buffer + skew at left buffer
–11 + skew at right buffer - skew at left buffer
[Figure: flip-flop (FF) circuit annotated with worst-case path delays; extracted labels: 11, 9.9, 11]

11 Unified Optimization - tuning logic and clock signal simultaneously
10% variations on each combinational logic
Target clock period: 10
[Figure: flip-flop (FF) circuit annotated with "Skew = 1" and "Logic tuning"; extracted delay labels: 9.9, 11]
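The argument of slides 8 to 11 can be checked with a few lines of arithmetic. The sketch below is only an illustration under assumptions that the transcript does not confirm: two flip-flops (left and right) connected by one path in each direction, both with worst-case delay 11, a clock period of 10, and a setup check of the form launch skew + delay <= period + capture skew. The function names and the amount of logic speed-up (2 units on one path) are hypothetical.

```python
def setup_ok(delay, skew_launch, skew_capture, period):
    # Long-path (setup) check: data launched at skew_launch must arrive
    # no later than the capturing clock edge at period + skew_capture.
    return skew_launch + delay <= period + skew_capture + 1e-9


def feasible(d_left_to_right, d_right_to_left, skew_left, skew_right, period=10.0):
    # Both directions between the two flip-flops must meet timing.
    return (setup_ok(d_left_to_right, skew_left, skew_right, period)
            and setup_ok(d_right_to_left, skew_right, skew_left, period))


# Clock tuning only: sweep the right-buffer skew from -3.0 to +3.0.
# One path needs skew >= 1 and the other needs skew <= -1, so nothing works.
candidate_skews = [k / 10 for k in range(-30, 31)]
skew_only = any(feasible(11.0, 11.0, 0.0, s) for s in candidate_skews)

# Logic tuning only: both blocks are sped up from 11 back to 10.
logic_only = feasible(10.0, 10.0, 0.0, 0.0)

# Unified (assumed split): skew = 1 repairs one path, and only the other
# block is tuned (assumed speed-up of 2, from 11 down to 9).
unified = feasible(11.0, 9.0, 0.0, 1.0)

print(skew_only, logic_only, unified)
```

Under these assumptions, skew alone is infeasible because the two setup constraints sum to 22 <= 20, while the unified setting meets timing with only one tuned logic block.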

12 Observation
Logic tuning only
–wastes area
Clock tuning only
–may not satisfy the yield target
A unified approach can satisfy the yield target with small overhead

13 Limitations of Previous Work
Mostly restricted to continuous adaptivity optimization, even when they only perform logic or clock signal tuning
–In practice, options are often discrete
Assumption on variation distribution
–Limited to Gaussian distribution, which is not always true in reality
–Without such an assumption, they depend on computationally expensive Monte Carlo simulation
We seek to overcome the above limitations

14 Problem
Given a sequential circuit, perform optimizations such that
–the yield target can be achieved by post-silicon tuning on logic and clock signals
–the overhead is minimized

15 Continuous Problem
[Figure: flip-flops (FF) with continuous body voltage for logic and continuous loads for clock buffers]

16 Continuous Problem Formulation
Minimize Overhead
Subject to:
–Long path constraint
–Short path constraint
–Tuning bound at each tunable element
[Figure: flip-flops (FF) with labels T, S1, S2 and elements 1, 2, 3]
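The equations on this slide did not survive extraction. As a rough guide only, a formulation with the three listed constraint types could look like the sketch below. Every symbol is an assumption introduced here and need not match the authors' notation: x denotes tuning ranges (the design-time variables) with unit costs c, s_i is the clock skew applied at flip-flop i, delta_ij is the delay reduction obtained by logic tuning on the path from i to j, D_ij and d_ij are the longest and shortest path delays, T_clk is the clock period, and H is the hold time.

```latex
\begin{aligned}
\min_{x,\,s,\,\delta}\quad
  & \sum_{i} c^{\mathrm{clk}}_{i}\, x^{\mathrm{clk}}_{i}
    + \sum_{(i,j)} c^{\mathrm{logic}}_{ij}\, x^{\mathrm{logic}}_{ij} \\
\text{s.t.}\quad
  & s_i + D_{ij} - \delta_{ij} \le s_j + T_{\mathrm{clk}}
    && \text{(long path, } i \to j\text{)} \\
  & s_i + d_{ij} - \delta_{ij} \ge s_j + H
    && \text{(short path, } i \to j\text{)} \\
  & |s_i| \le x^{\mathrm{clk}}_{i}, \qquad
    0 \le \delta_{ij} \le x^{\mathrm{logic}}_{ij}
    && \text{(tuning bounds)}
\end{aligned}
```

In this reading the path delays are the uncertain coefficients, which is what makes the problem a linear program with random variables.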

17 Robust Linear Programming
Linear programming with random variables
Worst-case solution
–All S and T can simultaneously take their worst-case values.
Robust solution
–Specify p ≤ total number of random variables
–In the solution, at most p random variables can simultaneously take their worst-case values
–Variations of the other random variables rely on p.
–Degree of conservatism is controlled by a single parameter.
Constraint violation probability (related to yield) decreases exponentially as p increases.
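The role of p can be illustrated with a small sketch. It assumes the common interval model in which each uncertain coefficient lies within nominal ± deviation, and it evaluates the left-hand side of one constraint when at most p coefficients move to their worst case. The function name and the numbers are hypothetical and are not taken from the slides.

```python
def protected_lhs(nominal, deviation, x, p):
    """Worst-case value of sum_j a_j * x_j when at most p coefficients
    deviate fully from their nominal values (p is an integer budget)."""
    base = sum(a * xj for a, xj in zip(nominal, x))
    # Impact of pushing each coefficient to its worst case, largest first.
    impacts = sorted((d * abs(xj) for d, xj in zip(deviation, x)), reverse=True)
    return base + sum(impacts[:p])


# Hypothetical constraint with three uncertain coefficients (10% deviation).
nominal = [10.0, 9.0, 10.0]
deviation = [1.0, 0.9, 1.0]
x = [1.0, 1.0, 1.0]

nominal_case = protected_lhs(nominal, deviation, x, 0)  # p = 0: no protection
one_worst = protected_lhs(nominal, deviation, x, 1)     # p = 1: one coefficient at worst case
worst_case = protected_lhs(nominal, deviation, x, 3)    # p = all: full worst case
```

With p = 0 the constraint is evaluated at nominal values, and with p equal to the number of random variables it coincides with the full worst case. Intermediate values trade conservatism against the probability of violation.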

18 Linear Programming With Uncertainty
Some coefficients are random variables
Assume that we have j random variables

19 Soyster’s Worst Case Solution (I)
a_11 is a random variable
Deterministic constraint
Guarantees the worst-case values

20 Soyster’s Worst Case Solution

21 Robust Solution (I)

22 Robust Solution (II)
Additional variables.
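The equations of the robust-solution slides were lost in extraction. The description (a budget p, additional variables, and the names q_ij and Z_i that appear on the next slide) resembles the standard budgeted-uncertainty robust counterpart of Bertsimas and Sim, so that form is sketched here as an assumption rather than as the slide's exact content. Each uncertain coefficient is assumed to lie in the interval from \bar a_ij - \hat a_ij to \bar a_ij + \hat a_ij, and y_j is an auxiliary variable bounding |x_j|.

```latex
\begin{aligned}
\max_{x,\,y,\,q,\,Z}\quad & c^{\top} x \\
\text{s.t.}\quad
  & \sum_{j} \bar a_{ij}\, x_j + p\, Z_i + \sum_{j} q_{ij} \le b_i
    && \forall i \\
  & Z_i + q_{ij} \ge \hat a_{ij}\, y_j
    && \forall i, j \\
  & -y_j \le x_j \le y_j
    && \forall j \\
  & Z_i \ge 0, \quad q_{ij} \ge 0, \quad y_j \ge 0
\end{aligned}
```

In this form, p = 0 removes Z_i from the first constraint, so Z_i can be chosen large enough to allow q_ij = 0 and the nominal problem is recovered. When p equals the number of random variables, Z_i = 0 and q_ij = \hat a_ij y_j is optimal, which gives Soyster's worst-case constraint.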

23 Nominal-Case Design (P=0)
q_ij = 0
Free to set Z_i

24 Worst-Case Design (P=j)

25 Worst-Case Design (P=j)

26 Worst-Case Design (P=j)

27 Discretization
In reality, tuning is allowed only at a limited number of discrete steps.
Rounding from continuous solution
–Rounding up continuous solution
  Increases tuning range → more overhead
–Rounding down continuous solution
  Reduces tuning range → may not satisfy the yield target
–Nearest rounding
  may not satisfy the yield target
  may waste area
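The three rounding options can be made concrete with a short sketch. The step size of 0.25 and the continuous value of 0.6 are hypothetical and were chosen only for illustration.

```python
import math


def round_to_step(value, step, mode):
    """Round a continuous tuning range to a multiple of `step`."""
    ratio = value / step
    if mode == "up":
        k = math.ceil(ratio - 1e-9)    # never shrinks the range
    elif mode == "down":
        k = math.floor(ratio + 1e-9)   # never grows the range
    elif mode == "nearest":
        k = math.floor(ratio + 0.5)    # may shrink or grow the range
    else:
        raise ValueError(f"unknown mode: {mode}")
    return k * step


continuous_range = 0.6   # hypothetical continuous solution for one element
step = 0.25              # hypothetical discrete tuning step

rounded = {mode: round_to_step(continuous_range, step, mode)
           for mode in ("up", "down", "nearest")}
```

Here rounding up gives 0.75 (extra overhead), while rounding down and nearest rounding both give 0.5, which is less than the range the continuous solution relied on.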

28 Our Approach
Continuous solution → Clock rounding → Logic rounding
Clock rounding: rounding by dynamic programming w/ fast pruning, giving a set of solutions w/ discrete clock buffers
Logic rounding: for each solution, discretize body voltage for logic gates

29 Clock Rounding
[Figure: each clock buffer can be rounded to a larger tuning range or a smaller tuning range]

30 Solution Characterization and Solution Update
Each candidate solution is associated with
–C: cumulative area overhead
–Y: yield estimation
When tunable clock buffer b is being processed,
–C is updated by the overhead of b
–Y is computed by fast yield estimation

31 Fast Pruning
For rounding up, no need to estimate the yield.
For rounding down, sort solutions by C and perform yield estimation in a binary search fashion.
When the solution set size reaches a threshold, pick the top few solutions with smallest C.
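The clock rounding steps on slides 28 to 31 can be sketched as a small dynamic program. This is an illustration under several assumptions, not the authors' implementation: the overhead C is taken to be the sum of the chosen ranges, unprocessed buffers keep their continuous values during yield estimation, the yield is assumed to be monotone in C among the rounded-down candidates (which is what makes binary search applicable), and `estimate_yield` is a caller-supplied function. For brevity the sketch tracks C only and does not store Y.

```python
import math


def round_clock_buffers(continuous, step, estimate_yield, yield_target,
                        max_solutions=8, keep=4):
    """Return candidate solutions as (C, discrete_ranges), sorted by C."""
    solutions = [(0.0, ())]
    for i, r in enumerate(continuous):
        lo = math.floor(r / step + 1e-9) * step
        hi = math.ceil(r / step - 1e-9) * step
        rest = tuple(continuous[i + 1:])  # unprocessed buffers stay continuous

        # Rounding up never shrinks a tuning range, so no yield check is run.
        ups = [(c + hi, ranges + (hi,)) for c, ranges in solutions]

        downs = []
        if lo < hi:
            # Rounding down: sort by C, then binary-search the cheapest
            # candidate that still meets the yield target (assumes the
            # yield is monotone in C, which is a heuristic).
            downs = sorted((c + lo, ranges + (lo,)) for c, ranges in solutions)
            left, right = 0, len(downs)
            while left < right:
                mid = (left + right) // 2
                if estimate_yield(downs[mid][1] + rest) >= yield_target:
                    right = mid
                else:
                    left = mid + 1
            downs = downs[left:]

        solutions = sorted(set(ups + downs))
        if len(solutions) > max_solutions:
            solutions = solutions[:keep]  # keep the few with smallest C
    return solutions


def toy_yield(ranges):
    # Hypothetical stand-in for fast yield estimation.
    return 0.99 if sum(ranges) >= 1.5 else 0.90


result = round_clock_buffers([0.6, 0.4, 0.5], 0.25, toy_yield, 0.99)
best_cost, best_ranges = result[0]
```

In a real flow `toy_yield` would be replaced by the fast yield estimation, and each returned solution would then be passed on to logic rounding.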

32 Logic Rounding
Reducibility based discretization
–Body voltage tuning range of a block is rounded up if
  the block is timing critical
  few gates are tunable
Reducibility cost: total slack × number of gates

33 Batch Optimization
Yield estimation is expensive
Start from a small reducibility threshold
Round up blocks with reducibility cost < threshold and round down others
If yield is not satisfied, increase the threshold
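The batch procedure can be sketched as a loop over increasing thresholds with one yield estimation per batch. The block data, the threshold schedule, the step size and the `estimate_yield` callback below are all hypothetical; only the reducibility cost (total slack × number of gates) and the round-up/round-down rule come from the slides.

```python
import math


def batch_round_logic(blocks, step, estimate_yield, yield_target, thresholds):
    """blocks: list of dicts with keys 'range', 'slack' and 'gates'.
    Returns (threshold, rounded_ranges), or None if no threshold succeeds."""
    for threshold in sorted(thresholds):
        rounded = []
        for block in blocks:
            cost = block["slack"] * block["gates"]  # reducibility cost
            k = block["range"] / step
            if cost < threshold:
                rounded.append(math.ceil(k - 1e-9) * step)   # round up
            else:
                rounded.append(math.floor(k + 1e-9) * step)  # round down
        # A single (expensive) yield estimation per batch.
        if estimate_yield(rounded) >= yield_target:
            return threshold, rounded
    return None


def toy_yield(ranges):
    # Hypothetical stand-in for Monte Carlo yield estimation.
    return 0.99 if sum(ranges) >= 1.5 else 0.90


blocks = [
    {"range": 0.6, "slack": 0.1, "gates": 5},   # cost 0.5: timing critical, few gates
    {"range": 0.3, "slack": 2.0, "gates": 20},  # cost 40
    {"range": 0.4, "slack": 0.5, "gates": 10},  # cost 5
]
outcome = batch_round_logic(blocks, 0.25, toy_yield, 0.99, [1, 10, 100])
```

Starting from a small threshold rounds up only the least reducible blocks first, so the area overhead grows only as far as the yield target requires.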

34 Monte Carlo Simulation (Yield Estimation)

35 Latin Hypercube Sampling Based Monte Carlo Simulation
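The content of these two slides was not preserved. As a generic illustration of the technique they name, the sketch below estimates timing yield with Latin Hypercube samples. The uniform ±variation delay model, the pass criterion (largest path delay within the clock period) and all function names are assumptions made here, not the authors' setup.

```python
import random


def latin_hypercube(n_samples, n_dims, rng):
    """n_samples points in [0, 1)^n_dims with exactly one point per
    equal-width stratum in every dimension."""
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each stratum, then shuffle the strata.
        column = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(column)
        columns.append(column)
    return [tuple(column[i] for column in columns) for i in range(n_samples)]


def estimate_yield(nominal_delays, variation, period, n_samples=200, seed=0):
    """Fraction of sampled dies whose largest path delay meets the period."""
    rng = random.Random(seed)
    passed = 0
    for point in latin_hypercube(n_samples, len(nominal_delays), rng):
        # Assumed model: each delay is uniform in nominal * (1 ± variation).
        delays = [d * (1 + variation * (2 * u - 1))
                  for d, u in zip(nominal_delays, point)]
        if max(delays) <= period:
            passed += 1
    return passed / n_samples


samples = latin_hypercube(10, 3, random.Random(1))
half_yield = estimate_yield([10.0], 0.1, 10.0)
```

Because every stratum is sampled exactly once per dimension, the estimate typically has lower variance than plain random sampling with the same number of samples, which is why it suits repeated yield estimation.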

36 Experimental Setup
ISCAS’89 benchmark circuits
Pentium IV machine with 3.0 GHz CPU and 2 GB memory
130nm technology
Timing yield target 99%
For continuous solution, compare to Logic optimization only and Clock optimization only
For discretization, compare to simple batch and nearest rounding approach

37 Continuous Solution (Area)
In many cases, optimizing the clock signal alone cannot find feasible solutions satisfying the yield constraint

38 Continuous Solution (Yield)

39 Continuous Solution (CPU in seconds)

40 Observations in Continuous Solution
Unified optimization often saves >20% area over Logic optimization while having larger yield
Clock optimization only cannot satisfy yield target for many circuits
The algorithms run fast

41 Discretization (Area)

42 Discretization (Yield)

43 Discretization (CPU in seconds)

44 Observations in Discrete Solutions
Nearest rounding cannot satisfy yield target (could be <90%).
Simple batch is slow, and its solution quality is poor because it is not guided by the continuous solution.
Our algorithm runs faster than Simple batch and saves >30% area.

45 Conclusion
Unified adaptivity optimization of logic and clock signals shows an advantage in cost-effectiveness
Provide both continuous and discrete solutions
Use robust linear programming, which does not depend on the variation distribution
Computation acceleration techniques, e.g., accelerated dynamic programming, batch-based optimization, Latin Hypercube sampling based fast simulation, are used
Our algorithm can be used for optimizing logic or clock signal separately while still having the above advantages

