
Ph.D.’09: Statistical Modeling and Optimization for VLSI Circuits
Student: Lerong Cheng (lerong@ee.ucla.edu)
Advisor: Lei He
Co-advisor: Puneet Gupta
EDA Lab (http://eda.ee.ucla.edu), Electrical Engineering Department, UCLA

Introduction

As CMOS technology scales, process variation has become a potential show-stopper if not appropriately handled. These variations introduce significant uncertainty in both circuit performance and leakage power. Statistical modeling, analysis, and optimization for VLSI circuits have thus become frontier research topics in recent years for combating such variation effects.

As processes advance to nanometer technologies and low-energy embedded applications are explored for FPGAs, power consumption becomes a crucial design constraint for FPGAs. It is well known that architecture and device settings have a great impact on FPGA power and performance. However, how to perform statistical optimization considering both device and architecture has not been solved by previous work. In addition, some reliability issues, such as device aging and soft error rate (SER), may affect the performance of FPGAs. Such impact was not considered in previous work either.

Besides FPGAs, statistical modeling and analysis for ASICs are also hot research topics. There are many works on statistical timing and power modeling and analysis. However, how to efficiently perform statistical static timing analysis (SSTA) for non-linear delay models with non-Gaussian variation sources is still a hard problem. Moreover, most statistical analyses assume independent variation sources and apply principal component analysis (PCA) or independent component analysis (ICA) to decompose dependent variation sources. However, some variation sources are non-linearly dependent, such as Leff and Vth. In this case, a linear operation (such as PCA or ICA) cannot completely remove the dependence. How to handle non-linearly dependent variation sources is another unsolved problem.

Spatial correlation is another concern in statistical analysis. Many recent works try to model spatial correlation as a function of distance. However, some recent work observes that spatial correlation mainly comes from deterministic across-wafer variation and that purely random spatial variation is not significant. Modeling across-wafer variation is also a challenging problem.

Analysis of Non-Linear Dependence

• Linear operations are used to decompose dependent variation sources
• Not accurate in the presence of non-linear dependence
• Need to estimate the error introduced by ignoring non-linear dependence
• Define a high order correlation coefficient
• Target function driven component analysis (FCA)
  – Given a target function f(X1, X2)
  – Find the linear decomposition matrix W that minimizes the error of the mean, variance, and skewness of f(·) when ignoring high order dependence
  – FCA has the same complexity as PCA and ICA but is more accurate

[Figure: flow diagram. Labels: target function f(X1, X2); samples of X1, X2; joint moments of X1, X2; error of moments of f as function of transfer matrix; nonlinear programming; minimizing error of moments of f; transfer matrix W]
[Figure: flow diagram. Labels: target function f(X1, X2); samples of X1, X2; transfer matrix W; joint moments of X1, X2; moments of P1, P2; ρij of P1, P2; function of P1, P2 g(P1, P2); result with correct dependence; result assuming ρij = 0; error]
[Figure: Circuit delay comparison]

Statistical Modeling and Optimization for FPGAs

• Device and architecture co-optimization
  – Large search space
  – Need a fast yet accurate power and delay estimator for FPGAs
• Trace-based power and delay estimator (Ptrace)
• Optimization result
  – Reduce energy delay product by 18.4% and area by 23%
  – LUT size 5 provides the maximum power and delay combined yield
• FPGA chipwise placement for timing optimization
  – Programmability of FPGAs offers a unique opportunity to leverage process variation and improve circuit performance
  – Perform placement according to the chipwise variation maps
  – Improve performance by up to 12.1%
• Concurrent design of process and FPGA architecture
  – Develop process and architecture concurrently in order to shorten the time to market
  – Need to estimate FPGA power and delay from process parameters
• Ptrace2
  – Based on the ITRS Mastar4 transistor model
• Analysis result
  – Device aging leads to 8.5% delay degradation after 10 years
  – Neither device aging nor process variation has an impact on SER

[Figure: PTRACE. Labels: switching activity; ratio of short circuit power; critical path structure; circuit element statistics; area; chip level area, delay and power; circuit level delay and power; VPR; Psim; trace collection; device dependent; device independent]
[Figure: Performance improvement under different utilization rates]
[Figure: Performance improvement histogram]

References & Collaborators

L. Cheng, P. Wong, F. Li, Y. Lin, and L. He, “Device and Architecture Co-Optimization for FPGA Power Reduction,” DAC, 2005.
P. Wong, L. Cheng, Y. Lin, and L. He, “FPGA Device and Architecture Evaluation Considering Process Variation,” ICCAD, 2005.
L. Cheng, J. Xiong, and L. He, “FPGA Performance Optimization via Chipwise Placement Considering Process Variations,” FPL, 2006.
L. Cheng, J. Xiong, and L. He, “Non-Linear Statistical Static Timing Analysis for Non-Gaussian Variation Sources,” DAC, 2007.
L. Cheng, J. Xiong, and L. He, “Non-Gaussian Statistical Timing Analysis Using Second-Order Polynomial Fitting,” ASPDAC, 2008.
L. Cheng, Y. Lin, L. He, and Y. Cao, “Trace-Based Framework for Concurrent Development of Process and FPGA Architecture Considering Process Variation and Reliability,” ISFPGA, 2008.
L. Cheng, P. Gupta, and L. He, “Accounting for Non-linear Dependence Using Function Driven Component Analysis,” ASPDAC, 2009.

Collaborators: Dr. Jinjun Xiong, Dr. Yan Lin, Dr. Fei Li, and Miss Phoebe Wong

Statistical Static Timing Analysis

• Block-based SSTA operations
  – Add (simple)
  – Max (hard)
    • Core operation of SSTA
• Delay model
  – Linear: efficient but not accurate
  – Quadratic: accurate but not efficient
  – Quadratic without crossing terms (semi-quadratic): efficient and somewhat accurate
• Max operation using Fourier series approximation
  – Approximate the PDF of variation sources by Fourier series
  – Apply moment matching to reconstruct the canonical form of the max operation
    • All operations are based on either closed-form formulae or lookup tables
  – Computational complexity O(nK^2)
    • Only works for linear and semi-quadratic delay models
    • Within 5% error compared to MC simulation
• Approximate max operation using second order polynomial
  – Works for all three delay models
  – More efficient and accurate than the Fourier series approach
    • 20X faster than the Fourier series approach
  – Computational complexity
    • O(n^3) for the quadratic delay model
    • O(n) for the others
  – Within 2% error compared to MC simulation

[Figure: Fourier series approximation of PDF]
[Figure: PDF comparison]
[Figure: Approximate max operation as second order polynomial]
[Figure: PDF comparison]

Modeling of Across-Wafer Variation

• Across-wafer variation can be approximated as a quadratic function
• After subtracting the across-wafer variation, purely random spatial correlation is not significant
• From the die point of view, the within-wafer variation is spatially correlated
  – This variation is not purely random
  – Cannot be modeled as random correlated variation
• New variation model
  – Exactly models the across-wafer variation
  – Only 4 random variables: Xc, Yc, mw, and r
  – More accurate and efficient than the spatial variation model

Location   Our model                        Spatial correlation model
           µ      σ      95%    T (s)       µ      σ      95%    T (s)
LL-C       +0.7   +1.1   +0.5   15.3        +2.4   +1.5   +5.2   154 (10.1X)
LR-C       +0.2   +1.1   -0.2   14.7        +0.0   +8.8   -1.4   155 (10.5X)
UL-C       -0.2   -0.6   +0.1   15.2        -0.2   +7.6   -0.1   153 (10.1X)
UR-C       +0.2   +0.6   +0.1   14.9        -0.7   +4.8   -1.3   152 (10.2X)

[Figure: Wafer frequency]
[Figure: Wafer leakage]
[Figure: Across-wafer variation appears spatially correlated from the die point of view]
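The across-wafer observations in this transcript (a roughly quadratic trend across the wafer, and little spatial correlation left once that trend is subtracted) can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration: the data are synthetic, the bowl coefficients and noise level are arbitrary, the square grid stands in for a wafer, and the helper names (`quadratic_design`, `fit_across_wafer`, `neighbor_correlation`) are hypothetical. It is not the poster's four-variable model (Xc, Yc, mw, r), whose exact form the transcript does not give.

```python
import numpy as np


def quadratic_design(x, y):
    """Design matrix of a full 2-D quadratic: 1, x, y, x^2, x*y, y^2."""
    return np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])


def fit_across_wafer(x, y, p):
    """Least-squares fit of a quadratic surface to measurements p at (x, y)."""
    coef, *_ = np.linalg.lstsq(quadratic_design(x, y), p, rcond=None)
    return coef


def neighbor_correlation(grid):
    """Correlation coefficient between horizontally adjacent grid points."""
    left = grid[:, :-1].ravel()
    right = grid[:, 1:].ravel()
    return np.corrcoef(left, right)[0, 1]


rng = np.random.default_rng(0)

# Synthetic "wafer": a square grid of measurement sites in normalized units.
n = 41
axis = np.linspace(-1.0, 1.0, n)
X, Y = np.meshgrid(axis, axis)

# Assumed deterministic trend: a bowl whose centre is offset from the origin,
# plus independent (spatially uncorrelated) random variation at each site.
bowl = 0.8 * ((X - 0.1) ** 2 + (Y + 0.2) ** 2)
raw = bowl + 0.05 * rng.standard_normal(X.shape)

# Fit the quadratic trend and subtract it from the raw measurements.
coef = fit_across_wafer(X.ravel(), Y.ravel(), raw.ravel())
trend = (quadratic_design(X.ravel(), Y.ravel()) @ coef).reshape(X.shape)
residual = raw - trend

corr_raw = neighbor_correlation(raw)
corr_residual = neighbor_correlation(residual)

print(f"neighbor correlation, raw data:       {corr_raw:.3f}")
print(f"neighbor correlation, after subtract: {corr_residual:.3f}")
```

In this toy setup the raw data look strongly spatially correlated only because neighboring sites share the same deterministic trend; once the fitted quadratic is removed, the neighbor correlation of the residual is close to zero. That mirrors the poster's argument for modeling the across-wafer component directly instead of treating it as random correlated variation.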

