Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu.

Slides:



Advertisements
Similar presentations
Porosity Aware Buffered Steiner Tree Construction C. Alpert G. Gandham S. Quay IBM Corp M. Hrkic Univ Illinois Chicago J. Hu Texas A&M Univ.
Advertisements

Gregory Shklover, Ben Emanuel Intel Corporation MATAM, Haifa 31015, Israel Simultaneous Clock and Data Gate Sizing Algorithm with Common Global Objective.
OCV-Aware Top-Level Clock Tree Optimization
4/22/ Clock Network Synthesis Prof. Shiyan Hu Office: EREC 731.
3D-STAF: Scalable Temperature and Leakage Aware Floorplanning for Three-Dimensional Integrated Circuits Pingqiang Zhou, Yuchun Ma, Zhouyuan Li, Robert.
Principal Component Analysis (PCA) for Clustering Gene Expression Data K. Y. Yeung and W. L. Ruzzo.
Variability-Driven Formulation for Simultaneous Gate Sizing and Post-Silicon Tunability Allocation Vishal Khandelwal and Ankur Srivastava Department of.
FPGA Latency Optimization Using System-level Transformations and DFG Restructuring Daniel Gomez-Prado, Maciej Ciesielski, and Russell Tessier Department.
Improved Algorithms for Link- Based Non-tree Clock Network for Skew Variability Reduction Anand Rajaram †‡ David Z. Pan † Jiang Hu * † Dept. of ECE, UT-Austin.
Minimal Skew Clock Embedding Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu.
The continuous scaling trends of smaller devices, higher operating frequencies, lower power supply voltages, and more functionalities for integrated circuits.
Circuit Retiming with Interconnect Delay CUHK CSE CAD Group Meeting One Evangeline Young Aug 19, 2003.
Dimension reduction : PCA and Clustering Slides by Agnieszka Juncker and Chris Workman.
Supply Voltage Degradation Aware Analytical Placement Andrew B. Kahng, Bao Liu and Qinke Wang UCSD CSE Department {abk, bliu,
An Efficient Chiplevel Time Slack Allocation Algorithm for Dual-Vdd FPGA Power Reduction Yan Lin 1, Yu Hu 1, Lei He 1 and Vijay Raghunathan 2 1 EE Department,
Off-chip Decoupling Capacitor Allocation for Chip Package Co-Design Hao Yu Berkeley Design Chunta Chu and Lei He EE Department.
Dimension reduction : PCA and Clustering Christopher Workman Center for Biological Sequence Analysis DTU.
© 2005 Altera Corporation © 2006 Altera Corporation Placement and Timing for FPGAs Considering Variations Yan Lin 1, Mike Hutton 2 and Lei He 1 1 EE Department,
SAMSON: A Generalized Second-order Arnoldi Method for Reducing Multiple Source Linear Network with Susceptance Yiyu Shi, Hao Yu and Lei He EE Department,
Fast Buffer Insertion Considering Process Variation Jinjun Xiong, Lei He EE Department University of California, Los Angeles Sponsors: NSF, UC MICRO, Actel,
A Global Minimum Clock Distribution Network Augmentation Algorithm for Guaranteed Clock Skew Yield A. B. Kahng, B. Liu, X. Xu, J. Hu* and G. Venkataraman*
Processing Rate Optimization by Sequential System Floorplanning Jia Wang 1, Ping-Chih Wu 2, and Hai Zhou 1 1 Electrical Engineering & Computer Science.
Changbo Long ECE Department, UW-Madison Lei He EDA Research Group EE Department, UCLA Distributed Sleep Transistor Network.
Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.
Efficient Decoupling Capacitance Budgeting Considering Operation and Process Variations Yiyu Shi*, Jinjun Xiong +, Chunchen Liu* and Lei He* *Electrical.
Trace-Based Framework for Concurrent Development of Process and FPGA Architecture Considering Process Variation and Reliability 1 Lerong Cheng, 1 Yan Lin,
CDCTree: Novel Obstacle-Avoiding Routing Tree Construction based on Current Driven Circuit Model Speaker: Lei He.
Temperature Aware Microprocessor Floorplanning Considering Application Dependent Power Load *Chunta Chu, Xinyi Zhang, Lei He, and Tom Tong Jing Electrical.
Worst-Case Timing Jitter and Amplitude Noise in Differential Signaling Wei Yao, Yiyu Shi, Lei He, Sudhakar Pamarti, and Yu Hu Electrical Engineering Dept.,
MATE: MPLS Adaptive Traffic Engineering Anwar Elwalid, et. al. IEEE INFOCOM 2001.
VLSI Physical Design Automation
Principal Component Analysis (PCA) for Clustering Gene Expression Data K. Y. Yeung and W. L. Ruzzo.
Xin-Wei Shih and Yao-Wen Chang.  Introduction  Problem formulation  Algorithms  Experimental results  Conclusions.
-1- UC San Diego / VLSI CAD Laboratory A Global-Local Optimization Framework for Simultaneous Multi-Mode Multi-Corner Clock Skew Variation Reduction Kwangsoo.
Pattern Selection based co-design of Floorplan and Power/Ground Network with Wiring Resource Optimization L. Li, Y. Ma, N. Xu, Y. Wang and X. Hong WuHan.
Lecture 12 Review and Sample Exam Questions Professor Lei He EE 201A, Spring 2004
PiCAP: A Parallel and Incremental Capacitance Extraction Considering Stochastic Process Variation Fang Gong 1, Hao Yu 2, and Lei He 1 1 Electrical Engineering.
An Efficient Clustering Algorithm For Low Power Clock Tree Synthesis Rupesh S. Shelar Enterprise Microprocessor Group Intel Corporation, Hillsboro, OR.
1 Wire Length Prediction-based Technology Mapping and Fanout Optimization Qinghua Liu Malgorzata Marek-Sadowska VLSI Design Automation Lab UC-Santa Barbara.
Statistical Sampling-Based Parametric Analysis of Power Grids Dr. Peng Li Presented by Xueqian Zhao EE5970 Seminar.
Scalable Symbolic Model Order Reduction Yiyu Shi*, Lei He* and C. J. Richard Shi + *Electrical Engineering Department, UCLA + Electrical Engineering Department,
1 ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time Jeng-Liang Tsai Tsung-Hao Chen Charlie Chung-Ping Chen (National.
A Stable Fixed-outline Floorplanning Method Song Chen and Takeshi Yoshimura Graduate School of IPS, Waseda University March, 2007.
Stochastic Current Prediction Enabled Frequency Actuator for Runtime Resonance Noise Reduction Yiyu Shi*, Jinjun Xiong +, Howard Chen + and Lei He* *Electrical.
Optimal Component Analysis Optimal Linear Representations of Images for Object Recognition X. Liu, A. Srivastava, and Kyle Gallivan, “Optimal linear representations.
Xuanxing Xiong and Jia Wang Electrical and Computer Engineering Illinois Institute of Technology Chicago, Illinois, United States November, 2011 Vectorless.
EE 201C Modeling of VLSI Circuits and Systems
1ISPD'03 Process Variation Aware Clock Tree Routing Bing Lu Cadence Jiang Hu Texas A&M Univ Gary Ellis IBM Corp Haihua Su IBM Corp.
An O(bn 2 ) Time Algorithm for Optimal Buffer Insertion with b Buffer Types Authors: Zhuo Li and Weiping Shi Presenter: Sunil Khatri Department of Electrical.
An O(nm) Time Algorithm for Optimal Buffer Insertion of m Sink Nets Zhuo Li and Weiping Shi {zhuoli, Texas A&M University College Station,
1 Hardware Reliability Margining for the Dark Silicon Era Liangzhen Lai and Puneet Gupta Department of Electrical Engineering University of California,
Proximity Optimization for Adaptive Circuit Design Ang Lu, Hao He, and Jiang Hu.
Unified Adaptivity Optimization of Clock and Logic Signals Shiyan Hu and Jiang Hu Dept of Electrical and Computer Engineering Texas A&M University.
Fault-Tolerant Resynthesis for Dual-Output LUTs Roy Lee 1, Yu Hu 1, Rupak Majumdar 2, Lei He 1 and Minming Li 3 1 Electrical Engineering Dept., UCLA 2.
Computing and Compressive Sensing in Wireless Sensor Networks
Chapter 5b Stochastic Circuit Optimization
Jinghong Liang,Tong Jing, Xianlong Hong Jinjun Xiong, Lei He
Performance Optimization Global Routing with RLC Crosstalk Constraints
Chapter 5b Stochastic Circuit Optimization
Performance and RLC Crosstalk Driven Global Routing
Yiyu Shi*, Wei Yao*, Jinjun Xiong+ and Lei He*
Simultaneous Power and Thermal Integrity Driven Via Stapling in 3D ICs
Guihai Yan, Yinhe Han, Xiaowei Li, and Hui Liu
EE 201C Modeling of VLSI Circuits and Systems
Post-Silicon Calibration for Large-Volume Products
Yiyu Shi*, Jinjun Xiong+, Chunchen Liu* and Lei He*
Reducing Clock Skew Variability via Cross Links
Yiyu Shi*, Jinjun Xiong+, Chunchen Liu* and Lei He*
Simultaneous Power and Thermal Integrity Driven Via Stapling in 3D ICs
Chapter 3b Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Prof. Lei He Electrical Engineering Department.
Presentation transcript:

Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu Hu Partially supported by SRC task 1116.

Introduction Both process and operation variations cause uncertainties and may lead to design failure or over-design. Process variations have been actively studied.  Statistical timing analysis  Stochastic optimization  Post-silicon configuration Stochastic optimization for operation variations below has been largely ignored  Fluctuation of crosstalk noise and P/G network noise due to different input vectors  Time-variant on-chip temperature map over different workloads This work is the first in-depth study on clock synthesis considering time-variant temperature variations

Limitation of Existing Work The existing work [Cho:ICCAD05] ignores the time-variant temperature variations and assumes a fixed temperature map Different work loads lead to different temperature maps (e.g., two SPEC2000 applications: Ammp and Gzip) Optimizing skew for one application hurts the skew for another application, this conflict is solved in this work

Outline Modeling and Problem Formulation Algorithms Experimental Results Conclusions

Stochastic Temperature Model The temperature map is unique for each application or program phase  can be obtained by uArch-level simulation For each region of the chip, temperature is characterized by its mean and variance over a number of maps  Primary component analysis (PCA) to decide # of maps Temperature correlation measured as covariance between regions is high over SPEC2000 benchmark set Considering temperature correlations during optimization can compress searching space! (i,j) Correlation between region i and j

Problem Formulation Given:  The source, sinks and an initial tree embedding  A set of temperature maps for a benchmark set Design freedoms:  Re-embedding of clock tree  Cross link insertion To minimize the worst case skew among given temperature maps

Outline Modeling and Problem Formulation Algorithms Experimental Results Conclusions

Bottom-up Greedy-based Re-embedding d x y abc v Sink Original merging point Re-embedding option

Bottom-up Greedy-based Re-embedding d x y abc v New merging point

Perturbed Modified Nodal Analysis (MNA)  x is for source, sinks and merging point  L selects sink responses  Defining a new state variable with both nominal (x) and sensitivity (Δx) [key to triangulate the system] Structured and parameterized state matrix Delay and Skew with Re-embedding The number of re-embedding options I=5 N is huge! (N is number of merging points)

Compressing Solution Space by Temperature Correlation Motivation  Highly correlated merging points should be re-embedded in the same fashion Solution  Calculate correlation between two merging points based on temperature correlations  Cluster merging points based on correlation strength  Perform the same re-embedding for all points within one cluster

Temperature Correlation Driven Clustering Correlation matrix C of merging points is low-ranked, and Singular Value Decomposition (SVD) reveals the rank K Partition the merging points into K clusters (K-Means)  Maximize the correlation strength within each of K clusters Low-Rank Approx. K = 4, N = 70 Reduced from 5 70 to 5 4

Recap of Skew Calculation with Re-embedding Cluster based reduction (SVD + K-Means) Block-wise MOR [Yu et al, DAC’06] (Best paper award nominee) Transient time analysis (Back-Euler) K << N Delay and Skew

Simultaneous Re-embedding and Cross Link Insertion 1.Decide crosslink candidates according to [Rajaram, DAC04] 2.Cluster crosslink candidates again based on the temperature correlation 3.Calculate skew sensitivities w.r.t. crosslink and re-embedding candidates  In a fashion similar to the previous triangular block-wise MOR 4.Bottom-up select the best crosslink or re-embedding

Outline Modeling and Problem Formulation Algorithms Experimental Results Conclusions

Experimental Settings Temperature maps are obtained by micro-architecture level power-temperature transient simulator [Liao,TCAD’05] with 6 SPEC2000 applications 100 temperature maps, one for each 10 million clock cycles Compare four algorithms (two categories)  Traditional optimization under nominal temperature and Elmore delay DME: deferred merging-point embedding to minimize wire-length for zero-skew xlink: cross-link insertion [Rajaram, ICCAD'04]  The proposed algorithms with temperature variation and high- order delay model re-embed: re-embedding xlink+ Re-embed: simultaneously re-embedding and cross-link insertion

Skew Distribution Over 100 Temperature Maps X+R = cross link insertion + re-embedding DME = Deferred Merging points Embedding

Worst-case Skew For tree structure, re-embed reduces the worst-case skew by 3x on average (up to 20x) compared to DME. For non-tree structure, xlink+re-embed reduces the worst-case skew by 30% on average (up to 7x) compared to xlink. ps

For tree structure, re-embed has less than 1% wire length overhead compared to DME For non-tree structure, xlink+re-embed has 5% LESS wire length compared to xlink. Wire Length

Temperature-aware optimizations (re-embed and xlink+re-embed) are about 10x slower compared to DME and xlink, respectively, but  Our work uses high-order delay model  DME and xlink use Elmore delay Runtime

Conclusions Studied the clock optimization for workload dependent temperature variation  Reduced the worst-case skew by up to 7X with LESS wire-length compared to best existing method Correlation-aware modeling and optimization paradigm can be extended to handle PVT variations, and more design freedoms  “Temperature Aware Microprocessor Floorplanning Considering Application Dependent Power Load” [Chu et al, ICCAD07]  “Efficient Decoupling Capacitance Budgeting Considering Operation and Processing Variations” [Shi et al, finalist for Best Paper, ICCAD07]

Thank you! SRC TechCon 2007 Hao Yu (graduated), Yu Hu (presenter), Chun-Chen Liu and Lei He (PI) Minimal Skew Clock Embedding Considering Time Variant Temperature Gradient