Continuing Challenges in Static Timing Analysis

Presentation transcript:

Continuing Challenges in Static Timing Analysis Tom Spyrou TAU 2013 3/2013

Goal of this talk: Stay at a higher level than the latest trends. Remind ourselves of the trade-offs we have made as an industry to have a workable solution for STA, both for signoff and embedded in design synthesis and optimization. There is plenty of discussion on new effects; let's discuss core STA. Explain the basis of industrial algorithms to the academic community. Challenge ourselves to look at the issues again in light of technology trends in design and compute.

Why Static Timing Analysis: Dynamic simulation is impossible for even a small chip. Assume combinational logic only: 100 inputs imply 2^100 vectors needed to verify timing, which is about 10^30 vectors. If a simulator could process 10^6 vectors per second, this works out to a sim time of about 10^19 days, or roughly 10^16 years. Talk about a verification bottleneck! Now add in state elements and the problem of making sure the critical path is actually in the vector set. STA can analyze such a design in 1 minute. There are some issues, but they can be mitigated. STA's quality of result is not dependent on the quality of the vector set.
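The arithmetic on this slide can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the 100 inputs and the 10^6 vectors per second come from the slide, while the 365.25-day year is an assumption made here.

```python
# Back-of-the-envelope cost of exhaustive simulation for combinational logic.
INPUTS = 100                  # from the slide
VECTORS_PER_SECOND = 10**6    # simulator throughput assumed on the slide

vectors = 2**INPUTS
seconds = vectors / VECTORS_PER_SECOND
days = seconds / 86_400       # seconds per day
years = days / 365.25         # assumption: Julian year

print(f"vectors: {vectors:.2e}")
print(f"days:    {days:.2e}")
print(f"years:   {years:.2e}")
```

Under these assumptions the result is on the order of 10^30 vectors, 10^19 days, and a few times 10^16 years.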

What is the trade-off / core issues? These have been unchanged for a long time. A different kind of setup: the result is dependent on the quality of constraints and exceptions; if all storage elements are clocked and I/Os are constrained, it is generally safe. Less accurate delay analysis: the exact path is not really known, as it is with event-driven simulation; when STA was first introduced this was less of an issue, but PBA is now essential. Introduction of false paths due to topological rather than functional analysis: users have to specify these manually. Multiple circuit modes take extra effort, not just more vectors. Loops and level-sensitive latches add complexity.

Analysis: Every circuit looks the same to STA, since it ignores the functions of the logic.

Topological analysis: Simplifies the problem, at the cost of possibly reporting false paths.
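A classic illustration of a false path, sketched here with made-up numbers: two cascaded 2:1 muxes share one select signal, so the slow branch of the first mux and the slow branch of the second can never be active together. The delays and the names `STAGE1` and `STAGE2` are hypothetical and not taken from the talk.

```python
# Two cascaded 2:1 muxes driven by the SAME select signal (hypothetical delays).
# Each dict maps the select value to the delay of the branch it enables.
STAGE1 = {0: 5.0, 1: 1.0}   # slow branch is active when sel == 0
STAGE2 = {0: 1.0, 1: 5.0}   # slow branch is active when sel == 1

# Topological view: ignore logic function, take the worst branch in each stage.
topological_worst = max(STAGE1.values()) + max(STAGE2.values())

# Functional view: only branch combinations reachable for some select value.
functional_worst = max(STAGE1[sel] + STAGE2[sel] for sel in (0, 1))

print(topological_worst, functional_worst)  # 10.0 6.0
```

The topological answer of 10.0 corresponds to a path that no input vector can sensitize, which is why the user has to declare it false by hand.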

What do recent trends mean? Design: Hyper-optimization means accuracy is critical. When a chip is designed in a bleeding-edge technology it will be pushed on all dimensions of power, performance and area. Simulation-based delay calculation. Path-based analysis. Design size means memory use is the #1 problem. The largest chips are approaching 1 TB of RAM needed for flat runs. Hierarchical / parallel solutions must prioritize memory use on compute nodes. Runtime also needs to be faster, but the first step is to run on machines with reasonable cost. A recent design uses 750+ GB of RAM for single mode/corner STA. Compute: CPU is cheap, data movement is expensive. Whenever you hear it's an expensive calculation, don't avoid it. Parallel computing must improve not only performance but also accuracy and features. Don't just make the same problem go faster or just divide the data.

If you ask a designer what doesn't work well: Hierarchical timing in the final verification loop. SI calculations are very conservative. SDCs are large and hard to verify. Worst-case timing is done and process variation is modeled very pessimistically. Block-based analysis loses too much accuracy. True delay reporting (looking at combinational logic to prove a path true) is slow and can't run during optimization. Libraries limit flexibility of analysis.

STA Industry and Academia: STA technology has been innovated inside industry much more than in academia. The key approaches are not documented. There is no open source reference to build from. Industry protects the core concepts as trade secrets. Academia rarely publishes on STA beyond single-clock designs or delay calculation. We need a book on the core search algorithm.

Example, Veritime from the 90's: An STA engine that required vectors for the clock. Dynamic simulation of the clock: period, multicycle paths, and clock-to-clock false paths automatically determined. STA for the data portion. Absorbed by Cadence and forgotten, since at the time SDCs were a lot easier to hand inspect.

Requirements of an STA Engine: I would like to begin by documenting the basics that everyone in industry knows. There are no company-specific trade secrets. Must run in linear memory and runtime with circuit size, number of clocks, exceptions, and number of storage elements: touch each vertex only once, maybe twice to simplify pre-processing, not once per clock or exception. Must support SDC timing constraints: clocks, clock tree assumptions, multi-cycle paths, false paths, path delays, cases and modes. Must be nearly SPICE-accurate in delays and support path-based analysis. Must be incremental enough: netlist changes with a full retrace on one extreme, query-based incremental with limited tracing on the other.

The Basic Search. The Graph: Startpoints are inputs to the circuit and clock inputs to storage elements. Endpoints are outputs of the circuit and data inputs of storage elements. Propagate the Clocks: For each clock input, BFS to all clock data pins. Offset startpoint arrival times and endpoint required times with information from the clock propagation and cycle accounting. Propagate the Data: Use a BFS from startpoints to endpoints. Use multiple timing totals at every pin to take into account multiple clocks and exceptions. Back pointers can optionally be stored to record K critical paths, but this time/memory is wasted on optimization programs and should be left to a reporting phase.
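The data propagation step can be sketched as a levelized breadth-first traversal that touches each pin once. This is a minimal illustration with a single timing total per pin and no clocks or exceptions; the function name `propagate_arrivals`, the pin names, the delays, and the required time are all hypothetical.

```python
from collections import defaultdict, deque

def propagate_arrivals(edges, start_arrivals):
    """Worst-case arrival propagation over a DAG in levelized (Kahn) order.

    edges: list of (from_pin, to_pin, delay)
    start_arrivals: {startpoint: arrival_time}
    """
    fanout = defaultdict(list)
    indegree = defaultdict(int)
    pins = set(start_arrivals)
    for src, dst, delay in edges:
        fanout[src].append((dst, delay))
        indegree[dst] += 1
        pins.update((src, dst))

    arrival = dict(start_arrivals)
    # A pin becomes ready once all of its fanin has been processed.
    ready = deque(p for p in pins if indegree[p] == 0)
    while ready:
        pin = ready.popleft()
        for dst, delay in fanout[pin]:
            if pin in arrival:  # pins with no arrival are unconstrained
                candidate = arrival[pin] + delay
                if candidate > arrival.get(dst, float("-inf")):
                    arrival[dst] = candidate
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    return arrival

edges = [("in1", "g1", 1.0), ("in2", "g1", 2.0),
         ("g1", "out", 1.5), ("in2", "out", 0.5)]
arrival = propagate_arrivals(edges, {"in1": 0.0, "in2": 0.0})
required = {"out": 4.0}                      # hypothetical endpoint required time
slack = required["out"] - arrival["out"]
print(arrival["out"], slack)                 # 3.5 0.5
```

No back pointers are kept here, in line with the slide's suggestion that path recording belongs in a separate reporting phase.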

Multiple Timing Totals with Partial Path: The simplistic implementation is that each clock and each exception gets its own total, simultaneously or via separate traces. Memory and/or runtime increase quickly: occurrence pins are the most common netlist object, and there can be thousands of exceptions. At timing endpoints, like totals can be combined and evaluated. At timing endpoints, point-to-point exceptions can be evaluated.

Multiple Timing Totals

Multiple Timing Totals with path completion data: A BFS has no information about paths. However, timing exceptions are specified in terms of from, through, and to paths with a boolean expression of pins. For example, Mcp –from a –through {b c} –to d means from a, through b or c, and to d. Each total can carry a small state machine recording which exception points it has seen. At timing endpoints, like totals with like exception point data can be combined or, if false, not combined.
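One way to picture the per-total state machine is sketched below. It is an illustrative reading of the slide, not a documented industrial algorithm: each timing total is keyed by how many -through groups it has matched so far, and totals reaching a pin with the same key are merged. The single exception, the netlist, the delays, and the helper names (`advance`, `propagate`, `exception_applies`) are assumptions made for this example.

```python
from collections import defaultdict, deque

# Hypothetical exception: -from a -through {b c} -to d
EXC_FROM, EXC_THROUGH, EXC_TO = {"a"}, [{"b", "c"}], {"d"}

def advance(state, pin):
    """state = number of -through groups matched so far;
    None means the path did not start at a -from pin."""
    if state is not None and state < len(EXC_THROUGH) and pin in EXC_THROUGH[state]:
        return state + 1
    return state

def propagate(edges, startpoints):
    fanout, indegree = defaultdict(list), defaultdict(int)
    for src, dst, delay in edges:
        fanout[src].append((dst, delay))
        indegree[dst] += 1
    # totals[pin][state] = worst arrival among paths that share that state
    totals = defaultdict(dict)
    for pin in startpoints:
        totals[pin][0 if pin in EXC_FROM else None] = 0.0
    ready = deque(startpoints)           # startpoints are assumed to have no fanin
    while ready:
        pin = ready.popleft()
        for dst, delay in fanout[pin]:
            for state, arr in totals[pin].items():
                new_state = advance(state, dst)
                old = totals[dst].get(new_state, float("-inf"))
                totals[dst][new_state] = max(old, arr + delay)   # merge like totals
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    return totals

def exception_applies(state, endpoint):
    return state == len(EXC_THROUGH) and endpoint in EXC_TO

edges = [("a", "b", 1.0), ("a", "e", 3.0), ("x", "b", 2.0),
         ("b", "d", 1.0), ("e", "d", 1.0)]
totals = propagate(edges, ["a", "x"])
for state, arr in totals["d"].items():
    print(state, arr, exception_applies(state, "d"))
```

At endpoint d three totals survive: the path a, b, d (state 1, arrival 2.0) qualifies for the exception, while a, e, d (state 0, arrival 4.0) and x, b, d (state None, arrival 3.0) do not. The number of totals per pin is bounded by the number of states rather than the number of paths.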

Through exceptions

Framework can be used for Clock Pessimism [Figure: circuit with delay segments d1, d2 and d3; pins are annotated with the segments traversed and the resulting arrival, either "d1,d2 Arr 1" or "d1,d2,d3 Arr 2".]

Delay Calculation, Multiple Timing Totals: Worst-case slew merging is pessimistic but allows delay calculation to be a pre-process step. If delay calculation is done in the BFS, critical slew merging can be done. It is also possible for each timing total to carry its own slew to improve accuracy. Loops can be auto-detected and dynamically broken, avoiding accidental critical path breaks.
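The three slew-handling options can be compared on a toy example. The sketch assumes a made-up linear delay model (`stage_delay`) and interprets "critical slew merging" as keeping the slew of the latest-arriving signal; both are assumptions for illustration, not definitions from the talk.

```python
# Hypothetical linear delay model for the stage downstream of the merge point.
def stage_delay(input_slew, intrinsic=1.0, slew_factor=2.0):
    return intrinsic + slew_factor * input_slew

# (arrival, slew) of two signals converging on the same pin (made-up values).
inputs = [(1.9, 0.5), (2.0, 0.1)]

# 1. Worst-case merge: latest arrival paired with the slowest slew.
worst_slew = max(a for a, _ in inputs) + stage_delay(max(s for _, s in inputs))

# 2. Critical-slew merge: keep the slew that belongs to the latest arrival.
crit_arrival, crit_slew = max(inputs)
critical_slew = crit_arrival + stage_delay(crit_slew)

# 3. Each total carries its own slew through the downstream stage.
per_total = max(a + stage_delay(s) for a, s in inputs)

print(worst_slew, critical_slew, per_total)
```

With these numbers the worst-case merge gives about 4.0, the critical-slew merge about 3.2, and the per-total calculation about 3.9. The worst-case merge is pessimistic, while in this example the earlier but slower-slewing signal turns out to be the true worst case, which only the per-total calculation captures.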

Incremental Timing: Netlist edits, full retime. Netlist edits, fanout cone retime. Netlist edits, query-based retime. The choice of how incremental to go depends on the optimization approach: more global cost functions require less incrementality, while more locally greedy approaches require more.
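A fanout cone retime can be sketched as a change-driven worklist: after an edit, recompute the arrival at the affected pin and walk forward only while arrivals keep changing. The netlist, the delays, the initial arrivals, and the function name `retime_cone` are hypothetical; the initial arrivals are assumed to come from an earlier full trace.

```python
from collections import defaultdict, deque

def retime_cone(delays, arrival, changed_pin):
    """Recompute arrivals from changed_pin forward, stopping where nothing changes.
    delays: {(from_pin, to_pin): delay}. Returns the number of pin evaluations."""
    fanin, fanout = defaultdict(list), defaultdict(list)
    for src, dst in delays:
        fanin[dst].append(src)
        fanout[src].append(dst)
    work = deque([changed_pin])
    visited = 0
    while work:
        pin = work.popleft()
        visited += 1
        new = max(arrival[src] + delays[(src, pin)] for src in fanin[pin])
        if new != arrival[pin]:
            arrival[pin] = new
            work.extend(fanout[pin])   # only continue where timing moved
    return visited

delays = {("a", "g1"): 1.0, ("b", "g1"): 2.0, ("g1", "g2"): 1.0,
          ("c", "g2"): 5.0, ("g2", "out"): 1.0,
          ("c", "g3"): 1.0, ("g3", "out2"): 1.0}
# Arrivals assumed to come from a previous full trace.
arrival = {"a": 0.0, "b": 0.0, "c": 0.0, "g1": 2.0, "g2": 5.0,
           "out": 6.0, "g3": 1.0, "out2": 2.0}

delays[("a", "g1")] = 2.5              # netlist edit: one arc gets slower
visited = retime_cone(delays, arrival, "g1")
print(visited, arrival["g1"], arrival["out"])   # 2 2.5 6.0
```

Only g1 and g2 are re-evaluated: the arrival at g2 is still dominated by the path from c, so the trace stops there and the rest of the design is untouched. A simple worklist like this may re-evaluate a pin more than once on reconvergent logic; a levelized order would avoid that at the cost of extra bookkeeping.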

STA needs innovation: Increased sharing with academia. Increased research on the problems that are still problems. Redirect solutions in light of the design and compute trends. There is a lot of interesting work to do!

Some ideas: A new constraint language that is more functional. Try to propagate the function with the delays. Some combination with cycle-based simulation. Constraint language enhancements. Library-less delay models. A new data model which is stage-based. Focus on data locality. A hierarchical timing model which is truly context-independent within acceptable limitations. Constraint improvements to help constrain blocks more accurately.