Delay of Multi-Level Circuits

Sungho Kang Yonsei University

Outline
Component and Circuit Delay
Timing Analysis and Verification
Floating Mode Delay Computation
Technology-Independent Optimization
The Speedup Algorithm

Component and Circuit Delay
Accurate timing estimation relies on two calculations: component delay and circuit delay. Component delay calculation computes the delay of the individual components within a circuit, using pre-calculated timing data such as inertial delay, transport delay, and wiring delay. Wiring delay can be estimated through simulation or back-annotated from the final layout. The calculation of delay data depends strongly on the implementation method.

Component Delay Calculation
$$\mathrm{Delay}(i) = \mathrm{transport}(i) + \mathrm{resistance}(i) \times \mathrm{load}(i)$$
where i is a gate and load(i) is the total capacitive load driven by that gate. Component delay calculations involve a tradeoff between accuracy and computation time.
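
The component delay formula can be evaluated directly. The sketch below is a minimal illustration that assumes, beyond what the slide states, that load(i) is the sum of the input capacitances of the gates driven by i plus a lumped wire capacitance; the function name `gate_delay` and its parameters are hypothetical.

```python
def gate_delay(transport, resistance, fanout_caps, wire_cap=0.0):
    """Delay(i) = transport(i) + resistance(i) * load(i).

    Assumption: load(i) = sum of the fanout gates' input capacitances plus a
    lumped wire capacitance. Use consistent units (e.g. ns, kOhm, pF).
    """
    load = sum(fanout_caps) + wire_cap
    return transport + resistance * load


# Hypothetical gate driving two fanouts over a short wire.
print(gate_delay(transport=2, resistance=3, fanout_caps=[1, 1], wire_cap=1))  # prints 11
```

A more accurate (and slower) component model would replace the single resistance term with table lookups, which is exactly the accuracy/time tradeoff mentioned above.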

Circuit Delay Calculation
Correct operation requires that:
(1) the clock period is longer than the sum of the maximum propagation delay through the combinational logic, the setup time of the memory device, and the maximum propagation delay through the memory device;
(2) the circuit's input lines are stable and valid for a sufficient period around each active clock edge to accommodate both the maximum propagation delay through the combinational logic and the setup time of the memory device;
(3) the minimum propagation delay around the cycle exceeds the hold time requirement of the flip-flop.
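
Conditions (1) and (3) reduce to two inequalities. A minimal sketch, assuming a single register-to-register cycle with no clock skew and reading "the minimum propagation delay around the cycle" as the minimum memory delay plus the minimum combinational delay; condition (2) on input stability is not modeled, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TimingParams:
    clock_period: float
    comb_max: float       # max propagation delay through the combinational logic
    comb_min: float       # min propagation delay through the combinational logic
    setup: float          # setup time of the memory device
    hold: float           # hold time of the flip-flop
    clk_to_q_max: float   # max propagation delay through the memory device
    clk_to_q_min: float   # min propagation delay through the memory device


def check_timing(p: TimingParams) -> dict:
    """Check conditions (1) and (3); input stability (2) is not modeled."""
    # (1) clock period exceeds logic delay + setup time + memory delay
    setup_ok = p.clock_period > p.comb_max + p.setup + p.clk_to_q_max
    # (3) assumption: minimum delay around the cycle = min memory delay + min logic delay
    hold_ok = p.clk_to_q_min + p.comb_min > p.hold
    return {"setup_ok": setup_ok, "hold_ok": hold_ok}


print(check_timing(TimingParams(10, 6, 1, 1, 0.5, 2, 0.5)))
```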

Timing simulation has several problems:
(1) the effort required to build a set of input patterns;
(2) ensuring that the input patterns exercise the circuit comprehensively enough to determine the delay accurately;
(3) computational expense.
Therefore, timing verification is used instead.

Timing Analysis and Verification

Topological Timing Analysis
A(s): arrival time of signal s, the time at which the signal settles to its steady value.
R(s): required time of signal s, the time by which the signal must be stable.
S(s): slack of signal s, S(s) = R(s) - A(s).
The topologically longest path is the path along which every signal has the minimum slack. Static timing analyzers assume that the critical delay is the delay of the topologically longest path; under this assumption, the longest path is the critical path. This assumption can be pessimistic.
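
A minimal sketch of topological timing analysis: a forward pass computes A(s), a backward pass computes R(s), and S(s) = R(s) - A(s). It assumes, beyond the slides, that all primary inputs arrive at time 0, that every primary output has the same required time, and that delays are attached to gates; the function and argument names are hypothetical. Like any static analyzer, it reports the topologically longest path, which may be false.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+


def static_timing(fanins, delay, required_out):
    """Topological timing analysis of a combinational DAG.

    fanins: node -> list of fanin nodes (primary inputs map to []).
    delay: node -> gate delay (missing nodes, e.g. PIs, count as 0).
    required_out: required time R at every primary output.
    Returns arrival A, required R and slack S dictionaries.
    """
    order = list(TopologicalSorter(fanins).static_order())
    fanouts = defaultdict(list)
    for node, ins in fanins.items():
        for i in ins:
            fanouts[i].append(node)

    arrival = {}
    for n in order:  # forward pass: latest fanin arrival plus own delay
        latest = max((arrival[i] for i in fanins.get(n, [])), default=0)
        arrival[n] = latest + delay.get(n, 0)

    required = {}
    for n in reversed(order):  # backward pass: tightest requirement over fanouts
        if fanouts[n]:
            required[n] = min(required[o] - delay.get(o, 0) for o in fanouts[n])
        else:
            required[n] = required_out  # primary output
    slack = {n: required[n] - arrival[n] for n in order}
    return arrival, required, slack


fanins = {"a": [], "b": [], "c": [], "g1": ["a", "b"], "g2": ["g1"], "out": ["g2", "c"]}
delay = {"g1": 2, "g2": 3, "out": 1}
A, R, S = static_timing(fanins, delay, required_out=6)
print(A["out"], S["c"])  # prints 6 5
```

Every node on a → g1 → g2 → out (and b → g1) has slack 0, the minimum, so that is the path a static analyzer reports as critical.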

False Paths
The problem with topological analysis is that not every path is responsible for the delay. A path is false if it is not responsible for the delay of the circuit. The critical delay is defined as the delay of the longest true path in the circuit. The critical delay of combinational logic therefore depends not only on the topological interconnection of gates and wires but also on the Boolean function of each node.

Delay Models
Transition mode: the circuit nodes are assumed to be ideal capacitors that retain the values set by v1 until v2 forces them to change, so the timing response to v2 is also a function of v1.
Floating mode: the nodes are not assumed to be ideal capacitors, so their state is unknown until it is set by v2; the timing response to v2 is independent of v1.
Fixed delay model [d]: each gate delay is exactly d.
Monotone speedup delay model [0, d]: each gate delay may take any value between the lower bound 0 and the upper bound d.

Transition Mode and Speedup
Consider a path that cannot be sensitized under the transition mode of operation with fixed gate delays: the path is false under transition mode and fixed gate delays. However, a sensitization condition based on transition mode and fixed gate delays is unacceptable in a worst-case design methodology. If only the upper bounds of the gate delays are used in the transition mode of operation, an erroneous critical delay may be computed, because a circuit whose gates are faster than their upper bounds can exhibit a longer delay. To obtain a useful sensitization condition, one strategy is to use the transition mode of operation together with monotone speedup. Simulating the circuit then becomes much more complicated, since the transitions at internal gates may occur at varying times.

To determine the critical delay of the circuit, we scan all possible waveforms at output f and find the time at which the last transition occurs over all the waveforms. Timing analysis for a worst-case design methodology can use monotone speedup delay simulation under the transition mode of operation. Disadvantages: (1) the search space is $2^{2n}$ vector pairs (v1, v2), since each of v1 and v2 ranges over $2^n$ vectors, where n is the number of PIs of the circuit; (2) monotone speedup delay simulation is significantly more complicated than fixed delay simulation.

Floating Mode and Speedup
Compared with transition mode, the critical delay under floating mode is significantly easier to compute for both the fixed and the monotone speedup delay models, because large sets of possible waveforms do not need to be stored at each gate. Let P be a path through circuit C and V2 a vector applied to C. At any gate g on P, the side-inputs must be at noncontrolling values when the controlling or noncontrolling value propagates along P through C. If the value at a side-input i of g is noncontrolling under V2, monotone speedup allows us to disregard the time at which the noncontrolling value arrives. Suppose the delay of every path from the PIs to i is greater than the delay of the subpath of P ending at g.

Under monotone speedup, we can speed up all the paths to i, ensuring that the noncontrolling value arrives in time. Under floating mode with fixed delays, we cannot change the delays of the paths to i, but we can assume that V1, the vector applied before V2, was already providing a noncontrolling value.

Static Sensitization
A path is statically sensitized by a vector if all the side-inputs along the path settle to noncontrolling values. The condition is delay-independent. Static sensitizability is a sufficient condition for a path to be responsible for the delay of a circuit under the floating mode of operation. Under the floating mode of operation, for any gate g on P, the side-inputs of g may be assumed to have been at controlling values on the previously applied vector V1; hence steady noncontrolling values are required at every side-input.
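
Static sensitization can be checked mechanically for a given vector. A minimal sketch, assuming a netlist of AND/OR/NAND/NOR gates given as name -> (type, fanins) and listed in topological order; all names are hypothetical. In the example, g2 = a·s·s', so the path a → g1 → g2 can never be statically sensitized: its side-inputs s and s' would both have to be 1.

```python
CONTROLLING = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}


def evaluate(gates, order, vector):
    """Steady logic values of every node for a fully specified input vector."""
    val = dict(vector)
    for g in order:
        kind, ins = gates[g]
        c = CONTROLLING[kind]
        out = c if any(val[i] == c for i in ins) else 1 - c  # AND/OR value
        val[g] = 1 - out if kind in ("NAND", "NOR") else out
    return val


def statically_sensitized(gates, order, vector, path):
    """True if every side-input along `path` settles to a noncontrolling value."""
    val = evaluate(gates, order, vector)
    for prev, g in zip(path, path[1:]):
        kind, ins = gates[g]
        c = CONTROLLING[kind]
        if any(i != prev and val[i] == c for i in ins):
            return False
    return True


# Hypothetical example: ns = NOT s (NAND with tied inputs), g2 = a*s*ns.
GATES = {"ns": ("NAND", ["s", "s"]), "g1": ("AND", ["a", "s"]), "g2": ("AND", ["g1", "ns"])}
ORDER = ["ns", "g1", "g2"]
for a in (0, 1):
    for s in (0, 1):
        print(a, s, statically_sensitized(GATES, ORDER, {"a": a, "s": s}, ["a", "g1", "g2"]))
```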

Static Cosensitization
An input vector w statically cosensitizes a path P in C to a 1 if and only if the value of $v_{n+1}$ is 1 and, for each $v_i$ with $1 \le i \le n+1$, if $v_i$ has a controlled value, then the edge $e_{i-1}$ presents a controlling value. Here $v_1, \ldots, v_{n+1}$ are the nodes along P and $e_{i-1}$ is the edge of P entering $v_i$. Static cosensitization is a delay-independent condition similar to static sensitization, but weaker. It is a necessary condition for a path to be responsible for the delay of a circuit under the floating mode of operation.
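
Static cosensitization can be checked in the same style. A minimal sketch, reading "v_i has a controlled value" as "at least one input of v_i is at a controlling value" and generalizing the target PO value to a parameter; the netlist format and all names are hypothetical. The example shows the path a → g1 → g2 being cosensitized to 0 by a = 0, s = 1 even though it is never statically sensitized, which illustrates that cosensitization is the weaker condition.

```python
CONTROLLING = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}


def evaluate(gates, order, vector):
    """Steady logic values of every node for a fully specified input vector."""
    val = dict(vector)
    for g in order:
        kind, ins = gates[g]
        c = CONTROLLING[kind]
        out = c if any(val[i] == c for i in ins) else 1 - c
        val[g] = 1 - out if kind in ("NAND", "NOR") else out
    return val


def statically_cosensitized(gates, order, vector, path, target=1):
    """The path output equals `target`, and every gate on the path that has a
    controlling input receives a controlling value on its path edge."""
    val = evaluate(gates, order, vector)
    if val[path[-1]] != target:
        return False
    for prev, g in zip(path, path[1:]):
        kind, ins = gates[g]
        c = CONTROLLING[kind]
        if any(val[i] == c for i in ins) and val[prev] != c:
            return False
    return True


# Same hypothetical circuit: ns = NOT s, g2 = a*s*ns (constant 0).
GATES = {"ns": ("NAND", ["s", "s"]), "g1": ("AND", ["a", "s"]), "g2": ("AND", ["g1", "ns"])}
ORDER = ["ns", "g1", "g2"]
print(statically_cosensitized(GATES, ORDER, {"a": 0, "s": 1}, ["a", "g1", "g2"], target=0))
# prints True
```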

True Floating Mode Delay
Two delay-independent conditions, static sensitization and static cosensitization, are respectively sufficient and necessary. The necessary and sufficient condition for a path to be responsible for circuit delay under the floating mode of operation is a delay-dependent condition that is stronger than static cosensitization but weaker than static sensitization.

The following rules form a timed calculus for single-vector simulation with delay values. They determine the correct floating mode delay of a circuit under an applied vector V2, as well as the paths responsible for the delay under V2:
(1) If the gate output is at its controlled value (at least one input is at a controlling value), pick the minimum among the delays of the controlling values at the gate inputs, and add the gate delay to the chosen value to obtain the delay at the gate output.
(2) If the gate output is at its noncontrolled value (all inputs are at noncontrolling values), pick the maximum of all the delays at the gate inputs, and add the gate delay.
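
The two rules translate directly into a single-vector simulator. A minimal sketch, assuming PIs settle at time 0, AND/OR/NAND/NOR gates only, and a hypothetical netlist format; the example adds a slow buffer `buf` (an AND with tied inputs, assumed delay 5) on input a of a circuit whose output is constant 0.

```python
CONTROLLING = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}


def timed_simulate(gates, order, delay, vector):
    """Floating-mode timed calculus for one vector V2.

    gates: name -> (type, fanins); order: gate names in topological order;
    delay: name -> gate delay; vector: PI -> 0/1 (PIs settle at time 0).
    Returns node -> (steady value, settle time).
    """
    res = {pi: (v, 0) for pi, v in vector.items()}
    for g in order:
        kind, ins = gates[g]
        c = CONTROLLING[kind]
        ctrl_times = [res[i][1] for i in ins if res[i][0] == c]
        if ctrl_times:
            # Rule 1: the earliest controlling input determines the output.
            out, t = c, min(ctrl_times)
        else:
            # Rule 2: all inputs noncontrolling, so wait for the latest one.
            out, t = 1 - c, max(res[i][1] for i in ins)
        if kind in ("NAND", "NOR"):
            out = 1 - out
        res[g] = (out, t + delay[g])
    return res


# Hypothetical example: buf is a slow buffer (AND with tied inputs) on input a.
GATES = {"buf": ("AND", ["a", "a"]), "ns": ("NAND", ["s", "s"]),
         "g1": ("AND", ["buf", "s"]), "g2": ("AND", ["g1", "ns"])}
ORDER = ["buf", "ns", "g1", "g2"]
DELAY = {"buf": 5, "ns": 1, "g1": 1, "g2": 1}
print(timed_simulate(GATES, ORDER, DELAY, {"a": 1, "s": 1})["g2"])  # prints (0, 2)
```

Although g1 settles only at time 6, the output settles at time 2 because ns presents the earliest controlling value to g2.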

A path is responsible for the floating mode delay of a circuit under V2 if and only if, for each gate along the path:
(1) If the gate output is at its controlled value, the input on the path must be at a controlling value and, furthermore, must have a delay no greater than the delays of the other inputs at controlling values.
(2) If the gate output is at its noncontrolled value, the input on the path must have a delay no smaller than the delays at the other inputs.
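
The two path conditions can be checked from the values and settle times produced by the timed calculus. A minimal sketch; the `timed` dictionary format (node -> (value, time)) and all names are assumptions, and the example values for the single AND gate g are hand-written with an assumed gate delay of 1.

```python
CONTROLLING = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}


def responsible(gates, timed, path):
    """Check the floating-mode conditions for `path` (listed from PI to PO).

    gates: name -> (type, fanins); timed: node -> (steady value, settle time)
    under V2, e.g. as produced by the timed calculus.
    """
    for prev, g in zip(path, path[1:]):
        kind, ins = gates[g]
        c = CONTROLLING[kind]
        value, time = timed[prev]
        ctrl_times = [timed[i][1] for i in ins if timed[i][0] == c]
        if ctrl_times:
            # Controlled output: on-path input is controlling and no later
            # than any other controlling input.
            if value != c or time > min(ctrl_times):
                return False
        elif time < max(timed[i][1] for i in ins):
            # All inputs noncontrolling: on-path input must be a latest one.
            return False
    return True


# Hypothetical 2-input AND g with inputs x and y.
GATES = {"g": ("AND", ["x", "y"])}
both_zero = {"x": (0, 3), "y": (0, 5), "g": (0, 4)}
print(responsible(GATES, both_zero, ["x", "g"]), responsible(GATES, both_zero, ["y", "g"]))
# prints True False
```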

Floating Mode Delay Computation
Most methods for computing critical delay operate on a per-path basis. The longest path in the circuit is found, and the method searches for a vector V2 that sensitizes it according to the chosen sensitization condition. If the search fails, the next longest path is picked, and the process is iterated until the longest true path is found. Because the number of paths can grow exponentially with circuit size, per-path delay computation methods cannot be used for large circuits.

An alternative strategy is to answer directly the question of what the true critical delay of the circuit is, operating on sets of paths rather than on one path at a time. A straightforward $O(2^n)$ algorithm that finds the true critical delay without path enumeration is to simulate each of the $2^n$ input vectors (minterms) using the timed calculus and record the longest delay seen at the circuit output. This process can be sped up considerably by using cube simulation rather than minterm simulation.
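
A minimal sketch of the $O(2^n)$ approach: every minterm is simulated with the timed calculus (reimplemented here so the block stands alone) and the latest output settle time is kept. The hypothetical example puts a slow buffer (assumed delay 5) on input a, so the topologically longest path a → buf → g1 → g2 has length 7, yet no vector makes the output settle later than 2: that path is false.

```python
from itertools import product

CONTROLLING = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}


def timed_simulate(gates, order, delay, vector):
    """Floating-mode timed calculus: node -> (value, settle time); PIs at 0."""
    res = {pi: (v, 0) for pi, v in vector.items()}
    for g in order:
        kind, ins = gates[g]
        c = CONTROLLING[kind]
        ctrl = [res[i][1] for i in ins if res[i][0] == c]
        out, t = (c, min(ctrl)) if ctrl else (1 - c, max(res[i][1] for i in ins))
        res[g] = (1 - out if kind in ("NAND", "NOR") else out, t + delay[g])
    return res


def floating_critical_delay(gates, order, delay, pis, output):
    """Simulate all 2**n minterms; return the latest output settle time."""
    worst, worst_vector = None, None
    for bits in product((0, 1), repeat=len(pis)):
        vector = dict(zip(pis, bits))
        t = timed_simulate(gates, order, delay, vector)[output][1]
        if worst is None or t > worst:
            worst, worst_vector = t, vector
    return worst, worst_vector


def topological_delay(gates, order, delay, output):
    """Longest-path delay, ignoring logic values (PIs arrive at time 0)."""
    arrival = {}
    for g in order:
        _, ins = gates[g]
        arrival[g] = max(arrival.get(i, 0) for i in ins) + delay[g]
    return arrival[output]


GATES = {"buf": ("AND", ["a", "a"]), "ns": ("NAND", ["s", "s"]),
         "g1": ("AND", ["buf", "s"]), "g2": ("AND", ["g1", "ns"])}
ORDER = ["buf", "ns", "g1", "g2"]
DELAY = {"buf": 5, "ns": 1, "g1": 1, "g2": 1}
print(topological_delay(GATES, ORDER, DELAY, "g2"))                       # prints 7
print(floating_critical_delay(GATES, ORDER, DELAY, ["a", "s"], "g2")[0])  # prints 2
```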

Floating Mode

PODEM
PODEM(po, lvalue) {
    jlist = po with logical value lvalue ;
    status = SEARCH_1(jlist) ;
    return(status) ;
}

SEARCH_1(jlist) {
    if (length of jlist is zero) return(SUCCEED) ;
    if (BACKTRACE(po, po_value, &pi, &pi_value) == FALSE) return(FAILED) ;
    if (IMPLY(pi, pi_value, jlist) != IMPLY_CONFLICT) {
        search_status = SEARCH_1(jlist) ;
        if (search_status == FAILED) {
            restore the state of the network to what it was prior to the most recent primary input assignment ;
            search_status = SEARCH_2(jlist, pi, 1-pi_value) ;
        }
    } else {
        restore the state of the network ;
        search_status = SEARCH_2(jlist, pi, 1-pi_value) ;
    }
    return(search_status) ;
}

SEARCH_2(jlist, pi, pi_value) {
    backtracks = backtracks + 1 ;
    if (backtracks > BACKTRACK_LIMIT) return(ABORTED) ;
    if (IMPLY(pi, pi_value, jlist) != IMPLY_CONFLICT) {
        search_status = SEARCH_1(jlist) ;
        if (search_status == FAILED) {
            restore the state of the network ;
        }
    } else {
        restore the state of the network ;
        search_status = FAILED ;
    }
    return(search_status) ;
}
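
The SEARCH_1/SEARCH_2 control flow can be exercised on a small example. This is a minimal, purely logical sketch under several assumptions: three-valued simulation stands in for IMPLY, the justification list is reduced to a single objective (the PO value), and BACKTRACE is replaced by picking the first unassigned PI and trying 1 first. The timed version described here would additionally propagate delay bounds and report timed conflicts; that part is omitted. All names are hypothetical.

```python
X = None  # unknown value in three-valued simulation

# Hypothetical example netlist: f = ((a AND b) OR c) AND d
GATES = {"g1": ("AND", ["a", "b"]), "g2": ("OR", ["g1", "c"]), "f": ("AND", ["g2", "d"])}
ORDER = ["g1", "g2", "f"]
PIS = ["a", "b", "c", "d"]


def eval3(kind, vals):
    """Three-valued AND/OR: a controlling input decides, otherwise X propagates."""
    c = 0 if kind == "AND" else 1
    if c in vals:
        return c
    return X if X in vals else 1 - c


def imply(assign):
    """Forward implication of the current PI assignment (stands in for IMPLY)."""
    val = dict(assign)
    for g in ORDER:
        kind, ins = GATES[g]
        val[g] = eval3(kind, [val.get(i, X) for i in ins])
    return val


class Podem:
    def __init__(self, po, lvalue, limit=100):
        self.po, self.lvalue, self.limit = po, lvalue, limit
        self.assign, self.backtracks = {}, 0

    def po_value(self):
        return imply(self.assign)[self.po]

    def conflict(self):
        v = self.po_value()
        return v is not X and v != self.lvalue

    def backtrace(self):
        # Simplification: real BACKTRACE walks from the objective to a PI.
        pi = next(p for p in PIS if p not in self.assign)
        return pi, 1

    def search_1(self):
        if self.po_value() == self.lvalue:  # objective justified
            return "SUCCEED"
        pi, v = self.backtrace()
        self.assign[pi] = v
        if not self.conflict():
            status = self.search_1()
            if status == "FAILED":
                del self.assign[pi]  # undo the most recent PI assignment
                status = self.search_2(pi, 1 - v)
        else:
            del self.assign[pi]
            status = self.search_2(pi, 1 - v)
        return status

    def search_2(self, pi, v):
        self.backtracks += 1
        if self.backtracks > self.limit:
            return "ABORTED"
        self.assign[pi] = v
        if not self.conflict():
            status = self.search_1()
            if status == "FAILED":
                del self.assign[pi]
        else:
            del self.assign[pi]
            status = "FAILED"
        return status


def podem(po, lvalue):
    p = Podem(po, lvalue)
    return p.search_1(), dict(p.assign)


print(podem("f", 0))  # prints ('SUCCEED', {'a': 1, 'b': 1, 'c': 1, 'd': 0})
```

Every path that returns FAILED first undoes its own PI assignment, so the caller always sees the network state it had before the call, matching the "restore the state of the network" steps in the pseudocode.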

Cube Simulation
Cube simulation is equivalent to the timed calculus when the inputs are completely specified. Given an incompletely specified vector, it produces both an upper bound and a lower bound on the delay achievable over the minterms contained in the vector. Delays computed with the timed calculus under a partial input setting therefore only give the range of achievable delays over all the minterms contained in that setting: even if a wire is at a known value under a partial input setting v, the delay of the wire may be a range rather than a constant.
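
One way to obtain such bounds is to carry a [lower, upper] interval per wire through three-valued simulation. A minimal sketch for AND/OR gates only; the interval rules are a straightforward conservative derivation from the timed calculus rather than the slides' exact procedure, and the netlist format and names are hypothetical.

```python
X = None  # unknown value
CONTROLLING = {"AND": 0, "OR": 1}


def cube_simulate(gates, order, delay, cube):
    """Timed calculus over a partial input setting (cube); AND/OR gates only.

    cube: PI -> 0, 1 or X; PIs settle at time 0.
    Returns node -> (value, (lower, upper)), where [lower, upper] bounds the
    settle time over every minterm contained in the cube.
    """
    res = {pi: (v, (0, 0)) for pi, v in cube.items()}
    for g in order:
        kind, ins = gates[g]
        c, d = CONTROLLING[kind], delay[g]
        known = [res[i][1] for i in ins if res[i][0] == c]
        maybe = [res[i][1] for i in ins if res[i][0] is X]
        if known:
            # Controlled in every minterm; an X input that turns controlling
            # can only make the earliest controlling input earlier.
            lo = min(b[0] for b in known + maybe)
            hi = min(b[1] for b in known)
            res[g] = (c, (lo + d, hi + d))
        elif not maybe:
            # Every input noncontrolling: wait for the latest input.
            lo = max(res[i][1][0] for i in ins)
            hi = max(res[i][1][1] for i in ins)
            res[g] = (1 - c, (lo + d, hi + d))
        else:
            # Unknown output: some X input may become controlling, or none may.
            lo = min(b[0] for b in maybe)
            hi = max(res[i][1][1] for i in ins)
            res[g] = (X, (lo + d, hi + d))
    return res


# Hypothetical example: f = OR(AND(a, b), c) with gate delays 2 and 1.
GATES = {"g1": ("AND", ["a", "b"]), "f": ("OR", ["g1", "c"])}
ORDER = ["g1", "f"]
DELAY = {"g1": 2, "f": 1}
print(cube_simulate(GATES, ORDER, DELAY, {"a": X, "b": 1, "c": X})["f"])  # prints (None, (1, 3))
```

The four minterms of the cube give output delays 3, 1, 3 and 1, all inside the reported range [1, 3]; on a fully specified vector the interval collapses to the single value the timed calculus would give.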

Timed Test Generation
Conflicts that occur during implication may be logical conflicts or timed conflicts. A timed conflict occurs when the output is set to L but the upper bound on the delay at the output is strictly less than the target delay Δ. The procedure ends successfully if the output has been set to L and the lower bound on the computed delay at the output is greater than or equal to Δ.

Backtrace
The backtrace procedure is called when the PO of the circuit is at the unknown value under the current PI settings. The backtrace procedure in timed test generation is similar to the purely logical backtrace of PODEM, except that it uses both the logical value and the desired delay value at the output to choose which path to follow.

Technology-Independent Optimization

Circuit Restructuring
Timing optimization of combinational circuits is performed both at the technology-independent level and during technology mapping. The critical section of a Boolean network is composed of all the critical paths from PIs to POs. Given a critical path, the total delay on the path can be reduced if any section of the path is sped up. The nodes along the critical paths that are chosen to be collapsed and redecomposed form the redecomposition region. The algorithm selects a minimum set of subsections, called redecomposition points: the goal is to select a set of points that cuts all the critical paths with minimum total weight. Once the redecomposition points are chosen, they are sped up by the collapsing-decomposing procedure.

The Speedup Algorithm

Definitions
An ε-network is defined as the subnetwork in which all the signals have a slack within ε of the most negative slack. The distance between two nodes f and g is the minimum number of nodes that must be traversed from g to reach f, including g. The d_critical_fanin section of a node is the set of nodes that (1) are in the transitive fanin of the node, (2) are at most distance d away from the node, and (3) are part of the ε-network. Partial collapsing of a node collapses all the nodes in the d_critical_fanin of the node into two levels of logic.
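
Both definitions are simple graph computations. A minimal sketch, assuming slacks are already known per node and the network is given as node -> fanin list; the distance is measured as defined above, so an immediate fanin is at distance 1. All names are hypothetical.

```python
from collections import deque


def epsilon_network(slack, eps):
    """Nodes whose slack is within eps of the most negative slack."""
    worst = min(slack.values())
    return {n for n, s in slack.items() if s <= worst + eps}


def d_critical_fanin(node, fanins, d, eps_net):
    """Transitive-fanin nodes at distance <= d from `node` that lie in the eps-network."""
    dist = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search backwards through the fanins
        n = queue.popleft()
        if dist[n] == d:
            continue  # do not expand beyond distance d
        for f in fanins.get(n, []):
            if f not in dist:
                dist[f] = dist[n] + 1
                queue.append(f)
    return {g for g, k in dist.items() if 1 <= k <= d and g in eps_net}


fanins = {"f": ["g", "h"], "g": ["a", "b"], "h": ["c"]}
slack = {"f": -3, "g": -3, "h": 0, "a": -3, "b": -2.5, "c": 0}
net = epsilon_network(slack, eps=1)
print(sorted(d_critical_fanin("f", fanins, 2, net)))  # prints ['a', 'b', 'g']
```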

Outline
SPEED_UP(network, d, ε) {
    /* d is the distance up to which the fanins are collapsed */
    /* ε is the threshold for generating the ε-critical network */
    while (delay decreases and timing constraints not satisfied) {
        DELAY_TRACE() ;
        GENERATE(ε-network) ;
        node_list = NODE_CUTSET(ε-network) ;
        foreach (node ∈ node_list) {
            PARTIAL_COLLAPSE(node, d) ;
            SPEEDUP_NODE(node) ;
        }
    }
}

Weight of Critical Nodes
$W = W_t + \alpha W_a$, where $W_t$ is the potential for speedup, $W_a$ is the number of literals in the duplicated logic, and α is a coefficient controlling the area-delay tradeoff.
Two observations reduce the potential for speedup:
(1) If the standard deviation of the points $(A_i, D_i)$ is small, a nearly balanced decomposition already exists in the transitive fanin of the node when the inputs arrive at similar times, so there is little scope for improving the existing decomposition.
(2) Let $D = mA + c$ be the least-squares straight line fitted to the points $(A_i, D_i)$. A negative slope m indicates that early-arriving signals pass through a larger delay. This too suggests that the current decomposition is already skewed in the right direction, reducing the potential for speedup.
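
The two diagnostics can be computed from the data points. A minimal sketch, assuming A_i is the arrival time of input i and D_i the delay from input i to the node output, and reading "standard deviation" as the spread of the arrival times; how these diagnostics are turned into Wt is not specified here, so Wt is simply an input to `node_weight`. All names are hypothetical.

```python
from statistics import pstdev


def speedup_diagnostics(points):
    """points: list of (A_i, D_i) pairs.

    Returns (spread, slope): the population standard deviation of the A_i and
    the slope m of the least-squares line D = m*A + c.
    """
    n = len(points)
    mean_a = sum(a for a, _ in points) / n
    mean_d = sum(d for _, d in points) / n
    sxx = sum((a - mean_a) ** 2 for a, _ in points)
    sxy = sum((a - mean_a) * (d - mean_d) for a, d in points)
    slope = sxy / sxx if sxx else 0.0  # identical arrivals: no trend to fit
    spread = pstdev([a for a, _ in points])
    return spread, slope


def node_weight(w_t, w_a, alpha):
    """W = Wt + alpha * Wa."""
    return w_t + alpha * w_a


spread, slope = speedup_diagnostics([(0, 3), (1, 2), (2, 1)])
print(slope)  # prints -1.0: early inputs already see the larger delay
```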

Minimum Weight Cutset
After the node weights are assigned, the maxflow-mincut algorithm is applied to generate a minimum-weight node cutset. The minimum-weight cutset yields a minimal area increase when the nodes are resynthesized.
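
A node cutset can be found with an edge-based max-flow algorithm by splitting each node into an "in" and an "out" copy joined by an edge whose capacity is the node weight. A minimal sketch using Edmonds-Karp (breadth-first augmenting paths), assuming the critical network is a DAG whose source nodes hang off a super-source S and whose sink nodes feed a super-sink T; all names are hypothetical.

```python
from collections import deque

INF = float("inf")


def _add(cap, u, v, c):
    cap.setdefault(u, {}).setdefault(v, 0)
    cap.setdefault(v, {}).setdefault(u, 0)  # reverse (residual) edge
    cap[u][v] += c


def min_weight_node_cut(edges, weight):
    """Minimum-weight set of nodes that cuts every path through the DAG.

    edges: (u, v) pairs of the critical network; weight: node -> W.
    """
    cap = {}
    has_in = {v for _, v in edges}
    has_out = {u for u, _ in edges}
    for n, w in weight.items():
        _add(cap, (n, "in"), (n, "out"), w)  # cutting node n costs W(n)
        if n not in has_in:
            _add(cap, "S", (n, "in"), INF)
        if n not in has_out:
            _add(cap, (n, "out"), "T", INF)
    for u, v in edges:
        _add(cap, (u, "out"), (v, "in"), INF)

    def reachable_from_s():
        parent = {"S": None}
        queue = deque(["S"])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:  # Edmonds-Karp: augment along shortest residual paths
        parent = reachable_from_s()
        if "T" not in parent:
            break
        path, v = [], "T"
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow
    side = reachable_from_s()  # source side of the minimum cut
    return {n for n in weight if (n, "in") in side and (n, "out") not in side}


edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
print(sorted(min_weight_node_cut(edges, {"a": 5, "b": 1, "c": 1, "d": 5})))  # prints ['b', 'c']
```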

Partial Collapsing
We collapse all the nodes in the d_critical_fanin of the node to generate a large node with an associated SOP expression, to be decomposed later. The choice of the distance parameter d in the PARTIAL_COLLAPSE procedure influences the algorithm. Decomposition of the collapsed node takes the arrival times into consideration.

Timing Decomposition
The general idea in the timing decomposition of a node is to place the critical signals closer to the output, so that they pass through fewer gates. Area can be reduced by sharing common functions: we first attempt to extract area-saving divisors that do not contain critical signals. After all such divisors have been extracted, we decompose the node into a NAND-NAND tree using the same heuristic, placing late-arriving signals nearer the output.

SPEEDUP_NODE(f) {
    k = CHOOSE_BEST_TIMING_DIVISOR(f) ;
    if (k != NULL) {
        SUBSTITUTE(f, k) ;
        SPEEDUP_NODE(k) ;
        /* update the arrival times at the inputs of f */
        DELAY_TRACE() ;
        SPEEDUP_NODE(f) ;
    } else {
        AND_OR_DECOMP(f) ;
    }
}

Kernel-Based Decomposition
CHOOSE_BEST_TIMING_DIVISOR(f) {
    /* K = {level-0 kernels} ∪ {level-0 kernel intersections} */
    D = K ;               /* D is the set of candidate divisors */
    p = 0.1 ;             /* determined experimentally */
    for (n ∈ K) {
        f = q·n + r ;     /* divide f by n: quotient q, remainder r */
        D = D ∪ {q} ;
    }
    for (n ∈ D) {
        Fin = signals that fan into n ;
        Ct = p · MIN(Ai) + (1 - p) · MAX(Ai) ;   /* i ranges over Fin */
        Ca = literals saved if n is extracted ;
        C(n) = Ct + α · Ca ;
    }
    return(j s.t. C(j) is minimum) ;
}

AND-OR Decomposition
AND_OR_DECOMP(F) {    /* F is a multiple-cube function */
    foreach (cube ci ∈ F) {
        AND_DECOMP(ci) ;
    }
    DELAY_TRACE() ;
    F' = ∏ xi' ;          /* xi represents the cube ci */
    AND_DECOMP(F') ;
}

AND_DECOMP(F) {    /* F is a cube */
    if (|F| > 2) {
        l1 = earliest arriving input of F ;
        l2 = next earliest arriving input ;
        c = l1 · l2 ;
        SUBSTITUTE(F, c) ;
        DELAY_TRACE() ;
        AND_DECOMP(F) ;
    }
}
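
The AND_DECOMP heuristic repeatedly pairs the two earliest-arriving inputs, which keeps late signals near the output. A minimal sketch with a priority queue, assuming every 2-input AND has the same delay (1 by default) and that DELAY_TRACE reduces to updating the new signal's arrival time; the final 2-input gate that remains when |F| = 2 is also built. All names are hypothetical.

```python
import heapq
from itertools import count


def and_decomp(arrivals, gate_delay=1):
    """Timing-driven decomposition of one cube into 2-input ANDs.

    arrivals: literal -> arrival time.
    Returns (expression, output arrival time).
    """
    tie = count()  # tie-breaker so the heap never compares expressions
    heap = [(t, next(tie), lit) for lit, t in arrivals.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        t1, _, e1 = heapq.heappop(heap)  # earliest arriving input
        t2, _, e2 = heapq.heappop(heap)  # next earliest arriving input
        heapq.heappush(heap, (max(t1, t2) + gate_delay, next(tie), f"({e1}&{e2})"))
    t, _, expr = heap[0]
    return expr, t


print(and_decomp({"a": 0, "b": 0, "c": 0, "d": 3}))  # prints ('((c&(a&b))&d)', 4)
```

A balanced tree ((a&b)&(c&d)) would settle at time 5 here, so placing the late input d at the output saves one gate delay.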

Controlling Algorithm
ε: Using a large ε might result in selecting nodes for speedup from a region where speeding up does not reduce the critical delay, so area is wasted. Too small an ε results in a slow algorithm.
d: A large d is useful for making relatively large changes in the delay, since larger nodes provide a greater degree of flexibility in restructuring the logic. However, the run time increases rapidly as d is increased.
α: The larger the value of α, the more we want to avoid the duplication of logic during partial collapsing.
Model: The delay trace performed on the circuit can use a variety of delay models.
