Download presentation
Presentation is loading. Please wait.
Published byGavin McGee Modified over 8 years ago
1
Slack Analysis in the System Design Loop Girish VenkataramaniCarnegie Mellon University, The MathWorks Seth C. Goldstein Carnegie Mellon University
2
2 Typical System Design Flow Spec. Mapping + Allocation Physical Synthesis System Partitioning RTL (*.v, *.vhd) Simulation Code Emission System Design Loop Too slow! Scalability Issues: Simulation takes minutes to hours Synthesis takes many hours IR
3
3 Proposed Design Flow Spec. Mapping + Allocation Physical Synthesis System Partitioning RTL (*.v, *.vhd) Simulation Code Emission IR Traditional System Design Loop Timing Analysis Optimize Design Update timing in IR New Optimization Loop replace with
4
4 Key Contributions Spec. Mapping + Allocation Physical Synthesis System Partitioning RTL (*.v, *.vhd) Simulation Code Emission IR Timing Analysis Optimize Design Update timing in IR 1.Slack is a distributed representation of system timing 2.Linear-time update algorithm based on slack 3.> 100x reduction in total design time: Optimization Loop runs in seconds/minutes while System Design Loop runs in hours/days Traditional System Design Loop replace with New Optimization Loop
5
5 Outline Motivation Timing Metrics –Cycle time –Slack: An alternative view of cycle time Slack Update Algorithm Experimental Evaluation Conclusions
6
6 The Intermediate Representation (IR) Models a dynamic system Concurrent sub-systems –PE, FSM, S/W, Memory Communication between sub-systems based on pre- defined protocols –FIFOs, NoC, shared bus Sub- System Network … Transaction-Level Modeling (TLM), [Cai, ISSS 03] Adopted by System-C, Bluespec, Balsa, Tangram
7
7 Marked Graphs Model dynamic system interactions Events and transitions –Event: An edge acquires a token –Transition: Node consumes inputs and generates outputs Encode the communication protocols S1S1 S2S2 S3S3 token
8
8 Timing Analysis of IR Time Separation between Event (TSEs) –TSE between consecutive firings of same event in steady state is the mean cycle time Mean Cycle Time, CT Computing CT is about O(|E| 3 ) complexity [Dasdan 04]
9
9 Slack as a Timing Metric Distributed representation of cycle time –Different type of TSE –Defined on each (input edge, node) pair –How early this input arrives Slack(S1, S3) = 3 Slack(S2, S3) = 0 Slack(S3, S1) = 0 Slack(S3, S2) = 0 Zero-slack input is locally critical Longest chain of zero-slack events yields the critical cycle or the Global Critical Path (GCP) –Cycle time = Latency of the GCP –Given slack values, computing cycle time has linear complexity * 25 8 S1S1 S2S2 S3S3 locally critical
10
10 Slack is an Annotation on the IR Sub- System Network … Slack Helps in discovering hotspots and applying optimizations
11
11 Outline Motivation Timing Metrics Slack Update Algorithm Experimental Evaluation Conclusions
12
12 Optimizations Change the IR Spec. Mapping + Allocation Physical Synthesis System Partitioning RTL (*.v, *.vhd) Simulation Code Emission IR + slack Optimize Design Update timing in IR New Optimization Loop Causes changes in component delays, in turn, changing in slack values Need to update slack on each change
13
13 Problem Description Given a graph model and its current slack values, compute new values of slack when latency of a node changes by a given Δ 258 S1S1 S2S2 S3S3 6, Δ = -2
14
14 Insight behind Update Algorithm Slack is also latency difference of two branches of a re-convergent fork-join If delay of node, s c, increases by Δ –Update slack in surrounding re-convergent fork-joins –Propagate change globally s fork s join e1e1 e2e2 P1P1 P2P2 Assume δ(P 1 ) > δ(P 2 ), Di = δ(P 1 ) - δ(P 2 ) Slack(e 1, s join ) = 0 Slack(e 2, s join ) = Di Update (let Δ ≤ Di) : Slack(e2, s join ) = Di + Δ +Δ+Δ scsc
15
15 Insight behind Update Algorithm 1.If there is a path from change point to every input of s join, then no change in slack 2.If not, then slack changes occur s join e1e1 e2e2 +Δ+Δ scsc s fork +Δ+Δ Easy for acyclic graphs, but what about scc graphs?
16
16 Insight for Cyclic Graphs Use token knowledge –Count tokens from change point to every input –If value is equal for all inputs, then no change in slack values 258 S1S1 S2S2 S3S3 2, Δ = -3 # toks = 1 # toks = 2 Change in slack exists
17
17 Insight for Cyclic Graphs Use token knowledge –Count tokens from change point to every input –If value is equal for all inputs, then no change in slack values 258 S1S1 S2S2 S3S3 6, Δ = -2 # toks = 0 No change in slack
18
18 Algorithm Summary Initially, find # tokens between every pair of nodes in the graph –Problem formulated as a flow lattice –Invoked once, complexity is O(|M 0 | |V|) After inducing every change in graph –Compute slack change at each node –Propagate new change to neighboring outputs –Overall complexity is O(|V|)
19
19 Outline Motivation Timing Metrics Slack Update Algorithm Experimental Evaluation Conclusions
20
20 Experimental Setup Slack Update loop incorporated into CASH compiler [ASPLOS 04, DAC 07] –Synthesizes asynchronous circuits from ANSI-C programs Applied three different optimizations –SM: Slack Matching [ICCAD 06] –OC: Operation Chaining [ICCAD 07] –ASU: Heterogeneous Pipeline Synthesis [Async 08] Benchmarks: Fifteen frequently executed kernels from Mediabench suite, [Lee 97] All results are post-synthesis mapped to ST Micro 180nm library
21
21 Absolute Accuracy After SM, compare computed values of slack against actual values of slack for adpcm_d Close to 100 changes applied (algo invoked for each change) 1.2x performance speedup Update inaccuracy due to unknown latency values during circuit transformation
22
22 Design Loop Experiments Spec. Mapping + Allocation Physical Synthesis System Partitioning RTL (*.v, *.vhd) Simulation Code Emission IR System Design Loop Timing Analysis Optimize Design Update timing in IR Optimization Loop Run the same N optimizations in both loops and compare: 1. Overall Performance change 2. Overall design time
23
23 Design Loop Experiments Three optimization sequences 1.SM-ASU: Slack matching followed by Heterogenous latch insertion 2.ASU-SM 3.SM-ASU-OC Compare final circuit timing and loop traversal (design) times between the two methodologies
24
24 Design Loop Experiments About ~500 circuit changes, on average Design Loop time: 0.5 – 4 hours Optimization loop: 10 - 100 seconds SM-ASUASU-SM
25
25 Design Loop Experiments: SM-ASU-OC About ~1000-3000 circuit changes, on average Design Loop time: 1 – 10 hours Optimization loop: 20 - 200 seconds 300x Speedup
26
26 Outline Motivation Timing Metrics Slack Update Algorithm Experimental Evaluation Conclusions
27
27 Conclusions Slack update algorithm to speed up design loop in TLM-based workflows Use slack to track system-level timing changes Orders of magnitude reduction in design time at negligible loss in accuracy New optimizations loop enables scalability in iterative system design flows
Similar presentations
© 2025 SlidePlayer.com Inc.
All rights reserved.