Download presentation

Presentation is loading. Please wait.

Published byArmani Dobie Modified about 1 year ago

1
Improved On-Chip Analytical Power and Area Modeling Andrew B. Kahng Bill Lin Kambiz Samadi University of California, San Diego January 20, 2010

2
Outline Motivation Implementation Flow and Design of Experiments Modeling Methodology Modeling Problem Power Modeling Area Modeling Experimental Results and Discussion Conclusions

3
Motivation Many-core chip NoCs needed to interconnect many-core chips Power-efficiency of NoCs is important Performance was the primary concern Now power efficiency is critical 28% of total power in Intel 80-core Teraflops chip is due to interconnection networks (routers + links); Need rapid power estimation to trade off alternative architectures Rapid power-area tradeoffs at the architectural level

4
Related Work Real-chip power measurements (Isci et al. ’03) RTL-level NoC power estimations (A. Banerjee et al. ’07 and N. Banerjee et al. ’04) Simulation time is slow Requires detailed RTL Architectural-level power estimation Interconnection network (Patel et al. ’97); model is not instantiated with architectural parameters not suitable to explore tradeoffs in router microarchitecture Uniprocessor power modeling (Wattch: Brooks et al. ’00 and SimplePower: Ye et al. ’00) ORION models Recently enhanced (i.e., ORION 2.0: Kahng et al. ’09) Early-stage design space exploration

5
Gaps in Previous Works (1) Dependence on underlying architecture / circuit implementation Developed from a mix of template architectures / circuit implementations (cf. ORION 2.0) Not accurate within an architecture-specific CAD flow Useful for early-stage estimations (e.g., complementary to our approach) Power and area estimations via parametric regression (Meloni et al. ‘07) Regression process assumes certain functional forms depends on the underlying architecture / circuit implementation Does not consider implementation parameters (e.g., aspect ratio, etc.) Model accuracy reduces once used across different architectures / circuit implementations not suitable for efficient design space exploration

6
Gaps in Previous Works (2) Not considering the impact of microarchitectural details Parametric cycle-accurate traffic driven power models, without consideration of microarchitectural parameters (cf. NOCEE) Power model with limited dependency on microarchitectural parameters; derived from synthesis results Model applicability will significantly reduce w.r.t. energy design space explorations as the impact of contributing components are not fully realized Our Goal: Develop a modeling framework that: (1) is architecture-independent, i.e., can be ported in any NoC component library, and (2) considers all the relevant microarchitectural details to enhance its applicability for efficient design space exploration

7
Improved NoC Router Power-Area Models implementation parameters Target frequency Chip aspect ratio Row utilization architectural parameters # of ports; # of buffers flit-width; # of VC voltage, frequency interconnect parameters device parameters LEF/Capacitance Tables/etc. … technology parameters req I req E req W req N req S grant I grant E grant W grant N grant S Arbiter out E out W out N out S in I in E in W in N in S out I Crossbar Buf E Buf W Buf N Buf S Buf I Link Source Link Source Write Control Request Signals Built using router layout data Closed-form models suitable for design space exploration Provides significant accuracy improvement compared with existing models (e.g., ORION 2.0)

8
Outline Motivation Implementation Flow and Design of Experiments Modeling Methodology Modeling Problem Power Modeling Area Modeling Experimental Results and Discussion Conclusions

9
Implementation Flow and Tools RTL generation from architecture Timing-driven synthesis, place and route flow Use range of architectural and implementation parameters to capture design space Nonparametric regression modeling Architectural Parameters Implementation Parameters Router RTL (Netmaker) Power / Area Reports Power and Area Models Model Generation (Multiple Adaptive Regression Splines) Place + Route (SOC Encounter) Synthesis (Design Compiler)

10
Design of Experiments Netmaker (Cambridge) fully synthesizable router RTL codes Libraries: TSMC (1) 130G, (2) 90GP, and (3) 65GP Tool Chain: Synopsys Design Compiler (DC), Cadence SOC Encounter (SOCE), Salford MARS 3.0 Experimental aces: Technology nodes: {130nm, 90nm, 65nm} Implementation parameters: f clk = target clock frequency ar = aspect ratio util = row utilization Architectural parameters: fw = flit-width n vc = number of virtual channels n port = number of input/output ports l buf = buffer length (#flit buffers / VC)

11
Outline Motivation Implementation Flow and Design of Experiments Modeling Methodology Modeling Problem Power Modeling Area Modeling Experimental Results and Discussion Conclusions

12
Modeling Problem Accurately predict y given vector of parameters x Difficulties: (1) which variables x to use, and (2) how different variables combine to generate y Parametric regression: requires a functional form Nonparametric regression: learns about the best model from the data itself For our purpose, allows decoupling of underlying architecture / implementation from modeling effort We use nonparametric regression to model power and area of an on-chip router → →

13
Multivariate Adaptive Regression Splines (MARS) MARS is a nonparametric regression technique MARS builds models of form: Each basis function B i (x) can be: a constant a “hinge” function max(0, c-x) or max(0, x-c) a product of two or more hinge functions Two modeling steps: (1) forward pass: obtains model with defined maximum number of terms (2) backward pass: improves generality by avoiding an overfit model → → ^

14
Power and Area Modeling Derive models for both dynamic and leakage power Dynamic power is due to switching capacitance (c switching ) P dynamic = 0.5×α×c switching ×V 2 ×f clk Leakage power is due to leakage current (i leak ) (subthreshold + gate) P leakage = i leak ×V Our modeling task: To model dependence of (P dynamic / α×V 2 ×f clk ) on microarchitectural and implementation parameters To model dependence of (P leakage / V) on microarchitectural and implementation parameters Similarly, we model dependence of extracted area on microarchitectural and implementation parameters Area is the sum of standard cell area

15
Example MARS Output Models (1) B 1 = max(0, n port - 5); B 2 = max(0, 5 – n port ); … B 34 = max(0, f clk - 200)×B 1 ; B 35 = max(0, f clk ) B 1 ; P dynamic = 0.5×α×( ×B ×B ×B 3 … ×B ×B ×B 35 )×V 2 B 1 = max(0, n port - 5); B 2 = max(0, 5 - n port ); … B 34 = max(0, n vc - 3)×B 27 ; B 35 = max(0, 3 - n vc )×B 27 ; P leakage = ( ×B ×B ×B 3 … E-5×B E-5×B 35 )×V Dynamic power model of a router in 65nm technology Leakage power model of a router in 65nm technology

16
Example MARS Output Models (2) Area model of a router in 65nm technology Total wirelength model of a router in 65nm technology (NEW) B 1 = max(0, n port - 5); B 2 = max(0, 5 - n port ); … B 34 = max(0, 24 - fw)×B 14 ; B 35 = max(0, f clk - 100)×B 15 ; Area = ×B ×B ×B 3 … E-6×B E-7×B 35 B 1 = max(0, n port - 5); B 2 = max(0, 5 - n port ); … B 33 = max(0, 1 - ar)×B 26 ; B 34 = max(0, util - 0.7)×B 8 ; WL total = ×B ×B 2 … ×B ×B 34 Closed-form expressions with respect to architectural and implementation parameters Suitable to drive early-stage architecture-level design exploration

17
Outline Motivation Implementation Flow and Design of Experiments Modeling Methodology Modeling Problem Power Modeling Area Modeling Experimental Results and Discussion Conclusions

18
Model Comparison (1) We compare our models against (1) models derived from parametric regression (Reg.), and (2) ORION 2.0 models We assume baseline virtual channel (VC) with FIFO buffers implemented as flip-flop registers c switching ~ O(l buf ×fw×n port ); i leak ~ O(l buf ×fw×n port ×n vc ) Multiplexer tree crossbar c switching ~ O(n 2 port ×fw); i leak ~ O(n 2 port ×fw) VC “selection” arbitration (cf. Kumar et al. ’07) c switching ~ O(n 2 port ); i leak ~ O(n 2 port ×n vc ) Buffer dynamic power does not change with n vc since the number of flits arriving at each input port is the same Due to VC “selection” VC dynamic power becomes invariant to actual number of VCs Requires modeler to have knowledge about the underlying architecture / circuit implementation

19
Model Comparison (2) Comparison against ORION 2.0 w.r.t. microarchitectural parameters: (1) #VC (n vc ), (2) flit-width (fw), (3) #port (n port ), and (4) buffer length (l buf )

20
Model Comparison (3) Power estimation error reductions Reg.: avg error 76.2% (24.4% 5.8%), max error 45.2% (108.4% 59.4%) ORION 2.0: avg error 82.3% (32.8% 5.8%), max error 27.4% (81.8% %59.4) Area estimation error reductions Reg.: avg error 79.4% (26.2% %5.4), max error 45.5% (111.3% 61.8%) ORION 2.0: avg error 83.8% (33.3% 5.4%), max error 28.3% (86.2% 61.8%) Metric Power ModelArea Model NewReg.ORION2.0NewReg.ORION2.0 min % error 130nm nm nm max % error 130nm nm nm avg % error 130nm nm nm

21
Variable Importance Use MARS to identify relative variable importance using post- synthesis and post-layout data n vc and n port are dominant parameters for post-synthesis and post-layout data, respectively impact of missing layout information at post-synthesis Multiplexer crossbar power is due to (1) multiplexers and (2) interconnection grid between input / output ports With post-synthesis data interconnection data is ignored crossbar power is modeled as O( n port ×log 2 n port ) With post-layout crossbar power is modeled as O( n 2 port ) Parameter Variable Importance (%) Post-SynthesisPost-Layout n port n vc l buf fw

22
Model Robustness 256 different router configurations Five different scenarios to train / test our models (1) s tr = 1/2, (2) s tr = 1/3, (3) s tr = 1/5, (4) s tr = 1/10, and (5) s tr = 64 For (1)-(4) we train models using a fraction s tr of the available data points, and validate them on the rest of the data points To assess the sensitivity of the model to sample size For (5) we use 64 (out of 256) data points to train the model, and validate it across all 256 available data points To assess the generality of the model Metric Power Model s tr = 1/2s tr = 1/3s tr = 1/5s tr = 1/10s tr = 64 min % error max % error avg % error Area Model min % error max % error avg % error

23
Recent Extensions Have used same methodology to develop models for interconnect wirelength (WL) and fanout (FO) Wirelength model On average, within 3.4% of layout data 91% reduction of avg error vs. existing models (cf. Christie et al. ’00) Fanout model On average, within 0.8% of the layout data 96% reduction of avg error vs. existing models (cf. Zarkesh-Ha et al. ’00) (WL) (FO)

24
Outline Motivation Implementation Flow and Design of Experiments Modeling Methodology Modeling Problem Power Modeling Area Modeling Experimental Results and Discussion Conclusions

25
Conclusions and Future Directions Generally applicable modeling methodology that can leverage architectural parameters and RTL-to-layout implementation Achieved accurate power and area models for on-chip router Improvement over parametric regression models Power: 76.2% (45.2%) reduction of average (maximum) error Area: 79.4% (44.5%) reduction of average (maximum) error Improvement over ORION 2.0 Power: 82.3% (27.4%) reduction of average (maximum) error Area: 83.8% (28.3%) reduction of average (maximum) error Ongoing work Maximum f clk modeling w.r.t. architectural and implementation parameters Other architectural building blocks (DSP cores, DesignWare library, …) Power, performance and cost estimators for 3-D design space exploration

Similar presentations

© 2016 SlidePlayer.com Inc.

All rights reserved.

Ads by Google