System-level Exploration for Pareto- optimal Configurations in Parameterized Systems-on-a-chip Architectures Tony Givargis (Frank Vahid, Jörg Henkel) Center.

1 System-level Exploration for Pareto- optimal Configurations in Parameterized Systems-on-a-chip Architectures Tony Givargis (Frank Vahid, Jörg Henkel) Center for Embedded Computer Systems University of California Irvine, CA 92697 givargis@ics.uci.edu

2 2 Overview Given: –Parameterized SOC architecture (SOC: CPU, Memory, JPEG CODEC, Math/FPU, UART, I$-D$, BRIDGE; cache parameters Size = {1K, 4K, 8K}, Line = {4, 8, 16}, Assoc = {1, 2, 4}) –Fixed application: void main(){ while(1){ Receive(); Decode(); Display(); } } Explore: –Automatically explore the design space –Find optimal points w/respect to power and performance

3 3 Motivation Design trends: –Growing demand for portable devices –Growing demand for low power design –Increased application complexity –Shrinking time-to-market windows Technology trends: –Increased chip capacity –Increased I/O pins –Improved on-chip integration techniques (storage, digital, analog, digital, …) –SOC era Need for greater designer productivity!

4 4 Motivation One approach: reuse of existing IP –IP selection? –IP integration? –SOC verification? –Multi-source IP licensing –More… (SOC: CPU, Memory, JPEG CODEC, Math/FPU, UART, MMX, BRIDGE; candidate IP shown: MIPS, RAM, JPEG CODEC1, Math/FPU, UART, ISA BRIDGE, ARM, SRAM, DRAM, AMBA BRIDGE, JPEG CODEC2, USB)

5 5 Motivation Alternate approach: reuse of SOC –Designed, integrated, tested –Domain specific –Parameterized Designed by firms specializing in SOC User: map application, then, “configure-and-execute” (successors to microcontrollers!) Parameterized SOC CPU Memory JPEG CODEC Math/FPU UART MMX BRIDGE

6 6 Motivation Composed of 100s of cores Cores are “configurable” Configurations impact power/performance Large number of total configurations! Architecture is otherwise fixed! Parameterized SOC CPU Memory JPEG CODEC Math/FPU UART MMX BRIDGE

7 7 Motivation ATI Technologies – XILLEON™ 220 SOC for Digital Set-top Box Market Tensilica – Xtensa™ 1040 configurable processor cores Philips Semiconductors – Velocity RSP9™ SOC platforms Adelante Technologies – offers complete SOC customizable platforms for DSP domains More…

8 8 Outline Previous work Target architecture Power/performance estimation Parameter space exploration Experiments Conclusion

10 10 Previous Work Parameterized SOC design –[Malik00], [Veidenbaum99], [Vahid99], [Stan95] Power/performance evaluation –[Barndolese00], [Simunic99], [Li98], [Tiwari94] Design space exploration (manual) –[givargis99], [Lieverse99] Design space exploration (automatic) –Focus of this work…

11 11 Previous Work Y-chart [Lieverse99]: Architecture, Application, Mapping, Analysis, Numbers (Auto)

12 12 Outline Previous work Target architecture Power/performance estimation Parameter space exploration Experiments Conclusion

13 13 Target Architecture UART MIPS I-Cache D-Cache Bridge Peripheral Bus DCT CODEC Memory DMA

14 14 Target Architecture Voltage scale Size, line, associativity Bus width, encoding (gray, invert) UART tx/rx buffer size DCT resol. UART MIPS I-Cache D-Cache Bridge Peripheral Bus DCT CODEC Memory DMA

19 19 Target Architecture 26 parameters, 10^14 configurations. What are the optimal configurations (given a fixed application)? UART MIPS I-Cache D-Cache Bridge Peripheral Bus DCT CODEC Memory DMA

20 20 Problem Summary What are the possible power/performance tradeoffs? (100 trillion) → Need to efficiently evaluate power/performance (1/sec → 150,000 years) → Need to explore the configuration space Parameterized SOC CPU Memory JPEG CODEC Math/FPU UART MMX BRIDGE

21 21 Outline Previous work Target architecture Power/performance estimation Parameter space exploration Experiments Conclusion

22 22 Power Evaluation Exploration works with: –Chip instrumentation (real-time) –System-level simulation –RTL simulation –Gate-level simulation –Circuit-level simulation Relative accuracy required! Digital camera application mapped on our SOC, capturing 1 image; values shown on the slide: 1, 440, 5400, 28800, 180000

24 24 Power Evaluation - Processor [Tiwari94/00]’s instruction-level Measure watt/inst Account for stalls + dependency Apply traces UART MIPS I-Cache D-Cache Bridge Peripheral Bus DCT CODEC Memory DMA
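The instruction-level idea on this slide can be illustrated with a small sketch. It assumes a per-instruction energy table and a fixed stall cost; the table, the numbers, and the names `ENERGY_NJ`, `STALL_NJ`, and `trace_energy` are hypothetical placeholders, not measurements or code from the presentation, and inter-instruction dependency effects are left out.

```python
# Hypothetical per-instruction energy table (nanojoules); the numbers are
# placeholders, not measurements from the presentation.
ENERGY_NJ = {"add": 1.0, "load": 2.5, "mul": 3.0}
STALL_NJ = 0.5  # assumed energy of one stall cycle


def trace_energy(trace):
    """Sum base instruction energy plus stall energy over a trace.

    `trace` is a list of (opcode, stall_cycles) pairs.
    """
    return sum(ENERGY_NJ[op] + stalls * STALL_NJ for op, stalls in trace)


# A made-up four-instruction trace.
trace = [("load", 2), ("add", 0), ("mul", 1), ("add", 0)]
print(trace_energy(trace))
```

Once the table exists, evaluating a configuration costs one pass over the trace, which is what makes trace-driven estimation fast enough for exploration.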

25 25 Power Evaluation – Cache/Mem. [Evans95] Capacitance model of sub-components Switching obtained via simulation (parameter dependent) UART MIPS I-Cache D-Cache Bridge Peripheral Bus DCT CODEC Memory DMA

26 26 Power Evaluation – Buses [Chern92] Model bus capacitance Switching derived from I/O traffic (parameter dependent) UART MIPS I-Cache D-Cache Bridge Peripheral Bus DCT CODEC Memory DMA
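Dynamic bus power grows with the number of bit transitions on the wires, so the encoding parameter changes the switching count for the same traffic. The sketch below counts transitions for a binary and a Gray-coded stream; the sequential address stream and the helper names `to_gray` and `toggles` are assumptions chosen for illustration.

```python
def to_gray(value):
    """Binary-reflected Gray code of a non-negative integer."""
    return value ^ (value >> 1)


def toggles(words):
    """Count bit transitions between consecutive words on a bus."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


addresses = list(range(16))  # assumed sequential address stream
binary_toggles = toggles(addresses)
gray_toggles = toggles([to_gray(a) for a in addresses])
print(binary_toggles, gray_toggles)
```

For sequential traffic Gray code flips exactly one bit per step, so it switches less than plain binary; for random traffic the advantage largely disappears, which is why the switching has to be derived from the actual I/O traffic.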

27 27 Power Evaluation – Peripherals Observation: cores execute instructions! → Apply a technique similar to that used for processors! UART MIPS I-Cache D-Cache Bridge Peripheral Bus DCT CODEC Memory DMA

28 28 Power Evaluation – Summary UART (5%) MIPS (10%) I-Cache (8%) D-Cache (8%) Bridge (5%) Peripheral Bus DCT CODEC (5%) Memory (8%) DMA (5%) ~50-100K instruction/second! (Platune)

29 29 Outline Previous work Target architecture Power/performance estimation Parameter space exploration Experiments Conclusion

30 30 Exploration Problem formulation: parameters P_1, P_2, …, P_n. A configuration (point) is an assignment of values to all parameters. How to efficiently generate all Pareto-optimal configurations?
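A configuration as defined here is one value per parameter, so the full space is the Cartesian product of the parameter ranges. A minimal sketch, using the cache ranges from the Overview slide; the names `CACHE_PARAMS` and `configurations` are hypothetical.

```python
from itertools import product

# Cache parameter ranges as listed on the Overview slide.
CACHE_PARAMS = {
    "size": ["1K", "4K", "8K"],
    "line": [4, 8, 16],
    "assoc": [1, 2, 4],
}


def configurations(params):
    """Yield every assignment of one value to each parameter."""
    names = list(params)
    for values in product(*(params[name] for name in names)):
        yield dict(zip(names, values))


configs = list(configurations(CACHE_PARAMS))
print(len(configs))  # 3 * 3 * 3 = 27
```

Three parameters already give 27 points; the count multiplies with every added parameter, which is why the full architecture cannot be enumerated this way.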

31 31 Exploration Algorithm Idea (directed graph over parameters A (10 values), B (32 values), C (32 values)): exhaustive search of A, B, C: 10 * 32 * 32 = 10240 points. A and B interdependent: 10 * 32 = 320 points. A and C are independent: 10 + 32 = 42 points. C and B are independent: 32 + 32 = 64 points. Result: 138 points. With knowledge about dependency we prune 98.6%.

32 32 Exploration A → B: Pareto-optimal configurations of B calculated after Pareto-optimal configurations of nodes along the path A → B. A → B → A (cycle): Pareto-optimal configurations of all the parameters on the cycle calculated simultaneously. A: Pareto-optimal configurations calculated in isolation.

33 33 Exploration A B C D J K E F G H I N O L M R S P Q V W T U X Y Z Dependency Graph

34 34 Exploration Dependency graph (nodes A through Z): based on designer knowledge; computed by simulating all pairs of nodes (approx. quadratic time complexity); one-time effort

35 35 Exploration – Algorithm Step 1: Clustering followed by simulation A B C D J K E F G H I N O L M R S P Q V W T U X Y Z
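Since parameters on a cycle must be explored together, one plausible reading of the clustering step is that each cluster is a strongly connected component of the dependency graph. The sketch below rests on that assumption; the small graph and the names `reachable` and `clusters` are hypothetical, and the simulation that follows clustering is omitted.

```python
def reachable(graph, start):
    """Set of nodes reachable from `start` (including itself)."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def clusters(graph):
    """Group nodes that can reach each other (strongly connected components)."""
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    reach = {n: reachable(graph, n) for n in nodes}
    result, assigned = [], set()
    for n in sorted(nodes):
        if n in assigned:
            continue
        # m belongs with n when each can reach the other.
        group = {m for m in reach[n] if n in reach[m]}
        assigned |= group
        result.append(sorted(group))
    return result


# Hypothetical graph: A and B depend on each other, C is isolated,
# and D is reached from A but has no edge back.
graph = {"A": ["B", "D"], "B": ["A"], "C": [], "D": []}
print(clusters(graph))
```

The reachability-based grouping is quadratic in the number of nodes, which is acceptable for a graph of a few dozen parameters; a linear-time algorithm such as Tarjan's would be the usual choice for larger graphs.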

36 36 Exploration – Algorithm Step 2: Pair-wise merge followed by simulation. Clusters: {A,H,I}, {B,C,D,E,F,G}, {J,K,T,U}, {L,M,P,Q}, {N,O,V,W}, {X,Y,R,S}, {Z} → {A,H,I,B,C,D,E,F,G}, {J,K,T,U,Z}, {L,M,P,Q,N,O,V,W}, {X,Y,R,S} → {A,H,I,B,C,D,E,F,G,J,K,T,U,Z}, {L,M,P,Q,N,O,V,W,X,Y,R,S} → {A,H,I,B,C,D,E,F,G,J,K,T,U,Z,L,M,P,Q,N,O,V,W,X,Y,R,S}
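The merge schedule can be sketched as repeated pairing of clusters until one remains. The function name `merge_rounds` is hypothetical, the cluster order is chosen here so that adjacent pairing reproduces the grouping on the slide (how the original tool picks pairs is not stated), and the simulation performed after each merge is omitted.

```python
def merge_rounds(clusters):
    """Repeatedly merge adjacent pairs until one cluster remains.

    Returns the list of cluster lists, one per level.
    """
    levels = [clusters]
    while len(clusters) > 1:
        merged = [
            "".join(clusters[i:i + 2])  # a trailing odd cluster passes through
            for i in range(0, len(clusters), 2)
        ]
        levels.append(merged)
        clusters = merged
    return levels


# Clusters from the slide, written as strings of parameter letters.
start = ["AHI", "BCDEFG", "JKTU", "Z", "LMPQ", "NOVW", "XYRS"]
for level in merge_rounds(start):
    print(len(level), level)
```

Seven clusters shrink to four, then two, then one, so the number of merge levels grows only logarithmically with the number of clusters.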

37 37 Exploration Exhaustive solution Evaluate all points Sort by decreasing execution time Walk through the space, eliminate points with power > minimum seen so far! Substitute heuristics (only works for 1-4 parameters!)
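The sort-and-walk filter can be sketched as below. This sketch sorts by increasing execution time, which is an assumption: in that order every previously visited point is at least as fast, so a point whose power is not below the minimum seen so far is dominated. The function name `pareto_front` and the sample points are hypothetical.

```python
def pareto_front(points):
    """Return the points not dominated in (exec_time, power); lower is better.

    `points` is a list of (exec_time, power) tuples.
    """
    front = []
    min_power = float("inf")
    # Sorting by time, then power, means every earlier point is at least as fast.
    for time, power in sorted(points):
        if power < min_power:  # strictly lower power than anything faster
            front.append((time, power))
            min_power = power
    return front


# Made-up (execution time, power) measurements.
points = [(10, 5.0), (12, 4.0), (11, 6.0), (15, 4.5), (9, 7.0), (12, 3.5)]
print(pareto_front(points))
```

After the sort the walk is a single linear pass, so the cost of the exhaustive step is dominated by evaluating the points, not by filtering them.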

38 38 Exploration Complexity: O((K + log(K)) * 2^(N/K)) K is the number of clusters N is the number of parameters 2^(N/K) bounds the exhaustive comp. (K + log(K)) bounds the number of iterations Worst case K=1, best case K=N 2^(N/K) decreases rapidly as K increases (e.g., 2^(26/2) + 2^(26/2) is much smaller than 2^26!)
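The example in parentheses can be checked with a few lines of arithmetic. The helper `exhaustive_bound` is a hypothetical name; it uses base 2 as the slide's bound does, counts only the summed per-cluster searches (not the full big-O expression), and assumes K divides N evenly.

```python
N = 26  # number of parameters in the target architecture


def exhaustive_bound(n, k):
    """Points searched when n parameters are split evenly into k clusters:
    k clusters of 2^(n/k) points each (assumes k divides n)."""
    return k * 2 ** (n // k)


print(exhaustive_bound(N, 1))   # one cluster: 2^26 = 67108864
print(exhaustive_bound(N, 2))   # two clusters: 2^13 + 2^13 = 16384
print(exhaustive_bound(N, 26))  # 26 clusters: 26 * 2 = 52
```

Splitting the 26 parameters into just two clusters shrinks the search by a factor of 4096.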

39 39 Outline Previous work Target architecture Power/performance estimation Parameter space exploration Experiments Conclusion

40 40 Exploration – Results JPEG Exploration time: 29.1 min Config. visited: 12352 (141) 5.10x exe. time 7.51x power 2.73x energy Pruning ratio > 0.99997

41 41 Exploration – Results CKEY Exploration time: 108 min Config. visited: 15890 (223) 8.31x exe. time 6.08x power 2.57x energy Pruning ratio > 0.99993

42 42 Exploration – Results IMAGE Exploration time: 50.2 min Config. visited: 10135 (80) 8.29x exe. time 8.57x power 1.81x energy Pruning ratio > 0.99998

43 43 Exploration – Results MATRIX Exploration time: 73.6 min Config. visited: 12623 (84) 10.7x exe. time 8.16x power 3.18x energy Pruning ratio > 0.99997

44 44 Exploration – Results JPEG

45 45 Conclusion Gave a system-level algorithm for exploring the solution space of an application mapped to a parameterized SOC architecture –Given a dependency graph we extensively prune the solution space –Pruning ratio > 0.99997 in experiments Future work: –Automatically compute the dependency model –Replace the exhaustive sub-algorithm with a heuristic (e.g., gradient search, GA)

