Design Space Exploration with SimpleScalar. Vittorio Zaccaria – ST 2001 The SimpleScalar Toolset.


1 Design Space Exploration with SimpleScalar

2 Vittorio Zaccaria – ST 2001 The SimpleScalar Toolset

3 Vittorio Zaccaria – ST 2001 The SimpleScalar Toolset

4 Vittorio Zaccaria – ST 2001 Simulation Suite

5 Vittorio Zaccaria – ST 2001 SimpleScalar ISA. Clean and simple instruction set architecture: MIPS/DLX plus more addressing modes, minus delay slots. 64-bit instruction encoding facilitates instruction set research: 16-bit space for hints, new instructions, and annotations; four-operand instruction format, up to 256 registers.

6 Vittorio Zaccaria – ST 2001 SimpleScalar Architected State

7 Vittorio Zaccaria – ST 2001 Out of order simulator Configurable set of FUs

8 Vittorio Zaccaria – ST 2001 Configurable Memory Hierarchy. All cache and TLB configurations are specified with the same format: <name>:<nsets>:<bsize>:<assoc>:<repl>, where <repl> is the block replacement policy: l for LRU, f for FIFO, r for RANDOM.
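As an illustration of the configuration format, here is a minimal Python sketch that parses such a string. It assumes the five colon-separated fields are, in order, a name, the number of sets, the block size in bytes, the associativity, and the replacement policy, which matches strings such as DL1:128:32:4:L used in the case study. The names `CacheConfig` and `parse_cache_config` are hypothetical helpers, not part of SimpleScalar.

```python
from dataclasses import dataclass

# Replacement-policy letters as listed on the slide (matched case-insensitively here).
REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "RANDOM"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int   # number of sets
    bsize: int   # block size (assumed to be in bytes)
    assoc: int   # associativity (ways per set)
    repl: str    # "LRU", "FIFO" or "RANDOM"

    @property
    def size_bytes(self) -> int:
        # Capacity of a set-associative cache: sets * block size * ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse a 'name:nsets:bsize:assoc:repl' string (hypothetical helper)."""
    parts = [p.strip() for p in spec.split(":")]
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(parts)}: {spec!r}")
    name, nsets, bsize, assoc, repl = parts
    key = repl.lower()
    if key not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPLACEMENT[key])


if __name__ == "__main__":
    cfg = parse_cache_config("DL1:128:32:4:L")
    print(cfg, cfg.size_bytes)
```

Under these assumptions, DL1:128:32:4:L describes a 4-way cache of 128 sets with 32-byte blocks, i.e. 128 * 32 * 4 = 16384 bytes.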

9 Vittorio Zaccaria – ST 2001 Configurable Memory Hierarchy

10 Vittorio Zaccaria – ST 2001 Design Space Exploration. Metric definition: Energy*Delay, Area*Delay. Design space definition: L1 and L2 caches, number of ALUs, ... Embedded application definition. Metric minimization: exhaustive search, greedy search, gradient search, simulated annealing, and so on.

11 Vittorio Zaccaria – ST 2001 Design Space Exploration: A case study. Metric defined: Price over Performance (PoP) = area*CPI. Design space: sets, block size, associativity and replacement policy for each cache; number of integer ALUs; number of integer multipliers; number of floating-point ALUs; number of floating-point multipliers. Design space exploration performed by F. Cassoli and A. ALARI.

12 Vittorio Zaccaria – ST 2001 Design Space Definition. Ranges for each parameter: DL1:128:{32, 64}:4:L; IL1:{256, 512}:32:1:L; UL2:{1024, 2048}:{64, 128}:4:{L, F}; IALU:{2, 4}; IMULT:{1, 2, 4}; FPALU:{1, 4}; FPMULT:{1, 2}. This gives 2 * 2 * 8 * 2 * 3 * 2 * 2 = 768 different cases.
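The size of this design space can be checked by enumerating it. The sketch below expands the ranges from the slide with `itertools.product`; the dictionary layout and the name `enumerate_configs` are illustrative choices, not taken from the original work.

```python
from itertools import product

# Parameter ranges copied from the slide; each entry lists the allowed values.
DESIGN_SPACE = {
    "DL1": [f"DL1:128:{b}:4:L" for b in (32, 64)],
    "IL1": [f"IL1:{s}:32:1:L" for s in (256, 512)],
    "UL2": [
        f"UL2:{s}:{b}:4:{r}"
        for s in (1024, 2048)
        for b in (64, 128)
        for r in ("L", "F")
    ],
    "IALU": [2, 4],
    "IMULT": [1, 2, 4],
    "FPALU": [1, 4],
    "FPMULT": [1, 2],
}


def enumerate_configs(space):
    """Yield every combination of parameter values as a dict (hypothetical helper)."""
    names = list(space)
    for values in product(*(space[n] for n in names)):
        yield dict(zip(names, values))


configs = list(enumerate_configs(DESIGN_SPACE))
print(len(configs))  # 768
```

Each yielded dictionary is one candidate architecture, so an exhaustive search has to simulate all 768 of them.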

13 Vittorio Zaccaria – ST 2001 Embedded Application EPIC decoder (Efficient Pyramid Image deCoder) Image data compression utility written in C. Free Mediabench Source Based on wavelet decomposition and a Huffman entropy (de)coder.

14 Vittorio Zaccaria – ST 2001 Cost Function. F(x) = A(x)*D(x), where A(x) is the area of x (sum of the equivalent gates of each module, using models found in the literature) and D(x) is the delay of x (computed by simulating EPIC on architecture x).
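A minimal sketch of this cost function and of an exhaustive minimization over it follows. The area and delay models are not given on the slides, so they are passed in as caller-supplied functions; `pop_cost`, `exhaustive_search` and the toy numbers at the bottom are hypothetical and serve only to show the mechanics.

```python
from typing import Callable, Iterable, Tuple, TypeVar

X = TypeVar("X")


def pop_cost(x: X, area: Callable[[X], float], delay: Callable[[X], float]) -> float:
    """F(x) = A(x) * D(x); area and delay are supplied by the caller."""
    return area(x) * delay(x)


def exhaustive_search(
    candidates: Iterable[X],
    area: Callable[[X], float],
    delay: Callable[[X], float],
) -> Tuple[X, float]:
    """Evaluate every candidate and return the one with the lowest cost."""
    best = None
    for x in candidates:
        cost = pop_cost(x, area, delay)
        if best is None or cost < best[1]:
            best = (x, cost)
    if best is None:
        raise ValueError("no candidates to search")
    return best


# Toy data, NOT from the slides: three made-up architectures.
toy_area = {"small": 100.0, "medium": 120.0, "large": 400.0}
toy_delay = {"small": 2.0, "medium": 1.5, "large": 1.0}

best_x, best_cost = exhaustive_search(toy_area, toy_area.__getitem__, toy_delay.__getitem__)
print(best_x, best_cost)  # medium 180.0
```

In the case study, `delay` would wrap a full simulator run of EPIC, which is what makes each evaluation expensive.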

15 Vittorio Zaccaria – ST 2001 Result of the exhaustive search

16 Vittorio Zaccaria – ST 2001 Optimal Configuration. The lowest value of the PoP is 998’732.31, obtained with: DL1:128:32:4:L IL1:256:32:1:L UL2:1024:64:4:F IALU:4 IMULT:2 FPALU:4 FPMULT:2
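To show how such a configuration might be handed to the out-of-order simulator, the sketch below builds a `sim-outorder` argument list from a design point. The option names (`-cache:dl1`, `-cache:il1`, `-cache:dl2`, `-cache:il2`, `-res:ialu`, `-res:imult`, `-res:fpalu`, `-res:fpmult`) are the ones I recall from sim-outorder's option list and should be checked against the simulator's own help output. The helper `simoutorder_args` and the benchmark command are hypothetical, and the lowercasing of the cache strings is an assumption based on the lowercase policy letters (l, f, r) shown on the memory-hierarchy slide.

```python
from typing import Dict, List, Sequence, Union

# The optimal design point reported on the slide.
OPTIMAL = {
    "DL1": "DL1:128:32:4:L",
    "IL1": "IL1:256:32:1:L",
    "UL2": "UL2:1024:64:4:F",
    "IALU": 4,
    "IMULT": 2,
    "FPALU": 4,
    "FPMULT": 2,
}


def simoutorder_args(
    point: Dict[str, Union[str, int]],
    benchmark: Sequence[str],
) -> List[str]:
    """Build an argument list for sim-outorder (hypothetical helper)."""
    return [
        "sim-outorder",
        "-cache:dl1", str(point["DL1"]).lower(),
        "-cache:il1", str(point["IL1"]).lower(),
        "-cache:dl2", str(point["UL2"]).lower(),
        # Assumed: pointing the instruction L2 at "dl2" makes the L2 unified.
        "-cache:il2", "dl2",
        "-res:ialu", str(point["IALU"]),
        "-res:imult", str(point["IMULT"]),
        "-res:fpalu", str(point["FPALU"]),
        "-res:fpmult", str(point["FPMULT"]),
        *benchmark,
    ]


# "epic_decoder" and its input file are placeholders, not real file names.
args = simoutorder_args(OPTIMAL, ["epic_decoder", "input.E"])
print(" ".join(args))
```

The resulting list can be passed to `subprocess.run` once the flag names and benchmark command have been verified for the installed SimpleScalar version.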

17 Vittorio Zaccaria – ST 2001 Cost Function Properties. The difference between the PoPs for a DL1 cache with a block size of 32 and of 64 is very small. The difference between the PoPs for an IL1 cache of 256 and of 512 sets is very small.

18 Vittorio Zaccaria – ST 2001

19 Cost Function Properties. Increasing the number of sets of UL2 increases the PoP (on average). Increasing the block size of the UL2 cache always leads to an abrupt growth of the PoP: the L2 cache grows so much that it becomes significantly larger than the rest of the system.

20 Vittorio Zaccaria – ST 2001 Cost Function Properties

21 Vittorio Zaccaria – ST 2001 Cost Function Properties

22 Vittorio Zaccaria – ST 2001 Cost Function Properties

23 Vittorio Zaccaria – ST 2001 Area – CPI scatter plot

24 Vittorio Zaccaria – ST 2001 Conclusions. Doubling the number of integer ALUs reduces the PoP: a large benefit for a small area increase. The optimal configuration has IMULT = 2 (not 1 or 4, because EPIC does not expose much parallelism). However, FPALU = 4 leads to better results than FPALU = 1. Adding an FPMULT gives similar benefits. The L2 FIFO policy outperforms LRU.

25 Vittorio Zaccaria – ST 2001 Conclusions. A greedy algorithm has also been applied to minimize the cost function. Starting from different points: average number of simulations required = 49; minimum = 11; maximum = 83. The full-search optimum was always reached. Considering that an exhaustive search needs 768 simulations, the greedy search reduces exploration time by about 93.6% on average.
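The slides do not describe the greedy algorithm itself. One common variant, sketched here as an assumption rather than as the authors' method, is a coordinate-wise descent: starting from some point, change one parameter at a time, keep any change that lowers the cost, and stop when a full pass brings no improvement. Results are cached so that the number of distinct cost evaluations (simulations) can be counted. The names `greedy_search` and `toy_cost` are hypothetical, and the toy cost stands in for area*CPI.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

Point = Dict[str, Any]


def greedy_search(
    space: Dict[str, Sequence[Any]],
    start: Point,
    cost: Callable[[Point], float],
) -> Tuple[Point, float, int]:
    """Coordinate-wise greedy descent; returns (best point, best cost, evaluations)."""
    cache: Dict[tuple, float] = {}

    def evaluate(point: Point) -> float:
        # Each distinct point is "simulated" once; repeats hit the cache.
        key = tuple(point[name] for name in space)
        if key not in cache:
            cache[key] = cost(point)
        return cache[key]

    current = dict(start)
    best = evaluate(current)
    improved = True
    while improved:
        improved = False
        for name, values in space.items():
            for value in values:
                if value == current[name]:
                    continue
                trial = {**current, name: value}
                trial_cost = evaluate(trial)
                if trial_cost < best:
                    current, best, improved = trial, trial_cost, True
    return current, best, len(cache)


# Toy problem, NOT from the slides: a made-up separable cost with minimum at a=2, b=1.
toy_space = {"a": [1, 2, 4], "b": [1, 2]}


def toy_cost(p: Point) -> float:
    return (p["a"] - 2) ** 2 + (p["b"] - 1) ** 2 + 1


point, value, n_evals = greedy_search(toy_space, {"a": 4, "b": 2}, toy_cost)
print(point, value, n_evals)
```

A descent of this kind is only guaranteed to reach a local minimum, so reaching the full-search optimum from every starting point, as reported on the slide, is a property of this particular cost function. With the slide's figures, an average of 49 simulations out of 768 corresponds to a saving of 1 - 49/768, or about 93.6%.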

