Exploring Design Space of VLIW Architectures
Giuseppe Ascia, Vincenzo Catania, Maurizio Palesi and Davide Patti
Dipartimento di Ingegneria Informatica e delle Telecomunicazioni (DIIT), Università di Catania, Italy
ASAP 2005, Samos, Greece, July 23-25, 2005
Outline
- Introduction
- VLIW in past & future
- Design Exploration Framework
- ILP oriented compilation
- Genetic Design Space Exploration
- Conclusions
Instruction Level Parallelism
- High-performance processors in the 1980s: maximize ILP
- Issue more than one instruction in a given clock cycle
- Who decides which instructions can be executed in parallel?
- Two different philosophies:
  - Superscalar
  - Very Long Instruction Word (VLIW)
ILP philosophy: Superscalar
- Hide the process of finding ILP
- ILP is discovered dynamically at run time by the control hardware of the processor
- [Figure: Foo.c is compiled into a sequential instruction stream (Op1 Op2 Op3 Op4 Op5 …); at run time the hardware groups it into parallel issues (Op1,Op2 | Op3 | Op4,Op5 | …)]
ILP philosophy: VLIW
- Hardware resources are architecturally visible to the compiler
- The compiler can create a sequence of Very Long Instructions that defines the plan of execution
- The hardware simply executes the plan
- [Figure: the compiler takes Foo.c and the hardware resources configuration and produces the plan of execution (Op1,Op2 | Op3 | Op4,Op5), which the hardware runs at run time]
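To make the idea of a compile-time plan of execution concrete, here is a minimal Python sketch of greedy scheduling that packs independent operations into long instructions. The dependency graph, the issue width of 2 and the function name `schedule_bundles` are hypothetical, chosen only so that the result matches the grouping shown on the slide; this is not the scheduler used by the authors' toolchain.

```python
def schedule_bundles(ops, deps, width):
    """Greedily pack ready operations into long instructions.

    ops   -- operation names in program order
    deps  -- dict: operation -> set of operations it depends on (assumed)
    width -- maximum operations per long instruction (assumed issue width)
    """
    done, bundles = set(), []
    remaining = list(ops)
    while remaining:
        # An operation is ready once all its producers were issued earlier.
        ready = [op for op in remaining if deps.get(op, set()) <= done]
        if not ready:
            raise ValueError("cyclic dependencies")
        bundle = ready[:width]
        bundles.append(bundle)
        done.update(bundle)
        remaining = [op for op in remaining if op not in bundle]
    return bundles


# Hypothetical dependencies: Op3 needs Op1 and Op2; Op4 and Op5 need Op3.
deps = {"Op3": {"Op1", "Op2"}, "Op4": {"Op3"}, "Op5": {"Op3"}}
plan = schedule_bundles(["Op1", "Op2", "Op3", "Op4", "Op5"], deps, width=2)
print(plan)  # [['Op1', 'Op2'], ['Op3'], ['Op4', 'Op5']]
```

In a superscalar design the equivalent grouping would be found by the hardware at run time; here it is fixed before execution.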
VLIW past & future
- Decline of VLIWs for general-purpose systems:
  - Couldn't be integrated in a single chip
  - Lack of binary compatibility between implementations
- Rediscovery of VLIW in embedded systems:
  - No more integrability issues
  - Binary incompatibility not relevant
- Advantages of VLIW:
  - Simplified hardware
  - The architecture can be optimized ad hoc to achieve ILP
Reference architecture (HPL-PD)
- [Figure: block diagram of the reference architecture. Blocks: L2 unified cache, prefetch cache, prefetch unit, fetch unit, instruction queue, decode and control logic; predicate, branch, general purpose, floating point and control registers; load/store, branch, integer and floating point units; L1 data cache, L1 instruction cache]
Configuration Space
- Three main parameter categories:
  - VLIW core:
    - Number of registers in each register file (from 16 to 256)
    - Number of instances of functional units of each type (from 1 to 6)
  - Memory hierarchy:
    - Size, block size and associativity for each of the caches (L1 instruction, L1 data, L2)
  - Compiler:
    - Conservative compilation strategy (basic blocks)
    - Aggressive ILP-oriented compilation strategy (hyperblocks)
- Total space size: 1.47 x ... configurations!
An Open Platform: EPIC Explorer
- Interfaces with the Trimaran framework, which provides a VLIW compiler and a simulator for dynamic statistics
- Estimator component implementing high-level models
- Explorer component implementing multi-objective design space exploration algorithms
The Exploration Data Flow
- [Figure: exploration data flow. Blocks: Foo.c, IMPACT, ELCOR, Emulib, foo.exe, execution statistics, Estimator (energy, power, cycles), Explorer, system configuration (processor, memory)]
Energy estimation
- Subdivide the architecture into Functional Block Units (FBUs)
  - Instruction decode logic, integer units, floating point units, register files
- For each FBU (data from ST Microelectronics LX):
  - Active power: average power dissipated when the FBU is used
  - Inactive power: average power dissipated when the FBU is not used
- From the execution statistics, we know how many cycles each FBU has been active/inactive
- $E_{FBU} = (P_{active} \cdot cycles_{active} + P_{inactive} \cdot cycles_{inactive}) \cdot T_{clock}$
- Fair degree of accuracy (about 25%): suitable for investigating relative power savings between designs
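The per-FBU energy formula above can be sketched in Python as follows. The power figures, cycle counts and clock period are made-up placeholders, not values from the ST Microelectronics LX characterization, and the names `fbu_energy`, `power_table` and `activity` are illustrative.

```python
def fbu_energy(p_active, p_inactive, cycles_active, cycles_inactive, t_clock):
    """Energy of one FBU in joules:
    (P_active * cycles_active + P_inactive * cycles_inactive) * T_clock."""
    return (p_active * cycles_active + p_inactive * cycles_inactive) * t_clock


# Hypothetical per-FBU powers in watts: (active, inactive).
power_table = {"integer_unit": (0.020, 0.002), "register_file": (0.030, 0.005)}
# Hypothetical execution statistics: (active cycles, inactive cycles).
activity = {"integer_unit": (800, 200), "register_file": (600, 400)}
T_CLOCK = 5e-9  # assumed clock period in seconds (200 MHz)

energy = {
    name: fbu_energy(*power_table[name], *activity[name], T_CLOCK)
    for name in power_table
}
total_energy = sum(energy.values())
print(energy, total_energy)
```

Summing the per-FBU energies gives a core-level estimate; average power would follow by dividing by the total execution time.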
Reference Application Set
- Chosen from the MediaBench suite

Application     Category
G721 encode     Voice compression
Gsm encode      Speech transcoding
Gsm decode      Speech transcoding
Ieee 810        IEEE 1180 inverse DCT
JPEG            Image compression
MPEG2 decode    Video decoding
ADPCM encode    Speech encoding
ADPCM decode    Speech decoding
Fir             FIR filter
Exploration Methodology
- Preliminary analysis of compilation
  - Impact of ILP-oriented code transformations
  - Predict the right compilation strategy:
    - Basic blocks (conservative)
    - Hyperblocks (aggressive, ILP-oriented)
- Multi-objective Design Space Exploration
  - Extract the Pareto set
Preliminary Analysis (1/3)
- For each objective, an unpaired two-sample t-test estimates the average effect of hyperblock formation
- Is the mean effect on the objective significant with respect to the chosen critical difference?
- [Figure: random subsets of n configurations (C_N, C_H) are drawn from the configuration space and compiled without (N) and with (H) hyperblock formation; the resulting objective values (O_N, O_H) feed the t-test]
Preliminary Analysis (2/3)
- Example of a metric for the critical difference in means: d > 50% M
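The comparison of the two compilation strategies might be sketched as below. This is only an illustration under assumptions: it uses the Welch form of the unpaired two-sample t statistic (the slides do not say which variant was used), it reads M as the mean of the sample compiled without hyperblocks (the slides do not define M), and the objective values are invented.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_n, sample_h):
    """Unpaired two-sample t statistic, Welch form (unequal variances)."""
    var_n = variance(sample_n) / len(sample_n)
    var_h = variance(sample_h) / len(sample_h)
    return (mean(sample_n) - mean(sample_h)) / sqrt(var_n + var_h)


def exceeds_critical_difference(sample_n, sample_h, fraction=0.5):
    """True if the means differ by more than `fraction` of the N-sample mean
    (assumed interpretation of 'd > 50% M')."""
    d = abs(mean(sample_n) - mean(sample_h))
    return d > fraction * mean(sample_n)


# Invented objective values (e.g. cycle counts in thousands) for n = 5
# random configurations, compiled without (N) and with (H) hyperblocks.
o_n = [100, 110, 90, 105, 95]
o_h = [40, 45, 50, 42, 48]

t_stat = welch_t(o_n, o_h)
significant = exceeds_critical_difference(o_n, o_h)
print(t_stat, significant)
```

A large t statistic together with a difference above the critical threshold would suggest fixing the compilation strategy before the exploration; otherwise the strategy stays a free parameter.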
DSE: Genetic Mapping
- [Figure: the system (VLIW core, cache, bus controller, memory) is mapped onto a chromosome with genes: Size, BSize, Assoc, Func units, Register files]
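One possible encoding of such a chromosome is sketched below in Python. It is simplified to a single cache, one functional unit type and one register file, whereas the slides describe one gene group per cache, unit type and register file. The register and functional unit ranges follow the configuration space slide; the specific cache values and the power-of-two register steps are assumptions.

```python
import random
from dataclasses import dataclass

# Candidate values per gene. Cache values are illustrative assumptions.
GENE_VALUES = {
    "cache_size": [1024, 2048, 4096, 8192],  # bytes (assumed)
    "block_size": [16, 32, 64],              # bytes (assumed)
    "assoc": [1, 2, 4],                      # ways (assumed)
    "func_units": [1, 2, 3, 4, 5, 6],        # instances, from 1 to 6
    "registers": [16, 32, 64, 128, 256],     # from 16 to 256
}


@dataclass(frozen=True)
class Chromosome:
    cache_size: int
    block_size: int
    assoc: int
    func_units: int
    registers: int


def random_chromosome(rng):
    """Draw one architecture configuration uniformly from the gene values."""
    return Chromosome(**{g: rng.choice(v) for g, v in GENE_VALUES.items()})


individual = random_chromosome(random.Random(0))
print(individual)
```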
DSE: Genetic Iteration
- [Figure: genetic iteration loop. Each individual of the current population is an architecture configuration; fitness evaluation uses simulation and estimation (performance, power); selected individuals produce descendants through crossover and mutation, yielding new architecture configurations]
DSE: Experimental Results
- Parameters:
  - Initial population: 30 individuals
  - Crossover probability: 0.8
  - Mutation probability: 0.1
  - Generations: 50
- Example of two different scenarios:
  - G721 encode: exploration should include the exploration of the compilation strategy
  - Gsm-encode: hyperblock formation is predicted to be a better choice
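A toy genetic loop using the parameters listed above could look like the following Python sketch. Everything except the four parameter values is an assumption: the two-gene chromosome, the single-objective `toy_cost` function (a stand-in for simulation plus estimation, whereas the actual exploration is multi-objective), binary tournament selection, one-point crossover, per-gene mutation and elitism.

```python
import random

POP_SIZE, P_CROSSOVER, P_MUTATION, GENERATIONS = 30, 0.8, 0.1, 50
# Two genes only, for brevity: register count and functional unit count.
GENE_CHOICES = [[16, 32, 64, 128, 256], [1, 2, 3, 4, 5, 6]]


def toy_cost(ind):
    """Placeholder for simulation + estimation; lower is better."""
    registers, units = ind
    return 1000 / (registers ** 0.5 * units) + 0.05 * registers + 2 * units


def tournament(pop, rng):
    """Binary tournament selection (assumed selection scheme)."""
    a, b = rng.sample(pop, 2)
    return a if toy_cost(a) <= toy_cost(b) else b


def evolve(rng):
    pop = [[rng.choice(c) for c in GENE_CHOICES] for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        nxt = [min(pop, key=toy_cost)]  # elitism: keep the best individual
        while len(nxt) < POP_SIZE:
            p1, p2 = tournament(pop, rng), tournament(pop, rng)
            child = list(p1)
            if rng.random() < P_CROSSOVER:
                cut = rng.randrange(1, len(p1))  # one-point crossover
                child = p1[:cut] + p2[cut:]
            for i, choices in enumerate(GENE_CHOICES):
                if rng.random() < P_MUTATION:  # per-gene mutation
                    child[i] = rng.choice(choices)
            nxt.append(child)
        pop = nxt
    return min(pop, key=toy_cost)


best = evolve(random.Random(1))
print(best, toy_cost(best))
```

With 30 individuals and 50 generations the loop evaluates at most 1500 candidates, which is the order of magnitude to expect for the number of visited configurations per benchmark.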
Pareto Set (G721 encode)
- [Figure: Pareto set for G721 encode]
Pareto Set (GSM-encode)
- [Figure: Pareto set for GSM-encode]
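For readers unfamiliar with the term, a Pareto set can be extracted from a list of evaluated configurations with a simple dominance filter. The Python sketch below assumes that every objective is to be minimized and uses invented (power, execution time) pairs; it is not the data behind the plots.

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and strictly
    better in at least one (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_set(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Invented (power, execution time) pairs.
points = [(1.0, 9.0), (2.0, 5.0), (3.0, 6.0), (4.0, 2.0), (5.0, 2.5)]
front = pareto_set(points)
print(front)  # [(1.0, 9.0), (2.0, 5.0), (4.0, 2.0)]
```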
Conclusions
- Open platform for VLIW design space exploration
  - Estimates power, energy and performance
  - Preliminary analysis of ILP-oriented compilation
  - Genetic multi-objective design space exploration
- Future developments
  - Clustered VLIW
  - Network-on-chip multiprocessors
- Open source:
Thanks for your attention!
Appendix
- Bus Power Estimation
- Implemented Algorithms
- Multiobjective Fitness Assignment
- How Many Generations?
Summarizing Table

Benchmark    Visited configurations    Elapsed time    Pareto set    Power trade-off    Exec time trade-off
Mpeg2dec     1137                      47h             73            7x                 6.8x
Jpeg         1012                      17h             83            6x                 8.2x
Adpcm-enc    1543                      56h             64            4x                 3x
Adpcm-dec    1433                      44h             76            3.5x               4x
G721-enc     1256                      83h             94            2.5x               2x
Power Estimation (buses)
- Bus line transitions are computed from the list of data/address memory accesses
- $P_{bus} = 0.5 \, V_{dd}^{2} \, \alpha \, f \, C_{l}$
  - $V_{dd}$: supply voltage
  - $\alpha$: switching activity
  - $f$: clock frequency
  - $C_{l}$: capacitance of a bus line
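As a rough illustration of how the switching activity might be derived from an access trace, the Python sketch below counts bit flips between consecutive words on the bus and plugs the result into the formula above. The definition of the activity as the average number of transitions per line per transfer, the 4-bit trace and the electrical values are all assumptions made for the example.

```python
def bus_transitions(words):
    """Total bit flips between consecutive words driven on the bus."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


def switching_activity(words, width):
    """Average transitions per line per transfer (assumed definition)."""
    return bus_transitions(words) / (width * (len(words) - 1))


def line_power(alpha, v_dd, f_clk, c_line):
    """P = 0.5 * Vdd^2 * alpha * f * C_l, in watts."""
    return 0.5 * v_dd ** 2 * alpha * f_clk * c_line


# Invented trace on a 4-bit bus: 4 + 2 + 0 = 6 bit flips over 3 transfers.
trace = [0b0000, 0b1111, 0b1010, 0b1010]
alpha = switching_activity(trace, width=4)
power = line_power(alpha, v_dd=1.8, f_clk=100e6, c_line=1e-12)
print(alpha, power)
```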
Design Space Exploration: Implemented Algorithms
- Exhaustive: intuitive, simple and … infeasible
- Dependency analysis (dep), Givargis et al. [TVLSI’02]
- GA-based DSE (ga), Palesi et al. [CODES’01]
- Sensitivity analysis, Fornaciari et al. [DAES’02]
- Pareto-based sensitivity analysis (pbsa), Palesi et al. [VLSI-SOC’01]
Multiobjective Fitness Assignment
- Strength Pareto Approach [Zitzler, Thiele]
- From the current population P, an external set P* is extracted, containing the nondominated configurations of P
- Fitness of P* element j: $f_j = n/(N+1)$
  - N = total size of P
  - n = number of P configurations dominated by j
- Fitness of P element i: $1/S$, where S is the sum of the fitness values of the P* elements that dominate i
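The fitness assignment as stated on this slide can be sketched in Python as follows. The sketch follows the slide's formulas literally (n/(N+1) for P* elements, 1/S for the others), assumes all objectives are minimized and uses invented objective pairs; it is not a full implementation of the Strength Pareto algorithm.

```python
def dominates(a, b):
    """a dominates b: no worse everywhere, strictly better somewhere
    (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def strength_fitness(population):
    """Return (external_set, fitness) following the slide's scheme."""
    n_total = len(population)
    external = [p for p in population
                if not any(dominates(q, p) for q in population)]
    # Fitness of each P* element j: n / (N + 1).
    fitness = {j: sum(dominates(j, i) for i in population) / (n_total + 1)
               for j in external}
    strength = dict(fitness)
    # Fitness of each remaining element i: 1 / S.
    for i in population:
        if i not in strength:
            s = sum(f for j, f in strength.items() if dominates(j, i))
            fitness[i] = 1 / s
    return external, fitness


# Invented (power, delay) pairs; tuples so they can be dictionary keys.
population = [(1, 9), (2, 5), (3, 6), (4, 2), (5, 2.5)]
external, fitness = strength_fitness(population)
print(external)
print(fitness)
```

Every dominated configuration is dominated by at least one member of P* with nonzero fitness, so S is never zero in the division.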
How Many Generations?
- Fixed number of generations
- Autostop criteria
  - Based on convergence
- [Figure: plot with axes power and delay]