Design Space Exploration 2. The Hardware/Software Boundary. Prof. dr. ir. Dirk Stroobandt, Academic Year 2004-2005.


1 Design Space Exploration 2. The Hardware/Software Boundary. Prof. dr. ir. Dirk Stroobandt, Academic Year 2004-2005

2 Dirk Stroobandt: Design Methodology of Complex Systems 2004-2005
Contents (part 1)
Introduction to embedded systems, System-on-Chip and platform-based design
System specification techniques
Design space exploration
–Performance metrics
–The hardware/software boundary
Architecture exploration framework
High-level transformations
Hardware/software partitioning
Exploration tools
–Prototypes, emulation and simulation

3 Design flow
[Figure: design flow diagram. Labels: system specification, architecture exploration, platform design, hardware/software partitioning (HW, SW), component selection, hardware design, high-level synthesis, logic design, physical design, analog design, software compilation, interface synthesis, communication, simulation and verification, testing.]

4 Contents (part 1)
Introduction to embedded systems, System-on-Chip and platform-based design
System specification techniques
Design space exploration
–Performance metrics
–The hardware/software boundary
Architecture exploration framework
High-level transformations
Hardware/software partitioning
Exploration tools
–Prototypes, emulation and simulation

5 Architecture exploration framework
Given: initial specification
–Description of the system behavior at the algorithm level
–A set of requirements (speed, area, power, …)

6 Architecture exploration framework
Wanted: system architecture specification
–Definition of architectural components and interfaces
–Initial partitioning of the algorithmic description into segments, each to be implemented by a different architectural component
–A floor plan of the system architecture
–Budgets for time, area and power for the architectural components

7 Architecture exploration framework

8 Iterative approach, guided by performance analysis

9 How do you heuristically find good solutions?
Types of hardware resources
–Resources inherent to the algorithm
  Functional units (multipliers, adders, …)
  Can be estimated reasonably well at system level
–Resources for implementation overhead
  Everything else (control logic, buses, MUXes, registers, wires)
  Cannot (or can hardly) be estimated at system level (certainly not for dedicated hardware)
Solution: a taxonomy based on the potential to exploit specific properties that generate minimal implementation overhead

10 Examples of properties with little implementation overhead
Locality of computations
–High locality = computations are isolated and form a strongly interconnected subgraph
–Much more data flows between nodes within the cluster than to the outside
Regularity
–Repeated computation of certain function patterns
–Reuse of computation elements
Short time dependencies
–Preferably locality in time: avoid global buses

11 Contents (part 1)
Introduction to embedded systems, System-on-Chip and platform-based design
System specification techniques
Design space exploration
–Performance metrics
–The hardware/software boundary
Architecture exploration framework
High-level transformations
Hardware/software partitioning
Exploration tools
–Prototypes, emulation and simulation

12 Preparatory tasks
Task-level concurrency management: which tasks in the final system?
High-level transformations: transformations that are outside the scope of traditional compilers

13 Task-level concurrency management
Granularity: size of tasks (e.g. in instructions)
Readable specifications and efficient implementations may require different task structures.
→ Granularity changes

14 Merging of tasks
Reduced overhead of context switches,
More global optimization of machine code,
Reduced overhead for inter-process/task communication.

15 Splitting of tasks
No blocking of resources while waiting for input,
More flexibility for scheduling, possibly improved result.

16 Merging and splitting of tasks
The most appropriate task graph granularity depends upon the context → merging and splitting may be required.
Merging and splitting of tasks should be done automatically, depending upon the context.

17 High-level optimizations
To (potentially) improve the efficiency of embedded software and hardware

18 Floating-point to fixed-point conversion
Pros
–Lower cost
–Faster
–Lower power consumption
–Sufficient SNR, if properly scaled
–Suitable for portable applications
Cons
–Decreased dynamic range
–Finite word-length effect, unless properly scaled: overflow and excessive quantization noise
–Extra programming effort
© Ki-Il Kum, et al. (Seoul National University): A Floating-point To Fixed-point C Converter For Fixed-point Digital Signal Processors, 2nd SUIF Workshop, 1996

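19 Fixed-Point Data Format
Floating-Point vs. Fixed-Point
–Both have an exponent and a mantissa
–Floating-Point: automatic computation and update of each exponent at run-time
–Fixed-Point: implicit exponent determined off-line
Integer vs. Fixed-Point
[Figure: the same word (sign bit S followed by 100...000010) shown (a) as an integer and (b) as a fixed-point number with a hypothetical binary point; the bits left of the point form the integer word length (IWL=3), the bits right of it the fractional word length (FWL).]
© Ki-Il Kum, et al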

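20 Assignment and Addition/Subtraction
Assume y = x, with x (IWL=2) and y (IWL=3): x is shifted right by one bit (x>>1) before it is stored in y.
Let result = x + y: the IWLs are equalized first, so result = (x>>1) + y.
[Figure: bit layouts of x, x>>1, y and result, each with sign bit s.]
© Ki-Il Kum, et al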
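The alignment rule on this slide can be sketched in C as follows. This is a minimal illustration, not the converter described by Kum et al.: the 16-bit word size (`WORD_BITS`), the helper names and the rounding rule are all assumptions made for the example.

```c
#include <stdint.h>

#define WORD_BITS 16  /* assumed word size: 1 sign bit + IWL + FWL */

/* Fractional word length implied by an integer word length. */
static int fwl_of(int iwl) { return WORD_BITS - 1 - iwl; }

/* Quantize a real value to a fixed-point word with the given IWL
   (round to nearest; no overflow check in this sketch). */
static int16_t to_fixed(double v, int iwl) {
    double scaled = v * (double)(1 << fwl_of(iwl));
    return (int16_t)(scaled >= 0 ? scaled + 0.5 : scaled - 0.5);
}

/* Interpret a fixed-point word with the given IWL as a real value. */
static double to_double(int16_t x, int iwl) {
    return (double)x / (double)(1 << fwl_of(iwl));
}

/* result = x + y with x in IWL=2 and y in IWL=3; the result uses IWL=3.
   x is shifted right by one bit so both binary points line up.
   Note: >> on negative values is implementation-defined in C
   (arithmetic shift on mainstream compilers). */
static int16_t add_iwl2_iwl3(int16_t x, int16_t y) {
    return (int16_t)((x >> 1) + y);
}
```

For instance, 1.5 stored with IWL=2 and 2.25 stored with IWL=3 add up to a word that reads back as 3.75 when interpreted with IWL=3. The shift discards the least significant bit of x, which is the quantization cost of equalizing the formats.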

21 Development Procedure
[Figure: flow diagram. A Floating-Point C Program is fed to the Range Estimator (range estimation by C program execution, complemented by manual specification), which produces IWL information. The Floating-Point to Fixed-Point C Program Converter uses this information to generate the Fixed-Point C Program.]
© Ki-Il Kum, et al

22 Performance Comparison - Machine Cycles - © Ki-Il Kum, et al

23 Performance Comparison - Machine Cycles - © Ki-Il Kum, et al

24 Performance Comparison - SNR - © Ki-Il Kum, et al

25 Fridge
Developed at RWTH Aachen, commercialized by Synopsys as part of the CoCentric tool suite.
Uses the type definition features of C++ to define the types Fixed and fixed.
Using types in declarations: fixed a, *b, c[8]
Defining types in assignments: a = fixed(5,4,wt,*b)
(argument annotations on the slide: word length; fractional word length; wrap-around, truncation)

26 Other work on the topic
Fridge (RWTH Aachen), commercialized by Synopsys.
Some support in Simulink (MATLAB toolbox).
Hundreds of papers on the topic.

27 Simple loop transformations: Loop permutation
Array p[j][k]
Row major order: all elements with j=0 are stored first (k=0, k=1, k=2, …), then j=1, j=2, …
Column major order: all elements with k=0 are stored first (j=0, j=1, …), then k=1, k=2, …

28 Simple loop transformations: Loop permutation
Two loops, assuming row major order (C):

    for (k=0; k<=m; k++)          for (j=0; j<=n; j++)
      for (j=0; j<=n; j++)          for (k=0; k<=m; k++)
        p[j][k] = ...                 p[j][k] = ...
    Poor cache behavior           Good cache behavior

29 Loop fusion, loop fission

    for (j=0; j<=n; j++)          for (j=0; j<=n; j++)
      p[j] = ... ;                  { p[j] = ... ;
    for (j=0; j<=n; j++)              p[j] = p[j] + ... }
      p[j] = p[j] + ...

Left (two loops): loops small enough to allow zero-overhead loops.
Right (fused loop): better locality for access to p; better chances for parallel execution.
Which of the two versions is best?
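The two variants can be written out as compilable C to check that they compute the same array. This is only a sketch: the loop bodies (`2*j` and `+ 1`) are placeholder computations chosen for the example, and the loops run over n elements (j < n) rather than the slide's n+1.

```c
#include <stddef.h>

/* Fission form: two separate loops, each with a very small body. */
static void fill_two_loops(int *p, size_t n) {
    for (size_t j = 0; j < n; j++)
        p[j] = 2 * (int)j;
    for (size_t j = 0; j < n; j++)
        p[j] = p[j] + 1;
}

/* Fusion form: one loop; p[j] is reused while it is still in a
   register or cache line. */
static void fill_fused(int *p, size_t n) {
    for (size_t j = 0; j < n; j++) {
        p[j] = 2 * (int)j;
        p[j] = p[j] + 1;
    }
}
```

Fusion is only legal when the second loop's iteration j does not depend on results of later iterations of the first loop, which holds here because each statement touches p[j] only.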

30 Loop unrolling

    for (j=0; j<=n; j++)          for (j=0; j<=n; j+=2)
      p[j] = ... ;                  { p[j] = ... ; p[j+1] = ... }   /* factor = 2 */

Fewer branches per execution of the loop.
More opportunities for optimizations.
Tradeoff between code size and improvement.
Extreme case: completely unrolled loop (no branch).
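The unrolled loop on the slide writes p[j+1] in every iteration, so it is only safe when the trip count is a multiple of the unrolling factor. A hedged sketch of factor-2 unrolling with an explicit remainder step is shown below; the body `3*j` is a placeholder and the function names are made up for the example.

```c
#include <stddef.h>

/* Reference version: one element per iteration. */
static void fill_rolled(int *p, size_t n) {
    for (size_t j = 0; j < n; j++)
        p[j] = 3 * (int)j;
}

/* Unrolled by a factor of 2: half as many loop branches. */
static void fill_unrolled2(int *p, size_t n) {
    size_t j = 0;
    for (; j + 1 < n; j += 2) {
        p[j]     = 3 * (int)j;
        p[j + 1] = 3 * (int)(j + 1);
    }
    if (j < n)              /* leftover element when n is odd */
        p[j] = 3 * (int)j;
}
```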

31 Loop tiling/loop blocking
Original version

    for (i=1; i<=N; i++)
      for (k=1; k<=N; k++) {
        r = X[i][k];   /* to be allocated to a register */
        for (j=1; j<=N; j++)
          Z[i][j] += r * Y[k][j];
      }

Never reusing information in the cache for Y and Z if N is large or the cache is small (2 N³ references for Z and Y + N² references for X).
[Figure: access patterns of the three arrays along i, k and j.]

32 Loop tiling/loop blocking
Tiled version

    for (kk=1; kk<=N; kk+=B)
      for (jj=1; jj<=N; jj+=B)
        for (i=1; i<=N; i++)
          for (k=kk; k<=min(kk+B-1,N); k++) {
            r = X[i][k];   /* to be allocated to a register */
            for (j=jj; j<=min(jj+B-1,N); j++)
              Z[i][j] += r * Y[k][j];
          }

Reuse factor of B for Z and Y, O(N³/B) accesses to main memory.
Same elements for next iteration of i.
[Figure: tiles of the three arrays, starting at jj and kk, traversed along i, k and j.]
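A compilable sketch of both versions, using 0-based indices, makes it easy to check that tiling changes only the order of the updates and not the result. The dimension `DIM`, the tile size `TILE` and the function names are assumptions chosen for the example; `TILE` deliberately does not divide `DIM` so that the min() bounds are exercised.

```c
enum { DIM = 6, TILE = 4 };   /* assumed problem and tile sizes */

static int min_int(int a, int b) { return a < b ? a : b; }

/* Original loop order: Z += X * Y (Z must be zeroed by the caller). */
static void matmul_plain(int X[DIM][DIM], int Y[DIM][DIM], int Z[DIM][DIM]) {
    for (int i = 0; i < DIM; i++)
        for (int k = 0; k < DIM; k++) {
            int r = X[i][k];              /* candidate for a register */
            for (int j = 0; j < DIM; j++)
                Z[i][j] += r * Y[k][j];
        }
}

/* Tiled loop order: a TILE x TILE block of Y is reused for every i. */
static void matmul_tiled(int X[DIM][DIM], int Y[DIM][DIM], int Z[DIM][DIM]) {
    for (int kk = 0; kk < DIM; kk += TILE)
        for (int jj = 0; jj < DIM; jj += TILE)
            for (int i = 0; i < DIM; i++)
                for (int k = kk; k < min_int(kk + TILE, DIM); k++) {
                    int r = X[i][k];
                    for (int j = jj; j < min_int(jj + TILE, DIM); j++)
                        Z[i][j] += r * Y[k][j];
                }
}
```

In practice the tile size would be chosen so that a tile of Y plus the touched parts of Z fit in the cache; the value here is only for demonstration.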

33 High-level transformations
Example: separation of margin handling
Before: many if-statements for margin checking.
After: a main part with no checking (efficient), plus only a few margin elements to be processed.

34
    if (x>=10||y>=14)
      for (y=0; y<49; y++)
        for (k=0; k<9; k++)
          for (l=0; l<9; l++)
            for (i=0; i<4; i++)
              for (j=0; j<4; j++) {
                then_block_1; then_block_2 }
    else {
      y1=4*y;
      for (k=0; k<9; k++) { x2=x1+k-4;
        for (l=0; l<9; ) { y2=y1+l-4;
          for (i=0; i<4; i++) { x3=x1+i; x4=x2+i;
            for (j=0; j<4; j++) { y3=y1+j; y4=y2+j;
              if (0 || 35 …

35 Results for loop nest splitting - Execution times - [H. Falk et al., Inf 12, UniDo, 2002]

36 Results for loop nest splitting - Code sizes - [Falk, 2002]

37 Array folding: initial arrays

38 Array folding: unfolded arrays

39 Inter-array folding. Intra-array folding

40 Function inlining: advantages and limitations

    Function sq(c:integer) return:integer;
    begin return c*c end;
    ....
    a=sq(b);
    ....

Inlining:

    LD R1,b; MUL R1,R1,R1; ST R1,a

Branching:

    push PC; push b; BRA sq;
    pull R1; mul R1,R1,R1; pull R2; push R1; BRA (R2)+1;
    pull R1; ST R1,a;

Advantage: low calling overhead.
Limitations:
Not all functions are candidates.
Code size explosion.
Requires manual identification using the 'inline' qualifier.
Goal:
Controlled code size.
Automatic identification of suitable functions.
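In C, the manual identification mentioned above corresponds to the `inline` qualifier. The sketch below reuses the slide's sq function; `sum_of_squares` is a hypothetical caller added for illustration. Note that `inline` is only a hint: the compiler remains free to emit a call.

```c
/* Small leaf function: a typical inlining candidate. */
static inline int sq(int c) {
    return c * c;
}

/* Hypothetical caller. If sq is inlined, the loop body contains a
   multiply instead of a call/return sequence. */
static int sum_of_squares(const int *v, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += sq(v[i]);
    return sum;
}
```

Every call site that is inlined receives its own copy of the body, which is where the code size growth discussed on the slide comes from.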

41 Results for GSM speech and channel encoder: #calls, #cycles (TI 'C62xx)
33% speedup for 25% increase in code size.
The number of cycles is not a monotonically decreasing function of the code size!
[Plot; axis label: L [%]]

42 Inline vectors computed by B&B algorithm

    size limit (%)   inline vector (functions 1-26)
    100              00000000000000000000000000
    105              00100000001100001110111111
    110              10111001011100001111111111
    115              10110000000001001000111001
    120              10110100101000100110111101
    125              10110000001010000100111101
    130              00110000000010100100111000
    135              10110010001110101110111101
    140              10111011111110101111111111
    145              10110110101010100110111101
    150              10110110000010110110111101

Major changes for each new size limit. Difficult to generate manually.
References:
J. Teich, E. Zitzler, S.S. Bhattacharyya: 3D Exploration of Software Schedules for DSP Algorithms, CODES'99
R. Leupers, P. Marwedel: Function Inlining under Code Size Constraints for Embedded Processors, ICCAD, 1999

43 Contents (part 1)
Introduction to embedded systems, System-on-Chip and platform-based design
System specification techniques
Design space exploration
–Performance metrics
–The hardware/software boundary
Architecture exploration framework
High-level transformations
Hardware/software partitioning
Exploration tools
–Prototypes, emulation and simulation

44 Hardware/software partitioning
Functionality to be implemented in software or in hardware?
No need to consider special-purpose hardware in the long run?
Maybe correct for fixed functionality, but wrong in general, since "By the time MPEG-n can be implemented in software, MPEG-n+1 has been invented" [de Man]

45 Functionality to be implemented in software or in hardware?
Decision based on hardware/software partitioning, a special case of hardware/software codesign.

46 Codesign Tool (COOL) as an example of HW/SW partitioning
Inputs to COOL:
1. Target technology
   Available HW platforms
   Multiprocessors OK, but all of the same type
   ASIC: synthesizable (technology library)
2. Design constraints
   Throughput, latency, memory size, area
3. Required behavior
   Hierarchical task graphs
   Behavior specified in VHDL
   Communication edges and timing edges

47 Hardware/software codesign: approach
[Figure: the specification is mapped onto processor P1, processor P2 and hardware.]
[Niemann, Hardware/Software Co-Design for Data Flow Dominated Embedded Systems, Kluwer Academic Publishers, 1998 (comprehensive mathematical model)]

48 Steps of the COOL partitioning algorithm (1)
1. Translation of the behavior into an internal graph model
2. Translation of the behavior of each node from VHDL into C
3. Compilation
   All C programs are compiled for the target processor,
   Computation of the resulting program size,
   Estimation of the resulting execution time (simulation input data might be required)
4. Synthesis of hardware components:
   ∀ leaf node, application-specific hardware is synthesized.
   High-level synthesis must be sufficiently fast!

49 Steps of the COOL partitioning algorithm (2)
5. Flattening of the hierarchy:
   Granularity used by the designer is maintained.
   Cost and performance information is added to the nodes.
   Precise information required for partitioning is pre-computed.
6. Generating and solving a mathematical model of the optimization problem:
   Integer programming (IP) model for optimization.
   Optimal with respect to the cost function (approximates communication time)

50 Steps of the COOL partitioning algorithm (3)
7. Iterative improvements: adjacent nodes mapped to the same hardware component are now merged.

51 Steps of the COOL partitioning algorithm (4)
8. Interface synthesis: after partitioning, the glue logic required for interfacing processors, application-specific hardware and memories is created.

52 Integer programming models
Ingredients:
1. Cost function
2. Constraints
Both involve linear expressions of integer variables from a set X (variables in ℕ, coefficients in ℝ).
Def.: The problem of minimizing the cost function (1) subject to the constraints (2) is called an integer programming (IP) problem.
If all x_i are constrained to be either 0 or 1, the IP problem is said to be a 0/1 integer programming problem.
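The formulas (1) and (2) did not survive in the transcript. A standard way to write such a model, given here as an assumption rather than as the slide's exact notation, uses coefficients a_i, b_{i,j}, c_j and an index set J for the constraints:

```latex
\begin{align}
  C &= \sum_{x_i \in X} a_i \, x_i,
      \qquad a_i \in \mathbb{R},\; x_i \in \mathbb{N} \tag{1} \\
  \forall j \in J:\quad
  & \sum_{x_i \in X} b_{i,j} \, x_i \;\ge\; c_j,
      \qquad b_{i,j},\, c_j \in \mathbb{R} \tag{2}
\end{align}
```

Constraints of the form "≤" or "=" fit the same scheme: a "≤" constraint is obtained by negating both sides, and an equality is a pair of inequalities.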

53 Example
Optimal C

54 Remarks on integer programming
–Maximizing the cost function can be done by setting C' = -C.
–Integer programming is NP-complete.
–In practice, running times can increase exponentially with the size of the problem, but problems of some thousands of variables can still be solved with commercial solvers, depending on the size and structure of the problem.
–IP models can be a good starting point for modeling, even if in the end heuristics have to be used to solve them.
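For very small 0/1 problems the optimum can be found by plain enumeration, which also shows why running time grows exponentially with the number of variables. The instance below (minimize 5·x1 + 6·x2 + 4·x3 subject to x1 + x2 + x3 ≥ 2) is a hypothetical example chosen for illustration, and `solve_01_ip` is a made-up name; real problems would be handed to an IP solver.

```c
#include <limits.h>

enum { NVARS = 3 };   /* assumed problem size */

/* Enumerates all 2^NVARS assignments of the hypothetical instance
     minimize 5*x1 + 6*x2 + 4*x3   subject to   x1 + x2 + x3 >= 2
   Writes the best assignment to best_x and returns its cost. */
static int solve_01_ip(int best_x[NVARS]) {
    static const int cost[NVARS] = {5, 6, 4};
    int best = INT_MAX;
    for (unsigned m = 0; m < (1u << NVARS); m++) {
        int ones = 0, c = 0;
        for (int i = 0; i < NVARS; i++)
            if ((m >> i) & 1u) { ones++; c += cost[i]; }
        if (ones >= 2 && c < best) {      /* feasible and cheaper */
            best = c;
            for (int i = 0; i < NVARS; i++)
                best_x[i] = (int)((m >> i) & 1u);
        }
    }
    return best;
}
```

The loop visits 2^NVARS assignments, so this approach stops being practical after a few dozen variables.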

55 An IP model for HW/SW partitioning
Notation:
–Index set I denotes task graph nodes.
–Index set L denotes task graph node types, e.g. square root, DCT or FFT.
–Index set KH denotes hardware component types, e.g. hardware components for the DCT or the FFT.
–Index set J denotes hardware component instances.
–Index set KP denotes processors. All processors are assumed to be of the same type.

56 An IP model for HW/SW partitioning
–X_{i,k} = 1 if node v_i is mapped to hardware component type k ∈ KH, and 0 otherwise.
–Y_{i,k} = 1 if node v_i is mapped to processor k ∈ KP, and 0 otherwise.
–NY_{l,k} = 1 if at least one node of type l is mapped to processor k ∈ KP, and 0 otherwise.
–T is a mapping from task graph nodes to their types: T: I → L
–The cost function accumulates the cost of hardware units:
 C = cost(processors) + cost(memories) + cost(application-specific hardware)

57 Constraints
Operation assignment constraints
All task graph nodes have to be mapped either in software or in hardware.
Variables are assumed to be integers.
Additional constraints to guarantee they are either 0 or 1:
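The formulas of this slide are missing from the transcript. A plausible reconstruction, consistent with the instances X_{1,1} + Y_{1,1} = 1 given in the worked example on slide 64, is sketched here; it should be read as an assumption, not as the slide's verbatim content.

```latex
\begin{align}
  \forall i \in I:\quad
    & \sum_{k \in KH} X_{i,k} \;+\; \sum_{k \in KP} Y_{i,k} \;=\; 1 \\
  \forall i \in I,\ \forall k \in KH:\quad
    & X_{i,k} \le 1 \\
  \forall i \in I,\ \forall k \in KP:\quad
    & Y_{i,k} \le 1
\end{align}
```

Because the variables are non-negative integers, the upper bounds of 1 make them 0/1 variables, and the equality forces each node onto exactly one hardware component type or processor.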

58 Operation assignment constraints (2)
∀ l ∈ L, ∀ i: T(v_i) = c_l, ∀ k ∈ KP: NY_{l,k} ≥ Y_{i,k}
For all types l of operations and for all nodes i of this type: if i is mapped to some processor k, then that processor must implement the functionality of l.
Decision variables must also be 0/1 variables:
∀ l ∈ L, ∀ k ∈ KP: NY_{l,k} ≤ 1.

59 Resource & design constraints
–∀ k ∈ KH, the cost (area) used for components of that type is calculated as the sum of the costs of the components of that type. This cost should not exceed its maximum.
–∀ k ∈ KP, the cost for the associated data storage area should not exceed its maximum.
–∀ k ∈ KP, the cost for storing instructions should not exceed its maximum.
–The total cost of HW components (summed over k ∈ KH) should not exceed its maximum.
–The total cost of data memories (summed over k ∈ KP) should not exceed its maximum.
–The total cost of instruction memories (summed over k ∈ KP) should not exceed its maximum.

60 Scheduling
[Figure: task graph with nodes v1 … v11 and communication edges e3 and e4, mapped to processor p1, ASIC h1 (FIR 1, FIR 2) and communication channel c1. Timelines show the alternative orderings: v7 before v8 or v8 before v7 on p1; e3 before e4 or e4 before e3 on c1; v3 before v4 or v4 before v3 for FIR 2 on h1.]

61 Scheduling / precedence constraints
For all nodes v_{i1} and v_{i2} that are potentially mapped to the same processor or hardware component instance, introduce a binary decision variable b_{i1,i2} with
b_{i1,i2} = 1 if v_{i1} is executed before v_{i2}, and = 0 otherwise.
Define constraints of the type
(end time of v_{i1}) ≤ (start time of v_{i2}) if b_{i1,i2} = 1 and
(end time of v_{i2}) ≤ (start time of v_{i1}) if b_{i1,i2} = 0
Ensure that the schedule for executing operations is consistent with the precedence constraints in the task graph.
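The "if b = 1 … if b = 0" conditions are not linear as written. One common way to linearize them, shown here as an assumption since the slide does not give the encoding, is a big-M formulation with start times s_i, end times e_i and a constant M larger than any possible schedule length:

```latex
\begin{align}
  e_{i_1} &\le s_{i_2} + M \,\bigl(1 - b_{i_1,i_2}\bigr) \\
  e_{i_2} &\le s_{i_1} + M \, b_{i_1,i_2}
\end{align}
```

With b_{i1,i2} = 1 the first inequality enforces "v_{i1} ends before v_{i2} starts" and the second becomes trivially true; with b_{i1,i2} = 0 the roles are swapped.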

62 Other constraints
Timing constraints: these constraints can be used to guarantee that certain time constraints are met.
Some less important constraints are omitted.

63 Example
HW types H1, H2 and H3 with costs of 20, 25, and 30.
Processors of type P.
Tasks T1 to T5.
Execution times:

    T    H1    H2    H3    P
    1    20                100
    2          20          100
    3                12    10
    4                12    10
    5    20                100

64 Operation assignment constraints (1)

    T    H1    H2    H3    P
    1    20                100
    2          20          100
    3                12    10
    4                12    10
    5    20                100

X_{1,1} + Y_{1,1} = 1 (task 1 mapped to H1 or to P)
X_{2,2} + Y_{2,1} = 1
X_{3,3} + Y_{3,1} = 1
X_{4,3} + Y_{4,1} = 1
X_{5,1} + Y_{5,1} = 1

65 Operation assignment constraints (2)
Assume types of tasks are l = 1, 2, 3, 3, and 1.
∀ l ∈ L, ∀ i: T(v_i) = c_l, ∀ k ∈ KP: NY_{l,k} ≥ Y_{i,k}
Functionality 3 is to be implemented on the processor if node 4 is mapped to it.

66 Other equations
Time constraints leading to: application-specific hardware required for time constraints under 100 time units.

    T    H1    H2    H3    P
    1    20                100
    2          20          100
    3                12    10
    4                12    10
    5    20                100

Cost function:
C = 20 #(H1) + 25 #(H2) + 30 #(H3) + cost(processor) + cost(memory)
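With only five tasks, the example can be explored by enumerating all 32 mappings instead of solving the IP model. The sketch below rests on several simplifying assumptions that are not stated on the slides: all tasks execute one after another, one instance per hardware type is enough, memory cost is ignored, and the processor cost is passed in as a parameter. The function name `best_partition` is made up for the example.

```c
#include <limits.h>

enum { NTASKS = 5, NHW = 3 };

/* Data from the example table (index 0 = H1, 1 = H2, 2 = H3). */
static const int hw_type[NTASKS] = {0, 1, 2, 2, 0};
static const int hw_time[NTASKS] = {20, 20, 12, 12, 20};
static const int sw_time[NTASKS] = {100, 100, 10, 10, 100};
static const int hw_cost[NHW]    = {20, 25, 30};

/* Tries every mapping; bit i of the mask set = task i+1 in hardware.
   Assumes strictly sequential execution and ignores memory cost.
   Returns the cheapest cost meeting the deadline, or -1 if none. */
static int best_partition(int deadline, int cost_p, unsigned *best_mask) {
    int best = INT_MAX;
    for (unsigned m = 0; m < (1u << NTASKS); m++) {
        int time = 0, cost = 0, uses_sw = 0;
        int used[NHW] = {0, 0, 0};
        for (int i = 0; i < NTASKS; i++) {
            if ((m >> i) & 1u) { time += hw_time[i]; used[hw_type[i]] = 1; }
            else               { time += sw_time[i]; uses_sw = 1; }
        }
        for (int k = 0; k < NHW; k++)
            if (used[k]) cost += hw_cost[k];
        if (uses_sw) cost += cost_p;
        if (time <= deadline && cost < best) { best = cost; *best_mask = m; }
    }
    return best == INT_MAX ? -1 : best;
}
```

Under these assumptions, a deadline of 100 time units and a hypothetical processor cost of 10 give a cheapest mapping with T1, T2 and T5 in hardware and T3 and T4 in software, at a cost of 55.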

67 Result
For a time constraint of 100 time units and cost(P) …

68 Separation of scheduling and partitioning
Combined scheduling/partitioning is very complex.
Heuristic:
1. Compute estimated schedule
2. Perform partitioning for estimated schedule
3. Perform final scheduling
4. If the final schedule does not meet the time constraint, go to 1 using a reduced overall timing constraint.
[Figure: in iteration 1 the approximate execution time meets the specification but the actual execution time exceeds it; in iteration 2 a new, tighter specification is used, so that the actual execution time meets the original specification.]

69 Application example
Audio lab (mixer, fader, echo, equalizer, balance units)
Slow SPARC processor
1 µ ASIC library
Allowable delay of 22.675 µs (~ 44.1 kHz)
[Figure: SPARC processor, ASIC (Compass, 1 µ), external memory.]
Outdated technology; just a proof of concept.

70 Design space for audio lab
Everything in software: 72.9 µs, 0 λ²
Everything in hardware: 3.06 µs, 457.9×10⁶ λ²
Lowest cost for given sample rate: 18.6 µs, 78.4×10⁶ λ²

71 Final remarks
COOL approach:
–Shows that a formal model of hardware/software codesign is beneficial; IP modeling can lead to a useful implementation even if the optimal result is available only for small designs.
Other approaches for HW/SW partitioning:
–Starting with everything mapped to hardware; gradually moving to software as long as the timing constraint is met.
–Starting with everything mapped to software; gradually moving to hardware until the timing constraint is met.
–Binary search.
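The second alternative (start in software, migrate to hardware until the deadline holds) can be sketched as a greedy loop. The code reuses the execution times of the worked example and again assumes strictly sequential execution; the selection rule (largest time saving first) and the function name are assumptions for illustration, not a method described on the slides.

```c
enum { NTASKS = 5 };

/* Execution times from the worked example. */
static const int hw_time[NTASKS] = {20, 20, 12, 12, 20};
static const int sw_time[NTASKS] = {100, 100, 10, 10, 100};

/* Starts with every task in software and repeatedly moves the task
   with the largest time saving to hardware until the deadline is met.
   Assumes sequential execution. Fills in_hw[] with the final mapping
   and returns the total time, or -1 if the deadline cannot be met. */
static int greedy_sw_to_hw(int deadline, int in_hw[NTASKS]) {
    int total = 0;
    for (int i = 0; i < NTASKS; i++) { in_hw[i] = 0; total += sw_time[i]; }
    while (total > deadline) {
        int best = -1, best_gain = 0;
        for (int i = 0; i < NTASKS; i++) {
            int gain = sw_time[i] - hw_time[i];
            if (!in_hw[i] && gain > best_gain) { best = i; best_gain = gain; }
        }
        if (best < 0) return -1;          /* no move helps any more */
        in_hw[best] = 1;
        total -= best_gain;
    }
    return total;
}
```

Unlike the IP model, such a greedy heuristic gives no optimality guarantee with respect to cost; it only stops as soon as the timing constraint is satisfied.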

