
© 2005 ECNU SEI, Principles of Embedded Computing System Design

1 Program design and analysis
- Optimizing for execution time.
- Optimizing for energy/power.
- Optimizing for program size.

2 Motivation (P.186)
- Embedded systems must often meet deadlines.
  - Faster may not be fast enough.
- Need to be able to analyze execution time.
  - Worst-case, not typical.
- Need techniques for reliably improving execution time.

3 Run times will vary (P.186)
- Program execution times depend on several factors:
  - Input data values.
  - State of the instruction and data caches.
  - Pipelining effects.

4 Measuring program speed
- CPU simulator.
  - I/O may be hard.
  - May not be totally accurate.
- Hardware timer.
  - Requires board, instrumented program.
- Logic analyzer.
  - Limited logic analyzer memory size.

5 Program performance metrics
- Average-case:
  - For typical data values, whatever they are.
- Worst-case:
  - For any possible input set.
- Best-case:
  - For any possible input set.
- Too-fast programs may cause critical races at system level.

6 What data values?
- What values create worst/average/best case behavior?
  - analysis;
  - experimentation.
- Concerns:
  - operations;
  - program paths.

7 Performance analysis (P.187)
- Elements of program performance:
  - execution time = program path + instruction timing
  - Path depends on data values. Choose which case you are interested in.
  - Instruction timing depends on pipelining, cache behavior.

8 Programs and performance analysis
- Best results come from analyzing optimized instructions, not high-level language code:
  - non-obvious translations of HLL statements into instructions;
  - code may move;
  - cache effects are hard to predict.

9 Program paths (P.188)
- Consider the for loop:
    for (i=0, f=0; i<N; i++)
        f = f + c[i]*x[i];
- Loop initiation block executed once.
- Loop test executed N+1 times.
- Loop body and variable update executed N times.
[Figure: flowchart of the loop with four blocks: initialization (i=0; f=0;), test (i<N, with Y/N branches), body (f = f + c[i]*x[i];), and update (i = i+1;).]
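
The execution counts on this slide can be checked by instrumenting the loop. The following is a minimal sketch, not code from the slides: the names `path_counts` and `count_loop_path` are hypothetical, and the for loop is unrolled into its four blocks only so that each one can be counted separately.

```c
#include <stddef.h>

/* Hypothetical record of how often each block of the loop executes. */
struct path_counts {
    size_t init;    /* loop initiation block: i=0; f=0; */
    size_t test;    /* evaluations of i < n             */
    size_t body;    /* f = f + c[i]*x[i];               */
    size_t update;  /* i = i + 1;                       */
};

/* Computes the sum of c[i]*x[i] while counting block executions. */
int count_loop_path(const int *c, const int *x, size_t n,
                    struct path_counts *pc)
{
    size_t i;
    int f;

    pc->init = pc->test = pc->body = pc->update = 0;

    i = 0;
    f = 0;
    pc->init++;                 /* initialization runs once */

    for (;;) {
        pc->test++;             /* the test also runs on the final, failing check */
        if (!(i < n))
            break;
        f = f + c[i] * x[i];
        pc->body++;
        i = i + 1;
        pc->update++;
    }
    return f;
}
```

For an array of N elements the counters should end at 1, N+1, N, and N, matching the bullets above.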

10 Instruction timing (P.189)
- Not all instructions take the same amount of time.
  - Hard to get execution time data for instructions.
- Instruction execution times are not independent.
- Execution time may depend on operand values.

11 Trace-driven performance analysis (P.189)
- Trace: a record of the execution path of a program.
- Trace gives execution path for performance analysis.
- A useful trace:
  - requires proper input values;
  - is large (gigabytes).
Reference: Rotenberg, E.; Jacobson, Q.; Sazeides, Y.; Smith, J. "Trace processors." Proceedings of the Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, 1-3 Dec 1997, pp. 138-148.

12 Trace generation (P.190)
- Hardware capture:
  - logic analyzer;
  - hardware assist in CPU.
- Software:
  - PC sampling.
  - Instrumentation instructions.
  - Simulation.

13 Trace scheduling
- Trace scheduling: the most likely path is found, and its basic blocks are merged into one.
- Bookkeeping is required to ensure correctness.

14 Loop optimizations (P.191)
- Loops are good targets for optimization.
- Basic loop optimizations:
  - code motion;
  - induction-variable elimination;
  - strength reduction (x*2 → x<<1).
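
Strength reduction replaces an expensive operation with a cheaper equivalent. The sketch below shows the slide's x*2 → x<<1 case and, as an added illustration, a loop in which a multiply by the index is replaced by a running addition. The function names are hypothetical, and unsigned types are assumed so that overflow wraps in a well-defined way.

```c
#include <stddef.h>
#include <stdint.h>

/* Before: multiply by a power of two. */
uint32_t scale_mul(uint32_t x)
{
    return x * 2u;
}

/* After: the same value computed with a shift. */
uint32_t scale_shift(uint32_t x)
{
    return x << 1;
}

/* Fills out[i] with i * step, using an addition per iteration
   instead of a multiply. */
void fill_multiples(uint32_t *out, size_t n, uint32_t step)
{
    uint32_t v = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = v;     /* v equals i * step here */
        v += step;
    }
}
```

Many optimizing compilers apply this transformation on their own, so the hand-written form mainly matters when the compiler cannot or when reading generated code.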

15 Code motion
    for (i=0; i<N*M; i++)
        z[i] = a[i] + b[i];
[Figure: flowcharts of the loop before and after code motion. Before: the loop test is i<N*M. After: X = N*M is computed once alongside i=0, and the loop test becomes i<X.]
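
The transformation on this slide can be written out as a before/after pair. This is a sketch with hypothetical function names; `limit` plays the role of X in the figure.

```c
#include <stddef.h>

/* Before: the loop bound n*m appears in the test, which is evaluated
   on every iteration. */
void add_arrays_naive(int *z, const int *a, const int *b,
                      size_t n, size_t m)
{
    for (size_t i = 0; i < n * m; i++)
        z[i] = a[i] + b[i];
}

/* After code motion: the loop-invariant product is computed once,
   outside the loop. */
void add_arrays_hoisted(int *z, const int *a, const int *b,
                        size_t n, size_t m)
{
    const size_t limit = n * m;     /* X = N*M */
    for (size_t i = 0; i < limit; i++)
        z[i] = a[i] + b[i];
}
```

Both versions produce the same output; only the number of multiplications executed differs, and an optimizing compiler may already hoist the product itself.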

16 Induction variable elimination (cf. P.192)
- Induction variable: loop index.
- Consider the loop:
    for (i=0; i<N; i++)
        for (j=0; j<M; j++)
            z[i][j] = b[i][j];
- Rather than recompute i*M+j for each array in each iteration, share the induction variable between arrays and increment it at the end of the loop body.
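
One way to picture the shared induction variable is to store both arrays as flat row-major buffers. In this sketch, which assumes that layout and uses hypothetical function names, the first version recomputes i*m+j for each array and the second keeps a single offset `k` that is incremented at the end of the loop body.

```c
#include <stddef.h>

/* Before: the offset i*m + j is recomputed for each array
   in each iteration. */
void copy_indexed(int *z, const int *b, size_t n, size_t m)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++)
            z[i * m + j] = b[i * m + j];
}

/* After: one offset is shared by both arrays and incremented
   at the end of the loop body. */
void copy_shared_offset(int *z, const int *b, size_t n, size_t m)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < m; j++) {
            z[k] = b[k];
            k++;
        }
    }
}
```

In the second version i and j only count iterations, which is what allows a compiler (or a programmer) to replace them with a single counter.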

17 Cache analysis
- Loop nest: set of loops, one inside the other.
  - Rewrite the loop nest to change the order of array accesses.
- Perfect loop nest: no conditionals in nest.
- Because loops use large quantities of data, cache conflicts are common.

18 Array conflicts in cache (P.194)
[Figure: main memory and cache. a[0][0] is at address 1024 and b[0][0] is at address 4096 in main memory; a pad is shown as a way to shift one array.]

19 Array conflicts, cont'd.
- Array elements conflict because they are in the same cache line, even if they are not mapped to the same location.
- Solutions:
  - move one array;
  - pad array.
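
Whether two addresses compete for a cache line can be checked with a little arithmetic. The sketch below assumes a direct-mapped cache with 64 lines of 16 bytes each; that geometry is an assumption made for illustration and is not given on the slides. The addresses 1024 and 4096 are the ones shown for a[0][0] and b[0][0] in the preceding figure.

```c
#include <stdint.h>

/* Hypothetical direct-mapped cache geometry, chosen only for illustration. */
enum { LINE_BYTES = 16, NUM_LINES = 64 };

/* Index of the cache line that a byte address maps to. */
unsigned cache_line_index(uintptr_t addr)
{
    return (unsigned)((addr / LINE_BYTES) % NUM_LINES);
}

/* Nonzero when two addresses lie in different memory blocks
   but compete for the same cache line. */
int conflicts(uintptr_t a, uintptr_t b)
{
    return (a / LINE_BYTES != b / LINE_BYTES) &&
           cache_line_index(a) == cache_line_index(b);
}
```

Under these assumed parameters, addresses 1024 and 4096 both map to line 0 and therefore conflict. Padding the second array by one line (starting it at 4096 + 16) moves it to line 1 and removes the conflict.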

20 Performance optimization hints
- Use registers efficiently.
- Use page mode memory accesses.
- Analyze cache behavior:
  - instruction conflicts can be handled by rewriting code, rescheduling;
  - conflicting scalar data can easily be moved;
  - conflicting array data can be moved, padded.

21 Energy/power optimization (P.195)
- Energy: ability to do work.
  - Most important in battery-powered systems.
- Power: energy per unit time.
  - Important even in wall-plug systems: power becomes heat.

22 Measuring energy consumption
- Execute a small loop, measure current:
    while (TRUE) a();
[Figure: the current I flowing into the CPU is measured.]
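
Once the current is measured, energy follows from E = V × I × t. The sketch below is not from the slides: it assumes a constant supply voltage, a steady average current over the measurement window, and a known number of loop iterations, and the function name is hypothetical.

```c
/* Energy in joules consumed by one iteration of the measured loop.
   Assumes constant supply voltage and a steady average current.
   Returns 0.0 when no iterations were executed. */
double energy_per_iteration(double volts, double amps,
                            double seconds, unsigned long iterations)
{
    if (iterations == 0)
        return 0.0;
    return volts * amps * seconds / (double)iterations;
}
```

For example, 3 V at 0.5 A for 2 s over 1000 iterations works out to about 3 mJ per iteration. A real measurement would also subtract the overhead of the loop itself.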

23 Sources of energy consumption (cf. Fig.5-26, P.196)
- Relative energy per operation (Catthoor et al.):
  - memory transfer: 33
  - external I/O: 10
  - SRAM write: 9
  - SRAM read: 4.4
  - multiply: 3.6
  - add: 1
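
The relative costs above can be combined with operation counts for a first-order comparison between code variants. This is a rough sketch: the enum, table, and function names are hypothetical, the weights are the figures listed on this slide, and the result is in the same relative units (add = 1), not joules.

```c
/* Operation categories from the slide. */
enum op_kind {
    OP_ADD,
    OP_MULTIPLY,
    OP_SRAM_READ,
    OP_SRAM_WRITE,
    OP_EXTERNAL_IO,
    OP_MEMORY_TRANSFER,
    OP_KIND_COUNT
};

/* Relative energy per operation, normalized so that an add costs 1. */
static const double relative_energy[OP_KIND_COUNT] = {
    [OP_ADD]             = 1.0,
    [OP_MULTIPLY]        = 3.6,
    [OP_SRAM_READ]       = 4.4,
    [OP_SRAM_WRITE]      = 9.0,
    [OP_EXTERNAL_IO]     = 10.0,
    [OP_MEMORY_TRANSFER] = 33.0
};

/* Weighted sum of operation counts, in relative energy units. */
double estimate_relative_energy(const unsigned long counts[OP_KIND_COUNT])
{
    double total = 0.0;
    for (int k = 0; k < OP_KIND_COUNT; k++)
        total += (double)counts[k] * relative_energy[k];
    return total;
}
```

With these weights a single memory transfer costs as much as 33 adds, which is why reducing memory traffic tends to dominate energy optimization.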

24 Cache behavior is important (cf. Fig.5-27, P.197)
- Energy consumption has a sweet spot as cache size changes:
  - cache too small: program thrashes, burning energy on external memory accesses;
  - cache too large: cache itself burns too much power.
[Figure: two plots, energy versus cache size and execution time versus cache size.]

25 Optimizing for energy (P.198)
- First-order optimization:
  - high performance = low energy.
- Not many instructions trade speed for energy.

26 Optimizing for energy, cont'd.
- Use registers efficiently.
- Identify and eliminate cache conflicts.
- Use page mode memory accesses.
- Moderate loop unrolling eliminates some loop overhead instructions.
- Eliminate pipeline stalls.
- Inlining procedures may help: reduces linkage, but may increase cache thrashing.
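
Moderate loop unrolling can be sketched as follows. The unroll factor of 4 and the function names are assumptions made for illustration; the unrolled version executes roughly one quarter as many loop tests and index updates, and a short cleanup loop handles lengths that are not a multiple of 4.

```c
#include <stddef.h>

/* Baseline: one test and one index update per element. */
int sum_rolled(const int *a, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by 4: one test and one index update per four elements. */
int sum_unrolled4(const int *a, size_t n)
{
    int s = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4)
        s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];

    for (; i < n; i++)      /* cleanup for the remaining 0 to 3 elements */
        s += a[i];

    return s;
}
```

Unrolling further increases code size, which can hurt instruction cache behavior; that trade-off is why the slide recommends moderation.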

27 Optimizing for program size
- Goal:
  - reduce hardware cost of memory;
  - reduce power consumption of memory units.
- Two opportunities:
  - data;
  - instructions.

28 Data size minimization
- Reuse constants, variables, data buffers in different parts of code.
  - Requires careful verification of correctness.
  - Eliminates copies of data.
- Generate data using instructions.

29 Reducing code size
- Avoid function inlining.
- Choose CPU with compact instructions.
  - ARM Thumb
  - MIPS-16
  - Variable length of instruction
- Use specialized instructions where possible.
  - RPTS/RPTB
- Code compression
(Slide annotation: "contradiction?")

30 Code compression (P.199)
- Use statistical compression to reduce code size, decompress on-the-fly.
[Figure: block diagram with CPU, decompressor, table, cache, and main memory; the compressed code 0101101 corresponds to the instruction LDR r0,[r4].]

