Embedded Computer Systems, Chapter 1: Embedded Computing. Eng. Husam Y. Alzaq, Islamic University of Gaza.


1 Embedded Computer Systems
Chapter 1: Embedded Computing
Eng. Husam Y. Alzaq, Islamic University of Gaza

2 Program design and analysis (© 2010 Husam Alzaq, Computers as Components)
- Compilation flow.
- Basic statement translation.
- Basic optimizations.
- Interpreters and just-in-time compilers.

3 Compilation
- Compilation strategy (Wirth):
  - compilation = translation + optimization
- Compiler determines quality of code:
  - use of CPU resources;
  - memory access scheduling;
  - code size.

4 Basic compilation phases
HLL -> parsing, symbol table -> machine-independent optimizations -> machine-dependent optimizations -> assembly

5 Statement translation and optimization
- Source code is translated into intermediate form such as CDFG.
- CDFG is transformed/optimized.
- CDFG is translated into instructions with optimization decisions.
- Instructions are further optimized.

6 Arithmetic expressions
expression:
    a*b + 5*(c-d)
[Figure: DFG of the expression. a and b feed a multiply; c and d feed a subtract whose result is multiplied by 5; the two products feed the final add.]
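One way to picture the DFG is as a sequence of single-operator assignments, one per operator node. The sketch below is a hand-written illustration only; the function name `eval_expr`, the temporaries `t1` to `t4`, and the node numbering are assumptions made for the example, not part of the slides.

```c
/* Hypothetical helper: evaluates a*b + 5*(c-d) one DFG node at a time. */
int eval_expr(int a, int b, int c, int d)
{
    int t1 = a * b;      /* multiply node: a and b           */
    int t2 = c - d;      /* subtract node: c and d           */
    int t3 = 5 * t2;     /* multiply node: constant 5 and t2 */
    int t4 = t1 + t3;    /* add node: combines both products */
    return t4;
}
```

Each temporary corresponds to one edge leaving an operator node, which is roughly what a compiler has to place in a register when it generates code.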

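7 Arithmetic expressions, cont’d.
[Figure: the DFG with its four operator nodes numbered 1 to 4.]
code:
    ADR r4,a        ; get address of a
    LDR r1,[r4]     ; load a
    ADR r4,b
    LDR r2,[r4]     ; load b
    MUL r3,r1,r2    ; r3 = a*b
    ADR r4,c
    LDR r1,[r4]     ; load c
    ADR r4,d
    LDR r5,[r4]     ; load d
    SUB r6,r1,r5    ; r6 = c-d
    MOV r0,#5       ; MUL takes register operands only
    MUL r7,r6,r0    ; r7 = 5*(c-d)
    ADD r8,r7,r3    ; r8 = a*b + 5*(c-d)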

8 Control code generation
    if (a+b > 0)
        x = 5;
    else
        x = 7;
[Figure: CDFG with decision node a+b>0 branching to x=5 and x=7.]

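9 Control code generation, cont’d.
[Figure: the CDFG (a+b>0, x=5, x=7) with its three nodes numbered 1 to 3.]
    ADR r5,a
    LDR r1,[r5]      ; load a
    ADR r5,b
    LDR r2,[r5]      ; load b
    ADD r3,r1,r2
    CMP r3,#0        ; set flags for the test a+b > 0
    BLE label3       ; test false: go to the else branch
    MOV r3,#5        ; true block: x = 5
    ADR r5,x
    STR r3,[r5]
    B stmtend
label3
    MOV r3,#7        ; false block: x = 7
    ADR r5,x
    STR r3,[r5]
stmtend ...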
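The same branch structure can be written in C with explicit jumps, which makes the correspondence between the if statement and the generated labels easier to see. This is only a sketch; the function name `select_x` and the labels `else_part` and `done` are hypothetical.

```c
/* Hypothetical helper mirroring the branch structure of the generated code. */
int select_x(int a, int b)
{
    int x;
    if (a + b <= 0)      /* inverted test, like a branch-if-less-or-equal */
        goto else_part;
    x = 5;               /* true block */
    goto done;           /* skip over the false block */
else_part:
    x = 7;               /* false block */
done:
    return x;
}
```

The condition is inverted because the branch skips the true block when the test fails; the true block then ends with an unconditional jump past the false block.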

10 Procedure linkage
- Need code to:
  - call and return;
  - pass parameters and results.
- Parameters and returns are passed on stack.
  - Procedures with few parameters may use registers.

11 Procedure stacks
    proc1(int a) {
        proc2(5);
    }
[Figure: stack holding the frames of proc1 and proc2, with the direction of growth marked. FP is the frame pointer and SP is the stack pointer; the argument 5 is accessed relative to SP.]

12 ARM procedure linkage
- APCS (ARM Procedure Call Standard):
  - r0-r3 pass parameters into procedure. Extra parameters are put on stack frame.
  - r0 holds return value.
  - r4-r7 hold register variables.
  - r11 is frame pointer, r13 is stack pointer.
  - r10 holds limiting address on stack size to check for stack overflows.

13 Data structures
- Different types of data structures use different data layouts.
- Some offsets into data structure can be computed at compile time, others must be computed at run time.

14 One-dimensional arrays
- C array name points to 0th element:
[Figure: a points to a[0], which is followed in memory by a[1] and a[2]; a[1] = *(a + 1).]
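The equivalence between indexing and pointer arithmetic can be checked directly. A minimal sketch, with a hypothetical helper name `element_at`:

```c
#include <stddef.h>

/* Hypothetical helper: reads element i through pointer arithmetic. */
int element_at(const int *a, size_t i)
{
    return *(a + i);     /* identical in meaning to a[i] */
}
```

The addition is scaled by the element size, so `a + 1` advances by `sizeof(int)` bytes rather than by one byte.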

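15 Two-dimensional arrays
- Row-major layout (the layout C uses): a[0,0] a[0,1] ... a[1,0] a[1,1] ...
- a[i,j] is stored at a[i*M+j], where M is the number of elements in each row.
[Figure: the array with its two dimensions labeled M and N.]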
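C stores two-dimensional arrays row by row, so the offset of an element follows from its indices and the row length. The sketch below illustrates that calculation; `row_major_index`, `read_2d`, and the parameter name `cols` are assumptions for the example.

```c
#include <stddef.h>

/* Hypothetical helper: offset of element (i, j) in a row-major array
   whose rows each hold `cols` elements. */
size_t row_major_index(size_t i, size_t j, size_t cols)
{
    return i * cols + j;
}

/* Reads element (i, j) from the same storage viewed as a flat array. */
int read_2d(const int *flat, size_t i, size_t j, size_t cols)
{
    return flat[row_major_index(i, j, cols)];
}
```

When `cols` is known at compile time and `i` and `j` are constants, the whole offset can be folded into a constant; otherwise the multiply and add happen at run time.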

16 Structures
- Fields within structures are static offsets:
    struct mystruct {
        int field1;
        char field2;
    };
    struct mystruct a, *aptr = &a;
[Figure: aptr points to field1, which occupies 4 bytes; field2 follows it and is reached as *((char *)aptr + 4).]
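In standard C, the compile-time offset of a field is available through `offsetof`. The sketch below reads `field2` by adding that offset to the structure's base address; the helper name `read_field2` is hypothetical, and the actual offset depends on the size of `int` on the target.

```c
#include <stddef.h>

struct mystruct {
    int field1;
    char field2;
};

/* Hypothetical helper: reads field2 by adding its compile-time byte
   offset to the structure's base address. */
char read_field2(const struct mystruct *p)
{
    const char *base = (const char *)p;
    return *(base + offsetof(struct mystruct, field2));
}
```

The cast to `const char *` matters: pointer arithmetic on the structure pointer itself would step in units of the whole structure, not in bytes.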

17 Expression simplification
- Constant folding:
  - 8+1 = 9
- Algebraic:
  - a*b + a*c = a*(b+c)
- Strength reduction:
  - a*2 = a<<1
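The algebraic and strength-reduction rewrites can be written as before/after pairs. This is a sketch with hypothetical function names; unsigned operands are used so that both forms give the same result even when the arithmetic wraps around.

```c
/* Hypothetical before/after pairs for two of the rewrites. */
unsigned algebraic_before(unsigned a, unsigned b, unsigned c)
{
    return a * b + a * c;        /* two multiplies, one add */
}

unsigned algebraic_after(unsigned a, unsigned b, unsigned c)
{
    return a * (b + c);          /* one multiply, one add */
}

unsigned strength_before(unsigned a)
{
    return a * 2;                /* multiply by a power of two */
}

unsigned strength_after(unsigned a)
{
    return a << 1;               /* the same result with a shift */
}
```

A compiler normally applies these rewrites itself at higher optimization levels, so writing the "after" form by hand rarely helps and can hurt readability.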

18 Dead code elimination
- Dead code:
    #define DEBUG 0
    if (DEBUG) dbg(p1);
- Can be eliminated by analysis of control flow, constant folding.
[Figure: control-flow graph whose decision node tests the constant 0; the edge labeled 1 leads to dbg(p1); and the edge labeled 0 bypasses it.]

19 Procedure inlining
- Eliminates procedure linkage overhead:
    int foo(a,b,c) { return a + b - c; }
    z = foo(w,x,y);
  becomes
    z = w + x - y;

20 Loop transformations
- Goals:
  - reduce loop overhead;
  - increase opportunities for pipelining;
  - improve memory system performance.

21 Loop unrolling
- Reduces loop overhead, enables some other optimizations.
    for (i=0; i<4; i++)
        a[i] = b[i] * c[i];
  becomes
    for (i=0; i<2; i++) {
        a[i*2] = b[i*2] * c[i*2];
        a[i*2+1] = b[i*2+1] * c[i*2+1];
    }
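The slide's example has a trip count that divides evenly by the unrolling factor. When the count is not known in advance, an unrolled loop needs a cleanup step for the leftover iteration. A minimal sketch, assuming a hypothetical helper `multiply_unrolled` and an unrolling factor of two:

```c
#include <stddef.h>

/* Hypothetical helper: a[i] = b[i] * c[i], unrolled by two. */
void multiply_unrolled(int *a, const int *b, const int *c, size_t n)
{
    size_t i = 0;
    /* Main loop: two elements per iteration, so half as many loop tests. */
    for (; i + 1 < n; i += 2) {
        a[i]     = b[i]     * c[i];
        a[i + 1] = b[i + 1] * c[i + 1];
    }
    /* Cleanup: at most one element is left when n is odd. */
    if (i < n)
        a[i] = b[i] * c[i];
}
```

The cleanup code is part of the cost of unrolling: it adds code size, which is one of the quantities the compiler trades against execution time.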

22 Loop fusion and distribution
- Fusion combines two loops into 1:
    for (i=0; i<N; i++)
        a[i] = b[i] * 5;
    for (j=0; j<N; j++)
        w[j] = c[j] * d[j];
  becomes
    for (i=0; i<N; i++) {
        a[i] = b[i] * 5;
        w[i] = c[i] * d[i];
    }
- Distribution breaks one loop into two.
- Changes optimizations within loop body.
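Distribution is the reverse of fusion. The sketch below shows both forms of the slide's loops as functions so they can be compared; the names `loops_fused` and `loops_distributed` are hypothetical, and the transformation is only valid here because the two statements do not depend on each other.

```c
#include <stddef.h>

/* Fused form: one loop updates both arrays. */
void loops_fused(int *a, int *w, const int *b, const int *c,
                 const int *d, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        a[i] = b[i] * 5;
        w[i] = c[i] * d[i];
    }
}

/* Distributed form: each statement gets its own loop. */
void loops_distributed(int *a, int *w, const int *b, const int *c,
                       const int *d, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] * 5;
    for (size_t j = 0; j < n; j++)
        w[j] = c[j] * d[j];
}
```

Which form is faster depends on the target: fusion halves the loop overhead, while distribution touches fewer arrays per loop and may behave better in a small cache.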

23 Loop tiling (© 2000 Morgan Kaufmann)
- Breaks one loop into a nest of loops.
- Changes order of accesses within array.
  - Changes cache behavior.

24 Loop tiling example
    for (i=0; i<N; i++)
        for (j=0; j<N; j++)
            c[i] = a[i][j]*b[i];
  becomes
    for (i=0; i<N; i+=2)
        for (j=0; j<N; j+=2)
            for (ii=i; ii<min(i+2,N); ii++)
                for (jj=j; jj<min(j+2,N); jj++)
                    c[ii] = a[ii][jj]*b[ii];
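To check that tiling changes only the order of accesses and not the result, the sketch below applies the same loop structure to a small matrix-vector product. This is a variation on the slide's loop body rather than a copy of it; the names `matvec_plain`, `matvec_tiled`, and `min_int`, and the values of `N` and `TILE`, are assumptions for the example.

```c
#define N 5        /* assumed array dimension (deliberately not a multiple of TILE) */
#define TILE 2     /* assumed tile size, matching the step of 2 in the slide */

static int min_int(int x, int y) { return x < y ? x : y; }

/* Untiled reference: c[i] = sum over j of a[i][j] * b[j]. */
void matvec_plain(int c[N], int a[N][N], int b[N])
{
    for (int i = 0; i < N; i++) {
        c[i] = 0;
        for (int j = 0; j < N; j++)
            c[i] += a[i][j] * b[j];
    }
}

/* Tiled version: visits the array in TILE x TILE blocks. */
void matvec_tiled(int c[N], int a[N][N], int b[N])
{
    for (int i = 0; i < N; i++)
        c[i] = 0;
    for (int i = 0; i < N; i += TILE)
        for (int j = 0; j < N; j += TILE)
            /* Inner loops start at the tile origin and stop at the
               tile edge or the array edge, whichever comes first. */
            for (int ii = i; ii < min_int(i + TILE, N); ii++)
                for (int jj = j; jj < min_int(j + TILE, N); jj++)
                    c[ii] += a[ii][jj] * b[jj];
}
```

Within one tile the code reuses the same few elements of `b` for several rows, which is the kind of locality tiling is meant to expose.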

25 Array padding
- Add array elements to change mapping into cache:
[Figure: placement of a[0,0] a[0,1] a[0,2] and a[1,0] a[1,1] a[1,2] in the cache before and after padding.]

26 Register allocation
- Goals:
  - choose register to hold each variable;
  - determine lifespan of variable in the register.
- Basic case: within basic block.

27 Register lifetime graph
    w = a + b;    (t=1)
    x = c + w;    (t=2)
    y = c + d;    (t=3)
[Figure: lifetime graph with the variables a, b, c, d, w, x, y plotted against time steps 1 to 3.]

28 Instruction scheduling
- Non-pipelined machines do not need instruction scheduling: any order of instructions that satisfies data dependencies runs equally fast.
- In pipelined machines, execution time of one instruction depends on the nearby instructions: opcode, operands.

29 Reservation table
- A reservation table relates instructions/time to CPU resources.
[Table: rows instr1 to instr4 (time/instruction) against resource columns A and B. instr2 is marked in both columns; instr1, instr3 and instr4 are each marked in one column.]

30 Software pipelining (Overheads for Computers as Components)
- Schedules instructions across loop iterations.
- Reduces instruction latency in iteration i by inserting instructions from iteration i+1.
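The idea can be imitated at source level by starting the load for iteration i+1 before finishing the work of iteration i. A compiler does this on machine instructions, so the C below is only an analogy; the function name `scale_pipelined` and its parameters are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: out[i] = in[i] * k, software-pipelined by hand. */
void scale_pipelined(int *out, const int *in, int k, size_t n)
{
    if (n == 0)
        return;
    int loaded = in[0];                 /* prologue: load for iteration 0 */
    for (size_t i = 0; i + 1 < n; i++) {
        int next = in[i + 1];           /* load for iteration i+1 ...        */
        out[i] = loaded * k;            /* ... sits next to the work for i   */
        loaded = next;
    }
    out[n - 1] = loaded * k;            /* epilogue: finish the last iteration */
}
```

The prologue and epilogue are the price of the transformation: the steady-state loop body mixes work from two iterations, so the first load and the last multiply have to be handled outside it.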

31 Instruction selection
- May be several ways to implement an operation or sequence of operations.
- Represent operations as graphs, match possible instruction sequences onto graph.
[Figure: an expression graph (a multiply feeding an add) beside the instruction templates that can cover it: MUL, ADD, and the combined MADD.]

32 Using your compiler
- Understand various optimization levels (-O1, -O2, etc.)
- Look at mixed compiler/assembler output.
- Modifying compiler output requires care:
  - correctness;
  - loss of hand-tweaked code.

33 Interpreters and JIT compilers
- Interpreter: translates and executes program statements on-the-fly.
- JIT compiler: compiles small sections of code into instructions during program execution.
  - Eliminates some translation overhead.
  - Often requires more memory.

34 Program design and analysis
- Program-level performance analysis.
- Optimizing for:
  - Execution time.
  - Energy/power.
  - Program size.
- Program validation and testing.

35 Program-level performance analysis
- Need to understand performance in detail:
  - Real-time behavior, not just typical.
  - On complex platforms.
- Program performance ≠ CPU performance:
  - Pipeline, cache are windows into program.
  - We must analyze the entire program.

36 Complexities of program performance
- Varies with input data:
  - Different-length paths.
- Cache effects.
- Instruction-level performance variations:
  - Pipeline interlocks.
  - Fetch times.

37 How to measure program performance
- Simulate execution of the CPU.
  - Makes CPU state visible.
- Measure on real CPU using timer.
  - Requires modifying the program to control the timer.
- Measure on real CPU using logic analyzer.
  - Requires events visible on the pins.
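The timer-based approach means wrapping the code of interest between two timer reads. The sketch below uses the standard C `clock()` function as a stand-in for a hardware timer; on an embedded target the two reads would typically go to a timer peripheral instead. The names `busy_sum` and `time_busy_sum` are hypothetical.

```c
#include <time.h>

/* Hypothetical workload: sums 0 .. n-1. The volatile accumulator keeps
   the compiler from optimizing the loop away. */
static unsigned long busy_sum(unsigned long n)
{
    volatile unsigned long total = 0;
    for (unsigned long i = 0; i < n; i++)
        total += i;
    return total;
}

/* Runs the workload once and returns the processor time it used,
   in seconds, as reported by clock(). */
double time_busy_sum(unsigned long n, unsigned long *result)
{
    clock_t start = clock();            /* read the timer before the region */
    *result = busy_sum(n);              /* region being measured */
    clock_t stop = clock();             /* read the timer after the region */
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

A single measurement like this reflects one input and one cache state, so it shows typical behavior at best; worst-case timing needs the path and cache analysis described on the preceding slides.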

