Microelectronic devices: processing architectures – Koen De Bosschere – 2006-06-12 Faculteit Ingenieurswetenschappen – Vakgroep Elektronica en Informatiesystemen.


1 Microelectronic devices: processing architectures. Koen De Bosschere, Ghent University

2 Moore's Law (1965). Gordon Moore, Fairchild Semiconductor [Electronics, April 19, 1965]. [Plot: log2 of the number of components per integrated function]

3 Moore's Law [Source: Intel]

4 Itanium Montecito: 1720 M transistors

5 In-order execution: Fetch & Decode → Execute → Finish

6 Superscalar execution: Fetch & Decode (fetch width) → instruction window → execution units E E E E (issue width) → Retire (commit) (commit width)

7 Superscalar out-of-order processor. [Figure: the L1 I-cache and branch predictor feed the front-end pipeline (decoding, register renaming, etc.); the instruction window issues to two ALUs and a ld/st unit backed by the L1 D-cache and L2 cache; out-of-order execution, in-order commit.] Example loop (instructions labelled a to g in the figure):
          mov 0 → r1
          mov 0x0fe0 → r3
    L:    ld MEM[r3] → r2
          add r2,r1 → r1
          add r3,32 → r3
          brl r3,0x10a0 → L
          st r3 → MEM[A]
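To relate the animation to what the program computes, here is a minimal Python model of the example loop. It is a sketch under two assumptions the slide does not state: `brl r3,0x10a0 → L` branches back to L while r3 is less than 0x10a0, and memory is a dictionary in which unmapped words read as 0. The function name `run_example_loop` is hypothetical.

```python
def run_example_loop(memory, stride=32, start=0x0FE0, limit=0x10A0):
    """Model of the slide's loop (assumes brl = "branch to L if r3 < limit").

    memory: dict mapping byte addresses to word values; missing words read as 0.
    Returns (r1, r3, iterations): the running sum, the final pointer and the
    number of times the loop body executed.
    """
    r1 = 0                      # mov 0 → r1       (running sum)
    r3 = start                  # mov 0x0fe0 → r3  (pointer)
    iterations = 0
    while True:                 # L:
        r2 = memory.get(r3, 0)  # ld MEM[r3] → r2
        r1 = r2 + r1            # add r2,r1 → r1
        r3 = r3 + stride        # add r3,32 → r3
        iterations += 1
        if not r3 < limit:      # brl r3,0x10a0 → L falls through once r3 >= limit
            break
    # st r3 → MEM[A] would store the final r3 here; the memory write is not modelled.
    return r1, r3, iterations


memory = {0x0FE0 + 32 * i: i + 1 for i in range(6)}
print(run_example_loop(memory))                # (21, 4256, 6)
print(run_example_loop(memory, stride=4)[2])   # 48
```

With a stride of 32 the branch is taken five times and falls through on the sixth iteration. Later slides use `add r3,4 → r3`, which gives 48 iterations over the same address range.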

8 Cycle 1: per clock cycle, B instructions are fetched, with B = processor width. [pipeline animation]

9 Cycle 2 [pipeline animation]

10 Cycle 4: the branch is predicted taken. [pipeline animation]

11 Cycle 5 [pipeline animation]

12 Cycle 6 [pipeline animation]

13 Cycle 7: some instructions have their operands available; for others, the operands are not yet available. [pipeline animation]

14 Cycle 8: out-of-order execution. [pipeline animation]

15 Cycle 9: instruction-level parallelism (ILP). [pipeline animation]

16 Cycle 10: in-order commit. [pipeline animation]

17 Cycle 11 [pipeline animation]

18 Instruction-level parallelism [starting from the full trace, SPECint 2000]. [Chart per benchmark: bzip2, crafty, eon, gcc, gzip, parser, perlbmk, twolf, vortex, vpr]

19 Performance: IPC with 8 execution units. [Chart per benchmark: bzip2, crafty, eon, gcc, gzip, parser, perlbmk, twolf, vortex, vpr]

20 IPC ≈ √W, with W the size of the instruction window. [Plot: IPC versus W]
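Reading the relation on this slide as the square-root rule of thumb IPC ≈ √W, doubling the IPC requires a four times larger instruction window. The short Python sketch below makes that concrete. Capping the IPC at the machine width is our own added assumption (a B-wide machine cannot sustain more than B instructions per cycle), and the function names are hypothetical.

```python
import math


def ipc_from_window(window_size, width):
    """Square-root rule of thumb: IPC grows as sqrt(W), capped by the machine width."""
    return min(math.sqrt(window_size), width)


def window_for_ipc(target_ipc):
    """Inverse of the rule: sustaining a given IPC needs a window of IPC**2 entries."""
    return target_ipc ** 2


for w in (16, 32, 64, 128):
    print(w, round(ipc_from_window(w, width=8), 2))
```

Under this rule, going from an IPC of 4 to an IPC of 8 takes the window from 16 to 64 entries; beyond that, the 8-wide machine itself becomes the limit.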

21 Branch prediction? [Figure: superscalar pipeline: Fetch & Decode (fetch width), instruction pool, execution units E E E E (issue width), Retire (commit) (commit width)]

22 Branch predictor: 90-95% correct. [Figure: the same pipeline with a branch predictor in front of Fetch & Decode]

23 Cycle 11: mispredicted branch; speculative fetch and execution. [pipeline animation]

24 Cycle 13 [pipeline animation]

25 Cycle 15 [pipeline animation]

26 Cycle 16: instructions on the mispredicted path must be nullified. [pipeline animation]

27 Cycle 17: instructions on the correct path are fetched. [pipeline animation]

28 Interval analysis of a branch misprediction. [Plot: IPC versus time t] The mispredicted branch enters the instruction window; instructions along the predicted path are executed; the mispredicted branch gets executed and instructions from the correct path are fetched; the correct instructions enter the instruction window; performance recovers to IPC max.
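The interval picture suggests a first-order cost model. Between miss events the processor runs at its maximum IPC, and each misprediction adds roughly the time to resolve the branch plus the time to refill the front-end pipeline. The Python sketch below is a simplification in the spirit of interval analysis, not the exact model behind these slides. The parameter names and example numbers are illustrative assumptions; only the 95% accuracy comes from the slides' 90-95% range.

```python
def interval_cycles(n_instr, width, n_mispredicts, resolution_time, frontend_depth):
    """First-order estimate of execution time with branch mispredictions.

    Base: n_instr / width cycles at the maximum IPC (assumed equal to the width).
    Each misprediction adds the cycles until the branch resolves plus the
    cycles needed to refill the front-end pipeline with correct-path instructions.
    """
    base = n_instr / width
    penalty = n_mispredicts * (resolution_time + frontend_depth)
    return base + penalty


# Illustrative numbers: 1M instructions, one branch per 5 instructions,
# 95% prediction accuracy, 10 cycles to resolve, 5-stage front end.
n = 1_000_000
branches = n // 5
mispredicts = branches * 5 // 100        # 5% of the branches
cycles = interval_cycles(n, width=4, n_mispredicts=mispredicts,
                         resolution_time=10, frontend_depth=5)
print(cycles, n / cycles)                # IPC drops from 4 to 2.5
```

Even at 95% accuracy, the mispredictions in this made-up setting cost 150 000 cycles on top of a 250 000-cycle base.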

29 Branch prediction: IPC per benchmark (bzip2, crafty, eon, gcc, gzip, parser, perlbmk, twolf, vortex, vpr) with everything perfect versus a real branch predictor.

30 Memory wall problem (Pentium 4 EE): L1 I-cache (12 Ki) and L1 D-cache (8 KiB): 2 cycles; L2 cache (512 KiB): 19 cycles; L3 cache (2 MiB): 43 cycles; main memory: 206 cycles.
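Once miss rates are known, the latencies on this slide can be combined into an average load latency. The slide gives no miss rates, so the ones below are hypothetical. The sketch also assumes that each listed latency is the total time for an access served by that level, which is our reading of the figure.

```python
def average_access_time(latencies, local_miss_rates):
    """Average latency of a load, given the total latency to each level.

    latencies: cycles for an access served by L1, L2, L3 and main memory
               (interpreted as total load-to-use latencies, an assumption).
    local_miss_rates: fraction of the accesses reaching a cache level that
                      miss in it (one entry per cache level).
    """
    assert len(latencies) == len(local_miss_rates) + 1
    reach = 1.0      # probability that an access reaches the current level
    total = 0.0
    for latency, miss in zip(latencies, local_miss_rates):
        total += reach * (1.0 - miss) * latency   # served at this level
        reach *= miss                             # continues to the next level
    total += reach * latencies[-1]                # served by main memory
    return total


P4_EE_LATENCIES = [2, 19, 43, 206]      # from the slide: L1, L2, L3, memory
example_miss_rates = [0.05, 0.2, 0.5]   # hypothetical, for illustration only
print(round(average_access_time(P4_EE_LATENCIES, example_miss_rates), 3))  # 3.905
```

With these made-up rates, only 0.5% of loads go to main memory, yet they add about 1 cycle to the average. That is more than half of what all the L1 hits contribute, which is the memory wall in miniature.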

31 Cycle 10 (from here on, the loop advances the pointer with add r3,4 → r3). Suppose f4 is correctly predicted and g causes an I-cache miss; the I-cache miss latency is 10 cycles. [pipeline animation]

32 Cycle 13 [pipeline animation]

33 Cycle 14 [pipeline animation]

34 Cycle 15 [pipeline animation]

35 Cycle 16 [pipeline animation]

36 Cycle [pipeline animation]

37 Cycle 20 [pipeline animation]

38 Interval analysis of an I-cache miss. [Plot: IPC versus time t] After the I-cache miss, the instructions already in the front-end pipeline still arrive in the instruction window; the instruction window empties; after the I-cache miss latency, the front-end pipeline refills; performance recovers to IPC max.

39 Cycle 10: suppose this is an L1 D-cache miss. [pipeline animation]

40 Cycle 11: the access time to the L2 cache is 3 cycles. [pipeline animation]

41 Cycle 12 [pipeline animation]

42 Cycle 13 [pipeline animation]

43 Cycle 14 [pipeline animation]

44 An L1 D-cache miss behaves like an instruction with a long execution latency. [Chart: IPC for bzip2, crafty, eon, gap, gzip, perlbmk, vortex, without and with L1 D-cache misses]

45 Cycle 13: suppose this is an L2 cache miss; the request goes to main memory through the MSHRs. [pipeline animation]

46 Cycle 14: the instruction waits in the MSHRs (Miss Status Handling Registers) for data from main memory. Assume the access time to main memory is 250 cycles. [pipeline animation]

47 Cycle 15 [pipeline animation]

48 Cycle 16: the instruction window fills up; instruction c2 blocks commit. [pipeline animation]

49 … Cycle 264 [pipeline animation]

50 Performance impact of non-ideal memory: IPC per benchmark (bzip2, crafty, eon, gcc, gzip, parser, perlbmk, twolf, vortex, vpr) with everything perfect, with a real branch predictor, and with a real branch predictor and a real memory hierarchy.

51 Interval analysis of an L2 D-cache miss. [Plot: IPC versus time t] After the L2 D-cache miss, the instruction window fills up while instructions that do not depend on the cache miss are executed; the instruction window stays full until the L2 D-cache miss latency has elapsed; then performance recovers to IPC max.

52 Multiple L2 D-cache misses: memory-level parallelism (MLP). [Pipeline animation: several loads wait in the MSHRs for main memory at the same time]
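Memory-level parallelism can be folded into the interval model. Independent L2 misses that are close enough together to sit in the instruction window at the same time overlap, so together they cost roughly one memory latency instead of one each. The Python sketch below groups misses by their distance in the dynamic instruction stream. The grouping rule and function name are simplifying assumptions, and the sketch ignores dependences between misses and the limited number of MSHRs.

```python
def exposed_miss_penalties(miss_positions, window_size):
    """Count the long-latency penalties that are not hidden by overlap.

    miss_positions: dynamic instruction indices of independent L2 D-cache misses.
    While the oldest miss blocks commit, the window can still accept the next
    window_size - 1 instructions, so misses among them overlap with it.
    """
    exposed = 0
    group_start = None
    for pos in sorted(miss_positions):
        if group_start is None or pos - group_start >= window_size:
            exposed += 1          # this miss starts a new, non-overlapped interval
            group_start = pos
    return exposed


misses = [0, 10, 50, 300]         # hypothetical miss positions
memory_latency = 250              # cycles, as assumed on slide 46
for window in (32, 128):
    exposed = exposed_miss_penalties(misses, window)
    print(window, exposed, len(misses) / exposed, exposed * memory_latency)
```

With a 128-entry window, the first three misses overlap, so four misses cost about two memory latencies (MLP = 2). A 32-entry window exposes three.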

53 Overview
Superscalar pipeline
Speculative execution
– branch prediction
– value prediction
Predicated execution
Multithreaded execution
VLIW

54 Pipeline: instructions i1, i2, i3 through the stages F D E W, with 4 stages versus 8 stages.
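The pipeline diagram can be turned into a small timing calculation. An ideal k-stage pipeline needs k cycles for the first instruction and then completes one instruction per cycle. A deeper pipeline splits the same logic over more stages and so shortens the clock cycle. The per-stage latch overhead in the sketch is an assumed parameter, added to show why the gain from deeper pipelines shrinks; the numbers are illustrative.

```python
def pipeline_cycles(n_instructions, n_stages):
    """Ideal pipeline: n_stages cycles to fill, then one instruction per cycle."""
    return n_stages + n_instructions - 1


def execution_time(n_instructions, n_stages, logic_delay, latch_overhead=0.0):
    """Total time when the logic delay is split evenly over the stages."""
    cycle_time = logic_delay / n_stages + latch_overhead
    return pipeline_cycles(n_instructions, n_stages) * cycle_time


# The slide's three instructions i1..i3 through 4 and 8 stages:
print(pipeline_cycles(3, 4), pipeline_cycles(3, 8))        # 6 10
# Long runs, 8 ns of logic, 0.5 ns latch overhead per stage (assumed values):
print(execution_time(1000, 4, 8.0, 0.5), execution_time(1000, 8, 8.0, 0.5))
```

With the assumed overhead, doubling the depth gives a speedup of about 1.66 rather than 2. Branch mispredictions, which cost more cycles in a deeper pipeline, reduce the gain further.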

55 Examples of pipeline depth and clock frequency [Source: Microprocessor Report]: Itanium, Alpha, AMD Opteron, Power, Pentium, and IA32 Prescott with 31 stages at 3.4 GHz.

56 Performance/MHz: speed demons versus brainiacs. [Plot: SPECint_peak versus frequency (MHz) for Alpha, Athlon, Opteron, PA-RISC, Pentium III and 4, POWER, MIPS, SPARC64, Sun SPARC, Xeon, Itanium]

57 Pentium 4 pipeline stages: 1-2 trace cache next IP; 3-4 trace cache fetch; 5 drive; 6-8 allocate and rename; 9 queue; 10-12 issue; 13-14 dispatch; 15 operand 1; 16 operand 2; 17 execute; 18 flags; 19 branch check; 20 drive.

58 20 stages: 5 stages (1-5), 7 stages (6-12), 5 stages (13-17), 3 stages (18-20); 3 op/cycle, 6 op/cycle.

59 Tomasulo + speculation. [Figure: the fetch unit and I-cache feed an instruction queue; registers and reservation stations sit in front of functional units FU1, FU2, FU3; an address unit, load buffer and load/store operation bus connect to the D-cache; results go over the common data bus (CDB) to the reorder buffer]
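In Tomasulo's algorithm with speculation, the reorder buffer turns out-of-order completion into in-order commit. It is also what lets the processor discard wrong-path work after a misprediction (slides 26 and 48). The Python class below is a minimal sketch of that bookkeeping only. It has no register renaming, reservation stations or common data bus. The method names are hypothetical, and the tags borrow the slides' instruction notation.

```python
from collections import deque


class ReorderBuffer:
    """Minimal reorder buffer: results complete out of order, commit in order."""

    def __init__(self, size):
        self.size = size
        self.order = deque()   # instruction tags in program order, oldest on the left
        self.done = {}         # tag -> has the instruction finished executing?

    def allocate(self, tag):
        """Reserve an entry at dispatch; False means the buffer is full (stall)."""
        if len(self.order) >= self.size:
            return False
        self.order.append(tag)
        self.done[tag] = False
        return True

    def complete(self, tag):
        """Mark a result as available; this can happen in any order."""
        self.done[tag] = True

    def commit(self, width):
        """Retire up to `width` finished instructions from the head, in program order."""
        retired = []
        while self.order and len(retired) < width and self.done[self.order[0]]:
            tag = self.order.popleft()
            del self.done[tag]
            retired.append(tag)
        return retired

    def squash_after(self, tag):
        """Discard all instructions younger than `tag`, e.g. a mispredicted branch."""
        squashed = []
        while self.order and self.order[-1] != tag:
            young = self.order.pop()
            del self.done[young]
            squashed.append(young)
        return squashed


rob = ReorderBuffer(size=4)
for t in ("c2", "d2", "e2", "f2"):     # hypothetical tags in program order
    rob.allocate(t)
print(rob.allocate("c3"))              # False: the window is full
rob.complete("d2")
rob.complete("e2")
print(rob.commit(width=4))             # []: the oldest entry, c2, has not finished
rob.complete("c2")
print(rob.commit(width=4))             # ['c2', 'd2', 'e2']
```

The empty first commit mirrors slide 48: a single unfinished old instruction blocks retirement of everything behind it, even if those instructions have already completed.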

60 Overview
Superscalar pipeline
Speculative execution
– branch prediction
– value prediction
Predicated execution
Multithreaded execution
VLIW

61 Control flow instructions:
          i1
          i2
          bc lab
          i4
          i5
          i6
          jmp end
    lab:  i8
          i9
          i10
    end:  i11

62 Branch predictors
Static predictors
– Not taken
– Backward taken/forward not taken
Dynamic predictors
– Simple dynamic predictor
– 2-bit predictor
– Local
– Global
– Hybrid
– Branch Target Buffer

63 Predict not taken:
    loop: cmp r1,r2
          je end
          … loop …
          jump loop
    end:

64 Branch backward taken / forward not taken:
          cmp r1,r2
          je end
    loop: … loop …
          cmp r1,r2
          jne loop
    end:
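The two static rules can be compared on a branch trace. In the Python sketch below, the branch addresses and the 10-iteration trace are hypothetical. The trace follows the bottom-tested loop above: the forward `je end` falls through once, and the backward `jne loop` is taken 9 times before falling through.

```python
def predict_not_taken(pc, target):
    """Static: every conditional branch is predicted to fall through."""
    return False


def predict_btfn(pc, target):
    """Static: backward branches (target below pc, e.g. loop back-edges) are
    predicted taken, forward branches not taken."""
    return target < pc


def accuracy(predictor, trace):
    """trace: list of (pc, target, taken) tuples for executed conditional branches."""
    hits = sum(predictor(pc, target) == taken for pc, target, taken in trace)
    return hits / len(trace)


# Hypothetical trace: je end at 0x100 (forward), jne loop at 0x13C (backward).
trace = [(0x100, 0x140, False)] + [(0x13C, 0x104, True)] * 9 + [(0x13C, 0x104, False)]
print(accuracy(predict_not_taken, trace), accuracy(predict_btfn, trace))
```

Backward taken/forward not taken gets 10 of the 11 branches right and misses only the loop exit, while predict not taken gets only 2 right on this loop shape.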

65 Simple dynamic predictor: a table indexed by the lowest PC bits holds 1 = taken or 0 = not taken; an entry of 1 means predict taken; once the branch resolves, the table is updated with the correct outcome.
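Here is a minimal Python sketch of this 1-bit scheme, with the table size as an assumed parameter. Each entry stores the last outcome of whichever branch maps to it, so two branches whose low PC bits match share (alias) an entry. The trace is a hypothetical loop branch that is taken three times and then not taken, with the whole loop executed three times.

```python
class OneBitPredictor:
    """Table of 1-bit entries indexed by the lowest PC bits (1 = taken, 0 = not taken)."""

    def __init__(self, index_bits):
        self.mask = (1 << index_bits) - 1
        self.table = [0] * (1 << index_bits)      # every entry starts as "not taken"

    def predict(self, pc):
        return self.table[pc & self.mask] == 1

    def update(self, pc, taken):
        self.table[pc & self.mask] = 1 if taken else 0   # remember only the last outcome


def count_mispredictions(predictor, trace):
    """trace: list of (pc, taken) pairs; predict, then update with the real outcome."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# Hypothetical loop branch at pc 0x40: taken 3 times, then not taken; loop runs 3 times.
loop_trace = [(0x40, t) for _ in range(3) for t in (True, True, True, False)]
print(count_mispredictions(OneBitPredictor(index_bits=4), loop_trace))   # 6
```

The 1-bit table mispredicts twice per loop execution. It misses at the exit, and again at the first iteration of the next execution, because the exit overwrote the entry.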

66 2-bit dynamic predictor: a table indexed by the lowest PC bits (1 = taken, 0 = not taken); each entry is a state machine that moves between states on taken and not taken outcomes; its states predict taken or predict not taken.
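The 2-bit version replaces each bit with a saturating counter, so a single contrary outcome does not flip the prediction. In this Python sketch, counter values 0 and 1 predict not taken and 2 and 3 predict taken. The initial weakly-taken state and the table size are assumptions. The trace is a hypothetical loop branch that is taken three times and then not taken, with the whole loop executed three times.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by the lowest PC bits.

    Counter values 0 and 1 predict not taken, 2 and 3 predict taken.
    """

    def __init__(self, index_bits, initial=2):
        self.mask = (1 << index_bits) - 1
        self.table = [initial] * (1 << index_bits)   # assumed start: weakly taken

    def predict(self, pc):
        return self.table[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.table[i] = min(self.table[i] + 1, 3)   # saturate at "strongly taken"
        else:
            self.table[i] = max(self.table[i] - 1, 0)   # saturate at "strongly not taken"


def count_mispredictions(predictor, trace):
    """trace: list of (pc, taken) pairs; predict, then update with the real outcome."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


loop_trace = [(0x40, t) for _ in range(3) for t in (True, True, True, False)]
print(count_mispredictions(TwoBitPredictor(index_bits=4), loop_trace))   # 3
```

On this trace, the counters mispredict only at each loop exit. A 1-bit table would mispredict twice per loop execution on the same trace, 6 times in total.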

67 Local dynamic predictor: the lowest PC bits index a history table of per-branch outcomes (1 = taken, 0 = not taken); that history indexes a predictor table, which gives the prediction (e.g. predict taken).

68 Global dynamic predictor: a global history register (1 = taken, 0 = not taken) is combined (+) with the PC to index a prediction table (e.g. predict taken).
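A global predictor lets one branch's outcome inform another's. The Python sketch below concatenates the lowest PC bits with the global history register to index 2-bit counters. The slide only shows the history and the PC being combined ("+"), and gshare-style XOR would be another option, so the exact combination, the sizes and the trace are assumptions. In the hypothetical trace, branch B always repeats the outcome of branch A executed just before it, as in `if (x) ...; if (x) ...`.

```python
class GlobalPredictor:
    """Global dynamic predictor (sketch).

    A global history register (GHR) holds the outcomes of the last
    `history_bits` branches, whichever branch they came from (1 = taken).
    Here the lowest PC bits are concatenated with the GHR to index a table
    of 2-bit counters; gshare would XOR them instead.
    """

    def __init__(self, pc_bits, history_bits):
        self.pc_mask = (1 << pc_bits) - 1
        self.history_bits = history_bits
        self.ghr_mask = (1 << history_bits) - 1
        self.ghr = 0
        self.counters = [1] * (1 << (pc_bits + history_bits))  # weakly not taken

    def _index(self, pc):
        return ((pc & self.pc_mask) << self.history_bits) | self.ghr

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        c = self.counters[i]
        self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.ghr_mask


# Branch A (pc 0x10) follows an irregular pattern; branch B (pc 0x24) copies it.
pattern = [False, False, False, True, False, True, True, True] * 10
trace = [(pc, a) for a in pattern for pc in (0x10, 0x24)]

predictor = GlobalPredictor(pc_bits=4, history_bits=4)
late_b_misses = 0
for step, (pc, taken) in enumerate(trace):
    if pc == 0x24 and step >= len(trace) // 2 and predictor.predict(pc) != taken:
        late_b_misses += 1
    predictor.update(pc, taken)
print(late_b_misses)  # 0: once trained, B is predicted perfectly
```

B's prediction depends on the most recent bit of the global history, which is A's outcome. A predictor that only looks at B's own history has no direct access to that information.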

69 Hybrid predictor: the PC drives predictor A, predictor B and a meta predictor that chooses between them.
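The hybrid scheme can be sketched with any two component predictors. For illustration, predictor A is assumed to be a static always-taken predictor and predictor B a 1-bit table. The meta predictor is a table of 2-bit counters indexed by the PC that is trained only when the components disagree, which is one common way to build such a chooser. The trace is hypothetical.

```python
class AlwaysTaken:
    def predict(self, pc):
        return True

    def update(self, pc, taken):
        pass


class LastOutcome:
    """1-bit table indexed by the lowest PC bits."""

    def __init__(self, index_bits):
        self.mask = (1 << index_bits) - 1
        self.table = [False] * (1 << index_bits)

    def predict(self, pc):
        return self.table[pc & self.mask]

    def update(self, pc, taken):
        self.table[pc & self.mask] = taken


class HybridPredictor:
    """Two components plus a meta predictor: counters 0-1 pick A, 2-3 pick B."""

    def __init__(self, a, b, index_bits):
        self.a, self.b = a, b
        self.mask = (1 << index_bits) - 1
        self.meta = [1] * (1 << index_bits)

    def predict(self, pc):
        use_b = self.meta[pc & self.mask] >= 2
        return self.b.predict(pc) if use_b else self.a.predict(pc)

    def update(self, pc, taken):
        pa, pb = self.a.predict(pc), self.b.predict(pc)
        i = pc & self.mask
        if pa != pb:   # train the chooser only when the components disagree
            if pb == taken:
                self.meta[i] = min(self.meta[i] + 1, 3)
            else:
                self.meta[i] = max(self.meta[i] - 1, 0)
        self.a.update(pc, taken)
        self.b.update(pc, taken)


hybrid = HybridPredictor(AlwaysTaken(), LastOutcome(8), index_bits=8)
trace = [(pc, taken) for _ in range(10) for pc, taken in ((0x10, True), (0x20, False))]
misses = 0
for pc, taken in trace:
    misses += hybrid.predict(pc) != taken
    hybrid.update(pc, taken)
print(misses)  # 1
```

The only misprediction is the first execution of the never-taken branch. The meta predictor then learns to trust B for that PC, while the always-taken branch keeps using A.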

70 Branch target buffer: indexed by the PC, each entry holds the target address and branch prediction information, which together give the prediction.
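A branch target buffer lets the front end redirect fetch before the branch has been decoded. The Python sketch below is a direct-mapped BTB with an assumed size. Allocating entries only on taken branches and using the full PC as the tag are simplifications chosen here, not details from the slide.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB sketch indexed by the lowest PC bits.

    Each entry holds (tag, target address, 2-bit counter); the full PC is
    used as the tag for simplicity.
    """

    def __init__(self, index_bits):
        self.mask = (1 << index_bits) - 1
        self.entries = [None] * (1 << index_bits)

    def lookup(self, pc):
        """Predicted target if pc is a known, predicted-taken branch, else None."""
        entry = self.entries[pc & self.mask]
        if entry is not None and entry[0] == pc and entry[2] >= 2:
            return entry[1]
        return None      # unknown branch or predicted not taken: fetch sequentially

    def update(self, pc, taken, target):
        i = pc & self.mask
        entry = self.entries[i]
        if entry is None or entry[0] != pc:
            if taken:                            # allocate only for taken branches
                self.entries[i] = (pc, target, 2)
            return
        tag, old_target, counter = entry
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
        self.entries[i] = (tag, target if taken else old_target, counter)


btb = BranchTargetBuffer(index_bits=6)
print(btb.lookup(0x1000))          # None: unknown branch, fetch falls through
btb.update(0x1000, taken=True, target=0x0FE0)
print(hex(btb.lookup(0x1000)))     # 0xfe0
```

The tag check matters: 0x1040 maps to the same entry as 0x1000 in this 64-entry table, but its lookup still returns None instead of a wrong target.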

71 Branch prediction: IPC per benchmark (bzip2, crafty, eon, gcc, gzip, parser, perlbmk, twolf, vortex, vpr) with everything perfect versus a real branch predictor.

72 Overview
Superscalar pipeline
Speculative execution
– branch prediction
– value prediction
Predicated execution
Multithreaded execution
VLIW

73 Cycle 11: the access time to the L2 cache is 3 cycles. [pipeline animation]

74 Dependency graph. [Figure: dependency graph of 13 instructions (A to M) over time t; executing it takes 8 steps, so ilp = 13/8 ≈ 1.62.]


76 Dependency graph. [Figure: the same 13 instructions executed in only 4 steps: ilp = 13/4 = 3.25, far above 1.62.] True dependencies limit the achievable IPC.
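The ilp figures on these slides are "number of instructions / length of the longest dependency chain". A minimal sketch of that computation, assuming unit-latency instructions, unlimited functional units and at most two producers per instruction (names and limits are hypothetical):

```c
#include <stddef.h>

#define MAX_DEPS  2     /* at most two source operands per instruction (assumption) */
#define MAX_INSNS 64

/* Producers of an instruction: indices of earlier instructions, -1 = unused slot. */
typedef struct { int deps[MAX_DEPS]; } insn;

/* Longest chain of true dependencies in cycles. Instructions must be listed
   in program order so that every producer precedes its consumers. */
int critical_path(const insn *prog, size_t n) {
    int depth[MAX_INSNS];
    int longest = 0;
    for (size_t i = 0; i < n && i < MAX_INSNS; i++) {
        depth[i] = 1;
        for (int k = 0; k < MAX_DEPS; k++) {
            int d = prog[i].deps[k];
            if (d >= 0 && depth[d] + 1 > depth[i])
                depth[i] = depth[d] + 1;
        }
        if (depth[i] > longest)
            longest = depth[i];
    }
    return longest;
}

/* ilp = number of instructions / critical path length. */
double ilp(const insn *prog, size_t n) {
    return (double)n / critical_path(prog, n);
}
```

The edges of the 13-instruction graph on the slides cannot be recovered from the transcript, so a made-up six-instruction graph serves as an example: a chain 0 → 1 → 2 and a chain 3 → 4, both feeding instruction 5, gives a critical path of 4 and ilp = 6/4 = 1.5. Removing the dependency of 2 on 1 (for instance because its input value was predicted correctly) shortens the path to 3 and raises ilp to 2.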

77 Prediction schemes: last value prediction (LVP), stride prediction (SP) and the finite context method (FCM). The cases they target: one instruction repeatedly generates the same value (constant), values a fixed distance apart (stride), or a recurring sequence of values (repetition).

78 Last Value Prediction (Lipasti, 1996). A table indexed by the PC stores the last value produced by each instruction and predicts that value again. 49% accuracy for an infinite table.
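A minimal last-value predictor, assuming a finite direct-mapped table instead of the infinite table of the slide; names and sizes are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define VP_BITS 10                         /* assumed: 1024-entry table */
#define VP_SIZE (1u << VP_BITS)

static uint64_t last_value[VP_SIZE];       /* indexed by the PC of the producing instruction */

static unsigned vp_index(uint32_t pc) { return (pc >> 2) & (VP_SIZE - 1); }

/* Predict that the instruction at pc produces the same value as last time. */
uint64_t lvp_predict(uint32_t pc) { return last_value[vp_index(pc)]; }

/* When the real result is known: report whether the prediction was right,
   then remember the new value. */
bool lvp_update(uint32_t pc, uint64_t actual) {
    uint64_t *slot = &last_value[vp_index(pc)];
    bool correct = (*slot == actual);
    *slot = actual;
    return correct;
}
```

A constant is predicted correctly from its second occurrence on, but the addresses 0x0fe0, 0x0fe4, ... produced by `add r3,4 → r3` are never predicted correctly.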

79 Stride Prediction (Gabbay, 1997). Each PC-indexed entry stores the last value and the stride (difference between the last two values); the prediction is value + stride. 60% accuracy for an infinite table.
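A stride predictor sketch under the same assumptions as the last-value sketch (finite PC-indexed table, hypothetical names). Strides are kept as unsigned 64-bit differences, so negative strides also work through modular arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

#define VP_BITS 10
#define VP_SIZE (1u << VP_BITS)

typedef struct {
    uint64_t value;    /* last value produced by the instruction */
    uint64_t stride;   /* last difference, modulo 2^64 */
} stride_entry;

static stride_entry stride_table[VP_SIZE];

static stride_entry *stride_slot(uint32_t pc) {
    return &stride_table[(pc >> 2) & (VP_SIZE - 1)];
}

uint64_t stride_predict(uint32_t pc) {
    const stride_entry *e = stride_slot(pc);
    return e->value + e->stride;
}

/* Report whether the prediction was right, then learn the new value and stride. */
bool stride_update(uint32_t pc, uint64_t actual) {
    stride_entry *e = stride_slot(pc);
    bool correct = (e->value + e->stride == actual);
    e->stride = actual - e->value;
    e->value = actual;
    return correct;
}
```

For the addresses generated by `add r3,4 → r3` (0x0fe0, 0x0fe4, 0x0fe8, ...) the first two values train the entry and every following value is predicted correctly.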

80 Improved stride prediction. Each PC-indexed entry stores the value and two strides, s1 and s2: the prediction uses s2, which is only replaced once the same new stride has been observed twice in a row (s1 holds the most recent stride). Alternatively, a saturating counter can guard the stride update. Cause: ...
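A sketch of the two-stride ("two-delta") update described above, assuming the same hypothetical table layout as the stride sketch; the function names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

#define VP_BITS 10
#define VP_SIZE (1u << VP_BITS)

typedef struct {
    uint64_t value;   /* last value */
    uint64_t s1;      /* most recently observed stride */
    uint64_t s2;      /* stride used for prediction */
} stride2_entry;

static stride2_entry s2_table[VP_SIZE];

static stride2_entry *stride2_slot(uint32_t pc) {
    return &s2_table[(pc >> 2) & (VP_SIZE - 1)];
}

uint64_t stride2_predict(uint32_t pc) {
    const stride2_entry *e = stride2_slot(pc);
    return e->value + e->s2;
}

bool stride2_update(uint32_t pc, uint64_t actual) {
    stride2_entry *e = stride2_slot(pc);
    bool correct = (e->value + e->s2 == actual);
    uint64_t delta = actual - e->value;
    if (delta == e->s1)   /* same stride seen twice in a row: adopt it for prediction */
        e->s2 = delta;
    e->s1 = delta;
    e->value = actual;
    return correct;
}
```

For the sequence 0, 4, 8, 12, 0, 4, 8 (an array walked again from the start), the jump back from 12 to 0 costs one misprediction, and 4 is predicted correctly right after it because s2 kept the stride 4. The plain stride predictor would adopt the stride -12 and mispredict twice.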

81 Finite Context Method (Sazeides, 1997). Order 3: the prediction is based on the context formed by the last three values produced by the instruction.

82 Finite Context Method (Sazeides, 1997). A first-level table indexed by the PC holds the value history (as hashed values); this history indexes a second-level table that holds the predicted value. 78% accuracy for infinite tables and order 3.
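A two-level sketch of an order-3 FCM predictor. The shift-and-XOR fold used to turn the three most recent values into a second-level index is a simple stand-in chosen for readability, not necessarily the hash used by Sazeides; all names and table sizes are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define ORDER    3
#define VHT_SIZE (1u << 10)   /* level 1: value history table, indexed by PC */
#define VPT_SIZE (1u << 16)   /* level 2: value prediction table, indexed by context */

typedef struct { uint64_t hist[ORDER]; } vht_entry;   /* hist[0] = most recent value */

static vht_entry vht[VHT_SIZE];
static uint64_t  vpt[VPT_SIZE];

static vht_entry *fcm_slot(uint32_t pc) { return &vht[(pc >> 2) & (VHT_SIZE - 1)]; }

/* Hypothetical hash: fold the last ORDER values with shifts and XOR. */
static unsigned fcm_index(const vht_entry *e) {
    uint64_t h = 0;
    for (int k = 0; k < ORDER; k++)
        h ^= e->hist[k] << (5 * k);
    return (unsigned)(h & (VPT_SIZE - 1));
}

uint64_t fcm_predict(uint32_t pc) {
    return vpt[fcm_index(fcm_slot(pc))];
}

bool fcm_update(uint32_t pc, uint64_t actual) {
    vht_entry *e = fcm_slot(pc);
    unsigned i = fcm_index(e);
    bool correct = (vpt[i] == actual);
    vpt[i] = actual;                      /* level 2: this context is followed by 'actual' */
    for (int k = ORDER - 1; k > 0; k--)   /* level 1: shift the value history */
        e->hist[k] = e->hist[k - 1];
    e->hist[0] = actual;
    return correct;
}
```

A repeating sequence 1, 2, 3, 1, 2, 3, ... (the "repetition" case, which neither LVP nor stride prediction handles) is predicted correctly once every three-value context has been seen, here after two passes.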

83 Accuracy vs. size. [Figure: prediction accuracy (0% to 80%) versus predictor size (Kbit) for lvp, stride and the FCM variants saz_4 to saz_16.]

84 Overview: superscalar pipeline; speculative execution (branch prediction, value prediction); predicated execution; multithreaded execution; VLIW.

85 Predicated execution. Example:
    if (r1==0 || r2==0) {
        if (r3==0) r5 = r4 - 1;
        else       r5 = r7 + 3;
    } else
        r5 = r3 + 1;
    r5 = r5 * 2;

86 Predicated execution. The same code with branches:
        beq r1,0,L1
        beq r2,0,L1
        add r5,r3,1
        jump L3
    L1: beq r3,0,L2
        add r5,r7,3
        jump L3
    L2: sub r5,r4,1
    L3: mul r5,r5,2
[Figure: the corresponding control-flow graph of basic blocks.]

87 Predicated execution. [Figure: the control-flow graph annotated with predicates: p1 = (r1==0) and p1 = (r2==0) both set p1, p2 = (r3==0); add r5,r3,1 executes under ¬p1, beq r3,0,L2 under p1, add r5,r7,3 under p1 ∧ ¬p2, and sub r5,r4,1 under p1 ∧ p2.]

88 Predicated code. The branches are replaced by predicate-defining compares and predicated instructions:
    p1 = r1 == 0
    p1 = r2 == 0        (or-ed into p1)
    add r5,r3,1         (if ¬p1)
    p2 = r3 == 0
    add r5,r7,3         (if p1 ∧ ¬p2)
    sub r5,r4,1         (if p1 ∧ p2)
    mul r5,r5,2
Predicates control retirement: all instructions are executed, but only those whose predicate is true commit their result.
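To check that the predicated sequence computes the same r5 as the branching code, the sketch below models both in C. The helper `guarded_write` is hypothetical: it models predicate-controlled retirement, and nothing here guarantees that a C compiler emits branch-free machine code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reference: the control-flow version of slide 85. */
int32_t with_branches(int32_t r1, int32_t r2, int32_t r3, int32_t r4, int32_t r7) {
    int32_t r5;
    if (r1 == 0 || r2 == 0) {
        if (r3 == 0)
            r5 = r4 - 1;
        else
            r5 = r7 + 3;
    } else {
        r5 = r3 + 1;
    }
    return r5 * 2;
}

/* Models a predicated instruction: the value is always computed,
   but only written back (retired) when the predicate is true. */
static void guarded_write(bool pred, int32_t *dst, int32_t value) {
    if (pred)
        *dst = value;
}

/* Predicated version of slide 88: straight-line code, no control transfers. */
int32_t predicated(int32_t r1, int32_t r2, int32_t r3, int32_t r4, int32_t r7) {
    int32_t r5 = 0;
    bool p1 = (r1 == 0);
    p1 = p1 || (r2 == 0);                 /* second compare is or-ed into p1 */
    bool p2 = (r3 == 0);
    guarded_write(!p1, &r5, r3 + 1);
    guarded_write(p1 && !p2, &r5, r7 + 3);
    guarded_write(p1 && p2, &r5, r4 - 1);
    return r5 * 2;                        /* mul r5,r5,2 is unconditional */
}
```

Exactly one of the three guarded writes has a true predicate for any input, which is why the predicated sequence can drop all jumps.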

89 Advantages of predication: fewer control transfers; easier scheduling of instructions (the predicated code of slide 88 forms a single basic block).

90 Overview: superscalar pipeline; speculative execution (branch prediction, value prediction); predicated execution; multithreaded execution; VLIW.

91 Simultaneous multithreading (Intel's implementation is called Hyper-Threading). [Figure: the issue slots used over time t by two threads are combined (+ =) to fill the issue slots of a single processor.]

92 Overview: superscalar pipeline; speculative execution (branch prediction, value prediction); predicated execution; multithreaded execution; VLIW.

93 VLIW: Very Long Instruction Word processors. [Figure: pipeline of successive long instructions; each is fetched (F) and decoded (D) once, executes its operations in parallel units (E E E E) and writes back (W).]

94 VLIW execution: fetch & decode, parallel execution units (E E E E), retire (commit). Static scheduling; simple processor.

95 Branch penalty. [Figure: VLIW pipeline in which a conditional jump (jc) is resolved only after the following long instructions have entered the pipeline.] 8 instructions lost! Solution: execute them anyway. Problem: how to find 8 useful instructions to put there?

96 VLIW

97 EPIC: Explicitly Parallel Instruction Computing (Itanium). An instruction bundle holds three 41-bit operations and a 5-bit template (3 × 41 + 5 = 128 bits); the template determines which operations can be executed in parallel.

98 TMS320C62xx VLIW processor. Each 32-bit instruction carries a p-bit that tells whether the next instruction executes in parallel with it (1) or in a later cycle (0). With p-bits A:0, B:1, C:1, D:0, E:1, F:1, G:0 the instructions execute as: cycle 1: A; cycle 2: B C D; cycle 3: E F G.
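A sketch of how the p-bits split a fetch packet into execute packets. It assumes the p-bit is bit 0 of each 32-bit instruction word, as in TI's C6000 encoding; the function name and the test encodings (an opcode id in bits 31..1) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assign an issue cycle (starting at 1) to every instruction of a fetch packet.
   Instruction i issues in the same cycle as instruction i+1 when its p-bit is 1.
   Returns the number of cycles needed. */
int c6x_schedule(const uint32_t *packet, size_t n, int *cycle) {
    if (n == 0)
        return 0;
    int c = 1;
    for (size_t i = 0; i < n; i++) {
        cycle[i] = c;
        if ((packet[i] & 1u) == 0 && i + 1 < n)
            c++;                       /* p-bit 0 closes the current execute packet */
    }
    return c;
}
```

Encoding A to G with the p-bits of the slide yields three cycles: A alone, then B C D, then E F G. The compiler, not the hardware, decides the grouping, which is the "static scheduling" of slide 94.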

99 Questions?

100 Software optimisation for embedded systems. Prof. Koen De Bosschere, Ghent University

101 Memory hierarchy (second lecture)

102 Preliminaries (units):
    1 kibibyte (KiB) = 2^10 = 1,024 bytes
    1 mebibyte (MiB) = 2^20 = 1,048,576 bytes
    1 gibibyte (GiB) = 2^30 = 1,073,741,824 bytes
    1 tebibyte (TiB) = 2^40 = 1,099,511,627,776 bytes
    1 kilobyte (kB) = 10^3 = 1,000 bytes
    1 megabyte (MB) = 10^6 = 1,000,000 bytes
    1 gigabyte (GB) = 10^9 = 1,000,000,000 bytes
    1 terabyte (TB) = 10^12 = 1,000,000,000,000 bytes
[International Standard IEC ...]

103 Memory hierarchy, from smaller, faster and costlier (per byte) storage devices at the top to larger, slower and cheaper (per byte) storage devices at the bottom:
    L0: CPU registers hold words retrieved from the L1 cache.
    L1: the on-chip L1 cache (SRAM) holds cache lines retrieved from the L2 cache.
    L2: the on/off-chip L2/L3 cache (SRAM) holds cache lines retrieved from main memory.
    L3: main memory (DRAM) holds disk blocks retrieved from local disks.
    L4: local secondary storage (local disks) holds files retrieved from disks on remote network servers.
    L5: remote secondary storage (distributed file systems, Web servers).

104 Storage evolution [Source: Byte and PC Magazine]. [Table: evolution since 1980 of cost ($/MiB), access time (ns for DRAM and SRAM, ms for disk) and typical size (MiB) for DRAM, SRAM and disk.]

105 Magnetic storage. [Figure: ... nm magnetic particles and 100 nm bit cells; 20 MB/mm². Assumed maximum density: 50 Tbpsi (terabits per square inch).]

106 [Figure: cost in $/MiB (log scale) over time for SRAM, DRAM and disk.]

107 Storage capacity evolution. [Figure: typical DRAM and disk capacity (MiB, log scale) and the DISK/DRAM ratio over time.] Machrone's law: RAM ≈ $500, hard disk ≈ $500.

108 Access time evolution. [Figure: access time (ns, log scale) of SRAM, DRAM and disk over time, showing the 'access time gap'.]

109 Memory wall. [Figure: performance over time (log scale) of CPU and DRAM.] Processor performance grows 60% per year (2× every 1.5 years, "Moore's law"); DRAM performance grows 9% per year (2× every 10 years). The processor-memory performance gap grows by about 50% per year.
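The 50% figure follows from the two growth rates on the slide: the gap is the ratio of the two performance curves, so its yearly growth factor is the ratio of their yearly growth factors (a short worked derivation, using only the slide's percentages):

```latex
\[
\frac{P_{\mathrm{CPU}}(n)}{P_{\mathrm{DRAM}}(n)}
  = \frac{P_{\mathrm{CPU}}(0)\,(1.60)^{n}}{P_{\mathrm{DRAM}}(0)\,(1.09)^{n}}
  = \frac{P_{\mathrm{CPU}}(0)}{P_{\mathrm{DRAM}}(0)} \left(\frac{1.60}{1.09}\right)^{n},
\qquad
\frac{1.60}{1.09} \approx 1.47
\]
```

So the gap grows by roughly 47% per year; over ten years that compounds to a factor of about 46.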

110 Overview: caches: basic operation; miss classification; cache improvements.

111 Caches. [Picture of a cache in the everyday sense: "Cache keeps intruders away from backcountry supplies".]

112 Pentium 4 EE cache hierarchy: processor → L1 I (12K µop trace cache) and L1 D (8 KiB): 2 cycles → L2 cache (512 KiB): 19 cycles → L3 cache (2 MiB): 43 cycles → memory: 206 cycles.

113 Basic cache operation: CPU ↔ cache ↔ memory.

114 Locality. [Figure: instruction address versus time for Quicksort, showing temporal and spatial locality.]

115 Working set: the set of memory locations used during an interval Δt. [Figure: working set size over time t.]

116 Performance impact of non-ideal memory. [Figure: IPC for the SPECint benchmarks bzip2, crafty, eon, gcc, gzip, parser, perlbmk, twolf, vortex and vpr with everything perfect, with a real branch predictor, and with a real branch predictor and a real memory hierarchy.]

117 Basic cache types: direct-mapped caches, set-associative caches, fully associative caches.

118 Direct-mapped cache. [Figure: every memory block maps to exactly one location in the cache.]

119 Direct-mapped cache. The address is split into tag, index and offset. The index selects one line (valid bit, dirty bit, tag, data); the stored tag is compared (=) with the tag of the address to produce the hit signal. E.g. 512 blocks of 32 bytes.
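For the example of 512 blocks of 32 bytes (16 KiB) and 32-bit addresses, the offset is 5 bits, the index 9 bits and the tag the remaining 18 bits. The sketch below splits an address accordingly and models hits and misses, assuming write-back with write-allocate (the dirty bit on the slide suggests write-back, but the allocation policy is an assumption). Names are hypothetical and the data array is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define OFFSET_BITS 5                       /* 32-byte blocks */
#define INDEX_BITS  9                       /* 512 blocks -> 16 KiB of data */
#define NUM_LINES   (1u << INDEX_BITS)

typedef struct {
    bool     valid;
    bool     dirty;
    uint32_t tag;                           /* data array omitted: only hits/misses are modelled */
} cache_line;

static cache_line lines[NUM_LINES];

uint32_t addr_offset(uint32_t a) { return a & ((1u << OFFSET_BITS) - 1); }
uint32_t addr_index(uint32_t a)  { return (a >> OFFSET_BITS) & (NUM_LINES - 1); }
uint32_t addr_tag(uint32_t a)    { return a >> (OFFSET_BITS + INDEX_BITS); }

/* Returns true on a hit. On a miss the line is (re)allocated; *writeback
   reports whether a dirty victim had to be written back to memory first. */
bool dm_access(uint32_t addr, bool is_write, bool *writeback) {
    cache_line *l = &lines[addr_index(addr)];
    bool hit = l->valid && l->tag == addr_tag(addr);
    *writeback = !hit && l->valid && l->dirty;
    if (!hit) {
        l->valid = true;
        l->dirty = false;
        l->tag = addr_tag(addr);
    }
    if (is_write)
        l->dirty = true;
    return hit;
}
```

Two addresses exactly 16 KiB apart (such as 0x1000 and 0x5000) have the same index but different tags, so they keep evicting each other: the conflict misses that the next slides try to remove.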

120 4-way set-associative cache. The index selects a set; the four tags of the set are compared in parallel (= = = =) and a multiplexer selects the data of the way that hits.

121 Two-way set-associative cache

122 Fully associative cache

123 Associativity. \( \text{Size} = \#\text{sets} \times \text{associativity} \times \text{block size} \). Direct mapped: associativity = 1. Fully associative: #sets = 1. [Figure: eight blocks organised as direct mapped, 2-way SA with 4 sets, 4-way SA with 2 sets, and fully associative; legend: tag, data.]

124 Exploiting spatial locality. [Figure: cache blocks of several words; the offset bits of the address control a multiplexer that selects the requested word.]

125 Average memory access time:
\[ \text{AMAT} = \text{Hit time} + \text{Miss rate} \times \text{Miss penalty} = \text{Hit rate} \times \text{Hit time} + \text{Miss rate} \times \text{Miss time} \]
with Miss time = Hit time + Miss penalty. Example (c = cycles): \( 3\,c + 0.02 \times 100\,c = 5\,c \), or equivalently \( 0.98 \times 3\,c + 0.02 \times 103\,c = 5\,c \).
Miss rate ↓, miss penalty ↓ or hit time ↓ ⇒ AMAT ↓.
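Both forms of the AMAT formula as small C functions (hypothetical names), to make it explicit that they agree when the miss time is the hit time plus the miss penalty:

```c
/* AMAT = hit time + miss rate x miss penalty (times in cycles). */
double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}

/* AMAT = hit rate x hit time + miss rate x miss time,
   where miss time = hit time + miss penalty. */
double amat_from_miss_time(double hit_time, double miss_rate, double miss_time) {
    return (1.0 - miss_rate) * hit_time + miss_rate * miss_time;
}
```

With the slide's numbers, `amat(3, 0.02, 100)` and `amat_from_miss_time(3, 0.02, 103)` both give 5 cycles (up to floating-point rounding).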

126 Overview: caches: basic operation; miss classification; cache improvements.

127 Miss classification: the 3C model.
    Compulsory misses: first-time misses. With INF = an infinitely large cache, compulsory misses = misses(INF).
    Capacity misses: due to the cache size. With FA = a fully associative cache (of the same size) with LRU replacement, capacity misses = misses(FA) − misses(INF).

128 Miss classification: the 3C model (continued).
    Conflict misses: due to the set index function. With C = the investigated cache with the investigated replacement policy, conflict misses = misses(C) − misses(FA).
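The 3C definitions translate directly into a simulation that runs three caches side by side on the same trace: an "infinite" cache, a fully associative LRU cache of the same size, and the investigated cache (a direct-mapped cache in this sketch). The tiny capacity, the block-address trace format and all names are assumptions for illustration. The conflict count is a signed `int` because a direct-mapped cache can occasionally beat LRU, making misses(C) − misses(FA) negative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CAPACITY   4     /* cache size in blocks (tiny, for illustration) */
#define MAX_BLOCKS 256   /* distinct blocks the "infinite" cache can track */

typedef struct { int compulsory, capacity, conflict; } three_cs;

three_cs classify_misses(const uint32_t *trace, size_t n) {
    uint32_t seen[MAX_BLOCKS];
    size_t n_seen = 0;
    uint32_t fa[CAPACITY];                 /* fully associative LRU, fa[0] = most recent */
    size_t n_fa = 0;
    uint32_t dm[CAPACITY];                 /* investigated cache: direct mapped */
    bool dm_valid[CAPACITY] = { false };
    int miss_inf = 0, miss_fa = 0, miss_c = 0;

    for (size_t t = 0; t < n; t++) {
        uint32_t b = trace[t];

        /* INF: misses only on the first reference to a block */
        bool known = false;
        for (size_t i = 0; i < n_seen; i++)
            if (seen[i] == b) { known = true; break; }
        if (!known) {
            miss_inf++;
            if (n_seen < MAX_BLOCKS)
                seen[n_seen++] = b;
        }

        /* FA: same size, fully associative, LRU replacement */
        size_t pos = n_fa;
        for (size_t i = 0; i < n_fa; i++)
            if (fa[i] == b) { pos = i; break; }
        if (pos == n_fa) {                 /* miss: take a free slot or evict the LRU block */
            miss_fa++;
            if (n_fa < CAPACITY)
                n_fa++;
            pos = n_fa - 1;
        }
        for (size_t i = pos; i > 0; i--)   /* move b to the most-recently-used position */
            fa[i] = fa[i - 1];
        fa[0] = b;

        /* C: direct mapped, set = block address modulo CAPACITY */
        size_t set = b % CAPACITY;
        if (!dm_valid[set] || dm[set] != b) {
            miss_c++;
            dm_valid[set] = true;
            dm[set] = b;
        }
    }
    three_cs r = { miss_inf, miss_fa - miss_inf, miss_c - miss_fa };
    return r;
}
```

On the trace 0, 4, 0, 4 (two blocks mapping to the same set) the four direct-mapped misses split into 2 compulsory and 2 conflict misses; on 0, 1, 2, 3, 4, 0 (five blocks, capacity four) they split into 5 compulsory and 1 capacity miss.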

129 Cache size ↑ ⇒ miss rate ↓; associativity ↑ ⇒ miss rate ↓. [Figure: miss rate versus cache size (KiB) for 1-way, 2-way, 4-way and 8-way caches, with the capacity and compulsory misses marked; SPEC92 benchmarks; source: Patterson & Hennessy.] 2:1 rule: a direct-mapped cache of size N has about the same miss rate as a 2-way set-associative cache of size N/2.

130 3Cs, relative miss rate. [Figure: share of compulsory, capacity and conflict misses (0% to 100%) versus cache size (KiB) for 1-way, 2-way, 4-way and 8-way caches.]

131 Replacement strategies: least recently used (LRU); OPT (replace the block that will not be used for the longest time); random (choose one at random).
Miss rates (instruction cache):
    Size      2-way LRU  2-way Random  4-way LRU  4-way Random  8-way LRU  8-way Random
    16 KiB    5.18%      5.69%         4.67%      5.29%         4.39%      4.96%
    64 KiB    1.88%      2.01%         1.54%      1.66%         1.39%      1.53%
    256 KiB   1.15%      1.17%         1.13%      1.13%         1.12%      1.12%

132 Overview: caches: basic operation; miss classification; cache improvements (related to block size, related to cache size, related to indexing).

133 Block size ↑ ⇒ miss rate first ↓, then ↑. [Figure: miss rate (0% to 25%) of a direct-mapped cache versus block size (bytes) for cache sizes of 1 KiB, 4 KiB, 16 KiB, 64 KiB and 256 KiB.]

134 AMAT versus block size. [Table: AMAT for several block sizes and cache sizes of 4 KiB, 16 KiB, 64 KiB and 256 KiB, with the miss penalty (to memory) for each block size.]

135 Critical word first / early restart: miss penalty ↓.
    Critical word first: first load the requested word from memory and forward it to the CPU, then complete the rest of the cache block.
    Early restart: load the cache block in order, but forward the requested word to the CPU as soon as it arrives.
Good for large cache blocks. With early restart the effective latency varies with the position of the requested word in the block.

136 Stream buffer: miss rate ↓. [Figure: a stream buffer between L1 and L2.] Instruction cache: on a miss the Alpha fetches 2 blocks; the extra block is placed in the stream buffer, which is checked on the next miss. A single data stream buffer eliminated 25% of the misses of a 4 KiB cache; 4 streams eliminated 43% [Jouppi, 1990].
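A sketch of a single sequential stream buffer in front of a tiny direct-mapped L1, assuming Jouppi-style behaviour: only the head of the FIFO is checked on an L1 miss, a head hit refills the tail with the next sequential block, and a miss in both flushes the buffer and starts a new stream. Depth, sizes and names are hypothetical; L2 timing and bandwidth are not modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define L1_LINES 64      /* tiny direct-mapped L1, block-address granularity */
#define SB_DEPTH 4       /* entries in the stream buffer (assumed) */

typedef enum { FROM_L1, FROM_STREAM_BUFFER, FROM_L2 } source;

static uint32_t l1_block[L1_LINES];
static bool     l1_valid[L1_LINES];
static uint32_t sb[SB_DEPTH];      /* FIFO of prefetched block addresses, sb[0] = head */
static size_t   sb_count;

static void l1_fill(uint32_t b) {
    l1_valid[b % L1_LINES] = true;
    l1_block[b % L1_LINES] = b;
}

/* Access block address b and report which level supplied it. */
source access_block(uint32_t b) {
    if (l1_valid[b % L1_LINES] && l1_block[b % L1_LINES] == b)
        return FROM_L1;
    l1_fill(b);
    if (sb_count > 0 && sb[0] == b) {
        /* head hit: pop it and prefetch the next sequential block at the tail */
        uint32_t next = sb[sb_count - 1] + 1;
        for (size_t i = 1; i < sb_count; i++)
            sb[i - 1] = sb[i];
        sb[sb_count - 1] = next;
        return FROM_STREAM_BUFFER;
    }
    /* miss in L1 and in the stream buffer: flush it and start a new stream after b */
    for (size_t i = 0; i < SB_DEPTH; i++)
        sb[i] = b + 1 + (uint32_t)i;
    sb_count = SB_DEPTH;
    return FROM_L2;
}
```

A sequential walk over blocks 100 to 109 then costs one miss to L2; every later block comes from the stream buffer.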

137 Stream buffer. Data cache: for scientific programs, 8 streams eliminated 50% to 70% of the misses of 64 KiB, 4-way set-associative caches [Palacharla & Kessler, 1994]. Stream buffers only make sense when there is enough bandwidth to the next level in the memory hierarchy. [Figure: stream buffer between L1 and L2.] Reduces compulsory and capacity misses.

138 Stream buffer improvements: multi-way streams (multiple parallel stream buffers, one per instruction or data stream); stride detection (for non-unit-stride memory accesses).

139 Overview: caches: basic operation; miss classification; cache improvements (related to block size, related to cache size, related to indexing).

140 Cache size ↑ ⇒ hit time ↑; associativity ↑ ⇒ hit time ↑. [Figure: access time (ns) versus cache size (KiB) for several associativities, up to fully associative (FA).] [The L1 data cache was reduced from 2-way 16 KiB in the Pentium III to 4-way 8 KiB in the Pentium 4.]

141 Cache size/associativity vs. AMAT. [Figure: AMAT (cycles) versus cache size (KiB) for 1-way, 2-way (+10%), 4-way (+12%) and 8-way (+14%) caches.] \( \text{AMAT} = \text{Hit time} + \text{Miss rate} \times \text{Miss penalty} \)

142 Split L1 caches. Pentium 4 EE: separate L1 I (12K µop trace cache) and L1 D (8 KiB) caches: 2 cycles; L2 cache (512 KiB): 19 cycles; L3 cache (2 MiB): 43 cycles; memory: 206 cycles.

143 Split vs. unified caches (miss rates):
    Size      Instruction cache  Data cache  Unified cache
    1 KiB     3.06%              24.61%      13.34%
    2 KiB     2.26%              20.57%      9.78%
    4 KiB     1.78%              15.94%      7.24%
    8 KiB     1.10%              10.19%      4.57%
    16 KiB    0.64%              6.47%       2.87%
    32 KiB    0.39%              4.82%       1.99%
    64 KiB    0.15%              3.77%       1.35%
    128 KiB   0.02%              2.88%       0.95%
Separate instruction and data caches: Harvard architecture.

144 Example [80% instruction fetches, 20% data accesses; 16 KiB; miss penalty = 50 cycles; hit time = 1 cycle], with m_I, m_D and m_U the miss rates of the instruction, data and unified caches from the table above:
\[ \text{AMAT}_{\text{split}} = 80\% \times (1 + m_I \times 50) + 20\% \times (1 + m_D \times 50) \]
\[ \text{AMAT}_{\text{unified}} = 80\% \times (1 + m_U \times 50) + 20\% \times (1 + 1 + m_U \times 50) \]
The extra cycle for data accesses: a single-ported unified cache cannot serve an instruction fetch and a data access in the same cycle. Make the common case fast.
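The numeric results of this example did not survive in the transcript. The sketch below recomputes them from the table of slide 143; which unified size the slide compares against is unclear, so two assumptions are shown: a 16 KiB unified cache, and a 32 KiB unified cache (same total capacity as two 16 KiB split caches). Function names are hypothetical.

```c
/* fi = fraction of accesses that are instruction fetches; times in cycles. */
double amat_split(double fi, double m_i, double m_d, double hit, double penalty) {
    return fi * (hit + m_i * penalty) + (1.0 - fi) * (hit + m_d * penalty);
}

/* Single-ported unified cache: data accesses pay 'extra' cycles
   because they compete with instruction fetches for the one port. */
double amat_unified(double fi, double m_u, double hit, double penalty, double extra) {
    return fi * (hit + m_u * penalty) + (1.0 - fi) * (hit + extra + m_u * penalty);
}
```

With m_I = 0.64%, m_D = 6.47% (16 KiB rows) the split caches give about 1.90 cycles. A 16 KiB unified cache (m_U = 2.87%) gives about 2.64 cycles and a 32 KiB one (m_U = 1.99%) about 2.20 cycles, so under either assumption the split organisation wins, largely because the common case (instruction fetch) stays fast.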

145 Filter cache: a small direct-mapped L0 cache (e.g. 256 B) between the processor and the standard L1 cache. It consumes less power, but its high miss rate causes a performance penalty of 21% (Kin'97).

146 Dynamically loaded loop cache: a small loop cache that serves as an alternative location to fetch instructions from (a multiplexer selects between the loop cache and L1 memory). The loop cache is filled dynamically, triggered by a short backwards branch (sbb) instruction, e.g. a loop "... add r1,2 ... sbb -5":
    iteration 1: detect the sbb instruction (fetch from L1 memory);
    iteration 2: fill the loop cache;
    iteration 3: fetch from the loop cache.

147 Preloaded loop cache: a small loop cache that is filled at compile time and remains fixed (a multiplexer selects between the loop cache and L1 memory). Fetching from it is triggered by a short backwards branch or by the start address of the loop, e.g. "... add r1,2 ... sbb -5":
    iteration 1: detect the sbb instruction (fetch from L1 memory);
    iteration 2: check whether the loop is preloaded; if so, fetch from the loop cache.

148 Victim buffer: a small fully associative buffer, next to the L1 cache, that holds the blocks most recently evicted from L1. An L1 hit takes one cycle; an L1 miss that hits in the victim buffer takes two cycles; a miss in both goes to memory (21 cycles, 22 in total). [Jouppi'90] A 4-entry victim cache removes 20% to 95% of the conflict misses of a 4 KiB direct-mapped data cache.
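A sketch of a direct-mapped L1 backed by a 4-entry victim buffer, returning the latencies read off the slide's figure (1, 2 and 22 cycles; treat them as illustrative). On a victim-buffer hit the two blocks are swapped. FIFO replacement inside the buffer is a simplifying assumption, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VC_L1_LINES 64   /* direct-mapped L1, block-address granularity */
#define VC_ENTRIES   4   /* 4-entry victim buffer */

static uint32_t l1_block[VC_L1_LINES];
static bool     l1_valid[VC_L1_LINES];
static uint32_t victim[VC_ENTRIES];
static bool     victim_valid[VC_ENTRIES];
static size_t   victim_next;             /* FIFO replacement pointer (simplification) */

/* Access block address b; returns the latency in cycles. */
int vc_access(uint32_t b) {
    size_t set = b % VC_L1_LINES;
    if (l1_valid[set] && l1_block[set] == b)
        return 1;                         /* L1 hit */
    for (size_t i = 0; i < VC_ENTRIES; i++) {
        if (victim_valid[i] && victim[i] == b) {
            /* swap: b moves back into L1, the current L1 block becomes the victim */
            if (l1_valid[set])
                victim[i] = l1_block[set];
            else
                victim_valid[i] = false;
            l1_block[set] = b;
            l1_valid[set] = true;
            return 2;                     /* victim-buffer hit */
        }
    }
    if (l1_valid[set]) {                  /* the evicted L1 block enters the victim buffer */
        victim[victim_next] = l1_block[set];
        victim_valid[victim_next] = true;
        victim_next = (victim_next + 1) % VC_ENTRIES;
    }
    l1_block[set] = b;
    l1_valid[set] = true;
    return 22;                            /* miss in both: go to memory */
}
```

Blocks 3 and 67 map to the same L1 line. Without the buffer, alternating between them would miss every time; with it, only the first reference to each block costs 22 cycles and every later switch costs 2.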

149 Overview: caches: basic operation; miss classification; cache improvements (related to block size, related to cache size, related to indexing).

150 Randomizing cache index functions. [Figure: 16 memory blocks (addresses 0000xx to 1111xx, address bits a5 a4 a3 a2 a1 a0, with a1 a0 the block offset) mapped onto a direct-mapped cache by the conventional index function H = a3 a2.]

151 Randomizing cache index functions. [Figure: the same 16 memory blocks mapped onto the direct-mapped cache by the randomized index function H = (a5 ⊕ a3)(a4 ⊕ a2).]
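The two index functions of slides 150 and 151, written out for the 6-bit addresses of the figure (a1 a0 = block offset, 4 sets). The helper `sets_touched` is hypothetical; it counts how many sets a strided access pattern uses under each index function.

```c
#include <stdbool.h>

/* Conventional index function: H = a3 a2. */
unsigned index_modulo(unsigned a) {
    return (a >> 2) & 3u;
}

/* Randomized index function: H = (a5 XOR a3)(a4 XOR a2). */
unsigned index_xor(unsigned a) {
    return ((a >> 2) ^ (a >> 4)) & 3u;
}

typedef unsigned (*index_fn)(unsigned);

/* Number of distinct sets touched by the addresses 0, stride, 2*stride, ...
   (wrapped to 6 bits as on the slide). */
unsigned sets_touched(index_fn h, unsigned stride, unsigned n) {
    bool used[4] = { false };
    unsigned count = 0;
    for (unsigned k = 0; k < n; k++) {
        unsigned set = h((k * stride) & 63u);
        if (!used[set]) {
            used[set] = true;
            count++;
        }
    }
    return count;
}
```

With a stride of 16 (addresses 0, 16, 32, 48), the conventional index maps all four blocks to set 0, while the XOR-based index spreads them over all four sets; for unit-stride blocks (stride 4) both functions use every set.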

152 Effect of randomized address bits. [Figure: miss rate (0% to 7%) versus the number of randomized address bits, for fp, int and overall; Vandierendonck'04.]

153 Skewed-associative cache: mapping conflicts ↓.
    2-way skewing: 2 banks with different set index functions H1 and H2 applied to the block address (randomization!).
    Inter-bank dispersion: blocks may conflict in one bank, but probably not in the other.
    Set-associative cache: H1 = H2.
[Figure: tag/data bank 1 indexed by H1 and tag/data bank 2 indexed by H2.]

154 Inter-bank dispersion in action. [Figure: the same blocks placed in a set-associative and in a skewed-associative cache, each with two tag/data banks.]

155 Limited inter-bank dispersion. [Figure: index functions H1 and H2.] Goal: choose H1 and H2 such that the inter-bank dispersion (IBD) is maximal.

156 Trace cache. [Figure: the dynamic instruction sequence i1, call f, i7, i8, i9, ret, i2, i3 is stored contiguously in the trace cache, whereas a traditional cache holds the caller and the function f in separate lines, in address order.]

157 Example
    Processor       Pentium 4            UltraSPARC III
    Clock (2001)    2000 MHz             900 MHz
    L1 I cache      96 KiB trace cache   32 KiB, 4-way SA
      latency       4                    2
    L1 D cache      8 KiB, 4-way SA      64 KiB, 4-way SA
      latency       2                    2
    TLB             ...                  ...
    L2 cache        256 KiB, 8-way SA    8 MiB, direct mapped (off chip)
      latency       6                    15
    Block size      64 bytes             32 bytes
    Bus width       64 bits              128 bits
    Bus clock       400 MHz              150 MHz

158 Other examples of caching
    Cache type            What cached            Where cached          Latency (cycles)  Managed by
    Registers             32-bit word            CPU registers         1                 Compiler
    TLB                   Address translations   On-chip TLB           1                 Hardware
    L1 cache              32-byte block          On-chip L1            1-3               Hardware
    L2 cache              32-byte block          On/off-chip L2        ≈ 10              Hardware
    Virtual memory        4-KiB page             Main memory           ≈ 100             Hardware + OS
    Buffer cache          Parts of files         Main memory           ≈ 100             OS
    Network buffer cache  Parts of files         Local disk            10,000,000        AFS/NFS client
    Browser cache         Web pages              Local disk            10,000,000        Web browser
    Web cache             Web pages              Remote server disks   1,000,000,000     Web proxy server

159 Memory access. A load/store is first looked up in the cache: a hit (90-99% of accesses) takes 1 cycle. On a miss the cache line is loaded from RAM (60 cycles). If the data is not in RAM either (0.001-0.00001% of accesses), the page is loaded from disk (... cycles). Scaled to human time: 1 cycle ≈ 1 s, loading a cache line ≈ 1 min, loading a page ≈ 92 days.

160 Questions?


Download ppt "Microelectronic devices: processing architectures – Koen De Bosschere – 2006-06-12 Faculteit Ingenieurswetenschappen – Vakgroep Elektronica en Informatiesystemen."

Similar presentations


Ads by Google