It’s all about latency. Henk Neefs, Dept. of Electronics and Information Systems (ELIS), University of Gent.


1 It’s all about latency
Henk Neefs, Dept. of Electronics and Information Systems (ELIS), University of Gent

2 Overview
– Introduction of processor model
– Show importance of latency
– Techniques to handle latency
– Quantify memory latency effect
– Why consider optical interconnects?
– Latency of an optical interconnect
– Conclusions

3 Out-of-order processor pipeline
[Figure: pipeline diagram with blocks I-cache, fetch, decode, instruction window, rename, architectural register file, execution units (INT, LD, ST), ‘future’ register file, in-order retirement]

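4 Branch latency
[Figure: pipeline diagram with blocks I-cache, fetch, decode, instruction window, rename, execution units (INT, LD, ST, BR), ‘future’ register file; a time axis shows the instruction stream ADD, OR, ST, XOR, LD, OR, BR, ST, XOR, LD, ... with the BR latency marked]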

5 Eliminate branch latency
– By prediction: predict the outcome of the branch => eliminate the dependency (with a high probability)
– By predication: convert the control dependency to a data dependency => eliminate the control dependency
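The predication idea can be illustrated with a small sketch. The Python below is only an analogy (Python itself does not emit predicated instructions), and the function names are hypothetical: the first function decides by a branch, the second computes both candidate values and selects one with a 0/1 predicate, so the result depends on data rather than on control flow.

```python
def abs_with_branch(x: int) -> int:
    # Control dependency: which return executes depends on the branch.
    if x < 0:
        return -x
    return x


def abs_with_predicate(x: int) -> int:
    # Data dependency: the comparison yields a 0/1 predicate value,
    # both candidates are computed, and arithmetic selects one of them.
    p = int(x < 0)
    return p * (-x) + (1 - p) * x
```

Both functions return the same values; only the kind of dependency between the comparison and the result differs.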

6 Load latency
Source: while (pointer != 0) pointer = pointer.next;
Assembly: Loop: LD R1, R1(32); BNE R1, Loop
load latency = 2 cycles; branch latency = 1 cycle
CPI = 2 cycles / 2 instructions = 1 cycle/instruction
[Figure: cycle-by-cycle schedule of the LD and BNE instructions on the execution units]

7 With a longer load latency
When the L1 cache misses and the L2 cache hits: load latency = 2+6 cycles; branch latency = 1 cycle; CPI = 8 cycles / 2 instructions = 4 cycles/instruction
When the L2 cache misses and main memory hits: load latency = [value missing in the transcript] cycles; CPI = 34 cycles/instruction
[Figure: cycle-by-cycle schedule of the LD and BNE instructions on the execution units]
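The CPI figures for the pointer-chasing loop follow a simple pattern, sketched below. This is a minimal model and not the author's simulator: it assumes each iteration is bound by the load-to-load dependency (the next LD needs the value returned by the previous one) and that the BNE overlaps with the next load. The function names are hypothetical.

```python
def loop_cpi(load_latency_cycles: float, instructions_per_iteration: int = 2) -> float:
    """CPI of the two-instruction loop under the assumptions stated above."""
    # One iteration takes one load latency, because the next LD has to wait
    # for the value loaded by the previous LD.
    cycles_per_iteration = load_latency_cycles
    return cycles_per_iteration / instructions_per_iteration


def load_latency_for_cpi(cpi: float, instructions_per_iteration: int = 2) -> float:
    """Inverse of loop_cpi: the load latency implied by a given CPI."""
    return cpi * instructions_per_iteration


print(loop_cpi(2))      # L1 hit: 1.0 cycle/instruction
print(loop_cpi(2 + 6))  # L1 miss, L2 hit: 4.0 cycles/instruction
```

Under the same model, the CPI of 34 quoted for the main-memory case would correspond to a load latency of 68 cycles. That figure is an inference from the model, not a value taken from the slides.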

8 Memory hierarchy
[Figure: execution units, register file, L1 cache, L2 cache, main memory, hard drive, ordered along an axis labelled "storage capacity and latency"]

9 L1 cache latency
[Figure: IPC results for varying L1 cache latency]
IPC = instructions per clock cycle; 1 GHz processor; SPEC95 programs

10 Main memory latency
[Figure: IPC results for varying main memory latency]
IPC = instructions per clock cycle; 1 GHz processor; SPEC95 programs

11 Performance and latency
performance change = sensitivity * load latency change
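As a worked illustration of this linear relation, the sketch below applies it to made-up numbers. The sensitivity value, its unit (IPC change per cycle of added load latency) and the baseline IPC are assumptions chosen only for the example; they are not results from the presentation.

```python
def performance_change(sensitivity: float, load_latency_change: float) -> float:
    """performance change = sensitivity * load latency change."""
    return sensitivity * load_latency_change


# Hypothetical numbers: each extra cycle of load latency costs 0.05 IPC.
baseline_ipc = 2.0
delta_ipc = performance_change(sensitivity=-0.05, load_latency_change=4)
new_ipc = baseline_ipc + delta_ipc
print(f"IPC changes by {delta_ipc:.2f}, from {baseline_ipc:.2f} to {new_ipc:.2f}")
```

A larger sensitivity (in absolute value) means the same latency change moves performance further, which is why sensitivity and latency have to be considered together.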

12 Increase performance
by eliminating/reducing load latency:
– by prefetching: predict the next miss and fetch the data to e.g. the L1 cache
– by address prediction: address known earlier => load executed earlier => data early in register file
or by reducing sensitivity to load latency:
– by fine-grain multithreading
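Why fine-grain multithreading reduces sensitivity to load latency can be seen in a simple utilization model. The sketch below is a textbook-style simplification, not taken from the presentation: it assumes every thread alternates between a fixed number of busy cycles and a fixed stall, that switching threads is free, and that the stalls of different threads overlap perfectly. All names and numbers are hypothetical.

```python
def utilization(threads: int, run_cycles: float, stall_cycles: float) -> float:
    """Fraction of cycles in which the execution units have work to do."""
    # A single thread is busy for run_cycles out of every run + stall cycles.
    per_thread = run_cycles / (run_cycles + stall_cycles)
    # Other threads fill the stall cycles until the units are saturated.
    return min(1.0, threads * per_thread)


# Hypothetical workload: 2 busy cycles followed by a 6-cycle load stall.
for n in (1, 2, 4, 8):
    print(n, utilization(n, run_cycles=2, stall_cycles=6))
```

In this model, once enough threads are available to cover the stall, a further increase in thread count no longer changes utilization, and a moderate change in load latency no longer shows up in throughput.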

13 Some prefetch techniques
– Stride prefetching: search for a pattern with constant stride, e.g. walking through a matrix (row- or column-order)
– Markov prefetching: recurring patterns of misses
[Figure: miss history, detected stride, and the resulting prediction]
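The two techniques can be sketched in a few lines each. The code below is a simplified illustration under assumptions of my own, not the mechanism evaluated in the presentation: the stride predictor fires once the same non-zero stride has been seen twice in a row, and the Markov predictor returns the most frequent successor recorded for the current miss address. Class and method names are hypothetical.

```python
from collections import Counter, defaultdict


class StridePrefetcher:
    """Predicts the next miss address from a constant stride."""

    def __init__(self):
        self.last_addr = None
        self.last_stride = None

    def observe(self, addr):
        """Record a miss address; return a predicted next address or None."""
        prediction = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            # Predict only when the stride repeats, i.e. a pattern exists.
            if stride != 0 and stride == self.last_stride:
                prediction = addr + stride
            self.last_stride = stride
        self.last_addr = addr
        return prediction


class MarkovPrefetcher:
    """Predicts the most frequent successor of the current miss address."""

    def __init__(self):
        self.successors = defaultdict(Counter)
        self.last_addr = None

    def observe(self, addr):
        """Record a miss address; return a predicted next address or None."""
        if self.last_addr is not None:
            self.successors[self.last_addr][addr] += 1
        self.last_addr = addr
        seen = self.successors.get(addr)
        if not seen:
            return None
        return seen.most_common(1)[0][0]


stride = StridePrefetcher()
print([stride.observe(a) for a in (100, 108, 116, 124)])

markov = MarkovPrefetcher()
print([markov.observe(a) for a in (10, 50, 20, 10)])
```

The stride predictor suits regular walks such as the matrix traversal mentioned above, while the Markov predictor can follow irregular but recurring miss sequences, at the cost of storing a history table.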

14 Stride prefetching
[Figure: IPC results with stride prefetching]
IPC = instructions per clock cycle; 1 GHz processor; program: compress

15 Prefetching and sensitivity
Factors by which “performance sensitivity to latency” increases with stride prefetching: [values not preserved in the transcript]

16 Latency is important: generalization to other processor architectures
Consider the schedule of a program: [Figure: program schedule along a time axis]
Present in every program execution:
– latency of instruction execution
– latency of communication
=> latency is important whatever the processor architecture

17 Optical interconnects (OI)
Mature components:
– Vertical-Cavity Surface-Emitting Lasers (VCSELs)
– Light-Emitting Diodes (LEDs)
Very high bandwidths
Are replacing electronic interconnects in telecom and networks
Useful for short inter-chip and even intra-chip interconnects?

18 OI in processor context
At levels close to the processor core, latency is very important => the latency of OI determines how far OI penetrates into the memory hierarchy
What is the latency of an optical interconnect?

19 An optical link
Total latency = buffer latency + VCSEL/LED latency + time of flight + receiver latency
[Figure: link diagram with buffer/modulation/bias, LED/VCSEL, fiber or light conductor, receiver diode, transimpedance amplifier]
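The latency sum can be written out as a short sketch. Everything numeric below is a placeholder chosen for illustration, not a measurement from the presentation; the class, field and function names are hypothetical. The time of flight is computed from the standard relation length × refractive index / speed of light.

```python
from dataclasses import dataclass

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def time_of_flight_s(length_m: float, refractive_index: float) -> float:
    """Propagation delay through a light conductor of the given length."""
    return length_m * refractive_index / SPEED_OF_LIGHT_M_PER_S


@dataclass
class OpticalLink:
    buffer_latency_s: float
    emitter_latency_s: float  # VCSEL or LED
    flight_time_s: float
    receiver_latency_s: float

    def total_latency_s(self) -> float:
        # Total latency = buffer + VCSEL/LED + time of flight + receiver.
        return (self.buffer_latency_s + self.emitter_latency_s
                + self.flight_time_s + self.receiver_latency_s)


# Placeholder component latencies for a 10 cm link (assumed values).
link = OpticalLink(
    buffer_latency_s=0.2e-9,
    emitter_latency_s=0.3e-9,
    flight_time_s=time_of_flight_s(length_m=0.1, refractive_index=1.5),
    receiver_latency_s=0.4e-9,
)
print(f"total latency: {link.total_latency_s() * 1e9:.2f} ns")
```

Writing the sum this way makes it easy to see which component dominates for a given link length: the time of flight scales with distance, while the other three terms are fixed by the devices and circuits.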

20 VCSEL characteristics
– A small semiconductor laser
– Carrier density should be high enough for lasing action

21 Total VCSEL link latency consists of
– buffer latency
– parasitic capacitances and series resistances of VCSEL and pads
– threshold carrier density build-up
– from low optical output to final optical output (intrinsic latency)
– time of flight (TOF)
– receiver latency

22 Total optical link latency
[Figure: total optical link latency for CMOS 0.6 µm and 0.25 µm technologies; label: 1 mW]

23 Latency as a function of power

24 Conclusions
When combining performance sensitivity and optical latency we conclude:
– optical interconnects are feasible to main memory and for multiprocessors
– for interconnects close to the processor core, optical interconnects have too high a latency with present (telecom) devices, drivers and receivers => but an evolution to lower-latency devices, drivers and receivers is now taking place...
For more information on the presented results: Henk Neefs, Latentiebeheersing in processors, PhD Universiteit Gent, January 2000
