It’s all about latency Henk Neefs Dept. of Electronics and Information Systems (ELIS) University of Gent.

Overview
– Introduction of the processor model
– Show the importance of latency
– Techniques to handle latency
– Quantify the effect of memory latency
– Why consider optical interconnects?
– Latency of an optical interconnect
– Conclusions

Out-of-order processor pipeline (figure: I-cache → fetch/decode → rename → instruction window → execution units INT, LD, ST, with a ‘future’ register file; instructions retire in order into the architectural register file)

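Branch latency (figure: the same pipeline with an additional BR execution unit, and a timeline of instruction groups ADD OR ST XOR LD, OR BR ST XOR LD, ... marking the BR latency)
The branch latency is the time between fetching a branch and knowing its outcome; until the outcome is known, the processor cannot be sure which instructions to fetch next.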

Eliminate branch latency
– By prediction: predict the outcome of the branch => the dependency is eliminated (with a high probability)
– By predication: convert the control dependency into a data dependency => the control dependency is eliminated
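To make the two options concrete, here is a minimal Python sketch. Prediction is illustrated with a table of 2-bit saturating counters, one common scheme that the slides do not specifically name; predication is illustrated by selecting a maximum through a mask instead of a branch. The names `TwoBitPredictor` and `select_max_predicated`, the 1024-entry table and the branch address 0x400 are hypothetical.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by branch address (hypothetical sizing)."""

    def __init__(self, entries=1024):
        self.entries = entries
        # Counter values 0-1 predict "not taken", 2-3 predict "taken"; start weakly not taken.
        self.counters = [1] * entries

    def predict(self, pc):
        return self.counters[pc % self.entries] >= 2

    def update(self, pc, taken):
        i = pc % self.entries
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def select_max_predicated(a, b):
    """Predication-style select: the comparison becomes data (a mask), not a branch."""
    mask = -(a < b)  # -1 (all bits set) if a < b, else 0
    return (b & mask) | (a & ~mask)


# A loop-closing branch: taken 9 times, then not taken; the loop runs 3 times.
outcomes = ([True] * 9 + [False]) * 3
predictor = TwoBitPredictor()
mispredictions = 0
for taken in outcomes:
    if predictor.predict(0x400) != taken:
        mispredictions += 1
    predictor.update(0x400, taken)
print(mispredictions, "mispredictions out of", len(outcomes))
```

Only the first (cold) execution and each loop exit are mispredicted, so for most branches the fetch unit does not have to wait for the branch latency.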

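Load latency
while (pointer != 0) pointer = pointer.next;
Loop: LD R1, R1(32)
BNE R1, Loop
(figure: execution timeline in cycles, LD BNE LD BNE LD BNE on the execution units)
Each LD needs the pointer produced by the previous LD, so one iteration (one LD plus one BNE) takes one load latency.
load latency = 2 cycles, branch latency = 1 cycle
CPI = 2 cycles/2 instructions = 1 cycle/instruction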

When the load latency is longer
When the L1 cache misses and the L2 cache hits: load latency = 2+6 = 8 cycles, branch latency = 1 cycle, CPI = 8 cycles/2 instructions = 4 cycles/instruction
When the L2 cache misses and main memory hits: load latency = 2+6+60 = 68 cycles, CPI = 68 cycles/2 instructions = 34 cycles/instruction
(figure: execution timeline in cycles, LD and BNE on the execution units)
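The CPI figures above follow from the dependency between successive loads: each LD needs the pointer produced by the previous LD, so an iteration cannot start before the previous load completes. A minimal Python sketch of that reasoning, assuming the BNE is predicted correctly and therefore never delays the next LD, and using the latencies from the slides (2 cycles for an L1 hit, +6 for an L2 hit, +60 for main memory). The name `pointer_chase_cpi` is hypothetical.

```python
# Load latency (cycles) depending on where the load hits, as on the slides.
LOAD_LATENCY = {"L1": 2, "L2": 2 + 6, "memory": 2 + 6 + 60}


def pointer_chase_cpi(load_latency, iterations=1000, instrs_per_iter=2):
    """Cycles per instruction for the loop `LD R1, R1(32); BNE R1, Loop`."""
    r1_ready = 0  # cycle at which the current pointer (R1) is available
    for _ in range(iterations):
        ld_issue = r1_ready                 # LD waits for its address register R1
        r1_ready = ld_issue + load_latency  # next pointer available after the load latency
        # BNE only reads R1; with correct prediction it does not hold back the next LD.
    return r1_ready / (iterations * instrs_per_iter)


for level, latency in LOAD_LATENCY.items():
    print(f"{level:>6}: load latency {latency:2d} cycles -> CPI {pointer_chase_cpi(latency):.0f}")
```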

Memory hierarchy: execution units, register file, L1 cache, L2 cache, main memory, hard drive (storage capacity and latency increase from the register file towards the hard drive)

L1 cache latency (figure: IPC, Instructions Per clock Cycle, of a 1 GHz processor on the SPEC95 programs)

Main memory latency (figure: IPC, Instructions Per clock Cycle, of a 1 GHz processor on the SPEC95 programs)

Performance and latency
performance change = sensitivity × load latency change
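The relation is a first-order (linear) model, so it is only meaningful over latency changes for which the measured curve is roughly straight. A minimal Python sketch that estimates the sensitivity from two measurements and uses it to predict performance at another latency; the IPC values are made-up placeholders, not results from the slides, and the function names are hypothetical.

```python
def sensitivity(latency_a, perf_a, latency_b, perf_b):
    """Slope of performance with respect to load latency (here: IPC per cycle)."""
    return (perf_b - perf_a) / (latency_b - latency_a)


def predict_performance(perf_ref, latency_ref, sens, new_latency):
    """performance change = sensitivity * load latency change."""
    return perf_ref + sens * (new_latency - latency_ref)


# Hypothetical measurements: IPC 1.00 at a 2-cycle load latency, IPC 0.90 at 4 cycles.
s = sensitivity(2, 1.00, 4, 0.90)
print(f"sensitivity = {s:.3f} IPC per cycle of load latency")
print(f"predicted IPC at 6 cycles = {predict_performance(1.00, 2, s, 6):.2f}")
```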

Increase performance by eliminating or reducing load latency:
– By prefetching: predict the next miss and fetch the data into, e.g., the L1 cache
– By address prediction: address known earlier => load executed earlier => data earlier in the register file
or by reducing the sensitivity to load latency:
– By fine-grain multithreading
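Fine-grain multithreading does not shorten the load latency; it fills the waiting cycles with instructions from other threads. A minimal Python sketch under strong simplifying assumptions (hypothetical, not the model used in the presentation): one instruction issues per cycle, threads are picked round-robin, and each instruction of a thread depends on the previous one of the same thread with a fixed latency. The name `fgmt_ipc` is hypothetical.

```python
def fgmt_ipc(num_threads, latency, cycles):
    """IPC of a single-issue core that switches threads every cycle (round-robin)."""
    ready_at = [0] * num_threads  # cycle at which each thread's next instruction may issue
    next_thread = 0
    issued = 0
    for cycle in range(cycles):
        # Pick the first ready thread, starting after the one that issued last.
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + latency  # its next instruction waits for this result
                issued += 1
                next_thread = (t + 1) % num_threads
                break
    return issued / cycles


for threads in (1, 2, 4, 8):
    print(f"{threads} thread(s), latency 4: IPC = {fgmt_ipc(threads, 4, 1000):.2f}")
```

In this model, once the number of threads reaches the latency every cycle is used, so performance no longer depends on that latency: the sensitivity has been reduced rather than the latency itself.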

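Some prefetch techniques
– Stride prefetching: search for a pattern with a constant stride, e.g. walking through a matrix (in row or column order). Example: the miss sequence 2 0 3 1 4 2 5 3 6 4 consists of two interleaved streams, each with stride 1.
– Markov prefetching: exploit recurring patterns of misses; a table maps the miss history to a predicted next miss (figure entries: 10 → 110, 15 → 12, 100 → …...)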
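Both techniques can be sketched in a few lines of Python. The stride detector keeps one entry per load instruction, assuming the two interleaved streams come from two different loads (hypothetical PCs "A" and "B"); the Markov prefetcher remembers, for each miss address, the miss that followed it last time. Both are simplified sketches (no confidence counters, unbounded tables), not the exact designs evaluated in the presentation.

```python
class StridePrefetcher:
    """Per-load stride detector: prefetch addr + stride once the same stride is seen twice."""

    def __init__(self):
        self.table = {}  # load PC -> (last miss address, last stride or None)

    def miss(self, pc, addr):
        prefetch = None
        if pc in self.table:
            last_addr, last_stride = self.table[pc]
            stride = addr - last_addr
            if stride != 0 and stride == last_stride:
                prefetch = addr + stride  # constant stride confirmed: fetch the next element early
            self.table[pc] = (addr, stride)
        else:
            self.table[pc] = (addr, None)
        return prefetch


class MarkovPrefetcher:
    """Remembers which miss followed each miss address; predicts it when that address misses again."""

    def __init__(self):
        self.next_miss = {}  # miss address -> miss address that followed it last time
        self.previous = None

    def miss(self, addr):
        if self.previous is not None:
            self.next_miss[self.previous] = addr
        self.previous = addr
        return self.next_miss.get(addr)


# Two interleaved stride-1 streams from two (hypothetical) loads "A" and "B".
stride_pf = StridePrefetcher()
stream = [("A", 2), ("B", 0), ("A", 3), ("B", 1), ("A", 4),
          ("B", 2), ("A", 5), ("B", 3), ("A", 6), ("B", 4)]
stride_predictions = [stride_pf.miss(pc, addr) for pc, addr in stream]
print(stride_predictions)

# A recurring, non-strided miss pattern (example addresses).
markov_pf = MarkovPrefetcher()
markov_predictions = [markov_pf.miss(addr) for addr in [10, 110, 15, 12, 10, 110, 15]]
print(markov_predictions)
```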

Stride prefetching (figure: IPC, Instructions Per clock Cycle, of a 1 GHz processor on the compress program)

Prefetching and sensitivity
With stride prefetching, the “performance sensitivity to latency” increases by the following factors:

Latency is important: generalization to other processor architectures
Consider the schedule of a program over time (figure). Every program execution contains:
– the latency of instruction execution
– the latency of communication
=> latency is important whatever the processor architecture
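One way to see why this holds for any architecture: even with unlimited execution units, a program cannot finish sooner than the longest latency-weighted path through its dependency graph, and both execution and communication latencies lie on that path. A minimal Python sketch using the standard-library `graphlib` module (Python 3.9+); the function name, graph encoding and latency values are hypothetical.

```python
from graphlib import TopologicalSorter


def critical_path(exec_latency, edges):
    """
    Earliest completion time of a program with unlimited execution units.

    exec_latency: dict instruction -> execution latency in cycles
    edges: list of (producer, consumer, communication latency in cycles)
    """
    preds = {node: [] for node in exec_latency}
    for producer, consumer, comm in edges:
        preds[consumer].append((producer, comm))

    finish = {}
    order = TopologicalSorter({n: [p for p, _ in ps] for n, ps in preds.items()})
    for node in order.static_order():  # producers are visited before their consumers
        start = max((finish[p] + comm for p, comm in preds[node]), default=0)
        finish[node] = start + exec_latency[node]
    return max(finish.values())


# Three dependent loads (2 cycles each) and a branch (1 cycle) on the last one.
latency = {"LD1": 2, "LD2": 2, "LD3": 2, "BNE": 1}
deps_local = [("LD1", "LD2", 0), ("LD2", "LD3", 0), ("LD3", "BNE", 0)]
deps_remote = [(p, c, 3) for p, c, _ in deps_local]  # same program, 3-cycle communication latency
print(critical_path(latency, deps_local), critical_path(latency, deps_remote))
```

Adding execution units cannot lower this bound; only shorter execution or communication latencies can.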

Optical interconnects (OI)
Mature components:
– Vertical-Cavity Surface-Emitting Lasers (VCSELs)
– Light-Emitting Diodes (LEDs)
Very high bandwidths
They are replacing electronic interconnects in telecom and networks
Useful for short inter-chip and even intra-chip interconnects?

OI in processor context
At levels close to the processor core, latency is very important => the latency of an OI determines how far OIs can penetrate into the memory hierarchy
What is the latency of an optical interconnect?

An optical link
Total latency = buffer latency + VCSEL/LED latency + time of flight + receiver latency
(figure: buffer with modulation and bias → LED/VCSEL → fiber or light conductor → receiver diode → transimpedance amplifier)
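The time-of-flight term follows from the length of the light path and the refractive index of the medium, t = n·L/c. A minimal Python sketch of the latency budget; the function names are hypothetical and every component value in the example is a placeholder, not a measurement from the presentation.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s in vacuum


def time_of_flight(length_m, refractive_index):
    """Propagation delay through a fiber or light conductor of the given length."""
    return refractive_index * length_m / SPEED_OF_LIGHT


def optical_link_latency(buffer_s, emitter_s, length_m, refractive_index, receiver_s):
    """Total latency = buffer + VCSEL/LED + time of flight + receiver latency (seconds)."""
    return buffer_s + emitter_s + time_of_flight(length_m, refractive_index) + receiver_s


# Hypothetical budget: 10 cm link, refractive index 1.5, placeholder circuit delays.
tof = time_of_flight(0.10, 1.5)
total = optical_link_latency(100e-12, 200e-12, 0.10, 1.5, 150e-12)
print(f"time of flight: {tof * 1e12:.0f} ps, total: {total * 1e12:.0f} ps")
print(f"at 1 GHz this is {total / 1e-9:.2f} clock cycles")
```

At the 1 GHz clock used in the simulations above, every nanosecond of link latency costs one clock cycle.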

VCSEL characteristics
A small semiconductor laser
The carrier density must be high enough (above threshold) for lasing action

The total VCSEL link latency consists of:
– Buffer latency
– RC delay of the parasitic capacitances and series resistances of the VCSEL and pads
– Build-up of the threshold carrier density
– Transition from low optical output to final optical output (intrinsic latency)
– Time of flight (TOF)
– Receiver latency
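For the threshold-carrier-density build-up, a standard textbook approximation (not taken from the slides) assumes a constant carrier lifetime τ below threshold: stepping the current from a bias I_b to a drive current I > I_th takes roughly t_d = τ·ln((I - I_b)/(I - I_th)). A minimal Python sketch of that formula; the function name, carrier lifetime and currents are hypothetical.

```python
import math


def vcsel_turn_on_delay(tau_s, drive_a, threshold_a, bias_a=0.0):
    """
    Approximate delay to build up the threshold carrier density.

    Assumes a constant carrier lifetime tau_s below threshold and a step in
    current from bias_a to drive_a, with bias_a <= threshold_a < drive_a.
    """
    if not bias_a <= threshold_a < drive_a:
        raise ValueError("expected bias <= threshold < drive current")
    return tau_s * math.log((drive_a - bias_a) / (drive_a - threshold_a))


tau = 1e-9  # hypothetical carrier lifetime: 1 ns
for bias in (0.0, 0.5e-3, 1e-3):  # hypothetical bias currents; threshold 1 mA, drive 2 mA
    delay = vcsel_turn_on_delay(tau, 2e-3, 1e-3, bias)
    print(f"bias {bias * 1e3:.1f} mA -> delay {delay * 1e12:.0f} ps")
```

In this approximation, biasing at threshold removes the term and a larger drive current shortens it; both cost power.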

Total optical link latency (table columns: CMOS 0.6 µm | 0.25 µm | 0.6 µm | 0.25 µm @ 1 mW)

Latency as a function of power (figure)

Conclusions
When combining performance sensitivity and optical latency, we conclude:
– optical interconnects are feasible for interconnects to main memory and for multiprocessors
– for interconnects close to the processor core, optical interconnects have too high a latency with present (telecom) devices, drivers and receivers => but an evolution towards lower-latency devices, drivers and receivers is now taking place...
For more information on the presented results: Henk Neefs, Latentiebeheersing in processors (Latency control in processors), PhD thesis, Universiteit Gent, January 2000, www.elis.rug.ac.be/~neefs
