Presentation is loading. Please wait.

Presentation is loading. Please wait.

Computer Structure 2013 – Introduction 1 MAMAS – Computer Structure 234267 Lecturers: Lihu Rappoport Adi Yoaz Some of the slides were taken from Avi Mendelson,

Similar presentations


Presentation on theme: "Computer Structure 2013 – Introduction 1 MAMAS – Computer Structure 234267 Lecturers: Lihu Rappoport Adi Yoaz Some of the slides were taken from Avi Mendelson,"— Presentation transcript:

1 Computer Structure 2013 – Introduction 1 MAMAS – Computer Structure 234267 Lecturers: Lihu Rappoport Adi Yoaz Some of the slides were taken from Avi Mendelson, Randi Katz, Patterson, Gabriel Loh

2 Computer Structure 2013 – Introduction 2 General Course Information u Grade  20% : 4 exercises (all mandatory) תקף Submission in pairs  80% Final exam  No midterm exam u Course web site  http://webcourse.cs.technion.ac.il/234267  Foils will be on the web several days before the class

3 Computer Structure 2013 – Introduction 3 Class Focus u CPU  Introduction: performance, instruction set (RISC vs. CISC)  Pipeline, hazards  Branch prediction  Out-of-order execution u Memory Hierarchy  Cache  Main memory  Virtual Memory u System  Motherboard & chipset, DRAM, I/O, Disk, peripherals  Power considerations

4 Computer Structure 2013 – Introduction 4 Computer System – Sandy Bridge mouse LAN Lan Adap South Bridge (PCH) Audio Codec DVD Drive Hard Disk Parallel Port Serial Port Floppy Drive PS/2 keybrd/ mouse Cache DDRIII Channel 1 Mem BUS DDRIII Channel 2 Memory controller Core Display port External Graphics Card PCI express ×16 GFX Display link 2133-1066 MHz Line in Line out S/PDIF in S/PDIF out Super I/O LPC USB SATA BIOS PCI express ×1 exp slots System Agent 4×DMI D-sub HDMI DVI

5 Computer Structure 2013 – Introduction 5 Architecture & Microarchitecture u Architecture The processor features seen by the “user”  Instruction set, addressing modes, data width, … u Micro-architecture The way of implementation of a processor  Caches size and structure, number of execution units, …  Timing is considered uArch (though it is user visible) u Processors with different uArch can support the same Architecture

6 Computer Structure 2013 – Introduction 6 Compatibility u Backward compatibility  New hardware can run existing software Core2 Duo  can run SW written for Pentium  4, Pentium  M, Pentium  III, Pentium  II, Pentium , 486, 386, 268 u Forward compatibility  New software can run on existing hardware  Example: new software written with AVX TM runs on older processor which does not support AVX TM  Commonly supports one or two generations behind u Architecture independent SW  JIT – just in time compiler: Java and.NET  Binary translation

7 Computer Structure 2013 – Introduction 7 Moore’s Law The number of transistors doubles every ~2 years

8 Computer Structure 2013 – Introduction 8 CPI – Cycles Per Instruction u CPUs work according to a clock signal  Clock cycle is measured in nsec (10 -9 of a second)  Clock frequency (= 1/clock cycle) measured in GHz (10 9 cyc/sec) u Instruction Count (IC)  Total number of instructions executed in the program u CPI – Cycles Per Instruction  Average #cycles per Instruction (in a given program)  IPC (= 1/CPI) : Instructions per cycles CPI = #cycles required to execute the program IC

9 Computer Structure 2013 – Introduction 9 Calculating the CPI of a Program u ICi: #times instruction of type i is executed in the program u IC: #instruction executed in the program:  Fi: relative frequency of instruction of type i : Fi = ICi/IC u CPI i – #cycles to execute instruction of type i  e.g.: CPI add = 1, CPI mul = 3 u #cycles required to execute the entire program: u CPI:

10 Computer Structure 2013 – Introduction 10 CPU Time u CPU Time - time required to execute a program CPU Time = IC  CPI  clock cycle u Our goal: minimize CPU Time  Minimize clock cycle: more GHz (process, circuit, uArch)  Minimize CPI: uArch (e.g.: more execution units)  Minimize IC: architecture (e.g.: AVX TM )

11 Computer Structure 2013 – Introduction 11 Speedup overall = t exe t’ exe = 1 S F (1 – F) + Suppose enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected, then: Amdahl’s Law t’ exe = t exe × S F (1 – F) + t exe t’ exe

12 Computer Structure 2013 – Introduction 12 Floating point instructions improved to run at 2×, but only 10% of executed instructions are FP Speedup overall = 1 0.95 =1.053 t’ exe = t exe × (0.9 + 0.1 / 2) = 0.95 × t exe Corollary: Make The Common Case Fast Amdahl’s Law: Example

13 Computer Structure 2013 – Introduction 13 Comparing Performance u Peak Performance  MIPS, GFLOPS  Often not useful: unachievable / unsustainable in practice u Benchmarks  Real applications, or representative parts of real apps  Synthetic benchmarks representative parts of real apps  Targeted at the specific system usages  SPEC INT – integer applications Data compression, C complier, Perl interpreter, database system, chess-playing, Text-processing, …  SPEC FP – floating point applications Mostly important scientific applications  TPC Benchmarks Measure transaction-processing throughput

14 Computer Structure 2013 – Introduction 14 Evaluating Performance of future CPUs u Use a performance simulator to evaluate the performance of a new feature / algorithm  Models the uarch to a great detail  Run 100’s of representative applications u Produce the performance s-curve  Sort the applications according to the IPC increase  Baseline (0) is the processor without the new feature Negative outliers Positive outliers Bad S-curve Small negative outliers Positive outliers Good S-curve

15 Computer Structure 2013 – Introduction 15 The ISA is what the user / compiler see The HW implements the ISA instruction set software hardware Instruction Set Design

16 Computer Structure 2013 – Introduction 16 ISA Considerations u Reduce the IC to reduce execution time  E.g., a single vector instruction performs the work of multiple scalar instructions u Simple instructions  simpler HW implementation  Higher frequency, lower power, lower cost u Code size  Long instructions take more time to fetch  Longer instructions require a larger memory Important in small devices, e.g., cell phones

17 Computer Structure 2013 – Introduction 17 Architectural Consideration Example Immediate data size  1% of data values > 16-bits  12 – 16 bits of needed 0% 10% 20% 30% 0 1 2 3456789 10 11 12131415 Immediate data bits Int. Avg. FP Avg.

18 Computer Structure 2013 – Introduction 18 Software Specific Extensions u Extend arch to accelerate exec of specific apps  Reduce IC u Example: SSE TM – Streaming SIMD Extensions  128-bit packed (vector) / scalar single precision FP (4×32)  Introduced on Pentium® III on ’99  8 new 128 bit registers (XMM0 – XMM7)  Accelerates graphics, video, scientific calculations, … u Packed:Scalar: x0x1x2x3 y0y1y2y3 x0+y0x1+y1x2+y2 x3+y3 + 128-bits x0x1x2x3 y0y1y2y3 x0+y0y1y2 y3 + 128-bits

19 Computer Structure 2013 – Introduction 19 CISC Processors u CISC – Complex Instruction Set Computer  The idea: a high level machine language  Example: x86 u Characteristic  Many instruction types, with a many addressing modes  Some of the instructions are complex Execute complex tasks Require many cycles  ALU operations directly on memory Only a few registers, in many cases not orthogonal  Variable length instructions common instructions get short codes  save code length

20 Computer Structure 2013 – Introduction 20 Rankinstruction% of total executed 1load22% 2conditional branch20% 3compare16% 4store12% 5add8% 6and6% 7sub5% 8move register-register4% 9call1% 10return1% Total96% Simple instructions dominate instruction frequency Top 10 x86 Instructions

21 Computer Structure 2013 – Introduction 21 CISC Drawbacks u Complex instructions and complex addressing modes  complicates the processor  slows down the simple, common instructions  contradicts Make The Common Case Fast u Not compiler friendly  Non orthogonal registers  Unused complex addressing modes u Variable length instructions are a pain  Difficult to decode few instructions in parallel As long as instruction is not decoded, its length is unknown  Unknown where the inst. ends, and where the next inst. starts  An instruction may cross a cache line or a page

22 Computer Structure 2013 – Introduction 22 RISC Processors u RISC – Reduced Instruction Set Computer  The idea: simple instructions enable fast hardware u Characteristics  A small instruction set, with few instruction formats  Simple instructions that execute simple tasks Most of them require a single cycle (with pipeline)  A few indexing methods  ALU operations on registers only Memory is accessed using Load and Store instructions only  Many orthogonal registers – all instructions and all addressing modes available with any register  Three address machine: Add dst, src1, src2  Fixed length instructions u Complier friendly – easier to utilize the ISA  Orthogonal, simple

23 Computer Structure 2013 – Introduction 23 RISC Processors (Cont.) u Simple architecture  Simple micro-architecture  Simple, small and fast control logic  Simpler to design and validate  Leave space for large on die caches  Shorten time-to-market u Existing RISC processor are not “pure” RISC  E.g., support division which takes many cycles  Examples: MIPSTM, SparcTM, AlphaTM, PowerTM u Modern CISC processors use RISC  arch ideas  Internally translate the CISC instructions into RISC-like operations the inside core looks much like that of a RISC processor


Download ppt "Computer Structure 2013 – Introduction 1 MAMAS – Computer Structure 234267 Lecturers: Lihu Rappoport Adi Yoaz Some of the slides were taken from Avi Mendelson,"

Similar presentations


Ads by Google