Computer Architecture 2009 – Introduction 1 MAMAS – Computer Architecture 234267 Lecturer: Dr. Lihu Rappoport Some of the slides were taken from Avi Mendelson,

1 Computer Architecture 2009 – Introduction 1 MAMAS – Computer Architecture 234267 Lecturer: Dr. Lihu Rappoport Some of the slides were taken from Avi Mendelson, Randi Katz, Patterson, Gabriel Loh

2 General Course Information
- Grade
  - 20% exercises (mandatory, binding)
  - 80% final exam
  - No midterm exam
- Textbooks
  - Computer Architecture: A Quantitative Approach, Hennessy & Patterson
- Other course information
  - Course web site: http://webcourse.cs.technion.ac.il/234267
  - Foils will be on the web several days before the class

3 Lecturer details
- Name: Lihu Rappoport
- Phone: 04-865-1554
- Email: lihu.rappoport@intel.com

4 Class Focus
- CPU
  - Introduction: performance, instruction set (RISC vs. CISC)
  - Pipeline, hazards
  - Branch prediction
  - Out-of-order execution
- Memory Hierarchy
  - Cache
  - Main memory
  - Virtual memory
- Advanced Topics
- PC Architecture
  - Motherboard & chipset, DRAM, I/O, disk, peripherals

5 Computer System Structure
[Figure: block diagram of a PC. Labels: CPU, cache, CPU bus, North Bridge, memory controller, on-board graphics, memory bus, DDRII channel 1, DDRII channel 2, PCI express ×16, external graphics card, South Bridge, PCI, PCI express ×1, IDE controller, SATA controller, USB controller, IO controller, LAN adapter, LAN, sound card, speakers, hard disk, DVD drive, floppy drive, parallel port, serial port, keyboard, mouse]

6 Architecture & Microarchitecture
- Architecture: the processor features seen by the “user”
  - Instruction set, addressing modes, data width, …
- Micro-architecture (uArch): the way a processor is implemented
  - Cache size and structure, number of execution units, …
  - Timing is considered uArch (though it is user visible)
- Processors with different uArch can support the same architecture

7 Compatibility
- Backward compatibility
  - New hardware can run existing software
  - Core2 Duo can run SW written for Pentium® 4, Pentium® M, Pentium® III, Pentium® II, Pentium®, 486, 386, 286
- Forward compatibility
  - New software can run on existing hardware
  - Example: new software written with SSE2™ runs on an older processor that does not support SSE2™
  - Commonly supports one or two generations back
- Architecture-independent SW
  - JIT (just-in-time) compiler: Java and .NET
  - Binary translation

8 Performance

9 Technology Trends and Performance
- Computing capacity: 4× per 3 years
  - If we could keep all the transistors busy all the time
  - Actual: 3.3× per 3 years
- Moore’s Law: performance doubles every ~18 months
  - Trend is slowing: process scaling declines, power is up
[Figure: CPU speed and memory speed grow apart. Growth-rate labels in the figure: 2× in 3 years, 1.1× in 3 years, 2× in 3 years, 4× in 3 years]
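
The growth rates on this slide are easier to compare once they are put on a common time base. The following is a minimal sketch of that arithmetic; the helper name `growth_over` is an illustrative choice, not something from the course material.

```python
def growth_over(period_months: float, factor: float, horizon_months: float) -> float:
    """Compound a growth `factor` per `period_months` over `horizon_months`."""
    return factor ** (horizon_months / period_months)

# Doubling every 18 months, compounded over 36 months (3 years).
doubling_over_3_years = growth_over(18, 2.0, 36)

# The "actual" 3.3x per 3 years, expressed as a per-year factor.
actual_per_year = growth_over(36, 3.3, 12)

print(f"Doubling every 18 months = {doubling_over_3_years:.1f}x per 3 years")
print(f"3.3x per 3 years is roughly {actual_per_year:.2f}x per year")
```

Doubling every 18 months compounds to the same 4× per 3 years quoted for computing capacity, while the actual 3.3× per 3 years corresponds to a bit under 1.5× per year.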

10 Moore’s Law
[Figure: Moore’s Law graph, taken from http://www.intel.com/technology/mooreslaw/index.htm]

11 CPI: Cycles Per Instruction
- CPUs work according to a clock signal
  - Clock cycle is measured in nsec (10^-9 of a second)
  - Clock frequency (= 1/clock cycle) is measured in GHz (10^9 cycles/sec)
- Instruction Count (IC)
  - Total number of instructions executed in the program
- CPI: Cycles Per Instruction
  - Average number of cycles per instruction (in a given program)
  - IPC (= 1/CPI): instructions per cycle

$$\text{CPI} = \frac{\#\text{cycles required to execute the program}}{\text{IC}}$$
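
As a small illustration of these definitions, the sketch below computes CPI, IPC and clock frequency. The function names and the sample numbers (3 million cycles, 2 million instructions, a 0.5 ns clock cycle) are made up for the example.

```python
def cpi(total_cycles: int, instruction_count: int) -> float:
    """Average cycles per instruction for one program run."""
    return total_cycles / instruction_count

def ipc(total_cycles: int, instruction_count: int) -> float:
    """Instructions per cycle, the reciprocal of CPI."""
    return instruction_count / total_cycles

def frequency_ghz(clock_cycle_ns: float) -> float:
    """A clock cycle given in nanoseconds yields a frequency in GHz."""
    return 1.0 / clock_cycle_ns

# Hypothetical measurements for one program.
cycles, instructions = 3_000_000, 2_000_000
print("CPI:", cpi(cycles, instructions))
print("IPC:", round(ipc(cycles, instructions), 3))
print("Frequency (GHz):", frequency_ghz(0.5))
```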

12 CPU Time
- CPU Time: the time required to execute a program

$$\text{CPU Time} = \text{IC} \times \text{CPI} \times \text{clock cycle}$$

- Our goal: minimize CPU Time
  - Minimize clock cycle: more GHz (process, circuit, uArch)
  - Minimize CPI: uArch (e.g., more execution units)
  - Minimize IC: architecture (e.g., SSE™)
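
The CPU Time product can be sketched in a few lines to show how each of the three factors moves the result. The function name and all input values below are hypothetical.

```python
def cpu_time_seconds(instruction_count: int, cpi: float, clock_cycle_ns: float) -> float:
    """CPU Time = IC x CPI x clock cycle, with the cycle given in nanoseconds."""
    return instruction_count * cpi * clock_cycle_ns * 1e-9

# Hypothetical baseline: 2 billion instructions, CPI of 1.5, 0.5 ns cycle (2 GHz).
baseline = cpu_time_seconds(2_000_000_000, 1.5, 0.5)

# Each knob on its own: shorter cycle, lower CPI, fewer instructions.
faster_clock = cpu_time_seconds(2_000_000_000, 1.5, 0.4)
lower_cpi = cpu_time_seconds(2_000_000_000, 1.2, 0.5)
fewer_instructions = cpu_time_seconds(1_600_000_000, 1.5, 0.5)

for name, value in [("baseline", baseline), ("faster clock", faster_clock),
                    ("lower CPI", lower_cpi), ("fewer instructions", fewer_instructions)]:
    print(f"{name}: {value:.2f} s")
```

In this made-up case each change is a 20% reduction of one factor, so all three variants shorten the run time by the same 20%.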

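13 Amdahl’s Law
Suppose enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected (F = Fraction_enhanced, S = Speedup_enhanced). Then:

$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[ (1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right]$$

$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$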

14 Amdahl’s Law: Example
Floating point instructions are improved to run 2× faster, but only 10% of executed instructions are FP.

$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times (0.9 + 0.1 / 2) = 0.95 \times \text{ExTime}_{\text{old}}$$

$$\text{Speedup}_{\text{overall}} = \frac{1}{0.95} = 1.053$$

Corollary: Make The Common Case Fast
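
Amdahl’s Law is short enough to check numerically. The sketch below (the function name `amdahl_speedup` is an illustrative choice) reproduces the FP example and also shows the upper bound reached when the enhanced part becomes arbitrarily fast.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when `fraction_enhanced` of the run time is accelerated
    by `speedup_enhanced` and the remainder is unaffected."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    new_time = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time

# The slide's example: 10% of the work runs 2x faster.
fp_example = amdahl_speedup(0.1, 2.0)
print(f"FP example: {fp_example:.3f}")

# Even an enormous speedup of that 10% cannot beat 1 / (1 - 0.1).
print(f"Bound for a 10% fraction: {amdahl_speedup(0.1, 1e12):.3f}")
```

The bound is why the corollary matters: the achievable gain is limited by how much of the run time the enhancement actually touches.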

15 Calculating the CPI of a Program
- IC_i: number of times an instruction of type i is executed in the program
- IC: number of instructions executed in the program: $\text{IC} = \sum_i \text{IC}_i$
- F_i: relative frequency of instructions of type i: $F_i = \text{IC}_i / \text{IC}$
- CPI_i: number of cycles to execute an instruction of type i
  - e.g., CPI_add = 1, CPI_mul = 3
- Number of cycles required to execute the program: $\#\text{cyc} = \sum_i \text{CPI}_i \times \text{IC}_i$
- CPI: $\text{CPI} = \dfrac{\#\text{cyc}}{\text{IC}} = \sum_i F_i \times \text{CPI}_i$
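
The per-type calculation can be sketched as a weighted average. The instruction mix below is invented for illustration; CPI_add = 1 and CPI_mul = 3 come from the slide, while the load CPI of 2 is an assumption.

```python
def program_cpi(counts: dict, cpi_by_type: dict) -> float:
    """CPI of a program: total cycles divided by total instruction count."""
    total_instructions = sum(counts.values())
    total_cycles = sum(cpi_by_type[kind] * n for kind, n in counts.items())
    return total_cycles / total_instructions

def program_cpi_from_frequencies(counts: dict, cpi_by_type: dict) -> float:
    """Same quantity computed as the sum of F_i * CPI_i."""
    total_instructions = sum(counts.values())
    return sum((n / total_instructions) * cpi_by_type[kind]
               for kind, n in counts.items())

# Hypothetical instruction mix (IC_i) and per-type cycle counts (CPI_i).
counts = {"add": 60, "mul": 20, "load": 20}
cpi_by_type = {"add": 1, "mul": 3, "load": 2}  # the load value is assumed

print("CPI from cycles:     ", program_cpi(counts, cpi_by_type))
print("CPI from frequencies:", round(program_cpi_from_frequencies(counts, cpi_by_type), 6))
```

Both routes give the same value, which is the point of defining the relative frequencies F_i.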

16 Evaluating Performance
- Use a performance simulator to evaluate the performance of a new feature / algorithm
  - Models the uArch in great detail
  - Run hundreds of representative applications
- Produce the performance S-curve
  - Sort the applications according to the IPC increase
  - Baseline (0) is the processor without the new feature
[Figure: two S-curves. Bad S-curve: negative outliers and positive outliers. Good S-curve: small negative outliers and positive outliers]
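
The S-curve construction described above can be sketched as follows. The application names and IPC values are placeholders, not simulator output; a real study would use hundreds of applications.

```python
def s_curve(baseline_ipc: dict, feature_ipc: dict) -> list:
    """Return (application, IPC change in %) pairs sorted from worst to best.

    A change of 0 means the feature made no difference for that application.
    """
    changes = []
    for app, base in baseline_ipc.items():
        change_pct = (feature_ipc[app] / base - 1.0) * 100.0
        changes.append((app, change_pct))
    return sorted(changes, key=lambda pair: pair[1])

# Placeholder IPC measurements without and with the new feature.
baseline = {"app_a": 1.00, "app_b": 2.00, "app_c": 1.25}
with_feature = {"app_a": 1.10, "app_b": 1.90, "app_c": 1.25}

curve = s_curve(baseline, with_feature)
for app, change in curve:
    print(f"{app}: {change:+.1f}%")
```

Plotting the sorted changes gives the S shape: negative outliers at the left end, positive outliers at the right.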

17 Comparing Performance
- Peak Performance
  - MIPS, MFLOPS
  - Often not useful: unachievable / unsustainable in practice
- Benchmarks
  - Real applications, or representative parts of real apps
  - Targeted at the specific system usages
- SPEC INT: integer applications
  - Data compression, C compiler, Perl interpreter, database system, chess-playing, text-processing, …
- SPEC FP: floating point applications
  - Mostly important scientific applications
- TPC Benchmarks
  - Measure transaction-processing throughput

18 Instruction Set Design
- The ISA is what the user / compiler sees
- The HW implements the ISA
[Figure: the instruction set shown as the layer between software and hardware]

19 ISA Considerations
- Code size
  - Long instructions take more time to fetch
  - Longer instructions require a larger memory
    - Important in small devices, e.g., cell phones
- Number of instructions (IC)
  - Reducing IC reduces execution time
    - At a given CPI and frequency
- Code “simplicity”
  - Simple HW implementation
    - Higher frequency and lower power
  - Code optimization can better be applied to “simple code”

20 Architectural Consideration Example
Immediate data size
- 1% of data values > 16 bits
- 12 to 16 bits needed
[Figure: histogram of immediate data sizes. X axis: immediate data bits (0 to 15); Y axis: percentage (0% to 30%); series: Int. Avg. and FP Avg.]
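
The question behind this slide is how many immediate bits an instruction format needs to cover most constants. Here is a minimal sketch of that kind of measurement, assuming signed two's-complement immediates; the sample values are invented and do not reproduce the slide's data.

```python
def fits_signed_immediate(value: int, bits: int) -> bool:
    """True if `value` is representable as a signed two's-complement field."""
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    return low <= value <= high

def coverage(values: list, bits: int) -> float:
    """Fraction of the given constants that fit in a `bits`-wide immediate."""
    return sum(fits_signed_immediate(v, bits) for v in values) / len(values)

# Invented constants standing in for immediates collected from real programs.
sample = [0, 1, -1, 4, 255, -4096, 30000, 70000]

for width in (8, 12, 16):
    print(f"{width}-bit immediate covers {coverage(sample, width):.0%} of the sample")
```

Values that do not fit must be built with extra instructions or loaded from memory, which is the cost being traded against a shorter immediate field.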

21 CISC Processors
- CISC: Complex Instruction Set Computer
  - The idea: a high-level machine language
  - Example: x86
- Characteristics
  - Many instruction types, with many addressing modes
  - Some of the instructions are complex
    - Execute complex tasks
    - Require many cycles
  - ALU operations directly on memory
    - Only a few registers, in many cases not orthogonal
  - Variable-length instructions
    - Common instructions get short codes, which saves code length

22 Top 10 x86 Instructions

Rank  Instruction               % of total executed
1     load                      22%
2     conditional branch        20%
3     compare                   16%
4     store                     12%
5     add                       8%
6     and                       6%
7     sub                       5%
8     move register-register    4%
9     call                      1%
10    return                    1%
      Total                     96%

Simple instructions dominate instruction frequency

23 CISC Drawbacks
- Complex instructions and complex addressing modes
  - Complicate the processor
  - Slow down the simple, common instructions
  - Contradict Make The Common Case Fast
- Compilers don’t use complex instructions / indexing methods
- Variable-length instructions are a real pain in the neck
  - Difficult to decode several instructions in parallel
    - As long as an instruction is not decoded, its length is unknown
    - It is unknown where the instruction ends
    - It is unknown where the next instruction starts
  - An instruction may span more than one cache line
  - An instruction may span more than one page
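
The decoding problem can be illustrated with a toy encoding. In the sketch below the first byte of every instruction holds its total length; this is a made-up format for illustration only, not x86 encoding, where finding the length takes far more work.

```python
def variable_length_starts(code: bytes) -> list[int]:
    """Find instruction start offsets in the toy variable-length encoding.

    Each start depends on the length of the previous instruction, so the
    scan is inherently sequential.
    """
    starts = []
    offset = 0
    while offset < len(code):
        starts.append(offset)
        length = code[offset]  # toy rule: first byte = total instruction length
        if length == 0:
            raise ValueError("instruction length must be at least 1")
        offset += length
    return starts

def fixed_length_starts(code: bytes, width: int = 4) -> list[int]:
    """With fixed-length instructions every start is known without decoding."""
    return list(range(0, len(code), width))

# Four toy instructions of lengths 2, 3, 1 and 4 bytes.
toy_code = bytes([2, 0xAA, 3, 0xBB, 0xCC, 1, 4, 0x01, 0x02, 0x03])
print("variable-length starts:", variable_length_starts(toy_code))
print("fixed-length starts:   ", fixed_length_starts(bytes(12)))
```

The fixed-length case computes every start independently, which is what lets a decoder work on several instructions at once.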

24 RISC Processors
- RISC: Reduced Instruction Set Computer
  - The idea: simple instructions enable fast hardware
- Characteristics
  - A small instruction set, with only a few instruction formats
  - Simple instructions
    - Execute simple tasks
    - Most of them require a single cycle (with pipeline)
  - A few indexing methods
  - ALU operations on registers only
    - Memory is accessed using Load and Store instructions only
    - Many orthogonal registers
    - Three-address machine: Add dst, src1, src2
  - Fixed-length instructions
- Examples: MIPS™, Sparc™, Alpha™, Power™

25 RISC Processors (Cont.)
- Simple architecture leads to simple micro-architecture
  - Simple, small and fast control logic
  - Simpler to design and validate
  - Room for large on-die caches
  - Shorter time-to-market
- Using a smart compiler
  - Better pipeline usage
  - Better register allocation
- Existing RISC processors are not “pure” RISC
  - e.g., they support division, which takes many cycles

26 Compilers and ISA
- Ease of compilation
  - Orthogonality: no special registers, few special cases, all operand modes available with any data type or instruction type
  - Regularity: no overloading of the meanings of instruction fields
  - Streamlined: resource needs easily determined
- Register assignment is critical too
  - Easier if there are lots of registers

27 CISC Is Dominant
- The x86 architecture, which is a CISC architecture, dominates the processor market
  - A vast amount of existing software
  - Intel, AMD, Microsoft and others benefit from this
    - Intel and AMD invest heavily in making high-performance x86 processors, despite the architectural disadvantage
    - Current x86 processors give the best cost/performance
  - CISC processors use uArch ideas from the RISC world
  - Starting with the Pentium® II and K6, x86 processors translate CISC instructions into RISC-like operations internally
    - The inner core looks much like that of a RISC processor

28 Software Specific Extensions
- Extend the architecture to accelerate execution of specific apps
- Example: SSE™ (Streaming SIMD Extensions)
  - 128-bit packed (vector) / scalar single-precision FP (4×32)
  - Introduced on the Pentium® III in 1999
  - 8 new 128-bit registers (XMM0 to XMM7)
  - Accelerates graphics, video, scientific calculations, …
- Packed (all four 32-bit lanes of the 128-bit registers are added):
    x0     x1     x2     x3
  + y0     y1     y2     y3
  = x0+y0  x1+y1  x2+y2  x3+y3
- Scalar (only the first lane is added; the other lanes pass through):
    x0     x1     x2     x3
  + y0     y1     y2     y3
  = x0+y0  y1     y2     y3
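
The difference between the packed and scalar forms can be modelled with plain lists standing in for the four 32-bit lanes of a 128-bit register. This is a behavioural sketch of the diagram only, with illustrative function names; it does not use real SIMD instructions.

```python
LANES = 4  # four 32-bit single-precision values per 128-bit register

def packed_add(x: list, y: list) -> list:
    """Packed form: every lane is added, conceptually in one instruction."""
    assert len(x) == LANES and len(y) == LANES
    return [a + b for a, b in zip(x, y)]

def scalar_add(x: list, y: list) -> list:
    """Scalar form as drawn on the slide: lane 0 is added, lanes 1..3 come from y."""
    assert len(x) == LANES and len(y) == LANES
    return [x[0] + y[0]] + list(y[1:])

x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
print("packed:", packed_add(x, y))
print("scalar:", scalar_add(x, y))
```

One packed operation replaces four scalar additions, which is how such extensions reduce the instruction count (IC) for vectorizable code.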

