Computer Design – Introduction 1 MAMAS – Computer Architecture 234267 Dr. Lihu Rappoport Some of the slides were taken from Avi Mendelson, Randi Katz,


1 Computer Design – Introduction
MAMAS – Computer Architecture 234267
Dr. Lihu Rappoport
Some of the slides were taken from Avi Mendelson, Randi Katz, Patterson, Gabriel Loh

2 General Course Information
• Grade
  - 20% exercise (mandatory)
  - 80% final exam
• Textbooks
  - Computer Architecture: A Quantitative Approach, Hennessy & Patterson
• Other course information
  - Course web site: http://webcourse.cs.technion.ac.il/234267
  - Foils will be on the web several days before the class

3 Class Focus
• CPU
  - Introduction: performance, instruction set (RISC vs. CISC)
  - Pipeline, hazards
  - Branch prediction
  - Out-of-order execution
• Memory Hierarchy
  - Cache
  - Main memory
  - Virtual memory
• Advanced Topics
• PC Architecture
  - Motherboard & chipset, DRAM, I/O, disk, peripherals

4 Computer System Structure
[Block diagram of a PC system. Labeled components: CPU, cache, CPU bus, North Bridge, memory bus, DDRII channel 1, DDRII channel 2, PCI Express ×16, graphics adapter, South Bridge, PCI, PCI Express ×1, USB controller, SATA controller, IDE controller, hard disk, DVD drive, LAN adapter, LAN, sound card, speakers, I/O controller, parallel port, serial port, floppy drive, keyboard, mouse]

5 Architecture & Microarchitecture
• Architecture: the processor features seen by the “user”
  - Instruction set, addressing modes, data width, …
• Micro-architecture: the way a processor is implemented
  - Cache size and structure, number of execution units, …
  - Timing is considered uArch (though it is user visible)
• Processors with different uArch can support the same architecture

6 Compatibility
• Backward compatibility
  - New hardware can run existing software
      * Core2 Duo can run SW written for Pentium® 4, Pentium® M, Pentium® III, Pentium® II, Pentium®, 486, 386, 286
• Forward compatibility
  - New software can run on existing hardware
  - Example: new software written with SSE2™ runs on an older processor which does not support SSE2™
  - Commonly supports one or two generations behind
• Architecture independent SW
  - JIT (just-in-time) compiler: Java and .NET
  - Binary translation

7 Performance

8 Technology Trends and Performance
• Computing capacity: 4× per 3 years
  - If we could keep all the transistors busy all the time
  - Actual: 3.3× per 3 years
• Moore’s Law: performance is doubled every ~18 months
  - Trend is slowing: process scaling declines, power is up
[Figure: “CPU speed and memory speed grow apart”. Growth-rate labels in the figure: 2× in 3 years, 1.1× in 3 years, 2× in 3 years, 4× in 3 years]
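
The two rates quoted on this slide are consistent with each other: doubling every 18 months compounds to 4× over 3 years. A minimal sketch of that arithmetic follows; the function names are illustrative and not part of the course material.

```python
import math


def growth_factor(doubling_months, elapsed_months):
    """Growth multiple after elapsed_months if a quantity doubles every doubling_months."""
    return 2 ** (elapsed_months / doubling_months)


def doubling_period(factor, elapsed_months):
    """Doubling period (in months) implied by growing by `factor` over elapsed_months."""
    return elapsed_months * math.log(2) / math.log(factor)


# Doubling every 18 months, observed over 3 years (36 months).
print(growth_factor(18, 36))
# Doubling period implied by the "actual" 3.3x per 3 years.
print(round(doubling_period(3.3, 36), 1))
```

The second call shows that the "actual" 3.3× figure corresponds to a doubling period somewhat longer than 18 months.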

9 Moore’s Law
[Graph taken from: http://www.intel.com/technology/mooreslaw/index.htm]

10 CPI – Cycles Per Instruction
• CPUs work according to a clock signal
  - Clock cycle is measured in nsec ($10^{-9}$ of a second)
  - Clock frequency (= 1/clock cycle) is measured in GHz ($10^{9}$ cyc/sec)
• Instruction Count (IC)
  - Total number of instructions executed in the program
• CPI – Cycles Per Instruction
  - Average #cycles per instruction (in a given program)
  - IPC (= 1/CPI): instructions per cycle

$$\mathrm{CPI} = \frac{\#\text{cycles required to execute the program}}{\mathrm{IC}}$$
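
As a small illustration of the definitions above, the sketch below computes CPI and IPC from a cycle count and an instruction count. The numbers are made up for the example, and the function names are hypothetical.

```python
def cpi(total_cycles, instruction_count):
    """Average cycles per instruction: #cycles / IC."""
    if instruction_count <= 0:
        raise ValueError("instruction_count must be positive")
    return total_cycles / instruction_count


def ipc(total_cycles, instruction_count):
    """Instructions per cycle, the reciprocal of CPI."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return instruction_count / total_cycles


# Assumed example: a program that executes 2,000,000 instructions in 3,000,000 cycles.
print(cpi(3_000_000, 2_000_000))  # 1.5
print(ipc(3_000_000, 2_000_000))
```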

11 CPU Time
• CPU Time: time required to execute a program

$$\text{CPU Time} = \mathrm{IC} \times \mathrm{CPI} \times \text{clock cycle}$$

• Our goal: minimize CPU Time
  - Minimize clock cycle: more GHz (process, circuit, uArch)
  - Minimize CPI: uArch (e.g., more execution units)
  - Minimize IC: architecture (e.g., SSE™)
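
The CPU Time formula can be sketched in a few lines. The instruction count, CPI, and clock cycle below are assumed values chosen only to show how each factor scales the result.

```python
def cpu_time_seconds(instruction_count, cpi, clock_cycle_ns):
    """CPU Time = IC x CPI x clock cycle, with the clock cycle given in nanoseconds."""
    return instruction_count * cpi * clock_cycle_ns * 1e-9


# Assumed baseline: 2e9 instructions, CPI of 1.5, 0.5 ns clock cycle (2 GHz).
baseline = cpu_time_seconds(2e9, 1.5, 0.5)

# Halving any one factor halves the CPU time.
lower_cpi = cpu_time_seconds(2e9, 0.75, 0.5)
faster_clock = cpu_time_seconds(2e9, 1.5, 0.25)

print(baseline, lower_cpi, faster_clock)
```

Because the three factors multiply, an improvement in one of them only pays off if it does not inflate the others, for example a more complex instruction that lowers IC but raises CPI.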

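12 Amdahl’s Law
Suppose enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then, with F = Fraction_enhanced and S = Speedup_enhanced:

$$\mathrm{ExTime}_{\mathrm{new}} = \mathrm{ExTime}_{\mathrm{old}} \times \left[ (1 - \mathrm{Fraction}_{\mathrm{enhanced}}) + \frac{\mathrm{Fraction}_{\mathrm{enhanced}}}{\mathrm{Speedup}_{\mathrm{enhanced}}} \right]$$

$$\mathrm{Speedup}_{\mathrm{overall}} = \frac{\mathrm{ExTime}_{\mathrm{old}}}{\mathrm{ExTime}_{\mathrm{new}}} = \frac{1}{(1 - \mathrm{Fraction}_{\mathrm{enhanced}}) + \dfrac{\mathrm{Fraction}_{\mathrm{enhanced}}}{\mathrm{Speedup}_{\mathrm{enhanced}}}}$$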
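
A minimal sketch of Amdahl’s Law as a function; the name `amdahl_speedup` and the argument checks are illustrative choices, not part of the slides.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction F of the task is accelerated by a factor S."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    new_time_ratio = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time_ratio


# 10% of the task made 2x faster (the floating point case used in this course).
print(round(amdahl_speedup(0.1, 2), 3))  # 1.053
```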

13 Amdahl’s Law: Example
Floating point instructions are improved to run at 2×, but only 10% of executed instructions are FP:

$$\mathrm{ExTime}_{\mathrm{new}} = \mathrm{ExTime}_{\mathrm{old}} \times (0.9 + 0.1 / 2) = 0.95 \times \mathrm{ExTime}_{\mathrm{old}}$$

$$\mathrm{Speedup}_{\mathrm{overall}} = \frac{1}{0.95} = 1.053$$

Corollary: Make The Common Case Fast
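
The corollary can be made concrete by letting the FP speedup grow without bound. Using F for the enhanced fraction and S for its speedup (the notation of Amdahl’s Law), the worked limit below shows that even infinitely fast FP hardware gives at most about 1.11× overall when only 10% of the instructions are FP.

```latex
\lim_{S \to \infty} \mathrm{Speedup}_{\mathrm{overall}}
  = \lim_{S \to \infty} \frac{1}{(1 - F) + F / S}
  = \frac{1}{1 - F}
  = \frac{1}{1 - 0.1}
  \approx 1.111
```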

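14 Calculating the CPI of a Program
• $\mathrm{IC}_i$: #times an instruction of type i is executed in the program
• IC: #instructions executed in the program: $\mathrm{IC} = \sum_i \mathrm{IC}_i$
• $F_i$: relative frequency of instructions of type i: $F_i = \mathrm{IC}_i / \mathrm{IC}$
• $\mathrm{CPI}_i$: #cycles to execute an instruction of type i
  - e.g., $\mathrm{CPI}_{\mathrm{add}} = 1$, $\mathrm{CPI}_{\mathrm{mul}} = 3$
• #cycles required to execute the program: $\#\mathrm{cycles} = \sum_i \mathrm{CPI}_i \times \mathrm{IC}_i$
• CPI: $\mathrm{CPI} = \dfrac{\#\mathrm{cycles}}{\mathrm{IC}} = \dfrac{\sum_i \mathrm{CPI}_i \times \mathrm{IC}_i}{\mathrm{IC}} = \sum_i F_i \times \mathrm{CPI}_i$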
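
The sketch below computes a program’s CPI both ways, from total cycles and from the relative frequencies. The instruction mix is invented for the example: the add and mul cycle counts follow the slide, while the load entry and all instruction counts are assumptions.

```python
def program_cpi(mix):
    """CPI = (sum of CPI_i * IC_i) / IC, where mix maps type -> (IC_i, CPI_i)."""
    ic = sum(count for count, _ in mix.values())
    cycles = sum(count * cycles_each for count, cycles_each in mix.values())
    return cycles / ic


def program_cpi_from_frequencies(mix):
    """Same result computed as the sum of F_i * CPI_i, with F_i = IC_i / IC."""
    ic = sum(count for count, _ in mix.values())
    return sum((count / ic) * cycles_each for count, cycles_each in mix.values())


# Assumed instruction mix: type -> (times executed, cycles per instruction).
mix = {
    "add": (600, 1),
    "mul": (200, 3),
    "load": (200, 2),  # assumed value, not from the slide
}

print(program_cpi(mix))  # 1600 cycles / 1000 instructions = 1.6
print(program_cpi_from_frequencies(mix))
```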

15 Comparing Performance
• Peak Performance
  - MIPS, MFLOPS
  - Often not useful: unachievable / unsustainable in practice
• Benchmarks
  - Real applications, or representative parts of real apps
  - Targeted at the specific system usages
• SPEC INT: integer applications
  - Data compression, C compiler, Perl interpreter, database system, chess-playing, text-processing, …
• SPEC FP: floating point applications
  - Mostly important scientific applications
• TPC Benchmarks
  - Measure transaction-processing throughput

16 Instruction Set Design
• The ISA is what the user / compiler see
• The HW implements the ISA
[Diagram: the instruction set as the interface between software and hardware]

17 ISA Considerations
• Code size
  - Long instructions take more time to fetch
  - Longer instructions require a larger memory
      * Important in small devices, e.g., cell phones
• Number of instructions (IC)
  - Reducing IC reduces execution time
      * At a given CPI and frequency
• Code “simplicity”
  - Simple HW implementation
      * Higher frequency and lower power
  - Code optimization can better be applied to “simple code”

18 Architectural Consideration Example
• Displacement Address Size
  - 1% of addresses > 16 bits
  - 12 to 16 bits of displacement needed
[Chart: percentage of displacements (0% to 30%) versus number of address bits (0 to 15), with series Int. Avg. and FP Avg.]

19 CISC Processors
• CISC: Complex Instruction Set Computer
  - The idea: a high level machine language
  - Example: x86
• Characteristic
  - Many instruction types, with many addressing modes
  - Some of the instructions are complex
      * Execute complex tasks
      * Require many cycles
  - ALU operations directly on memory
      * Only a few registers, in many cases not orthogonal
  - Variable length instructions
      * Common instructions get short codes, which saves code length

20 Top 10 x86 Instructions

Rank   Instruction               % of total executed
1      load                      22%
2      conditional branch        20%
3      compare                   16%
4      store                     12%
5      add                       8%
6      and                       6%
7      sub                       5%
8      move register-register    4%
9      call                      1%
10     return                    1%
       Total                     96%

Simple instructions dominate instruction frequency

21 CISC Drawbacks
• Complex instructions and complex addressing modes
  - complicate the processor
  - slow down the simple, common instructions
  - contradict Make The Common Case Fast
• Compilers don’t use complex instructions / indexing methods
• Variable length instructions are a real pain in the neck
  - Difficult to decode several instructions in parallel
      * As long as an instruction is not decoded, its length is unknown
      * It is unknown where the instruction ends
      * It is unknown where the next instruction starts
  - An instruction may span more than a single cache line
  - An instruction may span more than a single page
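
The decoding problem can be sketched with a toy encoding. The format below is hypothetical (it is not x86): the first byte of each instruction holds its total length in bytes. With variable lengths, each instruction start depends on decoding the previous instruction, while with a fixed length every start is known up front and the instructions can be decoded in parallel.

```python
def variable_length_starts(code):
    """Find instruction starts when the first byte of each instruction is its length.

    Hypothetical toy encoding. Each start is known only after the previous
    instruction's length byte has been read, so the scan is inherently serial.
    """
    starts = []
    pc = 0
    while pc < len(code):
        length = code[pc]
        if length == 0:
            raise ValueError("instruction length must be at least 1")
        starts.append(pc)
        pc += length
    return starts


def fixed_length_starts(code, width=4):
    """With fixed-width instructions, all starts follow directly from the width."""
    return list(range(0, len(code), width))


# Toy program with instructions of 1, 3, 2 and 1 bytes.
program = bytes([1, 3, 0, 0, 2, 0, 1])
print(variable_length_starts(program))  # [0, 1, 4, 6]
print(fixed_length_starts(bytes(8)))    # [0, 4]
```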

22 RISC Processors
• RISC: Reduced Instruction Set Computer
  - The idea: simple instructions enable fast hardware
• Characteristic
  - A small instruction set, with only a few instruction formats
  - Simple instructions execute simple tasks
      * Most of them require a single cycle (with pipeline)
  - A few indexing methods
  - ALU operations on registers only
      * Memory is accessed using Load and Store instructions only
      * Many orthogonal registers
      * Three address machine: Add dst, src1, src2
  - Fixed length instructions
• Examples: MIPS™, Sparc™, Alpha™, Power™

23 RISC Processors (Cont.)
• Simple architecture
  - Simple micro-architecture
  - Simple, small and fast control logic
  - Simpler to design and validate
  - Room for large on-die caches
  - Shorter time-to-market
• Using a smart compiler
  - Better pipeline usage
  - Better register allocation
• Existing RISC processors are not “pure” RISC
  - e.g., they support division, which takes many cycles

24 Compilers and ISA
• Ease of compilation
  - Orthogonality: no special registers, few special cases, all operand modes available with any data type or instruction type
  - Regularity: no overloading for the meanings of instruction fields
  - Streamlined: resource needs easily determined
• Register assignment is critical too
  - Easier if there are lots of registers

25 CISC Is Dominant
• The x86 architecture, which is a CISC architecture, dominates the processor market
  - A vast amount of existing software
  - Intel, AMD, Microsoft and others benefit from this
      * Intel and AMD invest a lot of money in making high performance x86 processors, despite the architectural disadvantage
      * Current x86 processors give the best cost/performance
  - CISC processors use µarch ideas from the RISC world
  - Starting with Pentium® II and K6®, x86 processors translate CISC instructions into RISC-like operations internally
      * The inside core looks much like that of a RISC processor
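
To illustrate what translating a CISC instruction into RISC-like operations means, here is a toy sketch. The instruction tuples, the `tmp0` register, and the micro-op names are all hypothetical; real x86 decoders are far more involved and their internal formats are not described in these slides.

```python
def crack(instruction):
    """Split a two-operand CISC-style instruction into RISC-like operations.

    instruction is (op, dst, src); an operand written as "[reg]" is a memory
    operand. The output uses load/store plus three-address register operations.
    """
    op, dst, src = instruction
    if dst.startswith("["):
        # Memory destination: load the value, operate on it, store it back.
        address = dst.strip("[]")
        return [
            ("load", "tmp0", address),
            (op, "tmp0", "tmp0", src),
            ("store", address, "tmp0"),
        ]
    if src.startswith("["):
        # Memory source: load it into a temporary, then operate on registers.
        address = src.strip("[]")
        return [("load", "tmp0", address), (op, dst, dst, "tmp0")]
    # Register-register form maps to a single three-address operation.
    return [(op, dst, dst, src)]


for micro_op in crack(("add", "[ebx]", "eax")):
    print(micro_op)
```

An ALU operation directly on memory becomes a load, a register-only ALU operation, and a store, which is the load/store style listed among the RISC characteristics.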

26 Software Specific Extensions
• Extend the architecture to accelerate execution of specific apps
• Example: SSE™ – Streaming SIMD Extensions
  - 128-bit packed (vector) / scalar single precision FP (4×32)
  - Introduced on Pentium® III in ’99
  - 8 new 128-bit registers (XMM0 – XMM7)
  - Accelerates graphics, video, scientific calculations, …
• Packed: all four 32-bit elements of the 128-bit operands are added
    (x0, x1, x2, x3) + (y0, y1, y2, y3) = (x0+y0, x1+y1, x2+y2, x3+y3)
• Scalar: only the first elements are added; the other elements are copied from the second operand
    (x0, x1, x2, x3) + (y0, y1, y2, y3) = (x0+y0, y1, y2, y3)
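
The difference between the packed and scalar forms can be modeled with plain lists. This is only a behavioral sketch following the slide’s diagram: Python floats are double precision, whereas SSE elements are 32-bit single precision, and the function names are hypothetical.

```python
def packed_add(x, y):
    """Packed add: every one of the four elements is added in one operation."""
    if len(x) != 4 or len(y) != 4:
        raise ValueError("operands must hold four elements (4 x 32 bits)")
    return [a + b for a, b in zip(x, y)]


def scalar_add(x, y):
    """Scalar add: only element 0 is added; elements 1..3 come from y."""
    if len(x) != 4 or len(y) != 4:
        raise ValueError("operands must hold four elements (4 x 32 bits)")
    return [x[0] + y[0]] + list(y[1:])


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
print(packed_add(x, y))  # [11.0, 22.0, 33.0, 44.0]
print(scalar_add(x, y))  # [11.0, 20.0, 30.0, 40.0]
```

One packed operation here replaces four separate additions, which is how such an extension reduces IC.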

