1
Computer Architecture 2011 – Introduction (lec1)
Computer Architecture (“MAMAS”, 234267), Spring 2011
Lecturer: Dan Tsafrir
Reception: Mon 18:30, Taub 611
28/2/2011
Presentation based on slides by David Patterson, Avi Mendelson, Lihu Rappoport, and Adi Yoaz
2
General Info
• Grade
  - 20% exercises (mandatory, binding)
  - 80% final exam
• Textbook
  - “Computer Architecture: A Quantitative Approach” (4th Edition), by Patterson & Hennessy
• Other course information
  - Course web site: http://webcourse.cs.technion.ac.il/234267/Spring2011
  - Lectures will be uploaded to the web a day before the class
3
Computer System Structure
4
Classical Motherboard Diagram
[Figure: block diagram. Labeled components: CPU, cache, CPU bus, North Bridge, memory controller, on-board graphics, IOMMU, memory bus, DDR2 or DDR3 channels 1 and 2, PCI Express 2.0, external graphics card, South Bridge, I/O controller, PCI, PCI Express ×1, USB controller, SATA controller, hard disk, DVD drive, floppy drive, LAN adapter, sound card, speakers, keyboard, mouse, parallel port, serial port]
More to the “north” = closer to the CPU = faster
5
Intel Core2 Duo
Northbridge = MCH = memory controller hub
Southbridge = ICH = I/O controller hub
Notice the bandwidths
6
Intel Core i Series (i3, i5, i7)
For high-end i-Series chips, Northbridge functionality moved onto the processor (=> made faster)
7
Intel - Sandy Bridge
The trend continues
8
Course Focus
• Start from CPU (= processor)
  - Instruction set, performance
  - Pipeline, hazards
  - Branch prediction
  - Out-of-order execution
• Move on to Memory Hierarchy
  - Caching
  - Main memory
  - Virtual memory
• Move on to PC Architecture
  - Motherboard & chipset, DRAM, I/O, disk, peripherals
• End with some Advanced Topics
9
The Processor
10
Architecture vs. Microarchitecture
• Architecture: the processor features as seen by its user = interface
  - Instruction set, number of registers, addressing modes, …
• Microarchitecture: the manner by which the processor is implemented = implementation details
  - Cache size and structure, number of execution units, …
• Note: different processors with different u-archs can support the same arch
  - Example: Intel Pentium-IV vs. Intel Core2 Duo
• We will address both
11
Why Should We Care?
• Abstractions enhance productivity, so:
  - If we know the arch (= interface), why should we care about the u-arch (= internals)?
• Same goes for the arch
  - Just details for a programmer of a high-level language
• Abstractions only work so long as what’s below works
12
Recent Processor Trends
Source: http://www.scidacreview.org/0904/html/multicore.html
13
Well-Known Moore’s Law
Graph taken from: http://www.intel.com/technology/mooreslaw/index.htm
14
The Story in a Nutshell
[Figure: plot of transistors (1000s), clock speed (MHz), power (W), and instructions/cycle (ILP)]
15
Took the Industry by Surprise
16
Dire Implications: Performance
17
Dire Implications: Sales
18
Dire Implications: Sales
19
Dire Implications: Programmers
20
Supercomputing: “Top 500 list”
21
Dire Implications: Supercomputing
22
Processor Performance
23
Metrics: IC, CPI, IPC
• CPUs work according to a clock signal
  - Clock cycle: measured in nanoseconds ($10^{-9}$ of a second)
  - Clock frequency = 1 / |clock cycle|: measured in GHz ($10^{9}$ cycles/sec)
• Instruction Count (IC)
  - Total number of instructions executed in the program
• Cycles Per Instruction (CPI)
  - Average number of cycles per instruction (in a given program):
    $CPI = \frac{\#\text{cycles required to execute the program}}{IC}$
  - IPC (= 1/CPI): instructions per cycle. Can be > 1; see the “Story in a Nutshell” slide
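The definitions above can be tried out with a minimal Python sketch. The function names and the sample numbers (cycle count, instruction count, a 0.5 ns clock cycle) are made up for illustration and do not come from the slides.

```python
def cpi(total_cycles, instruction_count):
    """Average number of cycles per instruction for one program run."""
    return total_cycles / instruction_count


def ipc(total_cycles, instruction_count):
    """Instructions per cycle, the reciprocal of CPI."""
    return instruction_count / total_cycles


def frequency_ghz(clock_cycle_ns):
    """Clock frequency in GHz for a clock cycle given in nanoseconds."""
    return 1.0 / clock_cycle_ns


# Hypothetical run: 2,000,000 instructions executed in 1,000,000 cycles.
print(cpi(1_000_000, 2_000_000))   # 0.5
print(ipc(1_000_000, 2_000_000))   # 2.0, an IPC above 1
print(frequency_ghz(0.5))          # 2.0 (GHz)
```

An IPC above 1 simply means that, on average, more than one instruction completes per clock cycle.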
24
Minimizing Execution Time
• CPU Time: time required to execute a program
  $\text{CPU Time} = IC \times CPI \times \text{clock cycle}$
• Our goal: minimize CPU Time (by reducing any of the components above)
  - Minimize clock cycle: increase GHz (processor design)
  - Minimize CPI: u-arch (e.g., more execution units)
  - Minimize IC: arch + u-arch (e.g., SSE™)
    SSE = Streaming SIMD Extensions (Intel)
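To make the three levers concrete, here is a small Python sketch of CPU Time = IC × CPI × clock cycle. The function name and all numbers are hypothetical; each variant reduces exactly one component by 20% relative to the baseline.

```python
def cpu_time_seconds(instruction_count, cpi, clock_cycle_ns):
    """CPU Time = IC * CPI * clock cycle, with the cycle given in nanoseconds."""
    return instruction_count * cpi * clock_cycle_ns * 1e-9


# Hypothetical baseline: 1e9 instructions, CPI of 1.5, 0.5 ns cycle (2 GHz).
variants = [
    ("baseline", cpu_time_seconds(1_000_000_000, 1.5, 0.5)),
    ("shorter clock cycle", cpu_time_seconds(1_000_000_000, 1.5, 0.4)),
    ("lower CPI", cpu_time_seconds(1_000_000_000, 1.2, 0.5)),
    ("lower IC", cpu_time_seconds(800_000_000, 1.5, 0.5)),
]
for label, seconds in variants:
    print(f"{label}: {seconds:.2f} s")
```

Because the three factors are multiplied, a 20% reduction in any one of them shortens the run by the same 20%, provided the other two stay unchanged. In practice they interact (for example, a simpler instruction set may raise IC while lowering CPI).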
25
Alternative Way to Calculate CPI
• $IC_i$ = number of times an instruction of type $i$ is executed in the program
• $IC$ = number of instructions executed in the program = $\sum_i IC_i$
• $F_i$ = relative frequency of type-$i$ instructions = $IC_i / IC$
• $CPI_i$ = number of cycles to execute a type-$i$ instruction
  - e.g., $CPI_{add} = 1$, $CPI_{mul} = 3$
• Number of cycles required to execute the program: $\#\text{cycles} = \sum_i CPI_i \cdot IC_i$
• CPI: $CPI = \frac{\#\text{cycles}}{IC} = \frac{\sum_i CPI_i \cdot IC_i}{IC} = \sum_i F_i \cdot CPI_i$
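A short Python sketch of the per-type calculation follows. The per-type cycle counts for add and mul are the ones given on the slide; the instruction counts (700 adds, 300 muls) and the function names are invented for illustration.

```python
def weighted_cpi(instruction_mix):
    """CPI from a mapping: instruction type -> (IC_i, CPI_i)."""
    total_ic = sum(ic_i for ic_i, _ in instruction_mix.values())
    total_cycles = sum(ic_i * cpi_i for ic_i, cpi_i in instruction_mix.values())
    return total_cycles / total_ic


def cpi_from_frequencies(instruction_mix):
    """Same CPI, computed as the sum over types of F_i * CPI_i."""
    total_ic = sum(ic_i for ic_i, _ in instruction_mix.values())
    return sum((ic_i / total_ic) * cpi_i for ic_i, cpi_i in instruction_mix.values())


# Hypothetical program: 700 adds (1 cycle each) and 300 muls (3 cycles each).
mix = {"add": (700, 1), "mul": (300, 3)}
print(weighted_cpi(mix))          # 1.6 (1600 cycles / 1000 instructions)
print(cpi_from_frequencies(mix))  # the same value, up to floating-point rounding
```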
26
Measure & Compare Performance
• Use benchmarks & measure how long it takes
  - Mathematical analysis is typically impossible
  - Use real applications (=> no absolute answers)
• Preferably standardized benchmarks (+ input), e.g.,
  - SPEC INT: integer apps (compression, C compiler, Perl, text processing, …)
  - SPEC FP: floating-point apps (mostly scientific)
  - TPC benchmarks: measure transaction throughput (DB)
  - SPEC JBB: models a wholesale company (Java server, DB)
• Sometimes you see FLOPS (“peak” or “sustained”)
  - Supercomputers (top500 list), measured against LINPACK
27
Evaluating Performance
• Use a performance simulator to evaluate the performance of a new feature / algorithm
  - Models the u-arch in great detail
  - Run 100’s of representative applications
• Produce the performance S-curve
  - Sort the applications according to the IPC increase
  - Baseline (0) is the processor without the new feature
[Figure: two S-curves. Bad S-curve: negative outliers and positive outliers. Good S-curve: small negative outliers and positive outliers.]
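The S-curve construction can be sketched in a few lines of Python. This is only an illustration: the application names and IPC values are invented, and a real study would take them from a performance simulator run over hundreds of applications.

```python
def s_curve(baseline_ipc, feature_ipc):
    """Return (application, percent IPC change) pairs, sorted from worst to best."""
    changes = {
        app: 100.0 * (feature_ipc[app] - baseline_ipc[app]) / baseline_ipc[app]
        for app in baseline_ipc
    }
    return sorted(changes.items(), key=lambda item: item[1])


# Hypothetical simulator results: IPC without and with the new feature.
baseline = {"app_a": 1.00, "app_b": 1.20, "app_c": 0.80, "app_d": 1.50}
with_feature = {"app_a": 1.05, "app_b": 1.14, "app_c": 0.96, "app_d": 1.50}

for app, change in s_curve(baseline, with_feature):
    print(f"{app}: {change:+.1f}%")
```

Plotting the sorted changes gives the S shape: negative outliers on the left, applications that are unaffected in the middle, and positive outliers on the right.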
28
Amdahl’s Law
• Suppose we accelerate the computation such that
  - P = proportion of the computation we make faster
  - S = speedup experienced by the proportion we improved
• For example
  - If an improvement can speed up 40% of the computation => P = 0.4
  - If the improvement makes that portion run twice as fast => S = 2
• Then overall speedup = $\frac{1}{(1 - P) + \frac{P}{S}}$
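As a quick check, Amdahl's law in its usual form, overall speedup = 1 / ((1 - P) + P/S), can be evaluated for the slide's example values (P = 0.4, S = 2). The function name below is an assumption made for this sketch.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a proportion p of the work is made s times faster."""
    if not 0.0 <= p <= 1.0 or s <= 0.0:
        raise ValueError("p must be in [0, 1] and s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


# The slide's example: 40% of the computation runs twice as fast.
print(f"{amdahl_speedup(0.4, 2):.3f}")     # 1.250
# Even an enormous S is capped by the untouched 60%: at most 1 / 0.6.
print(f"{amdahl_speedup(0.4, 1e12):.3f}")  # 1.667
```

So doubling the speed of 40% of the computation yields only a 1.25x overall speedup.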
29
Amdahl’s Law - Example
• FP operations improved to run 2x faster
  - S = 2, but…
  - P = 0.1: the improvement only affects 10% of the program
  - Speedup: $\frac{1}{(1 - 0.1) + \frac{0.1}{2}} = \frac{1}{0.95} \approx 1.053$
• Conclusion
  - Better to make the common case fast…
30
Amdahl’s Law – Parallelism
• When parallelizing a program
  - P = proportion of the program that can be made parallel
  - 1 - P = inherently serial
  - N = number of processing elements (say, cores)
  - Speedup: $\frac{1}{(1 - P) + \frac{P}{N}}$
• Serial component imposes a hard limit: as $N \to \infty$, the speedup approaches $\frac{1}{1 - P}$
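The hard limit can be seen numerically with a short Python sketch that uses the parallel form of Amdahl's law, speedup = 1 / ((1 - P) + P/N). The choice of P = 0.9 and the function names are assumptions made for illustration only.

```python
def parallel_speedup(p, n):
    """Speedup on n processing elements when a proportion p is parallelizable."""
    return 1.0 / ((1.0 - p) + p / n)


def speedup_limit(p):
    """Upper bound on the speedup as n grows without bound: 1 / (1 - p)."""
    return 1.0 / (1.0 - p)


# Hypothetical program that is 90% parallelizable.
p = 0.9
for n in (1, 2, 4, 16, 256, 65536):
    print(f"N = {n:>5}: speedup = {parallel_speedup(p, n):.2f}")
print(f"limit: {speedup_limit(p):.2f}")
```

With 10% of the program inherently serial, no number of cores pushes the speedup past roughly 10x.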
31
Instruction Set Design
• The ISA is what the user / compiler see
• The HW implements the ISA
[Figure: the instruction set drawn as the layer between software and hardware]
32
Considerations in ISA Design
• Instruction size
  - Long instructions take more time to fetch from memory
  - Longer instructions require a larger memory
    Important for small (embedded) devices, e.g., cell phones
• Number of instructions (IC)
  - Reduce IC => reduce runtime (at a given CPI & frequency)
• Virtues of instruction simplicity
  - Simpler HW allows for higher frequency & lower power
  - Optimization can be applied better to simpler code
  - Cheaper HW
33
Basing Design Decisions on Workload
[Figure: histogram of the immediate argument’s size in bits. X-axis: immediate data bits, 0 to 15. Y-axis: 0% to 30%. Series: Int. Avg., FP Avg.]
• 1% of data values > 16 bits
  - Having 16 bits is likely good enough
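The kind of measurement behind such a histogram can be sketched in Python. This is an illustration only: it assumes immediates are encoded as signed two's-complement values (the slide does not say how bits are counted), and the sample values are invented rather than taken from a real workload.

```python
def signed_bits_needed(value):
    """Smallest two's-complement width (in bits) that can hold value."""
    if value >= 0:
        return value.bit_length() + 1      # one extra bit for the sign
    return (-value - 1).bit_length() + 1


def fraction_fitting(values, width):
    """Fraction of the immediates that fit in a field of the given width."""
    fitting = sum(1 for v in values if signed_bits_needed(v) <= width)
    return fitting / len(values)


# Hypothetical immediates collected from a compiled program.
immediates = [0, 1, -4, 100, 255, 4096, -32768, 70000]
print(fraction_fitting(immediates, 16))   # 0.875 (only 70000 does not fit)
print(fraction_fitting(immediates, 8))    # 0.5
```

Running this over the immediates of real benchmarks, rather than a made-up list, is what justifies a choice such as a 16-bit immediate field.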
34
CISC Processors
• CISC: Complex Instruction Set Computer
  - Example: x86
  - The idea: a high-level machine language
  - Back when people programmed in assembly, CISC was supposedly easier
• Characteristics
  - Many instruction types, with many addressing modes
  - Some of the instructions are complex
    Execute complex tasks
    Require many cycles
  - ALU operations directly on memory (e.g., arr[j] = arr[i]+n)
    Registers not used (and, accordingly, only a few registers exist)
  - Variable-length instructions
    Common instructions get short codes => save code length
35
But it Turns Out…

Rank  Instruction             % of total executed
1     load                    22%
2     conditional branch      20%
3     compare                 16%
4     store                   12%
5     add                     8%
6     and                     6%
7     sub                     5%
8     move register-register  4%
9     call                    1%
10    return                  1%
      Total                   96%

Simple instructions dominate instruction frequency
36
CISC Drawbacks
• Complex instructions and complex addressing modes
  - complicate the processor
  - slow down the simple, common instructions
  - contradict “Make the Common Case Fast”
• Compilers don’t use complex instructions / indexing methods
• Variable-length instructions are a real pain in the neck
  - Difficult to decode a few instructions in parallel
    As long as an instruction is not decoded, its length is unknown
    It is unknown where the instruction ends
    It is unknown where the next instruction starts
  - An instruction may span more than a single cache line
  - An instruction may span more than a single page
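The decoding problem can be illustrated with a toy Python sketch. The encoding here is purely hypothetical (the first byte of each instruction holds its total length in bytes) and is not how x86 encodes lengths; it only shows why instruction boundaries must be found one after another when lengths vary, while fixed-length boundaries can be computed up front.

```python
def variable_length_starts(code):
    """Find instruction start offsets in the toy variable-length encoding."""
    starts = []
    offset = 0
    while offset < len(code):
        starts.append(offset)
        length = code[offset]      # known only after looking at this instruction
        if length == 0:
            raise ValueError("zero-length instruction")
        offset += length           # only now is the next start known
    return starts


def fixed_length_starts(code, width=4):
    """With fixed-length instructions every start offset is known in advance."""
    return list(range(0, len(code), width))


toy_code = bytes([2, 0xAA, 3, 0xBB, 0xCC, 1, 4, 1, 2, 3])
print(variable_length_starts(toy_code))   # [0, 2, 5, 6]
print(fixed_length_starts(bytes(12)))     # [0, 4, 8]
```

In the variable-length loop each iteration depends on the previous one, which is the serial dependency that makes decoding several instructions in parallel difficult.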
37
RISC Processors
• RISC: Reduced Instruction Set Computer
  - The idea: simple instructions enable fast hardware
• Characteristics
  - A small instruction set, with only a few instruction formats
  - Simple instructions
    Execute simple tasks
    Most of them require a single cycle (with pipeline)
  - A few indexing methods
  - ALU operations on registers only
    Memory is accessed using Load and Store instructions only
    Many orthogonal registers
    Three-address machine: Add dst, src1, src2
  - Fixed-length instructions
• Examples: MIPS™, Sparc™, Alpha™, Power™
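To make the load/store idea concrete, here is a toy Python interpreter for a made-up three-address machine. It is not MIPS or any real ISA: the opcodes, register names, and memory layout are all assumptions. It executes the statement arr[j] = arr[i] + n as a load, a register-only add, and a store.

```python
def run(program, memory, registers):
    """Execute (opcode, a, b, c) tuples in order on a toy load/store machine."""
    for opcode, a, b, c in program:
        if opcode == "load":     # a <- memory[registers[b] + c]
            registers[a] = memory[registers[b] + c]
        elif opcode == "store":  # memory[registers[b] + c] <- a
            memory[registers[b] + c] = registers[a]
        elif opcode == "add":    # a <- b + c, registers only
            registers[a] = registers[b] + registers[c]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return memory


# arr lives at address 0; r1 holds i, r2 holds j, r3 holds n.
memory = [10, 20, 30, 40]
registers = {"r1": 1, "r2": 3, "r3": 5}
program = [
    ("load", "r4", "r1", 0),      # r4 = arr[i]
    ("add", "r5", "r4", "r3"),    # r5 = r4 + n
    ("store", "r5", "r2", 0),     # arr[j] = r5
]
print(run(program, memory, registers))   # [10, 20, 30, 25]
```

A CISC machine could express the same statement with ALU instructions that operate on memory directly; here only load and store touch memory, and the add works on registers alone.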
38
RISC Processors (Cont.)
• Simple arch => simple u-arch
  - Room for larger on-die caches
  - Smaller => faster
  - Easier to design & validate (=> cheaper to manufacture)
  - Shorter time-to-market
  - More general-purpose registers (=> fewer memory refs)
• Compiler can be smarter
  - Better pipeline usage
  - Better register allocation
• Existing RISC processors are not “pure” RISC
  - e.g., they support division, which takes many cycles
39
Compilers and ISA
• Ease of compilation
  - Orthogonality: no special registers, few special cases, all operand modes available with any data type or instruction type
  - Regularity: no overloading for the meanings of instruction fields
  - Streamlined: resource needs easily determined
• Register assignment is critical too
  - Easier if there are lots of registers
40
Still, CISC Is Dominant
• x86 (CISC) dominates the processor market
• Legacy
  - A vast amount of existing software
  - Intel, AMD, Microsoft benefit
  - But they put a lot of money into compensating for the disadvantages
• CISC arch internally emulates RISC
  - Starting at Pentium II and K6, x86 processors translate CISC instructions into RISC-like operations internally
  - The inside of the core looks much like that of a RISC processor
41
Software Specific Extensions
• Extend arch to accelerate execution of specific apps
• Example: SSE™ (Streaming SIMD Extensions)
  - 128-bit packed (vector) / scalar single-precision FP (4×32)
  - Introduced on Pentium® III in ’99
  - 8 new 128-bit registers (XMM0 – XMM7)
  - Accelerates graphics, video, scientific calculations, …
• Packed (128 bits): [x0, x1, x2, x3] + [y0, y1, y2, y3] = [x0+y0, x1+y1, x2+y2, x3+y3]
• Scalar (128 bits): [x0, x1, x2, x3] + [y0, y1, y2, y3] = [x0+y0, y1, y2, y3]
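The difference between the packed and scalar forms can be modeled in plain Python, treating a 128-bit register as a list of four single-precision lanes. This is only a behavioral sketch with made-up function names, not SSE code; which operand supplies the untouched lanes in the scalar case follows the slide's figure.

```python
LANES = 4  # four 32-bit floats fill one 128-bit register


def add_packed(x, y):
    """Packed add: all four lanes are added in one operation."""
    assert len(x) == LANES and len(y) == LANES
    return [a + b for a, b in zip(x, y)]


def add_scalar(x, y):
    """Scalar add: only lane 0 is added; lanes 1 to 3 are passed through from y."""
    assert len(x) == LANES and len(y) == LANES
    return [x[0] + y[0]] + list(y[1:])


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
print(add_packed(x, y))   # [11.0, 22.0, 33.0, 44.0]
print(add_scalar(x, y))   # [11.0, 20.0, 30.0, 40.0]
```

The packed form is where the speedup comes from: one instruction does the work of four scalar additions.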
42
BACKUP
43
Compatibility
• Backward compatibility (HW responsibility)
  - When buying new hardware, it can run existing software: an i5 can run SW written for Core2 Duo, Pentium 4, Pentium M, Pentium III, Pentium II, Pentium, 486, 386, 286
• Forward compatibility (SW responsibility)
  - For example: MS Word 2003 can open an MS Word 2010 doc
  - Commonly supports one or two generations behind
• Architecture-independent SW
  - Run SW on top of a VM that does JIT (just-in-time) compilation: JVM for Java and CLR for .NET
  - Interpreted languages: Perl, Python