Computer Architecture 2015 – Introduction 1 Computer Architecture (“MAMAS”, 234267) Spring 2015 Lecturer: Yoav Etsion Reception: Mon 15:00, Fishbach 306-8.


1 Computer Architecture 2015 – Introduction 1 Computer Architecture (“MAMAS”, 234267) Spring 2015
Lecturer: Yoav Etsion
Reception: Mon 15:00, Fishbach 306-8
TAs: Tomer Gurevich, Franck Sala, Andrey Zhitnikov
Presentation based on slides by David Patterson, Avi Mendelson, Lihu Rappoport, Adi Yoaz and Dan Tsafrir

2 Computer Architecture 2015 – Introduction 2 General Info
• Grade
  - 25% Exercises (mandatory, binding)
  - 75% Final exam
• Textbook
  - “Computer Architecture: A Quantitative Approach” (4th Edition) by Hennessy & Patterson
• Other course information
  - Course web site: http://webcourse.cs.technion.ac.il/234267/Spring2013
  - Lectures will be uploaded to the web a day before the class

3 Computer Architecture 2015 – Introduction 3 Assignments
• Five mandatory assignments during the semester
  - Only programming assignments
• NO CHEATING!
  - Suspected parties will be sent to a Technion trial
  - Typical outcome: course disqualified, which means your graduation will be postponed by at least one semester
• Possible examples of cheating (a few of many):
  - Copying any part of the assignments from another student or using a reference from a previous year
  - Letting someone else copy from you
  - Posting code/solutions to a shared forum
  - We can track the code back to the posters!
• Be honest and work by yourselves!

4 Computer Architecture 2015 – Introduction 4 The Computer System

5 Computer Architecture 2015 – Introduction 5 Classical System Diagram
[Figure: block diagram of a classical PC system. Labeled components: CPU, cache, CPU bus, North Bridge (memory controller, on-board graphics, IOMMU), memory bus, DDR2 or DDR3 channels 1 and 2, external graphics card (PCI express 2.0), South Bridge (IO controller), PCI, PCI express ×1, USB controller, SATA controller, hard disk, DVD drive, floppy drive, LAN adapter (LAN), sound card (speakers), parallel port, serial port, keyboard, mouse]
More to the “north” = closer to the CPU = faster

6 Computer Architecture 2015 – Introduction 6 Intel Nehalem (Core i3, i5, i7)
• For high-end i-Series chips, Northbridge functionality moved onto the processor (=> made faster)
• 45 to 32 nm

7 Computer Architecture 2015 – Introduction 7 Intel Sandy Bridge (Core i3, i5, i7)
• The trend continues
• 32 to 22 nm

8 Computer Architecture 2015 – Introduction 8 Computer and System Architecture
• The physical design of the computer system is commonly known as Computer Architecture
• The definition of computer architecture (Prof. Christos Kozyrakis, Stanford):
  “Computer theorists invent algorithms that solve important problems and analyze their asymptotic behavior (e.g. $O(N \log N)$ or $O(N^3)$). Computer architects set the constant factors for these algorithms…”

9 Computer Architecture 2015 – Introduction 9 Computer and System Architecture
• Axiom: The ultimate goal is performance (ops/sec)
• Constraints:
  - Power, number of transistors, bandwidth, …
• Practical goal: highest performance in light of the constraints
• Today, the most common goal is Performance/Watt
  - e.g. ops/Joule

10 Computer Architecture 2015 – Introduction 10 Course Focus
• Start from the CPU (= processor)
  - Instruction set, performance
  - Pipeline, hazards
  - Branch prediction
  - Out-of-order execution
• Move on to the Memory Hierarchy
  - Caching and main memory
  - Virtual memory
• Finish with latency tolerance and parallelism
  - Hardware multithreading

11 Computer Architecture 2015 – Introduction 11 The Processor

12 Computer Architecture 2015 – Introduction 12 Architecture vs. Microarchitecture
• Architecture = the processor features as seen by its user = interface
  - Instruction set, number of registers, addressing modes, …
• Microarchitecture = the manner by which the processor is implemented = implementation details
  - Cache size and structure, number of execution units, …
• Note: different processors with different u-archs can support the same arch
  - Intel Pentium-IV vs. Intel Core2 Duo
  - ARMv7 implemented by Qualcomm, TI, Samsung

13 Computer Architecture 2015 – Introduction 13 Why Should We Care?
• Abstractions enhance productivity, so:
  - If we know the arch (= interface),
  - Why should we care about the u-arch (= internals)?
• Same goes for the arch
  - Just details for a programmer of a high-level language
• Abstractions only work so long as what’s below works
  - The taxi story: http://vimeo.com/11478146 (4:50-6:00)

14 Computer Architecture 2015 – Introduction 14 Recent Processor Trends
Source: http://www.scidacreview.org/0904/html/multicore.html

15 Computer Architecture 2015 – Introduction 15 Well-Known Moore’s Law Graph taken from: http://www.intel.com/technology/mooreslaw/index.htm

16 Computer Architecture 2015 – Introduction 16

17 Computer Architecture 2015 – Introduction 17 The Story in a Nutshell
[Figure. Plotted series: transistors (1000s), clock speed (MHz), power (W), instructions/cycle (ILP)]

18 Computer Architecture 2015 – Introduction 18 Took the Industry by Surprise

19 Computer Architecture 2015 – Introduction 19 Dire Implications: Performance

20 Computer Architecture 2015 – Introduction 20 Dire Implications: Sales

21 Computer Architecture 2015 – Introduction 21 Dire Implications: Sales

22 Computer Architecture 2015 – Introduction 22 Dire Implications: Programmers

23 Computer Architecture 2015 – Introduction 23 Supercomputing: “Top 500 list”

24 Computer Architecture 2015 – Introduction 24 Dire Implications: Supercomputing

25 Computer Architecture 2015 – Introduction 25 Processor Performance

26 Computer Architecture 2015 – Introduction 26 Metrics: IC, CPI, IPC
• CPUs work according to a clock signal
  - Clock cycle: measured in nanoseconds ($10^{-9}$ of a second)
  - Clock frequency = 1 / clock cycle: measured in GHz ($10^{9}$ cycles/sec)
• Instruction Count (IC)
  - Total number of instructions executed in the program
• Cycles Per Instruction (CPI)
  - Average #cycles per instruction (in a given program):
    $\mathrm{CPI} = \dfrac{\#\text{cycles required to execute the program}}{\mathrm{IC}}$
  - IPC (= 1/CPI): instructions per cycle. Can be > 1; see the “story in a nutshell” slide
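A minimal Python sketch of the metrics on this slide. The function names are hypothetical, and the cycle count, instruction count, and 2 GHz clock are made-up numbers chosen only for illustration.

```python
def clock_cycle_ns(frequency_ghz):
    """Clock cycle in nanoseconds for a clock frequency given in GHz."""
    return 1.0 / frequency_ghz


def cpi(total_cycles, instruction_count):
    """Average cycles per instruction for one program run."""
    return total_cycles / instruction_count


def ipc(total_cycles, instruction_count):
    """Instructions per cycle, the reciprocal of CPI."""
    return instruction_count / total_cycles


# Illustrative numbers only (not measurements).
cycles, ic = 1_500_000, 2_000_000
print("cycle time [ns]:", clock_cycle_ns(2.0))
print("CPI:", cpi(cycles, ic))   # below 1 exactly when IPC is above 1
print("IPC:", ipc(cycles, ic))
```

With more instructions than cycles, CPI drops below 1 and IPC rises above 1, which is the situation the slide alludes to.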

27 Computer Architecture 2015 – Introduction 27 Minimizing Execution Time
• CPU Time = time required to execute a program
  $\text{CPU Time} = \mathrm{IC} \times \mathrm{CPI} \times \text{clock cycle}$
• Our goal: minimize CPU Time (via any of the above components)
  - Minimize clock cycle: increase GHz (processor design)
  - Minimize CPI: u-arch (e.g.: more execution units)
  - Minimize IC: arch + u-arch (e.g.: SSE™)
    SSE = Streaming SIMD Extensions (Intel)
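The product form of CPU Time can be sketched in a few lines of Python. Everything here is illustrative: the function name is hypothetical and the baseline numbers (2 billion instructions, CPI of 1.0, 0.5 ns cycle) are assumptions, not figures from the course.

```python
def cpu_time_seconds(instruction_count, cpi, clock_cycle_ns):
    """CPU Time = IC x CPI x clock cycle, with the cycle given in nanoseconds."""
    return instruction_count * cpi * clock_cycle_ns * 1e-9


# Hypothetical baseline and three ways to shrink one factor by 20%.
baseline = cpu_time_seconds(2_000_000_000, 1.0, 0.5)
shorter_cycle = cpu_time_seconds(2_000_000_000, 1.0, 0.4)
lower_cpi = cpu_time_seconds(2_000_000_000, 0.8, 0.5)
fewer_instructions = cpu_time_seconds(1_600_000_000, 1.0, 0.5)

for name, t in [("baseline", baseline), ("shorter cycle", shorter_cycle),
                ("lower CPI", lower_cpi), ("fewer instructions", fewer_instructions)]:
    print(f"{name}: {t:.3f} s")
```

Because the three factors multiply, a 20% reduction in any one of them gives the same reduction in CPU time, provided the other two stay fixed. In practice they rarely do: for example, a shorter cycle can raise CPI.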

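28 Computer Architecture 2015 – Introduction 28 Alternative Way to Calculate CPI
• $\mathrm{IC}_i$ = #times an instruction of type $i$ is executed in the program
• IC = #instructions executed in the program: $\mathrm{IC} = \sum_i \mathrm{IC}_i$
• $F_i$ = relative frequency of type-$i$ instructions: $F_i = \mathrm{IC}_i / \mathrm{IC}$
• $\mathrm{CPI}_i$ = #cycles to execute a type-$i$ instruction
  - e.g.: $\mathrm{CPI}_{add} = 1$, $\mathrm{CPI}_{mul} = 3$
• #cycles required to execute the program: $\#\text{cycles} = \sum_i \mathrm{CPI}_i \times \mathrm{IC}_i$
• CPI: $\mathrm{CPI} = \dfrac{\#\text{cycles}}{\mathrm{IC}} = \dfrac{\sum_i \mathrm{CPI}_i \times \mathrm{IC}_i}{\mathrm{IC}} = \sum_i F_i \times \mathrm{CPI}_i$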
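A short Python sketch of the per-type calculation, reusing the slide's example costs (add = 1 cycle, mul = 3 cycles). The instruction mix of 700 adds and 300 muls is an invented example, and the function name is hypothetical.

```python
def weighted_cpi(counts, cycles_per_type):
    """CPI as the frequency-weighted average of the per-type cycle counts."""
    ic = sum(counts.values())
    return sum((counts[t] / ic) * cycles_per_type[t] for t in counts)


counts = {"add": 700, "mul": 300}        # IC_i, made-up instruction mix
cycles_per_type = {"add": 1, "mul": 3}   # CPI_i, as in the slide's example

# Total cycles computed directly, for comparison with the weighted form.
total_cycles = sum(counts[t] * cycles_per_type[t] for t in counts)
ic = sum(counts.values())

print("CPI via frequencies:", weighted_cpi(counts, cycles_per_type))
print("CPI via total cycles:", total_cycles / ic)
```

Both routes give the same value, since dividing the cycle sum by IC is the same as weighting each $\mathrm{CPI}_i$ by $F_i$.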

29 Computer Architecture 2015 – Introduction 29 Performance Evaluation: How?
• No simple answer
• Performance depends on
  - Application
  - Input
• Mathematical analysis
  - Typically impossible
  - Systems are too complex to model accurately
• So how do we evaluate systems?
  - Empirical analysis

30 Computer Architecture 2015 – Introduction 30 Benchmarks
• Use benchmarks & measure how long it takes
  - Use real applications (=> no absolute answers)
• Preferably standardized benchmarks (+ input), e.g.,
  - SPEC INT: integer apps (compression, C compiler, Perl, text-processing, …)
  - SPEC FP: floating point apps (mostly scientific)
  - TPC benchmarks: measure transaction throughput (DB)
  - SPEC JBB: models a wholesale company (Java server, DB)
• Sometimes you see FLOPS (“peak” or “sustained”)
  - Supercomputers (top500 list), measured against LINPACK

31 Computer Architecture 2015 – Introduction 31 Evaluating Performance
• Use a performance simulator to evaluate the performance of a new feature / algorithm
  - Models the u-arch in great detail
  - Run 100’s of representative applications
• Produce the performance S-curve
  - Sort the applications according to the IPC increase
  - Baseline (0%) is the processor without the new feature
[Figure: two S-curves. Bad S-curve: negative outliers and positive outliers. Good S-curve: small negative outliers and positive outliers]
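The S-curve construction can be sketched as follows, assuming per-application IPC values have already been produced by a simulator. The application names and IPC numbers are invented for illustration, and `s_curve` is a hypothetical helper name.

```python
def s_curve(baseline_ipc, feature_ipc):
    """Return (app, percent IPC change) pairs sorted from worst to best."""
    changes = {
        app: 100.0 * (feature_ipc[app] - baseline_ipc[app]) / baseline_ipc[app]
        for app in baseline_ipc
    }
    return sorted(changes.items(), key=lambda item: item[1])


# Made-up simulator results: IPC without and with the new feature.
baseline_ipc = {"app_a": 1.0, "app_b": 2.0, "app_c": 1.25, "app_d": 0.8}
feature_ipc = {"app_a": 1.1, "app_b": 1.9, "app_c": 1.25, "app_d": 1.0}

curve = s_curve(baseline_ipc, feature_ipc)
for app, change in curve:
    print(f"{app}: {change:+.1f}%")
```

Plotting the sorted changes against application rank gives the S shape: the losers sit at the left end, the winners at the right, and the 0% line marks the baseline processor.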

32 Computer Architecture 2015 – Introduction 32 Amdahl’s Law
• Suppose we accelerate the computation such that
  - P = portion of the computation we make faster
  - S = speedup experienced by the portion we improved
• For example
  - If an improvement can speed up 40% of the computation => P = 0.4
  - If the improvement makes that portion run twice as fast => S = 2
• Then overall speedup = $\dfrac{1}{(1 - P) + \dfrac{P}{S}}$
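A minimal Python sketch of Amdahl's Law, evaluated on the slide's example values P = 0.4 and S = 2. The function name is hypothetical.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the runtime is sped up by a factor s."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    if s <= 0.0:
        raise ValueError("s must be positive")
    # The untouched part (1 - p) keeps its time; the improved part shrinks to p / s.
    return 1.0 / ((1.0 - p) + p / s)


# Slide example: 40% of the computation runs twice as fast.
print(amdahl_speedup(0.4, 2.0))   # mathematically 1 / (0.6 + 0.2) = 1.25
```

Doubling the speed of 40% of the work yields only about a 1.25x overall gain, because the remaining 60% still takes as long as before.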

33 Computer Architecture 2015 – Introduction 33 Amdahl’s Law - Example
• FP operations improved to run 2x faster
  - S = 2, but…
  - P = 0.1: the improvement affects only 10% of the program
  - Speedup: $\dfrac{1}{(1 - 0.1) + \dfrac{0.1}{2}} = \dfrac{1}{0.95} \approx 1.053$
• Conclusion
  - Better to make the common case fast…

34 Computer Architecture 2015 – Introduction 34 Amdahl’s Law – Parallelism
• When parallelizing a program
  - P = proportion of the program that can be made parallel
  - 1 - P = inherently serial
  - N = number of processing elements (say, cores)
  - Speedup: $\dfrac{1}{(1 - P) + \dfrac{P}{N}}$
• The serial component imposes a hard limit: as $N \to \infty$, the speedup approaches $\dfrac{1}{1 - P}$
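The hard limit imposed by the serial component can be seen numerically. This is a sketch under the assumption that 90% of the program is parallelizable (P = 0.9, an arbitrary illustrative value); the function names are hypothetical.

```python
def parallel_speedup(p, n):
    """Amdahl speedup when a fraction p of the program runs on n processing elements."""
    return 1.0 / ((1.0 - p) + p / n)


def speedup_limit(p):
    """Upper bound on the speedup as n grows without bound (requires p < 1)."""
    return 1.0 / (1.0 - p)


p = 0.9  # assumed parallel fraction, for illustration only
for n in (2, 8, 64, 1024):
    print(f"N = {n:4d}: speedup = {parallel_speedup(p, n):.2f}")
print(f"limit as N grows: {speedup_limit(p):.2f}")
```

Even with over a thousand cores the speedup stays below 1 / (1 - P), which is about 10 for this choice of P.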

35 Computer Architecture 2015 – Introduction 35 Instruction Set Design
[Figure: the instruction set as the layer between software and hardware]
• The ISA is what the user & compiler see
• The HW implements the ISA

36 Computer Architecture 2015 – Introduction 36 Considerations in ISA Design
• Instruction size
  - Long instructions take more time to fetch from memory
  - Longer instructions require a larger memory (important for small, embedded devices)
• Number of instructions (IC)
  - Reduce IC => reduce runtime (at a given CPI & frequency)
• Virtues of instruction simplicity
  - Simpler HW allows for higher frequency & lower power
  - Optimization can be applied better to simpler code
  - Cheaper HW

37 Computer Architecture 2015 – Introduction 37 Basing Design Decisions on Workload
[Figure: histogram of the immediate argument’s size in bits. X axis: immediate data bits (0 to 15); Y axis: 0% to 30%; series: Int. Avg., FP Avg.]
• 1% of data values need more than 16 bits
• Having 16 bits is likely good enough
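The kind of workload measurement behind such a histogram can be sketched in Python. This assumes signed two's-complement immediates, and the sample values are invented; a real study would extract the immediates from compiled benchmark code. Both function names are hypothetical.

```python
def bits_needed(value):
    """Bits needed to hold a signed immediate in two's complement."""
    if value >= 0:
        return value.bit_length() + 1      # one extra bit for the sign
    return (-value - 1).bit_length() + 1


def fraction_fitting(immediates, width):
    """Fraction of the immediates that fit in a field of the given width."""
    fitting = sum(1 for v in immediates if bits_needed(v) <= width)
    return fitting / len(immediates)


# Invented sample of immediate operands (not taken from any benchmark).
sample = [0, 1, -1, 4, 255, -128, 32767, -32768, 70000, 12]
print("fit in 16 bits:", fraction_fitting(sample, 16))
print("fit in 8 bits:", fraction_fitting(sample, 8))
```

Running the same count over a real instruction trace gives the data needed to pick the immediate field width.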

38 Computer Architecture 2015 – Introduction 38 CISC Processors
• CISC - Complex Instruction Set Computer
  - Example: x86
  - The idea: a high-level machine language
    When people programmed in assembly, CISC was supposedly easier
• Characteristics
  - Many instruction types, with many addressing modes
  - Some of the instructions are complex: they execute complex tasks and require many cycles
  - ALU operations directly on memory (e.g., arr[j] = arr[i]+n)
    Registers not used (and, accordingly, only a few registers exist)
  - Variable-length instructions: common instructions get short codes => saves code length

39 Computer Architecture 2015 – Introduction 39 But it Turns Out…
Rank  Instruction              % of total executed
1     load                     22%
2     conditional branch       20%
3     compare                  16%
4     store                    12%
5     add                      8%
6     and                      6%
7     sub                      5%
8     move register-register   4%
9     call                     1%
10    return                   1%
      Total                    96%
Simple instructions dominate instruction frequency

40 Computer Architecture 2015 – Introduction 40 CISC Drawbacks
• Complex instructions and complex addressing modes
  => complicate the processor
  => slow down the simple, common instructions
  => contradict “Make The Common Case Fast”
• Compilers don’t use complex instructions / indexing methods
• Variable-length instructions are a real pain in the neck
  - Difficult to decode several instructions in parallel
    As long as an instruction is not decoded, its length is unknown
    => It is unknown where the instruction ends
    => It is unknown where the next instruction starts
  - An instruction may be longer than a cache line
    Or even longer than a page (in theory)
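The decoding problem with variable-length instructions can be illustrated with a toy model. The encoding below is hypothetical and is NOT x86: it simply assumes that the first byte of each instruction holds the instruction's total length in bytes. The fixed-width case assumes 4-byte instructions.

```python
def find_boundaries_variable(code):
    """Instruction start offsets in the toy encoding (byte 0 = instruction length)."""
    starts, pc = [], 0
    while pc < len(code):
        length = code[pc]
        if length == 0:
            raise ValueError("invalid zero-length instruction")
        starts.append(pc)
        # The next start is unknown until this instruction has been examined,
        # so the loop is inherently sequential.
        pc += length
    return starts


def find_boundaries_fixed(code_len, width=4):
    """With fixed-width instructions every start offset is known up front."""
    return list(range(0, code_len, width))


toy_code = bytes([2, 0xAA, 1, 3, 0xBB, 0xCC, 2, 0xDD])
print(find_boundaries_variable(toy_code))
print(find_boundaries_fixed(len(toy_code)))
```

In the variable-length loop each iteration depends on the previous one, whereas the fixed-width offsets can all be computed independently, which is what makes decoding several instructions in parallel straightforward.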

41 Computer Architecture 2015 – Introduction 41 RISC Processors
• RISC - Reduced Instruction Set Computer
  - The idea: simple instructions enable fast hardware
• Characteristics
  - A small instruction set, with only a few instruction formats
  - A few indexing methods
  - Load/Store machine: operates only on registers
    Memory is accessed using Load and Store instructions only
    Many orthogonal registers
    Three-address machine: Add dst, src1, src2
  - Fixed-length instructions
• Examples: ARM™, MIPS™, Sparc™, Alpha™, Power™

42 Computer Architecture 2015 – Introduction 42 RISC Processors (Cont.)
• Simple arch => simple u-arch
  - Smaller => faster
  - Easier to design & validate (=> cheaper to manufacture)
  - Shorter time-to-market
  - More general-purpose registers (=> fewer memory refs)
• Compiler can be smarter
  - Better pipeline usage
  - Better register allocation
• Existing RISC processors are not “pure” RISC
  - Various complex operations were added along the way

43 Computer Architecture 2015 – Introduction 43 Compilers and ISA
• Ease of compilation
  - Orthogonality:
    no special registers
    few special cases
    all operand modes available with any data type or instruction type
  - Regularity: no overloading for the meanings of instruction fields
  - Streamlined: resource needs are easily determined
• Register assignment is critical too
  - Easier if there are lots of registers

44 Computer Architecture 2015 – Introduction 44 CISC or RISC?
• ARM (RISC) dominates the personal-device (mobile) market
  - Not necessarily because it is RISC…
• x86 (CISC) dominates the PC and server market
  - Not necessarily because it is CISC…
• Intel processors have a RISC u-arch
  - CISC instructions are broken down into RISC micro-ops
• Pro CISC:
  - Programs are more compact (fewer instructions)
  - Dynamic optimizations
• Pro RISC:
  - Simpler and faster u-arch

45 Computer Architecture 2015 – Introduction 45 ISA Extensions
• Extend the arch to accelerate execution of specific apps
• Example: SSE™ – Streaming SIMD Extensions
  - 128-bit packed (vector) / scalar single-precision FP (4×32)
  - Introduced on the Pentium® III in 1999
  - 8 new 128-bit registers (XMM0 – XMM7)
  - Accelerates graphics, video, scientific calculations, …
[Figure: packed add over two 128-bit registers. (x0, x1, x2, x3) + (y0, y1, y2, y3) = (x0+y0, x1+y1, x2+y2, x3+y3)]
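The packed add on this slide can be emulated in plain Python to show its semantics. This is only a model of the lane-wise behavior, not real SIMD code: it assumes four single-precision lanes per 128-bit register, and `packed_add_ps` is a hypothetical name loosely echoing the packed-single add operation.

```python
import struct


def to_float32(x):
    """Round a Python float to single precision, as stored in one 32-bit lane."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def packed_add_ps(x, y):
    """Emulate one packed single-precision add over four 32-bit lanes."""
    if len(x) != 4 or len(y) != 4:
        raise ValueError("a 128-bit register holds exactly four 32-bit lanes")
    # Each lane is added independently; hardware performs all four adds
    # with a single instruction.
    return [to_float32(to_float32(a) + to_float32(b)) for a, b in zip(x, y)]


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
print(packed_add_ps(x, y))
```

One packed instruction replaces four scalar adds here, which is how such extensions reduce the instruction count (IC) for data-parallel code.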

46 Computer Architecture 2015 – Introduction 46 BACKUP

47 Computer Architecture 2015 – Introduction 47 Compatibility
• Backward compatibility (HW responsibility)
  - When buying new hardware, it can run existing software: an i5 can run SW written for Core2 Duo, Pentium® 4, Pentium® M, Pentium® III, Pentium® II, Pentium®, 486, 386, 286
• Forward compatibility (SW responsibility)
  - For example: MS Word 2003 can open an MS Word 2010 doc
  - Commonly supports one or two generations behind
• Architecture-independent SW
  - Run SW on top of a VM that does JIT (just-in-time) compilation: JVM for Java and CLR for .NET
  - Interpreted languages: Perl, Python

