Computer Architecture 2012 – Introduction (lec1) 1 Computer Architecture (“MAMAS”, 234267) Spring 2012 Lecturer: Dan Tsafrir Reception: Mon 18:30, Taub.

1 Introduction (lecture 1), /3/2012
Presentation based on slides by David Patterson, Avi Mendelson, Lihu Rappoport, and Adi Yoaz

2 General Info
- Grade
  - 20% exercises (mandatory; binding)
  - 80% final exam
- Textbook
  - "Computer Architecture: A Quantitative Approach" (4th Edition) by Hennessy & Patterson
- Other course information
  - Course web site:
  - Lectures will be uploaded to the web a day before the class

3 Computer System Structure

4 Classical Motherboard Diagram
[Figure: classical motherboard diagram. Components shown: CPU (with cache), CPU bus, North Bridge (memory controller, IOMMU, on-board graphics), memory bus with DDR2 or DDR3 channels 1 and 2, external graphics card on PCI Express 2.0, South Bridge (IO controller; PCI, PCI Express ×1, USB and SATA controllers), LAN adapter, sound card and speakers, hard disk, DVD drive, parallel and serial ports, floppy drive, keyboard, mouse.]
More to the "north" = closer to the CPU = faster

5 Intel Core 2
- Northbridge = MCH = memory controller hub
- Southbridge = ICH = I/O controller hub
- Notice the bandwidths
- 65 to 45 nm

6 Intel Nehalem (Core i3, i5, i7)
- For high-end i-Series chips, Northbridge functionality moved onto the processor (=> made faster)
- 45 to 32 nm

7 Intel Sandy Bridge (Core i3, i5, i7)
- The trend continues
- 32 to 22 nm

9 Course Focus
- Start from the CPU (= processor)
  - Instruction set, performance
  - Pipeline, hazards
  - Branch prediction
  - Out-of-order execution
- Move on to the memory hierarchy
  - Caching
  - Main memory
  - Virtual memory
- Move on to PC architecture
  - Motherboard & chipset, DRAM, I/O, disk, peripherals
- End with some advanced topics

10 The Processor

11 Architecture vs. Microarchitecture
- Architecture = the processor features as seen by its user = interface
  - Instruction set, number of registers, addressing modes, …
- Microarchitecture = the manner in which the processor is implemented = implementation details
  - Cache size and structure, number of execution units, …
- Note: different processors with different u-archs can support the same arch
  - Example: Intel Pentium 4 vs. Intel Core 2 Duo
- We will address both
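To make the arch/u-arch split concrete, here is a minimal Python sketch. The toy ISA (a list of `(op, dst, src1, src2)` tuples over a few registers, with only `add` and `mul`) and both core classes are hypothetical. The two "microarchitectures" implement the same interface and produce the same architectural register state, but take different numbers of cycles.

```python
from typing import Protocol

# Hypothetical toy ISA: each instruction is (op, dst, src1, src2) over registers.
Instruction = tuple[str, int, int, int]


def execute(op: str, a: int, b: int) -> int:
    """Architectural semantics: identical for every implementation."""
    if op == "add":
        return a + b
    if op == "mul":
        return a * b
    raise ValueError(f"unknown opcode: {op}")


class Core(Protocol):
    """The architecture (interface): run a program, get registers + cycle count."""
    def run(self, program: list[Instruction], regs: list[int]) -> tuple[list[int], int]: ...


class SimpleCore:
    """Microarchitecture 1: in order, one instruction per cycle."""
    def run(self, program: list[Instruction], regs: list[int]) -> tuple[list[int], int]:
        regs = list(regs)
        for op, dst, s1, s2 in program:
            regs[dst] = execute(op, regs[s1], regs[s2])
        return regs, len(program)


class DualIssueCore:
    """Microarchitecture 2: also issues the next instruction in the same cycle
    if it neither reads nor writes the first instruction's destination."""
    def run(self, program: list[Instruction], regs: list[int]) -> tuple[list[int], int]:
        regs = list(regs)
        cycles, i = 0, 0
        while i < len(program):
            op, dst, s1, s2 = program[i]
            regs[dst] = execute(op, regs[s1], regs[s2])
            i += 1
            if i < len(program):
                op2, dst2, t1, t2 = program[i]
                if dst not in (dst2, t1, t2):  # no RAW/WAW dependence: pair them
                    regs[dst2] = execute(op2, regs[t1], regs[t2])
                    i += 1
            cycles += 1
        return regs, cycles


# r2 = r0 + r1; r3 = r0 * r1; r2 = r2 + r3
program = [("add", 2, 0, 1), ("mul", 3, 0, 1), ("add", 2, 2, 3)]
for core in (SimpleCore(), DualIssueCore()):
    print(type(core).__name__, core.run(program, [3, 4, 0, 0]))
```

Both cores end with registers `[3, 4, 19, 12]`; `SimpleCore` needs 3 cycles and `DualIssueCore` needs 2. A program written against the interface cannot tell them apart except by timing.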

12 Why Should We Care?
- Abstractions enhance productivity, so:
  - If we know the arch (= interface),
  - why should we care about the u-arch (= internals)?
- The same goes for the arch
  - It is just details for a programmer of a high-level language
- Abstractions only work as long as what's below them works
  - The taxi story: (4:50-6:00)

13 Recent Processor Trends Source:

14 Well-Known Moore's Law Graph taken from:

16 The Story in a Nutshell
[Figure: transistors (1000s), clock speed (MHz), power (W), instructions/cycle (ILP)]

17 Took the Industry by Surprise

18 Dire Implications: Performance

19 Dire Implications: Sales

20 Dire Implications: Sales

21 Dire Implications: Programmers

22 Supercomputing: "Top 500 list"

23 Dire Implications: Supercomputing

24 Processor Performance

25 Metrics: IC, CPI, IPC
- CPUs work according to a clock signal
  - Clock cycle: measured in nanoseconds ($10^{-9}$ of a second)
  - Clock frequency = 1 / clock cycle: in GHz ($10^{9}$ cycles/sec)
- Instruction Count (IC)
  - Total number of instructions executed in the program
- Cycles Per Instruction (CPI)
  - Average number of cycles per instruction (in a given program)
  - $\mathrm{CPI} = \dfrac{\#\text{cycles required to execute the program}}{\mathrm{IC}}$
  - IPC (= 1/CPI): instructions per cycle. Can be > 1; see "The Story in a Nutshell" slide
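The definitions above are simple ratios. A minimal Python sketch, assuming a hypothetical run of 1,000,000 instructions in 800,000 cycles on a core with a 0.5 ns cycle:

```python
def clock_frequency_ghz(cycle_time_ns: float) -> float:
    """Clock frequency = 1 / clock cycle; 1 / ns = GHz."""
    return 1.0 / cycle_time_ns


def cpi(total_cycles: int, instruction_count: int) -> float:
    """Average cycles per instruction over the whole program."""
    return total_cycles / instruction_count


def ipc(total_cycles: int, instruction_count: int) -> float:
    """Instructions per cycle = 1 / CPI."""
    return instruction_count / total_cycles


# Hypothetical run: 0.5 ns cycle, 1,000,000 instructions in 800,000 cycles.
print(clock_frequency_ghz(0.5))   # 2.0 (GHz)
print(cpi(800_000, 1_000_000))    # 0.8
print(ipc(800_000, 1_000_000))    # 1.25 (IPC > 1)
```

An IPC above 1 is only possible when the core completes more than one instruction per cycle on average, as superscalar designs do.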

26 Minimizing Execution Time
- CPU Time: the time required to execute a program
  - $\text{CPU Time} = \mathrm{IC} \times \mathrm{CPI} \times \text{clock cycle}$
- Our goal: minimize CPU Time (via any of the above components)
  - Minimize clock cycle: increase GHz (processor design)
  - Minimize CPI: u-arch (e.g., more execution units)
  - Minimize IC: arch + u-arch (e.g., SSE™)
SSE = Streaming SIMD Extensions (Intel)
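Because CPU Time is a product, a design change that worsens one factor can still win overall. A minimal sketch with two hypothetical designs (all numbers are illustrative, not measurements): design B adds an instruction-set extension that cuts IC by 20% but raises CPI.

```python
def cpu_time_seconds(ic: float, cpi: float, cycle_time_ns: float) -> float:
    """CPU Time = IC x CPI x clock cycle (cycle time given in ns)."""
    return ic * cpi * cycle_time_ns * 1e-9


# Two hypothetical designs running the same program at the same clock:
# A: baseline; B: an ISA extension cuts IC by 20% but raises CPI.
time_a = cpu_time_seconds(ic=1.0e9, cpi=1.2, cycle_time_ns=0.5)
time_b = cpu_time_seconds(ic=0.8e9, cpi=1.4, cycle_time_ns=0.5)
print(f"A: {time_a:.3f} s, B: {time_b:.3f} s")  # A: 0.600 s, B: 0.560 s
```

Comparing CPI alone would pick A; the product picks B.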

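27 Alternative Way to Calculate CPI
- $\mathrm{IC}_i$ = number of times an instruction of type i is executed in the program
- IC = number of instructions executed in the program: $\mathrm{IC} = \sum_i \mathrm{IC}_i$
- $F_i$ = relative frequency of type-i instructions: $F_i = \mathrm{IC}_i / \mathrm{IC}$
- $\mathrm{CPI}_i$ = number of cycles to execute a type-i instruction
  - e.g., $\mathrm{CPI}_{add} = 1$, $\mathrm{CPI}_{mul} = 3$
- Number of cycles required to execute the program: $\#\text{cycles} = \sum_i \mathrm{IC}_i \times \mathrm{CPI}_i$
- CPI: $\mathrm{CPI} = \dfrac{\#\text{cycles}}{\mathrm{IC}} = \sum_i F_i \times \mathrm{CPI}_i$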
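The weighted form can be checked against the direct definition. A minimal sketch using the slide's $\mathrm{CPI}_{add} = 1$ and $\mathrm{CPI}_{mul} = 3$ and a hypothetical mix of 700 adds and 300 multiplies:

```python
def weighted_cpi(counts: dict[str, int], cpi_per_type: dict[str, float]) -> float:
    """CPI = sum_i F_i * CPI_i, where F_i = IC_i / IC."""
    ic = sum(counts.values())
    return sum((n / ic) * cpi_per_type[kind] for kind, n in counts.items())


cpi_per_type = {"add": 1, "mul": 3}   # CPI_add = 1, CPI_mul = 3 (from the slide)
counts = {"add": 700, "mul": 300}     # hypothetical instruction mix

print(weighted_cpi(counts, cpi_per_type))  # approx. 1.6 (floating-point rounding)

# Cross-check with the direct definition: total cycles / IC.
total_cycles = sum(n * cpi_per_type[k] for k, n in counts.items())
print(total_cycles / sum(counts.values()))  # 1.6
```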

28 Performance Evaluation: How?
- No simple answer
- Performance depends on
  - Application
  - Input
- Mathematical analysis
  - Typically impossible
- What to do?

29 Benchmarks
- Use benchmarks & measure how long they take
  - Use real applications (=> no absolute answers)
- Preferably standardized benchmarks (+ input), e.g.,
  - SPEC INT: integer apps (compression, C compiler, Perl, text processing, …)
  - SPEC FP: floating-point apps (mostly scientific)
  - TPC benchmarks: measure transaction throughput (DB)
  - SPEC JBB: models a wholesale company (Java server, DB)
- Sometimes you see FLOPS ("peak" or "sustained")
  - Supercomputers (Top500 list), measured with LINPACK

30 Evaluating Performance
- Use a performance simulator to evaluate the performance of a new feature / algorithm
  - Models the u-arch in great detail
  - Run hundreds of representative applications
- Produce the performance S-curve
  - Sort the applications according to their IPC increase
  - Baseline (0%) is the processor without the new feature
[Figure: two S-curves. Bad S-curve: negative outliers and positive outliers. Good S-curve: small negative outliers and positive outliers.]
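Building the S-curve is a per-application relative change followed by a sort. A minimal sketch: the application names and IPC values are hypothetical simulator outputs, and the -1% threshold for calling a slowdown a "negative outlier" is an assumption, not a standard value.

```python
def s_curve(baseline_ipc: dict[str, float], new_ipc: dict[str, float]) -> list[tuple[str, float]]:
    """Per-application IPC change (%) vs. the baseline, sorted ascending."""
    deltas = [(app, 100.0 * (new_ipc[app] / base - 1.0)) for app, base in baseline_ipc.items()]
    return sorted(deltas, key=lambda item: item[1])


def negative_outliers(curve: list[tuple[str, float]], threshold_pct: float = -1.0) -> list[str]:
    """Applications that slow down by more than |threshold_pct| percent (assumed threshold)."""
    return [app for app, delta in curve if delta < threshold_pct]


# Hypothetical simulator results (IPC per application).
baseline = {"app_a": 1.00, "app_b": 2.00, "app_c": 0.50, "app_d": 1.20}
with_feature = {"app_a": 1.10, "app_b": 1.90, "app_c": 0.50, "app_d": 1.26}

curve = s_curve(baseline, with_feature)
for app, delta in curve:
    print(f"{app}: {delta:+.1f}%")
print(negative_outliers(curve))  # ['app_b']
```

Plotting the sorted deltas left to right gives the S shape; a good feature keeps the left tail close to 0%.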

31 Amdahl's Law
- Suppose we accelerate the computation such that
  - P = proportion of the computation we make faster
  - S = speedup experienced by the proportion we improved
- For example
  - If an improvement can speed up 40% of the computation => P = 0.4
  - If the improvement makes that portion run twice as fast => S = 2
- Then overall speedup = $\dfrac{1}{(1-P) + \frac{P}{S}}$ (for the example: $\frac{1}{0.6 + 0.2} = 1.25$)
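A minimal Python sketch of the formula, showing that even an enormous S on the improved 40% cannot push the overall speedup past $1/(1-P)$:

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the work is sped up by a factor s."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if s <= 0.0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


# The slide's example: P = 0.4, S = 2.
print(amdahl_speedup(0.4, 2))  # approx. 1.25

# Speeding up the 40% ever more: the untouched 60% caps the result.
for s in (2, 10, 100, 1_000_000):
    print(s, round(amdahl_speedup(0.4, s), 3))  # approaches 1 / 0.6, about 1.667
```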

32 Amdahl's Law – Example
- FP operations improved to run 2x faster
  - S = 2, but…
  - P = 0.1 (FP operations only affect 10% of the program)
  - Speedup: $\dfrac{1}{(1-0.1) + \frac{0.1}{2}} = \dfrac{1}{0.95} \approx 1.053$
- Conclusion
  - Better to make the common case fast…

33 Amdahl's Law – Parallelism
- When parallelizing a program
  - P = proportion of the program that can be made parallel
  - 1 - P = inherently serial
  - N = number of processing elements (say, cores)
  - Speedup: $\dfrac{1}{(1-P) + \frac{P}{N}}$
- The serial component imposes a hard limit: as $N \to \infty$, speedup $\to \dfrac{1}{1-P}$
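A minimal sketch of the parallel form, assuming a hypothetical program whose parallel fraction is P = 0.95. With 5% serial work the speedup can never exceed 20, no matter how many cores are added.

```python
def parallel_speedup(p: float, n: int) -> float:
    """Amdahl's law for parallelism: p = parallel fraction, n = cores."""
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return 1.0 / ((1.0 - p) + p / n)


def speedup_limit(p: float) -> float:
    """Upper bound as n -> infinity: 1 / (1 - p)."""
    return float("inf") if p == 1.0 else 1.0 / (1.0 - p)


p = 0.95  # hypothetical: 95% of the program parallelizes
for n in (1, 8, 64, 1024):
    print(n, round(parallel_speedup(p, n), 2))
print("limit:", round(speedup_limit(p), 2))  # limit: 20.0
```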

34 Instruction Set Design
- The ISA is what the user & compiler see
- The HW implements the ISA
[Figure: software on top, hardware below, with the instruction set as the interface between them]

35 Considerations in ISA Design
- Instruction size
  - Long instructions take more time to fetch from memory
  - Longer instructions require a larger memory
    - Important for small (embedded) devices, e.g., cell phones
- Number of instructions (IC)
  - Reduce IC => reduce runtime (at a given CPI & frequency)
- Virtues of instruction simplicity
  - Simpler HW allows for higher frequency & lower power
  - Optimizations can be applied better to simpler code
  - Cheaper HW

36 Basing Design Decisions on Workload
- Immediate argument's size in bits (histogram)
  - Only 1% of data values need more than 16 bits
  - Having 16 bits is likely good enough
[Figure: histogram of immediate data bits (y-axis 0% to 30%), integer average vs. FP average]
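The kind of measurement behind such a histogram can be sketched as: for each immediate seen in a trace, compute the narrowest field that holds it, then check what fraction fits in 16 bits. This assumes immediates are signed two's-complement values; the list of values is hypothetical.

```python
def signed_bits(value: int) -> int:
    """Minimum width of a two's-complement field that can hold `value`."""
    magnitude = value if value >= 0 else ~value  # ~v == -v - 1 for negatives
    return magnitude.bit_length() + 1            # +1 for the sign bit


def fraction_fitting(immediates: list[int], width: int = 16) -> float:
    """Fraction of immediates that fit in a signed `width`-bit field."""
    return sum(signed_bits(v) <= width for v in immediates) / len(immediates)


# Hypothetical immediates collected from a program trace.
immediates = [0, 1, -1, 12, 100, 4096, -32768, 70000]
print([signed_bits(v) for v in immediates])  # [1, 2, 1, 5, 8, 14, 16, 18]
print(fraction_fitting(immediates))          # 0.875
```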

37 CISC Processors
- CISC: Complex Instruction Set Computer
  - Example: x86
  - The idea: a high-level machine language
    - Back when people programmed in assembly, CISC was supposedly easier
- Characteristics
  - Many instruction types, with many addressing modes
  - Some of the instructions are complex
    - Execute complex tasks
    - Require many cycles
  - ALU operations directly on memory (e.g., arr[j] = arr[i] + n)
    - Registers are not required (and, accordingly, only a few registers exist)
  - Variable-length instructions
    - Common instructions get short encodings => saves code length

38 But It Turns Out…
Rank  Instruction               % of total executed
1     load                      22%
2     conditional branch        20%
3     compare                   16%
4     store                     12%
5     add                        8%
6     and                        6%
7     sub                        5%
8     move register-register     4%
9     call                       1%
10    return                     1%
      Total                     96%
Simple instructions dominate instruction frequency

39 CISC Drawbacks
- Complex instructions and complex addressing modes
  - complicate the processor
  - slow down the simple, common instructions
  - contradict "Make the common case fast"
- Compilers don't use complex instructions / indexing methods
- Variable-length instructions are a real pain in the neck
  - Difficult to decode several instructions in parallel
    - Until an instruction is decoded, its length is unknown
    - It is unknown where the instruction ends
    - It is unknown where the next instruction starts
  - An instruction may span more than a single cache line
  - An instruction may span more than a single page
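Why variable length hurts parallel decode can be shown with a toy encoding. This sketch assumes a hypothetical format in which the first byte of each instruction holds its total length (real x86 length decoding is far more involved). Finding each boundary requires the previous instruction's length, whereas fixed-length boundaries are all known up front. A helper also checks whether an instruction straddles a cache line, assuming 64-byte lines.

```python
def instruction_starts(code: bytes) -> list[int]:
    """Boundaries in a hypothetical encoding whose first byte is the length.
    Each start depends on the previous instruction's length, so this loop
    is inherently sequential."""
    starts, pc = [], 0
    while pc < len(code):
        length = code[pc]
        if length == 0 or pc + length > len(code):
            raise ValueError(f"bad or truncated instruction at offset {pc}")
        starts.append(pc)
        pc += length
    return starts


def fixed_length_starts(code_size: int, width: int = 4) -> list[int]:
    """With fixed-length instructions every boundary is known up front."""
    return list(range(0, code_size, width))


def spans_two_lines(start: int, length: int, line_size: int = 64) -> bool:
    """True if the instruction's bytes straddle a cache-line boundary."""
    return start // line_size != (start + length - 1) // line_size


code = bytes([1, 3, 0xAA, 0xBB, 2, 0xCC, 1])
print(instruction_starts(code))  # [0, 1, 4, 6]
print(fixed_length_starts(16))   # [0, 4, 8, 12]
print(spans_two_lines(62, 5))    # True: bytes 62..66 cross offset 64
```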

40 RISC Processors
- RISC: Reduced Instruction Set Computer
  - The idea: simple instructions enable fast hardware
- Characteristics
  - A small instruction set, with only a few instruction formats
  - Simple instructions that execute simple tasks
    - Most of them require a single cycle (with a pipeline)
  - A few indexing methods
  - ALU operations on registers only
    - Memory is accessed using Load and Store instructions only
    - Many orthogonal registers
    - Three-address machine: Add dst, src1, src2
  - Fixed-length instructions
- Examples: MIPS™, SPARC™, Alpha™, Power™
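The CISC slide's `arr[j] = arr[i] + n` is a single memory-operand instruction there; on a load/store machine it becomes three instructions. A minimal interpreter sketch: the mnemonics, operand order, and register assignments are hypothetical, chosen only to illustrate the load/store and three-address rules.

```python
Instruction = tuple[str, int, int, int]


def run_load_store(program: list[Instruction], regs: list[int], mem: list[int]) -> None:
    """Toy load/store machine (hypothetical mnemonics and operand order):
      ("load",  rd, rb, off)   rd <- mem[rb + off]
      ("store", rs, rb, off)   mem[rb + off] <- rs
      ("add",   rd, r1, r2)    rd <- r1 + r2   (registers only, three-address)
    Only load/store touch memory; ALU ops see registers only."""
    for op, a, b, c in program:
        if op == "load":
            regs[a] = mem[regs[b] + c]
        elif op == "store":
            mem[regs[b] + c] = regs[a]
        elif op == "add":
            regs[a] = regs[b] + regs[c]
        else:
            raise ValueError(f"unknown opcode: {op}")


# arr[j] = arr[i] + n, with arr at address 0, i = 1, j = 3, n = 5.
mem = [10, 20, 30, 40]
regs = [0] * 8
regs[1], regs[2], regs[3] = 1, 3, 5  # r1 = &arr[i], r2 = &arr[j], r3 = n
program = [
    ("load", 4, 1, 0),   # r4 <- arr[i]
    ("add", 5, 4, 3),    # r5 <- r4 + n
    ("store", 5, 2, 0),  # arr[j] <- r5
]
run_load_store(program, regs, mem)
print(mem)  # [10, 20, 30, 25]
```

IC goes up (three instructions instead of one), but each instruction is simple, fixed in shape, and easy to pipeline.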

41 RISC Processors (Cont.)
- Simple arch => simple u-arch
  - Room for larger on-die caches
  - Smaller => faster
  - Easier to design & validate (=> cheaper to manufacture)
  - Shorter time-to-market
  - More general-purpose registers (=> fewer memory references)
- Compiler can be smarter
  - Better pipeline usage
  - Better register allocation
- Existing RISC processors are not "pure" RISC
  - e.g., they support division, which takes many cycles

42 Compilers and ISA
- Ease of compilation
  - Orthogonality:
    - no special registers
    - few special cases
    - all operand modes available with any data type or instruction type
  - Regularity:
    - no overloading of the meanings of instruction fields
  - Streamlined:
    - resource needs easily determined
- Register assignment is critical too
  - Easier if there are lots of registers

43 Still, CISC Is Dominant
- x86 (CISC) dominates the processor market
- Legacy
  - A vast amount of existing software
  - Intel, AMD, Microsoft benefit
  - But they invest a lot of money to compensate for the disadvantages
- Internally, CISC emulates RISC
  - Starting with the Pentium II and K6, x86 processors translate CISC instructions into RISC-like operations internally
  - Inside, the core looks much like that of a RISC processor

44 Software-Specific Extensions
- Extend the arch to accelerate execution of specific apps
- Example: SSE™ (Streaming SIMD Extensions)
  - 128-bit packed (vector) / scalar single-precision FP (4×32)
  - Introduced on the Pentium® III in 1999
  - 8 new 128-bit registers (XMM0 – XMM7)
  - Accelerates graphics, video, scientific calculations, …
- Packed: (x0, x1, x2, x3) + (y0, y1, y2, y3) = (x0+y0, x1+y1, x2+y2, x3+y3), 128 bits
- Scalar: (x0, x1, x2, x3) + (y0, y1, y2, y3) = (x0+y0, y1, y2, y3), 128 bits
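A minimal Python simulation of the packed vs. scalar forms, modeling a 128-bit XMM register as a tuple of four single-precision lanes. The function names are hypothetical stand-ins loosely following SSE's ADDPS (all lanes) and ADDSS (lane 0 only, upper lanes kept from the destination operand); `struct` is used only to round results to 32-bit floats.

```python
import struct

Vec4 = tuple[float, float, float, float]  # four 32-bit lanes of a 128-bit register


def f32(x: float) -> float:
    """Round a Python float (64-bit) to single precision, like an SSE lane."""
    return struct.unpack("f", struct.pack("f", x))[0]


def packed_add(x: Vec4, y: Vec4) -> Vec4:
    """Like ADDPS: all four lanes are added."""
    return (f32(x[0] + y[0]), f32(x[1] + y[1]), f32(x[2] + y[2]), f32(x[3] + y[3]))


def scalar_add(dst: Vec4, src: Vec4) -> Vec4:
    """Like ADDSS: only lane 0 is added; lanes 1-3 keep the destination's values."""
    return (f32(dst[0] + src[0]), dst[1], dst[2], dst[3])


x = (1.0, 2.0, 3.0, 4.0)
y = (10.0, 20.0, 30.0, 40.0)
print(packed_add(x, y))  # (11.0, 22.0, 33.0, 44.0)
print(scalar_add(y, x))  # (11.0, 20.0, 30.0, 40.0): y is the destination here
```

One packed instruction does the work of four scalar ones, which is how such an extension reduces IC.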

45 BACKUP

46 Compatibility
- Backward compatibility (HW responsibility)
  - When buying new hardware, it can run existing software: an i5 can run SW written for Core 2 Duo, Pentium 4, Pentium M, Pentium III, Pentium II, Pentium, 486, 386, 286
  - BTW:
- Forward compatibility (SW responsibility)
  - For example: MS Word 2003 can open MS Word 2010 documents
  - Commonly supports one or two generations behind
- Architecture-independent SW
  - Run SW on top of a VM that does JIT (just-in-time compilation): JVM for Java and CLR for .NET
  - Interpreted languages: Perl, Python