Central Processing Unit Architecture


(6.1) Central Processing Unit Architecture
  Architecture overview
  Machine organization – von Neumann
  Speeding up CPU operations – multiple registers – pipelining – superscalar and VLIW
  CISC vs. RISC

(6.2) Computer Architecture
  Major components of a computer
    – Central Processing Unit (CPU)
    – memory
    – peripheral devices
  Architecture is concerned with
    – the internal structure of each component
    – interconnections
      » speed and width
    – relative speeds of components
  Want maximum execution speed
    – balance is often the critical issue

(6.3) Computer Architecture (continued)
  CPU
    – performs arithmetic and logical operations
    – synchronous operation
    – may be viewed at the level of the instruction set architecture
      » how the machine looks to a programmer
    – or at the level of the detailed hardware design

(6.4) Computer Architecture (continued)
  Memory
    – stores programs and data
    – organized as
      » bit
      » byte = 8 bits (smallest addressable location)
      » word = 4 bytes (typically; machine dependent)
    – instructions consist of operation codes and addresses
  [Diagram: instruction formats – an opcode followed by three, two, or one address fields]

(6.5) Computer Architecture (continued)
  Numeric data representations
    – integer (exact representation)
      » sign-magnitude
      » 2's complement: to negate, invert every bit (change 0 to 1 and 1 to 0), then add 1
    – floating point (approximate representation)
      » scientific notation: a significand scaled by a power of the base (e.g., × 10^6)
      » inherently imprecise
      » IEEE Standard
  [Diagram: sign/magnitude field layout; sign/exponent/significand field layout]
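The 2's complement rule above ("invert every bit, then add 1") can be sketched directly. This is an illustrative helper, not part of the slides; the 8-bit width is just an assumed example.

```python
def twos_complement_negate(value, bits=8):
    """Negate an integer the 2's complement way: invert all bits, add 1.
    The result is the unsigned encoding of the negated value in `bits` bits."""
    inverted = value ^ ((1 << bits) - 1)   # flip every bit (0 -> 1, 1 -> 0)
    return (inverted + 1) % (1 << bits)    # add 1, wrap around the word size

# Negating 5 in 8 bits gives the bit pattern 0b11111011 (251 unsigned);
# interpreted as a signed byte, that pattern is 251 - 256 = -5.
encoded = twos_complement_negate(5)
decoded = encoded - (1 << 8)
```

The same two-step rule works for any word width, which is why hardware implements negation as a bitwise invert feeding an adder with a carry-in of 1.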

(6.6) Simple Machine Organization
  Institute for Advanced Study machine (1947) – the "von Neumann machine"
    » ALU performs transfers between memory and I/O devices
    » note two instructions per memory word
  [Diagram: main memory, input-output equipment, arithmetic-logic unit, and program control unit; each instruction holds an op code and an address]

(6.7) Simple Machine Organization (continued)
  ALU does arithmetic and logical comparisons
    – AC = accumulator; holds results
    – MQ = multiplier-quotient register; holds the second portion of long results
    – MBR = memory buffer register; holds data while an operation executes

(6.8) Simple Machine Organization (continued)
  Program control determines what the computer does based on the instruction read from memory
    – MAR = memory address register; holds the address of the memory cell to be read
    – PC = program counter; holds the address of the next instruction to be read
    – IR = instruction register; holds the instruction being executed
    – IBR = instruction buffer register; holds the right half of an instruction word read from memory

(6.9) Simple Machine Organization (continued)
  Machine operates on a fetch-execute cycle
  Fetch
    – PC → MAR
    – read M(MAR) into MBR
    – copy the left and right instructions into IR and IBR
  Execute
    – address part of IR → MAR
    – read M(MAR) into MBR
    – execute the opcode
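The fetch-execute cycle above can be sketched as a loop over a toy accumulator machine. This is a minimal illustration, not the IAS design: it uses one instruction per word, and the opcode names (LOAD, ADD, STORE, HALT) are invented for the example.

```python
def run(memory):
    """Toy fetch-execute loop: memory maps addresses to either
    (opcode, address) instruction pairs or plain integer data."""
    pc, ac = 0, 0
    while True:
        mar = pc                     # fetch: PC -> MAR
        opcode, addr = memory[mar]   # read M(MAR); decode fields into IR
        pc += 1                      # advance the program counter
        if opcode == "LOAD":
            ac = memory[addr]        # execute: AC <- M(addr)
        elif opcode == "ADD":
            ac += memory[addr]       # execute: AC <- AC + M(addr)
        elif opcode == "STORE":
            memory[addr] = ac        # execute: M(addr) <- AC
        elif opcode == "HALT":
            return memory

# A three-instruction program: load 2, add 3, store the sum at address 12.
mem = {0: ("LOAD", 10), 1: ("ADD", 11), 2: ("STORE", 12), 3: ("HALT", 0),
       10: 2, 11: 3, 12: 0}
run(mem)
```

After the run, `mem[12]` holds 5; each loop iteration is one full fetch-execute cycle of the kind the slide describes.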

(6.10) Simple Machine Organization (continued)

(6.11) Architecture Families
  Before the mid-1960s, every new machine had a different instruction set architecture
    – programs from the previous generation didn't run on the new machine
    – the cost of replacing software became too large
  IBM System/360 created the family concept
    – single instruction set architecture
    – wide range of price and performance with the same software
  Performance improvements based on different detailed implementations
    – memory path width (1 byte to 8 bytes)
    – faster, more complex CPU designs
    – greater I/O throughput and overlap
  "Software compatibility" is now a major issue
    – partially offset by high-level language (HLL) software

(6.12) Architecture Families

(6.13) Multiple Register Machines
  Initially, machines had only a few registers
    – 2 to 8 or 16 was common
    – registers were more expensive than memory
  Most instructions operated between memory locations
    – results had to start from and end up in memory, so fewer (although more complex) instructions were needed
    – that means smaller programs and (supposedly) faster execution
      » fewer instructions and less data to move between memory and the ALU
  But registers are much faster than memory
    – roughly 30 times faster

(6.14) Multiple Register Machines (continued)
  Also, many operands are reused within a short time
    – it wastes time to load an operand again the next time it is needed
  Depending on the mix of instructions and operand use, having many registers may lead to less memory traffic and faster execution
  Most modern machines use a multiple register architecture
    – up to about 512 registers; 32 integer plus 32 floating-point is common
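The operand-reuse argument above can be made concrete by counting memory accesses for one expression. This is a back-of-the-envelope sketch under two assumed extremes: a machine that fetches every operand from memory on every use, versus one that loads each distinct operand into a register once and reuses it.

```python
# Operand references made while evaluating (a + b) * (a + b):
uses = ["a", "b", "a", "b"]

# Memory-to-memory style: every reference is a memory access.
memory_machine_accesses = len(uses)          # 4 accesses

# Register style: load each distinct operand once, reuse from registers.
register_machine_accesses = len(set(uses))   # 2 accesses
```

Halving the traffic on even this tiny expression shows why compilers keep hot operands in registers; the saving grows with how often each value is reused before it is finally stored back.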

(6.15) Pipelining
  One way to speed up the CPU is to increase the clock rate
    – there are limits on how fast the clock can run and still complete an instruction
  Another way is to execute more than one instruction at a time

(6.16) Pipelining
  Pipelining breaks instruction execution down into several stages
    – registers between stages "buffer" data and control
    – begin executing one instruction
    – as the first instruction enters its second stage, begin executing a second instruction, and so on
    – speedup equals the number of stages, as long as the pipe stays full

(6.17) Pipelining (continued)
  Consider an example with 6 stages
    – FI = fetch instruction
    – DI = decode instruction
    – CO = calculate location of operand
    – FO = fetch operand
    – EI = execute instruction
    – WO = write operand (store result)

(6.18) Pipelining Example
  Executes 9 instructions in 14 cycles, rather than 54 for sequential execution
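The 14-versus-54 figure follows from a standard formula: with k stages and n instructions and no hazards, the first instruction takes k cycles and each subsequent one completes a cycle later. A quick sketch:

```python
def pipelined_cycles(n_instructions, n_stages):
    """Cycles to complete n instructions in a hazard-free pipeline:
    n_stages cycles to fill the pipe, then one completion per cycle."""
    return n_stages + (n_instructions - 1)

def sequential_cycles(n_instructions, n_stages):
    """Cycles with no overlap: every instruction runs all stages alone."""
    return n_instructions * n_stages

pipelined_cycles(9, 6)    # 14 cycles, matching the slide's example
sequential_cycles(9, 6)   # 54 cycles
```

Note the speedup 54/14 ≈ 3.9 is below the 6x stage count; the ideal factor of 6 is only approached as n grows large and the fill cost is amortized.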

(6.19) Pipelining (continued)
  Hazards to pipelining
    – conditional jump
      » e.g., instruction 3 branches to instruction 15
      » the pipeline must be flushed and restarted
    – a later instruction needs an operand being calculated by an instruction still in the pipeline
      » the pipeline stalls until the result is ready
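The cost of the branch hazard can be estimated with a simple model. This is a sketch under assumed numbers, not a cycle-accurate simulation: it supposes each taken branch is resolved at the EI stage (stage 5 of the 6-stage pipeline above), so the 4 instructions fetched behind it must be flushed and refetched.

```python
def cycles_with_branches(n_instructions, n_stages, taken_branches,
                         resolve_stage=5):
    """Hazard-free pipelined cycles plus a flush penalty per taken branch.
    Each taken branch, resolved at `resolve_stage`, discards the
    resolve_stage - 1 instructions fetched behind it."""
    base = n_stages + (n_instructions - 1)
    return base + taken_branches * (resolve_stage - 1)

cycles_with_branches(9, 6, 0)   # 14 cycles: no branches, full overlap
cycles_with_branches(9, 6, 1)   # 18 cycles: one flush costs 4 extra cycles
</n```

One taken branch adds roughly 30% to this short run, which is why branch prediction and the speculation techniques discussed later matter so much.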

(6.20) Pipelining Problem Example  Is this really a problem?

(6.21) Real-life Problem
  Not all instructions execute in one clock cycle
    – floating point takes longer than integer
    – fp divide takes longer than fp multiply, which takes longer than fp add
    – typical latencies (in clock cycles)
      » integer add/subtract: 1
      » memory reference: 1
      » fp add: 2 (make 2 stages)
      » fp (or integer) multiply: 6 (make 2 stages)
      » fp (or integer) divide: 15
  Break the floating point unit into a sub-pipeline
    – execute up to 6 instructions at once

(6.22) Pipelining (continued)
  This is not simple to implement
    – note that all 6 instructions could finish at the same time!

(6.23) More Speedup
  Pipelined machines issue one instruction each clock cycle
    – how can the CPU be sped up even more?
  Issue more than one instruction per clock cycle

(6.24) Superscalar Architectures
  Superscalar machines issue a variable number of instructions each clock cycle, up to some maximum
    – instructions must satisfy some criteria of independence
      » a simple choice is a maximum of one fp and one integer instruction per clock
      » a separate execution path is needed for each instruction that can issue simultaneously
    – compiled code from a non-superscalar implementation of the same architecture runs unchanged, but slower
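The "one fp plus one integer per clock" issue rule above can be sketched as a small in-order model. This is an illustration only: it models structural conflicts (two instructions competing for the same unit) and ignores data dependencies entirely.

```python
def issue_cycles(instr_types):
    """Cycles needed to issue a stream in order, pairing at most one
    instruction of each type ('int', 'fp') per clock. Issue stops for
    the cycle as soon as the next instruction needs a unit already used."""
    cycles, i = 0, 0
    while i < len(instr_types):
        issued = set()
        while i < len(instr_types) and instr_types[i] not in issued:
            issued.add(instr_types[i])   # this instruction issues now
            i += 1                       # next instruction may pair with it
        cycles += 1
    return cycles

issue_cycles(["int", "fp", "int", "fp"])   # 2 cycles: two int/fp pairs
issue_cycles(["int", "int", "int"])        # 3 cycles: ints can't pair up
```

The contrast between the two calls shows why the instruction mix matters: alternating types reach the 2-per-cycle maximum, while a run of same-type instructions degrades to plain pipelined speed.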

(6.25) Superscalar Example
  Each instruction path may be pipelined

(6.26) Superscalar Problem
  Instruction-level parallelism
    – what if two successive instructions can't be executed in parallel?
      » data dependencies, or two instructions of the same (slow) type
  Design the machine to increase opportunities for multiple execution

(6.27) VLIW Architectures
  Very Long Instruction Word (VLIW) architectures store several simple instructions in one long instruction word fetched from memory
    – the number and type of slots are fixed
      » e.g., 2 memory reference, 2 floating point, and 1 integer instruction
    – one functional unit is needed for each possible instruction
      » 2 fp units, 1 integer unit, 2 MBRs
      » all run synchronized
    – each long instruction is stored in a single wide word
      » requires wider memory communication paths
      » many slots may be empty, meaning wasted code space
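The fixed-slot format and its wasted-space cost can be sketched by packing an instruction stream into words with the slide's example mix (2 memory, 2 fp, 1 integer slot per word). This is a toy in-order packer with no dependency checking; the slot names are illustrative.

```python
SLOTS = {"mem": 2, "fp": 2, "int": 1}   # slots per long instruction word

def pack_vliw(instr_types):
    """Pack instructions in order into fixed-format words; start a new
    word whenever the current one has no free slot of the needed type."""
    words, free = 0, {}
    for t in instr_types:
        if free.get(t, 0) == 0:          # no slot of this type left
            words += 1
            free = dict(SLOTS)           # open a fresh word
        free[t] -= 1                     # occupy one slot
    return words

stream = ["mem", "mem", "fp", "int", "fp", "int"]
words = pack_vliw(stream)                            # 2 long words
wasted = words * sum(SLOTS.values()) - len(stream)   # 4 empty slots
```

Six instructions occupy two 5-slot words here, leaving 4 slots empty; this is exactly the "wasted code space" the slide warns about, and it is the VLIW compiler's job to schedule code so that fewer slots go unused.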

(6.28) VLIW Example

(6.29) Instruction Level Parallelism
  The success of superscalar and VLIW machines depends on how many nearby instructions can be issued in parallel
    – no dependencies
    – no branches
  Compilers can help create parallelism
  Speculation techniques try to overcome the branch problem
    – assume the branch is taken
    – execute instructions, but don't let them store results until the status of the branch is known

(6.30) CISC vs. RISC
  CISC = Complex Instruction Set Computer
  RISC = Reduced Instruction Set Computer

(6.31) CISC vs. RISC (continued)
  Historically, machines tend to add features over time
    – instruction opcodes
      » the IBM 70X, 70X0 series went from 24 opcodes to 185 in 10 years
      » over the same period, performance increased 30-fold
    – addressing modes
    – special purpose registers
  Motivations are to
    – improve efficiency, since complex instructions can be implemented in hardware and execute faster
    – make life easier for compiler writers
    – support more complex higher-level languages

(6.32) CISC vs. RISC
  Examination of actual code indicated that many of these features were not used
  RISC advocates proposed
    – a simple, limited instruction set
    – a large number of general purpose registers
      » with mostly register-to-register operations
    – an optimized instruction pipeline
  Benefits should include
    – faster execution of the instructions commonly used
    – faster design and implementation

(6.33) CISC vs. RISC
  Comparing some architectures

(6.34) CISC vs. RISC
  Which approach is right?
  Typically, RISC takes about 1/5 the design time
    – but CISC designs have adopted RISC techniques