Chapter 8 CPU and Memory: Design, Implementation, and Enhancement The Architecture of Computer Hardware and Systems Software: An Information Technology.


Chapter 8: CPU and Memory: Design, Implementation, and Enhancement. The Architecture of Computer Hardware and Systems Software: An Information Technology Approach, 3rd Edition, Irv Englander. John Wiley and Sons, 2003

8-2 CPU Architecture Overview
- CISC – Complex Instruction Set Computer
- RISC – Reduced Instruction Set Computer
- CISC vs. RISC Comparisons
- VLIW – Very Long Instruction Word
- EPIC – Explicitly Parallel Instruction Computing

8-3 CISC Architecture
- Examples
  - Intel x86, IBM Z-Series mainframes, older CPU architectures
- Characteristics
  - Few general purpose registers
  - Many addressing modes
  - Large number of specialized, complex instructions
  - Instructions are of varying sizes

8-4 Limitations of CISC Architecture
- Complex instructions are infrequently used by programmers and compilers
- Memory references (loads and stores) are slow and account for a significant fraction of all instructions
- Procedure and function calls are a major bottleneck
  - Passing arguments
  - Storing and retrieving values in registers

8-5 RISC Features
- Examples
  - PowerPC, Sun SPARC, Motorola
- Limited and simple instruction set
- Fixed length, fixed format instruction words
  - Enable pipelining, parallel fetches and executions
- Limited addressing modes
  - Reduce complicated hardware
- Register-oriented instruction set
  - Reduce memory accesses
- Large bank of registers
  - Reduce memory accesses
  - Efficient procedure calls

8-6 CISC vs. RISC Processing (figure)

8-7 Circular Register Buffer (figure)

8-8 Circular Register Buffer - After Procedure Call (figure)
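
The circular register buffer can be sketched in a few lines of Python. This is only an illustration of overlapping register windows, not the layout of any particular CPU: the class name, the window count, and the 8/8/8 split into "in", "local" and "out" registers are all assumptions, and window overflow (spilling to memory when the buffer wraps onto a live window) is left out.

```python
class RegisterWindows:
    """Circular register file with overlapping windows (all sizes are assumed)."""

    def __init__(self, num_windows=4, shared=8, local=8):
        self.num_windows = num_windows
        self.shared = shared                   # registers shared by caller and callee
        self.local = local                     # registers private to one procedure
        self.stride = shared + local           # how far one call moves the window
        self.window_size = 2 * shared + local  # in + local + out registers
        self.regs = [0] * (num_windows * self.stride)
        self.cwp = 0                           # current window pointer

    def _index(self, r):
        if not 0 <= r < self.window_size:
            raise IndexError("register number outside the current window")
        # Wrap around the end of the register file: the buffer is circular.
        return (self.cwp * self.stride + r) % len(self.regs)

    def read(self, r):
        return self.regs[self._index(r)]

    def write(self, r, value):
        self.regs[self._index(r)] = value

    def call(self):
        # Advance the window: the caller's "out" registers become the callee's "in" registers.
        self.cwp = (self.cwp + 1) % self.num_windows

    def ret(self):
        # Move back to the caller's window.
        self.cwp = (self.cwp - 1) % self.num_windows


rw = RegisterWindows()
first_out = rw.shared + rw.local       # first "out" register of the current window
rw.write(first_out, 42)                # caller places an argument in an "out" register
rw.call()
argument_seen_by_callee = rw.read(0)   # callee finds it in its first "in" register
rw.ret()
```

The argument is passed without any memory access: the call only moves the window pointer, which is the point of the large register bank listed under RISC features.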

8-9 CISC vs. RISC Performance Comparison
- RISC: simpler instructions → more instructions → more memory accesses
- RISC: more bus traffic and increased cache memory misses
- More registers would improve CISC performance, but no space is available for them
- Modern CISC and RISC architectures are becoming similar

8-10 VLIW Architecture
- Transmeta Crusoe CPU
- 128-bit instruction bundle = molecule
  - Four 32-bit atoms (atom = instruction)
  - Parallel processing of 4 instructions
- 64 general purpose registers
- Code morphing layer
  - Translates instructions written for other CPUs into molecules
  - Instructions are not written directly for the Crusoe CPU

8-11 EPIC Architecture
- Intel Itanium CPU
- 128-bit instruction bundle
  - Three 41-bit instructions
  - 5 bits to identify type of instructions in bundle
- 128 64-bit general purpose registers
- 128 82-bit floating point registers
- Intel x86 instruction set included
- Programmers and compilers follow guidelines to ensure parallel execution of instructions
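
The bundle arithmetic (3 × 41 + 5 = 128 bits) can be checked with a small packing sketch. The function names are hypothetical, and placing the 5-bit type field in the lowest bits is an assumption made for this illustration; the instruction values are arbitrary numbers, not real opcodes.

```python
SLOT_BITS = 41       # width of each instruction in the bundle (from the slide)
TEMPLATE_BITS = 5    # bits identifying the instruction types in the bundle
SLOT_MASK = (1 << SLOT_BITS) - 1
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1


def pack_bundle(template, slots):
    """Pack a 5-bit type field and three 41-bit instructions into one integer."""
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three instructions")
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("type field does not fit in 5 bits")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("instruction does not fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a bundle back into its type field and three instructions."""
    template = bundle & TEMPLATE_MASK
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK for i in range(3)]
    return template, slots


bundle = pack_bundle(0b10001, [SLOT_MASK, 1, SLOT_MASK])
bundle_width = bundle.bit_length()   # never more than 128 bits
```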

8-12 Paging
- Managed by the operating system
- Built into the hardware
- Independent of application

8-13 Logical vs. Physical Addresses
- Logical addresses are relative locations of data, instructions and branch targets, and are separate from physical addresses
- Logical addresses are mapped to physical addresses
- Physical addresses do not need to be consecutive

8-14 Logical vs. Physical Address (figure)

8-15 Page Address Layout (figure)

8-16 Page Translation Process (figure)
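
A minimal Python sketch of the translation step, assuming a 4096-byte page and a page table held as a plain dictionary (both are assumptions for illustration; real hardware uses dedicated tables and registers). The logical address is split into a page number and an offset, the page number is looked up to get a frame number, and the offset is carried over unchanged.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; not given on the slides


def translate(logical_address, page_table):
    """Map a logical address to a physical address through a page table."""
    page_number, offset = divmod(logical_address, PAGE_SIZE)
    if page_number not in page_table:
        # The page is not in memory; the operating system would have to load it.
        raise LookupError(f"page fault: page {page_number} is not in memory")
    frame_number = page_table[page_number]
    return frame_number * PAGE_SIZE + offset


# Hypothetical mapping of logical pages to physical frames.
# Consecutive pages land in non-consecutive frames.
page_table = {0: 5, 1: 2, 2: 9}

physical = translate(4100, page_table)   # page 1, offset 4 -> frame 2, offset 4
```

Logical pages 0, 1 and 2 are consecutive, but their frames (5, 2, 9) are not, which is the sense in which physical addresses do not need to be consecutive.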

8-17 Memory Enhancements
- Memory is slow compared to CPU processing speeds!
  - 2 GHz CPU = 1 cycle in ½ of a billionth of a second
  - 70 ns DRAM = 1 access in 70 billionths of a second
- Methods to improve memory access
  - Wide path memory access
    - Retrieve multiple bytes instead of 1 byte at a time
  - Memory interleaving
    - Partition memory into subsections, each with its own address register and data register
  - Cache memory

8-18 Memory Interleaving (figure)
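
One common way to partition memory is low-order interleaving, sketched below. The choice of four banks and the function name are assumptions; the slides do not say which scheme the figure shows. Consecutive addresses fall into different banks, so several accesses can be in progress at once.

```python
NUM_BANKS = 4  # assumed number of memory subsections (banks)


def locate(address):
    """Return (bank, row) for an address under low-order interleaving."""
    bank = address % NUM_BANKS    # low-order part of the address selects the bank
    row = address // NUM_BANKS    # remaining part selects the location inside the bank
    return bank, row


# Eight consecutive addresses are spread evenly over the four banks.
placement = [locate(a) for a in range(8)]
```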

8-19 Why Cache?
- Even the fastest hard disk has an access time of about 10 milliseconds
- A 2 GHz CPU waiting 10 milliseconds wastes 20 million clock cycles!
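
The wasted-cycle figures follow directly from the clock rate. A short check in Python, using the 2 GHz clock, the 70 ns DRAM access and the 10 ms disk access quoted on the slides (the function name is hypothetical):

```python
NS_PER_SECOND = 1_000_000_000


def cycles_during(wait_ns, clock_hz):
    """Number of whole clock cycles that pass while the CPU waits wait_ns nanoseconds."""
    return wait_ns * clock_hz // NS_PER_SECOND


CLOCK_HZ = 2_000_000_000                               # 2 GHz
dram_wait_cycles = cycles_during(70, CLOCK_HZ)          # one 70 ns DRAM access
disk_wait_cycles = cycles_during(10_000_000, CLOCK_HZ)  # one 10 ms disk access
```

A single DRAM access costs 140 cycles and a single disk access costs 20 million, which is why both levels of the hierarchy need something faster in front of them.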

8-20 Cache Memory
- Blocks: 8 or 16 bytes
- Tags: location in main memory
- Cache controller
  - Hardware that checks tags
- Cache line
  - Unit of transfer between storage and cache memory
- Hit ratio: ratio of hits out of total requests
- Synchronizing cache and memory
  - Write through
  - Write back

8-21 Step-by-Step Use of Cache (figure)

8-22 Step-by-Step Use of Cache (figure)
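
The tag check and hit ratio can be sketched with a small read-only simulation. It assumes a direct-mapped cache with four lines of 8 bytes each; the slides do not specify the mapping scheme, so the class and its parameters are illustrative only, and write-through/write-back handling is not modeled.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that tracks tags and counts hits (reads only)."""

    def __init__(self, num_lines=4, block_size=8):
        self.num_lines = num_lines
        self.block_size = block_size
        self.tags = [None] * num_lines   # tag stored for each cache line
        self.hits = 0
        self.requests = 0

    def access(self, address):
        """Return True on a hit; on a miss, load the block and return False."""
        block = address // self.block_size   # which memory block holds the address
        index = block % self.num_lines       # the one line that block may occupy
        tag = block // self.num_lines        # identifies which block is in that line
        self.requests += 1
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag               # replace whatever was in the line
        return False

    def hit_ratio(self):
        return self.hits / self.requests if self.requests else 0.0


cache = DirectMappedCache()
results = [cache.access(a) for a in range(16)]   # sequential walk over two blocks
ratio = cache.hit_ratio()
```

Walking through 16 consecutive bytes misses only on the first byte of each 8-byte block, so 14 of the 16 requests hit.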

8-23 Performance Advantages
- Hit ratios of 90% are common
- 50%+ improved execution speed
- Locality of reference is why caching works
  - Most memory references are confined to a small region of memory at any given time
  - Well-written program in small loop, procedure or function
  - Data likely in array
  - Variables stored together
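
The effect of a 90% hit ratio on average memory access time can be estimated with a simple weighted average. The 10 ns cache access time is a hypothetical value chosen for the example, and the model (a miss costs one main-memory access) is a simplification; the 70 ns figure is the DRAM time quoted earlier in the deck.

```python
def effective_access_ns(hit_ratio, cache_ns, memory_ns):
    """Average access time when hits cost cache_ns and misses cost memory_ns."""
    return hit_ratio * cache_ns + (1.0 - hit_ratio) * memory_ns


average_ns = effective_access_ns(hit_ratio=0.90, cache_ns=10, memory_ns=70)
memory_speedup = 70 / average_ns   # compared with going to DRAM every time
```

Under these assumptions the average access takes about 16 ns instead of 70 ns. This is a speedup of memory access only; the gain in overall execution speed is smaller because not every instruction step waits on memory.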

8-24 Two-level Caches
- Why do the sizes of the caches have to be different?

8-25 Cache vs. Virtual Memory
- Cache speeds up memory access
- Virtual memory increases amount of perceived storage
  - Independence from the configuration and capacity of the memory system
  - Low cost per bit

8-26 Modern CPU Processing Methods
- Timing Issues
- Separate Fetch/Execute Units
- Pipelining
- Scalar Processing
- Superscalar Processing

8-27 Timing Issues
- Computer clock used for timing purposes
  - MHz – million steps per second
  - GHz – billion steps per second
- Instructions can (and often do) take more than one step
- Data word width can require multiple steps

8-28 Separate Fetch-Execute Units
- Fetch unit
  - Instruction fetch unit
  - Instruction decode unit
    - Determine opcode
    - Identify type of instruction and operands
  - Several instructions are fetched in parallel and held in a buffer until decoded and executed
  - IP – Instruction Pointer register
- Execute unit
  - Receives instructions from the decode unit
  - Appropriate execution unit services the instruction

8-29 Alternative CPU Organization (figure)

8-30 Instruction Pipelining
- Assembly-line technique that overlaps the fetch-execute cycles of successive instructions
- Only one instruction is executed to completion at a time
  - Scalar processing
  - Average instruction completion rate approaches one instruction per clock cycle
- Problems from stalling
  - Instructions have different numbers of steps
- Problems from branching
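
The benefit of overlapping can be estimated with the standard cycle-count formula for an ideal pipeline. The sketch assumes every stage takes one clock cycle and that there are no stalls or branches, so it shows the best case only; the four-stage depth is an arbitrary choice for the example.

```python
def unpipelined_cycles(num_instructions, num_stages):
    """Each instruction runs all of its stages before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages):
    """The first instruction fills the pipeline; each later one finishes a cycle after it."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


without_pipeline = unpipelined_cycles(100, 4)
with_pipeline = pipelined_cycles(100, 4)
cycles_per_instruction = with_pipeline / 100
```

For 100 instructions on a four-stage pipeline the count drops from 400 cycles to 103, or 1.03 cycles per instruction. Stalls and mispredicted branches add cycles on top of this ideal figure.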

8-31 Branch Problem Solutions
- Separate pipelines for both possibilities
- Probabilistic approach
- Requiring the following instruction to not be dependent on the branch
- Instruction reordering (superscalar processing)

8-32 Pipelining Example (figure)

8-33 Superscalar Processing
- Process more than one instruction per clock cycle
- Separate fetch and execute cycles as much as possible
- Buffers for fetch and decode phases
- Parallel execution units

8-34 Superscalar CPU Block Diagram (figure)

8-35 Scalar vs. Superscalar Processing (figure)

8-36 Superscalar Issues
- Out-of-order processing – dependencies (hazards)
- Data dependencies
- Branch (flow) dependencies and speculative execution
  - Parallel speculative execution or branch prediction
  - Branch History Table
- Register access conflicts
  - Logical registers
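
One common way to build a branch history table is with 2-bit saturating counters, sketched below. The slides name the table but not its design, so the counter scheme, the 16-entry size and the indexing by branch address are all assumptions for illustration.

```python
class BranchHistoryTable:
    """Branch predictor using one 2-bit saturating counter per table entry."""

    def __init__(self, entries=16):
        self.entries = entries
        # Counter values 0-1 predict "not taken", 2-3 predict "taken".
        self.counters = [1] * entries   # start at "weakly not taken"

    def predict(self, branch_address):
        return self.counters[branch_address % self.entries] >= 2

    def update(self, branch_address, taken):
        i = branch_address % self.entries
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


# A loop branch at a hypothetical address: taken nine times, then falls through.
bht = BranchHistoryTable()
outcomes = [True] * 9 + [False]
correct = 0
for taken in outcomes:
    if bht.predict(0x40) == taken:
        correct += 1
    bht.update(0x40, taken)
```

The predictor is wrong on the first iteration and on the loop exit, and right on the eight iterations in between. Because the counter saturates, the single exit does not flip the prediction for the next time the loop runs.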

8-37 Hardware Implementation
- Hardware – operations are implemented by logic gates
- Advantages
  - Speed
- RISC designs are simple and typically implemented in hardware

8-38 Microprogrammed Implementation
- Microcode consists of tiny programs stored in ROM that implement CPU instructions in place of hardwired logic
- Advantages
  - More flexible
  - Easier to implement complex instructions
  - Can emulate other CPUs
- Disadvantage
  - Requires more clock cycles