Computer Architecture

Chang-Bum Lee, Dept. of Computer Engineering, Youngsan University

Course Content
Lecture #1: Course Overview
  - Course Contents
  - Course Schedule
  - Grading Guidelines
  - Tests and Assignments
Lecture #2: Basic Architecture of Computer
  - Basic Architecture
  - System Configuration
Lecture #3: Instruction Execution
  - Fetch Cycle
  - Execution Cycle
  - Interrupt Cycle
Lecture #4, 5: Instruction Set
  - Program Control
  - Instruction Formats
  - Addressing Modes
  - Pentium Processors
Lecture #6, 7: Arithmetic and Logical Operations
  - Arithmetic and Logical Unit
  - Integer Representation
  - Logic Operations
  - Shift Operations
  - Arithmetic Operations of Integers (Addition, Subtraction, Multiplication, and Division)
Lecture #8, 9: Real Numbers
  - Representation of Floating Point Numbers
  - Arithmetic Operations of Floating Point Numbers (Addition, Subtraction, Multiplication, and Division)
Lecture #10: Control Unit
  - Structure of Control Unit
  - Microinstruction
  - Microprogram
Lecture #11: Memory Devices
  - Memory Hierarchy
  - RAM
  - ROM
  - Design of Memory Device Modules
Lecture #12: Cache Memory
  - Cache Size
  - Fetch Method
  - Mapping

Computer Architecture: Course Overview
Lecture #1

Course Objectives
- Understand the role and relationship of hardware and software
- Exposure to:
  - Machine organization
  - Assembly language programming
  - C programming
- Be able to actually build an entire (slow) computing system, both hardware and software
- Be distinguished from mere programmers

Course Schedule
- The complete course, including lectures and seminars, will be covered in 90 hours (15 weeks).
- The total duration of the course will be 4 months.
- Lectures: 3 hours weekly (2 hours + 1 hour)

Grading Guidelines
- Attendance: 20% (depends on class participation)
- Final Exam: 40% (textbook-based, in-class final exam)
- Midterm Exam: 30% (textbook-based, in-class midterm exam)
- Assignments: 10% (based on submitted assignments)

Course References
- Textbook: Computer Architecture, Jong-Hyun Kim, Sang Lung Publishing Corp.
- The course slides will be available at

Course Summary
- Introduction to computer architecture
  - How is data represented?
  - What are the pieces of a computer?
  - How do computers work?
- Programming
  - How do I "talk" directly to the machine?
  - How do I program in C?
- Computer Systems and Computation
  - How do simple HW/SW elements come together to realize complex computations?

Computer Architecture: Basic Architecture

Introduction - Architecture (1)
- Architecture is those attributes visible to the programmer.
  - Instruction set, number of bits used for data representation, I/O mechanisms, addressing techniques
  - e.g. Is there a multiply instruction?
- Organization is how features are implemented.
  - Control signals, interfaces, memory technology
  - e.g. Is there a hardware multiply unit or is it done by repeated addition?

Introduction - Architecture (2)
- All members of the Intel x86 family share the same basic architecture.
- All members of the IBM System/370 family share the same basic architecture.
- This gives code compatibility, at least backwards.
- Organization differs between different versions.

Structure & Function
- Structure is the way in which components relate to each other.
- Function is the operation of individual components as part of the structure.
- All computer functions are:
  - Data processing
  - Data storage
  - Data movement
  - Control

ENIAC
- Electronic Numerical Integrator And Computer
- Eckert and Mauchly at the University of Pennsylvania
- Trajectory tables for weapons
- Started 1943, finished 1946 (too late for the war effort)
- Used until 1955
- Decimal (not binary)
- 20 accumulators of 10 digits
- Programmed manually by switches
- 18,000 vacuum tubes, 30 tons
- 15,000 square feet
- 140 kW power consumption
- 5,000 additions per second

Structure of von Neumann Machine
- Stored program concept
- Main memory storing programs and data
- ALU operating on binary data
- Control unit interpreting instructions from memory and executing them
- Input and output equipment operated by the control unit
- Princeton Institute for Advanced Studies (IAS); the IAS computer was completed in 1952

Transistor Based Computers
- Transistors
  - Replaced vacuum tubes
  - Smaller
  - Cheaper
  - Less heat dissipation
  - Solid state device
  - Made from silicon (sand)
  - Invented 1947 at Bell Labs by William Shockley et al.
- Transistor based computers
  - Second generation machines
  - NCR & RCA produced small transistor machines
  - IBM 7000
  - DEC produced the PDP-1

Speeding It Up & Performance Mismatch
- Speeding it up
  - Pipelining
  - On-board cache (L1 & L2 cache)
  - Branch prediction
  - Data flow analysis
  - Speculative execution
- Performance mismatch
  - Processor speed increased
  - Memory capacity increased
  - Memory speed lags behind processor speed

Solutions
- Increase the number of bits retrieved at one time.
  - Make DRAM “wider” rather than “deeper”
- Change the DRAM interface.
  - Cache
- Reduce the frequency of memory access.
  - More complex cache and cache on chip
- Increase interconnection bandwidth.
  - High speed buses
  - Hierarchy of buses

Program Concept
- Hardwired systems are inflexible.
- General purpose hardware can do different tasks, given correct control signals.
- Instead of re-wiring, supply a new set of control signals.
- A program is a sequence of steps.
  - For each step, an arithmetic or logical operation is done.
  - For each operation, a different set of control signals is needed.

Computer Components
- The Control Unit and the Arithmetic and Logic Unit constitute the Central Processing Unit.
- Data and instructions need to get into the system and results out.
  - Input/output
- Temporary storage of code and results is needed.
  - Main memory

Computer Architecture: CPU Structures and Functions

CPU Structure
[Figure: CPU structure. Labels: Registers, ALU, Control Unit, CPU Internal Bus, Address Bus, Data, System Bus]
- CPU must:
  - Fetch instructions
  - Interpret instructions
  - Fetch data, process data, and write data
- Registers
  - CPU must have some working space (temporary storage)
  - Number and function vary between processor designs
  - One of the major design decisions
  - Top level of memory hierarchy
- Control Unit
  - Coordinates the sequence of execution steps
- ALU
  - Performs arithmetic and logical processing

CPU Structure
[Figure labels: Software, Instruction Set, Hardware]

Fetch Cycle(1)
- Program Counter (PC) holds the address of the next instruction to fetch.
- Processor fetches the instruction from the memory location pointed to by PC.
- Increment PC (unless told otherwise).
- Instruction is loaded into the Instruction Register (IR):
    t0: MAR <- PC
    t1: MBR <- M[MAR], PC <- PC+1
    t2: IR <- MBR
- Processor interprets the instruction and performs the required actions.

Fetch Cycle(2)
- Micro-operations:
    t0: MAR <- PC
    t1: MBR <- M[MAR], PC <- PC+1
    t2: IR <- MBR
[Figure: address and instruction flow in the fetch cycle. Labels: Address Bus, Data Bus, Control Bus, Memory Devices, Control Unit]
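The three fetch micro-operations can be traced in a few lines of Python. This is a minimal sketch and not part of the original slides: the function name `fetch`, the dictionary standing in for main memory, and the sample instruction strings are assumptions made for illustration.

```python
def fetch(memory, pc):
    """Run the fetch micro-operations once and return (ir, new_pc)."""
    mar = pc            # t0: MAR <- PC
    mbr = memory[mar]   # t1: MBR <- M[MAR]
    pc = pc + 1         #     PC  <- PC + 1 (same time step as the read)
    ir = mbr            # t2: IR  <- MBR
    return ir, pc


# Hypothetical memory contents: two instructions stored at addresses 100 and 101.
memory = {100: "LOAD 250", 101: "ADD 251"}
ir, pc = fetch(memory, 100)
print(ir, pc)  # LOAD 250 101
```

Calling `fetch` again with the returned `pc` brings in the instruction at address 101, which is how sequential execution proceeds when no branch changes PC.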

Execute Cycle(1)
- Processor-memory: data transfer between CPU and main memory
- Processor-I/O: data transfer between CPU and I/O module
- Data processing: some arithmetic or logical operation on data
- Control: alteration of the sequence of operations, e.g. jump
- Combination of the above

Execute Cycle(2)
- Example: LOAD addr
    t0: MAR <- IR(addr)
    t1: MBR <- M[MAR]
    t2: AC <- MBR
- Other examples: STA addr, ADD addr
[Figure: data flow in the execute cycle. Labels: Address Bus, Data Bus, Control Bus, Memory Devices, Control Unit]
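The execute cycle of a simple accumulator machine can be sketched the same way. Only the LOAD sequence is given on the slide; the ADD and STA sequences below are assumptions assembled from the micro-operations AC <- AC+MBR, MBR <- AC, and M[MAR] <- MBR that appear in the control unit lecture. The function name `execute` and the sample memory are hypothetical.

```python
def execute(opcode, addr, memory, ac):
    """Execute one instruction of a toy accumulator machine; return the new AC."""
    mar = addr                 # t0: MAR <- IR(addr)
    if opcode == "LOAD":
        mbr = memory[mar]      # t1: MBR <- M[MAR]
        ac = mbr               # t2: AC  <- MBR
    elif opcode == "ADD":      # assumed sequence, not on the slide
        mbr = memory[mar]      # t1: MBR <- M[MAR]
        ac = ac + mbr          # t2: AC  <- AC + MBR
    elif opcode == "STA":      # assumed sequence, not on the slide
        mbr = ac               # t1: MBR <- AC
        memory[mar] = mbr      # t2: M[MAR] <- MBR
    else:
        raise ValueError(f"unknown opcode: {opcode}")
    return ac


memory = {250: 7, 251: 5, 252: 0}   # hypothetical data
ac = 0
for opcode, addr in [("LOAD", 250), ("ADD", 251), ("STA", 252)]:
    ac = execute(opcode, addr, memory, ac)
print(memory[252])  # 12
```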

Interrupt Cycle
- Added to the instruction cycle
- Processor checks for an interrupt, indicated by an interrupt signal.
- If no interrupt, fetch the next instruction.
- If an interrupt is pending:
  - Suspend execution of the current program
  - Save context
  - Set PC to the start address of the interrupt handler routine
  - Process the interrupt
  - Restore context and continue the interrupted program

Multiple Interrupts(1)
Disable interrupts
- Processor will ignore further interrupts while processing one interrupt.
- Interrupts remain pending and are checked after the first interrupt has been processed.
- Interrupts are handled in sequence as they occur.

Multiple Interrupts(2)
Define priorities
- Low priority interrupts can be interrupted by higher priority interrupts.
- When the higher priority interrupt has been processed, the processor returns to the previous interrupt.
[Figure: nested interrupt handling starting from the main program]

Indirect Cycle
- May require memory access to fetch operands
- Indirect addressing requires more memory accesses
- Can be thought of as an additional instruction subcycle

Prefetch
- Fetch accesses main memory.
- Execution usually does not access main memory.
- The next instruction can be fetched during execution of the current instruction.
- This is called instruction prefetch.

Improved Performance
- But not doubled:
  - Fetch is usually shorter than execution (prefetch more than one instruction?)
  - Any jump or branch means that prefetched instructions are not the required instructions
- Add more stages to improve performance

Pipelining
- Fetch instruction
- Decode instruction
- Calculate operands (i.e. EAs)
- Fetch operands
- Execute instructions
- Write result
- Overlap these operations

Two Stage Instruction Pipeline
[Figure: two-stage instruction pipeline with Fetch and Execute stages, (a) simplified view, (b) expanded view. Labels: Instruction, Result, Wait, New Address, Discard]

Memory Connection
- Receives and sends data
- Receives addresses (of locations)
- Receives control signals
  - Read
  - Write
  - Timing

Input/Output Connection
- Similar to memory from the computer’s viewpoint
- Output
  - Receive data from computer
  - Send data to peripheral
- Input
  - Receive data from peripheral
  - Send data to computer
- Receive control signals from computer
- Send control signals to peripherals, e.g. spin disk
- Receive addresses from computer, e.g. port number to identify peripheral
- Send interrupt signals (control)

CPU Connection
- Reads instructions and data
- Writes out data (after processing)
- Sends control signals to other units
- Receives (& acts on) interrupts
Buses
- There are a number of possible interconnection systems.
- Single and multiple bus structures are most common.
  - e.g. Control/Address/Data bus (PC)
  - e.g. Unibus (DEC-PDP)

What is a Bus?
- A communication pathway connecting two or more devices
- Usually broadcast
- Often grouped
  - A number of channels in one bus
  - e.g. a 32-bit data bus is 32 separate single-bit channels
- Power lines may not be shown

Data Bus and Address Bus
- Data bus
  - Carries data
  - Remember that there is no difference between “data” and “instruction” at this level
  - Width is a key determinant of performance: 8, 16, 32, 64 bits
- Address bus
  - Identifies the source or destination of data
  - e.g. CPU needs to read an instruction (data) from a given location in memory
  - Bus width determines the maximum memory capacity of the system
  - e.g. a 16-bit address bus gives a 64K address space

Control Bus
- Control and timing information
  - Memory read/write signal
  - Interrupt request
  - Clock signals

Single Bus Problems
- Lots of devices on one bus leads to:
  - Propagation delays: long data paths mean that co-ordination of bus use can adversely affect performance.
  - A bottleneck if aggregate data transfer approaches bus capacity.
- Most systems use multiple buses to overcome these problems.

Bus Types and Arbitration
- Dedicated
  - Separate data & address lines
- Multiplexed
  - Shared lines
  - Address valid or data valid control line
  - Advantage: fewer lines
  - Disadvantages: more complex control, reduced ultimate performance
- Bus arbitration
  - More than one module may control the bus, e.g. CPU and DMA controller
  - Only one module may control the bus at one time
  - Arbitration may be centralised or distributed

Timing
- Co-ordination of events on the bus
- Synchronous
  - Events determined by clock signals
  - Control bus includes clock line
  - A single 1-0 is a bus cycle
  - All devices can read the clock line
  - Usually sync on leading edge
  - Usually a single cycle for an event
- Asynchronous
  - Read, Write

Memory Hierarchy & Physical Types
- Memory hierarchy
  - Registers: exist in the CPU
  - Internal or main memory: may include one or more levels of cache; mainly “RAM”
  - External memory: backing store
- Physical types
  - Semiconductor types are mainly RAM
  - Magnetic types are disk & tape
  - Optical types are CD & DVD
  - Others are bubble, hologram, etc.

Performance
- Access time
  - Time between presenting the address and getting the valid data
- Memory cycle time
  - Time may be required for the memory to “recover” before the next access.
  - Cycle time is access + recovery.
- Transfer rate
  - Rate at which data can be moved

Instruction Representation
- In machine code each instruction has a unique bit pattern.
- For human consumption (well, programmers anyway) a symbolic representation is used.
  - e.g. ADD, SUB, LOAD
- Operands can also be represented in this way.
  - ADD A,B

Computer Architecture: Instruction Types and Addressing Modes
Lecture #4, #5

Instruction Format and Types
- Simple instruction format (16 bits): Opcode (4 bits) followed by Operand Reference fields (6 bits each)
- Instruction types
  - Data processing
  - Data storage (main memory)
  - Data movement (I/O)
  - Program flow control

Number of Addresses (1)
- 3 addresses
  - Operand 1, Operand 2, Result
  - a = b + c;
  - There may be a fourth address for the next instruction (usually implicit)
  - Not common
  - Needs very long words to hold everything

Number of Addresses (2)
- 2 addresses
  - One address doubles as operand and result
  - a = a + b
  - Reduces length of instruction
  - Requires some extra work: temporary storage to hold some results
- 1 address
  - Implicit second address
  - Usually a register (accumulator)
  - Common on early machines

Number of Addresses (3)
- 0 (zero) addresses
  - All addresses implicit
  - Uses a stack
  - e.g. c = a + b becomes:
      push a
      push b
      add
      pop c
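The zero-address sequence above can be run on a toy stack machine. The sketch below is illustrative only: the interpreter `run_stack_program`, its string-based instruction format, and the sample values of a and b are assumptions, not something defined in the slides.

```python
def run_stack_program(program, variables):
    """Interpret push/add/pop instructions against a dictionary of named variables."""
    stack = []
    for instruction in program:
        parts = instruction.split()
        op = parts[0]
        if op == "push":
            stack.append(variables[parts[1]])   # operand address is explicit only here
        elif op == "add":
            right = stack.pop()                 # both operands are implicit:
            left = stack.pop()                  # they are the top two stack items
            stack.append(left + right)
        elif op == "pop":
            variables[parts[1]] = stack.pop()
        else:
            raise ValueError(f"unknown operation: {op}")
    return variables


result = run_stack_program(["push a", "push b", "add", "pop c"], {"a": 2, "b": 3})
print(result["c"])  # 5
```

Note that `add` names no operands at all, which is what makes the arithmetic instruction itself "zero address".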

Design Decisions (1)
- Operation repertoire
  - How many ops?
  - What can they do?
  - How complex are they?
- Data types
- Instruction formats
  - Length of op code field
  - Number of addresses

Addressing Modes
- Immediate
- Direct
- Indirect
- Register
- Register Indirect
- Displacement (Indexed)
- Stack

Immediate Addressing
- Operand is part of the instruction
- Operand = address field
- e.g. ADD 5
  - Add 5 to contents of accumulator
  - 5 is the operand
- No memory reference to fetch data
- Fast
- Limited range

Immediate Addressing Diagram
[Figure: instruction made up of Opcode and Operand fields]

Direct Addressing
- Address field contains the address of the operand
- Effective address (EA) = address field (A)
- e.g. ADD A
  - Add contents of address A to accumulator
- Single memory reference to access data
- No additional calculations to work out the effective address
- Limited address space

Direct Addressing Diagram
[Figure: instruction with Opcode and Address A fields; A selects the operand in memory]

Indirect Addressing
- Memory cell pointed to by the address field contains the address of (pointer to) the operand
- EA = (A)
  - Look in A, find address (A) and look there for the operand
- e.g. ADD (A)
  - Add contents of the cell pointed to by the contents of A to the accumulator
- Large address space: 2^n where n = word length
- May be nested, multilevel, cascaded
  - e.g. EA = (((A)))
  - Draw the diagram yourself
- Multiple memory accesses to find the operand, hence slower

Indirect Addressing Diagram
[Figure: instruction with Opcode and Address A fields; A selects a pointer in memory, and the pointer selects the operand in memory]

Register Addressing (1)
- Operand is held in the register named in the address field
- EA = R
- Limited number of registers
- Very small address field needed
  - Shorter instructions
  - Faster instruction fetch

Register Addressing (2)
- No memory access
- Very fast execution
- Very limited address space
- Multiple registers help performance
  - Requires good assembly programming or compiler writing
  - N.B. in C programming: register int a;
- c.f. direct addressing

Register Addressing Diagram
[Figure: instruction with Opcode and Register Address R fields; R selects the operand in the registers]

Register Indirect Addressing
- c.f. indirect addressing
- EA = (R)
- Operand is in the memory cell pointed to by the contents of register R
- Large address space (2^n)
- One fewer memory access than indirect addressing

Register Indirect Addressing Diagram
[Figure: instruction with Opcode and Register Address R fields; R selects a pointer in the registers, and the pointer selects the operand in memory]

Displacement Addressing
- EA = A + (R)
- Address field holds two values
  - A = base value
  - R = register that holds displacement
  - or vice versa

Displacement Addressing Diagram
[Figure: instruction with Opcode, Register R, and Address A fields; the contents of R are added (+) to A to form the pointer to the operand in memory]

Relative Addressing
- A version of displacement addressing
- R = Program counter, PC
- EA = A + (PC)
- i.e. get operand from A cells from the current location pointed to by PC
- c.f. locality of reference & cache usage

Base-Register Addressing
- A holds displacement
- R holds pointer to base address
- R may be explicit or implicit
- e.g. segment registers in 80x86

Indexed Addressing
- A = base
- R = displacement
- EA = A + (R)
- Good for accessing arrays: step through the elements with R++

Combinations
- Postindex: EA = (A) + (R)
- Preindex: EA = (A + (R))
- (Draw the diagrams)
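The effective-address rules for the memory-referencing modes can be collected into one small function. This is a sketch under stated assumptions: `effective_address`, the mode names used as strings, and the sample memory and register contents are all hypothetical, and `(X)` from the slides is modelled as a dictionary lookup.

```python
def effective_address(mode, A, memory, registers, R=None):
    """Return the effective address (EA) for one addressing mode.

    A is the address field, R names a register; (X) means "contents of X".
    """
    if mode == "direct":
        return A                            # EA = A
    if mode == "indirect":
        return memory[A]                    # EA = (A)
    if mode == "register_indirect":
        return registers[R]                 # EA = (R)
    if mode == "displacement":
        return A + registers[R]             # EA = A + (R)
    if mode == "postindex":
        return memory[A] + registers[R]     # EA = (A) + (R)
    if mode == "preindex":
        return memory[A + registers[R]]     # EA = (A + (R))
    raise ValueError(f"unknown addressing mode: {mode}")


memory = {10: 40, 13: 70}        # hypothetical memory cells holding pointers
registers = {"R1": 3}            # hypothetical register contents

for mode in ["direct", "indirect", "register_indirect",
             "displacement", "postindex", "preindex"]:
    print(mode, effective_address(mode, 10, memory, registers, "R1"))
```

Immediate and register addressing are left out because they never produce a memory address: the operand is in the instruction or in the register itself.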

Stack Addressing
- Operand is (implicitly) on top of the stack
- e.g. ADD
  - Pop the top two items from the stack and add

Pentium Addressing Modes
- Virtual or effective address is an offset into a segment.
  - Starting address plus offset gives the linear address.
  - This goes through page translation if paging is enabled.
- 12 addressing modes available, including:
  - Immediate
  - Register operand
  - Displacement
  - Base
  - Base with displacement
  - Scaled index with displacement
  - Base with index and displacement
  - Base scaled index with displacement
  - Relative

Instruction Types
- Instructions are generally of four types:
  - Data processing
  - Data storage (main memory)
  - Data movement (I/O)
  - Program flow control


Design Decisions (2)
- Registers
  - Number of CPU registers available
  - Which operations can be performed on which registers?
- Addressing modes (later…)
- RISC v CISC

Types of Operation
- There are several types of operations:
  - Data Transfer
  - Arithmetic
  - Logical
  - Conversion
  - I/O
  - System Control
  - Transfer of Control

Arithmetic
- Arithmetic operations include Add, Subtract, Multiply, Divide.
- They operate on signed integers; floating point may also be supported.
- May include:
  - Increment (a++)
  - Decrement (a--)
  - Negate (-a)

Shift and Rotate Operations
- Logical right shift
- Logical left shift
- Arithmetic right shift
- Arithmetic left shift
- Right rotate
- Left rotate

Logical and Conversion
- Logical
  - Bitwise operations: AND, OR, NOT, etc.
- Conversion
  - e.g. binary to decimal

Input/Output
- May be specific instructions
- May be done using data movement instructions (memory mapped)
- May be done by a separate controller (DMA)

Transfer of Control
- Branch
  - e.g. branch to x if result is zero
- Skip
  - e.g. increment and skip if zero
      ISZ Register1   ; skip if zero
      Branch xxxx
- Subroutine call
  - c.f. interrupt call: jump to interrupt service routine

Branch Instruction
- Unconditional branch: jump to 211 unconditionally.
- Conditional branch 1: jump to 211 if the accumulator is zero.
- Conditional branch 2: jump to 235 if R1 equals R2.

Nested Procedure Calls
- If the main program calls procedure 1 (Proc.1), control goes to Proc.1 and its body is executed.
- If Proc.1 calls another procedure (Proc.2), control goes to Proc.2 and its body is executed.
- When Proc.2 reaches a RETURN instruction, control returns to Proc.1.

Computer Architecture: Arithmetic and Logical Operations of Computer

Arithmetic & Logic Unit
- Does the calculations.
- Everything else in the computer is there to service this unit.
- Handles integers.
- May handle floating point (real) numbers.
- There may be a separate FPU (maths co-processor).

Integer Representation
- Only have 0 & 1 to represent everything
- Positive numbers are stored in binary, e.g. 41 = 00101001
- There is no minus sign
- There is no period (radix point)
- Negative numbers use sign-magnitude or one’s or two’s complement

Sign-Magnitude
- Leftmost bit is the sign bit: 0 means positive, 1 means negative.
- In 8 bits: +18 = 00010010, -18 = 10010010
- Problems
  - Need to consider both sign and magnitude in arithmetic
  - Two representations of zero (+0 and -0)

Two’s Complement
- Examples, shown here as 8-bit words:
    +3 = 00000011
    +2 = 00000010
    +1 = 00000001
    +0 = 00000000
    -1 = 11111111
    -2 = 11111110
    -3 = 11111101
- Benefits
  - Two’s complement has one representation of zero.
  - Arithmetic works easily (see later).
  - Negating is fairly easy:
      3 = 00000011
      Boolean complement gives 11111100
      Add 1 to LSB gives 11111101 = -3
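The negation rule (complement every bit, then add 1) is easy to check in code. The sketch below is illustrative: the helper names `to_twos_complement` and `negate` and the default 8-bit width are assumptions, since the slide does not fix a word length.

```python
def to_twos_complement(value, bits=8):
    """Return the two's complement bit pattern of a signed integer as a string."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError(f"{value} does not fit in {bits} bits")
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def negate(pattern):
    """Negate a bit-pattern string: Boolean complement, then add 1 to the LSB."""
    bits = len(pattern)
    mask = (1 << bits) - 1
    complemented = int(pattern, 2) ^ mask        # flip every bit
    return format((complemented + 1) & mask, f"0{bits}b")


print(to_twos_complement(3))    # 00000011
print(negate("00000011"))       # 11111101
print(to_twos_complement(-3))   # 11111101
```

Negating `00000000` gives `00000000` again, which is the single representation of zero mentioned in the benefits list.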

Logical Operations
- AND, OR, XOR, NOT
- Selective-set, Selective-complement
- Masking, Insert, Compare
- Bitwise operations
  - Logical Shift
  - Circular Shift
  - Arithmetic Shift
  - Shift with Carry


Addition and Subtraction
- Normal binary addition
- Monitor sign bit for overflow
- Take the two’s complement of the subtrahend and add it to the minuend.
  - i.e. a - b = a + (-b)
- So we only need addition and complement circuits.
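Subtraction by adding the two's complement, together with sign-bit overflow checks, can be sketched as follows. The function names `add` and `subtract`, the 8-bit default, and the choice to pass operands as unsigned bit patterns are assumptions made for this illustration.

```python
def add(a, b, bits=8):
    """Add two bit patterns; return (result pattern, overflow flag)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    result = (a + b) & mask
    # Overflow: both operands have the same sign but the result's sign differs.
    overflow = ((a ^ result) & (b ^ result) & sign) != 0
    return result, overflow


def subtract(a, b, bits=8):
    """Compute a - b as a + (-b), where -b is the two's complement of b."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    negated_b = ((~b) + 1) & mask            # complement, then add 1
    result, _ = add(a, negated_b, bits)
    # Overflow: operands have different signs and the result's sign differs from a.
    overflow = ((a ^ b) & (a ^ result) & sign) != 0
    return result, overflow


print(subtract(0b00000101, 0b00000011))   # (2, False): 5 - 3 = 2
print(add(100, 100))                      # (200, True): 200 is out of range for 8 bits
```

The subtraction overflow test is written against the original operands rather than reusing the flag from `add`, so that subtracting the most negative value (whose negation does not fit) is still reported correctly.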

Hardware for Addition and Subtraction
[Figure: hardware for addition and subtraction. Labels: B Register, Complementer, SW, Adder, A Register, OF]
- OF: overflow bit
- SW: switch (select addition or subtraction)

Multiplication
- Is complex
- Work out a partial product for each digit
- Take care with place value (column)
- Add the partial products

Multiplication Example
        1011    Multiplicand (11 dec)
      x 1101    Multiplier (13 dec)
        ----
        1011    Partial products
       0000
      1011
     1011
    --------
    10001111    Product (143 dec)
- Note: if the multiplier bit is 1, copy the multiplicand (place value), otherwise zero
- Note: need a double length result
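The partial-product method for unsigned integers amounts to a shift-and-add loop. The following is a small sketch with an assumed function name `multiply_unsigned` and an assumed 4-bit operand width matching the 11 x 13 example.

```python
def multiply_unsigned(multiplicand, multiplier, bits=4):
    """Multiply two unsigned integers by summing shifted partial products."""
    product = 0
    for position in range(bits):
        if (multiplier >> position) & 1:
            # Multiplier bit is 1: copy the multiplicand at this place value.
            product += multiplicand << position
        # Multiplier bit is 0: the partial product is zero, nothing to add.
    return product


product = multiply_unsigned(0b1011, 0b1101)
print(product, format(product, "08b"))  # 143 10001111
```

The result needs up to twice as many bits as the operands, which is why the product is printed as an 8-bit value.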

Booth’s Algorithm
1. START: A ← 0, Q-1 ← 0, M ← Multiplicand, Q ← Multiplier, Counter ← n
2. Examine Q0, Q-1:
   - = 10: A ← A - M
   - = 01: A ← A + M
   - = 11 or = 00: no arithmetic operation
3. Arithmetic shift right of A, Q, Q-1; Counter ← Counter - 1
4. Counter = 0? If no, go back to step 2. If yes, END.
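Booth's algorithm can be followed step by step in code. This is a sketch under assumptions: the name `booth_multiply`, the 4-bit default width, and the packing of the result as the register pair A,Q are illustrative choices; operands are expected to fit in `bits` bits as two's complement values, and the most negative multiplicand is not handled specially.

```python
def booth_multiply(multiplicand, multiplier, bits=4):
    """Multiply two signed integers with Booth's algorithm; return a signed product."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    m = multiplicand & mask        # M <- multiplicand
    a = 0                          # A <- 0
    q = multiplier & mask          # Q <- multiplier
    q_minus1 = 0                   # Q-1 <- 0
    for _ in range(bits):          # Counter <- n, decremented each pass
        pair = (q & 1, q_minus1)   # examine Q0, Q-1
        if pair == (1, 0):
            a = (a - m) & mask     # A <- A - M
        elif pair == (0, 1):
            a = (a + m) & mask     # A <- A + M
        # Arithmetic shift right of A, Q, Q-1 (sign bit of A is preserved).
        q_minus1 = q & 1
        q = (q >> 1) | ((a & 1) << (bits - 1))
        a = (a >> 1) | (a & sign)
    product = (a << bits) | q      # double-length result held in A,Q
    if product & (1 << (2 * bits - 1)):
        product -= 1 << (2 * bits) # interpret A,Q as a signed value
    return product


print(booth_multiply(7, 3))     # 21
print(booth_multiply(-3, 2))    # -6
```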

Division
- More complex than multiplication
- Negative numbers are really bad!
- Based on long division
Division of Unsigned Binary Integers
- Example: dividend 10010011 (147 dec) divided by divisor 1011 (11 dec)
- Partial remainders: 001110, then 001111
- Quotient 00001101 (13 dec), remainder 100 (4 dec)
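Binary long division for unsigned integers can be sketched as a bit-by-bit loop. The function name `divide_unsigned` and the 8-bit default are assumptions; the operands in the usage line pair the divisor 1011 from the slide with an assumed dividend of 147 (10010011), which leaves the remainder 100 shown there.

```python
def divide_unsigned(dividend, divisor, bits=8):
    """Long division of unsigned integers; return (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    for position in range(bits - 1, -1, -1):
        # Bring down the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> position) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor   # divisor fits: subtract it
            quotient |= 1          # and record a 1 in the quotient
    return quotient, remainder


quotient, remainder = divide_unsigned(0b10010011, 0b1011)
print(format(quotient, "08b"), format(remainder, "b"))  # 00001101 100
```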

Computer Architecture: Real Numbers

Real Numbers
- Numbers with fractions
- Could be done in pure binary
  - 1001.101 = 2^3 + 2^0 + 2^-1 + 2^-3 = 9.625
- Where is the binary point?
- Fixed?
  - Very limited
- Moving?
  - How do you show where it is?

Floating Point
- Fields: Sign bit | Biased Exponent | Mantissa (significand)
- Value: +/- .significand x 2^exponent
- Point is actually fixed between the sign bit and the body of the mantissa.
- Exponent indicates place value (point position).

Floating Point Examples
(a) 32-bit floating point format: S (1 bit) | E field (8 bits) | Mantissa field (23 bits)
(b) Example of a data representation: Sign (S) bit = 0, Exponent (E) field = [value missing in source], Mantissa (M) field = [value missing in source]

Signs for Floating Point
- Mantissa is stored in 2s complement.
- Exponent is in excess or biased notation.
  - e.g. excess (bias) 128 means:
    - 8-bit exponent field
    - Pure value range 0-255
    - Subtract 128 to get the correct value
    - Range -128 to +127

Normalization
- FP numbers are usually normalized.
  - i.e. the exponent is adjusted so that the leading bit (MSB) of the mantissa is 1.
  - Since it is always 1 there is no need to store it.
- c.f. scientific notation, where numbers are normalized to give a single digit before the decimal point
  - e.g. a value of the form d.ddd x 10^3

FP Ranges
- For a 32-bit number
  - 8-bit exponent
  - Range: approximately +/- 1.5 x 10^77
- Accuracy
  - The effect of changing the LSB of the mantissa
  - 23-bit mantissa: 2^-23 ≈ 1.2 x 10^-7

Expressible Numbers

IEEE 754
- Standard for floating point storage
- 32 and 64 bit standards
- 8 and 11 bit exponent respectively
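The fields of a 32-bit IEEE 754 value can be inspected with Python's standard `struct` module. This is a small sketch: the helper name `decode_float32` is an assumption, and the example value 9.625 is chosen only for illustration. Note that IEEE 754 single precision uses a sign bit, an exponent bias of 127, and a stored 23-bit fraction with an implicit leading 1, which differs from the excess-128 example format described earlier.

```python
import struct


def decode_float32(value):
    """Split a number, encoded as IEEE 754 single precision, into its three fields."""
    (raw,) = struct.unpack(">I", struct.pack(">f", value))  # reinterpret the 4 bytes
    sign = raw >> 31                  # 1 bit
    exponent = (raw >> 23) & 0xFF     # 8 bits, biased by 127
    fraction = raw & 0x7FFFFF         # 23 bits, leading 1 not stored
    return sign, exponent, fraction


sign, exponent, fraction = decode_float32(9.625)
# 9.625 = 1001.101 (binary) = 1.001101 x 2^3
print(sign, exponent - 127, format(fraction, "023b"))
```

The printed fraction begins with `001101`, the bits after the implicit leading 1, and the unbiased exponent is 3.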

Floating Point Arithmetic
- FP arithmetic +/-
  - Check for zeros
  - Align significands (adjusting exponents)
  - Add or subtract significands
  - Normalize result
- FP arithmetic x/÷
  - Check for zero
  - Add/subtract exponents
  - Multiply/divide significands (watch sign)
  - Normalize
  - Round
  - All intermediate results should be in double length storage

Floating Point Multiplication

Computer Architecture: Control Unit

Control Unit
- Functions of the control unit
  - Decoding of an instruction code
  - Generation of control signals for instruction execution
- Micro-instruction: control word
- Micro-program: set of micro-instructions
- Routine: group of micro-instructions for a special function of the CPU
  - e.g. fetch cycle routine, execution cycle routine, interrupt cycle routine

Structure of Control Unit
Configuration elements
- Instruction decoder
- Control address register (CAR)
- Control memory: internal memory that stores the micro-programs
- Control buffer register (CBR)
- Subroutine register (SBR)
- Sequencing module

Internal Structure of Control Unit
[Figure: internal structure of the control unit. Labels: Instruction Register, Instruction Decoder, Sequencing Module, Condition Flags, SBR, CAR, Control Memory Device, CBR, Decoder, Internal Control Signals, External Control Signals]

Internal Structure of the Control Memory Device
Example
- Capacity of CMD = 128 words
- The first half (addresses 0 ~ 63): stores common routines
  - Fetch cycle routine
  - Indirect cycle routine
  - Interrupt cycle routine
- The second half (addresses 64 ~ 127): stores the execution routines of each instruction
  - Execution cycle routine 1
  - Execution cycle routine 2
  - ...

Mapping
[Figure labels: Instruction Code, Mapping Function]

Binary Codes and Symbols for Micro Operations(Examples)
Op field 1 (binary codes missing in source)
    Micro-operation        Symbol
    None                   NOP
    MAR <- PC              PCTAR
    MAR <- IR(addr)        IRTAR
    AC <- AC+MBR           ADD
    MBR <- M[MAR]          READ
    AC <- MBR              BRTAC
    IR <- MBR              BRTIR
    M[MAR] <- MBR          WRITE

Binary Codes and Symbols for Micro Operations(Examples)
Op field 2 (binary codes missing in source)
    Micro-operation        Symbol
    None                   NOP
    PC <- PC+1             INCPC
    MBR <- AC              ACTBR
    MBR <- PC              PCTBR
    PC <- MBR              BRTPC
    MAR <- SP              SPTAR
    AC <- AC-MBR           SUB
    PC <- IR(addr)         IRTPC

Micro-programming
Fetch Cycle Routine
            ORG 0
    FETCH:  PCTAR        U  JMP  NEXT   ; MAR <- PC, then execution of the next micro-instruction
            READ, INCPC  U  JMP  NEXT   ; MBR <- M[MAR], PC <- PC+1, then execution of the next micro-instruction
            BRTIR        U  MAP         ; IR <- MBR, then branch to the execution cycle
[Table: binary bit pattern of each micro-instruction]

Indirect Cycle Routine
Micro-instruction routine of the indirect cycle
[Listing with binary bit patterns: the first two micro-instructions continue with execution of the next micro-instruction; the last one returns to the execution cycle]

Execution Cycle Routine
[Table columns: Instruction, Op code, Starting address of the routine]

Execution Cycle Routines for each instruction
[Listing: execution cycle routines; two of the routines begin with the comment "; Call the indirect cycle routine if I=1"]

Computer Architecture: Memory Devices

Memory Classification
- Main memory: internal memory
- Auxiliary storage device: external memory

Memory Hierarchy
- Registers: in the CPU
- Internal or main memory
  - May include one or more levels of cache
  - “RAM”
- External memory
  - Backing store

Semiconductor Memory Types

Semiconductor Memory
- RAM
  - Misnamed, as all semiconductor memory is random access
  - Read/Write
  - Volatile
  - Temporary storage
  - Static or dynamic

Memory Cell Operation
[Figure: memory cell, (a) Write with Select, Control, and Data In lines, (b) Read with Select, Control, and Sense lines]

Dynamic RAM
- Bits stored as charge in capacitors
- Charges leak
- Need refreshing even when powered
- Simpler construction
- Smaller per bit
- Less expensive
- Need refresh circuits
- Slower
- Used for main memory
- Essentially analogue: level of charge determines value

Refreshing
- Refresh circuit included on chip
- Disable chip
- Count through rows
- Read & write back
- Takes time
- Slows down apparent performance

Dynamic RAM Structure
[Figure: DRAM cell. Labels: Address Line, Transistor, Storage Capacitor, Ground, Bit Line B]

DRAM Operation
- Address line is active when a bit is read or written
  - Transistor switch closed (current flows)
- Write
  - Apply voltage to the bit line: high for 1, low for 0
  - Then signal the address line, which transfers charge to the capacitor
- Read
  - Address line is selected and the transistor turns on
  - Charge from the capacitor is fed via the bit line to a sense amplifier, which compares it with a reference value to determine 0 or 1
  - Capacitor charge must be restored

Typical 16 Mb DRAM (4M x 4)

Static RAM
- Bits stored as on/off switches
- No charges to leak
- No refreshing needed when powered
- More complex construction
- Larger per bit
- More expensive
- Does not need refresh circuits
- Faster
- Used for cache
- Digital: uses flip-flops

Static RAM Structure
[Figure: SRAM cell. Labels: dc voltage, transistors T1 to T6, nodes C1 and C2, Ground, Address Line, and the two bit lines]

SRAM and DRAM
- Both volatile
  - Power needed to preserve data
- DRAM
  - Simpler to build, smaller
  - More dense
  - Less expensive
  - Needs refresh
  - Used for larger memory units
- SRAM
  - Faster
  - Used in cache

Read Only Memory (ROM)
- Permanent storage
  - Nonvolatile
- Uses
  - Microprogramming (see later)
  - Library subroutines
  - Systems programs (BIOS)
  - Function tables

Types of ROM
- Written during manufacture
  - Very expensive for small runs
- Programmable (once)
  - PROM
  - Needs special equipment to program
- Read “mostly”
  - Erasable Programmable (EPROM): erased by UV
  - Electrically Erasable (EEPROM): takes much longer to write than read
  - Flash memory: erase whole memory electrically

Packaging

Design of Memory Device Module
[Example] Design of a 1K×32-bit memory device module using 1K×8-bit RAM chips
- Method: parallel connection of 4 RAM chips
- Capacity of module: (1K×8) × 4 = 1K×32 bits = 1K words
- Address bits (10 bits: A9∼A0): common connection to all chips
- Address area: 000H ∼ 3FFH (H: hexadecimal)
- Data store: 8 bits/chip

Design of 1K×32 bits Memory Device Module
[Figure: four 1K×8 RAM chips sharing Address (A9-0), together driving the 32-bit Data Bus]

Design of Memory Device Module(con’t)
[Example] Design of a 4K×8-bit memory device module using 1K×8-bit RAM chips
- Method: serial connection of 4 RAM chips
- Capacity of module: (1K×8) × 4 = 4K×8 bits = 4K bytes
- Address bits (12 bits: A11∼A0):
  - Upper 2 bits: generation of 4 chip select signals using an address decoder
  - Lower 10 bits: common connection to all chips
- Address area: 000H ∼ FFFH (H: hexadecimal)
- Data store: 8 bits/address
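The address decoding in this 4K×8 module can be expressed in a few lines. The sketch below assumes the chips are numbered 0 to 3 in decoder order; the function name `decode_address` and that numbering are illustrative choices, not taken from the slides.

```python
def decode_address(address):
    """Split a 12-bit module address into (chip number, address within the chip)."""
    if not 0 <= address <= 0xFFF:
        raise ValueError("address must fit in 12 bits (000H to FFFH)")
    chip = address >> 10        # upper 2 bits (A11, A10) feed the 2x4 decoder
    offset = address & 0x3FF    # lower 10 bits (A9..A0) go to every chip
    return chip, offset


for address in (0x000, 0x3FF, 0x400, 0xBFF, 0xC00, 0xFFF):
    chip, offset = decode_address(address)
    print(f"{address:03X}H -> chip {chip}, offset {offset:03X}H")
```

Each chip therefore covers a contiguous 1K range: 000H to 3FFH, 400H to 7FFH, 800H to BFFH, and C00H to FFFH.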

Design of 4K×8 bits Memory Device Module
[Figure: four 1K×8 RAM chips selected by a 2×4 Decoder, sharing the 8-bit data lines Data (D7-0)]

Address Areas of each RAM
Address area of each RAM chip in the 4K×8 module, listed in decoder order:
    A11 A10    From    To
    0   0      000H    3FFH
    0   1      400H    7FFH
    1   0      800H    BFFH
    1   1      C00H    FFFH

Design Procedure of Memory Module
1. Decide the memory capacity for the computer system
2. Choose the chips and design the address map
3. Design the circuit in detail

Memory Design for 8-bit Micro Computer
- Capacity: 1K bytes RAM, 512 bytes ROM
- Address: RAM starts at 0, ROM starts at 800H
- Available chips: 256×8-bit RAM, 512×8-bit ROM
[Table: address table with columns Memory Chip, Address Area (Hexadecimal), Address bits]

Design Example of Memory Device for 8-bit Micro Computer
[Figure: memory design example. Labels: Address, Data, Decoder]

Cache Memory
- [Wikipedia definition] A cache is a component that improves performance by transparently storing data such that future requests for that data can be served faster.
- Purpose: a high-speed memory installed between the CPU and main memory to minimize the time the CPU waits because of the speed difference between the two.
- Characteristics
  - Uses memory chips with a higher access speed than those of main memory
  - Small capacity because of price and limited space
[Figure: Cache placed between CPU and Main Memory]

Cache Memory
- Cache hit: the data the CPU wants to access already exists in the cache
- Cache miss: the data the CPU wants to access does not exist in the cache
- Cache hit ratio (H): the ratio (or percentage) of accesses that result in cache hits
    H = (number of cache hits) / (total number of memory accesses)
- Cache miss ratio = (1 - H)
- Average access time of the memory device (Ta):
    Ta = H × Tc + (1 - H) × Tm
    Tc: cache access time, Tm: main memory access time
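The average access time formula is simple to evaluate numerically. In the sketch below the function name `average_access_time` and the sample figures (a 90% hit ratio, 10 ns cache, 100 ns main memory) are hypothetical values chosen only to show the effect of the hit ratio.

```python
def average_access_time(hit_ratio, cache_time, memory_time):
    """Ta = H * Tc + (1 - H) * Tm."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit ratio must be between 0 and 1")
    return hit_ratio * cache_time + (1 - hit_ratio) * memory_time


# Hypothetical timings in nanoseconds.
for hit_ratio in (0.0, 0.5, 0.9, 1.0):
    ta = average_access_time(hit_ratio, cache_time=10, memory_time=100)
    print(f"H = {hit_ratio:.1f}: Ta is about {ta:.1f} ns")
```

With these numbers a 90% hit ratio brings the average down from 100 ns to roughly 19 ns, which is why even a small cache pays off when references cluster.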

Computer Architecture: Cache Memory

So you want fast?
It is possible to build a computer which uses only static RAM (see later).
This would be very fast.
This would need no cache. (How can you cache cache?)
This would cost a very large amount.

Locality of Reference
During the execution of a program, memory references tend to cluster, e.g. in loops.

Cache
Small amount of fast memory.
Sits between normal main memory and the CPU.
May be located on the CPU chip or module.
[Figure: the CPU and the cache exchange data by word transfer; the cache and main memory exchange data by block transfer.]

Cache operation - overview
1. The CPU requests the contents of a memory location.
2. The cache is checked for this data.
3. If present, the data is fetched from the cache (fast).
4. If not present, the required block is read from main memory into the cache, then delivered from the cache to the CPU.
The cache includes tags to identify which block of main memory is in each cache slot.
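The read path described above can be sketched as follows. This is a deliberately simplified model under stated assumptions: 4-byte blocks, the block number used as the tag, and unlimited cache capacity (so no mapping or replacement policy is modelled). The class name `SimpleCache` is hypothetical.

```python
BLOCK_SIZE = 4  # bytes per block (assumed for this sketch)


class SimpleCache:
    """Toy cache: tag -> block copy, with no capacity limit or replacement."""

    def __init__(self, main_memory):
        self.memory = main_memory
        self.slots = {}     # tag (block number) -> copy of that block
        self.hits = 0
        self.misses = 0

    def read(self, address):
        block_no, word = divmod(address, BLOCK_SIZE)
        if block_no in self.slots:
            self.hits += 1                      # present: serve from cache
        else:
            self.misses += 1                    # absent: load the whole block
            start = block_no * BLOCK_SIZE
            self.slots[block_no] = self.memory[start:start + BLOCK_SIZE]
        return self.slots[block_no][word]       # always deliver from cache


memory = bytes(range(64))      # hypothetical 64-byte main memory
cache = SimpleCache(memory)
first = cache.read(5)          # miss: block 1 is loaded
second = cache.read(6)         # hit: same block
print(first, second, cache.hits, cache.misses)
```

The second read hits because the first one brought in the whole block, which is the block-transfer behaviour that makes locality of reference pay off.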

Size does matter
Cost: more cache is expensive.
Speed: more cache is faster (up to a point), but checking the cache for data takes time.

Typical Cache Organization

Mapping Function
Cache of 64 KBytes
Cache block of 4 bytes, i.e. the cache has 16K ($2^{14}$) lines of 4 bytes
16 MBytes of main memory
24-bit address ($2^{24}$ = 16M)

Direct Mapping
Each block of main memory maps to only one cache line, i.e. if a block is in the cache, it must be in one specific place.
The address is in two parts:
  The least significant w bits identify a unique word within a block.
  The remaining most significant bits specify one memory block; they are split into a slot (cache line) field of s bits and a tag of t bits (most significant).

Direct Mapping-Address Structure
Tag field (t): 8 bits | Slot field (s): 14 bits | Word field (w): 2 bits
24-bit address
2-bit word identifier (4-byte block)
22-bit block identifier
  8-bit tag (= 22 - 14)
  14-bit slot or line
No two blocks that map to the same line have the same tag field.
Check the contents of the cache by finding the line and checking the tag.
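The 8/14/2 address split can be illustrated with a short Python sketch. The function name `split_direct` and the sample address are assumptions chosen for illustration.

```python
TAG_BITS, SLOT_BITS, WORD_BITS = 8, 14, 2   # field widths from the address structure


def split_direct(address):
    """Split a 24-bit address into (tag, slot, word) for direct mapping."""
    if not 0 <= address < 1 << (TAG_BITS + SLOT_BITS + WORD_BITS):
        raise ValueError("address must fit in 24 bits")
    word = address & ((1 << WORD_BITS) - 1)                  # lowest 2 bits
    slot = (address >> WORD_BITS) & ((1 << SLOT_BITS) - 1)   # next 14 bits
    tag = address >> (WORD_BITS + SLOT_BITS)                 # top 8 bits
    return tag, slot, word


tag, slot, word = split_direct(0x16339C)    # hypothetical sample address
print(f"tag={tag:02X} slot={slot:04X} word={word}")
```

A lookup then reads the tag stored in the selected slot and compares it with the tag taken from the address; equality means a hit.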

Direct Mapping - Cache Slot Table
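Cache slot: main memory blocks held
0: 0, m, 2m, 3m, …, $2^{t+s} - m$
1: 1, m+1, 2m+1, …, $2^{t+s} - m + 1$
…
m-1: m-1, 2m-1, 3m-1, …, $2^{t+s} - 1$
Here m = $2^s$ is the number of cache slots and $2^{t+s}$ is the number of main memory blocks; block j maps to slot j mod m.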

Direct Mapping Cache Organization
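[Figure: direct-mapped cache organization. The memory address is split into Tag, Slot and Word fields; the slot field selects one of Slot(0) … Slot(i) … Slot(m-1), and a comparator checks the stored tag against the address tag. A match is a cache hit (data comes from the cache); a mismatch is a cache miss (the block is fetched from main memory).]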

Direct Mapping Summary
Address length = (t + s + w) bits
Number of addressable units = $2^{t+s+w}$ words or bytes
Block size = $2^w$ words or bytes
Number of blocks in main memory = $2^{t+s+w} / 2^w = 2^{t+s}$
Number of slots in cache = m = $2^s$
Size of tag = t bits

Direct Mapping Characteristics
Simple
Inexpensive
Fixed location for a given block: if a program repeatedly accesses 2 blocks that map to the same line, the cache miss rate is very high.
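The conflict problem can be demonstrated with a small simulation. This sketch assumes a 24-bit address with a 14-bit line field and a 2-bit word field; the function `count_misses` and the two access patterns are hypothetical.

```python
LINE_BITS, WORD_BITS = 14, 2    # assumed direct-mapping field widths


def count_misses(addresses):
    """Count misses of a direct-mapped cache for a sequence of addresses."""
    tags = {}       # line number -> tag currently stored in that line
    misses = 0
    for address in addresses:
        line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
        tag = address >> (WORD_BITS + LINE_BITS)
        if tags.get(line) != tag:   # empty line or a different block: miss
            misses += 1
            tags[line] = tag        # the new block replaces the old one
    return misses


conflicting = [0x000000, 0x010000] * 8   # both map to line 0, tags 00 and 01
friendly = [0x000000, 0x000004] * 8      # map to lines 0 and 1

print(count_misses(conflicting), count_misses(friendly))
```

In the conflicting pattern every access evicts the block the next access needs, so all accesses miss; in the friendly pattern only the first access to each block misses.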

Associative Mapping
A main memory block can load into any line of the cache.
The memory address is interpreted as a tag and a word.
The tag uniquely identifies a block of memory.
Every line's tag is examined for a match, so cache searching gets expensive.

Fully Associative Cache Organization
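[Figure: fully associative cache organization. The memory address is split into a Tag field and a Word field; the address tag is compared against the tags of all slots, Slot(0) … Slot(i) … Slot(m-1). A match is a cache hit; no match is a cache miss (the block is fetched from main memory).]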

Associative Mapping Example
[Figure: associative mapping example with a 32-byte cache and a 128-byte main memory. Each address is split into a 5-bit tag and a word field; each cache slot stores a tag together with its data.]

Associative Mapping-Address Structure
Tag: 5 bits | Word: 2 bits
A 5-bit tag is stored with each 32-bit block of data.
Compare the tag field with the tag entries in the cache to check for a hit.
The least significant 2 bits of the address identify which byte is required from the 32-bit data block.
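A fully associative lookup with this 5-bit tag and 2-bit word split might be sketched as below. Two points are assumptions rather than part of the lecture: the cache is taken to have 8 slots (32 bytes of 4-byte blocks), and blocks are replaced in first-in, first-out order, since no replacement policy is stated. The class name `AssociativeCache` is hypothetical.

```python
from collections import OrderedDict

WORD_BITS = 2
NUM_SLOTS = 8   # assumed: 32-byte cache / 4-byte blocks


class AssociativeCache:
    """Toy fully associative cache that tracks tags only."""

    def __init__(self):
        self.slots = OrderedDict()   # tag -> True, kept in load order

    def access(self, address):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        tag = address >> WORD_BITS           # everything above the word field
        if tag in self.slots:                # any slot may hold the block
            return True
        if len(self.slots) >= NUM_SLOTS:
            self.slots.popitem(last=False)   # evict the oldest block (FIFO, assumed)
        self.slots[tag] = True
        return False


cache = AssociativeCache()
results = [cache.access(a) for a in (0x00, 0x01, 0x7C, 0x00)]
print(results)
```

The dictionary lookup stands in for the hardware comparing every slot's tag in parallel, which is the part that makes large associative caches expensive.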

Associative Mapping Summary
Address length = (t + w) bits
Number of addressable units = $2^{t+w}$ words or bytes
Slot size = $2^w$ words or bytes
Number of blocks in main memory = $2^{t+w} / 2^w = 2^t$
Number of slots in cache = not determined by the address format
Size of tag = t bits

Set Associative Mapping
The cache is divided into a number of sets.
Each set contains a number of lines.
A given block maps to any line in a given set, e.g. block B can be in any line of set i.
With 2 lines per set (2-way set associative mapping), a given block can be in one of 2 lines in only one set.
Address fields: Tag | Set | Word

Set Associative Mapping Example
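[Figure: set associative mapping example. The memory address is split into Tag (3 bits), Set (2 bits) and Word (2 bits). The set field selects one of Set(0) … Set(i) … Set(m-1), each holding Slot(0) and Slot(1); a comparator checks the tags in the selected set. A match is a cache hit; no match is a cache miss (the block is fetched from main memory).]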

Set Associative Mapping -Address Structure
Tag: 9 bits | Set: 13 bits | Word: 2 bits
Use the set field to determine which cache set to look in.
Compare the tag field to see if we have a hit.
e.g. (each address is written as the 9-bit tag followed by the remaining 15 bits)
Address     Tag   Set number
1FF 7FFC    1FF   1FFF
001 7FFC    001   1FFF
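The 9/13/2 split can be checked with a short Python sketch. The function name `split_set_associative` is hypothetical, and the two sample addresses are the ones from the example written as full 24-bit values.

```python
TAG_BITS, SET_BITS, WORD_BITS = 9, 13, 2    # field widths from the address structure


def split_set_associative(address):
    """Split a 24-bit address into (tag, set number, word)."""
    if not 0 <= address < 1 << (TAG_BITS + SET_BITS + WORD_BITS):
        raise ValueError("address must fit in 24 bits")
    word = address & ((1 << WORD_BITS) - 1)
    set_no = (address >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = address >> (WORD_BITS + SET_BITS)
    return tag, set_no, word


# "1FF 7FFC" and "001 7FFC": a 9-bit tag followed by 15 set and word bits
for tag_part, low_part in [(0x1FF, 0x7FFC), (0x001, 0x7FFC)]:
    address = (tag_part << 15) | low_part
    tag, set_no, word = split_set_associative(address)
    print(f"{address:06X}: tag={tag:03X} set={set_no:04X} word={word}")
```

Both addresses select the same set but carry different tags, so in a 2-way set associative cache they can be resident at the same time.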

Set Associative Mapping Summary
Address length = (s + w) bits, where s is the number of block-identifier bits (tag + set) and d is the number of set bits
Number of addressable units = $2^{s+w}$ words or bytes
Block size = line size = $2^w$ words or bytes
Number of blocks in main memory = $2^s$
Number of lines in set = k
Number of sets = v = $2^d$
Number of lines in cache = kv = $k \times 2^d$
Size of tag = (s - d) bits

Pentium 4 Cache
80386: no on-chip cache
80486: 8 KBytes, using 16-byte lines and four-way set associative organization
Pentium (all versions): two on-chip L1 caches (data and instructions)
Pentium 4:
  L1 caches: 8 KBytes, 64-byte lines, four-way set associative
  L2 cache: feeds both L1 caches; 256 KBytes, 128-byte lines, 8-way set associative
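From the size, line length and associativity listed above, the number of sets in each cache follows from sets = cache size / (line size × ways). The sketch below is only that arithmetic; the function name `num_sets` is hypothetical.

```python
def num_sets(cache_bytes, line_bytes, ways):
    """Number of sets = cache size / (line size * lines per set)."""
    set_bytes = line_bytes * ways
    if cache_bytes % set_bytes:
        raise ValueError("cache size must be a whole number of sets")
    return cache_bytes // set_bytes


caches = {
    "80486":        (8 * 1024, 16, 4),
    "Pentium 4 L1": (8 * 1024, 64, 4),
    "Pentium 4 L2": (256 * 1024, 128, 8),
}
for name, (size, line, ways) in caches.items():
    print(f"{name}: {num_sets(size, line, ways)} sets")
```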

Pentium 4 Core Processor
Fetch/decode unit
  Fetches instructions from the L2 cache
  Decodes them into micro-ops
  Stores micro-ops in the L1 cache
Out-of-order execution logic
  Schedules micro-ops
  Based on data dependence and resources
  May speculatively execute
Execution units
  Execute micro-ops
  Data from the L1 cache
  Results in registers
Memory subsystem
  L2 cache and system bus

Pentium 4 Design
Decodes instructions into RISC-like micro-ops before the L1 cache
Micro-ops are fixed length
  Superscalar pipelining and scheduling
Pentium instructions are long and complex
Performance is improved by separating decoding from scheduling and pipelining
  (More later: ch14)
Data cache is write back
  Can be configured to write through
L1 cache is controlled by 2 bits in a register
  CD = cache disable
  NW = not write through
  2 instructions: one to invalidate (flush) the cache, one to write back and then invalidate

DRAM
Synchronous DRAM (SDRAM)
  Adds a clock signal to the DRAM interface, so that repeated transfers do not bear the overhead of synchronizing with the DRAM controller
Double Data Rate (DDR SDRAM)
  Transfers data on both the rising edge and the falling edge of the DRAM clock signal, doubling the peak data rate
  DDR2 lowers power by dropping the voltage from 2.5 to 1.8 volts and offers higher clock rates: up to 400 MHz
  DDR3 drops to 1.5 volts and offers higher clock rates: up to 800 MHz
Improved bandwidth, not latency

DRAM
Standard  Clock rate (MHz)  M transfers/second  DRAM name   MBytes/s/DIMM  DIMM name
DDR       133               266                 DDR266      2128           PC2100
DDR       150               300                 DDR300      2400           PC2400
DDR       200               400                 DDR400      3200           PC3200
DDR2      266               533                 DDR2-533    4264           PC4300
DDR2      333               667                 DDR2-667    5336           PC5300
DDR2      400               800                 DDR2-800    6400           PC6400
DDR3      533               1066                DDR3-1066   8528           PC8500
DDR3      666               1333                DDR3-1333   10664          PC10700
DDR3      800               1600                DDR3-1600   12800          PC12800
M transfers/second = clock rate × 2; MBytes/s/DIMM = M transfers/second × 8
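The "× 2" and "× 8" relations in the table can be reproduced with a few lines of Python. The factor of 8 is read here as 8 bytes moved per transfer on a 64-bit DIMM, which is an interpretation of the table rather than something it states; the function names are hypothetical.

```python
BYTES_PER_TRANSFER = 8   # assumed: a DIMM moves 64 bits = 8 bytes per transfer


def transfers_per_second(clock_mhz):
    """M transfers/second: DDR moves data on both edges of the clock."""
    return clock_mhz * 2


def dimm_bandwidth(mtransfers):
    """Peak MBytes/s per DIMM for a given transfer rate in M transfers/second."""
    return mtransfers * BYTES_PER_TRANSFER


for name, mtransfers in [("DDR266", 266), ("DDR2-667", 667), ("DDR3-1600", 1600)]:
    print(f"{name}: {dimm_bandwidth(mtransfers)} MBytes/s per DIMM")
```

The bandwidth function takes the nominal transfer rate rather than the clock rate because the table rounds some transfer rates (for example 333 MHz is listed as 667 M transfers/second).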

Error Correction
Motivation:
  Failures/time is proportional to the number of bits!
  As DRAM cells shrink, they become more vulnerable
Went through a period in which the failure rate was low enough without error correction that people didn't do correction
  DRAM banks are too large now
  Servers have always had corrected memory systems
Basic idea: add redundancy through parity bits
  Common configuration: random error correction
    SEC-DED (single error correct, double error detect)
    One example: 64 data bits + 8 parity bits (8 of 72 bits, about 11% overhead)
Really want to handle failures of physical components as well
  Organization is multiple DRAMs per DIMM, multiple DIMMs
  Want to recover from a failed DRAM and a failed DIMM!
  "Chip kill" handles failures the width of a single DRAM chip
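The 64 + 8 configuration can be checked with a short calculation. The sketch assumes a Hamming code extended with one overall parity bit, which is a common way to build SEC-DED; the lecture does not name the code, so treat this as one possible construction. The function names are hypothetical.

```python
def sec_check_bits(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (single error correction)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


def sec_ded_check_bits(data_bits):
    """Add one overall parity bit for double error detection (assumed scheme)."""
    return sec_check_bits(data_bits) + 1


data = 64
check = sec_ded_check_bits(data)
overhead = check / (data + check)   # share of stored bits used for checking
print(f"{data} data bits need {check} check bits ({overhead:.0%} of stored bits)")
```

Because the number of check bits grows roughly with the logarithm of the word size, wider words carry proportionally less overhead than narrow ones.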
