Computer Architecture


1 Computer Architecture
Chang-Bum Lee Dept. of Computer Engineering Youngsan University Computer Architecture

2 Computer Architecture
Course Content (1)
Lecture #1 Course Overview: Course Contents, Course Schedule, Grading Guidelines, Test and Assignments
Lecture #2 Basic Architecture of Computer: Basic Architecture, System Configuration
Lecture #3 Instruction Execution: Fetch Cycle, Execution Cycle, Interrupt Cycle

3 Computer Architecture
Course Content (2)
Lecture #4, 5 Instruction Set: Program Control, Instruction Formats, Addressing Modes, Pentium Processors
Lecture #6, 7 Arithmetic and Logical Operations: Arithmetic and Logical Unit, Integer Representation, Logic Operations, Shift Operations, Arithmetic Operations of Integer (Addition, Subtraction, Multiplication, and Division)

4 Computer Architecture
Course Content (3)
Lecture #8, 9 Real Numbers: Representation of Floating Point Numbers, Arithmetic Operations of Floating Point Numbers (Addition, Subtraction, Multiplication, and Division)
Lecture #10 Control Unit: Structure of Control Unit, Microinstruction, Microprogram
Lecture #11 Memory Devices: Memory Hierarchy, RAM, ROM, Design of Memory Device Modules

5 Computer Architecture
Course Content (4)
Lecture #12 Cache Memory: Cache Size, Fetch Method, Mapping

6 Computer Architecture: Course Overview
Lecture #1 Computer Architecture

7 Computer Architecture
Course Objectives Understand the role and relationship of hardware and software. Exposure to: machine organization, assembly language programming, C programming. Be able to actually build an entire (slow) computing system, hardware and software. Be distinguished from mere programmers.

8 Computer Architecture
Course Schedule The complete course, including Lectures and Seminars, will be covered in 90 hours(15 weeks). The total duration of the course will be 4 months. Lecture 3 hours (2 hours + 1 hour) weekly Computer Architecture 2

9 Computer Architecture
Grading Guidelines Attendance: 20%, depending on students' class participation. Final Exam: 40%, textbook-based in-class final exam. Midterm Exam: 30%, textbook-based in-class mid-term exam. Assignments: 10%, based on submitting assignments.

10 Computer Architecture
Course References Computer Architecture Computer Architecture/Jong-Hyun Kim By Sang Lung Publishing Corp. The course slides will be available at Computer Architecture 2

11 Computer Architecture
Course Summary Introduction to computer architecture How is data represented? What are the pieces of a computer? How do computers work? Programming How do I "talk" directly to the machine? How do I program in C? Computer Systems and Computation How do simple HW/SW elements come together to realize complex computations? Computer Architecture 2

12 Computer Architecture: Basic Architecture
Lecture #2 Computer Architecture

13 Introduction - Architecture (1)
Architecture is those attributes visible to the programmer Instruction set, number of bits used for data representation, I/O mechanisms, addressing techniques. e.g. Is there a multiply instruction? Organization is how features are implemented Control signals, interfaces, memory technology. e.g. Is there a hardware multiply unit or is it done by repeated addition? Computer Architecture

14 Introduction - Architecture (2)
All members of the Intel x86 family share the same basic architecture. All members of the IBM System/370 family share the same basic architecture. This gives code compatibility, at least backwards. Organization differs between different versions. Computer Architecture

15 Computer Architecture
Structure & Function Structure is the way in which components relate to each other. Function is the operation of individual components as part of the structure. All computer functions are: Data processing Data storage Data movement Control Computer Architecture

16 Computer Architecture
ENIAC Electronic Numerical Integrator And Computer. Built by Eckert and Mauchly at the University of Pennsylvania. Trajectory tables for weapons. Started 1943, finished 1946: too late for the war effort. Used until 1955. Decimal (not binary). 20 accumulators of 10 digits. Programmed manually by switches. 18,000 vacuum tubes, 30 tons, about 1,800 square feet, 140 kW power consumption. 5,000 additions per second. Computer Architecture

17 Structure of von Neumann Machine
Stored Program concept. Main memory storing programs and data. ALU operating on binary data. Control unit interpreting instructions from memory and executing them. Input and output equipment operated by the control unit. Princeton Institute for Advanced Study (IAS) computer, completed 1952. Computer Architecture

18 Transistor Based Computers
Transistors: Replaced vacuum tubes. Smaller, cheaper, less heat dissipation. Solid-state device made from silicon (sand). Invented 1947 at Bell Labs by William Shockley et al. Transistor-based computers: Second-generation machines. NCR & RCA produced small transistor machines. IBM produced the 7000 series; DEC produced the PDP-1. Computer Architecture

19 Speeding It Up & Performance Mismatch
Pipelining On board cache(L1 & L2 cache) Branch prediction Data flow analysis Speculative execution Performance Mismatch Processor speed increased Memory capacity increased Memory speed lags behind processor speed Computer Architecture

20 Computer Architecture
Solutions Increase number of bits retrieved at one time. Make DRAM “wider” rather than “deeper” Change DRAM interface. Cache Reduce frequency of memory access. More complex cache and cache on chip Increase interconnection bandwidth. High speed buses Hierarchy of buses Computer Architecture

21 Computer Architecture
Program Concept Hardwired systems are inflexible. General purpose hardware can do different tasks, given correct control signals. Instead of re-wiring, supply a new set of control signals. A sequence of steps For each step, an arithmetic or logical operation is done. For each operation, a different set of control signals is needed. Computer Architecture

22 Computer Architecture
Computer Components The Control Unit and the Arithmetic and Logic Unit constitute the Central Processing Unit. Data and instructions need to get into the system and results out. Input/output Temporary storage of code and results is needed. Main memory Computer Architecture

23 Computer Architecture: CPU Structures and Functions
Lecture #3 Computer Architecture

24 Computer Architecture
CPU Structure Registers ALU Control Unit CPU Internal Bus Address Bus Data System Bus CPU must: Fetch instructions Interpret instructions Fetch data, process data, and write data Registers CPU must have some working space (temporary storage) Number and function vary between processor designs One of the major design decisions Top level of memory hierarchy Control Unit Control unit coordinates sequence of execution steps ALU ALU performs arithmetic and logical processing Computer Architecture

25 Computer Architecture
CPU Structure Software Instruction Set Hardware Computer Architecture

26 Computer Architecture
Fetch Cycle (1) Program Counter (PC) holds the address of the next instruction to fetch. Processor fetches the instruction from the memory location pointed to by PC. Increment PC, unless told otherwise. Instruction is loaded into the Instruction Register (IR):
t0: MAR <- PC
t1: MBR <- M[MAR], PC <- PC+1
t2: IR <- MBR
Processor interprets the instruction and performs the required actions.

27 Computer Architecture
Fetch Cycle (2) Micro-operations:
t0: MAR <- PC
t1: MBR <- M[MAR], PC <- PC+1
t2: IR <- MBR
Address and instruction flow in the fetch cycle (diagram labels: Address Bus, Data Bus, Control Bus, Memory Devices, Control Unit)
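
The three fetch micro-operations can be traced with a small simulation. This is a minimal sketch, not the course's own material: the dictionary-based register file, the memory contents, and the function name `fetch` are assumptions made for illustration.

```python
def fetch(cpu, memory):
    """Run the fetch micro-operations t0..t2 and return the fetched word."""
    cpu["MAR"] = cpu["PC"]             # t0: MAR <- PC
    cpu["MBR"] = memory[cpu["MAR"]]    # t1: MBR <- M[MAR]
    cpu["PC"] += 1                     #     PC  <- PC + 1 (same time step as the read)
    cpu["IR"] = cpu["MBR"]             # t2: IR  <- MBR
    return cpu["IR"]


# Made-up instruction words stored at addresses 100 and 101.
memory = {100: 0x1234, 101: 0x5678}
cpu = {"PC": 100, "MAR": 0, "MBR": 0, "IR": 0}

fetch(cpu, memory)
print(hex(cpu["IR"]), cpu["PC"])
```

After one call, IR holds the word that was at address 100 and PC already points at the next instruction.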

28 Computer Architecture
Execute Cycle(1) Processor-memory data transfer between CPU and main memory Processor I/O Data transfer between CPU and I/O module Data processing Some arithmetic or logical operation on data Control Alteration of sequence of operations e.g. jump Combination of above Computer Architecture

29 Computer Architecture
Execute Cycle (2) Example: LOAD addr
t0: MAR <- IR(addr)
t1: MBR <- M[MAR]
t2: AC <- MBR
Other examples: STA addr, ADD addr (diagram labels: Address Bus, Data Bus, Control Bus, Memory Devices, Control Unit)
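
The execute cycle for the three example instructions might be sketched as follows. Only the LOAD sequence is spelled out on the slide; the ADD and STA sequences here are assumptions, built from the micro-operations AC <- AC+MBR, MBR <- AC and M[MAR] <- MBR that appear later in the control unit lecture. The function name `execute` and the memory contents are hypothetical.

```python
def execute(cpu, memory, opcode, addr):
    """Execute one accumulator-machine instruction as a series of micro-operations."""
    cpu["MAR"] = addr                        # t0: MAR <- IR(addr)
    if opcode == "LOAD":
        cpu["MBR"] = memory[cpu["MAR"]]      # t1: MBR <- M[MAR]
        cpu["AC"] = cpu["MBR"]               # t2: AC  <- MBR
    elif opcode == "ADD":                    # assumed sequence
        cpu["MBR"] = memory[cpu["MAR"]]      # t1: MBR <- M[MAR]
        cpu["AC"] = cpu["AC"] + cpu["MBR"]   # t2: AC  <- AC + MBR
    elif opcode == "STA":                    # assumed sequence
        cpu["MBR"] = cpu["AC"]               # t1: MBR <- AC
        memory[cpu["MAR"]] = cpu["MBR"]      # t2: M[MAR] <- MBR
    else:
        raise ValueError(f"unknown opcode: {opcode}")


memory = {250: 7, 251: 5, 252: 0}            # made-up data
cpu = {"AC": 0, "MAR": 0, "MBR": 0}

for opcode, addr in [("LOAD", 250), ("ADD", 251), ("STA", 252)]:
    execute(cpu, memory, opcode, addr)

print(memory[252])
```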

30 Computer Architecture
Interrupt Cycle Added to instruction cycle Processor checks for interrupt Indicated by an interrupt signal If no interrupt, fetch next instruction If interrupt pending: Suspend execution of current program Save context Set PC to start address of interrupt handler routine Process interrupt Restore context and continue interrupted program Computer Architecture

31 Multiple Interrupts(1)
Disable interrupts Processor will ignore further interrupts while processing one interrupt Interrupts remain pending and are checked after first interrupt has been processed Interrupts handled in sequence as they occur Computer Architecture

32 Multiple Interrupts(2)
Define priorities - Low priority interrupts can be interrupted by higher priority interrupts - When higher priority interrupt has been processed, processor returns to previous interrupt Main Program Computer Architecture

33 Computer Architecture
Indirect Cycle May require memory access to fetch operands Indirect addressing requires more memory accesses Can be thought of as additional instruction subcycle Computer Architecture

34 Computer Architecture
Prefetch Fetch accessing main memory Execution usually does not access main memory Can fetch next instruction during execution of current instruction Called instruction prefetch

35 Computer Architecture
Improved Performance But not doubled: Fetch usually shorter than execution Prefetch more than one instruction? Any jump or branch means that prefetched instructions are not the required instructions Add more stages to improve performance

36 Computer Architecture
Pipelining Fetch instruction Decode instruction Calculate operands (i.e. EAs) Fetch operands Execute instructions Write result Overlap these operations

37 Two Stage Instruction Pipeline
Fetch Execute Instruction Result (a) Simplified View (b) Expanded View Discard New Address Wait Computer Architecture

38 Computer Architecture
Memory Connection Receives and sends data Receives addresses (of locations) Receives control signals Read Write Timing Computer Architecture

39 Input/Output Connection
Similar to memory from computer’s viewpoint Output Receive data from computer Send data to peripheral Input Receive data from peripheral Send data to computer Receive control signals from computer Send control signals to peripherals e.g. spin disk Receive addresses from computer e.g. port number to identify peripheral Send interrupt signals (control) Computer Architecture

40 Computer Architecture
CPU Connection Reads instruction and data Writes out data (after processing) Sends control signals to other units Receives (& acts on) interrupts Buses There are a number of possible interconnection systems Single and multiple BUS structures are most common e.g. Control/Address/Data bus (PC) e.g. Unibus (DEC-PDP) Computer Architecture

41 Computer Architecture
What is a Bus? A communication pathway connecting two or more devices Usually broadcast Often grouped A number of channels in one bus e.g. 32 bit data bus is 32 separate single bit channels. Power lines may not be shown Computer Architecture

42 Data Bus and Address Bus
Data Bus: Carries data. Remember that there is no difference between "data" and "instruction" at this level. Width is a key determinant of performance: 8, 16, 32, 64 bit. Address Bus: Identifies the source or destination of data, e.g. the CPU needs to read an instruction (data) from a given location in memory. Bus width determines the maximum memory capacity of the system, e.g. a 16-bit address bus gives a 64K address space.
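
The relation between address bus width and addressable locations is a power of two. A minimal sketch (the function name `address_space` is made up for illustration):

```python
def address_space(address_bits):
    """Number of distinct locations an address bus of this width can select."""
    return 2 ** address_bits


for width in (16, 20, 32):
    print(width, "address lines ->", address_space(width), "locations")
```

With 16 address lines this gives 65,536 locations, which is the 64K address space mentioned on the slide.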

43 Computer Architecture
Control Bus Control and timing information Memory read/write signal Interrupt request Clock signals Computer Architecture

44 Computer Architecture
Single Bus Problems Lots of devices on one bus leads to propagation delays: long data paths mean that co-ordination of bus use can adversely affect performance. The bus also becomes a bottleneck if aggregate data transfer approaches bus capacity. Most systems use multiple buses to overcome these problems. Computer Architecture

45 Bus Types and Arbitration
Dedicated Separate data & address lines Multiplexed Shared lines Address valid or data valid control line Advantage - fewer lines Disadvantages More complex control Ultimate performance Bus Arbitration More than one module controlling the bus e.g. CPU and DMA controller Only one module may control bus at one time Arbitration may be centralised or distributed Computer Architecture

46 Computer Architecture
Timing Co-ordination of events on bus Synchronous Events determined by clock signals Control Bus includes clock line A single 1-0 is a bus cycle All devices can read clock line Usually sync on leading edge Usually a single cycle for an event Asynchronous Read, Write Computer Architecture

47 Memory Hierarchy & Physical Types
Registers Exist In CPU Internal or Main memory May include one or more levels of cache Mainly “RAM” External memory Backing store Physical Types Semiconductor types are mainly RAM Magnetic types are Disk & Tape Optical types are CD & DVD Others are Bubble, Hologram, etc. Computer Architecture

48 Computer Architecture
Performance Access time Time between presenting the address and getting the valid data Memory Cycle time Time may be required for the memory to “recover” before next access. Cycle time is access + recovery. Transfer Rate Rate at which data can be moved. Computer Architecture

49 Instruction Representation
In machine code each instruction has a unique bit pattern. For human consumption (well, programmers anyway) a symbolic representation is used. e.g. ADD, SUB, LOAD Operands can also be represented in this way. ADD A,B Computer Architecture 5

50 Computer Architecture: Instruction Types and Addressing Modes
Lecture #4, #5 Computer Architecture

51 Instruction Format and Types
Simple Instruction Format Instruction Types Data processing Data storage (main memory) Data movement (I/O) Program flow control 4 bits Opcode Operand Reference 6 bits 16 bits Computer Architecture

52 Computer Architecture
Number of Addresses (1) 3 addresses: Operand 1, Operand 2, Result, e.g. a = b + c. There may be a fourth: next instruction (usually implicit). Not common. Needs very long words to hold everything.

53 Computer Architecture
Number of Addresses (2) 2 addresses One address doubles as operand and result. a = a + b Reduces length of instruction Requires some extra work Temporary storage to hold some results 1 address Implicit second address Usually a register (accumulator) Common on early machines Computer Architecture 8

54 Computer Architecture
Number of Addresses (3) 0 (zero) addresses: all addresses implicit. Uses a stack, e.g. push a; push b; add; pop c (computes c = a + b).
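
The zero-address sequence above can be run on a tiny stack interpreter. This is an illustrative sketch only; the instruction spelling, the `run_stack_program` name, and the variable table are assumptions.

```python
def run_stack_program(program, variables):
    """Interpret push/pop/add instructions; operands of add are implicit (top of stack)."""
    stack = []
    for instruction in program:
        parts = instruction.split()
        opcode = parts[0]
        if opcode == "push":
            stack.append(variables[parts[1]])   # copy a variable onto the stack
        elif opcode == "pop":
            variables[parts[1]] = stack.pop()   # store top of stack into a variable
        elif opcode == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)          # no address field needed
        else:
            raise ValueError(f"unknown instruction: {instruction}")
    return variables


variables = run_stack_program(["push a", "push b", "add", "pop c"], {"a": 2, "b": 3})
print(variables["c"])
```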

55 Computer Architecture
Design Decisions (1) Operation repertoire How many ops? What can they do? How complex are they? Data types Instruction formats Length of op code field Number of addresses Computer Architecture 12

56 Computer Architecture
Addressing Modes Immediate Direct Indirect Register Register Indirect Displacement (Indexed) Stack Computer Architecture 2

57 Computer Architecture
Immediate Addressing Operand is part of instruction Operand = address field e.g. ADD 5 Add 5 to contents of accumulator 5 is operand No memory reference to fetch data Fast Limited range Computer Architecture 3

58 Immediate Addressing Diagram
Instruction Opcode Operand Computer Architecture 4

59 Computer Architecture
Direct Addressing Address field contains address of operand. Effective address (EA) = address field (A) e.g. ADD A Add contents of address A to accumulator Single memory reference to access data No additional calculations to work out effective address Limited address space Computer Architecture 5

60 Direct Addressing Diagram
Address A Opcode Instruction Operand Memory Computer Architecture 6

61 Computer Architecture
Indirect Addressing Memory cell pointed to by address field contains the address of (pointer to) the operand. EA = (A): look in A, find address (A) and look there for operand. e.g. ADD (A): add contents of cell pointed to by contents of A to accumulator. Large address space: 2^n where n = word length. May be nested, multilevel, cascaded, e.g. EA = (((A))). Draw the diagram yourself. Multiple memory accesses to find operand, hence slower.

62 Indirect Addressing Diagram
Instruction Opcode Address A Memory Pointer to operand Operand Computer Architecture 9

63 Register Addressing (1)
Operand is held in the register named in the address field. EA = R. Limited number of registers. Very small address field needed: shorter instructions, faster instruction fetch.

64 Register Addressing (2)
No memory access Very fast execution Very limited address space Multiple registers helps performance Requires good assembly programming or compiler writing N.B. C programming register int a; c.f. Direct addressing Computer Architecture 11

65 Register Addressing Diagram
Instruction Opcode Register Address R Registers Operand Computer Architecture 12

66 Register Indirect Addressing
C.f. indirect addressing. EA = (R). Operand is in the memory cell pointed to by the contents of register R. Large address space (2^n). One fewer memory access than indirect addressing.

67 Register Indirect Addressing Diagram
Instruction Opcode Register Address R Memory Registers Pointer to Operand Operand Computer Architecture 14

68 Displacement Addressing
EA = A + (R). Address field holds two values: A = base value, R = register that holds displacement, or vice versa.

69 Displacement Addressing Diagram
Instruction Opcode Register R Address A Memory Registers Pointer to Operand Operand + Computer Architecture 16

70 Computer Architecture
Relative Addressing A version of displacement addressing R = Program counter, PC EA = A + (PC) i.e. get operand from A cells from current location pointed to by PC c.f locality of reference & cache usage Computer Architecture 17

71 Base-Register Addressing
A holds displacement R holds pointer to base address R may be explicit or implicit e.g. segment registers in 80x86 Computer Architecture 18

72 Computer Architecture
Indexed Addressing A = base, R = displacement. EA = A + (R). Good for accessing arrays: step through with R++.

73 Computer Architecture
Combinations Postindex EA = (A) + (R) Preindex EA = (A+(R)) (Draw the diagrams) Computer Architecture 20
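
The effective address (EA) formulas from the addressing-mode slides can be compared side by side. The sketch below is an assumption-laden illustration: memory and registers are plain dictionaries, and the mode names and the `effective_address` function are made up here. Parentheses in the comments mean "contents of", as on the slides.

```python
def effective_address(mode, A, R, memory, registers):
    """Return the effective address for one addressing mode."""
    if mode == "direct":
        return A                            # EA = A
    if mode == "indirect":
        return memory[A]                    # EA = (A)
    if mode == "register_indirect":
        return registers[R]                 # EA = (R)
    if mode == "displacement":
        return A + registers[R]             # EA = A + (R)
    if mode == "postindex":
        return memory[A] + registers[R]     # EA = (A) + (R)
    if mode == "preindex":
        return memory[A + registers[R]]     # EA = (A + (R))
    raise ValueError(f"unknown mode: {mode}")


memory = {10: 40, 13: 70, 40: 99}           # made-up contents
registers = {"R1": 3}

for mode in ("direct", "indirect", "register_indirect",
             "displacement", "postindex", "preindex"):
    print(mode, effective_address(mode, 10, "R1", memory, registers))
```

Immediate and register addressing are left out because they produce an operand directly rather than a memory address.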

74 Computer Architecture
Stack Addressing Operand is (implicitly) on top of stack e.g. ADD Pop top two items from stack and add Computer Architecture 21

75 Pentium Addressing Modes
Virtual or effective address is offset into segment. Starting address plus offset gives linear address. This goes through page translation if paging enabled. 12 addressing modes available Immediate Register operand Displacement Base Base with displacement Scaled index with displacement Base with index and displacement Base scaled index with displacement Relative Computer Architecture

76 Computer Architecture
Instruction Types Instruction generally four types. Data processing Data storage (main memory) Data movement (I/O) Program flow control Computer Architecture 6

78 Computer Architecture
Design Decisions (2) Registers Number of CPU registers available Which operations can be performed on which registers? Addressing modes (later…) RISC v CISC Computer Architecture 13

79 Computer Architecture
Types of Operation There are several types of operations as follows. Data Transfer Arithmetic Logical Conversion I/O System Control Transfer of Control Computer Architecture 18

80 Computer Architecture
Arithmetic Arithmetic operations include Add, Subtract, Multiply, Divide. They operate on signed integers and may also operate on floating point. May include Increment (a++), Decrement (a--), Negate (-a).

81 Shift and Rotate Operations
Logical right shift Logical left shift Arithmetic right shift Arithmetic left shift Right rotate Left rotate Computer Architecture
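
Most of the shift and rotate operations listed above can be sketched on 8-bit values as follows. The 8-bit width and the function names are assumptions for illustration; arithmetic left shift is omitted because its treatment of the sign bit differs between machines.

```python
MASK = 0xFF       # keep results within 8 bits
SIGN = 0x80       # most significant (sign) bit


def logical_right(x):
    return (x & MASK) >> 1                    # 0 enters at the left

def logical_left(x):
    return (x << 1) & MASK                    # 0 enters at the right

def arithmetic_right(x):
    x &= MASK
    return (x >> 1) | (x & SIGN)              # sign bit is copied back in

def rotate_right(x):
    x &= MASK
    return (x >> 1) | ((x & 1) << 7)          # bit 0 wraps around to bit 7

def rotate_left(x):
    x &= MASK
    return ((x << 1) & MASK) | (x >> 7)       # bit 7 wraps around to bit 0


value = 0b10010110
for operation in (logical_right, logical_left, arithmetic_right,
                  rotate_right, rotate_left):
    print(operation.__name__, format(operation(value), "08b"))
```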

82 Logical and Conversion
Has bitwise operations. Logical operations are AND, OR, NOT, etc. Conversion E.g. Binary to Decimal Computer Architecture 21

83 Computer Architecture
Input/Output May be specific instructions. May be done using data movement instructions. (memory mapped) May be done by a separate controller (DMA). Computer Architecture 23

84 Computer Architecture
Transfer of Control Branch e.g. branch to x if result is zero Skip e.g. increment and skip if zero ISZ Register1: Skip if zero Branch xxxx Subroutine call c.f. interrupt call: jump to interrupt service routine Computer Architecture 25

85 Computer Architecture
Branch Instruction Unconditional Branch Jump to 211 unconditionally. Conditional Branch 1 Jump to 211 if accumulator is zero. Conditional Branch 2 Jump to 235 if R1 equals to R2. Computer Architecture

86 Nested Procedure Calls
If the main program calls procedure 1, control goes to Proc.1 and its body is executed. If Proc.1 calls another procedure (Proc.2), control goes to Proc.2 and its body is executed. When Proc.2 reaches a RETURN instruction, control returns to Proc.1. Computer Architecture

87 Computer Architecture: Arithmetic and Logical Operations of Computer
Lecture #6, #7 Computer Architecture

88 Arithmetic & Logic Unit
Does the calculations. Everything else in the computer is there to service this unit. Handles integers. May handle floating point (real) numbers. May be separate FPU (maths co-processor). Computer Architecture

89 Integer Representation
Only have 0 & 1 to represent everything. Positive numbers are stored in binary, e.g. 41 = 00101001. There is no minus sign and no period (radix point). Negative numbers use sign-magnitude, one's complement, or two's complement. Computer Architecture

90 Computer Architecture
Sign-Magnitude Leftmost bit is the sign bit: 0 means positive, 1 means negative. In 8 bits, +18 = 00010010 and -18 = 10010010. Problems: need to consider both sign and magnitude in arithmetic; two representations of zero (+0 and -0). Computer Architecture

91 Computer Architecture
Two's Complement In 8 bits: +3 = 00000011, +2 = 00000010, +1 = 00000001, +0 = 00000000, -1 = 11111111, -2 = 11111110, -3 = 11111101. Benefits: two's complement has one representation of zero. Arithmetic works easily (see later). Negating is fairly easy: 3 = 00000011; Boolean complement gives 11111100; add 1 to LSB to get 11111101. Computer Architecture
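
The negation rule (complement every bit, then add 1) can be checked with a short sketch. The 8-bit default width and the helper names are assumptions for illustration.

```python
def twos_complement_negate(pattern, bits=8):
    """Negate a bit pattern: Boolean complement, then add 1 to the LSB."""
    mask = (1 << bits) - 1
    return ((pattern ^ mask) + 1) & mask


def to_signed(pattern, bits=8):
    """Interpret a bit pattern as a two's complement signed integer."""
    if pattern & (1 << (bits - 1)):
        return pattern - (1 << bits)
    return pattern


minus_three = twos_complement_negate(0b00000011)
print(format(minus_three, "08b"), to_signed(minus_three))
```

Negating zero gives zero again, which is the "one representation of zero" benefit.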

92 Computer Architecture
Logical Operations AND, OR, XOR, NOT Selective-set, Selective-complement Masking, Insert, Compare Bitwise operations Logical Shift Circular Shift Arithmetic Shift Shift with Carry Computer Architecture 21

93 Shift and Rotate Operations
Computer Architecture

94 Addition and Subtraction
Normal binary addition. Monitor sign bit for overflow. For subtraction, take the two's complement of the subtrahend and add it to the minuend, i.e. a - b = a + (-b). So we only need addition and complement circuits. Computer Architecture
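
A sketch of addition with overflow detection, and of subtraction done as a + (-b), is shown below. The 8-bit width and function names are assumptions. Overflow is flagged when both operands have the same sign and the result has the opposite sign.

```python
def add_with_overflow(a, b, bits=8):
    """Add two bit patterns; return (result pattern, overflow flag)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a + b) & mask
    # Overflow: result sign differs from the sign of both operands.
    overflow = ((a ^ result) & (b ^ result) & sign) != 0
    return result, overflow


def subtract(a, b, bits=8):
    """Compute a - b as a + (two's complement of b).

    Caveat of this sketch: the most negative value has no positive
    counterpart, so b = 10000000 is not handled correctly.
    """
    mask = (1 << bits) - 1
    negated_b = ((b ^ mask) + 1) & mask
    return add_with_overflow(a, negated_b, bits)


print(add_with_overflow(100, 50))   # exceeds +127 in 8 bits
print(subtract(7, 5))
```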

95 Hardware for Addition and Subtraction
B Register Complementer SW Adder A Register OF OF: overflow bit SW: Switch (select addition or subtraction) Computer Architecture

96 Computer Architecture
Multiplication Is complex Work out partial product for each digit Take care with place value (column) Add partial products Computer Architecture

97 Multiplication Example
        1011   Multiplicand (11 dec)
      x 1101   Multiplier (13 dec)
        1011   Partial products
       0000
      1011
     1011
    10001111   Product (143 dec)
Note: if multiplier bit is 1, copy multiplicand (place value), otherwise zero. Note: need double length result.
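
The partial-product method for unsigned multiplication can be sketched as below, using the 11 x 13 example from the slide. The function name and the returned list of partial products are choices made for illustration.

```python
def multiply_unsigned(multiplicand, multiplier, bits=4):
    """Shift-and-add multiplication; returns (product, partial products)."""
    product = 0
    partial_products = []
    for position in range(bits):
        bit = (multiplier >> position) & 1
        # Copy the multiplicand at this place value if the bit is 1, else zero.
        partial = (multiplicand << position) if bit else 0
        partial_products.append(partial)
        product += partial
    return product, partial_products


product, partials = multiply_unsigned(0b1011, 0b1101)
print(format(product, "08b"), product)
```

The product of two 4-bit numbers needs up to 8 bits, which is the "double length result" noted on the slide.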

98 Computer Architecture
Booth's Algorithm (flowchart)
START: A ← 0, Q-1 ← 0, M ← Multiplicand, Q ← Multiplier, Counter ← n
Examine Q0, Q-1:
  = 10: A ← A - M
  = 01: A ← A + M
  = 11 or 00: no arithmetic operation
Arithmetic Shift Right of A, Q, Q-1; Counter ← Counter - 1
Counter = 0? No: examine Q0, Q-1 again. Yes: END
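
The flowchart can be followed step by step in code. This is a minimal sketch under stated assumptions: registers are n-bit integers (n = 4 by default), the names mirror the flowchart, and the most negative multiplicand (1000 for n = 4) is not handled because A - M would overflow n bits.

```python
def booth_multiply(multiplicand, multiplier, bits=4):
    """Booth's algorithm for two's complement multiplication."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    A, Q, Q_1 = 0, multiplier & mask, 0
    M = multiplicand & mask
    for _ in range(bits):                     # Counter counts down from n
        pair = (Q & 1, Q_1)                   # (Q0, Q-1)
        if pair == (1, 0):
            A = (A - M) & mask                # A <- A - M
        elif pair == (0, 1):
            A = (A + M) & mask                # A <- A + M
        # Arithmetic shift right of the combined A, Q, Q-1 register.
        Q_1 = Q & 1
        Q = (Q >> 1) | ((A & 1) << (bits - 1))
        A = (A >> 1) | (A & sign)             # sign bit of A is preserved
    result = (A << bits) | Q                  # double-length product in A, Q
    if result & (1 << (2 * bits - 1)):        # interpret as signed
        result -= 1 << (2 * bits)
    return result


print(booth_multiply(7, 3), booth_multiply(-3, 5), booth_multiply(3, -4))
```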

99 Computer Architecture
Division More complex than multiplication. Negative numbers are really bad! Based on long division.
Division of Unsigned Binary Integers
       00001101   Quotient
1011 ) 10010011   Dividend
        1011
       001110     Partial remainder
         1011
         001111   Partial remainder
           1011
            100   Remainder
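
Binary long division for unsigned integers can be sketched as a bit-by-bit loop. The worked values below assume a dividend of 10010011 (147) and the divisor 1011 (11), which is consistent with the partial remainders and the remainder 100 shown on the slide. The function name is made up for illustration.

```python
def divide_unsigned(dividend, divisor, bits=8):
    """Bit-serial long division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    quotient = 0
    remainder = 0
    for position in range(bits - 1, -1, -1):
        # Bring down the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> position) & 1)
        quotient <<= 1
        if remainder >= divisor:              # divisor "goes into" the partial remainder
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


quotient, remainder = divide_unsigned(0b10010011, 0b1011)
print(format(quotient, "08b"), format(remainder, "b"))
```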

100 Computer Architecture: Real Numbers
Lecture #8, #9 Computer Architecture

101 Computer Architecture
Real Numbers Numbers with fractions. Could be done in pure binary: 1001.101 = 2^3 + 2^0 + 2^-1 + 2^-3 = 9.625. Where is the binary point? Fixed? Very limited. Moving? How do you show where it is? Computer Architecture
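
The place values on either side of the binary point can be summed directly. A small sketch (the function name `binary_to_real` and the string input format are assumptions):

```python
def binary_to_real(text):
    """Convert a binary string such as '1001.101' to its numeric value."""
    whole, _, fraction = text.partition(".")
    value = int(whole, 2) if whole else 0
    for position, digit in enumerate(fraction, start=1):
        value += int(digit) * 2 ** -position   # weights 1/2, 1/4, 1/8, ...
    return value


print(binary_to_real("1001.101"))
```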

102 Computer Architecture
Floating Point Format fields: Sign bit, Biased Exponent, Mantissa. Value: +/- .significand x 2^exponent. Point is actually fixed between sign bit and body of mantissa. Exponent indicates place value (point position). Computer Architecture

103 Floating Point Examples
(a) 32-bit floating point format: S (1 bit), E field (8 bits), Mantissa field (23 bits) (b) Example of a data representation: Sign (S) bit = 0, Exponent (E) field = , Mantissa (M) field = Computer Architecture

104 Signs for Floating Point
Mantissa is stored in 2s complement. Exponent is in excess or biased notation. e.g. Excess (bias) 128 means 8 bit exponent field Pure value range 0-255 Subtract 128 to get correct value Range -128 to +127 Computer Architecture

105 Computer Architecture
Normalization FP numbers are usually normalized, i.e. the exponent is adjusted so that the leading bit (MSB) of the mantissa is 1. Since it is always 1 there is no need to store it. C.f. scientific notation, where numbers are normalized to give a single digit before the decimal point (a value of the form d.ddd x 10^3). Computer Architecture

106 Computer Architecture
FP Ranges For a 32-bit number with an 8-bit exponent: range of approximately +/- 1.5 x 10^77. Accuracy: the effect of changing the LSB of the mantissa. With a 23-bit mantissa: 2^-23 ≈ 1.2 x 10^-7. Computer Architecture

107 Computer Architecture
Expressible Numbers Computer Architecture

108 Computer Architecture
IEEE 754 Standard for floating point storage 32 and 64 bit standards 8 and 11 bit exponent respectively Computer Architecture
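
The 32-bit IEEE 754 layout (1 sign bit, 8 exponent bits, 23 fraction bits) can be inspected with Python's standard `struct` module. This is a sketch for illustration; the function name is made up. Note that IEEE 754 single precision uses a bias of 127, unlike the excess-128 example given earlier for a generic format.

```python
import struct


def decode_float32(x):
    """Split a number, stored as IEEE 754 single precision, into its three fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF      # biased exponent (bias 127)
    fraction = bits & 0x7FFFFF          # 23 stored bits; the leading 1 is implicit
    return sign, exponent, fraction


sign, exponent, fraction = decode_float32(9.625)
print(sign, exponent - 127, format(fraction, "023b"))
```

For 9.625 = 1.001101 x 2^3 the stored exponent is 130 and the fraction begins with 001101.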

109 Floating Point Arithmetic
FP Arithmetic +/- Check for zeros Align significands (adjusting exponents) Add or subtract significands Normalize result FP Arithmetic x/ Check for zero Add/subtract exponents Multiply/divide significands (watch sign) Normalize Round All intermediate results should be in double length storage Computer Architecture

110 Floating Point Multiplication
Computer Architecture

111 Computer Architecture: Control Unit
Lecture #10 Computer Architecture

112 Computer Architecture
Control Unit Functions of control unit Decoding of an instruction code Generation of control signals for instruction execution Micro-instruction : Control word Micro-program : Set of micro-instructions Routine Groups of micro-instructions for special functions of CPU ex. Fetch cycle routine, Execution cycle routine, Interrupt cycle routine Computer Architecture

113 Structure of Control Unit
Configuration elements: Instruction decoder. Control address register (CAR). Control memory: internal memory to store the micro-programs. Control buffer register (CBR). Subroutine register (SBR). Sequencing module. Computer Architecture

114 Internal Structure of Control Unit
Instruction Register Instruction Decoder Sequencing Module Condition Flags SBR CAR Control Memory Device CBR Decoder Internal Control Signals External Control Signals Computer Architecture

115 Internal Structure of the Control Memory Device
Example: capacity of the control memory = 128 words. The first half (addresses 0 ~ 63) stores common routines: Fetch Cycle Routine, Indirect Cycle Routine, Interrupt Cycle Routine. The second half (addresses 64 ~ 127) stores the execution routines of each instruction: Execution Cycle Routine 1, Execution Cycle Routine 2, ... Computer Architecture

116 Computer Architecture
Mapping Instruction Code Mapping Function Computer Architecture

117 Binary Codes and Symbols for Micro Operations(Examples)
Op field 1
Micro-operation        Symbol
None                   NOP
MAR <- PC              PCTAR
MAR <- IR(addr)        IRTAR
AC <- AC + MBR         ADD
MBR <- M[MAR]          READ
AC <- MBR              BRTAC
IR <- MBR              BRTIR
M[MAR] <- MBR          WRITE

118 Binary Codes and Symbols for Micro Operations(Examples)
Op field 2
Micro-operation        Symbol
None                   NOP
PC <- PC + 1           INCPC
MBR <- AC              ACTBR
MBR <- PC              PCTBR
PC <- MBR              BRTPC
MAR <- SP              SPTAR
AC <- AC - MBR         SUB
PC <- IR(addr)         IRTPC

119 Computer Architecture
Micro-programming: Fetch Cycle Routine
ORG 0
FETCH: PCTAR         U  JMP  NEXT  ; MAR <- PC, execution of next instruction
       READ, INCPC   U  JMP  NEXT  ; MBR <- M[MAR], PC <- PC+1, execution of next instruction
       BRTIR         U  MAP        ; IR <- MBR, branch to the execution cycle
Binary Bit Pattern

120 Indirect Cycle Routine
Micro instruction routine of the indirect cycle Binary Bit Pattern Execution of next instruction Execution of next instruction Return to the execution cycle Computer Architecture

121 Execution Cycle Routine
Instruction Op code Starting address of the routine Computer Architecture

122 Execution Cycle Routines for each instruction
; Call the indirect cycle routine if I=1 ; Call the indirect cycle routine if I=1 Computer Architecture

123 Computer Architecture: Memory Devices
Lecture #11 Computer Architecture

124 Memory Classification
Main memory : Internal memory Auxiliary storage device External memory Computer Architecture

125 Computer Architecture
Memory Hierarchy Registers In CPU Internal or Main memory May include one or more levels of cache “RAM” External memory Backing store Computer Architecture

126 Semiconductor Memory Types
Computer Architecture

127 Computer Architecture
Semiconductor Memory RAM Misnamed as all semiconductor memory is random access Read/Write Volatile Temporary storage Static or dynamic Computer Architecture

128 Computer Architecture
Memory Cell Operation Cell Select Control Data In Cell Select Control Sense (a) Write (b) Read Computer Architecture

129 Computer Architecture
Dynamic RAM Bits stored as charge in capacitors Charges leak Need refreshing even when powered Simpler construction Smaller per bit Less expensive Need refresh circuits Slower Main memory Essentially analogue Level of charge determines value Computer Architecture

130 Computer Architecture
Refreshing Refresh circuit included on chip Disable chip Count through rows Read & Write back Takes time Slows down apparent performance Computer Architecture

131 Computer Architecture
Dynamic RAM Structure Address Line Transistor Storage Capacitor Ground Bit Line B Computer Architecture

132 Computer Architecture
DRAM Operation Address line active when bit read or written Transistor switch closed (current flows) Write Voltage to bit line High for 1 low for 0 Then signal address line Transfers charge to capacitor Read Address line selected transistor turns on Charge from capacitor fed via bit line to sense amplifier Compares with reference value to determine 0 or 1 Capacitor charge must be restored Computer Architecture

133 Computer Architecture
Typical 16 Mb DRAM (4M x 4) Computer Architecture

134 Computer Architecture
Static RAM Bits stored as on/off switches No charges to leak No refreshing needed when powered More complex construction Larger per bit More expensive Does not need refresh circuits Faster Cache Digital Uses flip-flops Computer Architecture

135 Computer Architecture
Static RAM Structure dc voltage T3 T4 T5 C2 T6 C1 T1 T2 Ground Bit Line B Address Line Bit Line B Computer Architecture

136 Computer Architecture
SRAM and DRAM Both volatile Power needed to preserve data DRAM Simpler to build, smaller More dense Less expensive Needs refresh Larger memory units SRAM Faster Used in cache Computer Architecture

137 Computer Architecture
Read Only Memory (ROM) Permanent storage Nonvolatile Microprogramming (see later) Library subroutines Systems programs (BIOS) Function tables Computer Architecture

138 Computer Architecture
Types of ROM Written during manufacture Very expensive for small runs Programmable (once) PROM Needs special equipment to program Read “mostly” Erasable Programmable (EPROM) Erased by UV Electrically Erasable (EEPROM) Takes much longer to write than read Flash memory Erase whole memory electrically Computer Architecture

139 Computer Architecture
Packaging Computer Architecture

140 Design of Memory Device Module
[Example] Design of 1Kx32 bit memory device module using 1K×8 bit RAM chips Method : parallel connection of 4 RAM chips Capacity of module: (1K×8) × 4 = 1K×32 bits = 1K words Address bits(10 bits: A9∼A0) : Common connection to all chips Address area: 000H ∼ 3FFH (H: Hexadecimal) Data Store: 8 bits/chip Computer Architecture
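The parallel connection above can be sketched in a few lines of Python. This is only an illustrative model, not a circuit description: the names `chips`, `write_word` and `read_word` are hypothetical, and the assignment of chip 0 to data bits 7..0 is an assumption, since the slide does not say which chip carries which byte.

```python
# Hypothetical model: four 1K x 8 RAM chips in parallel form a 1K x 32 module.
CHIP_WORDS = 1024                      # 1K locations per chip, addressed by A9..A0

# chips[0] is assumed to hold data bits 7..0, chips[3] bits 31..24.
chips = [bytearray(CHIP_WORDS) for _ in range(4)]

def write_word(addr, value):
    """Store a 32-bit word: every chip receives the same address and one byte."""
    assert 0 <= addr < CHIP_WORDS and 0 <= value < 2 ** 32
    for i, chip in enumerate(chips):
        chip[addr] = (value >> (8 * i)) & 0xFF

def read_word(addr):
    """Read all four chips at the same address and reassemble the 32-bit word."""
    assert 0 <= addr < CHIP_WORDS
    return sum(chip[addr] << (8 * i) for i, chip in enumerate(chips))
```

Writing `0x12345678` to address `3FFH` and reading it back returns the same value, with each chip holding one of the four bytes.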

141 Design of 1K×32 bits Memory Device Module
Address(A9-0) Data Bus(32 bits) Computer Architecture

142 Design of Memory Device Module(con’t)
[Example] Design of 4Kx8 bit memory device module using 1K×8 bit RAM chips Method : serial connection of 4 RAM chips Capacity of module: (1K×8) × 4 = 4K×8 bits = 4K bytes Address bits(12 bits: A11∼A0) : upper 2 bits : generation of 4 chip select signals using address decoder lower 10 bits : common connection to all chips Address area: 000H ∼ FFFH (H: Hexadecimal) Data Store: 8 bits/address Computer Architecture
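The address split described above can be checked with a short Python sketch. The function names `decode` and `chip_range` are hypothetical, and numbering the chips 0 to 3 in ascending address order is an assumption about how the decoder outputs are wired.

```python
def decode(addr):
    """Split a 12-bit address into (chip number, 10-bit offset inside the chip)."""
    assert 0 <= addr <= 0xFFF
    chip = addr >> 10        # upper 2 bits (A11, A10) feed the 2x4 decoder
    offset = addr & 0x3FF    # lower 10 bits (A9..A0) go to every chip
    return chip, offset

def chip_range(chip):
    """Return the (first, last) address served by the given chip."""
    assert 0 <= chip <= 3
    start = chip << 10
    return start, start + 0x3FF

for chip in range(4):
    first, last = chip_range(chip)
    print(f"chip {chip}: {first:03X}H to {last:03X}H")
```

Under that numbering assumption the four chips cover 000H to 3FFH, 400H to 7FFH, 800H to BFFH and C00H to FFFH, which together span the module's 000H to FFFH address area.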

143 Design of 4K×8 bits Memory Device Module
2×4 Decoder Data(D7-0) Computer Architecture

144 Address Areas of each RAM
[Table: address area (from, to) of each of the four RAM chips, listed by chip number]

145 Design Procedure of Memory Module
Decision of memory capacity for computer system Chip decision and design of address map Circuit design in detail Computer Architecture

146 Memory Design for 8-bit Micro Computer
Capacity : 1K bytes RAM, 512 bytes ROM Address: RAM = 0 ~, ROM = 800H ~ Useful chips: 256×8 bits RAM, 512×8 bits ROM Address table Address Area (Hexadecimal) Address bits Memory Chip Computer Architecture

147 Design Example of Memory Device for 8-bit Micro Computer
Address Data Decoder Computer Architecture

148 Computer Architecture
Cache Memory [Wikipedia definition] A cache is a component that improves performance by transparently storing data such that future requests for that data can be served faster. Purpose: a high-speed memory installed between the CPU and main memory to minimize the time the CPU waits because of the speed difference between the two. Characteristics: uses memory chips with a higher access speed than main memory; small capacity because of price and limited space. [Figure: cache placed between the CPU and main memory]

149 Computer Architecture
Cache Memory Cache hit: the data the CPU wants to access already exists in the cache. Cache miss: the data the CPU wants to access does not exist in the cache. Cache hit ratio (H): the ratio (or percentage) of accesses that result in cache hits.
H = (number of cache hits) / (total number of memory accesses)
Cache miss ratio = (1 - H)
Average access time of the memory system (Ta): Ta = H × Tc + (1 - H) × Tm, where Tc is the cache access time and Tm is the main memory access time
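As a quick numerical illustration of the two formulas, here is a minimal Python sketch. The function names are hypothetical and the numbers in the usage note (950 hits out of 1000 accesses, Tc = 10 ns, Tm = 100 ns) are assumed values chosen for the example, not figures from the lecture.

```python
def hit_ratio(hits, total_accesses):
    """H = number of cache hits / total number of memory accesses."""
    return hits / total_accesses

def average_access_time(h, tc, tm):
    """Ta = H * Tc + (1 - H) * Tm, with Tc and Tm in the same time unit."""
    return h * tc + (1 - h) * tm

h = hit_ratio(950, 1000)                # assumed counts
ta = average_access_time(h, 10, 100)    # assumed Tc = 10 ns, Tm = 100 ns
print(f"H = {h:.2f}, Ta = {ta:.1f} ns")
```

With these assumed values H is 0.95 and Ta comes out at about 14.5 ns, far closer to the cache access time than to the main memory access time.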

150 Computer Architecture: Cache Memory
Lecture #12 Computer Architecture

151 Computer Architecture
So you want fast? It is possible to build a computer which uses only static RAM (see later). This would be very fast. This would need no cache. How can you cache cache? This would cost a very large amount. Computer Architecture

152 Computer Architecture
Locality of Reference During the course of the execution of a program, memory references tend to cluster. e.g. loops Computer Architecture

153 Computer Architecture
Cache Small amount of fast memory Sits between normal main memory and CPU May be located on CPU chip or module Word Transfer Block Transfer Main Memory CPU Cache Computer Architecture

154 Cache operation - overview
CPU requests contents of memory location. Check cache for this data. If present, get from cache (fast). If not present, read required block from main memory to cache. Then deliver from cache to CPU. Cache includes tags to identify which block of main memory is in each cache slot. Computer Architecture

155 Computer Architecture
Size does matter Cost More cache is expensive. Speed More cache is faster (up to a point). Checking cache for data takes time. Computer Architecture

156 Typical Cache Organization
Computer Architecture

157 Computer Architecture
Mapping Function Cache of 64 KBytes. Cache block of 4 bytes, i.e. cache is 16K (2^14) lines of 4 bytes. 16 MBytes main memory. 24 bit address (2^24 = 16M)

158 Computer Architecture
Direct Mapping Each block of main memory maps to only one cache line. i.e. if a block is in cache, it must be in one specific place Address is in two parts. Least Significant w bits identify unique word. Most Significant s bits specify one memory block. The MSBs are split into a cache line field r and a tag of s-r (most significant). Computer Architecture

159 Direct Mapping-Address Structure
Tag Field (t): 8 bits, Slot Field (s): 14 bits, Word Field (w): 2 bits. 24 bit address. 2 bit word identifier (4 byte block). 22 bit block identifier: 8 bit tag (= 22 - 14) and 14 bit slot or line. No two blocks in the same line have the same Tag field. Check contents of cache by finding line and checking Tag.
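The 8/14/2 split of the 24-bit address can be written out as a small Python sketch. The function name `split_direct` and the sample address are assumptions made for illustration.

```python
TAG_BITS, SLOT_BITS, WORD_BITS = 8, 14, 2   # field widths from the slide

def split_direct(addr):
    """Split a 24-bit address into (tag, slot, word) for direct mapping."""
    assert 0 <= addr < 1 << (TAG_BITS + SLOT_BITS + WORD_BITS)
    word = addr & ((1 << WORD_BITS) - 1)                  # lowest 2 bits
    slot = (addr >> WORD_BITS) & ((1 << SLOT_BITS) - 1)   # next 14 bits
    tag = addr >> (WORD_BITS + SLOT_BITS)                 # top 8 bits
    return tag, slot, word

tag, slot, word = split_direct(0x16339C)    # sample address, assumed
print(f"tag={tag:02X} slot={slot:04X} word={word}")
```

For the sample address 16339C the tag is 16, the slot is 0CE7 and the word is 0. A lookup reads slot 0CE7 and reports a hit only if the tag stored there equals 16.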

160 Direct Mapping - Cache Slot Table
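Cache Slot    Main Memory blocks held
0             0, m, 2m, 3m, …, 2^s - m
1             1, m+1, 2m+1, …, 2^s - m + 1
m-1           m-1, 2m-1, 3m-1, …, 2^s - 1
(Here m is the number of cache slots and s is the number of block-identifier bits, as on the Direct Mapping slide.)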

161 Direct Mapping Cache Organization
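[Figure: direct-mapped cache organization. The memory address is split into Tag, Slot and Word fields. The Slot field selects one of Slot(0) … Slot(i) … Slot(m-1), each holding a tag and data. A comparator checks the stored tag against the address tag: a match is a cache hit, otherwise a cache miss that goes to main memory]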

162 Direct Mapping Summary
Address length = (t + s + w) bits
Number of addressable units = 2^(t+s+w) words or bytes
Block size = 2^w words or bytes
Number of blocks in main memory = 2^(t+s+w) / 2^w = 2^(t+s)
Number of slots in cache = m = 2^s
Size of tag = t bits

163 Direct Mapping Characteristics
Simple Inexpensive Fixed location for given block If a program accesses 2 blocks that map to the same line repeatedly, cache misses are very high. Computer Architecture
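The last point, two blocks that repeatedly evict each other, can be demonstrated with a minimal simulation. This is a sketch under stated assumptions: the class `DirectMappedCache` is hypothetical, it tracks only tags (no data), and it uses the 14-bit slot and 2-bit word fields of the running example.

```python
class DirectMappedCache:
    """Tag-only model of a direct-mapped cache (illustrative, no data stored)."""

    def __init__(self, slot_bits=14, word_bits=2):
        self.slot_bits = slot_bits
        self.word_bits = word_bits
        self.tags = {}        # slot number -> tag of the block held there
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Return True on a hit; on a miss, load the block into its one slot."""
        block = addr >> self.word_bits
        slot = block & ((1 << self.slot_bits) - 1)
        tag = block >> self.slot_bits
        if self.tags.get(slot) == tag:
            self.hits += 1
            return True
        self.tags[slot] = tag          # evicts whatever block was there
        self.misses += 1
        return False

cache = DirectMappedCache()
for _ in range(10):
    cache.access(0x000000)   # tag 0, slot 0
    cache.access(0x010000)   # tag 1, slot 0: same slot, different tag
print(cache.hits, cache.misses)
```

The two addresses differ only in their tag, so each access evicts the block the other one needs and all 20 accesses miss, even though the rest of the cache is empty.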

164 Computer Architecture
Associative Mapping A main memory block can load into any line of cache. Memory address is interpreted as tag and word Tag uniquely identifies block of memory. Every line’s tag is examined for a match. Cache searching gets expensive. Computer Architecture

165 Fully Associative Cache Organization
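[Figure: fully associative cache organization. The memory address is split into a Tag field and a Word field. The address tag is compared against the tags of all slots, Slot(0) … Slot(i) … Slot(m-1): a match is a cache hit, otherwise a cache miss that goes to main memory]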

166 Associative Mapping Example
[Figure: associative mapping example. The address is split into a 5 bit tag and a 2 bit word; the cache (32 bytes) stores a tag with the data in each slot, and main memory is 128 bytes]

167 Associative Mapping-Address Structure
Tag: 5 bits, Word: 2 bits. 5 bit tag stored with each 32 bit block of data. Compare tag field with tag entry in cache to check for hit. Least significant 2 bits of address identify which byte is required from the 32 bit (4 byte) data block
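A tag-only Python sketch of the associative lookup follows, sized for the example above (32-byte cache of 4-byte blocks, so 8 slots; 5-bit tag, 2-bit word). The class name is hypothetical. The slides do not name a replacement policy here, so first-in first-out eviction is an assumption, and the dictionary lookup merely stands in for the hardware comparing every slot's tag.

```python
from collections import OrderedDict

class FullyAssociativeCache:
    """Tag-only model: any block may occupy any slot (illustrative sketch)."""

    def __init__(self, num_slots=8, word_bits=2):
        self.num_slots = num_slots
        self.word_bits = word_bits
        self.lines = OrderedDict()     # tag -> None, kept in load order

    def access(self, addr):
        """Return True on a hit; on a miss, load the block into a free slot."""
        tag = addr >> self.word_bits   # everything above the word field
        if tag in self.lines:          # stands in for comparing all tags
            return True
        if len(self.lines) >= self.num_slots:
            self.lines.popitem(last=False)   # assumed policy: evict the oldest
        self.lines[tag] = None
        return False

cache = FullyAssociativeCache()
results = [cache.access(a) for a in (0x00, 0x40, 0x01, 0x40)]
print(results)
```

The four accesses give miss, miss, hit, hit: two blocks with different tags stay in the cache side by side, because no block is tied to a particular slot.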

168 Associative Mapping Summary
Address length = (t + w) bits
Number of addressable units = 2^(t+w) words or bytes
Slot size = block size = 2^w words or bytes
Number of blocks in main memory = 2^(t+w) / 2^w = 2^t
Number of slots in cache = not determined by the address format
Size of tag = t bits

169 Set Associative Mapping
Cache is divided into a number of sets. Each set contains a number of lines. A given block maps to any line in a given set. e.g. Block B can be in any line of set i. e.g. 2 lines per set 2 way associative mapping A given block can be in one of 2 lines in only one set. Tag Field Set Field Word Field Computer Architecture

170 Set Associative Mapping Example
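[Figure: set associative cache organization with two slots per set. The memory address is split into Tag (3 bits), Set (2 bits) and Word (2 bits) fields. The Set field selects one of Set(0) … Set(i) … Set(m-1); a comparator checks the address tag against the tags of Slot(0) and Slot(1) in that set: a match is a cache hit, otherwise a cache miss that goes to main memory]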

171 Set Associative Mapping -Address Structure
Tag: 9 bits, Set: 13 bits, Word: 2 bits. Use set field to determine cache set to look in. Compare tag field to see if we have a hit. e.g.
Address     Tag   Set number
1FF 7FFC    1FF   1FFF
001 7FFC    001   1FFF
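The 9/13/2 split can be checked in Python. The function name `split_set_associative` is hypothetical, and the sketch assumes that an address written as "1FF 7FFC" means a 9-bit tag followed by the remaining 15 bits.

```python
TAG_BITS, SET_BITS, WORD_BITS = 9, 13, 2    # field widths from the slide

def split_set_associative(addr):
    """Split a 24-bit address into (tag, set number, word)."""
    assert 0 <= addr < 1 << (TAG_BITS + SET_BITS + WORD_BITS)
    word = addr & ((1 << WORD_BITS) - 1)
    set_no = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_no, word

# Assumed reading of the two example addresses: tag, then the low 15 bits.
for tag_part, low_part in ((0x1FF, 0x7FFC), (0x001, 0x7FFC)):
    addr = (tag_part << 15) | low_part
    tag, set_no, word = split_set_associative(addr)
    print(f"{addr:06X}: tag={tag:03X} set={set_no:04X} word={word}")
```

Both addresses select set 1FFF and differ only in their tags (1FF and 001), so in a 2-way organization they can sit in the two lines of that set at the same time.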

172 Set Associative Mapping Summary
Address length = (s + w) bits
Number of addressable units = 2^(s+w) words or bytes
Block size = line size = 2^w words or bytes
Number of blocks in main memory = 2^s
Number of lines in set = k
Number of sets = v = 2^d
Number of lines in cache = kv = k × 2^d
Size of tag = (s - d) bits

173 Computer Architecture
Pentium 4 Cache 80386 – no on chip cache 80486 – 8k using 16 byte lines and four way set associative organization Pentium (all versions) – two on chip L1 caches Data & instructions Pentium 4 – L1 caches 8k bytes 64 byte lines four way set associative L2 cache Feeding both L1 caches 256k 128 byte lines 8 way set associative Computer Architecture

174 Pentium 4 Core Processor
Fetch/Decode Unit Fetches instructions from L2 cache Decode into micro-ops Store micro-ops in L1 cache Out of order execution logic Schedules micro-ops Based on data dependence and resources May speculatively execute Execution units Execute micro-ops Data from L1 cache Results in registers Memory subsystem L2 cache and systems bus Computer Architecture

175 Computer Architecture
Pentium 4 Design Decodes instructions into RISC like micro-ops before L1 cache Micro-ops fixed length Superscalar pipelining and scheduling Pentium instructions long & complex Performance improved by separating decoding from scheduling & pipelining (More later – ch14) Data cache is write back Can be configured to write through L1 cache controlled by 2 bits in register CD = cache disable NW = not write through 2 instructions to invalidate (flush) cache and write back then invalidate Computer Architecture

176 Computer Architecture
DRAM Synchronous DRAM (SDRAM): adds a clock signal to the DRAM interface, so that repeated transfers do not bear the overhead of synchronizing with the DRAM controller. Double Data Rate (DDR SDRAM): transfers data on both the rising edge and the falling edge of the DRAM clock signal, doubling the peak data rate. DDR2 lowers power by dropping the voltage from 2.5 to 1.8 volts and offers higher clock rates: up to 400 MHz. DDR3 drops to 1.5 volts with higher clock rates: up to 800 MHz. Improved bandwidth, not latency

177 Computer Architecture
DRAM
Standard   Clock Rate (MHz)   M transfers/second   DRAM Name    MBytes/s/DIMM   DIMM Name
DDR        133                266                  DDR266       2128            PC2100
DDR        150                300                  DDR300       2400            PC2400
DDR        200                400                  DDR400       3200            PC3200
DDR2       266                533                  DDR2-533     4264            PC4300
DDR2       333                667                  DDR2-667     5336            PC5300
DDR2       400                800                  DDR2-800     6400            PC6400
DDR3       533                1066                 DDR3-1066    8528            PC8500
DDR3       666                1333                 DDR3-1333    10664           PC10700
DDR3       800                1600                 DDR3-1600    12800           PC12800
M transfers/second = Clock Rate x 2; MBytes/s/DIMM = M transfers/second x 8
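The "x 2" and "x 8" relations in the table can be reproduced with a two-line calculation. The function name `ddr_rates` is hypothetical. The 8 bytes per transfer is implied by the x 8 factor; a 64-bit DIMM data path is the usual reason for that figure.

```python
def ddr_rates(clock_mhz):
    """Return (M transfers/second, MBytes/s per DIMM) for a DDR clock in MHz."""
    transfers = clock_mhz * 2     # data moves on both clock edges
    mbytes = transfers * 8        # 8 bytes per transfer (64-bit data path)
    return transfers, mbytes

for clock in (133, 200, 400, 800):
    transfers, mbytes = ddr_rates(clock)
    print(f"{clock} MHz: {transfers} MT/s, {mbytes} MB/s")
```

Rows whose clock is a rounded figure (for example 333 MHz, listed as 667 M transfers/second) differ by a unit or so from this integer arithmetic, because the table rounds the exact clock frequency.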

178 Computer Architecture
Error Correction Motivation: failures per unit time are proportional to the number of bits, and as DRAM cells shrink they become more vulnerable. There was a period in which the failure rate was low enough that people did not use error correction, but DRAM banks are too large now; servers have always used corrected memory systems. Basic idea: add redundancy through parity bits. Common configuration: random error correction with SEC-DED (single error correct, double error detect). One example: 64 data bits + 8 parity bits (11% overhead). Really want to handle failures of physical components as well: the organization is multiple DRAMs per DIMM and multiple DIMMs, so we want to recover from a failed DRAM and a failed DIMM. “Chipkill” handles failures the width of a single DRAM chip
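The "64 data bits + 8 parity bits" figure can be derived with a short calculation. The sketch assumes the SEC-DED code is an extended Hamming code, which the slide does not state: single error correction needs the smallest r with 2^r >= m + r + 1 for m data bits, and one extra overall parity bit adds double error detection. The function name is hypothetical.

```python
def sec_ded_check_bits(data_bits):
    """Check bits for SEC-DED, assuming an extended Hamming code."""
    r = 1
    while 2 ** r < data_bits + r + 1:   # Hamming bound for single error correction
        r += 1
    return r + 1                        # one more parity bit for double error detection

check = sec_ded_check_bits(64)
print(check, f"{check / (64 + check):.1%}")
```

For 64 data bits this gives 8 check bits, which is 8 of 72 stored bits, or about 11% of the memory, matching the overhead quoted on the slide.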

