Computer Systems & Networks

Computer Systems & Networks
I – Computer Systems Roger Webb (13DJ01)

Syllabus Computer Organisation & Architecture – William Stallings
chapter 2 chapter 3 chapter 4 chapter 6 chapter 7 chapter 8 chapter 9 .chapter 10 chapter 11 chapter 14 Introduction Computer Interconnection Structures Internal & External Memory Input & Output Operating Systems – part 1 Operating Systems – part 2 Computer Arithmetic – part 1 Computer Arithmetic – part 2 Instruction Sets: Characteristics & Function Instruction Sets: Addressing Modes & Formats CPU Structure & Function – part 1 CPU Structure & Function – part 2 Control Unit Operation Computer Organisation & Architecture – William Stallings

Introduction History of Computing:
Computer:- a person who performs computations Early Computing Devices: (8000BC – 1600) 8000 BC Tally Bones/Sticks 5000 BC – Sumeria – Abaq (writing in dust) 3000 BC – China - the Abacus 800 BC – Abax arrives in Europe 1594 – John Napier natural logs

Introduction History of Computing: Mechanical Computers: (1600-1800)
1615 – Slide Rule – William Oughtred 1617 – Napier’s Bones 1642 – Pascaline - Blaise Pascal 1804 – Punch Card Loom – Jacquard

Electro-Mechanical Computers: ( ) 1833 – Difference Engine – Babbage - to make accurate trig and log tables - never actually finished – “of no use to science” 1842 – Analytic Engine (programmable version) (1991 Science Museum spent £400,000 to build working model of simplified Difference Engine – 3tons, 6’ high and 10’ long) 1840 – Lady Ada Lovelace – 1st programmer - suggested that Babbage’s engine could be programmed using punch cards

Introduction History of Computing: Mechanical Computers: (1890-1920)
1890 – Holerith’s Tabulating Machine - punch card data for sorting census data 1906 – Lee De Forest invents vacuum tube (triode) 1911 – Holerith founds Tabulating Machines Company 1924 – Renamed International Business Machines Corporation (IBM) by Thomas J. Watson

Electro-Mechanical Computers: ( ) 1920 – Punch Card Technology 1944 – Mark I (Harvard) similar to Babbage engine. IBM: “electro-mechanical computers will never replace punched card-equipment” 1943 – Colossus (Bletchley Park)

Introduction History of Computing: Electronic Computers: (1940-1950)
1942- First Electronic Computer - ABC uses base-2 number system, memory and logic circuits built from vacuum tubes IBM: “we will never be interested in an electronic computing machine” 1945: “I think there is a world market for maybe 5 computers” Chairman IBM 1946 – ENIAC (Electronic Numerical Integrator & Computer) to compute trajectory tables for the army (only 20% of bombs got within 1000ft) ENIAC’s 18,000 vacuum tubes dimmed the lights of Philadelphia when switched on Used in building first atomic bomb tests at Los Alamos Calculated pi to 2000 places in seventy hours Operated in decimal

ENIAC - details Decimal (not binary) 20 accumulators of 10 digits
Programmed manually by switches 18,000 vacuum tubes 30 tons 15,000 square feet 140 kW power consumption 5,000 additions per second

von Neumann/Turing Stored Program concept
Main memory storing programs and data ALU operating on binary data Control unit interpreting instructions from memory and executing Input and output equipment operated by control unit Princeton Institute for Advanced Studies - IAS Completed 1952 Main Memory Arithmetic and Logic Unit Program Control Unit Input Output Equipment

1947- First Computer “Bug” moth found in Mark II relay causing malfunction – hence “debugging” 1947 – Transistor Schockley, Bardeen & Brattain paves the way for smaller, low power and more stable systems to be built using same designs but replacing valve technology with transistor technology

IAS Computer IAS – Institute of Advanced Study (Princeton) 1952
Basic Von Neuman Architecture: 1000 x 40 bit words Binary number 2 x 20 bit instructions Set of registers (storage in CPU) Memory Buffer Register Memory Address Register Instruction Register Instruction Buffer Register Program Counter Accumulator Multiplier Quotient

Structure of IAS - detail
Main Memory Arithmetic and Logic Unit Program Control Unit Input Output Equipment MBR Arithmetic & Logic Circuits MQ Accumulator MAR Control Circuits IBR IR PC Address Instructions & Data Central Processing Unit

First commercial silicon transistor Texas Instruments 1958- First Integrated Circuit Jack Kilby from Texas Instruments made simple oscillator 1971 – 1st Microprocessor -Intel Intel (1968) make 4004 processor to do job of 12 chips 1972 – Pioneer 10 uses 4004 chips intel 4004 is first uprocessor to asteroid belt...

Introduction History of Computing: 1st Generation Computers: (1950-60)
Using Valve Technology First Commercial Computer UNIVAC–1 correctly forecast US election in 1951 after only 5% of vote was in 1953- IBM 701 1st attempt from IBM 1954 – IBM 650 2nd attempt made as upgrade of punch card machines estimated total market as 50 – sold

Introduction History of Computing: 2nd Generation Computers: (1960-65)
Using transistor technology 1958 – Univac – NTDS 32kwords memory, 10,702 transistors, $500,000, 25kw 1960 – PDP 1 4kwords memory – CRT display first two player computer game – spacewar (1962) 1963 – PDP 8 $18,000 by companies making mini-comp

Introduction History of Computing: 3rd Generation Computers: (1965-70)
Using Integrated Circuits 1964 – IBM System 360 Upward compatibility - no need to redevelop when upgrading 1966 – Funding for ARPA net The beginnings of the internet 1967 – First Handheld Electronic Calculator

Introduction History of Computing: 4th Generation Computers: (1971- )
Using Large Scale Integrated Circuits 1971 – LSI ICs several thousand transistors on a chip 1971 – Intel invent Microprocessor 1980’s– VLSI ICs several tens of thousands of transistors Pentium 4 80486

Introduction Multi Processor Systems – the way ahead
Fastest machine on the planet – as of November 2002 – the Earth Simulator in Japan

Generations of Computer
Vacuum tube Transistor Small scale integration on Up to 100 devices on a chip Medium scale integration - to 1971 100-3,000 devices on a chip Large scale integration 3, ,000 devices on a chip Very large scale integration to date 100, ,000,000 devices on a chip Ultra large scale integration Over 100,000,000 devices on a chip

Moore’s Law Increased density of components on chip
Gordon Moore – co-founder of Intel Number of transistors on a chip will double every year Since 1970’s development has slowed a little Number of transistors doubles every 18 months Cost of a chip has remained almost unchanged Higher packing density means shorter electrical paths, giving higher performance Smaller size gives increased flexibility Reduced power and cooling requirements Fewer interconnections increases reliability

Growth in CPU Transistor Count
Itanium 2 Pentium 4

Semiconductor Memory 1970 Fairchild produce SRAM Intel produce DRAM
Size of a single core i.e. 1 bit of magnetic core storage Holds 256 bits Non-destructive read – 6 transistors Much faster than core Intel produce DRAM Cheaper but slower – 1 transistor Capacity approximately doubles each year Core Memory DRAM

Speeding it up Pipelining On board cache On board L1 & L2 cache
Branch prediction Data flow analysis Speculative execution

Performance Mismatch Processor speed increased
Memory capacity increased Memory speed lags behind processor speed

Trends in DRAM use

Solutions Increase number of bits retrieved at one time
Make DRAM “wider” rather than “deeper” Change DRAM interface Cache Reduce frequency of memory access More complex cache and cache on chip Increase interconnection bandwidth High speed buses Hierarchy of buses

Internet Resources http://www.intel.com/ http://www.ibm.com
Search for the Intel Museum

von Neumann/Turing Stored Program concept
Main memory storing programs and data ALU operating on binary data Control unit interpreting instructions from memory and executing Input and output equipment operated by control unit Princeton Institute for Advanced Studies - IAS Completed 1952 Main Memory Arithmetic and Logic Unit Program Control Unit Input Output Equipment

Function of Control Unit
Program Concept Hardwired systems are inflexible General purpose hardware can do different tasks, given correct control signals Instead of re-wiring, supply a new set of control signals What is a program? A sequence of steps For each step, an arithmetic or logical operation is done For each operation, a different set of control signals is needed Function of Control Unit For each operation a unique code is provided e.g. ADD, MOVE A hardware segment accepts the code and issues the control signals We have a computer!

Components The Control Unit and the Arithmetic and Logic Unit constitute the Central Processing Unit Data and instructions need to get into the system and results out Input/output Temporary storage of code and results is needed Main memory Top Level View

Instruction Cycle Two steps: Fetch Execute

Central Processing Unit
Fetch Cycle Program Counter (PC) holds address of next instruction to fetch Processor fetches instruction from memory location pointed to by PC Increment PC Unless told otherwise Instruction loaded into Instruction Register (IR) Processor interprets instruction and performs required actions Main Memory Arithmetic and Logic Unit Program Control Unit Input Output MBR Arithmetic & Logic Circuits MQ Accumulator MAR Control Circuits IBR IR PC Address Instructions & Data Central Processing Unit

Central Processing Unit
Execute Cycle Processor-memory data transfer between CPU and main memory Processor I/O Data transfer between CPU and I/O module Data processing Some arithmetic or logical operation on data Control Alteration of sequence of operations e.g. jump Combination of above Main Memory Arithmetic and Logic Unit Program Control Unit Input Output MBR Arithmetic & Logic Circuits MQ Accumulator MAR Control Circuits IBR IR PC Address Instructions & Data Central Processing Unit

Example of Program Execution

Connecting Memory Connection
All the units must be connected Different type of connection for different type of unit Memory Input/Output CPU Memory Connection Receives and sends data Receives addresses (of locations) Receives control signals Read Write Timing

Input/Output Connection
Similar to memory from computer’s viewpoint Output Receive data from computer Send data to peripheral Input Receive data from peripheral Send data to computer Receive control signals from computer Send control signals to peripherals e.g. spin disk Receive addresses from computer e.g. port number to identify peripheral Send interrupt signals (control)

CPU Connection Buses Reads instruction and data
Writes out data (after processing) Sends control signals to other units Receives (& acts on) interrupts Buses There are a number of possible interconnection systems Single and multiple BUS structures are most common e.g. Control/Address/Data bus (PC) e.g. Unibus (DEC-PDP)

Data Bus Address bus Control Bus
Carries data Remember that there is no difference between “data” and “instruction” at this level “Width” is a key determinant of performance 8, 16, 32, 64 bit – number of parallel lines Address bus Identify the source or destination of data e.g. CPU needs to read an instruction (data) from a given location in memory Bus width determines maximum memory capacity of system e.g has 16 bit address bus giving 64k address space Control Bus Control and timing information Memory read/write signal Interrupt request Clock signals

Bus Interconnection Scheme

Big, Red & Double Decker? Single Bus Problems
What do buses look like? Parallel lines on circuit boards Ribbon cables Strip connectors on mother boards e.g. PCI Sets of wires Single Bus Problems Lots of devices on one bus leads to: Propagation delays Long data paths mean that co-ordination of bus use can adversely affect performance If aggregate data transfer approaches bus capacity Most systems use multiple buses to overcome these problems

Traditional Bus Strtucture

High Performance Bus

Bus Types Dedicated Multiplexed Separate data & address lines
Shared lines Address valid or data valid control line Advantage - fewer lines Disadvantages More complex control Ultimate performance

Centralised Arbitration Distributed Arbitration
Bus Arbitration More than one module controlling the bus e.g. CPU and DMA controller Only one module may control bus at one time Arbitration may be centralised or distributed Centralised Arbitration Single hardware device controlling bus access Bus Controller or Arbiter May be part of CPU or separate Distributed Arbitration Each module may claim the bus Control logic on all modules

Timing Co-ordination of events on bus Synchronous
Events determined by clock signals Control Bus includes clock line A single 1-0 is a bus cycle All devices can read clock line Usually sync on leading edge Usually a single cycle for an event

Synchronous Timing Diagram

Asynchronous Timing Diagram

Memory Characteristics
Location Capacity Unit of transfer Access method Performance Physical type Physical characteristics Organisation

Location Capacity Unit of Transfer
CPU – Internal - External Capacity Word size The natural unit of organisation Number of words or Bytes Unit of Transfer Internal Usually governed by data bus width External Usually a block which is much larger than a word Addressable unit Smallest location which can be uniquely addressed Word internally

Access Methods Sequential Direct Random Associative
Start at the beginning and read through in order Access time depends on location of data and previous location e.g. tape Direct Individual blocks have unique address Access is by jumping to vicinity plus sequential search Access time depends on location and previous location e.g. disk Random Individual addresses identify locations exactly Access time is independent of location or previous access e.g. RAM Associative Data is located by a comparison with contents of a portion of the store e.g. cache

Memory Hierarchy Registers Internal or Main memory External memory
In CPU Internal or Main memory May include one or more levels of cache “RAM” External memory Backing store

Performance Access time Memory Cycle time Transfer Rate
Time between presenting the address and getting the valid data Memory Cycle time Time may be required for the memory to “recover” before next access Cycle time is access + recovery Transfer Rate Rate at which data can be moved

Physical Types Semiconductor Magnetic Optical Others RAM Disk & Tape
CD & DVD Others Bubble Hologram

Physical Characteristics
Decay Volatility Erasable Power consumption Organisation Physical arrangement of bits into words Not always obvious e.g. interleaved The Bottom Line How much? Capacity How fast? Time is money How expensive?

Hierarchy List CPU Registers L1 Cache L2 Cache Main memory Disk cache
Optical Tape

Semiconductor Memory RAM
Misnamed as all semiconductor memory is random access Read/Write Volatile Temporary storage Static or dynamic

Dynamic RAM Bits stored as charge in capacitors Charges leak
Need refreshing even when powered Simpler construction Smaller per bit Less expensive Need refresh circuits Slower Main memory

Static RAM Bits stored as on/off switches No charges to leak
No refreshing needed when powered More complex construction Larger per bit More expensive Does not need refresh circuits Faster Cache

Read Only Memory (ROM) Permanent storage Microprogramming (see later)
Library subroutines Systems programs (BIOS) Function tables

Types of ROM Written during manufacture Programmable (once)
Very expensive for small runs Programmable (once) PROM Needs special equipment to program Read “mostly” Erasable Programmable (EPROM) Erased by UV Electrically Erasable (EEPROM) Takes much longer to write than read Flash memory Erase whole memory electrically

Organisation in detail
A 16Mbit chip can be organised as 1M of 16 bit words A bit per chip system has 16 lots of 1Mbit chip with bit 1 of each word in chip 1 and so on A 16Mbit chip can be organised as a 2048 x 2048 x 4bit array Reduces number of address pins Multiplex row address and column address 11 pins to address (211=2048) Adding one more pin doubles range of values so x4 capacity

Refreshing Refresh circuit included on chip Disable chip
Count through rows Read & Write back Takes time Slows down apparent performance

Typical 16 Mb DRAM (4M x 4)

Module Organisation 1 bit per chip per word

Module Organisation (2)
Larger Memory Size

Cache Small amount of fast memory
Sits between normal main memory and CPU May be located on CPU chip or module

So you want fast? It is possible to build a computer which uses only static RAM This would be very fast This would need no cache How can you cache cache? This would cost a very large amount

Cache operation - overview
CPU requests contents of memory location Check cache for this data If present, get from cache (fast) If not present, read required block from main memory to cache Then deliver from cache to CPU Cache includes tags to identify which block of main memory is in each cache slot

Cache Design Size Mapping Function Replacement Algorithm Write Policy
Block Size Number of Caches

Size does matter Locality of Reference
Cost More cache is expensive Speed More cache is faster (up to a point) Checking cache for data takes time Locality of Reference During the course of the execution of a program, memory references tend to cluster e.g. loops

Newer RAM Technology (1)
Basic DRAM same since first RAM chips Enhanced DRAM Contains small SRAM as well SRAM holds last line read (c.f. Cache!) Cache DRAM Larger SRAM component Use as cache or serial buffer

Newer RAM Technology (2)
Synchronous DRAM (SDRAM) currently on DIMMs Access is synchronized with an external clock Address is presented to RAM RAM finds data (CPU waits in conventional DRAM) Since SDRAM moves data in time with system clock, CPU knows when data will be ready CPU does not have to wait, it can do something else Burst mode allows SDRAM to set up stream of data and fire it out in block

Input/Output Problems
Wide variety of peripherals Delivering different amounts of data At different speeds In different formats All slower than CPU and RAM Need I/O modules Interface to CPU and Memory Interface to one or more peripherals

I/O Module Function I/O Steps
Links to Peripheral Devices Control bus Data bus Address bus Control & Timing CPU Communication Device Communication Data Buffering Error Detection I/O Steps CPU checks I/O module device status I/O module returns status If ready, CPU requests data transfer I/O module gets data from device I/O module transfers data to CPU Variations for output, DMA, etc.

Input Output Techniques
I/O Module Decisions Hide or reveal device properties to CPU Support multiple or single device Control device functions or leave for CPU Also O/S decisions e.g. Unix treats everything it can as a file Input Output Techniques Programmed Interrupt driven Direct Memory Access (DMA)

Programmed I/O - detail
CPU has direct control over I/O Sensing status Read/write commands Transferring data CPU waits for I/O module to complete operation Wastes CPU time Programmed I/O - detail CPU requests I/O operation I/O module performs operation I/O module sets status bits CPU checks status bits periodically I/O module does not inform CPU directly I/O module does not interrupt CPU CPU may wait or come back later

Addressing I/O Devices
I/O Commands CPU issues address Identifies module (& device if >1 per module) CPU issues command Control - telling module what to do e.g. spin up disk Test - check status e.g. power? Error? Read/Write Module transfers data via buffer from/to device Addressing I/O Devices Under programmed I/O data transfer is very like memory access (CPU viewpoint) Each device given unique identifier CPU commands contain identifier (address)

I/O Mapping Memory mapped I/O
Devices and memory share an address space I/O looks just like memory read/write No special commands for I/O Large selection of memory access commands available Isolated I/O Separate address spaces Need I/O or memory select lines Special commands for I/O Limited set

Interrupt Driven I/O - Basic Operation
Overcomes CPU waiting No repeated CPU checking of device I/O module interrupts when ready Interrupt Driven I/O - Basic Operation CPU issues read command I/O module gets data from peripheral whilst CPU does other work I/O module interrupts CPU CPU requests data I/O module transfers data

CPU Viewpoint Design Issues
Issue read command Do other work Check for interrupt at end of each instruction cycle If interrupted:- Save context (registers) Process interrupt Fetch data & store See Operating Systems notes Design Issues How do you identify the module issuing the interrupt? How do you deal with multiple interrupts? i.e. an interrupt handler being interrupted

Identifying Interrupting Module
Different line for each module Limits number of devices Software poll CPU asks each module in turn - Slow Daisy Chain or Hardware poll Interrupt Acknowledge sent down a chain Module responsible places vector on bus CPU uses vector to identify handler routine Bus Master Module must claim the bus before it can raise interrupt e.g. PCI & SCSI Multiple Interrupts Each interrupt line has a priority Higher priority lines can interrupt lower priority lines If bus mastering only current master can interrupt

Direct Memory Access DMA Function
Interrupt driven and programmed I/O require active CPU intervention Transfer rate is limited CPU is tied up DMA is the answer DMA Function Additional Module (hardware) on bus DMA controller takes over from CPU for I/O

DMA Operation CPU tells DMA controller:-
Read/Write Device address Starting address of memory block for data Amount of data to be transferred CPU carries on with other work DMA controller deals with transfer DMA controller sends interrupt when finished

DMA Transfer - Cycle Stealing
DMA controller takes over bus for a cycle Transfer of one word of data Not an interrupt CPU does not switch context CPU suspended just before it accesses bus i.e. before an operand or data fetch or a data write Slows down CPU but not as much as CPU doing transfer

DMA Configurations (1) CPU DMA Controller I/O Device Main Memory
Single Bus, Detached DMA controller Each transfer uses bus twice I/O to DMA then DMA to memory CPU is suspended twice

DMA Configurations (2) CPU DMA Controller I/O Device Main Memory
Single Bus, Integrated DMA controller Controller may support >1 device Each transfer uses bus once DMA to memory CPU is suspended once

O/S Objectives and Functions
Convenience Making the computer easier to use Efficiency Allowing better use of computer resources Layers

Operating System Services
Program creation Program execution Access to I/O devices Controlled access to files System access Error detection and response Accounting

Types of Operating System
Interactive Batch Single program (Uni-programming) Multi-programming (Multi-tasking) Early Systems Late 1940s to mid 1950s No Operating System Programs interact directly with hardware Two main problems: Scheduling Setup time

Simple Batch Systems Resident Monitor program
Users submit jobs to operator Operator batches jobs Monitor controls sequence of events to process batch When one job is finished, control returns to Monitor which reads next job Monitor handles scheduling

Desirable Hardware Features
Memory protection To protect the Monitor (not screen saver!) Timer To prevent a job monopolizing the system Privileged instructions Only executed by Monitor e.g. I/O Interrupts Allows for relinquishing and regaining control Multi-programmed Batch Systems I/O devices very slow When one program is waiting for I/O, another can use the CPU

Multi-Programming with 2 Programs
Single Program Multi-Programming with 2 Programs

Multi-Programming with 3 Programs

Time Sharing Systems Process States
Allow users to interact directly with the computer i.e. Interactive Multi-programming allows a number of users to interact with the computer Process States

Process Control Block Identifier PID
State Running, Ready, Blocked, New, Exit Priority High, Medium, Low, etc…. Program Counter Where were we? Memory pointers Preserve our calculations Context data What was in the registers I/O status Reading, Writing, Null Accounting information How Much Used Already

Memory Management Uni-program Multi-program Memory split into two
One for Operating System (monitor) One for currently executing program Multi-program “User” part is sub-divided and shared among active processes

Swapping Problem: I/O is so slow compared with CPU that even in multi-programming system, CPU can be idle most of the time Solutions: Increase main memory Expensive Leads to larger programs Swapping

What is Swapping? Long term queue of processes stored on disk
Processes “swapped” in as space becomes available As a process completes it is moved out of main memory If none of the processes in memory are ready (i.e. all I/O blocked) Swap out a blocked process to intermediate queue Swap in a ready process or a new process But swapping is an I/O process...

Partitioning Splitting memory into sections to allocate to processes (including Operating System) Fixed-sized partitions May not be equal size Process is fitted into smallest hole that will take it (best fit) Some wasted memory Leads to variable sized partitions

Variable Sized Partitions (1)
Allocate exactly the required memory to a process This leads to a hole at the end of memory, too small to use Only one small hole - less waste When all processes are blocked, swap out a process and bring in another New process may be smaller than swapped out process Another hole

Variable Sized Partitions (2)
Eventually have lots of holes (fragmentation) Solutions: Coalesce - Join adjacent holes into one large hole Compaction - From time to time go through memory and move all hole into one free block (c.f. disk de-fragmentation)

Effect of Dynamic Partitioning

Relocation No guarantee that process will load into the same place in memory Instructions contain addresses Locations of data Addresses for instructions (branching) Logical address - relative to beginning of program Physical address - actual location in memory (this time) Automatic conversion using base address

Paging Split memory into equal sized, small chunks -page frames
Split programs (processes) into equal sized small chunks - pages Allocate the required number page frames to a process Operating System maintains list of free frames A process does not require contiguous page frames Use page table to keep track

Logical and Physical Addresses - Paging

Virtual Memory Demand paging Page fault
Do not require all pages of a process in memory Bring in pages as required Page fault Required page is not in memory Operating System must swap in required page May need to swap out a page to make space Select page to throw out based on recent history

Thrashing Too many processes in too little memory
Operating System spends all its time swapping Little or no real work is done Disk light is on all the time Solutions Good page replacement algorithms Reduce number of processes running Fit more memory

Bonus We do not need all of a process in memory for it to run
We can swap in pages as required So - we can now run processes that are bigger than total memory available! Main memory is called real memory User/programmer sees much bigger memory - virtual memory

Page Table Structure

Advantages of Segmentation
Paging is not (usually) visible to the programmer Segmentation is visible to the programmer Usually different segments allocated to program and data May be a number of program and data segments Advantages of Segmentation Simplifies handling of growing data structures Allows programs to be altered and recompiled independently, without re-linking and re-loading Lends itself to sharing among processes Lends itself to protection Some systems combine segmentation with paging

Arithmetic & Logic Unit
Does the calculations Everything else in the computer is there to service this unit Handles integers May handle floating point (real) numbers May be separate FPU (maths co-processor) May be on chip separate FPU (486DX +)

Integer Representation
Only have 0 & 1 to represent everything Positive numbers stored in binary e.g. 41= No minus sign No decimal point Instead we can use either: Sign-Magnitude Two’s compliment

Sign-Magnitude Left most bit is sign bit 0 means positive
1 means negative +18 = -18 = Problems Need to consider both sign and magnitude in arithmetic Two representations of zero (+0 and -0)

Two’s Compliment Benefits
+3 = +2 = +1 = +0 = -1 = -2 = -3 = Benefits One representation of zero Arithmetic works easily (see later) Negating is fairly easy 3 = Boolean complement gives Add 1 to LSB

Geometric Depiction of Twos Complement Integers

Negation Special Case 1 0 = 00000000 Bitwise not 11111111
0 = Bitwise not Add 1 to LSB Result Overflow is ignored, so: - 0 = 0 

Negation Special Case 2 -128 = 10000000 bitwise not 01111111
-128 = bitwise not Add 1 to LSB Result So: -(-128) = X Monitor MSB (sign bit) It should change during negation In this case 128 is too big to be represented

Range of Numbers 8 bit 2s compliment 16 bit 2s compliment
+127 = = 27 -1 -128 = = -27 16 bit 2s compliment = = = = -215

Conversion Between Lengths
Positive number pack with leading zeros +18 = +18 = Negative numbers pack with leading ones -18 = -18 = i.e. pack with MSB (sign bit)

Addition and Subtraction
Normal binary addition Monitor sign bit for overflow Take twos compliment of subtrahend and add to minuend i.e. a - b = a + (-b) So we only need addition and complement circuits

Hardware for Addition and Subtraction

Multiplication Complex Work out partial product for each digit
Take care with place value (column) Add partial products

Multiplication Example
Multiplicand (11 dec) x Multiplier (13 dec) Partial products Note: if multiplier bit is 1 copy multiplicand (place value) otherwise zero Product (143 dec) Note: need double length result

Unsigned Binary Multiplication
Initial Values If this digit is a one then add multiplicand Shift A and Q right

First Cycle If this digit is a one then add multiplicand Shift A and Q right

Second Cycle If this digit is a one then add multiplicand Shift A and Q right

Third Cycle If this digit is a one then add multiplicand Shift A and Q right

Fourth Cycle If this digit is a one then add multiplicand Shift A and Q right

Fourth Cycle 1 If this digit is a one then add multiplicand Shift A and Q right

Fourth Cycle If this digit is a one then add multiplicand Shift A and Q right

Flowchart for Unsigned Binary Multiplication

Multiplying Negative Numbers
This does not work! Solution 1 Convert to positive if required Multiply as above If signs were different, negate answer Solution 2 Booth’s algorithm

Example of Booth’s Algorithm
Multiplier and multiplicand placed in Q and M Additional 1 bit register Q-1 Results appear in A and Q A and Q initialised to 0 Q0 and Q-1 examined If both same (0-0 or 1-1) Shift A and Q right If different If 0-1 then Add M If 1-0 then Subtract M Shift preserves the sign An-1  An-2 and remains in An-1

Division More complex than multiplication
Negative numbers are really bad! Based on long division 001111 1011 001110 100 Quotient Dividend Remainder Partial Remainders Divisor

Real Numbers Numbers with fractions Could be done in pure binary
= =9.625 Where is the binary point? Fixed? Very limited Moving? How do you show where it is?

Floating Point Sign bit Biased Significand or Mantissa Exponent
+/- .significand x 2exponent Misnomer – not really a floating point Point is actually fixed between sign bit and body of mantissa Exponent indicates place value (point position)

Floating Point Examples
Mantissa is stored in 2s compliment Exponent is in excess or biased notation e.g. Excess (bias) 128 means 8 bit exponent field Pure value range 0-255 Subtract 128 to get correct value Range -128 to +127

FP Ranges For a 32 bit number Accuracy 8 bit exponent
The effect of changing lsb of mantissa 23 bit mantissa 2-23  1.2 x 10-7 About 6 decimal places

Expressible Numbers

IEEE 754 Standard for floating point storage 32 and 64 bit standards
8 and 11 bit exponent respectively Extended formats (both mantissa and exponent) for intermediate results

FP Arithmetic +/- Check for zeros
Align significands (adjusting exponents) Add or subtract significands Normalize

FP Arithmetic x/ Check for zero Add/subtract exponents
Multiply/divide significands (watch sign) Normalize Round All intermediate results should be in double length storage

Floating Point Multiplication

Floating Point Division

What is an instruction set?
The complete collection of instructions that are understood by a CPU Machine Code – set of binary digits Usually represented by assembly codes Elements of an Instruction Operation code (Op code) Do this Source Operand reference To this Result Operand reference Put the answer here Next Instruction Reference When you have done that, do this... 2

Instruction Representation
Operands – Where From? Main memory (or virtual memory or cache) CPU register I/O device Instruction Representation In machine code each instruction has a unique bit pattern For human consumption (well, programmers anyway) a symbolic representation is used e.g. ADD, SUB, LOAD Operands can also be represented in this way ADD A,B 4

Number of Addresses (a)
Instruction Types Data processing Data storage (main memory) Data movement (I/O) Program flow control Number of Addresses (a) 3 addresses Operand 1, Operand 2, Result a = b + c; May be a fourth - next instruction (usually implicit) Not common Needs very long “words” to hold everything 6

Number of Addresses (b)
One address doubles as operand and result a = a + b Reduces length of instruction Requires some extra work Temporary storage to hold some results Number of Addresses (c) 1 address Implicit second address Usually a register (accumulator) Common on early machines 8

Number of Addresses (d)
0 (zero) addresses All addresses implicit Uses a stack e.g. push a push b add pop c c = a + b 10

How Many Addresses is good?
More addresses More complex (powerful?) instructions More registers Inter-register operations are quicker Fewer instructions per program Fewer addresses Less complex (powerful?) instructions More instructions per program Faster fetch/execution of instructions 11

Design Decisions (1) Operation repertoire Data types
How many operations? What can they do? How complex are they? Data types Instruction formats Length of op code field Number of addresses/instruction Registers Number of CPU registers available Which operations can be performed on which registers? Addressing modes (later…) RISC v CISC - whole lecture on its own 12

Types of Operand Addresses Numbers Characters Logical Data
Integer/floating point Characters ASCII etc. Logical Data Bits or flags (Aside: Is there any difference between numbers and characters? ) 14

Specific Data Types General - arbitrary binary contents
Integer - single binary value Ordinal - unsigned integer Unpacked BCD - One digit per byte Packed BCD - 2 BCD digits per byte Near Pointer - 32 bit offset within segment Bit field Byte String Floating Point ………… 16

Types of Operation Data Transfer Arithmetic Logical Conversion I/O
System Control Transfer of Control 18

Data Transfer Specify Source Destination Amount of data May be different instructions for different movements e.g. IBM 370 L ; 32 bits from memory to register LH ; 16 bits from memory to register LE ; 32 bits from memory to fp register Etc… Or one instruction and different addresses e.g. VAX LD ;16 bits from anywhere to anywhere 19

Arithmetic Logical Conversion
Add, Subtract, Multiply, Divide Signed Integer Floating point ? May include Increment (a++) Decrement (a--) Negate (-a) Logical Bitwise operations AND, OR, NOT Conversion E.g. Binary to Decimal 20

Input/Output Systems Control
May be specific instructions May be done using data movement instructions (memory mapped) May be done by a separate controller (DMA) Systems Control Privileged instructions CPU needs to be in specific state For operating systems or compiler use 23

Transfer of Control Branch Skip Subroutine call
e.g. branch to x if result is zero Skip e.g. increment and skip if zero ISZ Register1 Branch xxxx ADD A Subroutine call c.f. interrupt call 25

Byte Order What order do we read numbers that occupy more than one byte? e.g. (numbers in hex to make it easy to read) can be stored in 4x8bit locations as follows Address Value (1) Value(2) i.e. read top down or bottom up? 27

Byte Order Names The problem is called Endian
The system on the left has the least significant byte in the lowest address This is called big-endian The system on the right has the least significant byte in the highest address This is called little-endian 12 34 56 78 184 185 186 187 12 34 56 78 187 186 185 184 29

Standard…What Standard?
Pentium (80x86), VAX are little-endian IBM 370, Moterola 680x0 (Mac), and most RISC are big-endian Internet is big-endian Makes writing Internet programs on PC more awkward! WinSock provides htoi and itoh (Host to Internet & Internet to Host) functions to convert 30

Addressing Modes Immediate Direct Indirect Register Register Indirect
Displacement (Indexed) Stack 2

Immediate Addressing Operand is part of instruction
Operand = address field e.g. ADD 5 Add 5 to contents of accumulator 5 is operand No memory reference to fetch data Fast Limited range Operand Opcode Instruction 3

Direct Addressing Address field contains address of operand
Effective address (EA) = address field (A) e.g. ADD A Add contents of cell A to accumulator Look in memory at address A for operand Single memory reference to access data No additional calculations to work out effective address Limited address space Address A Opcode Instruction Memory Operand 5

Indirect Addressing (1)
Memory cell pointed to by address field contains the address of (pointer to) the operand EA = (A) Look in A, find address (A) and look there for operand e.g. ADD (A) Add contents of cell pointed to by contents of A to accumulator Address A Opcode Instruction Memory Operand Pointer to operand 7

Indirect Addressing (2)
Large address space 2n where n = word length May be nested, multilevel, cascaded e.g. EA = (((A))) Draw the diagram yourself Multiple memory accesses to find operand Hence slower 8

Register Addressing (1)
Operand is held in register named in address filed EA = R Limited number of registers Very small address field needed Shorter instructions Faster instruction fetch Register Address R Opcode Instruction Registers Operand 10

Register Addressing (2)
No memory access Very fast execution Very limited address space Multiple registers helps performance Requires good assembly programming or compiler writing c.f. Direct addressing 11

Register Indirect Addressing
C.f. indirect addressing EA = (R) Operand is in memory cell pointed to by contents of register R Large address space (2n) One fewer memory access than indirect addressing Register Address R Opcode Instruction Memory Operand Pointer to Operand Registers 13

Displacement Addressing
EA = A + (R) Address field hold two values A = base value R = register that holds displacement or vice versa Register R Opcode Instruction Memory Operand Pointer to Operand Registers Address A + 15

Relative Addressing A version of displacement addressing
R = Program counter, PC EA = A + (PC) i.e. get operand from A cells from current location pointed to by PC c.f locality of reference & cache usage 17

Base-Register Addressing
A holds displacement R holds pointer to base address R may be explicit or implicit e.g. segment registers in 80x86 18

Indexed Addressing A = base R = displacement EA = A + R
Good for accessing arrays R++ 19

Combinations Postindex EA = (A) + (R) Preindex EA = (A+(R))
(Draw the diagrams) 20

Stack Addressing Operand is (implicitly) on top of stack e.g.
ADD Pop top two items from stack and add 21

Instruction Formats Layout of bits in an instruction Includes opcode
Includes (implicit or explicit) operand(s) Usually more than one instruction format in an instruction set

Instruction Length Affected by and affects:
Memory size Memory organization Bus structure CPU complexity CPU speed Trade off between powerful instruction repertoire and saving space

Allocation of Bits Number of addressing modes Number of operands
Register versus memory Number of register sets Address range Address granularity

CPU Structure CPU must: Fetch instructions Interpret instructions
Fetch data Process data Write data 22

User Visible Registers
CPU must have some working space (temporary storage) Called registers Number and function vary between processor designs One of the major design decisions Top level of memory hierarchy User Visible Registers General Purpose Data Address Condition Codes 23

General Purpose Registers
May be true general purpose May be restricted May be used for data or addressing Data Accumulator Addressing Segment Make them general purpose Increase flexibility and programmer options Increase instruction size & complexity Make them specialized Smaller (faster) instructions Less flexibility 25

How Many GP Registers? How big?
Between Fewer = more memory references More does not reduce memory references and takes up processor real estate How big? Large enough to hold full address Large enough to hold full word Often possible to combine two data registers C programming long int a; 27

Condition Code Registers
Sets of individual bits e.g. result of last operation was zero Can be read (implicitly) by programs e.g. Jump if zero Can not (usually) be set by programs Control & Status Registers Program Counter Instruction Decoding Register Memory Address Register Memory Buffer Register Revision: what do these all do? 29

Program Status Word A set of bits - Includes Condition Codes
Sign - of last result Zero – set when result of last op was zero Carry – set if last op resulted in a carry Equal – set if a logical compare was true Overflow – set to indicate arithmetic overflow Interrupt enable/disable Supervisor Mode Intel ring zero Kernel mode Allows privileged instructions to execute Used by operating system Not available to user programs 31

Other Registers May have registers pointing to:
Process control blocks (see O/S) Interrupt Vectors (see O/S) N.B. CPU design and operating system design are closely linked 33

Example Register Organizations

Instruction Cycle Indirect Cycle
May require memory access to fetch operands Indirect addressing requires more memory accesses Can be thought of as additional instruction subcycle 35

Instruction Cycle with Indirect

Instruction Cycle State Diagram

Data Flow (Instruction Fetch)
Depends on CPU design – in general: Fetch PC contains address of next instruction Address moved to MAR Address placed on address bus Control unit requests memory read Result placed on data bus, copied to MBR, then to IR Meanwhile PC incremented by 1

Data Flow (Data Fetch) IR is examined
If indirect addressing, indirect cycle is performed Right most N bits of MBR transferred to MAR Control unit requests memory read Result (address of operand) moved to MBR

Data Flow (Execute) May take many forms
Depends on instruction being executed May include Memory read/write Input/Output Register transfers ALU operations

Data Flow (Interrupt) Current PC saved to allow resumption after interrupt Contents of PC copied to MBR Special memory location (e.g. stack pointer) loaded to MAR MBR written to memory PC loaded with address of interrupt handling routine Next instruction (first of interrupt handler) can be fetched

Prefetch Improved Performance
Fetch accessing main memory Execution usually does not access main memory Can fetch next instruction during execution of current instruction Called instruction prefetch Improved Performance But not doubled: Fetch usually shorter than execution Prefetch more than one instruction? Any jump or branch means that prefetched instructions are not the required instructions Add more stages to improve performance 36

Pipelining Fetch instruction Decode instruction
Calculate operands (i.e. EAs) Fetch operands Execute instructions Write result Overlap these operations 38

Timing of Pipeline 39

Branch in a Pipeline 40

Dealing with Branches Multiple Streams
Prefetch Branch Target Loop buffer Branch prediction Delayed branching Multiple Streams Have two pipelines Prefetch each branch into a separate pipeline Use appropriate pipeline Leads to bus & register contention Multiple branches lead to further pipelines being needed 41

Prefetch Branch Target
Target of branch is prefetched in addition to instructions following branch Keep target until branch is executed Used by IBM 360/91 Loop Buffer Very fast memory Maintained by fetch stage of pipeline Check buffer before fetching from memory Very good for small loops or jumps c.f. cache Used by CRAY-1 43

Branch Prediction Predict never taken Predict always taken
Assume that jump will not happen Always fetch next instruction Predict always taken Assume that jump will happen Always fetch target instruction Predict by Opcode Some instructions more likely to result in a jump Can get up to 75% success Taken/Not taken switch Based on previous history Good for loops Delayed Branch Do not take jump until you have to Rearrange instructions 45

Branch Prediction State Diagram

Micro-Operations A computer executes a program Fetch/execute cycle
Each cycle has a number of steps see pipelining Called micro-operations Each step does very little Atomic operation of CPU

Constituent Elements of Program Execution

Fetch - 4 Registers Memory Address Register (MAR)
Connected to address bus Specifies address for read or write op Memory Buffer Register (MBR) Connected to data bus Holds data to write or last data read Program Counter (PC) Holds address of next instruction to be fetched Instruction Register (IR) Holds last instruction fetched

Fetch Sequence Address of next instruction is in PC
Address (MAR) is placed on address bus Control unit issues READ command Result (data from memory) appears on data bus Data from data bus copied into MBR PC incremented by 1 (in parallel with data fetch from memory) Data (instruction) moved from MBR to IR MBR is now free for further data fetches

Fetch Sequence (symbolic)
t1: MAR <- (PC) t2: MBR <- (memory) PC <- (PC) +1 t3: IR <- (MBR) (tx = time unit/clock cycle) Or (conceptually identical) t3: PC <- (PC) +1 IR <- (MBR)

Rules for Clock Cycle Grouping
Proper sequence must be followed MAR <- (PC) must precede MBR <- (memory) Conflicts must be avoided Must not read & write same register at same time MBR <- (memory) & IR <- (MBR) must not be in same cycle Also: PC <- (PC) +1 involves addition Use ALU May need additional micro-operations

Indirect Address Cycle
MAR <- (IRaddress) - address field of IR MBR <- (memory) IRaddress <- (MBRaddress) MBR contains an address IR is now in same state as if direct addressing had been used

Interrupt Cycle This is a minimum
t1: MBR <-(PC) t2: MAR <- save-address PC <- routine-address t3: memory <- (MBR) This is a minimum May be additional micro-ops to get addresses N.B. saving context is done by interrupt handler routine, not micro-ops

Execute Cycle (ADD) Different for each instruction
e.g. ADD R1,X - add the contents of location X to Register 1 , result in R1 t1: MAR <- (IRaddress) t2: MBR <- (memory) t3: R1 <- R1 + (MBR) Note no overlap of micro-operations

Execute Cycle (ISZ) ISZ X - increment and skip if zero Notes:
t1: MAR <- (IRaddress) t2: MBR <- (memory) t3: MBR <- (MBR) + 1 t4: memory <- (MBR) if (MBR) == 0 then PC <- (PC) + 1 Notes: if is a single micro-operation Micro-operations done during t4

Execute Cycle (BSA) BSA X - Branch and save address
Address of instruction following BSA is saved in X Execution continues from X+1 t1: MAR <- (IRaddress) MBR <- (PC) t2: PC <- (IRaddress) memory <- (MBR) t3: PC <- (PC) + 1

Functional Requirements
Define basic elements of processor Describe micro-operations processor performs Determine functions control unit must perform Basic Elements of Processor ALU Registers Internal data paths External data paths Control Unit Types of Micro-operation Transfer data between registers Transfer data from register to external Transfer data from external to register Perform arithmetic or logical ops

Functions of Control Unit
Sequencing Causing the CPU to step through a series of micro-operations Execution Causing the performance of each micro-op This is done using Control Signals

Control Signals Clock Instruction register Flags From control bus
One micro-instruction (or set of parallel micro-instructions) per clock cycle Instruction register Op-code for current instruction Determines which micro-instructions are performed Flags State of CPU Results of previous operations From control bus Interrupts Acknowledgements

Control Signals - output
Within CPU Cause data movement Activate specific functions Via control bus To memory To I/O modules

Example Control Signal Sequence - Fetch
MAR <- (PC) Control unit activates signal to open gates between PC and MAR MBR <- (memory) Open gates between MAR and address bus Memory read control signal Open gates between data bus and MBR

Internal Organization
Usually a single internal bus Gates control movement of data onto and off the bus Control signals control data transfer to and from external systems bus Temporary registers needed for proper operation of ALU

Hardwired Implementation (1)
Control unit inputs Flags and control bus Each bit means something Instruction register Op-code causes different control signals for each different instruction Unique logic for each op-code Decoder takes encoded input and produces single output n binary inputs and 2n outputs

Hardwired Implementation (2)
Clock Repetitive sequence of pulses Useful for measuring duration of micro-ops Must be long enough to allow signal propagation Different control signals at different times within instruction cycle Need a counter with different control signals for t1, t2 etc.

Problems With Hard Wired Designs
Complex sequencing & micro-operation logic Difficult to design and test Inflexible design Difficult to add new instructions Better to use a microprogrammed controller……

Computer Systems & Networks

Similar presentations

Presentation on theme: "Computer Systems & Networks"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Computer Systems & Networks

Similar presentations

Presentation on theme: "Computer Systems & Networks"— Presentation transcript:

Similar presentations

About project

Feedback