Chapter Goals
- System bus and bus protocol
- CPU and bus interaction with peripheral devices
- Device controllers
- Interrupt processing
- Improving computer system performance
System Bus
- A bus is a set of parallel communication lines.
- The system bus connects the CPU with other system components.
- Subsets of bus lines:
  - Data bus: the number of lines is the same as, or a multiple of, the CPU word size.
  - Address bus: the number of lines determines how much memory can be addressed.
  - Control bus: carries the bus clock pulse, commands, response messages, status codes, ...
Bus Clock and Data Transfer Rate
- The bus clock pulse is a common timing reference for all attached devices (measured in MHz).
- At the beginning of each clock pulse, the bus can transmit data or a control signal.
- The bus clock rate is usually a fraction of the CPU clock rate.
- A bus cycle is the time interval from one clock pulse to the next:
  - bus cycle time = 1 / bus clock rate
  - It cannot be shorter than the time required for an electrical signal to traverse the bus from end to end.
- The data transfer rate is a measure of communication capacity:
  - bus capacity = data transfer unit x clock rate
  - Example: a 400 MHz bus has a cycle time of 1 / 400 MHz = 2.5 ns; with a 64-bit data transfer unit, capacity is 64 bits x 400 MHz = 25.6 Gbit/s = 3.2 GB/s.
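The capacity formula above can be checked with a short calculation (a sketch; the 64-line, 400 MHz figures come from the slide's example):

```python
def bus_capacity_gbps(data_lines: int, clock_mhz: float) -> float:
    """Bus capacity in GB/s: one transfer of `data_lines` bits per bus cycle."""
    bytes_per_cycle = data_lines / 8
    cycles_per_second = clock_mhz * 1_000_000
    return bytes_per_cycle * cycles_per_second / 1_000_000_000

# 64 data lines at 400 MHz -> 3.2 GB/s
print(bus_capacity_gbps(64, 400))
```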
Bus Protocol
- A bus protocol governs the format, content, and timing of data, memory addresses, and control messages sent across the bus.
- It regulates bus access to prevent devices from interfering with one another (e.g., command, acknowledgement, and confirmation messages).
- An efficient bus protocol consumes a minimal number of bus cycles, maximizing bus availability for data transfers, but tends to be complex (e.g., the SCSI bus protocol, covered in later slides).
- Master-slave bus: a traditional computer architecture with one bus master and several bus slaves. Simple, but system performance is low (why?).
Transferring Data Without the CPU
- To improve system performance, data can be transferred without involving the CPU:
  - Direct memory access (DMA)
  - Peer-to-peer buses, where a bus arbitration unit resolves conflicts
- A DMA transfer essentially copies a block of memory from one device to another, reading and/or writing independently of the central processing unit. The CPU initiates the transfer but does not execute it.
- With DMA, the CPU initiates the transfer, performs other operations while the transfer is in progress, and receives an interrupt from the DMA controller once the operation is done.
- Many hardware systems use DMA, including disk drive controllers, graphics cards, network cards, and sound cards.
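The initiate / work-in-parallel / interrupt-on-completion pattern can be sketched in software (a simplified simulation, not real DMA hardware; a background thread stands in for the DMA controller and a callback stands in for the completion interrupt):

```python
import threading
import time

def dma_transfer(src: bytearray, dst: bytearray, on_complete) -> None:
    """Simulated DMA controller: copies a block in the background,
    then 'interrupts' the CPU via the completion callback."""
    def worker():
        dst[:] = src          # the block copy happens without the CPU loop below
        on_complete()
    threading.Thread(target=worker).start()

done = threading.Event()
src = bytearray(b"disk block contents")
dst = bytearray(len(src))

dma_transfer(src, dst, done.set)   # CPU initiates the transfer...
while not done.is_set():           # ...and is free to do other work meanwhile
    time.sleep(0.001)              # stand-in for useful computation

print(dst.decode())                # disk block contents
```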
An I/O port is a communication pathway from the CPU to a peripheral device.
- Physical access: the system bus is usually physically implemented on a large printed circuit board with attachment points for devices.
Logical and Physical Access
- An I/O port is a memory address, or a set of contiguous memory addresses, that can be read and written by the CPU and a single peripheral device.
- An I/O port is also a logical abstraction: it enables the CPU and bus to interact with each peripheral device as if the device were a storage device with a linear address space.
- What is good about this logical abstraction? Example on the next slide.
Logical Access
- Example: in a hard disk, the device controller translates the linear address space into the corresponding physical sector, track, and platter.
- How does logical access work for other devices (keyboard, video card, sound card, ...)?
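The hard-disk translation can be sketched as the classic linear-block-address to cylinder/head/sector mapping (the geometry values below are hypothetical, for illustration only):

```python
def lba_to_chs(lba: int, heads: int, sectors_per_track: int):
    """Translate a linear block address into (cylinder, head, sector),
    the controller-level mapping described on the slide."""
    cylinder = lba // (heads * sectors_per_track)
    head = (lba // sectors_per_track) % heads
    sector = lba % sectors_per_track + 1   # sectors are traditionally numbered from 1
    return cylinder, head, sector

print(lba_to_chs(0, heads=16, sectors_per_track=63))     # (0, 0, 1)
print(lba_to_chs(1008, heads=16, sectors_per_track=63))  # (1, 0, 1)
```

The program above sees only the single integer `lba`; the physical geometry is hidden behind the abstraction, which is exactly what makes logical access convenient.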
Device Controllers
- Implement the bus interface and access protocols.
- Translate logical addresses into physical addresses.
- Enable several devices to share access to a bus connection.
SCSI (Small Computer System Interface)
- SCSI is a family of standard buses designed primarily for secondary storage devices.
- It implements both a low-level physical I/O protocol and a high-level logical device control protocol.
Characteristics of a SCSI Bus
- Non-proprietary standard.
- High data transfer rate, e.g., 20 MB/s with SCSI-2, up to 3 Gbit/s with SAS.
- Peer-to-peer capability (who is the bus master?).
- High-level (logical) data access commands, e.g., Acknowledge, Busy, Input/Output, Request, Select, ...
- Multiple command execution: the initiator sends a data structure containing one or more commands; the target queues the commands and executes them in sequence.
- Interleaved command execution: other devices can use the bus while the target is processing commands.
- Secondary storage and I/O devices have much slower data transfer rates than the CPU, due to the mechanical limitations of those devices.
- If the CPU waits for an I/O device to complete an access request, millions of CPU cycles can be wasted (wait states).
- To improve performance, peripheral devices communicate with the CPU using interrupts: electrical signals sent over the control bus.
Interrupt Processing
- A device sends an interrupt (e.g., an I/O request); the CPU continuously monitors the bus for interrupts.
- When an interrupt is detected, a numeric interrupt code is copied to an interrupt register.
- The control unit checks the interrupt register at the end of each execution cycle. If an interrupt code is found, the CPU suspends the executing program and handles the interrupt:
  - Pushes current register values onto the stack.
  - Executes a master interrupt handler, the supervisor.
  - The supervisor checks the interrupt code and transfers control to the corresponding interrupt handler.
- When the interrupt handler finishes, the stack is popped and the suspended process resumes from the point of interruption.
Stack Processing
- The stack is a region of RAM used to save and restore processor state.
- Values are added and removed with push and pop operations; the stack pointer is stored in a special-purpose register.
- Pushing beyond the stack's capacity causes a stack overflow error.
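The push / dispatch / pop sequence described above can be sketched as follows (a simplified model; the interrupt codes, register names, and handlers are hypothetical):

```python
stack = []                       # the stack region in RAM
registers = {"pc": 100, "acc": 42}

handlers = {                     # interrupt code -> handler (hypothetical codes)
    1: lambda: print("I/O complete"),
    2: lambda: print("error condition"),
}

def handle_interrupt(code: int) -> None:
    stack.append(dict(registers))    # push current register values
    handlers[code]()                 # supervisor dispatches to the handler
    registers.update(stack.pop())    # pop: the suspended program resumes

handle_interrupt(1)
print(registers)                     # register state restored: {'pc': 100, 'acc': 42}
```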
Multiple Interrupts
- The operating system groups interrupts by priority:
  - I/O events
  - Error conditions
  - Service requests
- What if an interrupt arrives while the CPU is processing another interrupt? If the new interrupt is more important ... else ...
- Processing an interrupt typically consumes at least 100 CPU cycles, in addition to the cycles consumed by the interrupt handler itself.
Buffers and Caches
- Techniques covered so far for improving overall system performance: DMA, bus design, and interrupt processing.
- Another technique is to employ RAM to overcome mismatches in data transfer rate and data transfer unit size.
- Two ways: buffering and caching.
Buffers
- A buffer is a small storage area (usually RAM) that holds data in transit from one device to another.
- A buffer is required when there is a difference in data transfer unit size (e.g., a printer).
Buffers
- A buffer is not required when there is a difference in data transfer rate, but using one improves performance.
- Example: a buffer between a modem and the bus.
  - Modem data rate: 56 kbps (slow) vs. system bus data rate: 400 Mbps (fast).
  - The modem transmits 4 bytes of data per bus cycle; each interrupt also takes one bus cycle.
  - With a 4-byte buffer: when 4 bytes are in the buffer, the modem sends an interrupt to stop the CPU from sending data, transmits the 4 bytes, then sends another interrupt to let the CPU resume. Transmitting 4 bytes therefore involves 2 interrupts.
- How does the buffer size affect performance? Consider, for example, transmitting 64 KB of data.
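Under the slide's model (2 interrupts per buffer fill), the interrupt cost of transmitting 64 KB can be tabulated for several buffer sizes (a sketch of the arithmetic, not a modem simulation):

```python
def interrupts_needed(total_bytes: int, buffer_bytes: int) -> int:
    """Each buffer fill costs 2 interrupts (stop + resume), per the
    modem example above."""
    fills = -(-total_bytes // buffer_bytes)   # ceiling division
    return 2 * fills

total = 64 * 1024   # 64 KB
for size in (4, 64, 1024, 8192):
    print(f"{size:>5}-byte buffer -> {interrupts_needed(total, size):>6} interrupts")
```

Each doubling of the buffer halves the interrupt count, but the absolute saving shrinks each time, which previews the diminishing-returns point on the next slide.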
The Law of Diminishing Returns
- When multiple resources are required to produce something useful, adding more and more of a single resource produces fewer and fewer benefits.
- Check the previous example: as the buffer grows, the CPU cycles and total bus cycles spent on interrupts keep improving, but each increase in buffer size saves less than the one before.
- Find a buffer size that balances cost and performance.
- One issue with buffers: buffer overflow.
Cache
- A cache is a storage area (usually RAM) that holds a duplicate of data stored elsewhere, where the original data is expensive to fetch relative to reading the cache.
- Accessing cached data is much quicker than re-fetching the original data.
- Cache vs. buffer (p. 235).
- Why does caching work? Access patterns in typical computer applications have locality of reference: the same data items are often used several times, with accesses close together in time, and data items near each other tend to be accessed close together in time.
Write Access with Confirmation
- Immediate write confirmation: the confirmation (2) is sent before the data is written to the secondary storage device (3).
- Improves program performance: the program can immediately proceed with other processing tasks.
- Risky: if an error occurs while copying data from the cache to the storage device, the data is lost permanently.
Read Access
- Read accesses are routed to the cache (1) first.
- If the data is already in the cache, it is accessed from the cache (2): a cache hit.
- If the data is not in the cache, it must be read from the storage device (3): a cache miss.
- A performance improvement is realized only if the requested data is already waiting in the cache.
Cache Controller
- A cache can hold only a small portion of the duplicated content.
- Q1: Which device manages the cache contents? A1: The cache controller, a processor that manages cache content.
- Q2: How are the cache contents managed? (Next slide.)
- Two implementations of the cache controller: in the storage device controller or communication channel, or as a program in the operating system.
- Q3: Which implementation is more efficient? It depends ...
Cache Controller
- The cache controller predicts what data will be requested in the near future and loads it from the storage device into the cache before it is requested (e.g., a simple prediction algorithm: after block n, prefetch blocks n+1, n+2, ...).
- What if the cache controller needs to load data when the cache is full? A cache swap: the controller must decide what data to swap out.
- The goal is to increase the cache's hit ratio: hit ratio = number of cache hits / number of read accesses.
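A minimal sketch of these ideas, assuming an LRU eviction policy for the cache swap and the n+1, n+2 read-ahead prediction from the slide (block numbers and cache capacity are hypothetical):

```python
from collections import OrderedDict

class ReadAheadCache:
    def __init__(self, capacity: int, read_ahead: int = 2):
        self.blocks = OrderedDict()      # block number -> data, in LRU order
        self.capacity = capacity
        self.read_ahead = read_ahead
        self.hits = self.accesses = 0

    def _load(self, n: int) -> None:
        if n not in self.blocks and len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)   # cache swap: evict least recently used
        self.blocks[n] = f"block {n}"         # fetch from the storage device

    def read(self, n: int) -> str:
        self.accesses += 1
        if n in self.blocks:
            self.hits += 1                    # cache hit
            self.blocks.move_to_end(n)
        else:
            self._load(n)                     # cache miss
        for k in range(n + 1, n + 1 + self.read_ahead):
            self._load(k)                     # prefetch n+1, n+2, ...
        return self.blocks[n]

cache = ReadAheadCache(capacity=8)
for block in [0, 1, 2, 3, 4, 5]:              # a sequential scan rewards read-ahead
    cache.read(block)
print(f"hit ratio = {cache.hits / cache.accesses:.2f}")  # 5 hits / 6 accesses = 0.83
```

Only the first read misses; every later block was prefetched, so the hit ratio is high exactly because the prediction matched the access pattern.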
Two Types of Cache
- Primary storage cache:
  - Limits wait states by placing SRAM cache between the CPU and primary storage.
  - Level one (L1): within the CPU; level two (L2): on-chip; level three (L3): off-chip.
- Secondary storage cache:
  - Gives frequently accessed files higher priority for cache retention.
  - Gives files opened for random access lower priority for cache retention.
  - Uses read-ahead caching for files that are read sequentially.
Processing Parallelism
- Processing parallelism breaks a large problem into pieces and solves each piece in parallel with separate CPUs.
- Increases a computer system's computational capacity.
- Three techniques:
  - Multicore processors
  - Multi-CPU architecture
  - Clustering
Scaling Up vs. Scaling Out
- Scaling up: increasing processing power by using larger and more powerful computers (e.g., multicore and multi-CPU systems).
  - Used to be very cost-effective; still cost-effective when maximal computer power is required and flexibility is less important.
- Scaling out: partitioning processing among multiple systems (clustering). Favored because:
  - The increasing speed of communication networks has diminished the relative performance penalty.
  - Distributed organizational structures emphasize flexibility.
  - Software for managing multiprocessor configurations has improved.
Multicore Processors
- Include multiple cores and a shared memory cache on a single microchip (e.g., the IBM POWER5 CPU).
- The cores share the memory cache, memory interface, and off-chip I/O circuitry.
Multi-CPU Architecture
- Employs multiple single-core CPUs on a single motherboard or a set of connected motherboards.
- The CPUs share main memory and the system bus.
- Common in midrange computers, mainframe computers, and supercomputers.
High-Performance Clustering
- Connects separate computer systems with high-speed interconnections.
- Used for the largest computational problems, e.g., the European Centre for Medium-Range Weather Forecasts, and the Search for Extraterrestrial Intelligence (article 2).
Compression
- Compression reduces the number of bits required to encode a data set or stream.
- Trade-off: increased processing resources are needed to run the compression and decompression algorithms.
Compression Algorithms
- Compression algorithms vary in:
  - The type(s) of data for which they are best suited.
  - Whether information is lost during compression.
- Lossless compression: compressing and then decompressing any data yields exactly the original input.
- Lossy compression: compressing and then decompressing yields data that is similar, but not identical, to the original input.
- Compression ratio:
  - A 100 KB word-processing file compressed to 25 KB: 4:1.
  - A 150 MB video file compressed to 60 MB: 2.5:1.
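The slide's ratios, and the lossless round-trip property, can be demonstrated with Python's standard `zlib` module (the repetitive sample data is made up for illustration):

```python
import zlib

def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    return original_bytes / compressed_bytes

# The slide's examples:
print(compression_ratio(100, 25))   # 4.0   (4:1)
print(compression_ratio(150, 60))   # 2.5   (2.5:1)

# Lossless round trip: decompressing returns exactly the original input.
data = b"abcabcabc" * 1000
packed = zlib.compress(data)
assert zlib.decompress(packed) == data            # exact reconstruction
print(compression_ratio(len(data), len(packed)))  # high: the data is very repetitive
```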
Examples of Compression Algorithms: MPEG
- MPEG: Moving Picture Experts Group.
- Standards: MPEG-1, MPEG-2, MPEG-4, for encoding images and sound.
- Layers: Layer 1 (system), Layer 2 (video, using I, P, and B frames), Layer 3 (audio, i.e., MP3).
- MP3: self-study the basic compression ideas (pp. 248-249).
- I frames (intra-coded frames) contain the information that results from encoding a still image, i.e., with no reference to any other image. They are points of reference and random access in the video stream and can be decoded without any other frames. Their compression rate is the lowest of all frame types.
- P frames (predictively coded frames) require information from previous I and/or P frames for encoding and decoding. By exploiting temporal redundancy, P frames achieve higher compression rates than I frames.
- B frames (bidirectionally predictively coded frames) require information from both the previous and the following I and/or P frames for encoding and decoding; they are predicted from both directions. They have the highest compression ratio of all frame types.
- Transmission order: I P B B P B B P B B I B B
Summary
- System bus, bus protocol, and device controllers.
- Hardware and software techniques for improving overall system performance:
  - Bus protocols
  - Interrupt processing
  - Buffering and caching
  - Processing parallelism
  - Compression