Chapter Goals
- System bus and bus protocol
- CPU and bus interaction with peripheral devices
- Device controllers
- Interrupt processing
- Improving computer system performance
System Bus
- A bus is a set of parallel communication lines.
- The system bus connects the CPU with other system components.
- Subsets of bus lines:
  - Data bus: the number of lines is the same as, or a multiple of, the CPU word size.
  - Address bus: the number of lines determines how much memory can be addressed.
  - Control bus: carries the bus clock pulse, commands, response messages, status codes, ...
Bus Clock and Data Transfer Rate
- The bus clock pulse is a common timing reference for all attached devices (measured in MHz).
- At the beginning of each clock pulse, the bus can transmit data or a control signal.
- The bus clock rate is usually a fraction of the CPU clock rate.
- A bus cycle is the time interval from one clock pulse to the next:
  - bus cycle time = 1 / bus clock rate
  - It cannot be shorter than the time required for an electrical signal to traverse the bus from end to end.
- The data transfer rate is a measure of communication capacity:
  - bus capacity = data transfer unit x clock rate
  - Example: a 400 MHz bus has a cycle time of 1 / 400 MHz = 2.5 ns; with a 64-bit data transfer unit, capacity is 64 bits x 400 MHz = 25.6 Gbit/s = 3.2 GB/s.
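The capacity formula above can be checked with a short calculation (a sketch; the 64-line, 400 MHz figures come from the slide's example):

```python
def bus_capacity_gbps(data_lines: int, clock_mhz: float) -> float:
    """Bus capacity in GB/s: one transfer of `data_lines` bits per bus cycle."""
    bytes_per_cycle = data_lines / 8
    cycles_per_second = clock_mhz * 1_000_000
    return bytes_per_cycle * cycles_per_second / 1_000_000_000

# 64 data lines at 400 MHz -> 3.2 GB/s
print(bus_capacity_gbps(64, 400))
```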
Bus Protocol
- A bus protocol governs the format, content, and timing of data, memory addresses, and control messages sent across the bus.
- It regulates bus access to prevent devices from interfering with one another (e.g., command, acknowledgement, and confirmation messages).
- An efficient bus protocol consumes a minimal number of bus cycles, maximizing bus availability for data transfers, but tends to be complex (e.g., the SCSI bus protocol, covered in later slides).
- Master-slave bus: a traditional computer architecture with one bus master and several bus slaves. Simple, but system performance is low (why?).
Transferring Data Without the CPU
- To improve system performance, data can be transferred without involving the CPU:
  - Direct memory access (DMA)
  - Peer-to-peer buses, where a bus arbitration unit resolves conflicts
- A DMA transfer essentially copies a block of memory from one device to another, reading and/or writing independently of the central processing unit. The CPU initiates the transfer but does not execute it.
- With DMA, the CPU initiates the transfer, performs other operations while the transfer is in progress, and receives an interrupt from the DMA controller once the operation is done.
- Many hardware systems use DMA, including disk drive controllers, graphics cards, network cards, and sound cards.
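The initiate / work-in-parallel / interrupt-on-completion pattern can be sketched in software (a simplified simulation, not real DMA hardware; a background thread stands in for the DMA controller and a callback stands in for the completion interrupt):

```python
import threading
import time

def dma_transfer(src: bytearray, dst: bytearray, on_complete) -> None:
    """Simulated DMA controller: copies a block in the background,
    then 'interrupts' the CPU via the completion callback."""
    def worker():
        dst[:] = src          # the block copy happens without the CPU loop below
        on_complete()
    threading.Thread(target=worker).start()

done = threading.Event()
src = bytearray(b"disk block contents")
dst = bytearray(len(src))

dma_transfer(src, dst, done.set)   # CPU initiates the transfer...
while not done.is_set():           # ...and is free to do other work meanwhile
    time.sleep(0.001)              # stand-in for useful computation

print(dst.decode())                # disk block contents
```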
An I/O port is a communication pathway from the CPU to a peripheral device.
- Physical access: the system bus is usually physically implemented on a large printed circuit board with attachment points for devices.
Logical and Physical Access
- An I/O port is a memory address, or a set of contiguous memory addresses, that can be read and written by the CPU and a single peripheral device.
- An I/O port is also a logical abstraction: it enables the CPU and bus to interact with each peripheral device as if the device were a storage device with a linear address space.
- What is good about this logical abstraction? Example on the next slide.
Logical Access
- Example: in a hard disk, the device controller translates the linear address space into the corresponding physical sector, track, and platter.
- How does logical access work for other devices (keyboard, video card, sound card, ...)?
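The hard-disk translation can be sketched as the classic linear-block-address to cylinder/head/sector mapping (the geometry values below are hypothetical, for illustration only):

```python
def lba_to_chs(lba: int, heads: int, sectors_per_track: int):
    """Translate a linear block address into (cylinder, head, sector),
    the controller-level mapping described on the slide."""
    cylinder = lba // (heads * sectors_per_track)
    head = (lba // sectors_per_track) % heads
    sector = lba % sectors_per_track + 1   # sectors are traditionally numbered from 1
    return cylinder, head, sector

print(lba_to_chs(0, heads=16, sectors_per_track=63))     # (0, 0, 1)
print(lba_to_chs(1008, heads=16, sectors_per_track=63))  # (1, 0, 1)
```

The program above sees only the single integer `lba`; the physical geometry is hidden behind the abstraction, which is exactly what makes logical access convenient.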
Device Controllers
- Implement the bus interface and access protocols.
- Translate logical addresses into physical addresses.
- Enable several devices to share access to a bus connection.
SCSI (Small Computer System Interface)
- SCSI is a family of standard buses designed primarily for secondary storage devices.
- It implements both a low-level physical I/O protocol and a high-level logical device control protocol.
Characteristics of a SCSI Bus
- Non-proprietary standard.
- High data transfer rate, e.g., 20 MB/s with SCSI-2, up to 3 Gbit/s with SAS.
- Peer-to-peer capability (who is the bus master?).
- High-level (logical) data access commands, e.g., Acknowledge, Busy, Input/Output, Request, Select, ...
- Multiple command execution: the initiator sends a data structure containing one or more commands; the target queues the commands and executes them in sequence.
- Interleaved command execution: other devices can use the bus while the target is processing commands.
- Secondary storage and I/O devices have much slower data transfer rates than the CPU, due to the mechanical limitations of those devices.
- If the CPU waits for an I/O device to complete an access request, millions of CPU cycles can be wasted (wait states).
- To improve performance, peripheral devices communicate with the CPU using interrupts: electrical signals sent over the control bus.
Interrupt Processing
- A device sends an interrupt (e.g., an I/O request); the CPU continuously monitors the bus for interrupts.
- When an interrupt is detected, a numeric interrupt code is copied to an interrupt register.
- The control unit checks the interrupt register at the end of each execution cycle. If an interrupt code is found, the CPU suspends the executing program and handles the interrupt:
  - Pushes current register values onto the stack.
  - Executes a master interrupt handler, the supervisor.
  - The supervisor checks the interrupt code and transfers control to the corresponding interrupt handler.
- When the interrupt handler finishes, the stack is popped and the suspended process resumes from the point of interruption.
Stack Processing
- The stack is a region of RAM used to save and restore processor state.
- Values are added and removed with push and pop operations; the stack pointer is stored in a special-purpose register.
- Pushing beyond the stack's capacity causes a stack overflow error.
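The push / dispatch / pop sequence described above can be sketched as follows (a simplified model; the interrupt codes, register names, and handlers are hypothetical):

```python
stack = []                       # the stack region in RAM
registers = {"pc": 100, "acc": 42}

handlers = {                     # interrupt code -> handler (hypothetical codes)
    1: lambda: print("I/O complete"),
    2: lambda: print("error condition"),
}

def handle_interrupt(code: int) -> None:
    stack.append(dict(registers))    # push current register values
    handlers[code]()                 # supervisor dispatches to the handler
    registers.update(stack.pop())    # pop: the suspended program resumes

handle_interrupt(1)
print(registers)                     # register state restored: {'pc': 100, 'acc': 42}
```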
Multiple Interrupts
- The operating system groups interrupts by priority:
  - I/O events
  - Error conditions
  - Service requests
- What if an interrupt arrives while the CPU is processing another interrupt? If the new interrupt is more important ... else ...
- Processing an interrupt typically consumes at least 100 CPU cycles, in addition to the cycles consumed by the interrupt handler itself.
Buffers and Caches
- Techniques covered so far for improving overall system performance: DMA, bus design, and interrupt processing.
- Another technique is to employ RAM to overcome mismatches in data transfer rate and data transfer unit size.
- Two ways: buffering and caching.
Buffers
- A buffer is a small storage area (usually RAM) that holds data in transit from one device to another.
- A buffer is required when there is a difference in data transfer unit size (e.g., a printer).
Buffers
- A buffer is not required when there is a difference in data transfer rate, but using one improves performance.
- Example: a buffer between a modem and the bus.
  - Modem data rate: 56 kbps (slow) vs. system bus data rate: 400 Mbps (fast).
  - The modem transmits 4 bytes of data per bus cycle; each interrupt also takes one bus cycle.
  - With a 4-byte buffer: when 4 bytes are in the buffer, the modem sends an interrupt to stop the CPU from sending data, transmits the 4 bytes, then sends another interrupt to let the CPU resume. Transmitting 4 bytes therefore involves 2 interrupts.
- How does the buffer size affect performance? Consider, for example, transmitting 64 KB of data.
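Under the slide's model (2 interrupts per buffer fill), the interrupt cost of transmitting 64 KB can be tabulated for several buffer sizes (a sketch of the arithmetic, not a modem simulation):

```python
def interrupts_needed(total_bytes: int, buffer_bytes: int) -> int:
    """Each buffer fill costs 2 interrupts (stop + resume), per the
    modem example above."""
    fills = -(-total_bytes // buffer_bytes)   # ceiling division
    return 2 * fills

total = 64 * 1024   # 64 KB
for size in (4, 64, 1024, 8192):
    print(f"{size:>5}-byte buffer -> {interrupts_needed(total, size):>6} interrupts")
```

Each doubling of the buffer halves the interrupt count, but the absolute saving shrinks each time, which previews the diminishing-returns point on the next slide.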
The Law of Diminishing Returns
- When multiple resources are required to produce something useful, adding more and more of a single resource produces fewer and fewer benefits.
- Check the previous example: as the buffer grows, the CPU cycles and total bus cycles spent on interrupts keep improving, but each increase in buffer size saves less than the one before.
- Find a buffer size that balances cost and performance.
- One issue with buffers: buffer overflow.
Cache
- A cache is a storage area (usually RAM) that holds a duplicate of data stored elsewhere, where the original data is expensive to fetch relative to reading the cache.
- Accessing cached data is much quicker than re-fetching the original data.
- Cache vs. buffer (p. 235).
- Why does caching work? Access patterns in typical computer applications have locality of reference: the same data items are often used several times, with accesses close together in time, and data items near each other tend to be accessed close together in time.
Write Access with Confirmation
- Immediate write confirmation: the confirmation (2) is sent before the data is written to the secondary storage device (3).
- Improves program performance: the program can immediately proceed with other processing tasks.
- Risky: if an error occurs while copying data from the cache to the storage device, the data is lost permanently.
Read Access
- Read accesses are routed to the cache (1) first.
- If the data is already in the cache, it is accessed from the cache (2): a cache hit.
- If the data is not in the cache, it must be read from the storage device (3): a cache miss.
- A performance improvement is realized only if the requested data is already waiting in the cache.
Cache Controller
- A cache can hold only a small portion of the duplicated content.
- Q1: Which device manages the cache contents? A1: The cache controller, a processor that manages cache content.
- Q2: How are the cache contents managed? (Next slide.)
- Two implementations of the cache controller: in the storage device controller or communication channel, or as a program in the operating system.
- Q3: Which implementation is more efficient? It depends ...
Cache Controller
- The cache controller predicts what data will be requested in the near future and loads it from the storage device into the cache before it is requested (e.g., a simple prediction algorithm: after block n, prefetch blocks n+1, n+2, ...).
- What if the cache controller needs to load data when the cache is full? A cache swap: the controller must decide what data to swap out.
- The goal is to increase the cache's hit ratio: hit ratio = number of cache hits / number of read accesses.
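A minimal sketch of these ideas, assuming an LRU eviction policy for the cache swap and the n+1, n+2 read-ahead prediction from the slide (block numbers and cache capacity are hypothetical):

```python
from collections import OrderedDict

class ReadAheadCache:
    def __init__(self, capacity: int, read_ahead: int = 2):
        self.blocks = OrderedDict()      # block number -> data, in LRU order
        self.capacity = capacity
        self.read_ahead = read_ahead
        self.hits = self.accesses = 0

    def _load(self, n: int) -> None:
        if n not in self.blocks and len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)   # cache swap: evict least recently used
        self.blocks[n] = f"block {n}"         # fetch from the storage device

    def read(self, n: int) -> str:
        self.accesses += 1
        if n in self.blocks:
            self.hits += 1                    # cache hit
            self.blocks.move_to_end(n)
        else:
            self._load(n)                     # cache miss
        for k in range(n + 1, n + 1 + self.read_ahead):
            self._load(k)                     # prefetch n+1, n+2, ...
        return self.blocks[n]

cache = ReadAheadCache(capacity=8)
for block in [0, 1, 2, 3, 4, 5]:              # a sequential scan rewards read-ahead
    cache.read(block)
print(f"hit ratio = {cache.hits / cache.accesses:.2f}")  # 5 hits / 6 accesses = 0.83
```

Only the first read misses; every later block was prefetched, so the hit ratio is high exactly because the prediction matched the access pattern.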
Two Types of Cache
- Primary storage cache:
  - Limits wait states by placing SRAM cache between the CPU and primary storage.
  - Level one (L1): within the CPU; level two (L2): on-chip; level three (L3): off-chip.
- Secondary storage cache:
  - Gives frequently accessed files higher priority for cache retention.
  - Gives files opened for random access lower priority for cache retention.
  - Uses read-ahead caching for files that are read sequentially.
Processing Parallelism
- Processing parallelism breaks a large problem into pieces and solves each piece in parallel with separate CPUs.
- Increases a computer system's computational capacity.
- Three techniques:
  - Multicore processors
  - Multi-CPU architecture
  - Clustering
Scaling Up vs. Scaling Out
- Scaling up: increasing processing power by using larger and more powerful computers (e.g., multicore and multi-CPU systems).
  - Used to be very cost-effective; still cost-effective when maximal computer power is required and flexibility is less important.
- Scaling out: partitioning processing among multiple systems (clustering). Favored because:
  - The increasing speed of communication networks has diminished the relative performance penalty.
  - Distributed organizational structures emphasize flexibility.
  - Software for managing multiprocessor configurations has improved.
Multicore Processors
- Include multiple cores and a shared memory cache on a single microchip (e.g., the IBM POWER5 CPU).
- The cores share the memory cache, memory interface, and off-chip I/O circuitry.
Multi-CPU Architecture
- Employs multiple single-core CPUs on a single motherboard or a set of connected motherboards.
- The CPUs share main memory and the system bus.
- Common in midrange computers, mainframe computers, and supercomputers.
High-Performance Clustering
- Connects separate computer systems with high-speed interconnections.
- Used for the largest computational problems, e.g., the European Centre for Medium-Range Weather Forecasts, and the Search for Extraterrestrial Intelligence (article 2).
Compression
- Compression reduces the number of bits required to encode a data set or stream.
- Trade-off: increased processing resources are needed to run the compression and decompression algorithms.
Compression Algorithms
- Compression algorithms vary in:
  - The type(s) of data for which they are best suited.
  - Whether information is lost during compression.
- Lossless compression: compressing and then decompressing any data yields exactly the original input.
- Lossy compression: compressing and then decompressing yields data that is similar, but not identical, to the original input.
- Compression ratio:
  - A 100 KB word-processing file compressed to 25 KB: 4:1.
  - A 150 MB video file compressed to 60 MB: 2.5:1.
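The slide's ratios, and the lossless round-trip property, can be demonstrated with Python's standard `zlib` module (the repetitive sample data is made up for illustration):

```python
import zlib

def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    return original_bytes / compressed_bytes

# The slide's examples:
print(compression_ratio(100, 25))   # 4.0   (4:1)
print(compression_ratio(150, 60))   # 2.5   (2.5:1)

# Lossless round trip: decompressing returns exactly the original input.
data = b"abcabcabc" * 1000
packed = zlib.compress(data)
assert zlib.decompress(packed) == data            # exact reconstruction
print(compression_ratio(len(data), len(packed)))  # high: the data is very repetitive
```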
Examples of Compression Algorithms: MPEG
- MPEG: Moving Picture Experts Group.
- Standards: MPEG-1, MPEG-2, MPEG-4, for encoding images and sound.
- Layers: Layer 1 (system), Layer 2 (video, using I, P, and B frames), Layer 3 (audio, i.e., MP3).
- MP3: self-study the basic compression ideas (pp. 248-249).
- I frames (intra-coded frames) contain the information that results from encoding a still image, i.e., with no reference to any other image. They are points of reference and random access in the video stream and can be decoded without any other frames. Their compression rate is the lowest of all frame types.
- P frames (predictively coded frames) require information from previous I and/or P frames for encoding and decoding. By exploiting temporal redundancy, P frames achieve higher compression rates than I frames.
- B frames (bidirectionally predictively coded frames) require information from both the previous and the following I and/or P frames for encoding and decoding; they are predicted from both directions. They have the highest compression ratio of all frame types.
- Transmission order: I P B B P B B P B B I B B
Summary
- System bus, bus protocol, and device controllers.
- Hardware and software techniques for improving overall system performance:
  - Bus protocols
  - Interrupt processing
  - Buffering and caching
  - Processing parallelism
  - Compression