1 CpE 242 Computer Architecture and Engineering Busses and OS’s Responsibilities
Start: X:40

2 Recap: I/O Benchmarks and I/O Devices

Disk I/O Benchmarks: Supercomputer Application: main concern is data rate Transaction Processing: main concern is I/O rate File System: main concern is file access Three Components of Disk Access Time: Seek Time Rotational Latency Transfer Time Let’s do a quick review of what we learned last Wednesday. First we showed you the diversity of I/O requirements by talking about three categories of disk I/O benchmarks. A supercomputer application’s main concern is data rate. The main concern of transaction processing is I/O rate, and the file system’s main concern is file access. Then we talked about magnetic disks. One thing to remember is that disk access time has 3 components. The first 2 components, seek time and rotational latency, involve mechanical moving parts and are therefore very slow compared to the 3rd component, the transfer time. One good thing about the seek time is that this is probably one of the few times in life that you can actually get better than the “advertised” result due to the locality of disk access. As far as graphic display is concerned, resolution is the basic measurement of how much information is on the screen and is usually described as “some number” by “some number.” The first “some number” is the horizontal resolution in number of pixels while the second “some number” is the vertical resolution in number of scan lines. Then I showed you how the size as well as the bandwidth requirement of a Color Frame Buffer can be reduced if a Color Map is placed between the Frame Buffer and the graphic display. Finally, we talked about a special memory, the VRAM, that can be used to construct the Frame Buffer. It is nothing but a DRAM core with a high speed shift register attached to it. +3 = 3 min. (X:43)

3 Outline of Today’s Lecture
Recap and Introduction (5 minutes) Introduction to Buses (15 minutes) Bus Types and Bus Operation (10 minutes) Bus Arbitration and How to Design a Bus Arbiter (15 minutes) Operating System’s Role (15 minutes) Delegating I/O Responsibility from the CPU (5 minutes) Summary (5 minutes) Here is an outline of today’s lecture. We will spend the first half of the lecture talking about buses, and then after the break, we will talk about the Operating System’s role in all of this. We will also talk about how to delegate I/O responsibility from the CPU. Finally we will summarize today’s lecture. +1 = 4 min. (X:44)

4 The Big Picture: Where are We Now?
Today’s Topic: How to connect I/O to the rest of the computer? [Diagram: the five classic components (Processor with Control and Datapath, Memory, Input, Output) and the Network] In terms of the overall picture, by now we have covered all 5 components of the computer. Today we will talk about how to interface the I/O devices to the processor and memory via buses and the OS software. Next Wednesday, we will show you how multiple computers can be connected together with a network through the I/O devices. +1 = 5 min. (X:45)

5 Buses: Connecting I/O to Processor and Memory
[Diagram: a single bus connecting Processor (Control, Datapath), Memory, Input, and Output] In a computer system, the various subsystems must be able to talk to one another. For example, the memory and processor need to communicate with each other. Similarly, the processor needs to communicate with the I/O devices. The most common way to do this is to use a bus. A bus is a shared communication link that uses 1 set of wires to connect multiple subsystems. +1 = 6 min. (X:46) A bus is a shared communication link It uses one set of wires to connect multiple subsystems

6 Advantages of Buses
[Diagram: Processor, Memory, and multiple I/O Devices sharing one bus] Versatility: New devices can be added easily Peripherals can be moved between computer systems that use the same bus standard Low Cost: A single set of wires is shared in multiple ways The two major advantages of the bus organization are versatility and low cost. By versatility, we mean new devices can easily be added. Furthermore, if a device is designed according to an industry bus standard, it can be moved between computer systems that use the same bus standard. The bus organization is a low cost solution because a single set of wires is shared in multiple ways. +1 = 7 min. (X:47)

7 Disadvantages of Buses
[Diagram: Processor, Memory, and multiple I/O Devices sharing one bus] It creates a communication bottleneck The bandwidth of that bus can limit the maximum I/O throughput The maximum bus speed is largely limited by: The length of the bus The number of devices on the bus The need to support a range of devices with: Widely varying latencies Widely varying data transfer rates The major disadvantage of the bus organization is that it creates a communication bottleneck. When I/O must pass through a single bus, the bandwidth of that bus can limit the maximum I/O throughput. The maximum bus speed is also largely limited by: (a) the length of the bus, (b) the number of I/O devices on the bus, and (c) the need to support a wide range of devices with widely varying latencies and data transfer rates. +2 = 9 min. (X:49)

8 The General Organization of a Bus
[Diagram: Control Lines and Data Lines] Control lines: Signal requests and acknowledgments Indicate what type of information is on the data lines Data lines carry information between the source and the destination: Data and Addresses Complex commands A bus transaction includes two parts: Sending the address Receiving or sending the data A bus generally contains a set of control lines and a set of data lines. The control lines are used to signal requests and acknowledgments and to indicate what type of information is on the data lines. The data lines carry information between the source and the destination. This information may consist of data, addresses, or complex commands. A bus transaction includes two parts: (a) sending the address and (b) then receiving or sending the data. +1 = 10 min (X:50)

9 Master versus Slave
[Diagram: the Bus Master sends the address to the Bus Slave; data can go either way] A bus transaction includes two parts: Sending the address Receiving or sending the data Master is the one who starts the bus transaction by: Sending out the address Slave is the one who responds to the address by: Sending data to the master if the master asks for data Receiving data from the master if the master wants to send data The bus master is the one who starts the bus transaction by sending out the address. The slave is the one who responds to the master by either sending data to the master if the master asks for data, or receiving data from the master if the master wants to send data. In most simple I/O operations, the processor will be the bus master, but as I will show you later in today’s lecture, this is not always the case. +1 = 11 min. (X:51)

10 Output Operation Output is defined as the Processor sending data to the I/O device: Step 1: Request Memory: Control (Memory Read Request), Data (Memory Address) Step 2: Read Memory Step 3: Send Data to I/O Device: Control (Device Write Request), Data (I/O Device Address and then Data) We will define the term Output as the processor sending data to the I/O device, and as shown here, it is a 3-step process. The first step is to wake up the memory by sending it a read request on the control lines and the address of the location to be read on the data lines. In the second step, the actual read occurs; that is, the memory system is accessing the data. Finally, the memory system transfers the data to the I/O device using the bus’s data lines, and the control lines are used (either by the processor or by the memory system, depending on the memory system’s intelligence) to tell the I/O device data is coming. +2 = 13 min. (X:53)

11 Input Operation Input is defined as the Processor receiving data from the I/O device: Step 1: Request Memory: Control (Memory Write Request), Data (Memory Address) Step 2: Receive Data: Control (I/O Read Request), Data (I/O Device Address and then Data) In our discussion, Input is defined as the processor receiving data from the I/O device and is only a 2-step process. The first step is to wake up the Memory System by telling it a write operation is imminent (control line) and providing the memory system with the address of the location to be written. Then the CPU will tell the I/O device to start sending data by sending a read request to the I/O device (control line, with the I/O address on the data lines). The data will then be sent to the Memory System via the data lines. +1 = 14 min. (X:54)

12 Types of Buses
Processor-Memory Bus (design specific) Short and high speed Only need to match the memory system Maximize memory-to-processor bandwidth Connects directly to the processor I/O Bus (industry standard) Usually is lengthy and slower Need to match a wide range of I/O devices Connects to the processor-memory bus or backplane bus Backplane Bus (industry standard) Backplane: an interconnection structure within the chassis Allow processors, memory, and I/O devices to coexist Cost advantage: one single bus for all components Buses are traditionally classified as one of 3 types: processor-memory buses, I/O buses, or backplane buses. The processor-memory bus is usually design specific while the I/O and backplane buses are often standard buses. In general, processor-memory buses are short and high speed. Such a bus tries to match the memory system in order to maximize the memory-to-processor bandwidth and is connected directly to the processor. The I/O bus usually is lengthy and slower because it has to match a wide range of I/O devices, and it usually connects to the processor-memory bus or backplane bus. The backplane bus receives its name because it was often built into the backplane of the computer: it is an interconnection structure within the chassis. It is designed to allow processors, memory, and I/O devices to coexist on a single bus, so it has the cost advantage of having only one single bus for all components. +2 = 16 min. (X:56)

13 A Computer System with One Bus: Backplane Bus
[Diagram: Processor, Memory, and I/O Devices on a single backplane bus] A single bus (the backplane bus) is used for: Processor to memory communication Communication between I/O devices and memory Advantages: Simple and low cost Disadvantages: slow, and the bus can become a major bottleneck Example: IBM PC Here is an example showing how a single bus, the backplane bus, is used to provide communication between the processor and memory, as well as communication between I/O devices and memory. The advantage here is of course low cost. One disadvantage of this approach is that a bus with so many things attached to it will be lengthy and slow. Furthermore, the bus can become a major communication bottleneck if everybody wants to use the bus at the same time. The IBM PC is an example that uses only a backplane bus for all communication. +2 = 18 min. (X:58)

14 A Two-Bus System [Diagram: multiple I/O buses connected to the processor-memory bus through bus adaptors] I/O buses tap into the processor-memory bus via bus adaptors: Processor-memory bus: mainly for processor-memory traffic I/O buses: provide expansion slots for I/O devices Apple Macintosh-II NuBus: Processor, memory, and a few selected I/O devices SCSI Bus: the rest of the I/O devices Right before the break, I showed you a system with one bus only. Here is an example using two bus levels, where multiple I/O buses tap into the processor-memory bus via bus adaptors. The processor-memory bus is used mainly for processor-memory traffic while the I/O buses are used to provide expansion slots for the I/O devices. The Apple Macintosh-II adopts this organization: the NuBus is used to connect the processor, memory, and a few selected I/O devices together, while the rest of the I/O devices reside on an industry standard bus, the SCSI Bus, which is connected to the NuBus via a bus adaptor. +2 = 25 min. (Y:05)

15 A Three-Bus System [Diagram: a backplane bus connected to the processor-memory bus through a bus adaptor; I/O buses connected to the backplane bus through bus adaptors] Finally, in a 3-bus system, a small number of backplane buses (in our example here, just 1) tap into the processor-memory bus. The processor-memory bus is used mainly for processor-memory traffic while the I/O buses are connected to the backplane bus via bus adaptors. An advantage of this organization is that the loading on the processor-memory bus is greatly reduced because of the small number of taps into the high-speed processor-memory bus. +1 = 26 min. (Y:06) A small number of backplane buses tap into the processor-memory bus Processor-memory bus is used for processor-memory traffic I/O buses are connected to the backplane bus Advantage: loading on the processor bus is greatly reduced

16 Example Architecture: The SPARCstation 20
[Diagram: Memory SIMMs and the Memory Controller on the Memory Bus; MBus Slots 0-1 on the MBus; SBus Slots 0-3 on the SBus; Disk and Tape on the SCSI Bus; Keyboard & Mouse, Floppy Disk, and the External Bus; the MSBI, SEC, and MACIO chips bridge the buses]

17 The SPARCstation 20: The Underlying Network
[Diagram: the SPARCstation 20’s buses: Memory Bus (with Memory Controller), Processor Bus: MBus, Sun’s High Speed I/O Bus: SBus, Standard I/O Bus: SCSI Bus, and Low Speed I/O Bus: External Bus, bridged by the MSBI, SEC, and MACIO]

18 USB
Universal Serial Bus Originally developed in 1995 by a consortium including Compaq, HP, Intel, Lucent, Microsoft, and Philips USB 1.1 supports: Low-speed devices (1.5 Mbps) Full-speed devices (12 Mbps) USB 2.0 supports: High-speed devices up to 480 Mbps (a factor of 40 over USB 1.1) Uses the same connectors Transmission speed is negotiated on a device-by-device basis

19 USB (cont’d) Hubs can be used to expand the bus: a hub has one upstream port (toward the host) and several downstream ports (toward devices)

20 Synchronous and Asynchronous Bus
Synchronous Bus: Includes a clock in the control lines A fixed protocol for communication that is relative to the clock Advantage: involves very little logic and can run very fast Disadvantages: Every device on the bus must run at the same clock rate To avoid clock skew, buses cannot be long if they are fast Asynchronous Bus: It is not clocked It can accommodate a wide range of devices It can be lengthened without worrying about clock skew It requires a handshaking protocol There are substantial differences between the design requirements for the I/O buses and processor-memory buses and the backplane buses. Consequently, there are two different schemes for communication on the bus: synchronous and asynchronous. A synchronous bus includes a clock in the control lines and a fixed protocol for communication that is relative to the clock. Since the protocol is fixed and everything happens with respect to the clock, it involves very little logic and can run very fast. Most processor-memory buses fall into this category. Synchronous buses have two major disadvantages: (1) every device on the bus must run at the same clock rate, and (2) if they are fast, they must be short to avoid clock skew problems. By definition, an asynchronous bus is not clocked, so it can accommodate a wide range of devices at different clock rates and can be lengthened without worrying about clock skew. The drawback is that it can be slow and more complex because a handshaking protocol is needed to coordinate the transmission of data between the sender and receiver. +2 = 28 min. (Y:08)

21 A Handshaking Protocol
[Timing diagram: ReadReq, Data (address, then data), Ack, and DataRdy lines, with events numbered 1-7] Three control lines ReadReq: indicate a read request for memory Address is put on the data lines at the same time DataRdy: indicate the data word is now ready on the data lines Data is put on the data lines at the same time Ack: acknowledge the ReadReq or the DataRdy of the other party Here is a simple handshaking protocol example that uses three control lines: (a) ReadReq indicates a read request for memory. When the sender asserts this signal, the address is put on the data lines at the same time. (b) DataRdy indicates the data word is now ready on the data lines. When the data sender asserts this signal, it must also put the data on the data lines simultaneously. (c) Ack is used to acknowledge the ReadReq or the DataRdy signal of the other party. So here is how it works. Assume the processor wants to read something from memory. The signal in red is driven by the processor while the signal in black is driven by memory. (1) The processor initiates the bus transaction by asserting the ReadReq line and putting the address it wants to read on the data lines at the same time. (2) The memory, upon seeing ReadReq, latches in the address and asserts the Ack signal to (3) tell the processor: “I got it,” so the processor can deassert ReadReq as well as the Data lines. (4) Upon seeing ReadReq go low, the memory knows the processor is off the bus, so it deasserts the Ack signal. (5) When the data being requested is ready, the memory will put the data on the Data lines and assert the DataRdy signal. (6) Upon seeing the DataRdy signal, the processor will latch in the data on the data lines and then assert Ack to tell the memory it is now OK to get off the bus. (7) Once the processor sees the DataRdy line go low, it knows the memory has gotten off the bus, so it deasserts the Ack signal so another bus transaction can start. A code sketch of the processor’s side follows below. +3 = 31 min. (Y:11)
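
To make the sequence concrete, here is a minimal sketch of the processor’s side of this handshake in C. The shared flags standing in for the control wires, the busy-wait loops, and the function name are illustrative assumptions, not real bus hardware; the memory side would mirror this, driving Ack and DataRdy.

    #include <stdbool.h>
    #include <stdint.h>

    /* Control and data lines modeled as shared flags (illustrative). */
    static volatile bool read_req, data_rdy, ack;
    static volatile uint32_t data_lines;     /* carries address, then data */

    /* Processor side of the handshake; step numbers match the slide. */
    uint32_t processor_read(uint32_t addr)
    {
        data_lines = addr;                   /* (1) drive the address...    */
        read_req = true;                     /*     ...and assert ReadReq   */
        while (!ack) ;                       /* (2)-(3) wait for memory Ack */
        read_req = false;                    /* (3) deassert ReadReq, lines */
        while (ack) ;                        /* (4) memory drops Ack        */
        while (!data_rdy) ;                  /* (5) wait for DataRdy        */
        uint32_t word = data_lines;          /* (6) latch data, assert Ack  */
        ack = true;
        while (data_rdy) ;                   /* (7) memory off the bus      */
        ack = false;                         /* ready for next transaction  */
        return word;
    }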

22 Increasing the Bus Bandwidth
Separate versus multiplexed address and data lines: Address and data can be transmitted in one bus cycle if separate address and data lines are available Cost: (a) more bus lines, (b) increased complexity Data bus width: By increasing the width of the data bus, transfers of multiple words require fewer bus cycles Example: SPARCstation 20’s memory bus is 128 bits wide Cost: more bus lines Block transfers: Allow the bus to transfer multiple words in back-to-back bus cycles Only one address needs to be sent at the beginning The bus is not released until the last word is transferred Cost: (a) increased complexity, (b) decreased response time for requests Our handshaking example in the previous slide used the same wires to transmit the address as well as the data. The advantage is a saving in signal wires. The disadvantage is that it will take multiple cycles to transmit address and data. By having separate lines for addresses and data, we can increase the bus bandwidth by transmitting address and data in the same cycle at the cost of more bus lines and increased complexity. This (1st bullet) is one way to increase bus bandwidth. Another way is to increase the width of the data bus so multiple words can be transferred in a single cycle. For example, the SPARCstation 20 memory bus is 128 bits, or 16 bytes, wide. The cost of this approach is more bus lines. Finally, we can also increase the bus bandwidth by allowing the bus to transfer multiple words in back-to-back bus cycles without sending an address or releasing the bus. The cost of this last approach is an increase of complexity in the bus controller as well as a decrease in response time for other parties who want to get onto the bus. A worked example follows below. +2 = 33 min. (Y:13)
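
As a quick worked illustration (the cycle counts are assumptions for the sake of arithmetic, not from the slide): suppose every bus transmission takes one cycle and we want to move four 32-bit words. A multiplexed 32-bit bus needs 4 x (1 address + 1 data) = 8 cycles; separate address and data lines cut that to 4 cycles; widening the data bus to 128 bits cuts it to 1 address + 1 data = 2 cycles; and a block transfer on the multiplexed 32-bit bus needs 1 address + 4 data = 5 cycles.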

23 Obtaining Access to the Bus
[Diagram: the Bus Master initiates requests over the control lines to the Bus Slave; data can go either way] One of the most important issues in bus design: How is the bus reserved by a device that wishes to use it? Chaos is avoided by a master-slave arrangement: Only the bus master can control access to the bus: It initiates and controls all bus requests A slave responds to read and write requests The simplest system: Processor is the only bus master All bus requests must be controlled by the processor Major drawback: the processor is involved in every transaction Talking about trying to get onto the bus: how does a device get onto the bus anyway? If everybody tries to use the bus at the same time, chaos will result. Chaos is avoided by a master-slave arrangement where only the bus master is allowed to initiate and control bus requests. The slave has no control over the bus. It just responds to the master’s requests. Pretty sad. In the simplest system, the processor is the one and ONLY bus master and all bus requests must be controlled by the processor. The major drawback of this simple approach is that the processor needs to be involved in every bus transaction and can use up too many processor cycles. +2 = 35 min. (Y:15)

24 Multiple Potential Bus Masters: the Need for Arbitration
Bus arbitration scheme: A bus master wanting to use the bus asserts the bus request A bus master cannot use the bus until its request is granted A bus master must signal to the arbiter after it finishes using the bus Bus arbitration schemes usually try to balance two factors: Bus priority: the highest priority device should be serviced first Fairness: Even the lowest priority device should never be completely locked out from the bus Bus arbitration schemes can be divided into four broad classes: Distributed arbitration by self-selection: each device wanting the bus places a code indicating its identity on the bus. Distributed arbitration by collision detection: Ethernet uses this. Daisy chain arbitration: see next slide. Centralized, parallel arbitration: see next-next slide A more aggressive approach is to allow multiple potential bus masters in the system. With multiple potential bus masters, a mechanism is needed to decide which master gets to use the bus next. This decision process is called bus arbitration, and this is how it works. A potential bus master (which can be a device or the processor) wanting to use the bus first asserts the bus request line, and it cannot start using the bus until the request is granted. Once it finishes using the bus, it must tell the arbiter that it is done so the arbiter can allow another potential bus master to get onto the bus. All bus arbitration schemes try to balance two factors: bus priority and fairness. Priority is self explanatory. Fairness means even the device with the lowest priority should never be completely locked out from the bus. Bus arbitration schemes can be divided into four broad classes. In the first one: (a) each device wanting the bus places a code indicating its identity on the bus; (b) by examining the bus, each device can determine the highest priority device that has made a request and decide whether it can get on. In the second scheme, each device independently requests the bus, and a collision results in garbage on the bus if multiple requests occur simultaneously. Each device detects whether its request resulted in a collision, and if it did, it backs off for a random period of time before trying again. The Ethernet you use for your workstation uses this scheme. We will talk about the 3rd and 4th schemes in the next two slides. +3 = 38 min. (Y:18)

25 The Daisy Chain Bus Arbitration Scheme
[Diagram: the Bus Arbiter’s Grant line chains from Device 1 (highest priority) through Device 2 to Device N (lowest priority); the Request and Release lines are shared] Advantage: simple Disadvantages: Cannot assure fairness: A low-priority device may be locked out indefinitely The use of the daisy chain grant signal also limits the bus speed The daisy chain arbitration scheme got its name from the structure of the grant line, which chains through each device from the highest priority to the lowest priority. A higher priority device will pass the grant line to the lower priority device ONLY if it does not want it, so priority is built into the scheme. The advantage of this scheme is that it is simple. The disadvantages are: (a) it cannot assure fairness; a low priority device may be locked out indefinitely. (b) Also, the daisy chain grant line will limit the bus speed. A small simulation of the grant chain follows below. +1 = 39 min. (Y:19)
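
Here is a minimal sketch of the grant chain in C (the device count, array encoding, and function names are illustrative assumptions): the grant enters at the highest-priority device, and each device either takes it or passes it downstream.

    #include <stdbool.h>
    #include <stdio.h>

    #define NDEV 4

    /* Returns the device that takes the grant, or -1 if nobody wants it.
       Device 0 sits closest to the arbiter and so has highest priority. */
    int daisy_chain_grant(const bool wants_bus[NDEV])
    {
        for (int i = 0; i < NDEV; i++) {
            if (wants_bus[i])
                return i;        /* takes the grant; the chain stops here  */
            /* otherwise the grant is passed to the next device downstream */
        }
        return -1;
    }

    int main(void)
    {
        bool req[NDEV] = { false, true, true, false };
        /* device 1 wins even though device 2 also asked */
        printf("bus granted to device %d\n", daisy_chain_grant(req));
        return 0;
    }

Note how a steady stream of requests from device 0 or 1 would starve device 3 forever, which is exactly the fairness problem named above.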

26 Centralized Arbitration with a Bus Arbiter
[Diagram: a central Arbiter with ReqA/GrantA, ReqB/GrantB, ReqC/GrantC lines and a clock; highest priority: ReqA, lowest priority: ReqC. Waveform: ReqA and ReqB are asserted; GrantA is asserted first, then GrantB after ReqA is dropped] In the centralized, parallel arbitration scheme, the devices independently request the bus by using multiple request lines. A centralized arbiter chooses from among the devices requesting bus access and notifies the selected device that it is now the bus master via one of the grant lines. Here is an example where A has the highest priority and C the lowest. Since A has the highest priority, Grant A will be asserted even though both Request A and Request B are asserted. Device A will keep Request A asserted until it no longer needs the bus, so when Request A goes low, the arbiter will deassert Grant A. Since Request B remains asserted at this time (Device B has not gotten the bus yet), the arbiter will then assert Grant B to grant Device B access to the bus. Similarly, Device B will not deassert Request B until it is done with the bus. +2 = 41 min. (Y:21)

27 Simple Implementation of a Bus Arbiter
[Schematic: ReqA, ReqB, and ReqC are latched in a 3-bit D register whose outputs P0-P2 feed the priority logic; the priority outputs G0-G2 gate SetGrA, SetGrB, and SetGrC, which drive the J inputs of three J-K flip flops producing GrantA, GrantB, and GrantC; each Request line controls the corresponding K input so the grant is dropped when the request goes away, and the grant outputs control the priority logic’s EN input] This is how the arbiter we used on the last slide is implemented. First, let’s look at the priority logic. +1 = 42 min. (Y:22)

28 Priority Logic
[Schematic: inputs P0-P2 and EN; outputs G0 = EN·P0, G1 = EN·P0'·P1, G2 = EN·P0'·P1'·P2] It is pretty simple. If Enable is not asserted, then all bets are off: G0, G1, and G2 will all be 0s. If Enable is 1, then G0 will be 1 as long as P0 is 1. G1, however, will be 1 if and only if P1 is 1 and P0 is 0. Finally, G2 will be 1 if and only if P2 is 1 and BOTH P0 and P1 are zeros. The code sketch below transcribes these rules. +1 = 43 min. (Y:23)
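
The same rules transcribe directly into C; this little sketch (the function and struct names are illustrative) is just the three product terms above:

    #include <stdbool.h>

    typedef struct { bool g0, g1, g2; } Grants;

    /* Fixed-priority selection: P0 beats P1 beats P2; EN low kills all. */
    Grants priority_logic(bool en, bool p0, bool p1, bool p2)
    {
        Grants g = { false, false, false };
        if (!en)
            return g;               /* disabled: all outputs zero */
        g.g0 = p0;                  /* G0 = EN . P0               */
        g.g1 = !p0 && p1;           /* G1 = EN . P0' . P1         */
        g.g2 = !p0 && !p1 && p2;    /* G2 = EN . P0' . P1' . P2   */
        return g;
    }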

29 JK Flip Flop
A JK flip flop can be implemented with a D flip flop. Characteristic table: J=0, K=0 holds Q(t) = Q(t-1); J=0, K=1 resets Q(t) = 0; J=1, K=0 sets Q(t) = 1; J=1, K=1 toggles Q(t) = Q(t-1)'. The required D input is D = J·Q' + K'·Q. In case you forget how a J-K flip flop works, here is a reminder. If both J and K are zeros, the flip flop keeps its old value. The flip flop is set to zero if K is 1 and set to 1 if J is 1. If both J and K are asserted, the flip flop toggles its value. If you go through the Karnaugh map, you can convince yourself that if you only have D flip flops, you can implement a J/K flip flop by placing this simple logic in front of a D flip flop. A small check in code follows below. +1 = 44 min. (Y:24)
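
As a sanity check, here is the J-K-to-D mapping as a C sketch (the function name is illustrative); it computes the D input, and therefore the next state, at each clock edge:

    #include <assert.h>
    #include <stdbool.h>

    /* Next state of the flip flop: D = J.Q' + K'.Q */
    bool jk_next(bool j, bool k, bool q)
    {
        return (j && !q) || (!k && q);
    }

    int main(void)
    {
        assert(jk_next(false, false, true)  == true);   /* hold   */
        assert(jk_next(false, true,  true)  == false);  /* reset  */
        assert(jk_next(true,  false, false) == true);   /* set    */
        assert(jk_next(true,  true,  true)  == false);  /* toggle */
        return 0;
    }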

30 Simple Implementation of a Bus Arbiter
[Schematic: the same arbiter as slide 27] So this is how it works. If any of the Grant lines is high, the Enable signal is low, so the outputs of the Priority logic are all zeros: we are not accepting any new requests. When all the Grant lines are low, the highest priority request line that is asserted will cause its corresponding SetGr signal (e.g., SetGrA) to go high and set the J/K flip flop to 1. Once the J/K flip flop is set to 1, the priority circuit will be disabled. When do we reset the J/K flip flop? When the corresponding Request line goes low. +2 = 46 min. (Y:26)

31 Responsibilities of the Operating System
The operating system acts as the interface between: The I/O hardware and the program that requests I/O Three characteristics of the I/O systems: The I/O system is shared by multiple programs using the processor I/O systems often use interrupts (externally generated exceptions) to communicate information about I/O operations. Interrupts must be handled by the OS because they cause a transfer to supervisor mode The low-level control of an I/O device is complex: Managing a set of concurrent events The requirements for correct device control are very detailed The OS acts as the interface between the I/O hardware and the program that requests I/O. The responsibilities of the operating system arise from 3 characteristics of the I/O systems: (a) First, the I/O system is shared by multiple programs using the processor. (b) I/O systems, as I will show you, often use interrupts to communicate information about I/O operations, and interrupts must be handled by the OS. (c) Finally, the low-level control of an I/O device is very complex, so we should leave it to those crazy kernel programmers to handle. +1 = 52 min. (Y:32)

32 Operating System Requirements
Provide protection to shared I/O resources Guarantee that a user’s program can only access the portions of an I/O device to which the user has rights Provide abstraction for accessing devices: Supply routines that handle low-level device operation Handle the interrupts generated by I/O devices Provide equitable access to the shared I/O resources All user programs must have equal access to the I/O resources Schedule accesses in order to enhance system throughput Here is a list of the functions the OS must provide. First it must guarantee that a user’s program can only access the portions of an I/O device to which it has rights. Then the OS must hide low level complexity from the user by supplying routines that handle low-level device operation. The OS also needs to handle the interrupts generated by I/O devices. And the OS must be fair: all user programs must have equal access to the I/O resources. Finally, the OS needs to schedule accesses in a way that enhances system throughput. +1 = 53 min. (Y:33)

33 OS and I/O Systems Communication Requirements
The Operating System must be able to prevent: The user program from communicating with the I/O device directly If user programs could perform I/O directly: Protection to the shared I/O resources could not be provided Three types of communication are required: The OS must be able to give commands to the I/O devices The I/O device must be able to notify the OS when the I/O device has completed an operation or has encountered an error Data must be transferred between memory and an I/O device The OS must be able to communicate with the I/O system, but at the same time, the OS must be able to prevent the user from communicating with the I/O device directly. Why? Because if user programs could perform I/O directly, we would not be able to provide protection to the shared I/O devices. Three types of communication are required: (1) First, the OS must be able to give commands to the I/O devices. (2) Secondly, the device must be able to notify the OS when the I/O device has completed an operation or has encountered an error. (3) Data must be transferred between memory and an I/O device. +2 = 55 min. (Y:35)

34 Giving Commands to I/O Devices
Two methods are used to address the device: Special I/O instructions Memory-mapped I/O Special I/O instructions specify: Both the device number and the command word Device number: the processor communicates this via a set of wires normally included as part of the I/O bus Command word: this is usually sent on the bus’s data lines Memory-mapped I/O: Portions of the address space are assigned to I/O devices Reads and writes to those addresses are interpreted as commands to the I/O devices User programs are prevented from issuing I/O operations directly: The I/O address space is protected by the address translation How does the OS give commands to the I/O device? There are two methods: special I/O instructions and memory-mapped I/O. If special I/O instructions are used, the OS will use the I/O instruction to specify both the device number and the command word. The processor then executes the special I/O instruction by passing the device number to the I/O device (in most cases) via a set of control lines on the bus and at the same time sends the command to the I/O device using the bus’s data lines. Special I/O instructions are not used that widely. Most processors use memory-mapped I/O, where portions of the address space are assigned to the I/O devices. Reads and writes to this special address space are interpreted by the memory controller as I/O commands, and the memory controller will do the right thing to communicate with the I/O device. Why is memory-mapped I/O so popular? Well, it is popular because we can use the same protection mechanism we already implemented for virtual memory to prevent the user from issuing commands to the I/O device directly. A sketch follows below. +2 = 57 min. (Y:37)
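
Here is a minimal C sketch of memory-mapped I/O; the base address, register layout, and command value are all invented for illustration, not a real device. The volatile qualifier keeps the compiler from caching or reordering accesses that are really device commands.

    #include <stdint.h>

    #define DEV_BASE 0xFFFF0000u        /* assumed I/O address region    */

    typedef struct {
        volatile uint32_t command;      /* write here to issue a command */
        volatile uint32_t status;       /* read here for device status   */
        volatile uint32_t data;         /* data in / data out            */
    } DevRegs;

    static DevRegs *const dev = (DevRegs *)DEV_BASE;

    /* Issue a (hypothetical) "read sector" command to the device. */
    void start_read(uint32_t sector)
    {
        dev->data    = sector;          /* parameter first               */
        dev->command = 0x1u;            /* then the command word         */
    }

Because these addresses live in a protected region of the address space, a user program touching dev would fault in address translation, which is exactly the protection argument above.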

35 I/O Device Notifying the OS
The OS needs to know when: The I/O device has completed an operation The I/O operation has encountered an error This can be accomplished in two different ways: Polling: The I/O device puts information in a status register The OS periodically checks the status register I/O Interrupt: Whenever an I/O device needs attention from the processor, it interrupts the processor from what it is currently doing. After the OS has issued a command to the I/O device, either via a special I/O instruction or by writing to a location in the I/O address space, the OS needs to be notified when: (a) the I/O device has completed the operation, or (b) the I/O device has encountered an error. This can be accomplished in two different ways: polling and I/O interrupt. +1 = 58 min. (Y:38)

36 Polling: Programmed I/O
[Flowchart: the CPU busy-waits on “Is the data ready?”; on yes it reads the data, stores it to memory, checks “done?”, and loops until the transfer completes. The busy-wait loop is not an efficient way to use the CPU unless the device is very fast, but the checks for I/O completion can be dispersed among computation-intensive code] In polling, the I/O device puts information in a status register and the OS periodically checks it (the busy-wait loop) to see if the data is ready or if an error condition has occurred. If the data is ready, fine: read the data and move on. If not, we stay in this loop and try again at a later time. The advantage of this approach is simplicity: the processor is totally in control and does all the work. But the processor being in total control is also the problem: polling overhead can consume a lot of CPU time. For this reason (the disadvantage), most I/O devices notify the processor via I/O interrupt instead. A sketch of the loop follows below. Advantage: Simple: the processor is totally in control and does all the work Disadvantage: Polling overhead can consume a lot of CPU time +2 = 60 min. (Y:40)
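
A minimal C version of the busy-wait loop in the flowchart, reusing the illustrative DevRegs layout from the memory-mapped I/O sketch above (the status bits are likewise assumptions):

    #define STATUS_READY 0x1u           /* assumed: data word available */
    #define STATUS_ERROR 0x2u           /* assumed: device error        */

    /* Spin until the device has a word for us, then read it. */
    uint32_t poll_read(void)
    {
        while (!(dev->status & STATUS_READY)) {
            if (dev->status & STATUS_ERROR)
                return 0;               /* error path; simplified       */
            /* busy wait: these cycles do no useful work for the user  */
        }
        return dev->data;
    }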

37 Interrupt Driven Data Transfer
[Diagram: a user program (add, sub, and, or, nop) is running when (1) an I/O interrupt arrives; (2) the PC is saved; (3) control branches to the interrupt service address, where the service routine (read, store, ..., rti) moves the data to memory (4) and then returns to the user program] Advantage: User program progress is only halted during actual transfer Disadvantage: special hardware is needed to: Cause an interrupt (I/O device) Detect an interrupt (processor) Save the proper states to resume after the interrupt (processor) That is, whenever an I/O device needs attention from the processor, it interrupts the processor from what it is currently doing. This is how an I/O interrupt looks in the overall scheme of things. The processor is minding its own business when one of the I/O devices wants its attention and causes an I/O interrupt. The processor then saves the current PC, branches to the address where the interrupt service routine resides, and starts executing the interrupt service routine. When it finishes executing the interrupt service routine, it branches back to the point of the original program where we stopped and continues. The advantage of this approach is efficiency. The user program’s progress is halted only during the actual transfer. The disadvantage is that it requires special hardware in the I/O device to generate the interrupt, and on the processor side, we need special hardware to detect the interrupt and then to save the proper states so we can resume after the interrupt. A sketch of a service routine follows below. +2 = 62 min. (Y:42)
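
Here is a hedged sketch of the service-routine side in C; how the routine gets installed and invoked varies by processor and toolchain, and the buffer, counter, and reuse of the illustrative DevRegs from above are all assumptions:

    #include <stdint.h>

    #define BUF_SIZE 512
    static volatile uint32_t buffer[BUF_SIZE];  /* destination in memory */
    static volatile unsigned count;             /* words received so far */

    /* Invoked by the interrupt hardware after the PC has been saved;
       the user program makes progress between these calls. */
    void device_isr(void)
    {
        if (count < BUF_SIZE)
            buffer[count++] = dev->data;        /* the actual transfer    */
        dev->command = 0x2u;                    /* assumed: ack the device */
        /* the return-from-interrupt (rti) restores the saved PC */
    }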

38 I/O Interrupt
An I/O interrupt is just like an exception except: An I/O interrupt is asynchronous Further information needs to be conveyed An I/O interrupt is asynchronous with respect to instruction execution: I/O interrupt is not associated with any instruction I/O interrupt does not prevent any instruction from completion You can pick your own convenient point to take an interrupt I/O interrupt is more complicated than exception: Needs to convey the identity of the device generating the interrupt Interrupt requests can have different urgencies: Interrupt requests need to be prioritized How is an I/O interrupt different from the exceptions you already learned about? Well, an I/O interrupt is asynchronous with respect to instruction execution, while exceptions such as overflow or page fault are always associated with a certain instruction. Also, for an exception, the only information that needs to be conveyed is the fact that an exceptional condition has occurred, but for an interrupt, there is more information to be conveyed. Let me elaborate on each of these two points. Unlike an exception, which is always associated with an instruction, an interrupt is not associated with any instruction. The user program is just doing its thing when an I/O interrupt occurs. So an I/O interrupt does not prevent any instruction from completing, and you can pick your own convenient point to take the interrupt. As far as conveying more information is concerned, the interrupt detection hardware must somehow let the OS know who is causing the interrupt. Furthermore, interrupt requests need to be prioritized. The hardware that can do all this looks like this. +2 = 64 min. (Y:44)

39 Interrupt Logic
Detect and synchronize interrupt requests Ignore interrupts that are disabled (masked off) Rank the pending interrupt requests Create the interrupt microsequence address Provide select signals for the interrupt microsequence [Diagram: asynchronous interrupt requests pass through synchronizer circuits (two D flip flops in series), are gated by the Interrupt Mask Register, ranked by the Interrupt Priority Network, and feed the microsequence address & select logic] First, the interrupt requests have to pass through a synchronizer circuit, basically two flip flops in series, to avoid getting into metastability. Then, based on the contents of the interrupt mask register, all the interrupts that are currently disabled are ignored. The rest of the interrupt requests are then prioritized, and we generate the interrupt target address based on the highest priority interrupt we received. +2 = 66 min. (Y:46)

40 Program Interrupt/Exception Hardware
Hardware interrupt services: Save the PC (or PCs in a pipelined machine) Inhibit the interrupt that is being handled Branch to interrupt service routine Options: Save status, save registers, save interrupt information Change status, change operating modes, get interrupt info. A “good thing” about interrupts: Asynchronous: not associated with a particular instruction Pick the most convenient place in the pipeline to handle it Besides detecting the interrupt and generating the interrupt target address, what else does the hardware need to do? Well, at the minimum, the processor has to save the program counter, or in a pipelined machine with delayed branches, you may have to save multiple program counters. To prevent recursion within an interrupt service routine, the processor must also disable further interrupts from the device it is servicing. This can be done easily by turning the interrupt bit in the mask register off. Then, the processor also has to branch to the interrupt target address. These (the first three) are the things the hardware must do. Here is a list of optional things. Well, as I said before, one good thing about the interrupt is that it is asynchronous, so you can pick the most convenient place in the pipeline to take it. +2 = 68 min. (Y:48)

41 Programmer’s View
[Diagram: a main program (Add, Div, Sub) receives an interrupt request, e.g., from the keyboard (1); the PC is saved and control “branches” to the interrupt target address (2); the handler saves processor status/state, services the (keyboard) interrupt, restores processor status/state, and gets the saved PC back (3)] Let me show you what I mean. For example, here the CPU receives a keyboard interrupt request in the middle of the Divide instruction. If the Divide instruction causes a divide-by-0 exception, we have to handle it immediately. But here we have an interrupt, so we can say: well, I have been spending all this time doing this divide and it is almost finished, so let’s wait. So the interrupt is not taken until after the Divide instruction has completed. As far as generating the interrupt target address is concerned, we also have two options. First is the low cost version: we make all interrupts go to a common address and let the software figure out what to do next. Then, for those who like to build complicated hardware, you can make the hardware automatically branch to different addresses based on the interrupt type and/or level. This is called vectored interrupt. +2 = 70 min. (Y:50) Interrupt target address options: General: Branch to a common address for all interrupts Software then decodes the cause and figures out what to do Specific: Automatically branch to different addresses based on interrupt type and/or level (vectored interrupt)

42 Delegating I/O Responsibility from the CPU: DMA
Direct Memory Access (DMA): External to the CPU Acts as a master on the bus Transfers blocks of data to or from memory without CPU intervention [Diagram: CPU, Memory, DMAC, and IOC on the bus, with the device behind the IOC] The CPU sends a starting address, direction, and length count to the DMAC, then issues “start”. The DMAC provides handshake signals for the Peripheral Controller, and memory addresses and handshake signals for Memory. Finally, let’s see how we can delegate some of the I/O responsibilities from the CPU. The first option is Direct Memory Access, which takes advantage of the fact that I/O events often involve block transfers: you are not going to access the disk 1 byte at a time. The DMA controller is external to the processor and can act as a bus master to transfer blocks of data to or from memory and the I/O device without CPU intervention. This is how it works. The CPU sends the starting address, the direction, and the length of the transfer to the DMA controller and issues a start command. The DMA controller then takes over from there and provides the handshake signals required to complete the entire block transfer. So the DMA controller is pretty intelligent. If you add more intelligence to the DMA controller, you will end up with an I/O processor, or IOP for short. A sketch of the setup step follows below. +2 = 72 min. (Y:52)
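
Here is a hedged sketch of the CPU-side setup in C; the DMAC register block, its address, and the start encoding are invented for illustration:

    #include <stdint.h>

    typedef struct {
        volatile uint32_t start_addr;   /* memory address of the block     */
        volatile uint32_t length;       /* length count, in bytes          */
        volatile uint32_t direction;    /* 0: device-to-memory, 1: reverse */
        volatile uint32_t control;      /* writing 1 issues "start"        */
    } DmaRegs;

    static DmaRegs *const dmac = (DmaRegs *)0xFFFF1000u;   /* assumed base */

    void dma_start(uint32_t addr, uint32_t len, uint32_t dir)
    {
        dmac->start_addr = addr;
        dmac->length     = len;
        dmac->direction  = dir;
        dmac->control    = 1u;  /* DMAC takes over as bus master; the CPU
                                   is free until completion is signaled
                                   (typically by an interrupt)            */
    }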

43 Delegating I/O Responsibility from the CPU: IOP
[Diagram: (1) the CPU issues an instruction (OP, Device, Address) to the IOP; (2) the IOP looks in main memory for its commands (OP, Addr, Cnt, Other), which say what to do, where to put the data, how much, and any special requests; (3) device-to/from-memory transfers are controlled by the IOP directly, and the IOP steals memory cycles; (4) the IOP interrupts the CPU when done] The IOP is so smart that the CPU only needs to issue a simple instruction (OP, Device, Address) that tells it which device is the target and where to find more commands (Addr). The IOP will then fetch commands such as this (OP, Addr, Cnt, Other) from memory and do all the necessary data transfers between the I/O device and the memory system. The IOP will do the transfer in the background, and it will not affect the CPU because it will access the memory only when the CPU is not using it: this is called stealing memory cycles. Only when the IOP finishes its operation will it interrupt the CPU. A sketch of the command block follows below. +2 = 74 min. (Y:54)
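
The in-memory command the IOP fetches maps naturally onto a C struct; the field widths and names here are illustrative, following the slide’s OP/Addr/Cnt/Other labels:

    #include <stdint.h>

    /* One IOP command block, resident in main memory. */
    typedef struct {
        uint32_t op;        /* what to do (read, write, seek, ...)    */
        uint32_t addr;      /* where in memory to put / fetch data    */
        uint32_t cnt;       /* how much data to transfer              */
        uint32_t other;     /* special requests, flags, chaining, ... */
    } IopCommand;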

44 Summary
Three types of buses: Processor-memory buses I/O buses Backplane buses Bus arbitration schemes: Daisy chain arbitration: it cannot assure fairness Centralized parallel arbitration: requires a central arbiter I/O device notifying the operating system: Polling: it can waste a lot of processor time I/O interrupt: similar to exception except it is asynchronous Delegating I/O responsibility from the CPU: Direct memory access (DMA) I/O processor (IOP) Let’s summarize what we learned today. First we talked about three types of buses: the processor-memory bus, which is usually the shortest and fastest; the I/O bus, which has to deal with a large range of I/O devices and is usually the longest and slowest; and the backplane bus, a general interconnect built into the chassis of the machine. The processor-memory bus, which runs at high speed, usually is synchronous, while the I/O and backplane buses can be either synchronous or asynchronous. As far as bus arbitration schemes are concerned, I showed you two in detail. The daisy chain scheme is simple, but it is slow and cannot assure fairness; that is, a low priority device may never get to use the bus at all. The centralized parallel arbitration scheme is faster, but it requires a centralized arbiter, so I also showed you how to build a simple arbiter using simple AND gates and JK flip flops. When we talked about the OS’s role, we discussed two ways an I/O device can notify the operating system when data is ready or something goes wrong. Polling is simple to implement, but it may end up wasting a lot of processor cycles. An I/O interrupt is similar to an exception, but it is asynchronous with respect to instruction execution, so we can pick our own convenient point in the pipeline to handle it. Finally, we talked about two ways you can delegate I/O responsibility from the CPU: direct memory access and the I/O processor. +3 = 77 min. (Y:57)

45 Where to get more information?
Happy trails ... Shing Kong (Kong) 1047 Noel Dr., Apt. #4 Menlo Park, CA 94025 (w) Until we meet again :-) Well, today is the last day I teach this class, so I want to thank everybody, the TAs, and Professor Patterson for making this an enjoyable experience. Some of my pilot friends asked me how teaching this class compares to teaching flying. Well, the biggest difference is that if you guys have a bad day, I will just give you a bad grade, but when one of the student pilots has a bad day, he or she can kill me. But all jokes aside, I had a lot of fun teaching this class, and I hope you guys learned something from me, since I sure learned a lot from teaching you. As we say down in Silicon Valley, it is a small valley, so I am sure we will run into each other again. If you guys need any help or career advice, feel free to contact me. Good luck and take care. +3 = 80 min. (Z:00)

