Lecture 25 Buses and I/O (2)

CS : Computer Architecture Lecture 25 Buses and I/O (2) November 17, 2007 Nael Abu-Ghazaleh

What is a bus? Slow vehicle that many people ride together
well, true... A bunch of wires...

A Bus is: shared communication link
single set of wires used to connect multiple subsystems. A bus is also a fundamental tool for composing large, complex systems: a systematic means of abstraction.
[Figure labels: Control, Datapath, Memory, Processor, Input, Output]

Advantages of Buses
[Figure labels: Processor, Memory, three I/O Devices] Versatility: New devices can be added easily. Peripherals can be moved between computer systems that use the same bus standard. Low Cost: A single set of wires is shared in multiple ways.
The two major advantages of the bus organization are versatility and low cost. By versatility, we mean new devices can easily be added. Furthermore, if a device is designed according to an industry bus standard, it can be moved between computer systems that use the same bus standard. The bus organization is a low-cost solution because a single set of wires is shared in multiple ways.

Disadvantage of Buses
[Figure labels: Processor, Memory, three I/O Devices] It creates a communication bottleneck: the bandwidth of that bus can limit the maximum I/O throughput. The maximum bus speed is largely limited by: the length of the bus and the number of devices on the bus; the need to support a range of devices with widely varying latencies and widely varying data transfer rates.
The major disadvantage of the bus organization is that it creates a communication bottleneck. When I/O must pass through a single bus, the bandwidth of that bus can limit the maximum I/O throughput. The maximum bus speed is also largely limited by: (a) the length of the bus, (b) the number of I/O devices on the bus, and (c) the need to support a wide range of devices with widely varying latencies and data transfer rates.

The General Organization of a Bus
[Figure labels: Control Lines, Data Lines] Control lines: signal requests and acknowledgments; indicate what type of information is on the data lines. Data lines carry information between the source and the destination: data and addresses; complex commands.
A bus generally contains a set of control lines and a set of data lines. The control lines are used to signal requests and acknowledgments and to indicate what type of information is on the data lines. The data lines carry information between the source and the destination. This information may consist of data, addresses, or complex commands. A bus transaction includes two parts: (a) sending the address and (b) then receiving or sending the data.

Master versus Slave
[Figure labels: Bus Master, Bus Slave; Master issues command; Data can go either way] A bus transaction includes two parts: issuing the command (and address), which is the request; and transferring the data, which is the action. The master is the one who starts the bus transaction by issuing the command (and address). The slave is the one who responds to the address by: sending data to the master if the master asks for data; receiving data from the master if the master wants to send data.
The bus master is the one who starts the bus transaction by sending out the address. The slave is the one who responds to the master, either by sending data to the master if the master asks for data or by receiving data from the master if the master wants to send data. In most simple I/O operations, the processor will be the bus master but, as we shall see later, this is not always the case.

Types of Buses
Processor-Memory Bus (design specific): Short and high speed. Only needs to match the memory system. Maximizes memory-to-processor bandwidth. Connects directly to the processor. Optimized for cache block transfers.
I/O Bus (industry standard): Usually is lengthy and slower. Needs to match a wide range of I/O devices. Connects to the processor-memory bus or backplane bus.
Backplane Bus (standard or proprietary): Backplane: an interconnection structure within the chassis. Allows processors, memory, and I/O devices to coexist. Cost advantage: one bus for all components.
Buses are traditionally classified as one of 3 types: processor-memory buses, I/O buses, or backplane buses. The processor-memory bus is usually design specific while the I/O and backplane buses are often standard buses. In general, processor-memory buses are short and high speed. Such a bus tries to match the memory system in order to maximize the memory-to-processor bandwidth and is connected directly to the processor. An I/O bus usually is lengthy and slow because it has to match a wide range of I/O devices, and it usually connects to the processor-memory bus or backplane bus. The backplane bus receives its name because it was often built into the backplane of the computer: it is an interconnection structure within the chassis. It is designed to allow processors, memory, and I/O devices to coexist on a single bus, so it has the cost advantage of having only one bus for all components.

A Computer System with One Bus: Backplane Bus
[Figure labels: Processor, Memory, I/O Devices] A single bus (the backplane bus) is used for: processor to memory communication; communication between I/O devices and memory. Advantages: simple and low cost. Disadvantages: slow, and the bus can become a major bottleneck. Example: IBM PC-AT.
Here is an example in which a single bus, the backplane bus, is used to provide communication between the processor and memory as well as communication between I/O devices and memory. The advantage here is of course low cost. One disadvantage of this approach is that the bus, with so many things attached to it, will be lengthy and slow. Furthermore, the bus can become a major communication bottleneck if everybody wants to use the bus at the same time. The IBM PC is an example that uses only a backplane bus for all communication.

A Two-Bus System
[Figure labels: Processor, Memory, Processor-Memory Bus, three Bus Adaptors, three I/O Buses] I/O buses tap into the memory bus via bus adaptors: Processor-memory bus: mainly for processor-memory traffic. I/O buses: provide expansion slots for I/O devices. Apple Macintosh-II: NuBus: processor, memory, and a few selected I/O devices. SCSI Bus: the rest of the I/O devices.
Here is an example using two buses where multiple I/O buses tap into the processor-memory bus via bus adaptors. The processor-memory bus is used mainly for processor-memory traffic while the I/O buses are used to provide expansion slots for the I/O devices. The Apple Macintosh-II adopts this organization, where the NuBus is used to connect processor, memory, and a few selected I/O devices together. The rest of the I/O devices reside on an industry standard bus, the SCSI Bus, which is connected to the NuBus via a bus adaptor.

A Three-Bus System
[Figure labels: Processor, Memory, Processor-Memory Bus, Bus Adaptors, Backplane Bus, I/O Buses] A small number of backplane buses tap into the processor-memory bus. The system bus is used for processor-memory traffic. I/O buses are connected to the backplane bus. Advantage: loading on the system bus greatly reduced.
Finally, in a 3-bus system, a small number of backplane buses (in our example here, just 1) tap into the processor-memory bus. The processor-memory bus is used mainly for processor-memory traffic while the I/O buses are connected to the backplane bus via bus adaptors. An advantage of this organization is that the loading on the processor-memory bus is greatly reduced because of the small number of taps into the high-speed processor-memory bus.

What defines a bus?
Transaction Protocol; Timing and Signaling Specification; Bunch of Wires; Electrical Specification; Physical / Mechanical Characteristics (the connectors)

Synchronous and Asynchronous Bus
Synchronous Bus: Includes a clock in the control lines. A fixed protocol for communication that is relative to the clock. Advantage: involves very little logic and can run very fast. Disadvantages: Every device on the bus must run at the same clock rate. To avoid clock skew, they cannot be long if they are fast.
Asynchronous Bus: It is not clocked. It can accommodate a wide range of devices. It can be lengthened without worrying about clock skew. It requires a handshaking protocol.
There are substantial differences between the design requirements for the I/O buses and processor-memory buses and the backplane buses. Consequently, there are two different schemes for communication on the bus: synchronous and asynchronous. A synchronous bus includes a clock in the control lines and a fixed protocol for communication that is relative to the clock. Since the protocol is fixed and everything happens with respect to the clock, it involves very little logic and can run very fast. Most processor-memory buses fall into this category. Synchronous buses have two major disadvantages: (1) every device on the bus must run at the same clock rate, and (2) if they are fast, they must be short to avoid clock skew problems. By definition, an asynchronous bus is not clocked, so it can accommodate a wide range of devices at different clock rates and can be lengthened without worrying about clock skew. The drawback is that it can be slow and more complex because a handshaking protocol is needed to coordinate the transmission of data between the sender and receiver.

Busses so far
[Figure labels: Master, Slave, Control Lines, Address Lines, Data Lines]
Bus Master: has ability to control the bus, initiates transaction. Bus Slave: module activated by the transaction. Bus Communication Protocol: specification of sequence of events and timing requirements in transferring information. Asynchronous Bus Transfers: control lines (req, ack) serve to orchestrate sequencing. Synchronous Bus Transfers: sequence relative to common clock.

Bus Transaction: Arbitration, Request, Action

Arbitration: Obtaining Access to the Bus
[Figure labels: Bus Master, Bus Slave; Control: Master initiates requests; Data can go either way] One of the most important issues in bus design: How is the bus reserved by a device that wishes to use it? Chaos is avoided by a master-slave arrangement: Only the bus master can control access to the bus: it initiates and controls all bus requests. A slave responds to read and write requests. The simplest system: Processor is the only bus master. All bus requests must be controlled by the processor. Major drawback: the processor is involved in every transaction.
Talking about trying to get onto the bus: how does a device get onto the bus anyway? If everybody tries to use the bus at the same time, chaos will result. Chaos is avoided by a master-slave arrangement where only the bus master is allowed to initiate and control bus requests. The slave has no control over the bus. It just responds to the master's requests. Pretty sad. In the simplest system, the processor is the one and ONLY bus master and all bus requests must be controlled by the processor. The major drawback of this simple approach is that the processor needs to be involved in every bus transaction, which can use up too many processor cycles.

Multiple Potential Bus Masters: Arbitration
Bus arbitration scheme: A bus master wanting to use the bus asserts the bus request. A bus master cannot use the bus until its request is granted. A bus master must signal to the arbiter after it finishes using the bus. Bus arbitration schemes balance two factors: Bus priority: the highest priority device is serviced first. Fairness: even the lowest priority device should never be completely locked out from the bus. Bus arbitration schemes are divided into four classes: Daisy chain arbitration: single device with all request lines. Centralized, parallel arbitration: see next-next slide. Distributed arbitration by self-selection: each device wanting the bus places a code indicating its identity on the bus. Distributed arbitration by collision detection: Ethernet.
A more aggressive approach is to allow multiple potential bus masters in the system. With multiple potential bus masters, a mechanism is needed to decide which master gets to use the bus next. This decision process is called bus arbitration and this is how it works. A potential bus master (which can be a device or the processor) wanting to use the bus first asserts the bus request line, and it cannot start using the bus until the request is granted. Once it finishes using the bus, it must tell the arbiter that it is done so the arbiter can allow other potential bus masters to get onto the bus. All bus arbitration schemes try to balance two factors: bus priority and fairness. Priority is self-explanatory. Fairness means even the device with the lowest priority should never be completely locked out from the bus. Bus arbitration schemes can be divided into four broad classes. In distributed arbitration by self-selection: (a) each device wanting the bus places a code indicating its identity on the bus; (b) by examining the bus, each device can determine the highest priority device that has made a request and decide whether it can get on.
In distributed arbitration by collision detection, each device independently requests the bus, and a collision will result in garbage on the bus if multiple requests occur simultaneously. Each device will detect whether its request resulted in a collision and, if it did, it will back off for a random period of time before trying again. The Ethernet you use for your workstation uses this scheme. We will talk about the other two schemes, daisy chain and centralized parallel arbitration, in the next two slides.
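Distributed arbitration by self-selection can be sketched in a few lines. This is only an illustration, not a model of any real bus: the function name self_select is made up here, and the rule that a larger identity code means higher priority is an assumption.

```python
def self_select(requesting_ids):
    """Each requesting device decides for itself whether it won the bus.

    Assumption for this sketch: a larger identity code means higher priority.
    """
    # Every requester places its identity code on the shared lines.
    codes_on_bus = set(requesting_ids)
    decisions = {}
    for device in requesting_ids:
        # Each device examines the bus and finds the highest-priority requester.
        highest_seen = max(codes_on_bus)
        # It gets on the bus only if that requester is itself.
        decisions[device] = (device == highest_seen)
    return decisions
```

For example, self_select([3, 7, 5]) marks only device 7 as the winner. No central arbiter is involved: every device reaches the same conclusion from what it observes on the bus.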

The Daisy Chain Bus Arbitrations Scheme
[Figure labels: Bus Arbiter; Device 1 (Highest Priority), Device 2, Device N (Lowest Priority); Grant chained through the devices; Release; Request (wired-OR)] Advantage: simple. Disadvantages: Cannot assure fairness: a low-priority device may be locked out indefinitely. The use of the daisy chain grant signal also limits the bus speed.
The daisy chain arbitration scheme got its name from the structure of the grant line, which chains through each device from the highest priority to the lowest priority. A higher priority device will pass the grant line to the lower priority device ONLY if it does not want the bus, so priority is built into the scheme. The advantage of this scheme is simplicity. The disadvantages are: (a) It cannot assure fairness. A low priority device may be locked out indefinitely. (b) Also, the daisy chain grant line will limit the bus speed.
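The grant chain and its fairness problem can be illustrated with a small simulation. The function names below are hypothetical, and the model ignores timing entirely; it only shows who captures the grant.

```python
def daisy_chain_grant(requests):
    """Return the position of the device that captures the grant, or None.

    requests[i] is True when device i asserts the shared request line.
    Position 0 is closest to the arbiter, so it has the highest priority.
    """
    if not any(requests):              # wired-OR request line is idle
        return None
    for position, wants_bus in enumerate(requests):
        if wants_bus:                  # this device keeps the grant
            return position
        # otherwise the device passes the grant down the chain
    return None


def count_wins(rounds, always_requesting):
    """Count grants per device when the given devices request in every round."""
    wins = {device: 0 for device in always_requesting}
    chain_length = max(always_requesting) + 1
    for _ in range(rounds):
        requests = [i in always_requesting for i in range(chain_length)]
        wins[daisy_chain_grant(requests)] += 1
    return wins
```

With devices 0 and 2 both requesting every round, count_wins(5, {0, 2}) gives all five grants to device 0: the low-priority device is locked out, which is the fairness problem named above.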

Centralized Parallel Arbitration
[Figure labels: Device 1, Device 2, Device N, Req, Grant, Bus Arbiter] Used in essentially all processor-memory busses and in high-speed I/O busses.
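The slide does not say how the central arbiter picks among simultaneous requests. As one possible policy, and purely as an assumption for illustration, the sketch below uses round-robin rotation, which addresses the fairness goal stated earlier. The class name RoundRobinArbiter is hypothetical.

```python
class RoundRobinArbiter:
    """Central arbiter with one private request line per device.

    Round-robin is an assumed policy here, chosen to illustrate fairness.
    """

    def __init__(self, num_devices):
        self.num_devices = num_devices
        self.last_granted = num_devices - 1   # so device 0 is checked first

    def grant(self, requests):
        """requests is a list of bools; return the granted device or None."""
        for offset in range(1, self.num_devices + 1):
            candidate = (self.last_granted + offset) % self.num_devices
            if requests[candidate]:
                self.last_granted = candidate
                return candidate
        return None
```

If devices 0 and 2 request continuously, successive calls to grant return 0, 2, 0, 2, so neither device is starved. A strict-priority arbiter would instead always scan from device 0.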

Simplest bus paradigm
All agents operate synchronously. All can source / sink data at same rate => simple protocol: just manage the source and target.

Simple Synchronous Protocol
[Timing diagram signals: BReq, BG, R/W Address (Cmd+Addr), Data (Data1, Data2)] Even memory busses are more complex than this: memory (slave) may take time to respond, and it needs to control the data rate.

Typical Synchronous Protocol
[Timing diagram signals: BReq, BG, R/W Address (Cmd+Addr), Wait, Data (Data1, Data1, Data2)] Slave indicates when it is prepared for data transfer. Actual transfer goes at bus rate.

Increasing the Bus Bandwidth
Separate versus multiplexed address and data lines: Address and data can be transmitted in one bus cycle if separate address and data lines are available. Cost: (a) more bus lines, (b) increased complexity. Data bus width: By increasing the width of the data bus, transfers of multiple words require fewer bus cycles. Example: SPARCstation 20's memory bus is 128 bits wide. Cost: more bus lines. Block transfers: Allow the bus to transfer multiple words in back-to-back cycles. Only one address needs to be sent at the beginning. The bus is not released until the last word is transferred. Cost: (a) increased complexity, (b) decreased response time for requests.
Our handshaking example in the previous slide used the same wires to transmit the address as well as data. The advantage is a saving in signal wires. The disadvantage is that it will take multiple cycles to transmit address and data. By having separate lines for addresses and data, we can increase the bus bandwidth by transmitting address and data in the same cycle at the cost of more bus lines and increased complexity. This (1st bullet) is one way to increase bus bandwidth. Another way is to increase the width of the data bus so multiple words can be transferred in a single cycle. For example, the SPARCstation memory bus is 128 bits, or 16 bytes, wide. The cost of this approach is more bus lines. Finally, we can also increase the bus bandwidth by allowing the bus to transfer multiple words in back-to-back bus cycles without sending an address or releasing the bus. The cost of this last approach is an increase of complexity in the bus controller as well as a decrease in response time for other parties who want to get onto the bus.
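The effect of the three techniques can be compared with a rough cycle count. The model below is a deliberate simplification and its assumptions are not from the lecture: every address takes one cycle, every bus-width chunk of data takes one cycle, and memory latency and arbitration are ignored. The function name bus_cycles is made up for this sketch.

```python
def bus_cycles(words, bus_width_words=1, separate_address_lines=False,
               block_transfer=False):
    """Count bus cycles needed to move `words` words under a simplified model."""
    # Number of data transfers, each carrying up to bus_width_words words.
    data_cycles = -(-words // bus_width_words)       # ceiling division
    # With block transfers only the first transfer needs an address.
    address_cycles = 1 if block_transfer else data_cycles
    if separate_address_lines:
        # The address travels on its own wires in the same cycle as the data.
        return data_cycles
    # Multiplexed lines: address and data take turns on the same wires.
    return address_cycles + data_cycles
```

Under this model, moving 8 words takes 16 cycles on a one-word multiplexed bus, 8 cycles with separate address lines, 9 cycles with block transfer, and 2 cycles when a four-word-wide bus is combined with separate lines and block transfer.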

Increasing Transaction Rate on Multimaster Bus
Overlapped arbitration: perform arbitration for next transaction during current transaction. Bus parking: master can hold onto bus and perform multiple transactions as long as no other master makes a request. Overlapped address / data phases: requires one of the above techniques. Split-phase (or packet switched) bus: completely separate address and data phases; arbitrate separately for each; address phase yields a tag which is matched with data phase. "All of the above" in most modern memory busses.

The I/O Bus Problem
Designed to support wide variety of devices; full set not known at design time. Allow data rate match between arbitrary speed devices: fast processor, slow I/O; slow processor, fast I/O.

Asynchronous Handshake
Write Transaction
[Timing diagram signals: Address (Master Asserts Address, Next Address), Data (Master Asserts Data), Read, Req, Ack; time marks t0 through t5]
t0: Master has obtained control and asserts address, direction, data. Waits a specified amount of time for slaves to decode target.
t1: Master asserts request line.
t2: Slave asserts ack, indicating data received.
t3: Master releases req.
t4: Slave releases ack.
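The write handshake steps t0 through t4 can be traced with a small simulation. This is a sketch with hypothetical names (write_handshake, the bus dictionary) that runs the two sides in sequence; real hardware reacts to signal edges with no shared program counter.

```python
def write_handshake(address, data):
    """Trace req/ack through one asynchronous write; return (trace, slave memory)."""
    bus = {"address": None, "data": None, "read": None, "req": 0, "ack": 0}
    slave_memory = {}
    trace = []

    def record(step):
        trace.append((step, bus["req"], bus["ack"]))

    # t0: master drives address, direction (write) and data.
    bus.update(address=address, data=data, read=False)
    record("t0")
    # t1: master asserts the request line.
    bus["req"] = 1
    record("t1")
    # t2: slave sees req, latches the data, and asserts ack.
    if bus["req"] and not bus["read"]:
        slave_memory[bus["address"]] = bus["data"]
        bus["ack"] = 1
    record("t2")
    # t3: master sees ack and releases req.
    if bus["ack"]:
        bus["req"] = 0
    record("t3")
    # t4: slave sees req released and releases ack.
    if not bus["req"]:
        bus["ack"] = 0
    record("t4")
    return trace, slave_memory
```

Each step waits for the other side's previous signal change, which is why no shared clock is needed and why devices of very different speeds can share the bus.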

Read Transaction
[Timing diagram signals: Address (Master Asserts Address, Next Address), Data, Read, Req, Ack; time marks t0 through t5]
t0: Master has obtained control and asserts address, direction, data. Waits a specified amount of time for slaves to decode target.
t1: Master asserts request line.
t2: Slave asserts ack, indicating ready to transmit data.
t3: Master releases req, data received.
t4: Slave releases ack.

Summary of Bus Options
Option          High performance                        Low cost
Bus width       Separate address & data lines           Multiplex address & data lines
Data width      Wider is faster (e.g., 32 bits)         Narrower is cheaper (e.g., 8 bits)
Transfer size   Multiple words has less bus overhead    Single-word transfer is simpler
Bus masters     Multiple (requires arbitration)         Single master (no arbitration)
Clocking        Synchronous                             Asynchronous
Protocol        Pipelined                               Serial

Communicating with I/O Devices
Two methods are used to address the device: special I/O instructions and memory-mapped I/O. Special I/O instructions specify both the device number and the command word. Device number: the processor communicates this via a set of wires normally included as part of the I/O bus. Command word: this is usually sent on the bus's data lines. Memory-mapped I/O: Portions of the address space are assigned to I/O devices. Reads and writes to those addresses are interpreted as commands to the I/O devices. User programs are prevented from issuing I/O operations directly: the I/O address space is protected by the address translation.
How does the OS give commands to the I/O device? There are two methods: special I/O instructions and memory-mapped I/O. If special I/O instructions are used, the OS will use the I/O instruction to specify both the device number and the command word. The processor then executes the special I/O instruction by passing the device number to the I/O device (in most cases) via a set of control lines on the bus and at the same time sends the command to the I/O device using the bus's data lines. Special I/O instructions are not used that widely. Most processors use memory-mapped I/O, where portions of the address space are assigned to the I/O device. Reads and writes to this special address space are interpreted by the memory controller as I/O commands, and the memory controller will do the right thing to communicate with the I/O device. Why is memory-mapped I/O so popular? Well, it is popular because we can use the same protection mechanism we already implemented for virtual memory to prevent the user from issuing commands to the I/O device directly.
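A minimal sketch of memory-mapped I/O follows. Everything in it is hypothetical: the ConsoleDevice with its two registers, the AddressSpace class, and the base address 0x1000 are invented for illustration, and the protection provided by address translation is not modeled.

```python
class ConsoleDevice:
    """Hypothetical output device with one data register and one status register."""
    DATA, STATUS = 0, 1

    def __init__(self):
        self.output = []

    def write_register(self, offset, value):
        if offset == self.DATA:
            self.output.append(chr(value))    # a store to DATA prints a character

    def read_register(self, offset):
        return 1 if offset == self.STATUS else 0   # this toy device is always ready


class AddressSpace:
    """Routes loads and stores either to RAM or to a mapped device."""

    def __init__(self, ram_size):
        self.ram = [0] * ram_size
        self.mappings = []                    # (base, size, device) tuples

    def map_device(self, base, size, device):
        self.mappings.append((base, size, device))

    def _find(self, address):
        for base, size, device in self.mappings:
            if base <= address < base + size:
                return device, address - base
        return None, address

    def store(self, address, value):
        device, offset = self._find(address)
        if device is None:
            self.ram[address] = value
        else:
            device.write_register(offset, value)

    def load(self, address):
        device, offset = self._find(address)
        if device is None:
            return self.ram[address]
        return device.read_register(offset)
```

After space.map_device(0x1000, 2, console), a call such as space.store(0x1000, ord("A")) reaches the device rather than RAM, while space.store(3, 42) is an ordinary memory write. The program uses the same load and store operations in both cases; only the address differs.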

I/O Device Notifying the OS
The OS needs to know when: the I/O device has completed an operation; the I/O operation has encountered an error. This can be accomplished in two different ways: Polling: The I/O device puts information in a status register. The OS periodically checks the status register. I/O Interrupt: Whenever an I/O device needs attention from the processor, it interrupts the processor from what it is currently doing.
After the OS has issued a command to the I/O device, either via a special I/O instruction or by writing to a location in the I/O address space, the OS needs to be notified when: (a) the I/O device has completed the operation, or (b) the I/O device has encountered an error. This can be accomplished in two different ways: polling and I/O interrupt.

Polling: Programmed I/O
[Figure labels: CPU, Memory, IOC, device. Flowchart: Is the data ready? (no: loop; yes: continue), read data, store data, done? (no: repeat; yes: exit)] Busy wait loop: not an efficient way to use the CPU unless the device is very fast! But checks for I/O completion can be dispersed among computation-intensive code. Advantage: Simple: the processor is totally in control and does all the work. Disadvantage: Polling overhead can consume a lot of CPU time.
In polling, the I/O device puts information in a status register and the OS periodically checks it (the busy loop) to see if the data is ready or if an error condition has occurred. If the data is ready, fine: read the data and move on. If not, we stay in this loop and try again at a later time. The advantage of this approach is simplicity: the processor is totally in control and does all the work. But the processor being in total control is also the problem. Needless to say, polling overhead can consume a lot of CPU time if the device is very fast. For this reason, most I/O devices notify the processor via I/O interrupt.
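The busy-wait loop can be sketched as follows. The SlowDevice class is a hypothetical stand-in for a device with a status register and a data register; it becomes ready after a fixed number of status reads.

```python
class SlowDevice:
    """Hypothetical device whose status register reads 0 until data is ready."""

    def __init__(self, polls_until_ready, value):
        self.polls_until_ready = polls_until_ready
        self.value = value

    def read_status(self):
        if self.polls_until_ready > 0:
            self.polls_until_ready -= 1
            return 0                  # not ready yet
        return 1                      # data is ready

    def read_data(self):
        return self.value


def poll_read(device):
    """Spin on the status register, then read; return (data, wasted polls)."""
    wasted_polls = 0
    while device.read_status() == 0:  # the busy-wait loop
        wasted_polls += 1             # CPU time spent doing nothing useful
    return device.read_data(), wasted_polls
```

Calling poll_read(SlowDevice(5, 99)) returns the data value 99 together with 5 wasted polls. The wasted count is the polling overhead the slide refers to.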

Interrupt Driven Data Transfer
[Figure labels: CPU, Memory, IOC, device; user program (add, sub, and, or, nop); (1) I/O interrupt; (2) save PC; (3) interrupt service addr; interrupt service routine (read, store, ..., rti); (4) memory] Advantage: User program progress is only halted during actual transfer. Disadvantage: special hardware is needed to: cause an interrupt (I/O device); detect an interrupt (processor); save the proper states to resume after the interrupt (processor).
That is, whenever an I/O device needs attention from the processor, it interrupts the processor from what it is currently doing. This is how an I/O interrupt looks in the overall scheme of things. The processor is minding its business when one of the I/O devices wants its attention and causes an I/O interrupt. The processor then saves the current PC, branches to the address where the interrupt service routine resides, and starts executing the interrupt service routine. When it finishes executing the interrupt service routine, it branches back to the point in the original program where it stopped and continues. The advantage of this approach is efficiency. The user program's progress is halted only during actual transfer. The disadvantage is that it requires special hardware in the I/O device to generate the interrupt. And on the processor side, we need special hardware to detect the interrupt and then to save the proper states so we can resume after the interrupt.
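The save-PC, service, and resume sequence can be traced with a toy interpreter. The function name run_with_interrupt and the way the interrupt is injected at a chosen PC are assumptions of this sketch; instructions are just strings and nothing is actually computed.

```python
def run_with_interrupt(user_program, service_routine, interrupt_at_pc):
    """Return the order in which instructions execute when one interrupt arrives.

    The interrupt is assumed to be pending when the PC reaches interrupt_at_pc,
    and it is taken at that instruction boundary.
    """
    executed = []
    pc = 0
    serviced = False
    while pc < len(user_program):
        if pc == interrupt_at_pc and not serviced:
            saved_pc = pc                              # save PC
            for instruction in service_routine:        # run the service routine
                executed.append("isr:" + instruction)
            pc = saved_pc                              # rti: restore the saved PC
            serviced = True
        executed.append(user_program[pc])
        pc += 1
    return executed
```

With the user program add, sub, and, or, nop and a service routine of read, store, an interrupt at PC 2 yields the order add, sub, isr:read, isr:store, and, or, nop. The user program loses time only while the service routine runs.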

I/O Interrupt
An I/O interrupt is just like the exceptions except: an I/O interrupt is asynchronous; further information needs to be conveyed. An I/O interrupt is asynchronous with respect to instruction execution: an I/O interrupt is not associated with any instruction; an I/O interrupt does not prevent any instruction from completion; you can pick your own convenient point to take an interrupt. An I/O interrupt is more complicated than an exception: it needs to convey the identity of the device generating the interrupt; interrupt requests can have different urgencies, so interrupt requests need to be prioritized.
How is an I/O interrupt different from the exceptions you already learned? Well, an I/O interrupt is asynchronous with respect to instruction execution, while exceptions such as overflow or page fault are always associated with a certain instruction. Also, for an exception, the only information that needs to be conveyed is the fact that an exceptional condition has occurred, but for an interrupt there is more information to be conveyed. Let me elaborate on each of these two points. Unlike an exception, which is always associated with an instruction, an interrupt is not associated with any instruction. The user program is just doing its thing when an I/O interrupt occurs. So an I/O interrupt does not prevent any instruction from completing, and you can pick your own convenient point to take the interrupt. As far as conveying more information is concerned, the interrupt detection hardware must somehow let the OS know who is causing the interrupt. Furthermore, interrupt requests need to be prioritized. The hardware that can do all this looks like this.

Delegating I/O Responsibility from the CPU: DMA
[Figure labels: CPU, Memory, DMAC, IOC, device] CPU sends a starting address, direction, and length count to DMAC. Then issues "start". Direct Memory Access (DMA): External to the CPU. Acts as a master on the bus. Transfers blocks of data to or from memory without CPU intervention. DMAC provides handshake signals for the Peripheral Controller, and memory addresses and handshake signals for memory.
Finally, let's see how we can delegate some of the I/O responsibilities from the CPU. The first option is Direct Memory Access, which takes advantage of the fact that I/O events often involve block transfers: you are not going to access the disk 1 byte at a time. The DMA controller is external to the processor and can act as a bus master to transfer blocks of data between memory and the I/O device without CPU intervention. This is how it works. The CPU sends the starting address, the direction, and the length of the transfer to the DMA controller and issues a start command. The DMA controller then takes over from there and provides the handshake signals required to complete the entire block transfer. So DMA controllers are pretty intelligent. If you add more intelligence to the DMA controller, you will end up with an I/O processor, or IOP for short.
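The division of labor between CPU and DMA controller can be sketched as below. The DmaController class and its configure and start methods are hypothetical names for this illustration; the transfer runs as an ordinary loop, whereas a real controller would do it in hardware while the CPU executes other code.

```python
class DmaController:
    """Toy DMA controller that copies a block between a device buffer and memory."""

    def __init__(self, memory, device_buffer):
        self.memory = memory
        self.device_buffer = device_buffer
        self.start_address = 0
        self.length = 0
        self.to_memory = True

    def configure(self, start_address, length, to_memory):
        # The CPU supplies only these three values: address, length, direction.
        self.start_address = start_address
        self.length = length
        self.to_memory = to_memory

    def start(self):
        """Perform the whole block transfer; return the number of words moved."""
        for i in range(self.length):
            if self.to_memory:
                self.memory[self.start_address + i] = self.device_buffer[i]
            else:
                self.device_buffer[i] = self.memory[self.start_address + i]
        return self.length
```

The CPU's involvement is one configure call and one start call, regardless of how many words the block contains. That is the saving compared with a programmed I/O loop that moves every word itself.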

Delegating I/O Responsibility from the CPU: IOP
[Figure labels: CPU, IOP, Mem, main memory bus, I/O bus, devices up to Dn] (1) CPU issues instruction to IOP: OP, Device, Address (the target device and where the commands are). (2) IOP looks in memory for commands: OP (what to do), Addr (where to put data), Cnt (how much), Other (special requests). (3) Device to/from memory transfers are controlled by the IOP directly. IOP steals memory cycles. (4) IOP interrupts CPU when done.
The IOP is so smart that the CPU only needs to issue a simple instruction (Op, Device, Address) that tells it what the target device is and where to find more commands (Address). The IOP will then fetch commands such as this (OP, Addr, Cnt, Other) from memory and do all the necessary data transfer between the I/O device and the memory system. The IOP will do the transfer in the background and it will not affect the CPU because it will access the memory only when the CPU is not using it: this is called stealing memory cycles. Only when the IOP finishes its operation will it interrupt the CPU.
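The four numbered steps can be sketched as a loop over command blocks held in memory. Several details are assumptions of this sketch rather than part of the lecture: commands sit at consecutive addresses, a "stop" command ends the list, the only operations are "read" and "write", and the function name run_iop is made up.

```python
def run_iop(memory, devices, device_id, command_address):
    """Execute (OP, Addr, Cnt, Other) commands from memory for one device.

    memory is a dict from address to either a data word or a command tuple.
    devices maps a device number to a list acting as that device's data queue.
    """
    device = devices[device_id]        # (1) the CPU named the target device
    pointer = command_address          #     and where the commands are
    while True:
        op, address, count, _other = memory[pointer]   # (2) fetch a command
        if op == "stop":               # assumed end-of-list marker
            break
        for i in range(count):         # (3) device to/from memory transfer
            if op == "read":
                memory[address + i] = device.pop(0)
            elif op == "write":
                device.append(memory[address + i])
        pointer += 1
    return "interrupt"                 # (4) tell the CPU the work is done
```

The CPU supplies only the device number and the address of the first command. Everything else, including how many transfers happen and where the data goes, is read from memory by the IOP.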

Responsibilities of the Operating System
The operating system acts as the interface between the I/O hardware and the program that requests I/O. Three characteristics of the I/O systems: The I/O system is shared by multiple programs using the processor. I/O systems often use interrupts (externally generated exceptions) to communicate information about I/O operations. Interrupts must be handled by the OS because they cause a transfer to supervisor mode. The low-level control of an I/O device is complex: managing a set of concurrent events; the requirements for correct device control are very detailed.
The OS acts as the interface between the I/O hardware and the program that requests I/O. The responsibilities of the operating system arise from 3 characteristics of the I/O systems: (a) First, the I/O system is shared by multiple programs using the processor. (b) I/O systems, as I will show you, often use interrupts to communicate information about I/O operations, and interrupts must be handled by the OS. (c) Finally, the low-level control of an I/O device is very complex, so we should leave it to those crazy kernel programmers to handle.

Operating System Requirements
Provides protection to shared I/O resources: guarantees that a user's program can only access the portions of an I/O device to which the user has rights. Provides abstraction for accessing devices: supplies routines that handle low-level device operation. Handles the interrupts generated by I/O devices. Provides equitable access to the shared I/O resources: all user programs must have equal access to the I/O resources. Schedules accesses in order to enhance system throughput.
Here is a list of the functions the OS must provide. First, it must guarantee that a user's program can only access the portion of an I/O device to which it has rights. Then the OS must hide low-level complexity from the user by supplying routines that handle low-level device operation. The OS also needs to handle the interrupts generated by I/O devices. And the OS must be fair: all user programs must have equal access to the I/O resources. Finally, the OS needs to schedule accesses in a way that enhances system throughput.

OS and I/O Systems Communication Requirements
The Operating System must be able to prevent the user program from communicating with the I/O device directly. If user programs could perform I/O directly, protection of the shared I/O resources could not be provided. Three types of communication are required: The OS must be able to give commands to the I/O devices. The I/O device must be able to notify the OS when the I/O device has completed an operation or has encountered an error. Data must be transferred between memory and an I/O device.
The OS must be able to communicate with the I/O system but, at the same time, the OS must be able to prevent the user from communicating with the I/O device directly. Why? Because if user programs could perform I/O directly, we would not be able to provide protection for the shared I/O device. Three types of communication are required: (1) First, the OS must be able to give commands to the I/O devices. (2) Secondly, the device must be able to notify the OS when the I/O device has completed an operation or has encountered an error. (3) Data must be transferred between memory and an I/O device.
