IO System CPU Performance: 60% per year

• CPU performance: improves about 60% per year.
• I/O system performance: limited by mechanical delays (disk I/O), improves less than 10% per year.
• Amdahl's Law: system speedup is limited by the slowest part!
• Suppose I/O accounts for 10% of the response time (the difference between CPU time and response time), and suppose we speed up the CPU by a factor of 10 while neglecting I/O: we get a speedup of only about 5 times (a loss of about 50% of the CPU's potential).
• Suppose we speed up the CPU by a factor of 100 while neglecting I/O: we get a speedup of a little under 10 times (losing about 90% of the CPU's potential).
• A detailed numerical example is given in class.
• I/O bottleneck: diminishing value of faster CPUs. The analogy is with a car: a very fast engine will get nowhere if the wheels move too slowly!
Notes: Ancestor of Java had no I/O. CPU vs. peripheral. Primary vs. secondary. What makes portables and PDAs exciting?
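The Amdahl's Law figures can be checked with a few lines of Python. This is a minimal sketch, assuming that 90% of the original response time is CPU time and 10% is I/O time that is not improved at all; the function name `overall_speedup` is only illustrative.

```python
def overall_speedup(cpu_fraction, cpu_speedup):
    """Amdahl's Law: only the CPU share of the response time gets faster."""
    io_fraction = 1.0 - cpu_fraction            # the part we neglect
    new_time = io_fraction + cpu_fraction / cpu_speedup
    return 1.0 / new_time                       # old time is normalized to 1


for factor in (10, 100):
    speedup = overall_speedup(cpu_fraction=0.9, cpu_speedup=factor)
    print(f"CPU {factor}x faster -> whole system {speedup:.2f}x faster")
```

Under these assumptions the system speedup can never exceed 10, no matter how fast the CPU becomes, because the untouched I/O time still takes a tenth of the original response time.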

Motivation: Who Cares About I/O?
Some people still maintain that I/O is not really important for overall performance. The argument is that I/O speed does not matter because the CPU can always switch to another process when the running process requests an I/O operation.
This argument is valid only in systems where throughput is the measure of performance! If response time is a critical measure of performance, the argument is no longer valid.
Response time is critical in personal computers and in workstations, since there is only one user (and often only one process) per CPU. The price of switching can also be very high in terms of storage and switch time.

I/O Systems
[Figure: a processor with its cache on a combined memory - I/O bus, together with main memory and I/O controllers for graphics, disks, and a network; the controllers send interrupts to the processor.]
There are several ways of interfacing I/O devices to the CPU: through the cache, through the memory bus, or through a separate I/O bus. The figure shows a low-cost option in which the memory bus is also the I/O bus.

I/O Interface
[Figure 1: CPU and memory on the memory bus; peripherals attached through interfaces on an independent I/O bus.]
• Independent I/O bus, connected through the cache
• Separate I/O instructions (in, out)
• Adv: less state-data problem
• Disa: slow
[Figure 2: CPU, memory, and peripheral interfaces on a common memory & I/O bus.]
• Lines distinguish between I/O and memory transfers
• Common memory & I/O bus: 40 Mbytes/sec optimistically
• A 10 MIPS processor completely saturates the bus!
• Buses: VME bus, Multibus-II, Nubus
See one more figure in class: bridge-based bus architecture.

Technology Trends
Disk capacity now doubles every 18 months; before
• Today: processing power doubles every 18 months
• Today: memory size doubles every 18 months (4X/3yr)
• Today: disk capacity doubles every 18 months
• Disk positioning rate (seek + rotate) doubles every ten years!
The I/O GAP
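To get a feel for how quickly the I/O gap opens, the doubling periods can be turned into growth factors. This is a small sketch, assuming smooth exponential growth at the quoted rates; the ten-year horizon and the function name `growth` are illustrative choices, not part of the slides.

```python
def growth(years, doubling_period_years):
    """Growth factor after `years` for a quantity that doubles every period."""
    return 2 ** (years / doubling_period_years)


years = 10
cpu = growth(years, 1.5)           # processing power: doubles every 18 months
positioning = growth(years, 10.0)  # disk seek + rotate: doubles every ten years

print(f"After {years} years: CPU x{cpu:.0f}, disk positioning x{positioning:.0f}")
print(f"Gap between them: x{cpu / positioning:.0f}")
```

The same function confirms that doubling every 18 months is the "4X/3yr" quoted for memory size: `growth(3, 1.5)` is 4.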

Storage Technology Drivers
Driven by the prevailing computing paradigm:
• 1950s: migration from batch to on-line processing
• 1990s: migration to ubiquitous computing (computers in phones, books, cars, video cameras, …; nationwide fiber optical network with wireless tails)
Effects on storage industry:
• Embedded storage: smaller, cheaper, more reliable, lower power
• Data utilities: high capacity, hierarchically managed storage

Disk Device Terminology
Purpose:
1. Long-term, non-volatile storage
2. Large, inexpensive, slow level in the memory hierarchy
A disk is a collection of platters rotating on a spindle at a certain RPM. Each platter is a metal disk covered with a magnetic recording material on both sides.
Reading and writing involve mechanical movement (seeking and rotating), to be explained next.

Devices: Magnetic Disks
Purpose:
• Long-term, nonvolatile storage
• Large, inexpensive, slow level in the storage hierarchy
Characteristics:
• Seek time (~8 ms avg): positional latency (track)
• Rotational latency (sector within track)
• Transfer rate: about a sector per ms (5-15 MB/s), in blocks
• Queuing delay: time waiting for the disk to become free
• Controller time
• Capacity: gigabytes; quadruples every 3 years
Terms (see figure): track; sector (the smallest unit that can be read or written); cylinder; platter; head.
Example numbers:
• 7200 RPM = 120 RPS => about 8 ms per revolution
• Average rotational latency = about 4 ms
• 128 sectors per track => about 0.065 ms per sector
• 1 KB per sector => about 16 MB/s
Response time = Queue + Controller + Seek + Rot + Xfer, where Controller + Seek + Rot + Xfer is the service time.
Let's see some numbers (page 490).
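The rotation figures on this slide follow directly from the spindle speed. The sketch below recomputes them without rounding, assuming 7200 RPM, 128 sectors per track, and 1 KB (taken here as 1000 bytes) per sector; the function name `rotation_numbers` is illustrative.

```python
def rotation_numbers(rpm, sectors_per_track, bytes_per_sector):
    """Derive per-revolution timing and peak transfer rate from disk geometry."""
    rev_ms = 60_000.0 / rpm                    # time for one full revolution
    avg_latency_ms = rev_ms / 2                # on average, wait half a turn
    sector_ms = rev_ms / sectors_per_track     # time for one sector to pass
    bytes_per_second = sectors_per_track * bytes_per_sector * (rpm / 60.0)
    return rev_ms, avg_latency_ms, sector_ms, bytes_per_second


rev, latency, sector, rate = rotation_numbers(7200, 128, 1000)
print(f"revolution: {rev:.2f} ms, average latency: {latency:.2f} ms")
print(f"per sector: {sector:.3f} ms, peak rate: {rate / 1e6:.1f} MB/s")
```

The unrounded revolution time is about 8.33 ms, so the peak rate comes out slightly below the 16 MB/s obtained when the revolution is rounded down to 8 ms.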

Disk Device Terminology
Disk Latency = Queuing Time + Controller Time + Seek Time + Rotation Time + Xfer Time
Order-of-magnitude times for 4 KB transfers (see some pictures of disk design):
• Seek: 8 ms or less
• Rotate: rpm
• Xfer: rpm

Disk Time Example
Disk parameters:
• Transfer size is 8 KB
• Advertised average seek is 12 ms
• Disk spins at 7200 RPM
• Transfer rate is 4 MB/sec
• Controller overhead is 2 ms
• Assume that the disk is idle, so there is no queuing delay
What is the average disk access time for a sector?
Average seek + average rotational delay + transfer time + controller overhead
= 12 ms + 0.5 / (7200 RPM / 60) + 8 KB / (4 MB/s) + 2 ms
= 12 ms + 4.2 ms + 2 ms + 2 ms = 20.2 ms, or about 20 ms.
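The same sum can be written as a small function, which makes it easy to try other disk parameters. This is a sketch under the slide's assumptions (idle disk, 1 MB taken as 1000 KB); the function and parameter names are illustrative.

```python
def disk_access_ms(seek_ms, rpm, transfer_kb, rate_mb_per_s,
                   controller_ms, queue_ms=0.0):
    """Average access time = queue + controller + seek + rotation + transfer."""
    rotation_ms = 0.5 * 60_000.0 / rpm            # half a revolution on average
    transfer_ms = transfer_kb / rate_mb_per_s     # 1 MB/s equals 1 KB per ms
    return queue_ms + controller_ms + seek_ms + rotation_ms + transfer_ms


total = disk_access_ms(seek_ms=12, rpm=7200, transfer_kb=8,
                       rate_mb_per_s=4, controller_ms=2)
print(f"average disk access time: {total:.1f} ms")
```

Seek time dominates: more than half of the total is spent moving the head, which is why the advertised seek figure matters so much in this kind of estimate.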

Relative Cost of Storage Technology—Late 1995/Early 1996
Magnetic Disks
• 5.25”: 9.1 GB, $2129, $0.23/MB; $1985, $0.22/MB
• 3.5”: 4.3 GB, $1199, $0.27/MB; $999, $0.23/MB
• 2.5”: 514 MB, $299, $0.58/MB; GB, $345, $0.33/MB
Optical Disks
• 5.25”: 4.6 GB, $ , $0.41/MB; $ , $0.39/MB
PCMCIA Cards
• Static RAM: 4.0 MB, $700, $175/MB
• Flash RAM: 40.0 MB, $1300, $32/MB; 175 MB, $3600, $20.50/MB

Processor Interface Issues
An interface answers the following questions for us:
1) How is a user I/O request transformed into a device command and communicated to the device?
2) How is data actually transferred to or from a memory location?
3) What is the role of the operating system in this?
The OS is important because the I/O system is shared by multiple programs using the CPU. This sharing needs to be implemented fairly. The CPU cannot do that by itself; it is busy executing programs.

Processor Interface Issues
• Interrupts
• Memory mapped I/O
• I/O control structures
• Polling
• DMA
• I/O controllers
• I/O processors
• Capacity, access time, bandwidth
• Interconnections
• Busses

A Need for an I/O Interface
One may wonder why we don’t connect peripherals directly to the system bus. Reasons for not doing that:
• There is a wide variety of peripherals with various methods of operation. It would be very impractical to incorporate the necessary logic within the processor to control each device.
• The data transfer rate of a peripheral is much slower than that of the memory or the processor, so it is impractical to use a high-speed system bus to communicate directly with a peripheral.
• Peripherals often use different data formats and word lengths than the computer to which they are attached.
The next question is how to connect the I/O interface, which may be attached to an I/O bus, to the CPU.

Example of an Interface
Parts of the interface:
• Interface to the system bus: data registers and control/status registers
• I/O logic: used for decoding commands from the processor such as read, write, and scan, and for address recognition, status reporting, etc.
• External device interface (data, status, control)
Functions of the interface:
• control and timing
• processor communication
• device communication
• data buffering
• error correction
Next: how does the CPU address an I/O device to send or receive data? See figures in class.

Memory Mapped I/O
[Figure: CPU on a single memory & I/O bus with ROM, RAM, and interfaces to peripherals; no separate I/O instructions.]
In this mode, there is a single address space for memory locations and I/O devices. Each I/O device has unique addresses for its data and status registers, which are treated just like any other memory location.
The bus contains data and address lines and some I/O command lines. The command line specifies whether the address refers to a memory location or an I/O device.
The alternative solution is an isolated I/O address space with I/O opcodes. In this case, I/O ports are accessible only through special I/O instructions.
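The idea of a single address space can be illustrated with a toy simulation. This is only a sketch: the classes `Ram`, `ConsoleDevice`, and `Bus`, the address ranges, and the register layout are all hypothetical and do not describe any real machine. The point is that the "CPU" uses the same load and store operations for memory and for device registers, and the bus decodes the address to pick the target.

```python
class Ram:
    """Plain memory: every offset is a storage cell."""
    def __init__(self, size):
        self.cells = [0] * size

    def load(self, offset):
        return self.cells[offset]

    def store(self, offset, value):
        self.cells[offset] = value


class ConsoleDevice:
    """Hypothetical device: status register at offset 0, data register at 1."""
    STATUS, DATA = 0, 1

    def __init__(self):
        self.written = []                 # values "sent" to the device

    def load(self, offset):
        return 1 if offset == self.STATUS else 0   # 1 means ready (always)

    def store(self, offset, value):
        if offset == self.DATA:
            self.written.append(value)


class Bus:
    """Decodes an address to whichever target is mapped there."""
    def __init__(self):
        self.regions = []                 # (base, size, target) tuples

    def attach(self, base, size, target):
        self.regions.append((base, size, target))

    def _decode(self, address):
        for base, size, target in self.regions:
            if base <= address < base + size:
                return target, address - base
        raise ValueError(f"unmapped address {address:#x}")

    def load(self, address):
        target, offset = self._decode(address)
        return target.load(offset)

    def store(self, address, value):
        target, offset = self._decode(address)
        target.store(offset, value)


bus = Bus()
ram = Ram(0x1000)
console = ConsoleDevice()
bus.attach(0x0000, 0x1000, ram)       # assumed memory range
bus.attach(0xFF00, 2, console)        # assumed device register addresses

bus.store(0x0010, 42)                 # ordinary memory write
if bus.load(0xFF00) == 1:             # status read looks like a memory read
    bus.store(0xFF01, ord("A"))       # data write looks like a memory write
```

With isolated I/O, the last two accesses would instead need dedicated instructions and a separate port address space.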

Benefits of Memory-Mapped I/O
• Data transfer to and from the processor is standardized.
• The number of connections to the processor chip or board is reduced.
• With the increasing number of address bits (32, 64, etc.), there is sufficient extra room to apportion some of the memory address space to I/O.

I/O Addressing
In both cases (memory-mapped and isolated I/O), each I/O device has registers for status (busy, ready, idle, etc.) and control information. The CPU sets flags to determine the operation the I/O device will perform, either through load/store instructions in memory-mapped I/O or through special I/O instructions in isolated I/O.
The next question is: how is this interaction done?

Programmed I/O (Polling)
See the diagram in class first and the example next.
[Figure: CPU, memory, I/O controller (IOC), and device.]
1. The CPU periodically checks status bits to see whether there is an I/O operation to perform.
2. The CPU sits in a busy-wait loop until the device is ready.
3. The CPU ends up doing all the work!
4. This is not an efficient way to use the CPU unless the device is very fast!
The problem with this method is that the processor has to wait a long time for the I/O module concerned to be ready for either reception or transmission of data. While waiting, the processor must repeatedly interrogate the module.

Polling
1. The CPU interrogates the I/O module to check the status of the attached device.
2. The I/O module returns the device status.
3. If the device is operational and ready to transmit, the CPU requests the transfer of data by means of a command to the I/O module.
4. The I/O module obtains a unit of data from the external device.
5. The data are transferred from the I/O module to the processor.
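The five steps can be sketched as a busy-wait loop against a simulated device. Everything here is hypothetical: `SimulatedDevice` stands in for an I/O module whose status turns ready only after a fixed number of status checks, and `poll_all` plays the role of the CPU. The sketch also counts the wasted status checks, which is the cost that makes polling inefficient.

```python
class SimulatedDevice:
    """Stand-in I/O module: ready only after a fixed number of status checks."""
    def __init__(self, data, polls_until_ready=3):
        self._data = list(data)
        self._delay = polls_until_ready
        self._countdown = polls_until_ready

    def has_more(self):
        return bool(self._data)

    def status_ready(self):
        # Steps 1 and 2: the CPU asks for status, the module answers.
        if not self._data:
            return False
        if self._countdown > 0:
            self._countdown -= 1
            return False
        return True

    def read_unit(self):
        # Steps 3 to 5: the CPU requests one unit and receives it.
        self._countdown = self._delay      # device is busy again afterwards
        return self._data.pop(0)


def poll_all(device):
    """Busy-wait on the device until every unit of data has been read."""
    received, wasted_polls = [], 0
    while device.has_more():
        while not device.status_ready():   # the busy-wait loop
            wasted_polls += 1
        received.append(device.read_unit())
    return received, wasted_polls


data, wasted = poll_all(SimulatedDevice([10, 20, 30], polls_until_ready=3))
print(f"received {data} after {wasted} unsuccessful status checks")
```

In this toy run the CPU makes three fruitless checks for every unit it actually receives; a slower device would make the ratio worse.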

Overhead of Polling
Three different devices: mouse, floppy disk, hard disk.
• Assume the polling operation (transferring to the polling routine, accessing the device, and restarting the user program) takes 400 clock cycles.
• The processor runs at 500 MHz.
• The mouse must be polled 30 times per second.
• The floppy disk transfers data to the processor in 16-bit units and has a data rate of 50 KB/sec. No data can be missed.
• The hard disk transfers data in 16-byte (four-word) chunks and can transfer at 4 MB/sec. Again, no data can be missed.
• The devices are always busy.

Overhead of Polling
Mouse:
• Clock cycles per second for polling: 30 × 400 = 12,000 cycles per second.
• Fraction of the processor clock cycles consumed: 12,000 / (500 × 10^6) = 0.002%.
• Polling is fine for the mouse on this computer. It does not degrade performance significantly.
Floppy disk:
• The rate at which we must poll is 50 KB/s divided by 2 bytes per polling access = 25 K polling accesses per second.
• Cycles per second for polling: 25 K × 400 = 10 × 10^6.
• Fraction of processor clock cycles consumed: (10 × 10^6) / (500 × 10^6) = 2%, which could be tolerable.

Overhead of Polling
Hard disk:
• The polling rate is 250 K times per second (why?): (4 MB per second) / (16 bytes per transfer) = 250 K (a quarter of a million).
• Cycles per second for polling: 250 K × 400 = 100 × 10^6.
• Fraction of the processor consumed: (100 × 10^6) / (500 × 10^6) = 20%.
One-fifth of the processor is used just for polling the disk. This is clearly not acceptable.
The alternative to polling is interrupt-driven I/O, next!
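The three polling results can be reproduced with one formula: polls per second times cycles per poll, divided by the clock rate. This is a sketch using the example's own numbers (400 cycles per poll, 500 MHz clock, KB and MB taken as powers of ten); the function name `polling_fraction` is illustrative.

```python
CLOCK_HZ = 500e6        # 500 MHz processor
POLL_CYCLES = 400       # cost of one polling operation in clock cycles


def polling_fraction(polls_per_second):
    """Fraction of all processor cycles spent polling one device."""
    return polls_per_second * POLL_CYCLES / CLOCK_HZ


mouse = polling_fraction(30)                   # polled 30 times per second
floppy = polling_fraction(50_000 / 2)          # 50 KB/s in 2-byte units
hard_disk = polling_fraction(4_000_000 / 16)   # 4 MB/s in 16-byte chunks

for name, fraction in (("mouse", mouse), ("floppy", floppy),
                       ("hard disk", hard_disk)):
    print(f"{name}: {fraction * 100:.4f}% of the processor")
```

The polling rate is set by the data rate divided by the transfer unit, which is why the hard disk costs ten times as much as the floppy even though each poll is equally expensive.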

Interrupt Driven Data Transfer
[Figure: the CPU runs a user program (add, sub, and, or, nop). Steps: (1) I/O interrupt from the IOC, (2) save PC, (3) interrupt service address, (4) interrupt service routine (read, store, ..., rti), which moves data between the device and memory.]
The user program's progress is halted only during the actual transfer.
To deal with different I/O devices, interrupt mechanisms have several levels of priority. These priorities indicate the order in which the processor should process the interrupts.
The interrupt algorithm is given in class.

Overhead of Interrupt-driven I/O
Suppose we have the same hard disk and processor as before. The overhead for each transfer, including the interrupt, is 500 clock cycles. Let’s find the fraction of the processor consumed if the hard disk is transferring data only 5% of the time.
The interrupt rate when the disk is busy is the same as the polling rate (250 K per second; see the previous example). Hence:
Cycles per second for disk = 250 K × 500 = 125 × 10^6 cycles per second.

Overhead of Interrupt-driven I/O
Fraction of the processor consumed during a transfer: (125 × 10^6) / (500 × 10^6) = 25%.
Assuming that the disk is transferring data only 5% of the time, the fraction of the processor consumed is 25% × 5% = 1.25%.
So the absence of overhead when the I/O device is not actually transferring is the major advantage of an interrupt-driven interface over polling. Interrupt-driven I/O relieves the CPU from having to wait for every I/O event.
However, with this method, whenever the disk is transferring it still costs 25% of the processor.
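The interrupt-driven figures follow the same pattern as the polling calculation, with one extra factor for how often the disk is actually busy. This is a sketch with the example's numbers (250 K transfers per second while busy, 500 cycles per interrupt, 500 MHz clock, disk busy 5% of the time); the function name is illustrative.

```python
def interrupt_overhead(transfers_per_second, cycles_per_interrupt,
                       clock_hz, busy_fraction):
    """Return (fraction while transferring, fraction averaged over time)."""
    while_busy = transfers_per_second * cycles_per_interrupt / clock_hz
    return while_busy, while_busy * busy_fraction


while_busy, average = interrupt_overhead(
    transfers_per_second=250_000,
    cycles_per_interrupt=500,
    clock_hz=500e6,
    busy_fraction=0.05,
)
print(f"while transferring: {while_busy:.0%}, on average: {average:.2%}")
```

Polling would pay its 20% all the time, whereas the interrupt-driven cost scales with `busy_fraction`; only at a busy fraction of 80% would the two average costs be equal under these numbers.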

Direct Memory Access Controllers
A solution to that is DMA: a mechanism for off-loading the processor by having the device controller transfer data directly to or from memory without involving the processor.
The interrupt mechanism is still used by the I/O device to communicate with the processor, but only on completion of an I/O transfer.
DMA is implemented with a specialized controller that transfers data between an I/O device and memory independently of the processor.

Direct Memory Access Controllers
Step 1: The CPU sets up the DMA by supplying the identity of the device, the operation to perform, the memory address, and the number of bytes to transfer.
Step 2: The DMA starts operation, arbitrates for the bus, and transfers the data.
Step 3: Once the DMA transfer is complete, the controller interrupts the processor.
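The three steps can be mimicked with a toy model. This is only a sketch: the class `DmaController`, its methods, and the use of Python lists for memory and for the device buffer are hypothetical, and bus arbitration is not modeled at all. What it does show is the division of labor: the CPU is involved at setup and at the completion interrupt, not for each byte.

```python
class DmaController:
    """Toy DMA controller; memory and device are modeled as Python lists."""
    def __init__(self, memory, on_complete):
        self.memory = memory
        self.on_complete = on_complete     # stands in for the CPU interrupt
        self.request = None

    def setup(self, device, operation, address, count):
        # Step 1: the CPU supplies device, operation, address, and byte count.
        self.request = (device, operation, address, count)

    def run(self):
        device, operation, address, count = self.request
        # Step 2: the controller moves the data itself (arbitration omitted).
        for i in range(count):
            if operation == "read":        # device -> memory
                self.memory[address + i] = device.pop(0)
            else:                          # "write": memory -> device
                device.append(self.memory[address + i])
        # Step 3: a single interrupt, only when the whole transfer is done.
        self.on_complete(count)


memory = [0] * 16
disk_buffer = [1, 2, 3, 4]                 # bytes waiting on the device
interrupts = []                            # record of completion interrupts

dma = DmaController(memory, on_complete=interrupts.append)
dma.setup(disk_buffer, "read", address=4, count=4)
dma.run()
print(memory[4:8], interrupts)
```

The `interrupts` list receives one entry for the whole four-byte transfer, in contrast to interrupt-driven I/O where each transfer unit would raise its own interrupt.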

Direct Memory Access
The CPU sends a starting address, direction, and length count to the DMAC, then issues "start".
[Figure: CPU, memory, DMAC, IOC, and device.]
The DMAC provides handshake signals for the peripheral controller, and memory addresses and handshake signals for memory.

Overhead of I/O using DMA
Suppose the same processor and hard disk as before.
• Assume that the initial setup of a DMA transfer takes 1000 clock cycles for the processor, and that handling the interrupt at DMA completion requires 500 clock cycles for the processor.
• The hard disk has a transfer rate of 4 MB/sec.
• The average transfer from disk is 8 KB.
• The disk is transferring 100% of the time.
What fraction of the 500 MHz processor is consumed?

Overhead of I/O using DMA
Each DMA transfer takes: 8 KB / (4 MB/sec) = 2 × 10^-3 sec.
If the disk is constantly transferring, it requires: (1000 + 500 cycles per transfer) / (2 × 10^-3 seconds per transfer) = 750,000 clock cycles per second.
The processor runs at 500 MHz, so the fraction of the processor consumed is 750,000 / (500 × 10^6) = 0.15%, or about 0.2%.
Of course, the disk is not always transferring, so this number will be even lower.
To further relieve the processor from I/O, the I/O controller could be made more intelligent. Such a controller is often called an I/O processor. This processor executes I/O programs that are already stored.
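The DMA overhead can be recomputed in a few lines. This is a sketch with the example's figures (1000 setup cycles, 500 interrupt cycles, 8 KB per transfer, 4 MB/sec, 500 MHz), taking KB and MB as powers of ten; the function name `dma_fraction` is illustrative.

```python
def dma_fraction(setup_cycles, interrupt_cycles, transfer_bytes,
                 bytes_per_second, clock_hz):
    """Fraction of the processor used when DMA transfers run back to back."""
    seconds_per_transfer = transfer_bytes / bytes_per_second
    cycles_per_second = (setup_cycles + interrupt_cycles) / seconds_per_transfer
    return cycles_per_second / clock_hz


fraction = dma_fraction(setup_cycles=1000, interrupt_cycles=500,
                        transfer_bytes=8_000, bytes_per_second=4_000_000,
                        clock_hz=500e6)
print(f"processor consumed by DMA: {fraction:.2%}")
```

Compared with 20% for polling and 25% for interrupt-driven I/O on the same disk, the processor cost drops by more than two orders of magnitude, because the per-transfer overhead is paid once per 8 KB instead of once per 16 bytes.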

Input/Output Processors
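[Figure: the CPU, the IOP, and memory (Mem) share the main memory bus; devices D1, D2, ..., Dn are attached to the IOP's I/O bus. The CPU issues an instruction to the IOP, and the IOP interrupts the CPU when done (steps (1) to (4) in the figure).]
Device to/from memory transfers are controlled by the IOP directly. The IOP steals memory cycles.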

Summary
• Disk industry growing rapidly; improves: bandwidth 40%/yr, areal density 60%/yr, $/MB faster?
• Disk latency = queue + controller + seek + rotate + transfer
• Advertised average seek time benchmark is much greater than average seek time in practice
• Response time vs. bandwidth tradeoffs
• Value of faster response time: 0.7 sec off response saves 4.9 sec and 2.0 sec (70%) total time per transaction => greater productivity; everyone gets more done with faster response, but novice with fast response = expert with slow
• Processor interface today: peripheral processors, DMA, I/O bus, interrupts
