Chapter 8: Part II Storage, Network and Other Peripherals.


1 Chapter 8: Part II Storage, Network and Other Peripherals

2 Performance Analysis: Sync. vs. Async.
- Synchronous bus: clock cycle time = 50 ns; each bus transaction takes one clock cycle
- Asynchronous bus: 40 ns per handshake
- Data portion = 32 bits
- Question: Find the bandwidth of each bus when performing one-word reads from a 200 ns memory.

3 Sync. vs. Async. Buses (I)
For the synchronous bus:
1. Send the address to memory: 50 ns
2. Read the memory: 200 ns
3. Send the data to the device: 50 ns
Total time = 300 ns; bandwidth = 4 bytes / 300 ns = 13.3 MB/s

4 Sync. vs. Async. Buses (II)
For the asynchronous bus:
1. Step 1: 40 ns
2. Steps 2, 3, 4: max(3 × 40 ns, 200 ns) = 200 ns
3. Steps 5, 6, 7: 3 × 40 ns = 120 ns
Total time = 360 ns; maximum bandwidth = 4 bytes / 360 ns = 11.1 MB/s
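The two results above can be reproduced with a short calculation. The sketch below (C, not part of the original slides) encodes the three synchronous steps and the seven-step asynchronous handshake grouping used above:

```c
#include <stdio.h>

int main(void) {
    double word_bytes = 4.0;            /* 32-bit data portion        */
    double mem_ns = 200.0;              /* memory access time         */

    /* Synchronous bus: 50 ns cycle; address, memory read, data return */
    double sync_ns = 50.0 + mem_ns + 50.0;

    /* Asynchronous bus: 40 ns per handshake; steps 2-4 overlap the
       memory access, steps 5-7 complete the transfer */
    double hs_ns = 40.0;
    double async_ns = hs_ns
                    + (3 * hs_ns > mem_ns ? 3 * hs_ns : mem_ns)
                    + 3 * hs_ns;

    /* bytes per ns x 1000 = MB/s */
    printf("sync:  %.0f ns, %.1f MB/s\n", sync_ns, word_bytes * 1000.0 / sync_ns);
    printf("async: %.0f ns, %.1f MB/s\n", async_ns, word_bytes * 1000.0 / async_ns);
    return 0;
}
```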

5 Increasing Bus Bandwidth
- Data bus width
- Separate versus multiplexed address and data lines
- Block transfers

6 Performance Analysis of Two Bus Schemes
Given a system with a memory and bus supporting block accesses of 4 to 16 words:
- a 64-bit synchronous bus clocked at 200 MHz, with each 64-bit transfer taking 1 clock cycle, and 1 clock cycle to send an address to memory
- two clock cycles needed between each bus operation
- memory access for the first 4 words takes 200 ns; each additional set of 4 words requires 20 ns

7 Question
- Find the sustained bandwidth and latency for a read of 256 words, for transfers using 4-word blocks and 16-word blocks.
- Find the effective number of bus transactions per second for each case.

8 4-Word Block Transfer
- 1 clock cycle to send the address to memory
- 200 ns / (5 ns/cycle) = 40 cycles to read memory
- 2 cycles to send the data from memory
- 2 idle cycles between transfers
- Total = 45 cycles per transaction
- 256 words require 256/4 = 64 transactions: 45 × 64 = 2880 cycles

9 4-Word Block Transfer
- Latency = 2880 cycles × 5 ns/cycle = 14,400 ns
- Number of bus transactions per second = 64 × 1 s / 14,400 ns = 4.44M transactions/s
- Bandwidth = (256 × 4 bytes) / 14,400 ns = 71.11 MB/s

10 16-Word Block Transfer
- 1 clock cycle to send the address to memory
- 40 cycles to read the first 4 words from memory
- 2 cycles to send the data, during which the read of the next 4 words is started
- 2 idle cycles between transfers, during which the read of the next block is completed
- The last two steps are repeated three more times, for 4 groups of 4 words (16 words in total)

11 16-Word Block Transfer
- Total cycles per transaction = 1 + 40 + 4 × (2 + 2) = 57 cycles
- 256/16 = 16 transactions are required
- Total cycles for 256 words = 16 × 57 = 912 cycles; latency = 912 × 5 ns = 4560 ns
- Number of bus transactions per second = 16 × 1 s / 4560 ns = 3.51M transactions/s
- Bandwidth = (256 × 4 bytes) / 4560 ns = 224.56 MB/s
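Both block sizes follow the same cycle-count pattern, so the numbers can be checked with one parameterized calculation. The sketch below (not from the slides) assumes the cycle counts given above: 1 address cycle, 40 cycles for the first 4-word read, and 2 send plus 2 idle cycles per 4-word group, with later reads overlapped.

```c
#include <stdio.h>

/* Compute latency and bandwidth for reading 256 words in fixed-size
   blocks, using the cycle counts from the slides (5 ns clock, 64-bit bus). */
static void block_read(int block_words) {
    const double cycle_ns = 5.0;                 /* 200 MHz clock            */
    int groups = block_words / 4;                /* 4-word (two-cycle) groups */
    /* 1 address cycle + 40 cycles for the first read + (2 send + 2 idle)
       per group; reads of later groups overlap the send/idle cycles. */
    int cycles_per_txn = 1 + 40 + groups * (2 + 2);
    int txns = 256 / block_words;
    double latency_ns = txns * cycles_per_txn * cycle_ns;

    printf("%2d-word blocks: %4d cycles, latency %.0f ns, "
           "%.2fM txns/s, %.2f MB/s\n",
           block_words, txns * cycles_per_txn, latency_ns,
           txns * 1000.0 / latency_ns,           /* millions of txns/s */
           256 * 4 * 1000.0 / latency_ns);       /* MB/s               */
}

int main(void) {
    block_read(4);    /* 45 cycles/txn, 2880 cycles total  */
    block_read(16);   /* 57 cycles/txn,  912 cycles total  */
    return 0;
}
```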

12 Bus Arbitration
- Daisy-chain arbitration (not very fair)
- Centralized arbitration (requires an arbiter), e.g., PCI
- Self-selection, e.g., NuBus used in the Macintosh
- Collision detection, e.g., Ethernet

13 Bus Standards
- PCI (a general-purpose backplane bus)
- SCSI (Small Computer System Interface)
- IEEE 1394 (FireWire)
- USB 2.0

Characteristic      FireWire (1394)              USB 2.0
Bus width           4                            2
Clocking            asynchronous                 asynchronous
Peak bandwidth      50 MB/s (FireWire 400),      0.2 MB/s, 1.5 MB/s, 60 MB/s
                    100 MB/s (FireWire 800)
Hot pluggable       Yes                          Yes
Max # of devices    63                           127
Max bus length      4.5 m                        5 m

14 Interfacing I/O Devices
- How is a user I/O request transformed into a device command and communicated to the device?
- How is data actually transferred to or from a memory location?
- What is the role of the operating system?

15 Role of the OS
The OS plays a major role in handling I/O because:
- the I/O system is shared by multiple programs using the processor
- I/O systems often use interrupts (which cause a transfer to supervisor mode)
- low-level control of I/O devices is complex

16 Communication between the OS and I/O Devices
- The OS must be able to give commands to I/O devices.
- An I/O device must be able to notify the OS when an operation has completed or an error has occurred.
- Data must be transferred between memory and an I/O device.

17 Giving Commands to I/O
To give a command, the processor must be able to address the device and to supply command words:
- memory-mapped I/O: portions of the address space are assigned to I/O devices
- special I/O instructions: dedicated I/O instructions in the processor
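As an illustration of the memory-mapped alternative, a driver typically reaches a device's registers through volatile pointers into the reserved address range. The sketch below is not from the slides; the base address, register offsets, and bit masks are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical device register block mapped into the address space;
   the base address, offsets, and bits are invented for this example. */
#define DEV_BASE   ((uintptr_t)0x40001000u)
#define DEV_STATUS (*(volatile uint32_t *)(DEV_BASE + 0x0))
#define DEV_CMD    (*(volatile uint32_t *)(DEV_BASE + 0x4))
#define DEV_DATA   (*(volatile uint32_t *)(DEV_BASE + 0x8))

#define STATUS_READY 0x1u
#define CMD_READ     0x2u

/* Ordinary loads and stores to these addresses become device accesses,
   so no special I/O instructions are needed. */
uint32_t dev_read_word(void) {
    DEV_CMD = CMD_READ;                   /* write the command register */
    while (!(DEV_STATUS & STATUS_READY))  /* poll the status register   */
        ;
    return DEV_DATA;                      /* read the data register     */
}
```

With special I/O instructions, the same accesses would instead use dedicated instructions and a separate I/O address space.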

18 Communicating with the Processor
- Polling
- Interrupts
- DMA

19 Polling
- Polling: the processor periodically checks the status of an I/O device.
- Overhead of polling in an I/O system:
  - Example 1: mouse
  - Example 2: floppy disk
  - Example 3: hard disk

20 Mouse
- Assume the number of clock cycles for a polling operation, including transferring to the polling routine, accessing the device, and restarting the user program, is 400, with a 500 MHz clock.
- The mouse must be polled 30 times a second to ensure that no user movement is missed.
- Fraction of CPU time = 30 × 400 / (500 × 10^6) ≈ 0.002%

21 Floppy Disk
- The floppy disk transfers data to the processor in 16-bit units and has a data rate of 50 KB/s.
- Polling rate = (50 KB/s) / (2 bytes/poll) = 25K polls/sec
- Fraction of CPU time = 25K × 400 / (500 × 10^6) = 2%

22 Hard Disk
- Transfers data in 4-word blocks
- Transfer rate: 4 MB/s
- Polling rate = (4 MB/s) / (4 × 4 bytes/poll) = 250K polls/sec
- Fraction of CPU time = 250K × 400 / (500 × 10^6) = 20%
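All three devices use the same formula (polls per second × cycles per poll ÷ clock rate); the sketch below (not from the slides) reproduces the three fractions:

```c
#include <stdio.h>

int main(void) {
    const double clock_hz = 500e6;      /* 500 MHz processor            */
    const double poll_cycles = 400.0;   /* cycles per polling operation */

    /* polls per second for each device */
    double mouse  = 30.0;               /* fixed polling rate           */
    double floppy = 50e3 / 2.0;         /* 50 KB/s in 2-byte units      */
    double disk   = 4e6  / 16.0;        /* 4 MB/s in 16-byte blocks     */

    printf("mouse:  %.4f%%\n", 100.0 * mouse  * poll_cycles / clock_hz);
    printf("floppy: %.4f%%\n", 100.0 * floppy * poll_cycles / clock_hz);
    printf("disk:   %.4f%%\n", 100.0 * disk   * poll_cycles / clock_hz);
    return 0;
}
```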

23 Overhead of Polling
- Polling can be done only when the device is active, which reduces the overhead.
- However, the overhead is still significant, motivating another design: interrupt-driven I/O.

24 Overhead of Interrupt-Driven I/O
- Assume the overhead for each transfer, including the interrupt, is 500 cycles.
- Cycles per second for the disk = 250K × 500 = 125 × 10^6 cycles
- Fraction of the processor consumed = 125 × 10^6 / (500 × 10^6) = 25%
- Assuming the disk is transferring data 5% of the time, the fraction of CPU time consumed on average = 25% × 5% = 1.25%
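A quick check of these two numbers (a sketch, not part of the slides):

```c
#include <stdio.h>

int main(void) {
    const double clock_hz = 500e6;   /* 500 MHz processor             */
    const double xfer_rate = 250e3;  /* 16-byte transfers/s at 4 MB/s */
    const double cycles_per_xfer = 500.0;

    double busy_fraction = xfer_rate * cycles_per_xfer / clock_hz;
    printf("disk always busy:         %.1f%%\n", 100.0 * busy_fraction);
    printf("disk busy 5%% of the time: %.2f%%\n", 100.0 * busy_fraction * 0.05);
    return 0;
}
```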

25 Direct Memory Access (DMA)
- If the disk is transferring data most of the time, the overhead of interrupt-driven I/O is still high.
- For a high-bandwidth device, let the device controller transfer data directly to or from memory without involving the processor; this is known as direct memory access.
- An interrupt is used to signal the completion of the I/O transfer or an error.
- Note: how does DMA affect cache design?

26 Overhead of I/O Using DMA
- Assume the initial setup of a DMA transfer takes 1000 cycles, handling the interrupt at DMA completion takes 500 cycles, and the average transfer from disk is 8 KB.
- Each DMA transfer takes 8 KB / (4 MB/s) = 2 × 10^-3 s
- If the disk is constantly transferring data, it requires (1000 + 500) / (2 × 10^-3 s) = 750 × 10^3 cycles/s
- Fraction of CPU time = 750 × 10^3 / (500 × 10^6) = 0.15%
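And the corresponding check for the DMA case (again a sketch, not from the slides):

```c
#include <stdio.h>

int main(void) {
    const double clock_hz = 500e6;       /* 500 MHz processor     */
    const double setup_cycles = 1000.0;  /* DMA setup             */
    const double intr_cycles  = 500.0;   /* completion interrupt  */
    const double xfer_bytes   = 8.0e3;   /* average 8 KB per DMA  */
    const double disk_rate    = 4.0e6;   /* 4 MB/s                */

    double xfer_time = xfer_bytes / disk_rate;                  /* 2 ms */
    double cycles_per_sec = (setup_cycles + intr_cycles) / xfer_time;
    printf("DMA overhead: %.2f%% of the CPU\n",
           100.0 * cycles_per_sec / clock_hz);
    return 0;
}
```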

27 I/O System Design
- Latency constraints: ensuring the latency to complete an I/O operation is bounded
- Bandwidth constraints
- Performance analysis techniques:
  - queuing theory
  - simulation
  - analysis

28 I/O System Design: Example
- CPU: 3 BIPS; on average 100,000 instructions in the OS per I/O operation
- Backplane bus transfer rate: 1000 MB/s
- SCSI Ultra320 controller with a transfer rate of 320 MB/s, accommodating up to 7 disks
- Disk bandwidth = 75 MB/s; seek + rotational latency = 6 ms
- Workload: 64-KB reads; the user program needs 200,000 instructions per I/O

29 Example
Find the maximum sustainable I/O rate and the number of disks and SCSI controllers required.
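One way to work the example (a sketch, not from the slides; it finds the most constrained resource and ignores controller and bus overheads):

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    /* CPU limit: instructions available per second / instructions per I/O */
    double cpu_ips       = 3e9;                    /* 3 BIPS               */
    double instr_per_io  = 100e3 + 200e3;          /* OS + user program    */
    double cpu_io_rate   = cpu_ips / instr_per_io; /* I/Os per second      */

    /* Per-disk limit: seek + rotation plus transfer time of a 64-KB read */
    double io_bytes      = 64.0 * 1024.0;
    double disk_rate     = 75e6;                   /* 75 MB/s              */
    double disk_io_time  = 6e-3 + io_bytes / disk_rate;
    double disk_io_rate  = 1.0 / disk_io_time;

    double disks = ceil(cpu_io_rate / disk_io_rate);
    double ctrls = ceil(disks / 7.0);              /* 7 disks per controller */

    printf("CPU-limited I/O rate: %.0f I/Os per second\n", cpu_io_rate);
    printf("per-disk I/O rate:    %.1f I/Os per second\n", disk_io_rate);
    printf("disks needed: %.0f, SCSI controllers: %.0f\n", disks, ctrls);
    return 0;
}
```

Under these assumptions the CPU is the bottleneck at about 10,000 I/Os per second, which needs roughly 69 disks and 10 controllers; at that rate neither the 320 MB/s controllers (about 65 MB/s each) nor the 1000 MB/s backplane bus is close to saturation.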

30 Real Stuff: Buses and Networks of the P4

31 Intel P4 I/O Chip Sets

32 A Digital Camera

33 SoC (System on a chip)

