
331 Week 12.1Spring 2005 14:332:331 Computer Architecture and Assembly Language Spring 2005 Week 12 Buses and I/O system [Adapted from Dave Patterson’s.


1 331 Week 12.1, Spring 2005
14:332:331 Computer Architecture and Assembly Language, Spring 2005
Week 12: Buses and I/O System
[Adapted from Dave Patterson’s UCB CS152 slides and Mary Jane Irwin’s PSU CSE331 slides]

2 Heads Up
- This week’s material
  - Buses: connecting I/O devices
    - Reading assignment: PH 8.4
  - Memory hierarchies
    - Reading assignment: PH 7.1 and B.8-9
- Reminders
  - Next week’s material
  - Basics of caches
    - Reading assignment: PH 7.2

3 Review: Major Components of a Computer
[Figure labels: Processor (Control, Datapath); Memory (Cache, Main Memory, Secondary Memory (Disk)); Devices (Input, Output)]

4 Input and Output Devices
- I/O devices are incredibly diverse with respect to
  - Behavior
  - Partner
  - Data rate

Device           | Behavior        | Partner | Data rate (KB/sec)
Keyboard         | input           | human   | 0.01
Mouse            | input           | human   | 0.02
Laser printer    | output          | human   | 200.00
Graphics display | output          | human   | 60,000.00
Network/LAN      | input or output | machine | 500.00-6000.00
Floppy disk      | storage         | machine | 100.00
Magnetic disk    | storage         | machine | 2000.00-10,000.00

5 Magnetic Disk
- Purpose
  - Long term, nonvolatile storage
  - Lowest level in the memory hierarchy: slow, large, inexpensive
- General structure
  - A rotating platter coated with a magnetic surface
  - A moveable read/write head accesses the disk
- Advantages of hard disks over floppy disks
  - Platters are more rigid (metal or glass), so they can be larger
  - Higher density, because the disk can be controlled more precisely
  - Higher data rate, because it spins faster
  - Can incorporate more than one platter

6 Organization of a Magnetic Disk
- Typical numbers (depending on the disk size)
  - 1 to 15 platters per disk, each with 2 surfaces, with 1” to 8” diameter
  - 1,000 to 5,000 tracks per surface
  - 63 to 256 sectors per track
    - A sector is the smallest unit that can be read or written (typically 512 to 1,024 B)
  - Traditionally all tracks have the same number of sectors
    - Newer disks with smart controllers can record more sectors on the outer tracks (constant bit density)
[Figure labels: Platters, Track, Sector]
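As a rough illustration of how the geometry figures on this slide combine, the sketch below multiplies them out. It assumes every track holds the same number of sectors (the traditional layout, not the constant-bit-density one); the function name and the sample values are illustrative and do not describe a real drive.

```python
def disk_capacity_bytes(platters, surfaces_per_platter, tracks_per_surface,
                        sectors_per_track, bytes_per_sector):
    """Capacity of a disk whose tracks all hold the same number of sectors."""
    surfaces = platters * surfaces_per_platter
    sectors = surfaces * tracks_per_surface * sectors_per_track
    return sectors * bytes_per_sector


if __name__ == "__main__":
    # Hypothetical disk built from values inside the slide's typical ranges.
    capacity = disk_capacity_bytes(platters=4, surfaces_per_platter=2,
                                   tracks_per_surface=5000,
                                   sectors_per_track=256,
                                   bytes_per_sector=512)
    print(f"{capacity} bytes, about {capacity / 1e9:.2f} GB")
```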

7 Magnetic Disk Characteristics
- Cylinder: all the tracks under the heads at a given point on all surfaces
- Reading or writing data is a three-stage process, plus controller overhead:
  - Seek time: position the arm over the proper track (6 to 14 ms avg.)
    - Due to locality of disk references, the actual average seek time may be only 25% to 33% of the advertised number
  - Rotational latency: wait for the desired sector to rotate under the read/write head (on average half a rotation, i.e. 0.5 / RPM minutes)
  - Transfer time: transfer a block of bits (sector) under the read/write head (2 to 20 MB/sec typical)
  - Controller time: the overhead the disk controller imposes in performing a disk I/O access (typically < 2 ms)
[Figure labels: Sector, Track, Cylinder, Head, Platter]
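The components listed on this slide can be added up to estimate the time for one disk access. The following is a minimal sketch under stated assumptions: 1 MB is taken as 10^6 bytes, the average rotational latency is half a rotation, and the function names and the sample drive parameters are hypothetical.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: half a rotation, in milliseconds."""
    ms_per_rotation = 60_000.0 / rpm
    return 0.5 * ms_per_rotation


def transfer_time_ms(num_bytes, rate_mb_per_s):
    """Time to move num_bytes at the given rate (assuming 1 MB = 10**6 bytes)."""
    return num_bytes / (rate_mb_per_s * 1_000_000) * 1000.0


def disk_access_time_ms(seek_ms, rpm, num_bytes, rate_mb_per_s, controller_ms):
    """Seek + rotational latency + transfer + controller overhead."""
    return (seek_ms
            + rotational_latency_ms(rpm)
            + transfer_time_ms(num_bytes, rate_mb_per_s)
            + controller_ms)


if __name__ == "__main__":
    # Hypothetical drive: 10 ms seek, 7,200 RPM, one 512 B sector at 4 MB/sec,
    # 2 ms controller overhead.
    print(round(disk_access_time_ms(10.0, 7200, 512, 4.0, 2.0), 3), "ms")
```

For a single sector the seek and rotational terms dominate, which is why the transfer rate alone says little about small random accesses.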

8 Magnetic Disk Examples

Characteristic                 | Sun X6713A | Toshiba MK2016
Disk diameter (inches)         | 3.5        | 2.5
Capacity                       | 73 GB      | 20 GB
MTTF (k hours)                 | 1,200      | 300
# of platters - heads          |            | 2 - 4
# cylinders                    |            | 16,383
# B/sector - # sectors/track   |            | 512 - 63
Rotation speed (RPM)           | 10,000     | 4,200
Max. - avg. seek time (ms)     | ? - 6.6    | 24 - 13
Avg. rotational latency (ms)   | 3          | 7.14
Transfer rate (PIO)            | 35 MB/sec  | 16.6 MB/sec
Power (watts)                  |            | < 2.5
Volume (in^3)                  |            | 4.01
Weight (oz)                    |            | 3.49

9 I/O System Interconnect Issues
- A bus is a shared communication link (a set of wires used to connect multiple subsystems)
  - Performance
  - Expandability
  - Resilience in the face of failure (fault tolerance)
[Figure labels: Processor, Receiver, Main Memory, Keyboard, bus]

10 Performance Measures
- Latency (execution time, response time) is the total time from the start to the finish of one instruction or action
  - Usually used to measure processor performance
- Throughput: the total amount of work done in a given amount of time
  - Also known as execution bandwidth
  - The number of operations performed per second
- Bandwidth: the amount of information communicated across an interconnect (e.g., a bus) per unit time
  - The bit width of the operation * the rate of the operation
  - Usually used to measure I/O performance
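The bandwidth definition on this slide (bit width times operation rate) is simple arithmetic; the sketch below spells it out. The function name and the sample bus (32 bits wide, 33 million transfers per second) are assumptions chosen only for illustration.

```python
def bandwidth_bytes_per_s(width_bits, transfers_per_s):
    """Bandwidth = bit width of one operation * rate of the operation."""
    return width_bits / 8 * transfers_per_s


if __name__ == "__main__":
    # Hypothetical bus: 32 data bits, 33 million transfers per second.
    bw = bandwidth_bytes_per_s(32, 33_000_000)
    print(f"{bw / 1e6:.0f} MB/sec")
```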

11 I/O System Expandability
- Usually there is more than one I/O device in the system
  - Each I/O device is controlled by an I/O controller
[Figure labels: Processor, Cache, Memory-I/O Bus, Main Memory, I/O Controllers, Disk, Terminal, Network, interrupt signals]

12 Quiz
- What is disk seek time, and what is rotational time?

13 Bus Characteristics
- Control lines
  - Signal requests and acknowledgments
  - Indicate what type of information is on the data lines
- Data lines
  - Data, complex commands, and addresses
- A bus transaction consists of
  - Sending the address
  - Receiving (or sending) the data
[Figure labels: Data Lines, Control Lines]

14 Output (Read) Bus Transaction
- Bus transactions are defined by what they do to memory
  - read = output: transfers data from memory (read) to an I/O device (write)
- Step 1: Processor sends read request and read address to memory
- Step 2: Memory accesses data
- Step 3: Memory transfers data to disk
[Figure, repeated for each step: Processor, Main Memory, Control lines, Data lines]

15 Input (Write) Bus Transaction
- Bus transactions are defined by what they do to memory
  - write = input: transfers data from an I/O device (read) to memory (write)
- Step 1: Processor sends write request and write address to memory
- Step 2: Disk transfers data to memory
[Figure, repeated for each step: Processor, Main Memory, Control lines, Data lines]

16 Advantages and Disadvantages of Buses
- Advantages
  - Versatility:
    - New devices can be added easily
    - Peripherals can be moved between computer systems that use the same bus standard
  - Low cost:
    - A single set of wires is shared in multiple ways
- Disadvantages
  - A bus creates a communication bottleneck
    - The bus bandwidth limits the maximum I/O throughput
  - The maximum bus speed is largely limited by
    - The length of the bus
    - The number of devices on the bus
  - A bus needs to support a range of devices with widely varying latencies and data transfer rates

17 Types of Buses
- Processor-Memory Bus (proprietary)
  - Short and high speed
  - Matched to the memory system to maximize the memory-processor bandwidth
  - Optimized for cache block transfers
- I/O Bus (industry standard, e.g., SCSI, USB, ISA, IDE)
  - Usually lengthy and slower
  - Needs to accommodate a wide range of I/O devices
  - Connects to the processor-memory bus or backplane bus
- Backplane Bus (industry standard, e.g., PCI)
  - The backplane is an interconnection structure within the chassis
  - Used as an intermediary bus connecting I/O buses to the processor-memory bus

18 A Two Bus System
- I/O buses tap into the processor-memory bus via bus adaptors (which do speed matching between buses)
  - Processor-memory bus: mainly for processor-memory traffic
  - I/O buses: provide expansion slots for I/O devices
[Figure labels: Processor, Memory, Processor-Memory Bus, three Bus Adaptors, each with its own I/O Bus]

19 A Three Bus System
- A small number of backplane buses tap into the processor-memory bus
  - The processor-memory bus is used for processor-memory traffic
  - I/O buses are connected to the backplane bus
- Advantage: loading on the processor-memory bus is greatly reduced
[Figure labels: Processor, Memory, Processor-Memory Bus, Bus Adaptors, Backplane Bus, I/O Bus]

20 I/O System Example (Apple Mac 7200)
- Typical of a midrange to high-end desktop system in 1997
[Figure labels: Processor, Cache, Processor-Memory Bus, PCI Interface/Memory Controller, Main Memory, PCI, I/O Controllers, Graphic Terminal, Network, SCSI bus (Disk, CDRom, Tape), Serial ports, Audio I/O]

21 Example: Pentium System Organization
[Figure labels: Processor-Memory Bus, Memory controller (“Northbridge”), PCI Bus, I/O Buses]
http://developer.intel.com/design/chipsets/850/animate.htm?iid=PCG+devside&

22 Synchronous and Asynchronous Buses
- Synchronous bus
  - Includes a clock in the control lines
  - Uses a fixed communication protocol that is relative to the clock
  - Advantage: involves very little logic and can run very fast
  - Disadvantages:
    - Every device on the bus must run at the same clock rate
    - To avoid clock skew, synchronous buses cannot be long if they are fast
- Asynchronous bus
  - It is not clocked, so it requires a handshaking protocol (req, ack)
    - Implemented with additional control lines
  - Advantages:
    - Can accommodate a wide range of devices
    - Can be lengthened without worrying about clock skew or synchronization problems
  - Disadvantage: slow(er)

23 Asynchronous Handshaking Protocol
- Output (read) data from memory to an I/O device. The I/O device signals a request by raising ReadReq and putting the addr on the data lines.
  1. Memory sees ReadReq, reads addr from data lines, and raises Ack
  2. I/O device sees Ack and releases the ReadReq and data lines
  3. Memory sees ReadReq go low and drops Ack
  4. When memory has data ready, it places it on data lines and raises DataRdy
  5. I/O device sees DataRdy, reads the data from data lines, and raises Ack
  6. Memory sees Ack, releases the data lines, and drops DataRdy
  7. I/O device sees DataRdy go low and drops Ack
[Timing diagram: ReadReq, Data (addr, then data), Ack, and DataRdy, annotated with steps 1 to 7]
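The seven steps can be traced in software by treating the bus lines as shared variables. This is only a sequential sketch, not a hardware model: the two parties take turns instead of running concurrently, the function name `async_read` and the dictionary-based memory are hypothetical, and a released data line is represented as `None`.

```python
def async_read(memory, addr):
    """Trace the read handshake; return (data, trace of bus snapshots)."""
    lines = {"ReadReq": 0, "Ack": 0, "DataRdy": 0, "Data": None}
    trace = []

    def record(step, actor):
        trace.append((step, actor, dict(lines)))  # snapshot after the step

    # Request: the I/O device raises ReadReq and drives addr on the data lines.
    lines["ReadReq"], lines["Data"] = 1, addr
    record(0, "device")
    # 1. Memory sees ReadReq, reads addr from the data lines, raises Ack.
    latched_addr = lines["Data"]
    lines["Ack"] = 1
    record(1, "memory")
    # 2. Device sees Ack and releases ReadReq and the data lines.
    lines["ReadReq"], lines["Data"] = 0, None
    record(2, "device")
    # 3. Memory sees ReadReq go low and drops Ack.
    lines["Ack"] = 0
    record(3, "memory")
    # 4. Memory places the data on the data lines and raises DataRdy.
    lines["Data"], lines["DataRdy"] = memory[latched_addr], 1
    record(4, "memory")
    # 5. Device sees DataRdy, reads the data, raises Ack.
    received = lines["Data"]
    lines["Ack"] = 1
    record(5, "device")
    # 6. Memory sees Ack, releases the data lines, drops DataRdy.
    lines["Data"], lines["DataRdy"] = None, 0
    record(6, "memory")
    # 7. Device sees DataRdy go low and drops Ack.
    lines["Ack"] = 0
    record(7, "device")
    return received, trace


if __name__ == "__main__":
    data, trace = async_read({0x40: 0xBEEF}, 0x40)
    for step, actor, snapshot in trace:
        print(step, actor, snapshot)
    print(hex(data))
```

Every signal returns to its idle value at the end, which is what lets the next transaction start from a known state without a shared clock.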

24 Key Characteristics of Two Bus Standards

Characteristic               | Firewire (1394)                                            | USB 2.0
Type                         | I/O                                                        | I/O
Data bus width (signals)     | 4                                                          | 2
Clocking                     | asynchronous                                               | asynchronous
Theoretical peak bandwidth   | 50 MB/sec (Firewire 400) or 100 MB/sec (Firewire 800)      | 0.2 MB/sec (low speed), 1.5 MB/sec (full) or 60 MB/sec (high)
Hot pluggable                | yes                                                        | yes
Max. devices                 | 63                                                         | 127
Max. length (copper wire)    | 4.5 meters                                                 | 5 meters

25 Review: Major Components of a Computer
[Figure labels: Processor (Control, Datapath); Memory; Devices (Input, Output)]

26 A Typical Memory Hierarchy
[Figure labels: On-Chip Components (Control, Datapath, RegFile, ITLB, DTLB, Instr Cache, Data Cache), eDRAM, Second Level Cache (SRAM), Main Memory (DRAM), Secondary Memory (Disk)]
Speed (ns): .1’s, 1’s, 10’s, 100’s, 1,000’s
Size (bytes): 100’s, K’s, 10K’s, M’s, T’s
Cost: highest to lowest
- By taking advantage of the principle of locality:
  - Present the user with as much memory as is available in the cheapest technology.
  - Provide access at the speed offered by the fastest technology.

27 Characteristics of the Memory Hierarchy
[Figure: Processor, L1$, L2$, Main Memory, Secondary Memory, drawn with increasing distance from the processor in access time and with the (relative) size of the memory at each level]
- Units of transfer shown between levels: 4-8 bytes (word), 8-32 bytes (block), 1 block, 1,023+ bytes (disk sector = page)
- Inclusive: what is in L1$ is a subset of what is in L2$, which is a subset of what is in MM, which is a subset of what is in SM

28 Memory Hierarchy Technologies
- Random access
  - “Random” is good: access time is the same for all locations
  - DRAM: Dynamic Random Access Memory
    - High density (1-transistor cells), low power, cheap, slow
    - Dynamic: needs to be “refreshed” regularly (about every 8 ms)
  - SRAM: Static Random Access Memory
    - Low density (6-transistor cells), high power, expensive, fast
    - Static: content will last “forever” (until power is turned off)
  - Size: DRAM/SRAM ratio of roughly 4 to 8
  - Cost/cycle time: SRAM/DRAM ratio of roughly 8 to 16
- “Not-so-random” access technology
  - Access time varies from location to location and from time to time (e.g., disk, CD-ROM)

29 Classical SRAM Organization (~Square)
[Figure labels: row address, row decoder, word (row) select, RAM Cell Array, bit (data) lines, Column Selector & I/O Circuits, column address, data word]
- Each intersection represents a 6-T SRAM cell
- One memory row holds a block of data, so the column address selects the requested word from that block

30 Classical DRAM Organization (~Square Planes)
[Figure labels: row address, row decoder, word (row) select, RAM Cell Array, bit (data) lines, Column Selector & I/O Circuits, column address, data bit (one per plane), data word]
- Each intersection represents a 1-T DRAM cell
- The column address selects the requested bit from the row in each plane

31 RAM Memory Definitions
- Caches use SRAM for speed
- Main memory is DRAM for density
  - Addresses are divided into 2 halves (row and column)
    - RAS, or Row Access Strobe, triggers the row decoder
    - CAS, or Column Access Strobe, triggers the column selector
- Performance of main memory DRAMs
  - Latency: time to access one word
    - Access time: time between the request and when the word arrives
    - Cycle time: time between requests
    - Usually cycle time > access time
  - Bandwidth: how much data can be supplied per unit time
    - Width of the data channel * the rate at which it can be used

32 Classical DRAM Operation
- DRAM organization:
  - N rows x N columns x M bits
  - Reads or writes M bits at a time
  - Each M-bit access requires a RAS / CAS cycle
[Figure: an N-row by N-column DRAM array, M bits deep, with row address and column address inputs and an M-bit output]
[Timing diagram: RAS and CAS strobe a row address and then a column address for the 1st M-bit access, and again for the 2nd M-bit access; the interval between accesses is the cycle time]

33 Ways to Improve DRAM Performance
- Memory interleaving
- Fast Page Mode DRAMs (FPM DRAMs)
  - www.usa.samsungsemi.com/products/newsummary/asyncdram/K4F661612D.htm
- Extended Data Out DRAMs (EDO DRAMs)
  - www.chips.ibm.com/products/memory/88H2011/88H2011.pdf
- Synchronous DRAMs (SDRAMs)
  - www.usa.samsungsemi.com/products/newsummary/sdramcomp/K4S641632D.htm
- Rambus DRAMs
  - www.rambus.com/developer/quickfind_documents.html
  - www.usa.samsungsemi.com/products/newsummary/rambuscomp/K4R271669B.htm
- Double Data Rate DRAMs (DDR DRAMs)
  - www.usa.samsungsemi.com/products/newsummary/ddrsyncdram/K4D62323HA.htm
...

34 Increasing Bandwidth - Interleaving
- Access pattern without interleaving: the CPU starts the access for D1, waits until D1 is available, and only then starts the access for D2
- Access pattern with 4-way interleaving: the CPU accesses bank 0, bank 1, bank 2, and bank 3 in turn; after that, bank 0 can be accessed again
[Figure labels: CPU, Memory, Memory Banks 0 to 3, Access Time, Cycle Time, D1 available, D2 available]

35 Problems with Interleaving
- How many banks?
  - Ideally, the number of banks ≥ the number of clocks we have to wait to access the next word in the bank
  - Only works for sequential accesses (i.e., first word requested in first bank, second word requested in second bank, etc.)
- Increasing DRAM sizes => fewer chips => harder to have banks
  - Growth in bits/chip for DRAM: 50%-60%/yr
- Can only be used for very large memory systems (e.g., those encountered in supercomputer systems)
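The effect of the bank count on sequential accesses can be explored with a small timing model. The sketch below is a simplification under stated assumptions: word w lives in bank w mod n_banks, a bank stays busy for `cycle_time` clocks per access and delivers its word at the end of that period, and the CPU can start at most one access per clock. The function name and parameter values are hypothetical.

```python
def sequential_read_clocks(n_words, n_banks, cycle_time):
    """Clocks until the last of n_words sequential words is available."""
    bank_free_at = [0] * n_banks   # clock at which each bank is idle again
    next_issue = 0                 # earliest clock for the next request
    finish = 0
    for word in range(n_words):
        bank = word % n_banks
        start = max(next_issue, bank_free_at[bank])  # wait for a busy bank
        bank_free_at[bank] = start + cycle_time
        finish = start + cycle_time
        next_issue = start + 1     # at most one new access per clock
    return finish


if __name__ == "__main__":
    for banks in (1, 2, 4, 8):
        print(banks, "bank(s):", sequential_read_clocks(8, banks, 4), "clocks")
```

In this model, going beyond as many banks as the bank busy time in clocks brings no further gain for a sequential stream, which matches the rule of thumb in the first bullet.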

36 Fast Page Mode DRAM Operation
- Fast Page Mode DRAM
  - Adds an N x M “SRAM” to save a row
- After a row is read into the SRAM “register”
  - Only CAS is needed to access other M-bit blocks on that row
  - RAS remains asserted while CAS is toggled
[Figure: an N-row by N-column DRAM array, M bits deep, with an N x M “SRAM” between the array and the M-bit output; row address and column address inputs]
[Timing diagram: RAS is asserted once with the row address; CAS is toggled with a new column address for each of the 1st, 2nd, 3rd, and 4th M-bit accesses]
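A back-of-the-envelope comparison shows why keeping the row open helps. The sketch below assumes a deliberately simple timing model in which a row access costs `t_row_ns` and a column access costs `t_col_ns`, with no precharge or refresh overhead; the function names and the timing values are hypothetical, not datasheet figures.

```python
def classical_dram_ns(n_accesses, t_row_ns, t_col_ns):
    """Classical DRAM: every M-bit access pays a full RAS + CAS cycle."""
    return n_accesses * (t_row_ns + t_col_ns)


def fast_page_mode_ns(n_accesses, t_row_ns, t_col_ns):
    """Fast page mode: one RAS for the row, then only CAS per access."""
    return t_row_ns + n_accesses * t_col_ns


if __name__ == "__main__":
    # Four M-bit accesses to the same row, with made-up timings.
    print("classical:", classical_dram_ns(4, 30, 15), "ns")
    print("fast page mode:", fast_page_mode_ns(4, 30, 15), "ns")
```

The saving only applies while successive accesses fall in the same row; an access to a different row pays the row access time again.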

37 Why Care About the Memory Hierarchy?
[Figure: Processor-DRAM Memory Gap. Performance (log scale, 1 to 1000) versus time (1980 to 2000) for CPU and DRAM.]
- µProc (“Moore’s Law”): 60%/year (2X/1.5 yr)
- DRAM: 9%/year (2X/10 yrs)
- Processor-memory performance gap: grows 50% / year
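The gap figure follows from compounding the two growth rates. The sketch below uses the slide's rates (60%/year for processors, 9%/year for DRAM); the function names are illustrative, and the 20-year span is simply the range of the plot's time axis.

```python
def annual_gap_growth(cpu_rate, dram_rate):
    """Yearly growth of the CPU/DRAM performance ratio, as a fraction."""
    return (1.0 + cpu_rate) / (1.0 + dram_rate) - 1.0


def gap_after(years, cpu_rate, dram_rate):
    """Factor by which the performance ratio has grown after `years`."""
    return (1.0 + annual_gap_growth(cpu_rate, dram_rate)) ** years


if __name__ == "__main__":
    growth = annual_gap_growth(0.60, 0.09)
    print(f"gap grows about {growth * 100:.0f}% per year")
    print(f"after 20 years the ratio has grown about {gap_after(20, 0.60, 0.09):.0f}x")
```

The exact yearly figure comes out slightly under the slide's rounded 50%.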

38 Memory Hierarchy: Goals
- Fact: large memories are slow, fast memories are small
- How do we create a memory that gives the illusion of being large, cheap, and fast (most of the time)? By taking advantage of
- The Principle of Locality: programs access a relatively small portion of the address space at any instant of time.
[Figure: probability of reference plotted over the address space, from 0 to 2^n - 1]

39 Memory Hierarchy: Why Does it Work?
- Temporal Locality (Locality in Time): => Keep the most recently accessed data items closer to the processor
- Spatial Locality (Locality in Space): => Move blocks consisting of contiguous words to the upper levels
[Figure labels: To Processor, From Processor, Upper Level Memory (Blk X), Lower Level Memory (Blk Y)]
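Both kinds of locality can be seen in a toy model of a two-level hierarchy. The sketch below is an illustration under assumptions that are not in the slides: the upper level holds a fixed number of blocks, the least recently used block is evicted on a miss, and addresses are word addresses. The function name `hit_rate` is hypothetical.

```python
from collections import OrderedDict


def hit_rate(addresses, block_words, capacity_blocks):
    """Fraction of accesses found in an LRU-managed upper level."""
    addresses = list(addresses)
    upper = OrderedDict()          # block number -> True, oldest first
    hits = 0
    for addr in addresses:
        block = addr // block_words
        if block in upper:
            hits += 1
            upper.move_to_end(block)        # mark as most recently used
        else:
            if len(upper) >= capacity_blocks:
                upper.popitem(last=False)   # evict least recently used
            upper[block] = True             # bring the whole block up
    return hits / len(addresses)


if __name__ == "__main__":
    # Spatial locality: a sequential sweep benefits from 8-word blocks.
    print(hit_rate(range(64), block_words=8, capacity_blocks=4))
    # Temporal locality: a small working set is reused many times.
    print(hit_rate([0, 1, 2, 3] * 10, block_words=1, capacity_blocks=4))
```

If the working set does not fit (for example the same loop with `capacity_blocks=2`), the hit rate in this model collapses, which is the flip side of temporal locality.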

40 Memory Hierarchy: Terminology
- Hit: data appears in some block in the upper level (Block X)
  - Hit Rate: the fraction of memory accesses found in the upper level
  - Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss
- Miss: data needs to be retrieved from a block in the lower level (Block Y)
  - Miss Rate = 1 - (Hit Rate)
  - Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor
  - Hit Time << Miss Penalty
[Figure labels: To Processor, From Processor, Upper Level Memory (Blk X), Lower Level Memory (Blk Y)]
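These terms are commonly combined into an average memory access time: hit time plus miss rate times miss penalty. The slide does not state this formula, so the sketch below is an added illustration; the function name and the sample numbers are hypothetical.

```python
def average_access_time(hit_time, hit_rate, miss_penalty):
    """Average access time = hit time + miss rate * miss penalty."""
    miss_rate = 1.0 - hit_rate
    return hit_time + miss_rate * miss_penalty


if __name__ == "__main__":
    # Hypothetical upper level: 1-cycle hit time, 95% hit rate,
    # 100-cycle miss penalty.
    print(round(average_access_time(1.0, 0.95, 100.0), 3), "cycles")
```

Because the hit time is much smaller than the miss penalty, even a small miss rate dominates the average.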

41 How is the Hierarchy Managed?
- registers <-> memory
  - By the compiler (programmer?)
- cache <-> main memory
  - By the hardware
- main memory <-> disks
  - By the hardware and operating system (virtual memory)
  - By the programmer (files)

42 Summary
- DRAM is slow but cheap and dense
  - Good choice for presenting the user with a BIG memory system
- SRAM is fast but expensive and not very dense
  - Good choice for providing the user FAST access time
- Two different types of locality
  - Temporal Locality (Locality in Time): if an item is referenced, it will tend to be referenced again soon.
  - Spatial Locality (Locality in Space): if an item is referenced, items whose addresses are close by tend to be referenced soon.
- By taking advantage of the principle of locality:
  - Present the user with as much memory as is available in the cheapest technology.
  - Provide access at the speed offered by the fastest technology.

