Presentation is loading. Please wait.

Presentation is loading. Please wait.

Computer Architecture 2010 – PC Structure and Peripherals 1 Computer Architecture PC Structure and Peripherals Dr. Lihu Rappoport.

Similar presentations


Presentation on theme: "Computer Architecture 2010 – PC Structure and Peripherals 1 Computer Architecture PC Structure and Peripherals Dr. Lihu Rappoport."— Presentation transcript:

1 Computer Architecture 2010 – PC Structure and Peripherals 1 Computer Architecture PC Structure and Peripherals Dr. Lihu Rappoport

2 Computer Architecture 2010 – PC Structure and Peripherals 2 Memory

3 3 CapacitySpeed Logic2× in 3 years2× in 3 years DRAM4× in 3 years1.4× in 10 years Disk2× in 3 years1.4× in 10 years Technology Trends CPU-DRAM Memory Gap (latency)

4 Computer Architecture 2010 – PC Structure and Peripherals 4 SRAM vs. DRAM u Random Access: access time is the same for all locations DRAM – Dynamic RAMSRAM – Static RAM RefreshRegular refresh (~1% time)No refresh needed AddressAddress muxed: row+ columnAddress not multiplexed AccessNot true “Random Access”True “Random Access” densityHigh (1 Transistor/bit)Low (6 Transistor/bit) Powerlowhigh Speedslowfast Price/bitlowhigh Typical usageMain memorycache

5 Computer Architecture 2010 – PC Structure and Peripherals 5 Basic DRAM chip  Addressing sequence  Row address and then RAS# asserted  RAS# to CAS# delay  Column address and then CAS# asserted  DATA transfer Row latch Row address decoder Column addr decoder Column latch CAS# RAS# Data Memory array Memory address bus Addr

6 Computer Architecture 2010 – PC Structure and Peripherals 6 Addressing sequence  Access sequence  Put row address on data bus and assert RAS#  Wait for RAS# to CAS# delay (t RCD )  Put column address on data bus and assert CAS#  DATA transfer  Precharge t RAC –Access time RAS/CAS delay Precharge delay RAS# Data A[0:7] CAS# Data n Row iCol n Row j X CL - CAS latency X

7 Computer Architecture 2010 – PC Structure and Peripherals 7 DRAM Timing u CAS Latency: #clock cycles to access a specific column of data  #clock cycle from the moment memory controller issues a column in the current row, and the data is read out from memory u RAS to CAS Delay: #clock cycles between row and column access u Row Pre-charge time: #clock cycles to close an open row, and open the next row u Active to Precharge Delay #clock cycles to access a specific row between the data request and the pre-charge command.

8 Computer Architecture 2010 – PC Structure and Peripherals 8 Basic SDRAM controller DRAM address decoder Time delay gen. address mux RAS# CAS# R/W# A[20:23] A[10:19] A[0:9] Memory address bus D[0:7] Select Chip select u DRAM data must be periodically refreshed  Needed to keep data correct  DRAM controller performs DRAM refresh, using refresh counter

9 Computer Architecture 2010 – PC Structure and Peripherals 9  Paged Mode DRAM – Multiple accesses to different columns from same row – Saves RAS and RAS to CAS delay  Extended Data Output RAM (EDO RAM) – A data output latch enables to parallel next column address with current column data Improved DRAM Schemes RAS# Data A[0:7] CAS# Data nD n+1 RowXCol n XCol n+1 XCol n+2 X D n+2 X RAS# Data A[0:7] CAS# Data nData n+1 RowXCol n XCol n+1 XCol n+2 X Data n+2 X

10 Computer Architecture 2010 – PC Structure and Peripherals 10  Burst DRAM – Generates consecutive column address by itself Improved DRAM Schemes (cont) RAS# Data A[0:7] CAS# Data nData n+1 RowXCol n X Data n+2 X

11 Computer Architecture 2010 – PC Structure and Peripherals 11 Synchronous DRAM – SDRAM u All signals are referenced to an external clock (100MHz-200MHz)  Makes timing more precise with other system devices u Multiple Banks  Multiple pages open simultaneously (one per bank) u Command driven functionality instead of signal driven  ACTIVE: selects both the bank and the row to be activated ACTIVE to a new bank can be issued while accessing current bank  READ/WRITE: select column u Read and write accesses to the SDRAM are burst oriented  Successive column locations accessed in the given row  Burst length is programmable: 1, 2, 4, 8, and full-page May end full-page burst by BURST TERMINATE to get arbitrary burst length u A user programmable Mode Register  CAS latency, burst length, burst type u Auto pre-charge: may close row at last read/write in burst u Auto refresh: internal counters generate refresh address

12 Computer Architecture 2010 – PC Structure and Peripherals 12 SDRAM Timing u t RCD : ACTIVE to READ/WRITE gap =  t RCD (MIN) / clock period  u t RC : successive ACTIVE to a different row in the same bank u t RRD : successive ACTIVE commands to different banks BL = 1

13 Computer Architecture 2010 – PC Structure and Peripherals 13 DDR-SDRAM u 2n-prefetch architecture  The DRAM cells are clocked at the same speed as SDR SDRAM  Internal data bus is twice the width of the external data bus  Data capture occurs twice per clock cycle Lower half of the bus sampled at clock rise Upper half of the bus sampled at clock fall u Uses 2.5V (vs. 3.3V in SDRAM)  Reduced power consumption 0:n-1 n:2n-1 0:n-1 200MHz clock 0:2n-1 SDRAM Array

14 Computer Architecture 2010 – PC Structure and Peripherals 14 DDR SDRAM Timing 133MHz clock cmd Bank Data Addr NOP X ACT Bank 0 Row iX RD Bank 0 Col j t RCD >20ns ACT Bank 0 Row l t RC >70ns ACT Bank 1 Row m t RRD >20ns CL=2 NOP X X X X X X RD Bank 1 Col n NOP X X X X X X X X j +1 +2 +3 n +1 +2 +3

15 Computer Architecture 2010 – PC Structure and Peripherals 15 DIMMs u DIMM: Dual In-line Memory Module  A small circuit board that holds memory chips u 64-bit wide data path (72 bit with parity)  Single sided: 9 chips, each with 8 bit data bus 512 Mbit / chip  8 chips  512 Mbyte per DIMM  Dual sided: 18 chips, each with 4 bit data bus 256 Mbit / chip  16 chips  512 Mbyte per DIMM

16 Computer Architecture 2010 – PC Structure and Peripherals 16 DRAM Standards u SDR SDRAM: PC66, PC100 and PC133 u DDR SDRAM u Total BW for DDR400  3200M Byte/sec = 64 bit  2  200MHz / 8 (bit/byte) u Dual channel DDR SDRAM  Uses 2 64 bit DIMM modules in parallel to get a 128 data bus  Total BW for DDR400 dual channel: 6400M Byte/sec = 128 bit  2  200MHz /8 DDR200DDR266DDR333DDR400DDR533 Bus freq (MHz)100133167200266 Bit/pin (Mbps)200266333400533 Total bandwidth (M Byte/sec ) 16002133266632004264

17 Computer Architecture 2010 – PC Structure and Peripherals 17 DDR2 u DDR2 achieves high-speed using 4-bit prefetch architecture  SDRAM cells read/write 4× the amount of data as the external bus  DDR2-533 cell works at the same frequency as a DDR266 SDRAM or a PC133 SDRAM cell u This method comes at a price of increased latency  DDR2-based systems may perform worse than DDR1-based systems

18 Computer Architecture 2010 – PC Structure and Peripherals 18 DDR2 – Other Features u Shortened page size for reduced activation power  When ACTIVATE command is given, read all bits in the page A major contributor to the active power  A device with shorter page size has significantly lower power  512Mb DDR2 page size is 1KByte vs. 2KB for 512Mb DDR1 u Eight banks in 1Gb densities and above  Increases flexibility in DRAM accesses (also increases the power) DDR1DDR 2 DRAM Frequency100/133/166/200 MHz Bus Frequency100/133/166/200 MHz200/266/333/400 MHz Data Rate200/266/333/400 Mbps400/533/667/800 Mbps Operation Voltage2.5V1.8V CAS Latency2, 2.5, 33, 4, 5 Data Bandwidth3.2GBs6.4GBs Power Consumption399mW217mW

19 Computer Architecture 2010 – PC Structure and Peripherals 19 DDR2 Latency u DRAM timing, measured in I/O bus cycles, specifies 3 numbers  CAS Latency - RAS to CAS Delay - RAS Precharge Time u DRAM latency is the time for accessing data in an open page  E.g., latency for DDR400 2-3-2: 1/200MHz × 2.5 = 10ns  DDR2-533 4-4-4 latency is 1.5× of to DDR400 2–3–2 30% bandwidth growth does not compensate for access time worsening  DDR2-533 3-3-3 latency is only 12% worse than DDR400 2-3-2 MemoryMem ClkBus ClkTimingsLatencydual-channel BW DDR400200MHz 2.5 – 3 – 312.5 ns6.4 GB/sec DDR400200MHz 2–3–22–3–210 ns6.4 GB/sec DDR533266MHz 3–4–43–4–411.2 ns8.5 GB/sec DDR533266MHz 2.5 – 3 – 39.4 ns8.5 GB/sec DDR2-533133MHz266MHz4–4–44–4–415 ns8.5 GB/sec DDR2-533133MHz266MHz3–3–33–3–311.2 ns8.5 GB/sec DDR2-600150MHz300MHz5–5–55–5–516.6 ns9.6 GB/sec DDR2-600150MHz300MHz4–4–44–4–413.3 ns9.6 GB/sec

20 Computer Architecture 2010 – PC Structure and Peripherals 20 DDR2 Latency (cont.) u Performance tests  DDR2-533 with 4-4-4 timings worse than DDR400 2–3–2  DDR2-533 with 3-3-3 timings better than DDR400 2–3–2 u DDR2-533 modules with 3-3-3 timings  Supported by 925/915  best choice for enthusiastic users  significant improvement u Over-clocked motherboards clock DDR2-533 at 600MHz  realized through undocumented memory frequency ratios available in i925/i915 u The performance of DDR2-based systems is more sensitive to a lower latency than to a higher frequency  We get practically nothing from using DDR2-600 SDRAM with i925/i915

21 Computer Architecture 2010 – PC Structure and Peripherals 21 DDR2 Standards Standard name Memory clock Cycle time I/O Bus clock Data transfers per second Module name Peak transfer rate Timings DDR2-400100 MHz10 ns200 MHz400 MillionPC2-32003200 MB/s 3-3-3 4-4-4 DDR2-533133 MHz7.5 ns266 MHz533 MillionPC2-42004266 MB/s 3-3-3 4-4-4 DDR2-667166 MHz6 ns333 MHz667 MillionPC2-53005333 MB/s 4-4-4 5-5-5 DDR2-800200 MHz5 ns400 MHz800 MillionPC2-64006400 MB/s 4-4-4 5-5-5 6-6-6 DDR2- 1066 266 MHz3.75 ns533 MHz 1066 Million PC2-85008533 MB/s 6-6-6 7-7-7

22 Computer Architecture 2010 – PC Structure and Peripherals 22 DDR3 u 30% a power consumption reduction compared to DDR2  1.5 V supply voltage, compared to DDR2's 1.8 V or DDR's 2.5 V  90 nanometer fabrication technology u Higher bandwidth  8 bit deep prefetch buffer (vs. 4 bit in DDR2 and 2 bit in DDR) u Transfer data rate  Effective clock rate of 800–1600 MHz using both rising and falling edges of a 400–800 MHz I/O clock.  DDR2: 400–800 MHz using a 200–400 MHz I/O clock  DDR: 200–400 MHz based on a 100–200 MHz I/O clock u DDR3 DIMMs  240 pins, the same number as DDR2, and are the same size  Electrically incompatible, and have a different key notch location

23 Computer Architecture 2010 – PC Structure and Peripherals 23 DDR2 vs. DDR3 Performance The high latency of DDR3 SDRAM has negative effect on streaming operations Source: xbitlabsxbitlabs

24 Computer Architecture 2010 – PC Structure and Peripherals 24 SRAM – Static RAM u True random access u High speed, low density, high power u No refresh u Address not multiplexed u DDR SRAM  2 READs or 2 WRITEs per clock  Common or Separate I/O  DDRII: 200MHz to 333MHz Operation; Density: 18/36/72Mb+ u QDR SRAM  Two separate DDR ports: one read and one write  One DDR address bus: alternating between the read address and the write address  QDRII: 250MHz to 333MHz Operation; Density: 18/36/72Mb+

25 Computer Architecture 2010 – PC Structure and Peripherals 25 Read Only Memory (ROM) u Random Access u Non volatile u ROM Types  PROM – Programmable ROM Burnt once using special equipment  EPROM – Erasable PROM Can be erased by exposure to UV, and then reprogrammed  E 2 PROM – Electrically Erasable PROM Can be erased and reprogrammed on board Write time (programming) much longer than RAM Limited number of writes (thousands)

26 Computer Architecture 2010 – PC Structure and Peripherals 26 Flash Memory u Non-volatile, rewritable memory  limited lifespan of around 100,000 write cycles u Flash drives compared to HD drives:  Smaller size, faster, lighter, noiseless, consume less energy  Withstanding shocks up to 2000 Gs Equivalent to a 10 foot drop onto concrete - without losing data  Lower capacity (8GB), but going up  Much more expensive (cost/byte): currently ~20$/1GB u NOR Flash  Supports per-byte addressing  Suitable for storing code (e.g. BIOS, cell phone SW) u NAND Flash  Supports page-mode addressing (e.g., 1KB blocks)  Suitable for storing large data (e.g. pictures, songs)

27 Computer Architecture 2010 – PC Structure and Peripherals 27 The Motherboard

28 Computer Architecture 2010 – PC Structure and Peripherals 28 Computer System Structure CPU PCI North Bridge DDRII Channel 1 mouse LAN Lan Adap External Graphics Card Mem BUS Cache Sound Card speakers South Bridge PCI express ×16 IDE controller IO Controller Old DVD Drive Hard Disk Parallel Port Serial Port Floppy Drive keybrd DDRII Channel 2 USB controller SATA controller PCI express ×1 Memory controller On-board Graphics CPU BUS

29 Computer Architecture 2010 – PC Structure and Peripherals 29 Computer System Structure – New PCI North Bridge mouse LAN Lan Adap External Graphics Card CPU BUS Sound Card speakers South Bridge PCI express ×16 IDE controller IO Controller DVD Drive Hard Disk Parallel Port Serial Port Floppy Drive keybrd USB controller SATA controller PCI express ×1 On-board Graphics CPU Cache DDRIII Channel 1 Mem BUS DDRIII Channel 2 Memory controller

30 Computer Architecture 2010 – PC Structure and Peripherals 30 The Motherboard IEEE- 1394a header audio header PCI add-in card connector PCI express x1 connector PCI express x16 connector Back panel connectors Processor core power connector Rear chassis fan header LGA775 processor socket GMCH: North Bridge + integ GFX Processor fan header DIMM Channel A sockets Serial port header DIMM Channel B sockets Diskette drive connector Main Power connector Battery ICH: South Bridge + integ Audio 4 × SATA connectors Front panel USB header Speaker Parallel ATA IDE connector High Def. Audio header PCI add-in card connector

31 Computer Architecture 2010 – PC Structure and Peripherals 31 How to get the most of Memory ? u Single Channel DDR u Dual channel DDR  Each DIMM pair must be the same u Balance FSB and memory bandwidth  800MHz FSB provides 800MHz × 64bit / 8 = 6.4 G Byte/sec  Dual Channel DDR400 SDRAM also provides 6.4 G Byte/sec L2 Cache CPU FSB – Front Side Bus Memory Memory Bus North Bridge DRAM Ctrlr CH A DDR DIMM DDR DIMM DDR DIMM DDR DIMM CH B L2 Cache CPU FSB – Front Side Bus North Bridge DRAM Ctrlr

32 Computer Architecture 2010 – PC Structure and Peripherals 32 How to get the most of Memory ? u Each DIMM supports 4 open pages simultaneously  The more open pages, the more random access  It is better to have more DIMMs n DIMMs: 4n open pages u DIMMs can be single sided or dual sided  Dual sided DIMMs may have separate CS of each side In this case the number of open pages is doubled (goes up to 8) This is not a must – dual sided DIMMs may also have a common CS for both sides, in which case, there are only 4 open pages, as with single side

33 Computer Architecture 2010 – PC Structure and Peripherals 33 Hard Disks

34 Computer Architecture 2010 – PC Structure and Peripherals 34 Hard Disk Structure u Direct access u Nonvolatile, Large, inexpensive, and slow  Lowest level in the memory hierarchy u Technology  Rotating platters coated with a magnetic surface  Use a moveable read/write head to access the disk  Each platter is divided to tracks: concentric circles  Each track is divided to sectors Smallest unit that can be read or written  Disk outer parts have more space for sectors than the inner parts Constant bit density: record more sectors on the outer tracks speed varies with track location u Buffer Cache  A temporary data storage area used to enhance drive performance Platters Track Sector

35 Computer Architecture 2010 – PC Structure and Peripherals 35 The IBM Ultrastar 36ZX u Top view of a 36 GB, 10,000 RPM, IBM SCSI server hard disk u 10 stacked platters

36 Computer Architecture 2010 – PC Structure and Peripherals 36 Disk Access Read/write data is a three-stage process u Seek time: position the arm over the proper track  Average: Sum of the time for all possible seek / total # of possible seeks  Due to locality of disk reference, actual average seek is shorter: 4 to 12 ms u Rotational latency: wait for desired sector to rotate under head  The faster the drives spins, the shorter the rotational latency time  Most disks rotate at 5,400 to 15,000 RPM At 7200 RPM: 8 ms per revolution  An average latency to the desired information is halfway around the disk At 7200 RPM: 4 ms u Transfer block: read/write the data  Transfer Time is a function of: Sector size Rotation speed Recording density: bits per inch on a track  Typical values: 100 MB / sec u Disk Access Time = Seek time + Rotational Latency + Transfer time + Controller Time + Queuing Delay

37 Computer Architecture 2010 – PC Structure and Peripherals 37 The Disk Interface – EIDE u EIDE, ATA, UltraATA, ATA 100, ATAPI: all the same interface  Uses for connecting hard disk drives and CD-ROM drives  80-pin cable, 40-pin dual header connector  100 MB/s (ATA66 is only 66MB/s)  EIDE controller integrated with the motherboard (in the ICH) u EIDE controller has two channels: primary and a secondary  Work independently  Two devices per channel: master and slave, but equal The 2 devices have to take turns controlling the bus A total of four devices per cont  If there are two device on the system (e.g., a hard disk and a CD-ROM) It is better to put them on different channels  Avoid mixing slower (CD) and faster devices (HDD) on the same channel  If doing a lot of copying from a CD-ROM drive to the CD-RW Better performance by separating devices to separate channels

38 Computer Architecture 2010 – PC Structure and Peripherals 38 The Disk Interface – Serial ATA (SATA) u Point-to-point connection  Ensures dedicated 150 MB/s per device (no sharing) u Dual controllers allow independent operation of each device u Thinner (7 wires), flexible, longer cables  Easier routing and improved airflow  4 wires for signaling + 3 ground wires to minimize impedance and crosstalk u New 7-pin connector design  for easier installation and better device reliability  takes 1/6 the area on the system board u CRC error checking on all data and control information u Increased BW supports data intensive applications such as  digital video production, digital audio storage and recording, high-speed file sharing u No configuration needed when a adding a 2 nd SATA drive  One cable for each drive eliminates the need for jumpers  No more figuring out which device is the master or slave u Today's hard drives are clearly below 100 MB/s  Do not benefit from UltraATA / SATA

39 Computer Architecture 2010 – PC Structure and Peripherals 39 The BIOS

40 Computer Architecture 2010 – PC Structure and Peripherals 40 System Start-up Upon computer turn-on several events occur: 1. The CPU "wakes up" and sends a message to activate the BIOS 2. BIOS runs the Power On Self Test (POST): make sure system devices are working ok  Initialize system hardware and chipset registers  Initialize power management  Test RAM  Enable the keyboard  Test serial and parallel ports  Initialize floppy disk drives and hard disk drive controllers  Displays system summary information

41 Computer Architecture 2010 – PC Structure and Peripherals 41 System Start-up (cont.) 3. During POST, the BIOS compares the system configuration data obtained from POST with the system information stored on a memory chip located on the MB  A CMOS chip, which is updated whenever new system components are added  Contains the latest information about system components 4. After the POST tasks are completed  the BIOS looks for the boot program responsible for loading the operating system  Usually, the BIOS looks on the floppy disk drive A: followed by drive C: 5. After boot program is loaded into memory  It loads the system configuration information contained in the registry in a Windows® environment, and device drivers 6. Finally, the operating system is loaded


Download ppt "Computer Architecture 2010 – PC Structure and Peripherals 1 Computer Architecture PC Structure and Peripherals Dr. Lihu Rappoport."

Similar presentations


Ads by Google