Download presentation
Presentation is loading. Please wait.
1
Computer Architecture 2010 – PC Structure and Peripherals 1 Computer Architecture PC Structure and Peripherals Dr. Lihu Rappoport
2
Computer Architecture 2010 – PC Structure and Peripherals 2 Memory
3
3 CapacitySpeed Logic2× in 3 years2× in 3 years DRAM4× in 3 years1.4× in 10 years Disk2× in 3 years1.4× in 10 years Technology Trends CPU-DRAM Memory Gap (latency)
4
Computer Architecture 2010 – PC Structure and Peripherals 4 SRAM vs. DRAM u Random Access: access time is the same for all locations DRAM – Dynamic RAMSRAM – Static RAM RefreshRegular refresh (~1% time)No refresh needed AddressAddress muxed: row+ columnAddress not multiplexed AccessNot true “Random Access”True “Random Access” densityHigh (1 Transistor/bit)Low (6 Transistor/bit) Powerlowhigh Speedslowfast Price/bitlowhigh Typical usageMain memorycache
5
Computer Architecture 2010 – PC Structure and Peripherals 5 Basic DRAM chip Addressing sequence Row address and then RAS# asserted RAS# to CAS# delay Column address and then CAS# asserted DATA transfer Row latch Row address decoder Column addr decoder Column latch CAS# RAS# Data Memory array Memory address bus Addr
6
Computer Architecture 2010 – PC Structure and Peripherals 6 Addressing sequence Access sequence Put row address on data bus and assert RAS# Wait for RAS# to CAS# delay (t RCD ) Put column address on data bus and assert CAS# DATA transfer Precharge t RAC –Access time RAS/CAS delay Precharge delay RAS# Data A[0:7] CAS# Data n Row iCol n Row j X CL - CAS latency X
7
Computer Architecture 2010 – PC Structure and Peripherals 7 DRAM Timing u CAS Latency: #clock cycles to access a specific column of data #clock cycle from the moment memory controller issues a column in the current row, and the data is read out from memory u RAS to CAS Delay: #clock cycles between row and column access u Row Pre-charge time: #clock cycles to close an open row, and open the next row u Active to Precharge Delay #clock cycles to access a specific row between the data request and the pre-charge command.
8
Computer Architecture 2010 – PC Structure and Peripherals 8 Basic SDRAM controller DRAM address decoder Time delay gen. address mux RAS# CAS# R/W# A[20:23] A[10:19] A[0:9] Memory address bus D[0:7] Select Chip select u DRAM data must be periodically refreshed Needed to keep data correct DRAM controller performs DRAM refresh, using refresh counter
9
Computer Architecture 2010 – PC Structure and Peripherals 9 Paged Mode DRAM – Multiple accesses to different columns from same row – Saves RAS and RAS to CAS delay Extended Data Output RAM (EDO RAM) – A data output latch enables to parallel next column address with current column data Improved DRAM Schemes RAS# Data A[0:7] CAS# Data nD n+1 RowXCol n XCol n+1 XCol n+2 X D n+2 X RAS# Data A[0:7] CAS# Data nData n+1 RowXCol n XCol n+1 XCol n+2 X Data n+2 X
10
Computer Architecture 2010 – PC Structure and Peripherals 10 Burst DRAM – Generates consecutive column address by itself Improved DRAM Schemes (cont) RAS# Data A[0:7] CAS# Data nData n+1 RowXCol n X Data n+2 X
11
Computer Architecture 2010 – PC Structure and Peripherals 11 Synchronous DRAM – SDRAM u All signals are referenced to an external clock (100MHz-200MHz) Makes timing more precise with other system devices u Multiple Banks Multiple pages open simultaneously (one per bank) u Command driven functionality instead of signal driven ACTIVE: selects both the bank and the row to be activated ACTIVE to a new bank can be issued while accessing current bank READ/WRITE: select column u Read and write accesses to the SDRAM are burst oriented Successive column locations accessed in the given row Burst length is programmable: 1, 2, 4, 8, and full-page May end full-page burst by BURST TERMINATE to get arbitrary burst length u A user programmable Mode Register CAS latency, burst length, burst type u Auto pre-charge: may close row at last read/write in burst u Auto refresh: internal counters generate refresh address
12
Computer Architecture 2010 – PC Structure and Peripherals 12 SDRAM Timing u t RCD : ACTIVE to READ/WRITE gap = t RCD (MIN) / clock period u t RC : successive ACTIVE to a different row in the same bank u t RRD : successive ACTIVE commands to different banks BL = 1
13
Computer Architecture 2010 – PC Structure and Peripherals 13 DDR-SDRAM u 2n-prefetch architecture The DRAM cells are clocked at the same speed as SDR SDRAM Internal data bus is twice the width of the external data bus Data capture occurs twice per clock cycle Lower half of the bus sampled at clock rise Upper half of the bus sampled at clock fall u Uses 2.5V (vs. 3.3V in SDRAM) Reduced power consumption 0:n-1 n:2n-1 0:n-1 200MHz clock 0:2n-1 SDRAM Array
14
Computer Architecture 2010 – PC Structure and Peripherals 14 DDR SDRAM Timing 133MHz clock cmd Bank Data Addr NOP X ACT Bank 0 Row iX RD Bank 0 Col j t RCD >20ns ACT Bank 0 Row l t RC >70ns ACT Bank 1 Row m t RRD >20ns CL=2 NOP X X X X X X RD Bank 1 Col n NOP X X X X X X X X j +1 +2 +3 n +1 +2 +3
15
Computer Architecture 2010 – PC Structure and Peripherals 15 DIMMs u DIMM: Dual In-line Memory Module A small circuit board that holds memory chips u 64-bit wide data path (72 bit with parity) Single sided: 9 chips, each with 8 bit data bus 512 Mbit / chip 8 chips 512 Mbyte per DIMM Dual sided: 18 chips, each with 4 bit data bus 256 Mbit / chip 16 chips 512 Mbyte per DIMM
16
Computer Architecture 2010 – PC Structure and Peripherals 16 DRAM Standards u SDR SDRAM: PC66, PC100 and PC133 u DDR SDRAM u Total BW for DDR400 3200M Byte/sec = 64 bit 2 200MHz / 8 (bit/byte) u Dual channel DDR SDRAM Uses 2 64 bit DIMM modules in parallel to get a 128 data bus Total BW for DDR400 dual channel: 6400M Byte/sec = 128 bit 2 200MHz /8 DDR200DDR266DDR333DDR400DDR533 Bus freq (MHz)100133167200266 Bit/pin (Mbps)200266333400533 Total bandwidth (M Byte/sec ) 16002133266632004264
17
Computer Architecture 2010 – PC Structure and Peripherals 17 DDR2 u DDR2 achieves high-speed using 4-bit prefetch architecture SDRAM cells read/write 4× the amount of data as the external bus DDR2-533 cell works at the same frequency as a DDR266 SDRAM or a PC133 SDRAM cell u This method comes at a price of increased latency DDR2-based systems may perform worse than DDR1-based systems
18
Computer Architecture 2010 – PC Structure and Peripherals 18 DDR2 – Other Features u Shortened page size for reduced activation power When ACTIVATE command is given, read all bits in the page A major contributor to the active power A device with shorter page size has significantly lower power 512Mb DDR2 page size is 1KByte vs. 2KB for 512Mb DDR1 u Eight banks in 1Gb densities and above Increases flexibility in DRAM accesses (also increases the power) DDR1DDR 2 DRAM Frequency100/133/166/200 MHz Bus Frequency100/133/166/200 MHz200/266/333/400 MHz Data Rate200/266/333/400 Mbps400/533/667/800 Mbps Operation Voltage2.5V1.8V CAS Latency2, 2.5, 33, 4, 5 Data Bandwidth3.2GBs6.4GBs Power Consumption399mW217mW
19
Computer Architecture 2010 – PC Structure and Peripherals 19 DDR2 Latency u DRAM timing, measured in I/O bus cycles, specifies 3 numbers CAS Latency - RAS to CAS Delay - RAS Precharge Time u DRAM latency is the time for accessing data in an open page E.g., latency for DDR400 2-3-2: 1/200MHz × 2.5 = 10ns DDR2-533 4-4-4 latency is 1.5× of to DDR400 2–3–2 30% bandwidth growth does not compensate for access time worsening DDR2-533 3-3-3 latency is only 12% worse than DDR400 2-3-2 MemoryMem ClkBus ClkTimingsLatencydual-channel BW DDR400200MHz 2.5 – 3 – 312.5 ns6.4 GB/sec DDR400200MHz 2–3–22–3–210 ns6.4 GB/sec DDR533266MHz 3–4–43–4–411.2 ns8.5 GB/sec DDR533266MHz 2.5 – 3 – 39.4 ns8.5 GB/sec DDR2-533133MHz266MHz4–4–44–4–415 ns8.5 GB/sec DDR2-533133MHz266MHz3–3–33–3–311.2 ns8.5 GB/sec DDR2-600150MHz300MHz5–5–55–5–516.6 ns9.6 GB/sec DDR2-600150MHz300MHz4–4–44–4–413.3 ns9.6 GB/sec
20
Computer Architecture 2010 – PC Structure and Peripherals 20 DDR2 Latency (cont.) u Performance tests DDR2-533 with 4-4-4 timings worse than DDR400 2–3–2 DDR2-533 with 3-3-3 timings better than DDR400 2–3–2 u DDR2-533 modules with 3-3-3 timings Supported by 925/915 best choice for enthusiastic users significant improvement u Over-clocked motherboards clock DDR2-533 at 600MHz realized through undocumented memory frequency ratios available in i925/i915 u The performance of DDR2-based systems is more sensitive to a lower latency than to a higher frequency We get practically nothing from using DDR2-600 SDRAM with i925/i915
21
Computer Architecture 2010 – PC Structure and Peripherals 21 DDR2 Standards Standard name Memory clock Cycle time I/O Bus clock Data transfers per second Module name Peak transfer rate Timings DDR2-400100 MHz10 ns200 MHz400 MillionPC2-32003200 MB/s 3-3-3 4-4-4 DDR2-533133 MHz7.5 ns266 MHz533 MillionPC2-42004266 MB/s 3-3-3 4-4-4 DDR2-667166 MHz6 ns333 MHz667 MillionPC2-53005333 MB/s 4-4-4 5-5-5 DDR2-800200 MHz5 ns400 MHz800 MillionPC2-64006400 MB/s 4-4-4 5-5-5 6-6-6 DDR2- 1066 266 MHz3.75 ns533 MHz 1066 Million PC2-85008533 MB/s 6-6-6 7-7-7
22
Computer Architecture 2010 – PC Structure and Peripherals 22 DDR3 u 30% a power consumption reduction compared to DDR2 1.5 V supply voltage, compared to DDR2's 1.8 V or DDR's 2.5 V 90 nanometer fabrication technology u Higher bandwidth 8 bit deep prefetch buffer (vs. 4 bit in DDR2 and 2 bit in DDR) u Transfer data rate Effective clock rate of 800–1600 MHz using both rising and falling edges of a 400–800 MHz I/O clock. DDR2: 400–800 MHz using a 200–400 MHz I/O clock DDR: 200–400 MHz based on a 100–200 MHz I/O clock u DDR3 DIMMs 240 pins, the same number as DDR2, and are the same size Electrically incompatible, and have a different key notch location
23
Computer Architecture 2010 – PC Structure and Peripherals 23 DDR2 vs. DDR3 Performance The high latency of DDR3 SDRAM has negative effect on streaming operations Source: xbitlabsxbitlabs
24
Computer Architecture 2010 – PC Structure and Peripherals 24 SRAM – Static RAM u True random access u High speed, low density, high power u No refresh u Address not multiplexed u DDR SRAM 2 READs or 2 WRITEs per clock Common or Separate I/O DDRII: 200MHz to 333MHz Operation; Density: 18/36/72Mb+ u QDR SRAM Two separate DDR ports: one read and one write One DDR address bus: alternating between the read address and the write address QDRII: 250MHz to 333MHz Operation; Density: 18/36/72Mb+
25
Computer Architecture 2010 – PC Structure and Peripherals 25 Read Only Memory (ROM) u Random Access u Non volatile u ROM Types PROM – Programmable ROM Burnt once using special equipment EPROM – Erasable PROM Can be erased by exposure to UV, and then reprogrammed E 2 PROM – Electrically Erasable PROM Can be erased and reprogrammed on board Write time (programming) much longer than RAM Limited number of writes (thousands)
26
Computer Architecture 2010 – PC Structure and Peripherals 26 Flash Memory u Non-volatile, rewritable memory limited lifespan of around 100,000 write cycles u Flash drives compared to HD drives: Smaller size, faster, lighter, noiseless, consume less energy Withstanding shocks up to 2000 Gs Equivalent to a 10 foot drop onto concrete - without losing data Lower capacity (8GB), but going up Much more expensive (cost/byte): currently ~20$/1GB u NOR Flash Supports per-byte addressing Suitable for storing code (e.g. BIOS, cell phone SW) u NAND Flash Supports page-mode addressing (e.g., 1KB blocks) Suitable for storing large data (e.g. pictures, songs)
27
Computer Architecture 2010 – PC Structure and Peripherals 27 The Motherboard
28
Computer Architecture 2010 – PC Structure and Peripherals 28 Computer System Structure CPU PCI North Bridge DDRII Channel 1 mouse LAN Lan Adap External Graphics Card Mem BUS Cache Sound Card speakers South Bridge PCI express ×16 IDE controller IO Controller Old DVD Drive Hard Disk Parallel Port Serial Port Floppy Drive keybrd DDRII Channel 2 USB controller SATA controller PCI express ×1 Memory controller On-board Graphics CPU BUS
29
Computer Architecture 2010 – PC Structure and Peripherals 29 Computer System Structure – New PCI North Bridge mouse LAN Lan Adap External Graphics Card CPU BUS Sound Card speakers South Bridge PCI express ×16 IDE controller IO Controller DVD Drive Hard Disk Parallel Port Serial Port Floppy Drive keybrd USB controller SATA controller PCI express ×1 On-board Graphics CPU Cache DDRIII Channel 1 Mem BUS DDRIII Channel 2 Memory controller
30
Computer Architecture 2010 – PC Structure and Peripherals 30 The Motherboard IEEE- 1394a header audio header PCI add-in card connector PCI express x1 connector PCI express x16 connector Back panel connectors Processor core power connector Rear chassis fan header LGA775 processor socket GMCH: North Bridge + integ GFX Processor fan header DIMM Channel A sockets Serial port header DIMM Channel B sockets Diskette drive connector Main Power connector Battery ICH: South Bridge + integ Audio 4 × SATA connectors Front panel USB header Speaker Parallel ATA IDE connector High Def. Audio header PCI add-in card connector
31
Computer Architecture 2010 – PC Structure and Peripherals 31 How to get the most of Memory ? u Single Channel DDR u Dual channel DDR Each DIMM pair must be the same u Balance FSB and memory bandwidth 800MHz FSB provides 800MHz × 64bit / 8 = 6.4 G Byte/sec Dual Channel DDR400 SDRAM also provides 6.4 G Byte/sec L2 Cache CPU FSB – Front Side Bus Memory Memory Bus North Bridge DRAM Ctrlr CH A DDR DIMM DDR DIMM DDR DIMM DDR DIMM CH B L2 Cache CPU FSB – Front Side Bus North Bridge DRAM Ctrlr
32
Computer Architecture 2010 – PC Structure and Peripherals 32 How to get the most of Memory ? u Each DIMM supports 4 open pages simultaneously The more open pages, the more random access It is better to have more DIMMs n DIMMs: 4n open pages u DIMMs can be single sided or dual sided Dual sided DIMMs may have separate CS of each side In this case the number of open pages is doubled (goes up to 8) This is not a must – dual sided DIMMs may also have a common CS for both sides, in which case, there are only 4 open pages, as with single side
33
Computer Architecture 2010 – PC Structure and Peripherals 33 Hard Disks
34
Computer Architecture 2010 – PC Structure and Peripherals 34 Hard Disk Structure u Direct access u Nonvolatile, Large, inexpensive, and slow Lowest level in the memory hierarchy u Technology Rotating platters coated with a magnetic surface Use a moveable read/write head to access the disk Each platter is divided to tracks: concentric circles Each track is divided to sectors Smallest unit that can be read or written Disk outer parts have more space for sectors than the inner parts Constant bit density: record more sectors on the outer tracks speed varies with track location u Buffer Cache A temporary data storage area used to enhance drive performance Platters Track Sector
35
Computer Architecture 2010 – PC Structure and Peripherals 35 The IBM Ultrastar 36ZX u Top view of a 36 GB, 10,000 RPM, IBM SCSI server hard disk u 10 stacked platters
36
Computer Architecture 2010 – PC Structure and Peripherals 36 Disk Access Read/write data is a three-stage process u Seek time: position the arm over the proper track Average: Sum of the time for all possible seek / total # of possible seeks Due to locality of disk reference, actual average seek is shorter: 4 to 12 ms u Rotational latency: wait for desired sector to rotate under head The faster the drives spins, the shorter the rotational latency time Most disks rotate at 5,400 to 15,000 RPM At 7200 RPM: 8 ms per revolution An average latency to the desired information is halfway around the disk At 7200 RPM: 4 ms u Transfer block: read/write the data Transfer Time is a function of: Sector size Rotation speed Recording density: bits per inch on a track Typical values: 100 MB / sec u Disk Access Time = Seek time + Rotational Latency + Transfer time + Controller Time + Queuing Delay
37
Computer Architecture 2010 – PC Structure and Peripherals 37 The Disk Interface – EIDE u EIDE, ATA, UltraATA, ATA 100, ATAPI: all the same interface Uses for connecting hard disk drives and CD-ROM drives 80-pin cable, 40-pin dual header connector 100 MB/s (ATA66 is only 66MB/s) EIDE controller integrated with the motherboard (in the ICH) u EIDE controller has two channels: primary and a secondary Work independently Two devices per channel: master and slave, but equal The 2 devices have to take turns controlling the bus A total of four devices per cont If there are two device on the system (e.g., a hard disk and a CD-ROM) It is better to put them on different channels Avoid mixing slower (CD) and faster devices (HDD) on the same channel If doing a lot of copying from a CD-ROM drive to the CD-RW Better performance by separating devices to separate channels
38
Computer Architecture 2010 – PC Structure and Peripherals 38 The Disk Interface – Serial ATA (SATA) u Point-to-point connection Ensures dedicated 150 MB/s per device (no sharing) u Dual controllers allow independent operation of each device u Thinner (7 wires), flexible, longer cables Easier routing and improved airflow 4 wires for signaling + 3 ground wires to minimize impedance and crosstalk u New 7-pin connector design for easier installation and better device reliability takes 1/6 the area on the system board u CRC error checking on all data and control information u Increased BW supports data intensive applications such as digital video production, digital audio storage and recording, high-speed file sharing u No configuration needed when a adding a 2 nd SATA drive One cable for each drive eliminates the need for jumpers No more figuring out which device is the master or slave u Today's hard drives are clearly below 100 MB/s Do not benefit from UltraATA / SATA
39
Computer Architecture 2010 – PC Structure and Peripherals 39 The BIOS
40
Computer Architecture 2010 – PC Structure and Peripherals 40 System Start-up Upon computer turn-on several events occur: 1. The CPU "wakes up" and sends a message to activate the BIOS 2. BIOS runs the Power On Self Test (POST): make sure system devices are working ok Initialize system hardware and chipset registers Initialize power management Test RAM Enable the keyboard Test serial and parallel ports Initialize floppy disk drives and hard disk drive controllers Displays system summary information
41
Computer Architecture 2010 – PC Structure and Peripherals 41 System Start-up (cont.) 3. During POST, the BIOS compares the system configuration data obtained from POST with the system information stored on a memory chip located on the MB A CMOS chip, which is updated whenever new system components are added Contains the latest information about system components 4. After the POST tasks are completed the BIOS looks for the boot program responsible for loading the operating system Usually, the BIOS looks on the floppy disk drive A: followed by drive C: 5. After boot program is loaded into memory It loads the system configuration information contained in the registry in a Windows® environment, and device drivers 6. Finally, the operating system is loaded
Similar presentations
© 2024 SlidePlayer.com Inc.
All rights reserved.