Presentation is loading. Please wait.

Presentation is loading. Please wait.

Memory Cache Internal External

Similar presentations


Presentation on theme: "Memory Cache Internal External"— Presentation transcript:

1 Memory Cache Internal External
Computer memory is organized into hierarchy.

2 Location – is memory internal or external?
Characteristics We classify memory systems according to their characteristics. Location – is memory internal or external? Capacity – 1 byte or word = 8 bits or 16bits or 32bits Unit of transfer – equal to the No of data lines into and out of memory module. Access method – method of accessing units of data (sequential, direct, random, associative) Performance – access time (latency), memory cycle time, transfer rate Physical type – semiconductor, magnetic surface, optical Physical characteristics – …of data storage…volatile… Organisation – the physical arrangement of bits to form words.

3 Location CPU Internal External

4 Capacity Word size Number of words The natural unit of organisation
or Bytes

5 Unit of Transfer Internal External Addressable unit
Usually governed by data bus width External Usually a block which is much larger than a word Addressable unit Smallest location which can be uniquely addressed Word internally Cluster on M$ disks

6 Access Methods (1) Sequential Direct
Start at the beginning and read through in order Access time depends on location of data and previous location e.g. tape Direct Individual blocks have unique address Access is by jumping to vicinity plus sequential search Access time depends on location and previous location e.g. disk

7 Access Methods (2) Random Associative
Individual addresses identify locations exactly Access time is independent of location or previous access e.g. RAM Associative Data is located by a comparison with contents of a portion of the store e.g. cache

8 Internal or Main memory
Memory Hierarchy Registers In CPU Internal or Main memory May include one or more levels of cache “RAM” External memory Backing store

9 Memory Hierarchy - Diagram

10 Performance Access time Memory Cycle time Transfer Rate
Time between presenting the address and getting the valid data Memory Cycle time Time may be required for the memory to “recover” before next access Cycle time is access + recovery Transfer Rate Rate at which data can be moved

11 Physical Types Semiconductor Magnetic Optical Others RAM Disk & Tape
CD & DVD Others Bubble Hologram

12 Physical Characteristics
Decay Volatility Erasable Power consumption

13 Organisation Physical arrangement of bits into words Not always obvious e.g. interleaved

14 The Bottom Line How much? How fast? How expensive? Dilemma Capacity
Time is money How expensive? Dilemma Faster access time, greater cost per bit Greater capacity, smaller cost per bit Greater capacity, slower access time Another Dilemma As one goes down the hierarchy… Decreasing cost per bit Increasing capacity Increasing access time Decreasing frequency of access of the memory by the processor

15 Hierarchy List Registers L1 Cache L2 Cache Main memory Disk cache Disk Optical Tape

16 This would need no cache This would cost a very large amount
So you want fast? It is possible to build a computer which uses only static RAM (see later) This would be very fast This would need no cache How can you cache cache? This would cost a very large amount

17 Small amount of fast memory Sits between normal main memory and CPU
Cache …is intended to give memory speed approaching that of the fastest memories available, and to provide a large memory size at the price of less expensive types of semiconductor memory. Small amount of fast memory Sits between normal main memory and CPU May be located on CPU chip or module

18 Cache/Main Memory Structure
Main memory consists of up to 2n addressable words, with each word having a unique n-bit address. For mapping purposes, this memory is considered to consist of a number of fixed-length blocks of K words each. There are M = 2n /K blocks and the cache consists of C lines Each line contains K words and a tag of a few bits (line size) C << M.

19 Cache operation – overview
CPU requests contents of memory location Check cache for this data If present, get from cache (fast) If not present, read required block from main memory to cache Then deliver from cache to CPU Cache includes tags to identify which block of main memory is in each cache slot

20 Cache Read Operation - Flowchart

21 Replacement Algorithm Write Policy Block Size Number of Caches
Cache Design Design Elements Size Mapping Function Replacement Algorithm Write Policy Block Size Number of Caches

22 Size does matter Cost Speed More cache is expensive
More cache is faster (up to a point) Checking cache for data takes time

23 Typical Cache Organization

24 Comparison of Cache Sizes
Comparison of Cache Sizes Processor Type Year of Introduction L1 cachea L2 cache L3 cache IBM 360/85 Mainframe 1968 16 to 32 KB PDP-11/70 Minicomputer 1975 1 KB VAX 11/780 1978 16 KB IBM 3033 64 KB IBM 3090 1985 128 to 256 KB Intel 80486 PC 1989 8 KB Pentium 1993 8 KB/8 KB 256 to 512 KB PowerPC 601 32 KB PowerPC 620 1996 32 KB/32 KB PowerPC G4 PC/server 1999 256 KB to 1 MB 2 MB IBM S/390 G4 1997 256 KB IBM S/390 G6 8 MB Pentium 4 2000 IBM SP High-end server/ supercomputer 64 KB/32 KB CRAY MTAb Supercomputer Itanium 2001 16 KB/16 KB 96 KB 4 MB SGI Origin 2001 High-end server Itanium 2 2002 6 MB IBM POWER5 2003 1.9 MB 36 MB CRAY XD-1 2004 64 KB/64 KB 1MB a Two values seperated by a slash refer to instruction and data caches b Both caches are instruction only; no data caches

25 Mapping Function Cache of 64kByte Cache block of 4 bytes
i.e. cache is 16k (214) lines of 4 bytes 16MBytes main memory 24 bit address (224=16M)

26 Each block of main memory maps to only one cache line
Direct Mapping Each block of main memory maps to only one cache line i.e. if a block is in cache, it must be in one specific place Address is in two parts Least Significant w bits identify unique word Most Significant s bits specify one memory block The MSBs are split into a cache line field r and a tag of s-r (most significant)

27 Direct Mapping Address Structure
Tag s-r Line or Slot r Word w 14 2 8 24 bit address 2 bit word identifier (4 byte block) 22 bit block identifier 8 bit tag (=22-14) 14 bit slot or line No two blocks in the same line have the same Tag field Check contents of cache by finding line and checking Tag

28 Direct Mapping Cache Line Table
Cache line Main Memory blocks held 0 0, m, 2m, 3m…2s-m 1 1,m+1, 2m+1…2s-m+1 m-1 m-1, 2m-1,3m-1…2s-1

29 Direct Mapping Cache Organization

30 Direct Mapping Example

31 Direct Mapping Summary
Address length = (s + w) bits Number of addressable units = 2s+w words or bytes Block size = line size = 2w words or bytes Number of blocks in main memory = 2s+ w/2w = 2s Number of lines in cache = m = 2r Size of tag = (s – r) bits

32 Direct Mapping pros & cons
Simple Inexpensive Fixed location for given block If a program accesses 2 blocks that map to the same line repeatedly, cache misses are very high

33 Associative Mapping A main memory block can load into any line of cache Memory address is interpreted as tag and word Tag uniquely identifies block of memory Every line’s tag is examined for a match Cache searching gets expensive

34 Fully Associative Cache Organization

35 Associative Mapping Example

36 Associative Mapping Address Structure
Word 2 bit Tag 22 bit 22 bit tag stored with each 32 bit block of data Compare tag field with tag entry in cache to check for hit Least significant 2 bits of address identify which 16 bit word is required from 32 bit data block e.g. Address Tag Data Cache line FFFFFC FFFFFC FFF

37 Associative Mapping Summary
Address length = (s + w) bits Number of addressable units = 2s+w words or bytes Block size = line size = 2w words or bytes Number of blocks in main memory = 2s+ w/2w = 2s Number of lines in cache = undetermined Size of tag = s bits

38 Set Associative Mapping
Cache is divided into a number of sets Each set contains a number of lines A given block maps to any line in a given set e.g. Block B can be in any line of set i e.g. 2 lines per set 2 way associative mapping A given block can be in one of 2 lines in only one set

39 Set Associative Mapping Example
13 bit set number Block number in main memory is modulo 213 000000, 00A000, 00B000, 00C000 … map to same set

40 Two Way Set Associative Cache Organization

41 Set Associative Mapping Address Structure
Tag 9 bit Set 13 bit Word 2 bit Use set field to determine cache set to look in Compare tag field to see if we have a hit e.g Address Tag Data Set number 1FF 7FFC 1FF FFF 001 7FFC FFF

42 Two Way Set Associative Mapping Example

43 Set Associative Mapping Summary
Address length = (s + w) bits Number of addressable units = 2s+w words or bytes Block size = line size = 2w words or bytes Number of blocks in main memory = 2d Number of lines in set = k Number of sets = v = 2d Number of lines in cache = kv = k * 2d Size of tag = (s – d) bits

44 Replacement Algorithms (1) Direct mapping
No choice Each block only maps to one line Replace that line

45 Replacement Algorithms (2) Associative & Set Associative
Hardware implemented algorithm (speed) Least Recently used (LRU) e.g. in 2 way set associative Which of the 2 block is lru? First in first out (FIFO) replace block that has been in cache longest Least frequently used replace block which has had fewest hits Random

46 Write Policy Must not overwrite a cache block unless main memory is up to date Multiple CPUs may have individual caches I/O may address main memory directly

47 Write through All writes go to main memory as well as cache Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to date Lots of traffic Slows down writes Remember bogus write through caches!

48 Write back Updates initially made in cache only Update bit for cache slot is set when update occurs If block is to be replaced, write to main memory only if update bit is set Other caches get out of sync I/O must access main memory through cache N.B. 15% of memory references are writes

49 Pentium 4 Cache 80386 – no on chip cache
80486 – 8k using 16 byte lines and four way set associative organization Pentium (all versions) – two on chip L1 caches Data & instructions Pentium III – L3 cache added off chip Pentium 4 L1 caches 8k bytes 64 byte lines four way set associative L2 cache Feeding both L1 caches 256k 128 byte lines 8 way set associative L3 cache on chip

50 Processor on which feature first appears
Intel Cache Evolution Problem Solution Processor on which feature first appears External memory slower than the system bus. Add external cache using faster memory technology. 386 Increased processor speed results in external bus becoming a bottleneck for cache access. Move external cache on-chip, operating at the same speed as the processor. 486 Internal cache is rather small, due to limited space on chip Add external L2 cache using faster technology than main memory Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache. In that case, the Prefetcher is stalled while the Execution Unit’s data access takes place. Create separate data and instruction caches. Pentium Increased processor speed results in external bus becoming a bottleneck for L2 cache access. Create separate back-side bus that runs at higher speed than the main (front-side) external bus. The BSB is dedicated to the L2 cache. Pentium Pro Move L2 cache on to the processor chip. Pentium II Some applications deal with massive databases and must have rapid access to large amounts of data. The on-chip caches are too small. Add external L3 cache. Pentium III Move L3 cache on-chip. Pentium 4

51 Pentium 4 Block Diagram

52 Pentium 4 Core Processor
Fetch/Decode Unit Fetches instructions from L2 cache Decode into micro-ops Store micro-ops in L1 cache Out of order execution logic Schedules micro-ops Based on data dependence and resources May speculatively execute Execution units Execute micro-ops Data from L1 cache Results in registers Memory subsystem L2 cache and systems bus

53 Pentium 4 Design Reasoning
Decodes instructions into RISC like micro-ops before L1 cache Micro-ops fixed length Superscalar pipelining and scheduling Pentium instructions long & complex Performance improved by separating decoding from scheduling & pipelining (More later – ch14) Data cache is write back Can be configured to write through L1 cache controlled by 2 bits in register CD = cache disable NW = not write through 2 instructions to invalidate (flush) cache and write back then invalidate L2 and L3 8-way set-associative Line size 128 bytes

54 Internal Memory

55 Semiconductor Memory Types

56 Semiconductor Memory RAM
Misnamed as all semiconductor memory is random access Read/Write Volatile Temporary storage Static or dynamic

57 Memory Cell Operation Functional terminals – they carry electrical signal It sets the state of the cell to 1 or 0.

58 Bits stored as charge in capacitors Charges leak
Dynamic RAM Bits stored as charge in capacitors Charges leak Need refreshing even when powered Simpler construction Smaller per bit Less expensive Need refresh circuits Slower Main memory Essentially analogue Level of charge determines value

59 Dynamic RAM Structure Is activated when the bit value from this cell is to be read or written. Transistor acts as a switch that is closed (allowing current to flow) if a voltage is applied to the address line and open if no voltage is present on the address line. For read operation, when the address line is selected, the transistor turns on and the charge stored on the capacitor is fed out onto a bit line and to a sense amplifier, which compares the capacitor voltage to a reference value and determines if the cell contains a logic 1 or a logic 0. For write operation, a voltage signal is applied to the bit line; a high voltage represents 1 and a low voltage represents 0. A signal is then applied to the address line, allowing a charge to be transferred to the capacitor. The readout from the cell discharges the capacitor, which must be restored to complete the operation.

60 DRAM Operation Address line active when bit read or written Write Read
Transistor switch closed (current flows) Write Voltage to bit line High for 1 low for 0 Then signal address line Transfers charge to capacitor Read Address line selected transistor turns on Charge from capacitor fed via bit line to sense amplifier Compares with reference value to determine 0 or 1 Capacitor charge must be restored

61 Bits stored as on/off switches No charges to leak
Static RAM Bits stored as on/off switches No charges to leak No refreshing needed when powered More complex construction Larger per bit More expensive Does not need refresh circuits Faster Cache Digital Uses flip-flops

62 Stating RAM Structure The circles at the head of T3 and T4 indicate signal negation. Four transistors T1, T2, T3 and T4 are cross connected in an arrangement that produces a stable logic state. off on Both states are stable as long as the direct current (dc) voltage is applied. H L Logic state 1 or 0 on off

63 Transistor arrangement gives stable logic state State 1
Static RAM Operation Transistor arrangement gives stable logic state State 1 C1 high, C2 low T1 T4 off, T2 T3 on State 0 C2 high, C1 low T2 T3 off, T1 T4 on Address line transistors T5 T6 is switch Write – apply value to B & compliment to B Read – value is on line B

64 SRAM v DRAM Both volatile Dynamic cell Static
Power needed to preserve data Dynamic cell Simpler to build, smaller More dense Less expensive Needs refresh Larger memory units Static Faster Cache

65 Microprogramming (see later) Library subroutines
Read Only Memory (ROM) Permanent storage Nonvolatile Microprogramming (see later) Library subroutines Systems programs (BIOS) Function tables

66 Written during manufacture Programmable (once)
Types of ROM Written during manufacture Very expensive for small runs Programmable (once) PROM Needs special equipment to program Read “mostly” Erasable Programmable (EPROM) Erased by UV Electrically Erasable (EEPROM) Takes much longer to write than read Flash memory Erase whole memory electrically

67 Organisation in detail
A 16Mbit chip can be organised as 1M of 16 bit words A bit per chip system has 16 lots of 1Mbit chip with bit 1 of each word in chip 1 and so on A 16Mbit chip can be organised as a 2048 x 2048 x 4bit array Reduces number of address pins Multiplex row address and column address 11 pins to address (211=2048) Adding one more pin doubles range of values so x4 capacity

68 Refreshing Refresh circuit included on chip Disable chip Count through rows Read & Write back Takes time Slows down apparent performance

69 Typical 16 Mb DRAM (4M x 4)

70 Packaging

71 256kByte Module Organisation

72 1MByte Module Organisation

73 Detected using Hamming error correcting code
Error Correction Hard Failure Permanent defect Soft Error Random, non-destructive No permanent damage to memory Detected using Hamming error correcting code

74 Error Correcting Code Function

75 Advanced DRAM Organization
Basic DRAM same since first RAM chips Enhanced DRAM Contains small SRAM as well SRAM holds last line read (c.f. Cache!) Cache DRAM Larger SRAM component Use as cache or serial buffer

76 Synchronous DRAM (SDRAM)
Access is synchronized with an external clock Address is presented to RAM RAM finds data (CPU waits in conventional DRAM) Since SDRAM moves data in time with system clock, CPU knows when data will be ready CPU does not have to wait, it can do something else Burst mode allows SDRAM to set up stream of data and fire it out in block DDR-SDRAM sends data twice per clock cycle (leading & trailing edge)

77 SDRAM

78 SDRAM Read Timing

79 Adopted by Intel for Pentium & Itanium Main competitor to SDRAM
RAMBUS Adopted by Intel for Pentium & Itanium Main competitor to SDRAM Vertical package – all pins on one side Data exchange over 28 wires < cm long Bus addresses up to 320 RDRAM chips at 1.6Gbps Asynchronous block protocol 480ns access time Then 1.6 Gbps

80 RAMBUS Diagram

81 SDRAM can only send data once per clock
DDR SDRAM SDRAM can only send data once per clock Double-data-rate SDRAM can send data twice per clock cycle Rising edge and falling edge

82 Integrates small SRAM cache (16 kb) onto generic DRAM chip
Cache DRAM Mitsubishi Integrates small SRAM cache (16 kb) onto generic DRAM chip Used as true cache 64-bit lines Effective for ordinary random access To support serial access of block of data E.g. refresh bit-mapped screen CDRAM can prefetch data from DRAM into SRAM buffer Subsequent accesses solely to SRAM

83 Chapter 6 External Memory

84 Types of External Memory
Magnetic Disk RAID Removable Optical CD-ROM CD-Recordable (CD-R) CD-R/W DVD Magnetic Tape

85 Disk substrate coated with magnetizable material (iron oxide…rust)
Magnetic Disk Disk substrate coated with magnetizable material (iron oxide…rust) Substrate used to be aluminium Now glass Improved surface uniformity Increases reliability Reduction in surface defects Reduced read/write errors Lower flight heights (See later) Better stiffness Better shock/damage resistance

86 Read and Write Mechanisms
Recording & retrieval via conductive coil called a head May be single read/write head or separate ones During read/write, head is stationary, platter rotates Write Current through coil produces magnetic field Pulses sent to head Magnetic pattern recorded on surface below Read (traditional) Magnetic field moving relative to coil produces current Coil is the same for read and write Read (contemporary) Separate read head, close to write head Partially shielded magneto resistive (MR) sensor Electrical resistance depends on direction of magnetic field High frequency operation Higher storage density and speed

87 Inductive Write MR Read

88 Data Organization and Formatting
Concentric rings or tracks Gaps between tracks Reduce gap to increase capacity Same number of bits per track (variable packing density) Constant angular velocity Tracks divided into sectors Minimum block size is one sector May have more than one sector per block

89 Disk Data Layout

90 Disk Velocity Bit near centre of rotating disk passes fixed point slower than bit on outside of disk Increase spacing between bits in different tracks Rotate disk at constant angular velocity (CAV) Gives pie shaped sectors and concentric tracks Individual tracks and sectors addressable Move head to given track and wait for given sector Waste of space on outer tracks Lower data density Can use zones to increase capacity Each zone has fixed bits per track More complex circuitry

91 Disk Layout Methods Diagram

92 Must be able to identify start of track and sector Format disk
Finding Sectors Must be able to identify start of track and sector Format disk Additional information not available to user Marks tracks and sectors

93 Winchester Disk Format Seagate ST506

94 Fixed (rare) or movable head Removable or fixed
Characteristics Fixed (rare) or movable head Removable or fixed Single or double (usually) sided Single or multiple platter Head mechanism Contact (Floppy) Fixed gap Flying (Winchester)

95 Fixed/Movable Head Disk
Fixed head One read write head per track Heads mounted on fixed ridged arm Movable head One read write head per side Mounted on a movable arm

96 Removable or Not Removable disk Nonremovable disk
Can be removed from drive and replaced with another disk Provides unlimited storage capacity Easy data transfer between systems Nonremovable disk Permanently mounted in the drive

97 Heads are joined and aligned
Multiple Platter One head per side Heads are joined and aligned Aligned tracks on each platter form cylinders Data is striped by cylinder reduces head movement Increases speed (transfer rate)

98 Multiple Platters

99 Tracks and Cylinders

100 Floppy Disk 8”, 5.25”, 3.5” Small capacity Slow Universal Cheap
Up to 1.44Mbyte (2.88M never popular) Slow Universal Cheap Obsolete?

101 Winchester Hard Disk (1)
Developed by IBM in Winchester (USA) Sealed unit One or more platters (disks) Heads fly on boundary layer of air as disk spins Very small head to disk gap Getting more robust

102 Winchester Hard Disk (2)
Universal Cheap Fastest external storage Getting larger all the time 250 Gigabyte now easily available

103 Access time = Seek + Latency Transfer rate
Speed Seek time Moving head to correct track (Rotational) latency Waiting for data to rotate under head Access time = Seek + Latency Transfer rate

104 Timing of Disk I/O Transfer

105 RAID Redundant Array of Independent Disks Redundant Array of Inexpensive Disks 6 levels in common use Not a hierarchy Set of physical disks viewed as single logical drive by O/S Data distributed across physical drives Can use redundant capacity to store parity information

106 Data striped across all disks Round Robin striping Increase speed
RAID 0 No redundancy Data striped across all disks Round Robin striping Increase speed Multiple data requests probably not on same disk Disks seek in parallel A set of data is likely to be striped across multiple disks

107 Data is striped across disks 2 copies of each stripe on separate disks
RAID 1 Mirrored Disks Data is striped across disks 2 copies of each stripe on separate disks Read from either Write to both Recovery is simple Swap faulty disk & re-mirror No down time Expensive

108 Disks are synchronized Very small stripes
RAID 2 Disks are synchronized Very small stripes Often single byte/word Error correction calculated across corresponding bits on disks Multiple parity disks store Hamming code error correction in corresponding positions Lots of redundancy Expensive Not used

109 RAID 3 Similar to RAID 2 Only one redundant disk, no matter how large the array Simple parity bit for each set of corresponding bits Data on failed drive can be reconstructed from surviving data and parity info Very high transfer rates

110 RAID 4 Each disk operates independently Good for high I/O request rate Large stripes Bit by bit parity calculated across stripes on each disk Parity stored on parity disk

111 RAID 5 Like RAID 4 Parity striped across all disks Round robin allocation for parity stripe Avoids RAID 4 bottleneck at parity disk Commonly used in network servers N.B. DOES NOT MEAN 5 DISKS!!!!!

112 Two parity calculations Stored in separate blocks on different disks
RAID 6 Two parity calculations Stored in separate blocks on different disks User requirement of N disks needs N+2 High data availability Three disks need to fail for data loss Significant write penalty

113 RAID 0, 1, 2

114 RAID 3 & 4

115 RAID 5 & 6

116 Data Mapping For RAID 0

117 Optical Storage CD-ROM
Originally for audio 650Mbytes giving over 70 minutes audio Polycarbonate coated with highly reflective coat, usually aluminium Data stored as pits Read by reflecting laser Constant packing density Constant linear velocity

118 CD Operation

119 Other speeds are quoted as multiples e.g. 24x
CD-ROM Drive Speeds Audio is single speed Constant linier velocity 1.2 ms-1 Track (spiral) is 5.27km long Gives 4391 seconds = 73.2 minutes Other speeds are quoted as multiples e.g. 24x Quoted figure is maximum drive can achieve

120 CD-ROM Format Mode 0=blank data field Mode 1=2048 byte data+error correction Mode 2=2336 byte data

121 Random Access on CD-ROM
Difficult Move head to rough position Set correct speed Read address Adjust to required location (Yawn!)

122 CD-ROM for & against Large capacity (?) Easy to mass produce Removable Robust Expensive for small runs Slow Read only

123 Other Optical Storage CD-Recordable (CD-R) CD-RW WORM Now affordable
Compatible with CD-ROM drives CD-RW Erasable Getting cheaper Mostly CD-ROM drive compatible Phase change Material has two different reflectivities in different phase states

124 Digital Versatile Disk
DVD - what’s in a name? Digital Video Disk Used to indicate a player for movies Only plays video disks Digital Versatile Disk Used to indicate a computer drive Will read computer disks and play video disks Dogs Veritable Dinner Officially - nothing!!!

125 Very high capacity (4.7G per layer) Full length movie on single disk
DVD - technology Multi-layer Very high capacity (4.7G per layer) Full length movie on single disk Using MPEG compression Finally standardized (honest!) Movies carry regional coding Players only play correct region films Can be “fixed”

126 DVD – Writable Loads of trouble with standards First generation DVD drives may not read first generation DVD-W disks First generation DVD drives may not read CD-RW disks Wait for it to settle down before buying!

127 CD and DVD

128 Magnetic Tape Serial access Slow Very cheap Backup and archive

129 Optical Storage Technology Association
Internet Resources Optical Storage Technology Association Good source of information about optical storage technology and vendors Extensive list of relevant links DLTtape Good collection of technical information and links to vendors Search on RAID


Download ppt "Memory Cache Internal External"

Similar presentations


Ads by Google