Memory Cache Internal External

Memory Cache Internal External
Computer memory is organized into hierarchy.

Location – is memory internal or external?
Characteristics We classify memory systems according to their characteristics. Location – is memory internal or external? Capacity – 1 byte or word = 8 bits or 16bits or 32bits Unit of transfer – equal to the No of data lines into and out of memory module. Access method – method of accessing units of data (sequential, direct, random, associative) Performance – access time (latency), memory cycle time, transfer rate Physical type – semiconductor, magnetic surface, optical Physical characteristics – …of data storage…volatile… Organisation – the physical arrangement of bits to form words.

Location CPU Internal External

Capacity Word size Number of words The natural unit of organisation
or Bytes

Unit of Transfer Internal External Addressable unit
Usually governed by data bus width External Usually a block which is much larger than a word Addressable unit Smallest location which can be uniquely addressed Word internally Cluster on M$ disks

Access Methods (1) Sequential Direct
Start at the beginning and read through in order Access time depends on location of data and previous location e.g. tape Direct Individual blocks have unique address Access is by jumping to vicinity plus sequential search Access time depends on location and previous location e.g. disk

Access Methods (2) Random Associative
Individual addresses identify locations exactly Access time is independent of location or previous access e.g. RAM Associative Data is located by a comparison with contents of a portion of the store e.g. cache

Internal or Main memory
Memory Hierarchy Registers In CPU Internal or Main memory May include one or more levels of cache “RAM” External memory Backing store

Memory Hierarchy - Diagram

Performance Access time Memory Cycle time Transfer Rate
Time between presenting the address and getting the valid data Memory Cycle time Time may be required for the memory to “recover” before next access Cycle time is access + recovery Transfer Rate Rate at which data can be moved

Physical Types Semiconductor Magnetic Optical Others RAM Disk & Tape
CD & DVD Others Bubble Hologram

Physical Characteristics
Decay Volatility Erasable Power consumption

Organisation Physical arrangement of bits into words Not always obvious e.g. interleaved

The Bottom Line How much? How fast? How expensive? Dilemma Capacity
Time is money How expensive? Dilemma Faster access time, greater cost per bit Greater capacity, smaller cost per bit Greater capacity, slower access time Another Dilemma As one goes down the hierarchy… Decreasing cost per bit Increasing capacity Increasing access time Decreasing frequency of access of the memory by the processor

Hierarchy List Registers L1 Cache L2 Cache Main memory Disk cache Disk Optical Tape

This would need no cache This would cost a very large amount
So you want fast? It is possible to build a computer which uses only static RAM (see later) This would be very fast This would need no cache How can you cache cache? This would cost a very large amount

Small amount of fast memory Sits between normal main memory and CPU
Cache …is intended to give memory speed approaching that of the fastest memories available, and to provide a large memory size at the price of less expensive types of semiconductor memory. Small amount of fast memory Sits between normal main memory and CPU May be located on CPU chip or module

Cache/Main Memory Structure
Main memory consists of up to 2n addressable words, with each word having a unique n-bit address. For mapping purposes, this memory is considered to consist of a number of fixed-length blocks of K words each. There are M = 2n /K blocks and the cache consists of C lines Each line contains K words and a tag of a few bits (line size) C << M.

Cache operation – overview
CPU requests contents of memory location Check cache for this data If present, get from cache (fast) If not present, read required block from main memory to cache Then deliver from cache to CPU Cache includes tags to identify which block of main memory is in each cache slot

Cache Read Operation - Flowchart

Replacement Algorithm Write Policy Block Size Number of Caches
Cache Design Design Elements Size Mapping Function Replacement Algorithm Write Policy Block Size Number of Caches

Size does matter Cost Speed More cache is expensive
More cache is faster (up to a point) Checking cache for data takes time

Typical Cache Organization

Comparison of Cache Sizes
Comparison of Cache Sizes Processor Type Year of Introduction L1 cachea L2 cache L3 cache IBM 360/85 Mainframe 1968 16 to 32 KB — PDP-11/70 Minicomputer 1975 1 KB VAX 11/780 1978 16 KB IBM 3033 64 KB IBM 3090 1985 128 to 256 KB Intel 80486 PC 1989 8 KB Pentium 1993 8 KB/8 KB 256 to 512 KB PowerPC 601 32 KB PowerPC 620 1996 32 KB/32 KB PowerPC G4 PC/server 1999 256 KB to 1 MB 2 MB IBM S/390 G4 1997 256 KB IBM S/390 G6 8 MB Pentium 4 2000 IBM SP High-end server/ supercomputer 64 KB/32 KB CRAY MTAb Supercomputer Itanium 2001 16 KB/16 KB 96 KB 4 MB SGI Origin 2001 High-end server Itanium 2 2002 6 MB IBM POWER5 2003 1.9 MB 36 MB CRAY XD-1 2004 64 KB/64 KB 1MB a Two values seperated by a slash refer to instruction and data caches b Both caches are instruction only; no data caches

Mapping Function Cache of 64kByte Cache block of 4 bytes
i.e. cache is 16k (214) lines of 4 bytes 16MBytes main memory 24 bit address (224=16M)

Each block of main memory maps to only one cache line
Direct Mapping Each block of main memory maps to only one cache line i.e. if a block is in cache, it must be in one specific place Address is in two parts Least Significant w bits identify unique word Most Significant s bits specify one memory block The MSBs are split into a cache line field r and a tag of s-r (most significant)

Direct Mapping Address Structure
Tag s-r Line or Slot r Word w 14 2 8 24 bit address 2 bit word identifier (4 byte block) 22 bit block identifier 8 bit tag (=22-14) 14 bit slot or line No two blocks in the same line have the same Tag field Check contents of cache by finding line and checking Tag

Direct Mapping Cache Line Table
Cache line Main Memory blocks held 0 0, m, 2m, 3m…2s-m 1 1,m+1, 2m+1…2s-m+1 m-1 m-1, 2m-1,3m-1…2s-1

Direct Mapping Cache Organization

Direct Mapping Example

Direct Mapping Summary
Address length = (s + w) bits Number of addressable units = 2s+w words or bytes Block size = line size = 2w words or bytes Number of blocks in main memory = 2s+ w/2w = 2s Number of lines in cache = m = 2r Size of tag = (s – r) bits

Direct Mapping pros & cons
Simple Inexpensive Fixed location for given block If a program accesses 2 blocks that map to the same line repeatedly, cache misses are very high

Associative Mapping A main memory block can load into any line of cache Memory address is interpreted as tag and word Tag uniquely identifies block of memory Every line’s tag is examined for a match Cache searching gets expensive

Fully Associative Cache Organization

Associative Mapping Example

Associative Mapping Address Structure
Word 2 bit Tag 22 bit 22 bit tag stored with each 32 bit block of data Compare tag field with tag entry in cache to check for hit Least significant 2 bits of address identify which 16 bit word is required from 32 bit data block e.g. Address Tag Data Cache line FFFFFC FFFFFC FFF

Associative Mapping Summary
Address length = (s + w) bits Number of addressable units = 2s+w words or bytes Block size = line size = 2w words or bytes Number of blocks in main memory = 2s+ w/2w = 2s Number of lines in cache = undetermined Size of tag = s bits

Set Associative Mapping
Cache is divided into a number of sets Each set contains a number of lines A given block maps to any line in a given set e.g. Block B can be in any line of set i e.g. 2 lines per set 2 way associative mapping A given block can be in one of 2 lines in only one set

Set Associative Mapping Example
13 bit set number Block number in main memory is modulo 213 000000, 00A000, 00B000, 00C000 … map to same set

Two Way Set Associative Cache Organization

Set Associative Mapping Address Structure
Tag 9 bit Set 13 bit Word 2 bit Use set field to determine cache set to look in Compare tag field to see if we have a hit e.g Address Tag Data Set number 1FF 7FFC 1FF FFF 001 7FFC FFF

Two Way Set Associative Mapping Example

Set Associative Mapping Summary
Address length = (s + w) bits Number of addressable units = 2s+w words or bytes Block size = line size = 2w words or bytes Number of blocks in main memory = 2d Number of lines in set = k Number of sets = v = 2d Number of lines in cache = kv = k * 2d Size of tag = (s – d) bits

Replacement Algorithms (1) Direct mapping
No choice Each block only maps to one line Replace that line

Replacement Algorithms (2) Associative & Set Associative
Hardware implemented algorithm (speed) Least Recently used (LRU) e.g. in 2 way set associative Which of the 2 block is lru? First in first out (FIFO) replace block that has been in cache longest Least frequently used replace block which has had fewest hits Random

Write Policy Must not overwrite a cache block unless main memory is up to date Multiple CPUs may have individual caches I/O may address main memory directly

Write through All writes go to main memory as well as cache Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to date Lots of traffic Slows down writes Remember bogus write through caches!

Write back Updates initially made in cache only Update bit for cache slot is set when update occurs If block is to be replaced, write to main memory only if update bit is set Other caches get out of sync I/O must access main memory through cache N.B. 15% of memory references are writes

Pentium 4 Cache 80386 – no on chip cache
80486 – 8k using 16 byte lines and four way set associative organization Pentium (all versions) – two on chip L1 caches Data & instructions Pentium III – L3 cache added off chip Pentium 4 L1 caches 8k bytes 64 byte lines four way set associative L2 cache Feeding both L1 caches 256k 128 byte lines 8 way set associative L3 cache on chip

Processor on which feature first appears
Intel Cache Evolution Problem Solution Processor on which feature first appears External memory slower than the system bus. Add external cache using faster memory technology. 386 Increased processor speed results in external bus becoming a bottleneck for cache access. Move external cache on-chip, operating at the same speed as the processor. 486 Internal cache is rather small, due to limited space on chip Add external L2 cache using faster technology than main memory Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache. In that case, the Prefetcher is stalled while the Execution Unit’s data access takes place. Create separate data and instruction caches. Pentium Increased processor speed results in external bus becoming a bottleneck for L2 cache access. Create separate back-side bus that runs at higher speed than the main (front-side) external bus. The BSB is dedicated to the L2 cache. Pentium Pro Move L2 cache on to the processor chip. Pentium II Some applications deal with massive databases and must have rapid access to large amounts of data. The on-chip caches are too small. Add external L3 cache. Pentium III Move L3 cache on-chip. Pentium 4

Pentium 4 Block Diagram

Pentium 4 Core Processor
Fetch/Decode Unit Fetches instructions from L2 cache Decode into micro-ops Store micro-ops in L1 cache Out of order execution logic Schedules micro-ops Based on data dependence and resources May speculatively execute Execution units Execute micro-ops Data from L1 cache Results in registers Memory subsystem L2 cache and systems bus

Pentium 4 Design Reasoning
Decodes instructions into RISC like micro-ops before L1 cache Micro-ops fixed length Superscalar pipelining and scheduling Pentium instructions long & complex Performance improved by separating decoding from scheduling & pipelining (More later – ch14) Data cache is write back Can be configured to write through L1 cache controlled by 2 bits in register CD = cache disable NW = not write through 2 instructions to invalidate (flush) cache and write back then invalidate L2 and L3 8-way set-associative Line size 128 bytes

Internal Memory

Semiconductor Memory Types

Semiconductor Memory RAM
Misnamed as all semiconductor memory is random access Read/Write Volatile Temporary storage Static or dynamic

Memory Cell Operation Functional terminals – they carry electrical signal It sets the state of the cell to 1 or 0.

Bits stored as charge in capacitors Charges leak
Dynamic RAM Bits stored as charge in capacitors Charges leak Need refreshing even when powered Simpler construction Smaller per bit Less expensive Need refresh circuits Slower Main memory Essentially analogue Level of charge determines value

Dynamic RAM Structure Is activated when the bit value from this cell is to be read or written. Transistor acts as a switch that is closed (allowing current to flow) if a voltage is applied to the address line and open if no voltage is present on the address line. For read operation, when the address line is selected, the transistor turns on and the charge stored on the capacitor is fed out onto a bit line and to a sense amplifier, which compares the capacitor voltage to a reference value and determines if the cell contains a logic 1 or a logic 0. For write operation, a voltage signal is applied to the bit line; a high voltage represents 1 and a low voltage represents 0. A signal is then applied to the address line, allowing a charge to be transferred to the capacitor. The readout from the cell discharges the capacitor, which must be restored to complete the operation.

DRAM Operation Address line active when bit read or written Write Read
Transistor switch closed (current flows) Write Voltage to bit line High for 1 low for 0 Then signal address line Transfers charge to capacitor Read Address line selected transistor turns on Charge from capacitor fed via bit line to sense amplifier Compares with reference value to determine 0 or 1 Capacitor charge must be restored

Bits stored as on/off switches No charges to leak
Static RAM Bits stored as on/off switches No charges to leak No refreshing needed when powered More complex construction Larger per bit More expensive Does not need refresh circuits Faster Cache Digital Uses flip-flops

Stating RAM Structure The circles at the head of T3 and T4 indicate signal negation. Four transistors T1, T2, T3 and T4 are cross connected in an arrangement that produces a stable logic state. off on Both states are stable as long as the direct current (dc) voltage is applied. H L Logic state 1 or 0 on off

Transistor arrangement gives stable logic state State 1
Static RAM Operation Transistor arrangement gives stable logic state State 1 C1 high, C2 low T1 T4 off, T2 T3 on State 0 C2 high, C1 low T2 T3 off, T1 T4 on Address line transistors T5 T6 is switch Write – apply value to B & compliment to B Read – value is on line B

SRAM v DRAM Both volatile Dynamic cell Static
Power needed to preserve data Dynamic cell Simpler to build, smaller More dense Less expensive Needs refresh Larger memory units Static Faster Cache

Microprogramming (see later) Library subroutines
Read Only Memory (ROM) Permanent storage Nonvolatile Microprogramming (see later) Library subroutines Systems programs (BIOS) Function tables

Written during manufacture Programmable (once)
Types of ROM Written during manufacture Very expensive for small runs Programmable (once) PROM Needs special equipment to program Read “mostly” Erasable Programmable (EPROM) Erased by UV Electrically Erasable (EEPROM) Takes much longer to write than read Flash memory Erase whole memory electrically

Organisation in detail
A 16Mbit chip can be organised as 1M of 16 bit words A bit per chip system has 16 lots of 1Mbit chip with bit 1 of each word in chip 1 and so on A 16Mbit chip can be organised as a 2048 x 2048 x 4bit array Reduces number of address pins Multiplex row address and column address 11 pins to address (211=2048) Adding one more pin doubles range of values so x4 capacity

Refreshing Refresh circuit included on chip Disable chip Count through rows Read & Write back Takes time Slows down apparent performance

Typical 16 Mb DRAM (4M x 4)

Packaging

256kByte Module Organisation

1MByte Module Organisation

Detected using Hamming error correcting code
Error Correction Hard Failure Permanent defect Soft Error Random, non-destructive No permanent damage to memory Detected using Hamming error correcting code

Error Correcting Code Function

Advanced DRAM Organization
Basic DRAM same since first RAM chips Enhanced DRAM Contains small SRAM as well SRAM holds last line read (c.f. Cache!) Cache DRAM Larger SRAM component Use as cache or serial buffer

Synchronous DRAM (SDRAM)
Access is synchronized with an external clock Address is presented to RAM RAM finds data (CPU waits in conventional DRAM) Since SDRAM moves data in time with system clock, CPU knows when data will be ready CPU does not have to wait, it can do something else Burst mode allows SDRAM to set up stream of data and fire it out in block DDR-SDRAM sends data twice per clock cycle (leading & trailing edge)

SDRAM Read Timing

Adopted by Intel for Pentium & Itanium Main competitor to SDRAM
RAMBUS Adopted by Intel for Pentium & Itanium Main competitor to SDRAM Vertical package – all pins on one side Data exchange over 28 wires < cm long Bus addresses up to 320 RDRAM chips at 1.6Gbps Asynchronous block protocol 480ns access time Then 1.6 Gbps

RAMBUS Diagram

SDRAM can only send data once per clock
DDR SDRAM SDRAM can only send data once per clock Double-data-rate SDRAM can send data twice per clock cycle Rising edge and falling edge

Integrates small SRAM cache (16 kb) onto generic DRAM chip
Cache DRAM Mitsubishi Integrates small SRAM cache (16 kb) onto generic DRAM chip Used as true cache 64-bit lines Effective for ordinary random access To support serial access of block of data E.g. refresh bit-mapped screen CDRAM can prefetch data from DRAM into SRAM buffer Subsequent accesses solely to SRAM

Chapter 6 External Memory

Types of External Memory
Magnetic Disk RAID Removable Optical CD-ROM CD-Recordable (CD-R) CD-R/W DVD Magnetic Tape

Disk substrate coated with magnetizable material (iron oxide…rust)
Magnetic Disk Disk substrate coated with magnetizable material (iron oxide…rust) Substrate used to be aluminium Now glass Improved surface uniformity Increases reliability Reduction in surface defects Reduced read/write errors Lower flight heights (See later) Better stiffness Better shock/damage resistance

Read and Write Mechanisms
Recording & retrieval via conductive coil called a head May be single read/write head or separate ones During read/write, head is stationary, platter rotates Write Current through coil produces magnetic field Pulses sent to head Magnetic pattern recorded on surface below Read (traditional) Magnetic field moving relative to coil produces current Coil is the same for read and write Read (contemporary) Separate read head, close to write head Partially shielded magneto resistive (MR) sensor Electrical resistance depends on direction of magnetic field High frequency operation Higher storage density and speed

Inductive Write MR Read

Data Organization and Formatting
Concentric rings or tracks Gaps between tracks Reduce gap to increase capacity Same number of bits per track (variable packing density) Constant angular velocity Tracks divided into sectors Minimum block size is one sector May have more than one sector per block

Disk Data Layout

Disk Velocity Bit near centre of rotating disk passes fixed point slower than bit on outside of disk Increase spacing between bits in different tracks Rotate disk at constant angular velocity (CAV) Gives pie shaped sectors and concentric tracks Individual tracks and sectors addressable Move head to given track and wait for given sector Waste of space on outer tracks Lower data density Can use zones to increase capacity Each zone has fixed bits per track More complex circuitry

Disk Layout Methods Diagram

Must be able to identify start of track and sector Format disk
Finding Sectors Must be able to identify start of track and sector Format disk Additional information not available to user Marks tracks and sectors

Winchester Disk Format Seagate ST506

Fixed (rare) or movable head Removable or fixed
Characteristics Fixed (rare) or movable head Removable or fixed Single or double (usually) sided Single or multiple platter Head mechanism Contact (Floppy) Fixed gap Flying (Winchester)

Fixed/Movable Head Disk
Fixed head One read write head per track Heads mounted on fixed ridged arm Movable head One read write head per side Mounted on a movable arm

Removable or Not Removable disk Nonremovable disk
Can be removed from drive and replaced with another disk Provides unlimited storage capacity Easy data transfer between systems Nonremovable disk Permanently mounted in the drive

Heads are joined and aligned
Multiple Platter One head per side Heads are joined and aligned Aligned tracks on each platter form cylinders Data is striped by cylinder reduces head movement Increases speed (transfer rate)

Multiple Platters

Tracks and Cylinders

Floppy Disk 8”, 5.25”, 3.5” Small capacity Slow Universal Cheap
Up to 1.44Mbyte (2.88M never popular) Slow Universal Cheap Obsolete?

Winchester Hard Disk (1)
Developed by IBM in Winchester (USA) Sealed unit One or more platters (disks) Heads fly on boundary layer of air as disk spins Very small head to disk gap Getting more robust

Winchester Hard Disk (2)
Universal Cheap Fastest external storage Getting larger all the time 250 Gigabyte now easily available

Access time = Seek + Latency Transfer rate
Speed Seek time Moving head to correct track (Rotational) latency Waiting for data to rotate under head Access time = Seek + Latency Transfer rate

Timing of Disk I/O Transfer

RAID Redundant Array of Independent Disks Redundant Array of Inexpensive Disks 6 levels in common use Not a hierarchy Set of physical disks viewed as single logical drive by O/S Data distributed across physical drives Can use redundant capacity to store parity information

Data striped across all disks Round Robin striping Increase speed
RAID 0 No redundancy Data striped across all disks Round Robin striping Increase speed Multiple data requests probably not on same disk Disks seek in parallel A set of data is likely to be striped across multiple disks

Data is striped across disks 2 copies of each stripe on separate disks
RAID 1 Mirrored Disks Data is striped across disks 2 copies of each stripe on separate disks Read from either Write to both Recovery is simple Swap faulty disk & re-mirror No down time Expensive

Disks are synchronized Very small stripes
RAID 2 Disks are synchronized Very small stripes Often single byte/word Error correction calculated across corresponding bits on disks Multiple parity disks store Hamming code error correction in corresponding positions Lots of redundancy Expensive Not used

RAID 3 Similar to RAID 2 Only one redundant disk, no matter how large the array Simple parity bit for each set of corresponding bits Data on failed drive can be reconstructed from surviving data and parity info Very high transfer rates

RAID 4 Each disk operates independently Good for high I/O request rate Large stripes Bit by bit parity calculated across stripes on each disk Parity stored on parity disk

RAID 5 Like RAID 4 Parity striped across all disks Round robin allocation for parity stripe Avoids RAID 4 bottleneck at parity disk Commonly used in network servers N.B. DOES NOT MEAN 5 DISKS!!!!!

Two parity calculations Stored in separate blocks on different disks
RAID 6 Two parity calculations Stored in separate blocks on different disks User requirement of N disks needs N+2 High data availability Three disks need to fail for data loss Significant write penalty

RAID 0, 1, 2

RAID 3 & 4

RAID 5 & 6

Data Mapping For RAID 0

Optical Storage CD-ROM
Originally for audio 650Mbytes giving over 70 minutes audio Polycarbonate coated with highly reflective coat, usually aluminium Data stored as pits Read by reflecting laser Constant packing density Constant linear velocity

CD Operation

Other speeds are quoted as multiples e.g. 24x
CD-ROM Drive Speeds Audio is single speed Constant linier velocity 1.2 ms-1 Track (spiral) is 5.27km long Gives 4391 seconds = 73.2 minutes Other speeds are quoted as multiples e.g. 24x Quoted figure is maximum drive can achieve

CD-ROM Format Mode 0=blank data field Mode 1=2048 byte data+error correction Mode 2=2336 byte data

Random Access on CD-ROM
Difficult Move head to rough position Set correct speed Read address Adjust to required location (Yawn!)

CD-ROM for & against Large capacity (?) Easy to mass produce Removable Robust Expensive for small runs Slow Read only

Other Optical Storage CD-Recordable (CD-R) CD-RW WORM Now affordable
Compatible with CD-ROM drives CD-RW Erasable Getting cheaper Mostly CD-ROM drive compatible Phase change Material has two different reflectivities in different phase states

Digital Versatile Disk
DVD - what’s in a name? Digital Video Disk Used to indicate a player for movies Only plays video disks Digital Versatile Disk Used to indicate a computer drive Will read computer disks and play video disks Dogs Veritable Dinner Officially - nothing!!!

Very high capacity (4.7G per layer) Full length movie on single disk
DVD - technology Multi-layer Very high capacity (4.7G per layer) Full length movie on single disk Using MPEG compression Finally standardized (honest!) Movies carry regional coding Players only play correct region films Can be “fixed”

DVD – Writable Loads of trouble with standards First generation DVD drives may not read first generation DVD-W disks First generation DVD drives may not read CD-RW disks Wait for it to settle down before buying!

CD and DVD

Magnetic Tape Serial access Slow Very cheap Backup and archive

Optical Storage Technology Association
Internet Resources Optical Storage Technology Association Good source of information about optical storage technology and vendors Extensive list of relevant links DLTtape Good collection of technical information and links to vendors Search on RAID

Memory Cache Internal External

Similar presentations

Presentation on theme: "Memory Cache Internal External"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Memory Cache Internal External

Similar presentations

Presentation on theme: "Memory Cache Internal External"— Presentation transcript:

Similar presentations

About project

Feedback