Computer Organization and Architecture

Lecture 10: Memory Organisation

Memory Organisation Introduction Cache Internal Memory External Memory

Memory Systems
The 5 classic components of all computers: input, output, memory, and the processor (control and datapath).
Computer memory exhibits perhaps the widest range of type, technology, organisation, performance and cost of any feature of a computer system.

Technology Trends
Memory (DRAM) capacity: increases ~60% per year (4x every 3 years)
Speed: increases ~10% per year
Cost per bit: decreases ~25% per year
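A quick check of the compounding behind these trend figures, as a minimal sketch. The helper name `compound` is an illustrative assumption, not something from the lecture.

```python
def compound(rate_per_year: float, years: int) -> float:
    """Growth factor after compounding a yearly rate for a number of years."""
    return (1.0 + rate_per_year) ** years


# 60% per year compounds to roughly 4x over 3 years, as the slide states.
print(f"DRAM capacity after 3 years: x{compound(0.60, 3):.2f}")
# A 25% yearly decrease in cost per bit leaves well under half the cost.
print(f"Cost per bit after 3 years:  x{compound(-0.25, 3):.2f}")
# Speed grows far more slowly than capacity.
print(f"Speed after 3 years:         x{compound(0.10, 3):.2f}")
```

The slow growth in speed relative to capacity is what later motivates the cache.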

Types of Memory
Memory relies on a property of matter that can be modified (written) and later detected (read).
Factors that influence the choice of memory technology: frequency of access, response time, quantity required, cost (e.g. per bit).

Memory Characteristics
Location: processor, internal (main), external (secondary)
Capacity: word size, number of words
Unit of transfer: word, block
Access method: sequential, direct, random, associative
Performance: access time, cycle time, transfer rate
Physical type: semiconductor, magnetic, optical, magneto-optical
Physical characteristics: volatile/nonvolatile, erasable/nonerasable
Organisation

Capacity
1 word = 1, 2 or 4 bytes (the "natural" unit of organisation)
Unit of transfer
Internal: by words or bytes, governed by the data bus width
External: by blocks, which are much larger than a word

Access method
Sequential: linear, start to finish. Access time depends on location and previous location. Time highly variable, e.g. tape.
Direct: blocks have unique addresses. Jump to the vicinity, then search. Time variable, e.g. disk.
Random: individual addresses identify locations exactly. Access time is independent of location and previous access. Time constant, e.g. RAM, cache.
Associative: locate by comparison with the contents of a portion of the store. Access time is independent of location and previous access. Time constant, e.g. cache.

Performance
Access time (latency): for random access, the time to perform a read or write operation; for non-random access, the time to position the mechanism.
Memory cycle time (random access): access time plus recovery time.
Transfer rate: for random access, 1/(cycle time); for non-random access, T_N = T_A + n/R
T_N = average time to read or write n bits, T_A = average access time, n = number of bits, R = transfer rate in bits per second (bps)
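The non-random-access formula above can be evaluated directly. This is a minimal sketch; the function name and the device figures (10 ms access time, 1 Mbit/s transfer rate) are hypothetical values chosen for illustration.

```python
def transfer_time(t_access_s: float, n_bits: int, rate_bps: float) -> float:
    """Average time to read or write n bits: T_N = T_A + n / R."""
    return t_access_s + n_bits / rate_bps


# Hypothetical device: T_A = 10 ms, R = 1 Mbit/s, reading 8000 bits.
t = transfer_time(0.010, 8_000, 1_000_000)
print(f"{t * 1000:.1f} ms")  # 10 ms access + 8 ms transfer
```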

Physical type Semiconductor – RAM Magnetic – Disk & Tape Optical – CD & DVD Others Bubble Hologram

Physical characteristics
Volatility (decay), erasability, power consumption.
Volatile semiconductor memory: e.g. RAM. Non-volatile semiconductor memory: e.g. ROM and Flash.
Contents of modern SDRAM modules fade away in seconds at room temperature, but this can be extended to minutes at low temperature.
SRAM is significantly less power hungry than DRAM, especially when idle.
Organisation
Physical arrangement of bits to form words.

Memory Hierarchy
Levels, from fastest to slowest and cheapest (capacity grows down the list): Register, Cache, Main Memory, Magnetic Disk, Tape / Optical Disk.
Registers are part of the CPU. Cache and main memory are internal memory (data memory). Magnetic disk (data storage) and tape / optical disk (backup storage) are external memory.
Trade-off between 3 key characteristics: capacity, access time, cost. Key to success: decreasing frequency of access at the lower levels.
Register: ultra-fast (part of CPU)
Cache: very high speed program and data memory, can be multiple levels
Main memory: fast program and data memory, usually DRAM (PC) or SRAM (embedded)
Magnetic disk: "permanent" program and data storage, and temporary storage for RAM (memory management)
Tape / optical disk: backup storage, infrequent access, large and very slow

Memory Examples
RAM: cost RM 100 per GB, access time <60 ns
Hard disk drive: cost RM 2.5 per GB, access time ~10 ms
USB flash drive: cost RM 10 per GB, access time ~20 ms

Cache Memory
Growth rates: processor clock speed +55%, memory access rate +10%.
Recently, clock speed improvements have slowed significantly; speed improvements now come from more efficient architecture.
Hence the need for an intermediate form of memory that bridges the speed gap.

CPU Cache
Special buffer storage, small and fast, holding copies of data stored elsewhere or computed earlier, where the original data are expensive to fetch or compute.
A CPU cache is a cache used by the CPU of a computer to reduce the average time to access memory. The cache is a smaller, faster memory (its speed almost matches the CPU) which stores copies of the data from the most frequently used main memory locations. As long as most memory accesses are to cached memory locations, the average latency of memory accesses will be closer to the cache latency than to the latency of main memory.
When the processor wishes to read or write a location in main memory, it first checks whether that memory location is in the cache. This is accomplished by comparing the address of the memory location to all tags in the cache that might contain that address. If the processor finds that the memory location is in the cache, we say that a cache hit has occurred; otherwise we speak of a cache miss.
In the case of a cache hit, the processor immediately reads or writes the data in the cache line. The proportion of accesses that result in a cache hit is known as the hit rate, and is a measure of the effectiveness of the cache.
In the case of a cache miss, most caches allocate a new entry, which comprises the tag just missed and a copy of the data from memory. The reference can then be applied to the new entry just as in the case of a hit. Misses are slow because they require the data to be transferred from main memory, which is much slower than cache memory.
As opposed to a buffer, which is managed explicitly by a client, a cache stores data transparently: a client requesting data from a system is not aware that the cache exists, which is the origin of the name cache (from French "cacher", to conceal).
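The claim that average latency approaches the cache latency when most accesses hit can be made concrete with a weighted average. This is a sketch under stated assumptions: a miss is modelled as a cache check followed by a main memory access, the 60 ns main memory time comes from the RAM example earlier in the lecture, and the 1 ns cache latency is hypothetical.

```python
def average_access_time(hit_rate: float, t_cache_ns: float, t_main_ns: float) -> float:
    """Weighted average latency: a hit costs the cache latency,
    a miss costs the cache check plus the main memory access."""
    miss_rate = 1.0 - hit_rate
    return hit_rate * t_cache_ns + miss_rate * (t_cache_ns + t_main_ns)


# Assumed latencies: 1 ns cache (hypothetical), 60 ns main memory.
for h in (0.50, 0.90, 0.99):
    print(f"hit rate {h:.2f}: {average_access_time(h, 1.0, 60.0):.2f} ns")
```

Even a 90% hit rate leaves the average several times slower than the cache itself, which is why the hit rate is used as the measure of cache effectiveness.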

CPU Cache
CPU requests the contents of a memory location.
Check the cache for this data.
If present, get it from the cache (cache hit).
If not, read the required block from main memory into the cache (cache miss).
The cache includes tags to identify which block of main memory is in each line.

CPU Cache

Cache Design
Cache size
Line size
Mapping function: direct, associative, set associative
Replacement algorithm: LRU, LFU, FIFO, random
Write policy: write through, write back, write once
Number of caches: single / two level, unified / split

Cache Design: Cache Size
A bigger cache is more expensive, and faster only up to a point.
e.g. for Itanium: L1 16 kB / 16 kB, L2 96 kB, L3 4 MB

Cache Design: Mapping function (example)
Cache of 64 kB with cache blocks of 4 bytes, i.e. the cache is 16k (2^14) lines of 4 bytes.
16 MB main memory, so a 24-bit address (2^24 = 16M).

Cache Design: Direct mapping
Each main memory block maps to only one cache line.
The address is divided into 2 parts: the least significant w bits identify a unique word within a block, and the most significant s bits specify one memory block. The s bits are split into a cache line field of r bits and a tag of s-r bits.
Formula: i = j modulo m
i = cache line number, j = main memory block number, m = number of lines in the cache

Cache Design: Direct mapping
24-bit address = s + w: a 2-bit word identifier (4-byte block) and a 22-bit block identifier, which is split into an 8-bit tag and a 14-bit slot or line.
Check the contents of the cache by finding the line and checking the tag.
Address fields: Tag (s-r) = 8 bits, Line or Slot (r) = 14 bits, Word (w) = 2 bits
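The 8/14/2 split of the 24-bit address can be sketched with shifts and masks. The function name `split_direct` and the sample address are illustrative assumptions; the field widths are the ones given on the slide.

```python
WORD_BITS = 2    # 4-byte block
LINE_BITS = 14   # 16k cache lines
TAG_BITS = 8     # 24 - 14 - 2


def split_direct(addr: int) -> tuple[int, int, int]:
    """Split a 24-bit address into (tag, line, word) for direct mapping."""
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word


addr = 0x16339C  # sample address, chosen for illustration
tag, line, word = split_direct(addr)
print(f"tag={tag:02X} line={line:04X} word={word}")

# The line field agrees with i = j modulo m, where j is the block number.
block_number = addr >> WORD_BITS
assert line == block_number % (1 << LINE_BITS)
```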

Cache Design: Direct mapping
Simple and inexpensive.
Fixed location for a given block: if a program repeatedly accesses 2 blocks that map to the same line, cache misses are very high, so the hit ratio is low.
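The weakness described here, two blocks competing for the same line, can be shown with a tiny simulation. This is a minimal sketch assuming the 14-bit line and 2-bit word fields of the running example; the function name and the address traces are hypothetical.

```python
def direct_mapped_hits(addresses, line_bits=14, word_bits=2):
    """Count hits for an address trace in a direct-mapped cache."""
    lines = {}  # line number -> tag currently held in that line
    hits = 0
    for addr in addresses:
        block = addr >> word_bits
        line = block & ((1 << line_bits) - 1)
        tag = block >> line_bits
        if lines.get(line) == tag:
            hits += 1
        else:
            lines[line] = tag  # miss: the new block evicts the old one
    return hits


# Two blocks that map to the same line (line 0, tags 0 and 1): every access misses.
conflicting = [0x000000, 0x010000] * 5
# Two blocks in different lines: only the first access to each misses.
friendly = [0x000000, 0x000004] * 5

print(direct_mapped_hits(conflicting), "hits out of", len(conflicting))
print(direct_mapped_hits(friendly), "hits out of", len(friendly))
```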

Cache Design: Associative mapping
Each main memory block can be loaded into any cache line.
The memory address is interpreted as a tag and a word field; the tag uniquely identifies a block of memory.
Every line's tag is examined for a match.

Cache Design: Associative mapping
A 22-bit tag is stored with each 32-bit (4-byte) block of data.
Compare the tag field of the address with the tag entries in the cache to check for a hit.
The least significant 2 bits of the address identify which byte of the 4-byte block is required.
Address fields: Tag = 22 bits, Word = 2 bits

Cache Design: Set associative mapping
A compromise between direct and associative mapping.
The cache is divided into v sets of k lines each. A given block maps to any line in one given set.
Formula: m = v x k, i = j modulo v
i = cache set number, j = main memory block number, m = number of lines in the cache
e.g. 2 lines per set gives 2-way set associative mapping.

Cache Design: Set associative mapping
Use the set field to determine which cache set to look in, then compare the tag field to see if we have a hit.
Address fields: Tag = 9 bits, Set = 13 bits, Word = 2 bits
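A sketch of the 9/13/2 address split together with a small 2-way lookup. The field widths come from the slide; the function names, the sample trace and the use of LRU within each set are assumptions made for illustration.

```python
from collections import OrderedDict

WORD_BITS = 2   # 4-byte block
SET_BITS = 13   # 8k sets of 2 lines each = 16k lines


def split_set_assoc(addr: int) -> tuple[int, int, int]:
    """Split a 24-bit address into (tag, set, word)."""
    word = addr & ((1 << WORD_BITS) - 1)
    set_no = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_no, word


def set_assoc_hits(addresses, ways=2):
    """Count hits in a k-way set associative cache (LRU within a set, assumed)."""
    sets = {}  # set number -> OrderedDict of resident tags, least recent first
    hits = 0
    for addr in addresses:
        tag, set_no, _ = split_set_assoc(addr)
        lines = sets.setdefault(set_no, OrderedDict())
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)         # mark as most recently used
        else:
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used tag
            lines[tag] = True
    return hits


# Two blocks that fall in the same set can now both stay resident.
trace = [0x000000, 0x010000] * 5
print(set_assoc_hits(trace), "hits out of", len(trace))
```

With two lines per set, a pair of blocks that share a set index only miss on their first access each.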

Cache Design: Replacement Algorithm
Direct mapping: no choice, since each block maps to only one line.
Associative and set associative: Least Recently Used (LRU), First In First Out (FIFO), Least Frequently Used (LFU), Random.
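To compare two of the listed policies, the sketch below counts hits for LRU and FIFO on the same block reference string. The function names, the 3-line capacity and the reference string are hypothetical.

```python
from collections import OrderedDict, deque


def lru_hits(refs, capacity):
    """Least Recently Used: evict the block whose last use is oldest."""
    cache = OrderedDict()
    hits = 0
    for block in refs:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # refresh recency on a hit
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # drop the least recently used block
            cache[block] = True
    return hits


def fifo_hits(refs, capacity):
    """First In First Out: evict the block that was loaded earliest."""
    queue = deque()
    resident = set()
    hits = 0
    for block in refs:
        if block in resident:
            hits += 1                      # a hit does not change load order
        else:
            if len(queue) >= capacity:
                resident.discard(queue.popleft())
            queue.append(block)
            resident.add(block)
    return hits


refs = [1, 2, 3, 1, 4, 1, 5]
print("LRU hits: ", lru_hits(refs, 3))
print("FIFO hits:", fifo_hits(refs, 3))
```

On this string LRU keeps the frequently reused block 1 resident, while FIFO evicts it because it was loaded first.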

Cache Design: Write Policy
A cache block must not be overwritten unless main memory is up to date.
Problems: multiple CPUs may have individual caches; I/O may address main memory directly.

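Cache Design: Write Policy
Write through: simplest; every write goes to main memory immediately as well as to the cache. Lots of traffic, hence slow.
Write back: updates are made only in the cache, so the main memory copy is out of date until the modified block is written back when it is replaced. Faster, but other caches get out of sync, and I/O must access main memory through the cache.
About 15% of memory references are writes.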
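The traffic difference between the two policies can be sketched by counting main memory writes for a list of operations. This is a simplified model: the cache is assumed large enough that nothing is evicted until a final flush, and the function name and trace are hypothetical.

```python
def memory_writes(ops, policy):
    """Count main-memory writes for a list of ('r' | 'w', block) operations.

    Simplifying assumption: no block is evicted before the end of the trace,
    at which point every modified (dirty) block is written back once.
    """
    if policy not in ("write-through", "write-back"):
        raise ValueError(f"unknown policy: {policy}")
    dirty = set()
    writes = 0
    for op, block in ops:
        if op != "w":
            continue                # reads cause no memory writes
        if policy == "write-through":
            writes += 1             # every write goes straight to memory
        else:
            dirty.add(block)        # write back: just mark the block modified
    return writes + len(dirty)      # final flush of dirty blocks


ops = [("w", 0)] * 5 + [("r", 0)] + [("w", 1)] * 3
print("write-through:", memory_writes(ops, "write-through"))
print("write-back:   ", memory_writes(ops, "write-back"))
```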

Cache Design: Line Size
Initially, hit ratio increases as block size increases, due to the principle of locality (memory references tend to cluster).
At some point hit ratio decreases because:
Larger blocks reduce the number of blocks that fit in the cache.
In a larger block, each additional word is farther from the requested word, so it is less likely to be needed soon.

Cache Design: Number of Caches
Multilevel caches: L1 on-chip, L2 off-chip (SRAM).
Unified / split caches: one cache for both instructions and data, or separate caches for each.
A unified cache has a higher hit ratio and is simpler.
A split cache eliminates contention between instruction fetch and data access, enabling parallel processing and prefetching of predicted future instructions.

Pentium 4 Cache
80386: no on-chip cache
80486: 8 kB, using 16-byte lines and 4-way set associative organisation
Pentium (all versions): two on-chip L1 caches, one for data and one for instructions
Pentium 4 L1 caches: 8 kB, 64-byte lines, 4-way set associative
Pentium 4 L2 cache: 256 kB, 128-byte lines, 8-way set associative

Pentium 4 Diagram

Memory Management
Uni-program: memory is split into two parts, one for the operating system (resident monitor) and one for the currently executing program.
Multi-program: the "user" part is sub-divided and shared among active processes.
I/O is so slow that even in a multi-programming system, the CPU can be idle most of the time.

Memory Management
Solutions:
Increase main memory: expensive, and leads to larger programs
Swapping
Virtual memory

Memory Management: Swapping
A long-term queue of processes is stored on disk.
Processes are "swapped" in as space becomes available. When complete, a process is moved out of main memory.
If none of the processes in memory are ready (i.e. all are blocked on I/O): swap out a blocked process to an intermediate queue, then swap in a ready process or a new process.
But swapping is itself an I/O process...

Virtual Memory
Library analogy:
Main memory = a library
Virtual address = a book title
Physical address = location number
Page table = card catalogue
TLB = piece of paper
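The analogy can be sketched in code: the page table plays the card catalogue, and the TLB is the piece of paper holding recently looked-up locations. This is a minimal sketch, not how any particular system implements it; the 4 KiB page size, the function name and the table contents are all assumptions.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def translate(vaddr: int, page_table: dict, tlb: dict) -> tuple[int, str]:
    """Translate a virtual address to a physical address.

    Returns the physical address and where the mapping was found.
    """
    page, offset = divmod(vaddr, PAGE_SIZE)
    if page in tlb:
        frame, source = tlb[page], "TLB"
    else:
        # A missing entry here (KeyError) would correspond to a page fault.
        frame = page_table[page]
        tlb[page] = frame  # note it on the "piece of paper" for next time
        source = "page table"
    return frame * PAGE_SIZE + offset, source


page_table = {0: 5, 1: 2}  # hypothetical virtual page -> physical frame
tlb = {}
print(translate(4100, page_table, tlb))  # first lookup goes to the page table
print(translate(4100, page_table, tlb))  # repeat lookup is served by the TLB
```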

Summary
Memory technology: memory characteristics, types of memory / memory hierarchy
Cache memory: CPU cache, cache design, Pentium 4 cache
Memory management: virtual memory, TLB

Further Reading
Stallings, COA, 8th Edition, Chapters 4 & 8
Hayes, CAO, 3rd Edition, Chapter 6
Patterson, COD, 3rd Edition, Chapter 7

Memory Organisation Introduction Cache Internal Memory External Memory

Memory Types

Internal Memory a.k.a. primary storage/main memory May include cache
The only memory directly accessible to CPU, via a memory bus

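Semiconductor Memory Types
(columns: category; erasure; write mechanism; volatility)
RAM: read-write; electrically, byte-level; electrically; volatile
ROM: read-only; not possible; masks; non-volatile
Programmable ROM (PROM): read-only; not possible; electrically; non-volatile
Erasable PROM (EPROM): read-mostly; UV light, chip level; electrically; non-volatile
EEPROM: read-mostly; electrically, byte-level; electrically; non-volatile
Flash memory: read-mostly; electrically, block-level; electrically; non-volatile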

Semiconductor Memory Basic element of semiconductor memory is the memory cell

RAM
Random-access memory: misnamed, as all semiconductor memory is random access.
Read/write.
Volatile, so only temporary storage.
Static or dynamic.

RAM: DRAM
Main memory for computers and consoles.
Bits are stored as charge in capacitors. Charges leak, so a refresh circuit is needed.
Simpler construction: only one transistor and one capacitor are required per bit.
Smaller per bit, so higher density.
Less expensive, but slower.
Essentially analogue: the level of charge determines the value.

RAM: DRAM structure
The address line is active when a bit is read or written: the transistor switch is closed (current flows).
Write: apply a voltage to the bit line (high for 1, low for 0), then signal the address line. This transfers charge to the capacitor.
Read: the address line is selected and the transistor turns on. Charge from the capacitor is fed via the bit line to a sense amplifier, which compares it with a reference value to determine 0 or 1. The capacitor charge must then be restored.

RAM: SRAM
Used for cache, and as main memory in embedded systems.
Bits are stored as on/off switches. Essentially digital (only on and off), using flip-flops.
No charges to leak, so no refreshing is needed and it is less power hungry.
More complex construction: normally 6 transistors (6T) are required per bit, with no capacitor.
Larger per bit, so less dense.
More expensive, but faster.

RAM: SRAM structure
The transistor arrangement gives a stable logic state.
State 1: C1 high, C2 low; T1 and T4 off, T2 and T3 on.
State 0: C2 high, C1 low; T2 and T3 off, T1 and T4 on.
The address line transistors T5 and T6 act as switches.
Write: apply the value to line B and its complement to line B-bar.
Read: the value is on line B.

RAM: DRAM vs SRAM
Both are volatile: power is needed to preserve data.
DRAM: simpler and smaller per cell, more dense, less expensive.
SRAM: no refresh needed, faster.

ROM
Read Only Memory: nonvolatile, so permanent storage.
Uses: microprogramming, library subroutines, systems programs (BIOS), function tables.

ROM
ROM: written during manufacture; very expensive for small runs.
Programmable (PROM): written once; needs special equipment to program.
Read "mostly": Erasable Programmable (EPROM); Electrically Erasable (EEPROM), which takes much longer to write than to read; flash memory.

ROM: Flash memory
Used in memory cards, USB flash drives and SSDs.
A modern form of EEPROM, erased electrically in blocks (up to the whole memory) rather than byte by byte.
Two kinds, NOR flash and NAND flash, which differ in two important ways: the connections of the individual memory cells are different, and the interface for reading and writing the memory is different (NOR allows random access for reading, NAND allows only page access).

Chip Organisation
How to organise a 16 Mbit memory?
As 1M 16-bit words, or
as a 2048 x 2048 x 4-bit array (a square array), as in a typical DRAM that reads or writes 4 bits at a time.

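Chip Organisation: DRAM
Row and column selection each need log2(2048) = 11 address bits, 22 in total. The chip has only 11 address lines, half what is expected, because row and column addresses are multiplexed onto the same pins using select logic external to the chip, to minimise the number of pins.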
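The pin saving from multiplexing row and column addresses is simple arithmetic, sketched below for the 2048 x 2048 x 4-bit array. The function name is an illustrative assumption, and array dimensions are assumed to be powers of two.

```python
def address_lines(rows: int, cols: int) -> dict:
    """Address pins needed for a rows x cols DRAM array (powers of two assumed)."""
    row_bits = (rows - 1).bit_length()
    col_bits = (cols - 1).bit_length()
    return {
        "unmultiplexed": row_bits + col_bits,    # separate row and column pins
        "multiplexed": max(row_bits, col_bits),  # row, then column, on shared pins
    }


# 2048 x 2048 x 4 bits = 16 Mbit
assert 2048 * 2048 * 4 == 16 * 2**20
print(address_lines(2048, 2048))
```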

Chip Packaging
EPROM, 1M x 8, 32 pins (a standard chip package): address A0-A19; data out D0-D7; Vcc and Vss; chip enable (CE); program voltage (Vpp).
DRAM, 4M x 4, 24 pins: address A0-A10; data in/out D1-D4; Vcc and Vss; OE, WE, RAS and CAS; NC (no connect, to give an even number of pins).

Chip Organisation: How to refresh?
Disable the chip, count through the rows, and for each row read the data and write it back.
This takes time and slows down apparent performance.

Error Correction
Hard failure: a permanent defect, caused by a manufacturing defect or wear.
Soft error: random and non-destructive, with no permanent damage to memory; caused by power supply problems or alpha particles.
Error correction enhances reliability but adds complexity.

Error Correction
A simple parity code can only detect an odd number of bit errors.
The simplest error-correcting code is the Hamming code, named after Richard Hamming and used in RAM. It can detect up to two simultaneous bit errors and correct single-bit errors.
Formula: 2^K - 1 ≥ M + K, where M data bits need K check bits.
e.g. 4 data bits need 3 check bits, 8 need 4, 16 need 5.
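The check-bit formula and single-bit correction can be sketched as follows. The (7,4) layout used here, with even-parity check bits at positions 1, 2 and 4 and data at positions 3, 5, 6 and 7, is one common convention and an assumption of this example, as are the function names.

```python
def check_bits(m: int) -> int:
    """Smallest K satisfying 2^K - 1 >= M + K."""
    k = 1
    while 2 ** k - 1 < m + k:
        k += 1
    return k


def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    code = [0] * 8  # index 0 unused so that indices equal bit positions
    code[3], code[5], code[6], code[7] = data
    for p in (1, 2, 4):
        # Check bit p makes parity even over all positions that include bit p.
        for pos in range(1, 8):
            if pos != p and pos & p:
                code[p] ^= code[pos]
    return code[1:]


def hamming74_correct(word):
    """Return (data bits, syndrome); a non-zero syndrome is the flipped position."""
    code = [0] + list(word)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    if syndrome:
        code[syndrome] ^= 1  # flip the bit the syndrome points at
    return [code[3], code[5], code[6], code[7]], syndrome


print([check_bits(m) for m in (4, 8, 16)])

codeword = hamming74_encode([1, 0, 1, 1])
damaged = list(codeword)
damaged[4] ^= 1  # flip position 5 (list index 4)
print(hamming74_correct(damaged))
```

The syndrome is the XOR of the positions holding a 1, which is zero for a valid codeword and equals the position of the error after a single bit flip.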

Error Correction
With M bits of data to store, a function f produces K check bits, so the actual storage is M + K bits.

Error Correction Add from

External Memory
Magnetic disk, RAID
Optical: CD-ROM, CD-R (CD Recordable), CD-RW (CD Rewritable), DVD, Blu-ray
Magnetic tape

Magnetic Disk
Circular disk substrate coated with magnetisable material (iron oxide, cobalt-based alloy).
The substrate used to be aluminium and is now glass: improved surface uniformity, reduction in surface defects, lower flight heights (see later), better stiffness, better shock/damage resistance.
First HDD (invented by IBM): ~4.4 MB, 1,200 rpm, 8,800 characters/sec.
Desktop today: 3 TB (120 GB to 1.5 TB), 5,400 to 10,000 rpm, 0.5 Gbit/sec.

Magnetic Disk

Magnetic Disk: Read and Write Mechanisms
Recording and retrieval are via a conductive coil called a head. There may be a single read/write head or separate heads.
During read/write, the head is stationary and the platter rotates.
Write: current through the coil produces a magnetic field. Pulses are sent to the head, and the magnetic pattern is recorded on the surface below.

Magnetic Disk
Read (traditional): the magnetic field moving relative to the coil produces a current. The coil is the same for read and write.
Read (contemporary): a separate read head, close to the write head, made of a partially shielded magnetoresistive (MR) sensor whose electrical resistance depends on the direction of the magnetic field. Allows high frequency operation, hence higher storage density and speed.

Magnetic Disk Inductive write/MR read head

Magnetic Disk: Data Organization and Formatting
Concentric rings or tracks, with gaps between tracks; reduce the gap to increase capacity.
Same number of bits per track (variable packing density), with constant angular velocity.
Tracks are divided into sectors. The minimum block size is one sector.

Magnetic Disk Disk formatted with additional information not available to user, marks tracks and sectors

Magnetic Disk: Constant angular velocity (CAV)
Gives pie-shaped sectors and concentric tracks.
Individual tracks and sectors are addressable: move the head to a given track and wait for the given sector.
Wastes space on outer tracks, which have lower data density.
Can use zones to increase capacity: each zone has a fixed number of bits per track, and outer zones hold more bits (and more sectors) per track than inner zones. This needs more complex circuitry.

Magnetic Disk: Winchester Disk Format (Seagate ST506)
600 bytes/sector, but only 512 bytes are data.

Disk Characteristics
Head motion: fixed head, movable head
Disk portability: non-removable disk, removable disk
Sides: single sided, double sided
Platters: single platter, double platter
Head mechanism: contact (floppy), fixed gap, aerodynamic gap

Disk Characteristics: Head motion with respect to platter
Fixed head (rare today): one read/write head per track; heads mounted on a fixed rigid arm.
Movable head: one read/write head per surface; mounted on a movable arm.

Disk Characteristics: Disk portability
Non-removable disk: permanently mounted in the drive (C:).
Removable disk: can be removed from the drive and replaced with another disk. Provides unlimited storage capacity and easy data transfer between systems.

Disk Characteristics Magnetisable coating can be applied to both sides of platter Single sided (less expensive) Double sided

Disk Characteristics
Platters can be stacked vertically (multiple platter), with one head per side. Heads are joined and aligned.
Aligned tracks on each platter form cylinders.
Data is striped by cylinder, which reduces head movement and increases speed (transfer rate).

Disk Characteristics A: Platters B: Actuator Arm C: R/W Head
D:Cylinder E:Track F: Sectors

Disk Characteristics Head mechanism Contact (floppy disk)
8”, 5.25”, 3.5” Small capacity (3½” HD MB) Cheap & slow Fixed head-to-disk gap

Disk Characteristics: Aerodynamic gap
Winchester disk (IBM 3340): a sealed unit, free from contaminants.
Heads fly on the boundary layer of air as the disk spins.
Very small gap, giving greater data density.
Now the standard design.

Disk Characteristics
Values are listed in slide order; a row with fewer values than the others had shared or blank cells on the slide.
Drives: Seagate Barracuda ES.2; Seagate Barracuda 7200.10; Seagate Momentus; Hitachi Microdrive
Application: high-capacity server; high-performance desktop; entry-level desktop; laptop; handheld device
Capacity: 1 TB; 750 GB; 160 GB; 120 GB; 8 GB
Minimum track seek time: 0.8 ms; 0.3 ms; 1.0 ms; –
Average seek time: 8.5 ms; 3.6 ms; 9.5 ms; 12.5 ms; 12 ms
Spindle speed: 7200 rpm; 5400 rpm; 3600 rpm
Average rotational delay: 4.16 ms; 4.17 ms; 5.6 ms; 8.33 ms
Maximum transfer rate: 3 GB/s; 300 MB/s; 150 MB/s; 10 MB/s
Bytes per sector: 512 (all drives)
Tracks per cylinder: 8; 2

Disk Performance
The disk rotates at constant speed.
Access time = seek time + (rotational) latency
Seek time: moving the head to the correct track, < 10 ms.
(Rotational) latency: waiting for the data to rotate under the head.
Transfer rate

Disk Performance
Total average access time: Ta = Ts + 1/(2r) + b/(rN)
Ts = average seek time, r = rotation speed in revolutions per second (rpm/60), b = number of bytes to transfer, N = number of bytes on a track
There may also be a delay associated with the disk I/O operation.
If the data is sequential, only one Ts is incurred, but if it is scattered, there are many: hence the importance of defragmentation. This is also why transferring one large file is faster than transferring multiple small files.
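A sketch of the access time formula Ta = Ts + 1/(2r) + b/(rN). The formula needs r in revolutions per second for Ta to come out in seconds, so the function converts from rpm. The drive parameters (4 ms seek, 7200 rpm, 500 sectors of 512 bytes per track) are hypothetical, and the sequential case is simplified to a single seek and a single rotational delay.

```python
def disk_access_time(seek_s: float, rpm: float, nbytes: int, bytes_per_track: int) -> float:
    """Ta = Ts + 1/(2r) + b/(rN), with r converted to revolutions per second."""
    r = rpm / 60.0
    rotational_delay = 1.0 / (2.0 * r)     # on average, half a revolution
    transfer = nbytes / (r * bytes_per_track)
    return seek_s + rotational_delay + transfer


SECTOR = 512
TRACK = 500 * SECTOR   # hypothetical track size
N_SECTORS = 2500

# Scattered: every sector pays its own seek and rotational delay.
random_total = N_SECTORS * disk_access_time(0.004, 7200, SECTOR, TRACK)
# Sequential (simplified): one seek and one rotational delay for the whole run.
sequential_total = disk_access_time(0.004, 7200, N_SECTORS * SECTOR, TRACK)

print(f"random:     {random_total:.3f} s")
print(f"sequential: {sequential_total:.3f} s")
```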

RAID
Redundant Array of Independent Disks (originally Redundant Array of Inexpensive Disks).
7 levels in common use; not a hierarchy, but designations of different design architectures.
A set of physical disks is viewed as a single logical drive by the O/S.
Data is distributed across the physical drives.
Redundant capacity can be used to store parity information.
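How parity in the redundant capacity lets a lost drive be rebuilt can be sketched with byte-wise XOR. This is an illustration of the principle rather than of any specific RAID level; the function names and block contents are hypothetical.

```python
from functools import reduce


def parity(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity_block):
    """Recover a single missing block from the survivors and the parity block."""
    return parity(list(surviving_blocks) + [parity_block])


# Three hypothetical data blocks, one per drive, plus a parity block.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
p = parity(data)

# Suppose the second drive fails: XOR of the rest and the parity restores it.
recovered = rebuild([data[0], data[2]], p)
assert recovered == data[1]
print(recovered.hex())
```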

Optical Memory
The CD was originally for audio: 680 MB, giving over 70 minutes of audio.
Polycarbonate coated with a highly reflective coat, usually aluminium.
Data is stored as pits and read by reflecting a laser.
Constant packing density; constant linear velocity of 1.2 m/s.

Optical Memory

CD-ROM
1 block consists of:
Sync: identifies the beginning of a block; 1 byte of 0s, 10 bytes of 1s, 1 byte of 0s.
Header/ID: block address and mode byte.
Data: Mode 0 = blank data field; Mode 1 = 2048 bytes of data plus error correction; Mode 2 = 2336 bytes of data.

CD-ROM
CD-ROM: single spiral track, starting near the centre. Magnetic disk: concentric tracks.
CD-ROM: outer sectors are the same length as inner sectors. Magnetic disk: outer sectors are longer than inner sectors, unless multiple zoned recording is used.
CD-ROM: constant linear velocity (CLV), which makes random access harder. Magnetic disk: simple constant angular velocity (CAV).

CD-ROM
For: easy to mass produce, removable, robust.
Against: slow, read only, access time much longer than a magnetic disk drive.

Other Optical Storage
CD-Recordable (CD-R): WORM (write once read many, a bit like PROM); compatible with CD-ROM drives; uses a dye whose reflectivity is changed by the laser.
CD-Rewritable (CD-RW): erasable; mostly compatible with CD-ROM drives; uses phase change, where a material has two different reflectivities in two phase states: an amorphous state (molecules exhibit random orientation, poor reflection) and a crystalline state (smooth surface reflects light well).
Both are attractive for archival storage of documents and files.

DVD
Digital Video Disk: used to indicate a player for movies; only plays video disks.
Digital Versatile Disk: used to indicate a computer drive; will read computer disks and play video disks.

DVD
Very high capacity (4.7 GB per layer), 7 times as much as a CD.
Bits are packed more closely: uses 650 nm laser diode light, as opposed to 780 nm for CD.
A full-length movie fits on a single disk using MPEG compression.
Movies carry regional coding; players only play films from the correct region.
May employ a second layer to nearly double capacity (8.5 GB), and both sides to double again (17 GB).

DVD: DVD recordable
DVD-R / DVD-RW (DVD minus), DVD+R / DVD+RW (DVD plus) and DVD-RAM are all incompatible with each other.
A multi-format drive can read and write more than one format.
Like CD-R, they also use dye.

CD / DVD

Blu-ray
Competed with HD DVD to supersede DVD.
25 GB single layer, more than 5x a normal DVD.
Uses a 405 nm "blue" laser.
Employs DRM.
BD-R and BD-RE also exist.

Magnetic Tape
Flexible polyester tape coated with magnetisable material; analogous to a home tape recorder system.
Sequential-access device (a disk drive, by contrast, is a direct-access device).
Slow and very cheap; used for backup and archive.
The first kind of secondary memory, and still widely used today; the dominant technology is Linear Tape-Open (LTO).

Magnetic Tape Serpentine reading and writing

Summary
Internal memory: RAM, ROM, chip organisation, error correction
External memory: magnetic disk, optical memory, magnetic tape

Further Reading
Stallings, COA, 8th Edition, Chapters 5 & 6
Hayes, CAO, 3rd Edition, Chapter 6
