Memory System Design.


1 Memory System Design

2 Characteristics of a Memory System
Location: Processor; Internal (Main); External (Secondary). Capacity: Word Size; Number of Words. Unit of Transfer: Word; Block.

3 Characteristics of a Memory System
Access Method. Sequential (Tape): start at the beginning and read through in order; access time depends on the location of the data and on the previous location. Direct (Disk): individual blocks have unique addresses; access is by jumping to the vicinity plus a sequential search. Random (RAM/ROM): individual addresses identify locations exactly; access time is independent of location and previous access.

4 Characteristics of a Memory System
Access Method (Contd.). Associative (Cache): access is based on content; data is located by comparing the argument with the contents of a portion of the store; access time is independent of location and previous access.

5 Characteristics of a Memory System
Performance. Access Time: the time between presenting the address and getting valid data. Cycle Time: the memory may require time to “recover” before the next access; cycle time = access time + recovery time. Transfer Rate: the rate at which data can be moved.

6 Characteristics of a Memory System
Physical Type. Semiconductor: RAM/ROM. Magnetic: disk and tape. Optical: CD and DVD. Magneto-Optical: CD-RW.

7 Characteristics of a Memory System
Physical Characteristics: charge decay, volatile/non-volatile, erasable/non-erasable, power consumption. Organization: the physical arrangement of bits into words; not always obvious.

8 Memory Hierarchy Memory design is governed by three questions:
How large? How fast? How much does it cost? Three rules: the faster the access time, the greater the cost per bit; the greater the capacity, the smaller the cost per bit; the greater the capacity, the slower the access time. To resolve this dilemma, designers use a hierarchy of memory systems.

9 Memory Hierarchy
Inboard memory: registers, cache, main memory. Outboard storage: magnetic disk, CD-ROM, CD-RW, DVD, DVD-RW. Off-line storage: magnetic tape, WORM. Moving down the hierarchy: lower cost per bit, greater capacity, greater access time, lower frequency of access.

10 Locality of Reference The memory hierarchy presented works because of a natural phenomenon known as “locality of reference”. During the execution of a program, memory references for instructions and data tend to cluster. Keeping the current cluster in the faster memory level allows faster memory access.
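As a toy sketch of this clustering (all address values here are invented for illustration), the trace of a simple loop shows the same few instruction addresses recurring on every iteration (temporal locality) while the data addresses advance through adjacent words (spatial locality):

```python
from collections import Counter

def loop_trace(iterations):
    """Toy address trace for a loop that sums an array (addresses are made up)."""
    trace = []
    for i in range(iterations):
        trace += [0x100, 0x104, 0x108]   # the loop's three instructions recur (temporal locality)
        trace.append(0x2000 + 4 * i)     # data accesses march through adjacent words (spatial locality)
    return trace

counts = Counter(loop_trace(100))
# 400 references in all, but only 103 distinct addresses, and the three
# instruction addresses account for 300 of the references.
```

Keeping that small cluster of hot addresses in a fast memory level is exactly what the hierarchy exploits.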

11 Main Memory Relatively large and fast.
Used to store programs and data during computer operation. The principal technology is based on semiconductor ICs. Usually referred to as Random Access Memory (RAM). A more accurate name would be Read/Write Memory (R/WM).

12 RAM Allows both read and write operations; both are performed electrically. Volatile: used for temporary storage only; if power is disconnected, the contents become invalid. Two main varieties: static and dynamic.

13 Dynamic RAM (DRAM) Usually used for main memory in most computer systems. Inexpensive: uses only one transistor per bit. Data is stored as charge in capacitors. Destructive read: the charge on the capacitor is drained during a read, so the data must be re-written after each read.

14 DRAM – (Contd.) Charge on a capacitor decays naturally.
Therefore, DRAM needs refreshing even when powered in order to maintain the data. Refreshing is done by reading and re-writing each word every few milliseconds (the refresh rate). During “suspended” operation, notebook computers use power mainly for DRAM refresh.

15 Static RAM (SRAM) Consists of internal flip-flop-like structures that store the binary information. No charge to leak, so no refreshing is needed. Non-destructive read. More complex construction: larger cell, less dense, more expensive. Faster. Usually used for cache memory.

16 SRAM vs. DRAM Storage cells in DRAM are simpler and smaller.
DRAM is denser: more bits per unit area. DRAM is less expensive. DRAM uses less power. DRAM requires extra circuitry to implement the refresh mechanism. DRAM is slower.

17 SRAM Chip Organization

18 Bi-directional Data in and Out Pins

19 Read Only Memory (ROM) Can be read but not written. Non-volatile. Used for:
Microprogramming. System programs. Whole programs in embedded systems. Library subroutines and function tables. Constants. Manufactured with the data wired into the chip. No room for mistakes.

20 ROM Structure

21 Programmable ROM (PROM)
Non-volatile. Can be programmed - written into - only once. Programming is done electrically and can be done after manufacturing, but special equipment is needed for the programming process. Uses fuses instead of diodes: fuses that need to be removed are “vaporized” during programming by a high-voltage pulse (10 - 30 V). CANNOT BE ERASED.

22 Erasable PROM (EPROM) Uses floating-gate MOS transistors with insulating material that changes behavior when exposed to ultraviolet light. Programmed electrically and erased optically. Erasing can be repeated a relatively large but limited number of times (~100,000 times). Erasing time ~20 minutes. Electrically read and written. Before writing, ALL cells must be erased by exposure to ultraviolet light. Non volatile. More expensive than PROM.

23 Electrically Erasable PROM (EEPROM)
Uses the same floating-gate transistors, except that the insulating material is much thinner, so its effect can be reversed electrically. Can be written at any time without erasing the previous contents; only the bytes addressed are modified. A write takes a relatively long time (~100 msec/byte). Can be erased only about 10,000 times. Non-volatile. Updatable in place. More expensive and less dense than EPROM.

24 Flash Memory Called flash due to the speed of re-programming.
Uses electrical erasure technology: an entire chip can be erased in 1-2 seconds, and it is possible to erase individual blocks of data, but byte-level erasure is not provided. Uses one transistor per bit, giving very high density. Cost is between EPROM and EEPROM. Non-volatile.

25 Organization of a Memory Chip
The basic element of a semiconductor memory is the memory cell. There are different types, but they all share some common properties: Two states, 1 and 0. It is possible to write into the cell. (At least once). They can be read to sense the state.

26 Organization of a Memory Chip
How do we organize a 16-Mbit chip? As 1M words of 16 bits each? That is a tall, narrow organization, but chips “like” to be square. A typical organization is a 2048 x 2048 x 4-bit array, organized internally as a square structure with decoders for row and column. This simplifies the decoding logic and reduces the number of address pins: row and column address bits are multiplexed.

27 Organization of the Memory Chip

28 Memory Module Organization
Most high-capacity RAM chips contain only a single bit per location. To build a module with multiple bits per location, we need multiple chips. Example: design a 256 KByte memory system using eight 256K x 1 chips. 256K locations require 18 address wires; we apply 9 wires to the row selectors and 9 to the column selectors. The outputs of the chips are combined to form the 8-bit output of the system.

29 Organization of the 256 K Byte System
Each chip receives all 18 bits of the address. Each chip produces/receives a single bit of the data.

30 Memory Module Organization
What if the size of the system is not the same as the chips? Design a 1 MByte system using 256K X 1 chips. We will have to arrange the chips themselves into columns and rows. There will be 4 columns of chips. Number of columns = system’s address space / chip’s address space. There will be 8 rows of chips. Number of rows = system’s word size / chip’s word size. Some of the address wires will have to be used for selecting different rows of chips.
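The row/column formulas on these two slides can be checked with a short sketch (the helper name `module_layout` is mine, not from the slides):

```python
import math

def module_layout(system_words, system_word_bits, chip_words, chip_word_bits):
    """Chip array needed to build a memory module from smaller chips."""
    columns = system_words // chip_words            # system address space / chip address space
    rows = system_word_bits // chip_word_bits       # system word size / chip word size
    total_chips = columns * rows
    address_wires = int(math.log2(system_words))    # wires addressing the whole system
    select_wires = int(math.log2(columns)) if columns > 1 else 0  # wires selecting a chip column
    return columns, rows, total_chips, address_wires, select_wires

# 1 MByte system (1M 8-bit words) built from 256K x 1 chips:
print(module_layout(1 << 20, 8, 1 << 18, 1))   # (4, 8, 32, 20, 2)
```

So the 1 MByte system needs 32 chips in a 4 x 8 array, with 2 of the 20 address wires decoded to select among the 4 chip columns.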

31 Organization of the 1 M Byte System

32 Associative Memory Many applications require the search for the location of a particular item in a table in memory. Find the name of the student whose ID is 97xxxxx. The easiest way would be to search through all records sequentially to find the matching record. Response varies tremendously based on size of table and location of item in the table. The solution is to find a way to check all entries at the same time and identify the matching one.

33 Associative Memory Associative memory consists of four main items:
The memory array. The input argument register. A mask register to select specific bits of the argument for matching (if needed). A match register. [Figure: the argument and mask registers feed the memory array, which drives the match register.]

34 Associative Memory The argument is masked using the contents of the mask register. The argument is then sent to the memory array for comparison. Each entry in the memory array contains a comparator that compares the entry’s contents with the argument. If they match, a bit in the match register is set. The match register contains a bit that corresponds to each location in the memory array. Once the matching is done, the match register will contain an indication of which locations matched the argument. If the request was for all entries containing a certain field, the user gets back the contents of all memory locations containing that field.
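The masked comparison described above can be sketched in software as follows. The hardware performs all the comparisons simultaneously; the loop here only stands in for that parallel compare, and the sample values are invented:

```python
def associative_search(memory, argument, mask):
    """Compare the masked argument against every word; return one match bit per location."""
    return [(word & mask) == (argument & mask) for word in memory]

# Find all entries whose upper byte (an "ID prefix") is 0x97:
memory = [0x97A1, 0x82B2, 0x97C3]
matches = associative_search(memory, argument=0x9700, mask=0xFF00)
print(matches)   # [True, False, True]
```

The resulting list plays the role of the match register: each True bit identifies a location whose masked contents equal the masked argument.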

35 Cache Memory Cache Memory is intended to give:
Memory speed approaching that of the fastest memories available. Large memory size at the price of less expensive types of semiconductor memories. Small amount of fast memory. Sits between normal main memory and CPU. May be located on CPU chip or module.

36 Conceptual Operation
A relatively large and slow main memory works together with a faster, smaller cache that contains a copy of portions of main memory. When the processor attempts to read a word from memory, a check is made to determine whether the word exists in the cache. If it does, the word is delivered to the processor. If not, a block of main memory is read into the cache, and then the word is delivered to the processor. [Figure: words transfer between the CPU and the cache; blocks transfer between the cache and main memory.]

37 Hit Ratio A measure of the efficiency of the cache structure.
When the CPU refers to memory and the word is found in the cache, this is called a hit. When the word is not found in the cache, this is called a miss. The hit ratio is the total number of hits divided by the total number of access attempts (hits + misses). It has been shown in practice that hit ratios higher than 0.9 are achievable.
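The definition translates directly into a one-line computation:

```python
def hit_ratio(hits, misses):
    """Fraction of memory accesses satisfied by the cache."""
    return hits / (hits + misses)

# 920 of 1000 accesses found in the cache:
print(hit_ratio(920, 80))   # 0.92
```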

38 Cache vs. Main Memory Structure
[Figure: the cache holds C lines (0 to C-1), each with a tag and a block of K words; main memory holds 2^n words, addressed 0 to 2^n - 1, grouped into blocks of K words.]

39 Main Memory and Cache Memory
Main memory consists of 2^n addressable words, each with a unique n-bit address. We can consider main memory to be made up of blocks of K words each; usually K is about 16. The cache consists of C lines of K words each. A block of main memory is copied into a line of the cache, and the “tag” field of the line identifies which main memory block the line currently holds.

40 Elements of Cache Design
Size Mapping function Replacement algorithm Write policy Line size Number of caches

41 Mapping Function There are fewer cache lines than memory blocks.
How do we map a memory block to a cache line? Assume the following: Cache can hold 64 Kbytes. Data is transferred in blocks of 4 bytes. Cache is 16K lines of 4 bytes each. Main memory is 16 Mbytes. Memory is 4M blocks of 4 bytes each. How do we map the 4M blocks into the 16K lines?
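The slide's numbers can be verified directly:

```python
cache_bytes = 64 * 1024                       # 64 Kbyte cache
block_bytes = 4                               # 4-byte blocks
cache_lines = cache_bytes // block_bytes      # 16,384 lines (16K)

memory_bytes = 16 * 1024 * 1024               # 16 Mbyte main memory
memory_blocks = memory_bytes // block_bytes   # 4,194,304 blocks (4M)

blocks_per_line = memory_blocks // cache_lines
print(blocks_per_line)   # 256 memory blocks compete for each cache line
```

The mapping function must decide which of the 256 candidate blocks a given line currently holds; that is exactly what the tag field disambiguates.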

42 Direct Mapping Map each block of memory into only one possible cache line: a block of main memory is brought into the same line of the cache every time (line = block number mod C, where C is the number of lines).
Cache line 0: blocks 0, C, 2C, 3C, …
Cache line 1: blocks 1, C+1, 2C+1, 3C+1, …
Cache line C-1: blocks C-1, 2C-1, 3C-1, 4C-1, …

43 Direct Mapping A main memory address is considered to be made up of two pieces: the block address (the upper bits) and the word address within a block (the lower bits). The block address section is further divided into two items: the cache line number (its lower bits) and the tag (its upper bits).

44 Direct Mapping Address Structure
16 Mbytes of memory: 24 bits in the address. 4-byte blocks: 2 word bits. 16K lines in the cache: 14 line bits. The rest (8 bits) identifies the block mapped to the line. Address layout: Tag (8 bits) | Line or Slot (14 bits) | Word (2 bits).

45 Reading From a Direct Mapped System
The processor produces a 24-bit address. The cache uses the middle 14 bits to identify one of its 16K lines. The upper 8 bits of the address are matched against the tag field of that cache entry. If they match, the lowest-order two bits of the address are used to access the word in the cache line. If not, the address is used to fetch the block containing the specified word from main memory into the cache.
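A minimal sketch of this read procedure, using the 8/14/2 split from the previous slide (a dict stands in for the 16K-line array, and `main_memory` is any byte-addressable mapping):

```python
def split_direct(address):
    """Split a 24-bit address into (tag, line, word): 8/14/2 bits."""
    word = address & 0x3                # lowest 2 bits: byte within the 4-byte block
    line = (address >> 2) & 0x3FFF      # middle 14 bits: one of 16K cache lines
    tag = (address >> 16) & 0xFF        # upper 8 bits: which block occupies the line
    return tag, line, word

cache = {}   # line -> (tag, block); stands in for the 16K-line array

def read(address, main_memory):
    """Return (value, hit?) for a read, fetching the whole block on a miss."""
    tag, line, word = split_direct(address)
    entry = cache.get(line)
    if entry is not None and entry[0] == tag:
        return entry[1][word], True                        # cache hit
    start = address & ~0x3                                 # block-aligned address
    block = [main_memory[start + i] for i in range(4)]     # fetch the 4-byte block
    cache[line] = (tag, block)                             # replace whatever was there
    return block[word], False                              # miss, now cached
```

A first read of any address misses; a second read of any byte in the same block then hits, because the whole block was brought in.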

46 Direct Mapping Cache Organization

47 Direct Mapping
Advantages: simple; inexpensive to implement.
Disadvantages: there is a fixed location for each block in the cache, so if a program addresses words from two blocks mapped to the same line, the blocks have to be swapped in and out of the cache repeatedly.

48 Associative Mapping To improve the hit ratio of the cache, another mapping technique is often utilized: “associative mapping”. A block of main memory may be mapped into ANY line of the cache; a block is no longer restricted to a single line.

49 Associative Mapping A main memory address is considered to be made up of two pieces: the tag (the upper bits of the address) and the word address within a block (the lower 2 bits).

50 Associative Mapping Address Structure
16 Mbytes of memory: 24 bits in the address. 4-byte blocks: 2 word bits. The rest (22 bits) identifies the block mapped to the line. Address layout: Tag (22 bits) | Word (2 bits).

51 Reading From an Associative Mapped System
The processor produces a 24-bit address. The upper 22 bits of the address are matched against the tag field of EACH cache entry. This matching must be done simultaneously across all entries, i.e., with associative memory.
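An illustrative sketch of the fully associative lookup (the loop stands in for the parallel tag comparators; the sample line contents are invented):

```python
def split_assoc(address):
    """Fully associative split of a 24-bit address: 22-bit tag, 2-bit word."""
    return address >> 2, address & 0x3

def assoc_lookup(lines, address):
    """Match the tag against every (tag, block) line; hardware does this in parallel."""
    tag, word = split_assoc(address)
    for line_tag, block in lines:
        if line_tag == tag:
            return block[word]
    return None   # miss: block not in any line

lines = [(0x2AF37, [10, 20, 30, 40])]
print(assoc_lookup(lines, (0x2AF37 << 2) | 1))   # 20
```

The cost the slide warns about is visible here: every line needs its own comparator, since any line may hold any block.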

52 Associative Mapping Cache Organization

53 Associative Mapping
Advantages: improves the hit ratio in certain situations.
Disadvantages: requires very complicated matching hardware to compare the tag against the entry of every line; expensive.

54 Set Associative Mapping
Set Associative Mapping helps reduce the complexity of the matching hardware for an associative mapped cache. Cache is divided into a number of sets. Each set contains a number of lines. A 2-way set associative cache has 2 lines per set. A block of memory is restricted to a SPECIFIC set of lines. A block of main memory may map to ANY line in the given set.

55 Set Associative Mapping
A main memory address is considered to be made up of three pieces: the tag (the upper bits of the address), the set number (the middle bits), and the word address within a block (the lower 2 bits).

56 Set Associative Mapping Address Structure
16 Mbytes of memory: 24 bits in the address. 4-byte blocks: the lowest-order 2 bits. 8K sets in a 2-way set associative cache: 13 set bits. The rest (9 bits) identifies the block mapped to the line. Address layout: Tag (9 bits) | Set (13 bits) | Word (2 bits).

57 Reading From a Set Associative Mapped System
The processor produces a 24 bit address. The cache uses the middle 13 bits to identify one of its 8 K sets. The upper 9 bits of the address are matched to the tag field of the cache entries that make up the set. The number of lines to match to is very limited. Therefore, the matching hardware is much simpler.
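A sketch of the 2-way set associative lookup with the 9/13/2 split above (the per-set loop runs at most twice, mirroring the two comparators per set; set contents here are invented):

```python
def split_set_assoc(address):
    """2-way set associative split of a 24-bit address: 9-bit tag, 13-bit set, 2-bit word."""
    word = address & 0x3
    set_index = (address >> 2) & 0x1FFF     # 13 bits: one of 8K sets
    tag = (address >> 15) & 0x1FF           # upper 9 bits
    return tag, set_index, word

sets = [[] for _ in range(8192)]   # each set holds at most 2 (tag, block) pairs

def set_lookup(address):
    """Index straight to one set, then match the tag against only its lines."""
    tag, s, word = split_set_assoc(address)
    for line_tag, block in sets[s]:         # at most 2 comparisons per access
        if line_tag == tag:
            return block[word]
    return None   # miss
```

Compared with the fully associative version, the matching hardware shrinks from 16K comparators to 2, at the cost of restricting each block to one set.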

58 Set Associative Mapping Cache Organization

59 Set Associative Mapping
Advantages: combines the advantages of the direct and associative mapping techniques.
Disadvantages: increasing the size of the set does not always improve the hit ratio. 2-way set associative has a much higher hit ratio than direct mapping; increasing it to 4-way improves the hit ratio slightly more; beyond that, no significant improvement has been seen.

60 Replacement Algorithms
What happens if there is a “miss” and the cache is already full? One of the items in the cache needs to be “replaced” with the new item. Which one? It depends on the mapping technique used. Direct mapping: no choice; each memory block maps to one specific cache line, so the entry occupying that line must be swapped out.

61 Replacement Algorithms
Associative & Set Associative: Random. First-in First-out (FIFO). Least Recently Used (LRU). Least Frequently Used (LFU). The last three require additional bits for each entry to keep track of order, time or number of times used. Usually, these algorithms are implemented in hardware for speed.
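As a software sketch of one of these policies, here is LRU for a single set (the slides note real caches do this in hardware; the class name and `fetch_block` callback are mine). An `OrderedDict` keeps the lines ordered from least to most recently used, which is exactly the extra bookkeeping the slide mentions:

```python
from collections import OrderedDict

class LRUSet:
    """One cache set with Least Recently Used replacement (illustrative only)."""
    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()   # tag -> block, oldest first

    def access(self, tag, fetch_block):
        """Return (block, hit?); on a miss, evict the LRU line if the set is full."""
        if tag in self.lines:
            self.lines.move_to_end(tag)       # mark as most recently used
            return self.lines[tag], True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)    # evict the least recently used line
        self.lines[tag] = fetch_block(tag)
        return self.lines[tag], False
```

FIFO would differ only in not calling `move_to_end` on a hit; LFU would track counts instead of order.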

62 Writing Into Cache Cache entries are supposed to be exact “copies” of what is in main memory. What happens when the CPU wants to write into memory? Which memory does it write to? Two techniques are possible: write-through and write-back.

63 Write-Through The simplest and most commonly used technique is to update both the cache and main memory at the same time. Advantage. Memory and cache are always in sync. Disadvantage. Memory write becomes slow.

64 Write-Back The update is done ONLY to the word in the cache, and the block containing the word is marked. When the block is to be swapped out of the cache, the word is written back to main memory. Advantage: reduces memory traffic, because a word may be updated several times while in the cache. Disadvantage: cache and memory will be out of sync for a while. What about DMA?
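The two policies can be contrasted in a few lines. This sketch marks individual words dirty rather than whole blocks, purely to keep the example small; the function names are mine:

```python
def write_through(cache, memory, addr, value):
    """Write-through: update cache and main memory together; always in sync."""
    cache[addr] = value
    memory[addr] = value

dirty = set()   # addresses of cached words modified since they were fetched

def write_back(cache, addr, value):
    """Write-back: update only the cache and mark the word dirty."""
    cache[addr] = value
    dirty.add(addr)

def evict(cache, memory, addr):
    """On eviction, a dirty word is finally written out to main memory."""
    if addr in dirty:
        memory[addr] = cache[addr]
        dirty.discard(addr)
    del cache[addr]
```

Between a `write_back` and the corresponding `evict`, memory holds a stale value; that window is precisely the out-of-sync period that makes DMA (which reads memory directly, bypassing the cache) a problem.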

65 Number of Caches When a cache miss occurs, the system suffers through a large delay while the block is read from main memory into the cache. Two possible solutions. Speed up the transfer of information. The transfer rate is limited by issues that may not be under our control. Speed up the source of the information. Main memory is between 7X and 10X slower than cache. We can insert an intermediate level of memory between cache and main memory.

66 Cache Levels In most of today’s designs, the cache sits on the same chip as the CPU: “on-chip cache”. Data travels a very short distance, with no need to use the very slow bus. This is known as L1 cache (Intel calls this level L0). To reduce the penalty of a cache miss, a second level of cache, L2, is inserted between main memory and the on-chip cache.

67 Cache Levels
[Figure: the MPU chip contains the CPU and on-chip cache, connected over the system bus to off-chip memory and main memory; Pentium and Pentium Pro configurations are shown.]

68 “L2” Cache A very fast, SRAM based, cache is placed off-chip.
Slower than the on-chip cache. Larger than the on-chip cache. On-Module Cache. CPU uses a dedicated, internal, fast, memory bus to access cache. On-Mother-Board Cache. The CPU has to use the system bus to get to it. Still much faster than DRAM based main memory.

69 Cache Strategy On-Chip Cache is optimized to increase “hit rate”.
Block size about 4 words Many blocks Off-Chip Cache is optimized to reduce “miss penalty”. Larger block size Smaller number of blocks.

70 Advanced DRAM Organization
One of the most critical bottlenecks in the system is the interface to main memory. The design of main memory has mostly not changed in the last 30 years. Still based on slow DRAM design. One possibility of improvement has been the insertion of high speed SRAM caches. Recently, attempts have been made at improving the performance of the basic cell of the DRAM chip itself.

71 Advanced DRAM Organization
Enhanced DRAM: integrates a small SRAM cache on the DRAM chip. The cache holds the value of the last row read; if the next access is to the same row, the value is accessed from the SRAM. Refresh can be done in parallel with a read, and a read can partially overlap a previous write operation. Performs similarly to a DRAM combined with an external SRAM cache.

72 Advanced DRAM Organization
Cache DRAM Integrate a larger SRAM cache on the DRAM chip. Allow the on-chip cache to be used as a true cache. Allow a series of locations to be pre-fetched into the SRAM cache for later quick access.

73 Advanced DRAM Organization
Synchronous DRAM (SDRAM). Ordinary DRAM is asynchronous: data access is independent of the clock, so the CPU must wait and continuously check for the data. In SDRAM, data access is synchronized to an external clock running at full bus speed; the CPU knows exactly when the data will be ready and can do something else while the memory chip prepares it. SDRAM allows burst mode: a series of locations can be clocked out very quickly after the first location is accessed, with the burst behavior set through a control register.

74 Advanced DRAM Organization
Rambus DRAM (RDRAM). RDRAM chips exchange information with the microprocessor over a special 28-wire bus. The bus can deliver up to 500 Mbps, as compared to the normal 33 Mbps for DRAM. The CPU sends all requests to the RDRAM over this special bus; a request contains the desired address, the type of operation, and the number of bytes. Using Rambus DRAM requires special consideration during the design of the CPU. Currently, only the Pentium 4 uses Rambus DRAM.

