4. Characteristics of Flash Memory
Write-Once
  No writing on the same page unless its residing block is erased!
  Pages are classified into valid, invalid, and free pages.
Bulk-Erasing
  Pages are erased in a block unit to recycle used but invalid pages.
Wear-Leveling
  Each block has a limited lifetime in erasing cycles.
  E.g., 10,000 ~ 100,000 erase cycles for each block
5. Terminology
Valid data: the latest version of data stored in flash
Invalid data: not the latest version of data stored in flash
Live page: a page that stores valid data
Dead page: a page that stores invalid data
Free page: a page that is erased and is ready to store data
Free block: a block that is erased and is not allocated to store any data
Hot data: frequently updated data
  Valid hot data might become invalid in the near future.
Cold data: infrequently updated data
  Valid cold data might stay in the same place for a long time.
6. Management Issues – Flash-Memory Characteristics
Example 1: Out-place Update
[Figure: pages holding A, B, C, D; new versions of A and B are written to free pages, and the old copies of A and B become dead pages.]
Notes: Because flash memory is write-once, we do not overwrite data on update. Instead, the new versions of data A and data B are written to free pages, and the old versions are considered dead.
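The out-place update above can be sketched in a few lines. This is only an illustrative toy model, not the design of any particular flash translation layer: the class `Flash`, its `where` map, and the first-free-page allocation rule are assumptions made for this example.

```python
from enum import Enum


class State(Enum):
    FREE = "F"
    LIVE = "L"
    DEAD = "D"


class Flash:
    """Toy flash (hypothetical): a flat array of pages, always written out of place."""

    def __init__(self, num_pages):
        self.state = [State.FREE] * num_pages
        self.data = [None] * num_pages
        self.where = {}  # logical name -> physical page holding the latest version

    def write(self, name, value):
        page = self.state.index(State.FREE)  # first free page; ValueError if none is left
        old = self.where.get(name)
        if old is not None:
            self.state[old] = State.DEAD  # the old version is never overwritten, only marked dead
        self.state[page] = State.LIVE
        self.data[page] = value
        self.where[name] = page

    def read(self, name):
        return self.data[self.where[name]]


flash = Flash(8)
for name in "ABCD":
    flash.write(name, name + "-v1")
flash.write("A", "A-v2")  # update A: new copy goes to a free page
flash.write("B", "B-v2")  # update B likewise
layout = "".join(s.value for s in flash.state)
print(layout)
```

After the two updates, the first two pages are dead and the new versions of A and B occupy the next free pages, which mirrors the figure.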
7. Management Issues – Flash-Memory Characteristics
Example 2: Garbage Collection
  Block 1: L D D L D D L D   <- This block is to be recycled (3 live pages and 5 dead pages).
  Block 2: L L D L L L F D
  Block 3: L F L L L L D F
  Block 4: F L L F L L F D
  Legend: L = a live page, D = a dead page, F = a free page
Notes: After a period of use, free space runs low. Garbage collection reclaims free pages by erasing blocks. In this example we have 4 blocks, and each block has 8 pages, marked as live, dead, or free pages. Suppose that we want to recycle the dead pages in the first block.
8. Management Issues – Flash-Memory Characteristics
Example 2: Garbage Collection
  Block 1: D D D D D D D D   <- Live data are copied to somewhere else.
  Block 2: L L D L L L L D
  Block 3: L F L L L L D L
  Block 4: L L L F L L F D
  Legend: L = a live page, D = a dead page, F = a free page
Notes: Because there are live pages in the block, the useful (live) data are first copied somewhere else. This copying is called live page copying.
9. Management Issues – Flash-Memory Characteristics
Example 2: Garbage Collection
  Block 1: F F F F F F F F   <- The block is then erased.
  Block 2: L L D L L L L D
  Block 3: L F L L L L D L
  Block 4: L L L F L L F D
  Legend: L = a live page, D = a dead page, F = a free page
Overheads:
  live data copying
  block erasing
Notes: Finally, we perform a block erase to erase all pages in the block. The net gain is 5 free pages: the erase frees 8 pages, but 3 free pages were consumed by live page copying before the erase. As we can see, garbage collection has overheads: the live data copying and the block erasing.
10. Management Issues – Flash-Memory Characteristics
Example 3: Wear-Leveling
Wear-leveling might interfere with the decisions of the block-recycling policy.
  Block   Erase cycle count   Pages
  A       100                 L D D L D D L D
  B       10                  L L D L L L F D
  C       20                  L F L L L L D F
  D       15                  F L L F L L F D
  Legend: L = a live page, D = a dead page, F = a free page
Notes: Under current technology, a flash memory block can be erased only a limited number of times (e.g., the 10,000 ~ 100,000 erase cycles quoted earlier). After that, writes to the block suffer from frequent write errors. Wear-leveling intends to erase all blocks evenly. In this example, the numbers denote the erase cycle count of each block. Obviously it is not a good idea to recycle block A, since it has already been erased 100 times. Alternatively, we could erase block B, C, or D. However, recycling block B, C, or D is not efficient because they hold few dead pages. We can see that wear-leveling might interfere with the decisions of the block-recycling policy.
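The conflict can be made concrete by running two simple victim-selection rules on the four blocks of this example. Both rules are illustrative assumptions, not policies taken from the slides: `greedy_victim` maximizes reclaimed dead pages, and `least_worn_victim` minimizes the erase count.

```python
# Erase counts and page layouts of blocks A-D, as in the example.
blocks = {
    "A": {"erases": 100, "pages": "LDDLDDLD"},
    "B": {"erases": 10, "pages": "LLDLLLFD"},
    "C": {"erases": 20, "pages": "LFLLLLDF"},
    "D": {"erases": 15, "pages": "FLLFLLFD"},
}


def greedy_victim(blocks):
    """Recycling view: pick the block with the most dead pages."""
    return max(blocks, key=lambda name: blocks[name]["pages"].count("D"))


def least_worn_victim(blocks):
    """Wear-leveling view: pick the block with the lowest erase count."""
    return min(blocks, key=lambda name: blocks[name]["erases"])


print(greedy_victim(blocks), least_worn_victim(blocks))
```

The two rules disagree: the block that is cheapest to recycle is also the most worn one, while the least worn block has only two dead pages to reclaim.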
11. Single-Level Cell (SLC) vs. Multiple-Level Cell (MLC)
A limited bound on erase cycles
  SLC: 100,000
  MLCx2: 10,000
Bit error probability
  SLC: 10^-9
  MLC: 10^-6
Notes: Larger-capacity flash memory increasingly adopts MLC (multi-level cell) flash memory because of its low cost. Each MLC cell can store more than one bit of information, but it endures fewer erase cycles. By comparison, an SLC cell stores only one bit of information and endures more erase cycles, but is more expensive.
12. MLC vs. SLC (Cont.)
Source: Electronic Engineering Times, July 2005
13. Price and Read/Write Performance
           NOR             NAND SLC        NAND MLCx2
  Price    34.55 $/GB      6.79 $/GB       2.48 $/GB
  Read     23.84 MB/sec    15.33 MB/sec    13.5 MB/sec
  Write    0.07 MB/sec     4.57 MB/sec     2.34 MB/sec
  Erase    0.22 MB/sec     85.33 MB/sec    -- MB/sec
*NOR: Silicon Storage Technology (SST). NAND SLC: Samsung Electronics K9F1G08Q0M. NAND MLCx2: STMicroelectronics [1,2]
1. Jian-Hong Lin, Yuan-Hao Chang, Jen-Wei Hsieh, Tei-Wei Kuo, and Cheng-Chih Yang, "A NOR Emulation Strategy over NAND Flash Memory," the 13th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), Daegu, Korea, August 21-24, 2007.
2. Yuan-Hao Chang and Tei-Wei Kuo, “A Log-based Management Scheme for Low-cost Flash-memory Storage Systems of Embedded Systems”
14. Flash Memory Management
Management issues
  Write constraints imposed by flash memory
  Scalability issue
  Garbage collection
  Performance considerations
Reliability issues
  Cell error rate problem imposed by MLC flash memory
  Error correction coding vs. wear leveling
  Read/write disturbance problem
  Data retention problem
16. Flash Translation Layer (FTL)
FTL adopts a page-level address translation mechanism.
The main problem of FTL is the large memory space required to store the address translation information.
17. NAND Flash Translation Layer (NFTL)
A logical address under NFTL is divided into a virtual block address and a block offset.
  e.g., LBA=1019 => virtual block address (VBA) = 1019 / 8 = 127 and block offset = 1019 % 8 = 3
[Figure: writing data to LBA=1019 under NFTL. The address translation table (in main memory) holds the entry (9,23) at VBA=127: a primary block at address 9 and a replacement block at address 23. The write goes to block offset 3 of the primary block. If that page has been used, the data is written to the first free page of the replacement block. Pages are marked Free or Used.]
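The address split and the primary/replacement write rule can be traced with a small model. This is a sketch under assumptions, not the actual NFTL implementation: the class `NFTL`, its method names, and the choice to tag each stored page with its LBA (standing in for what a real device would keep in the spare area) are all hypothetical.

```python
PAGES_PER_BLOCK = 8  # matches the example: LBA 1019 -> VBA 127, block offset 3


class NFTL:
    """Toy block-level mapping with one replacement block per primary block."""

    def __init__(self):
        self.table = {}  # VBA -> (primary block address, replacement block address)
        self.flash = {}  # physical block address -> list of pages (None = free)

    def add_mapping(self, vba, primary, replacement):
        self.table[vba] = (primary, replacement)
        self.flash.setdefault(primary, [None] * PAGES_PER_BLOCK)
        self.flash.setdefault(replacement, [None] * PAGES_PER_BLOCK)

    def write(self, lba, data):
        vba, offset = divmod(lba, PAGES_PER_BLOCK)
        primary, replacement = self.table[vba]
        if self.flash[primary][offset] is None:
            self.flash[primary][offset] = (lba, data)  # page at the block offset is free
            return primary, offset
        pages = self.flash[replacement]
        slot = pages.index(None)  # first free page; ValueError if the block is full
        pages[slot] = (lba, data)
        return replacement, slot

    def read(self, lba):
        vba, offset = divmod(lba, PAGES_PER_BLOCK)
        primary, replacement = self.table[vba]
        # The newest copy, if any, is the last matching page in the replacement block.
        for entry in reversed(self.flash[replacement]):
            if entry is not None and entry[0] == lba:
                return entry[1]
        entry = self.flash[primary][offset]
        return entry[1] if entry is not None else None


nftl = NFTL()
nftl.add_mapping(vba=127, primary=9, replacement=23)
first = nftl.write(1019, "v1")   # lands in the primary block at offset 3
second = nftl.write(1019, "v2")  # offset 3 is used, so it goes to the replacement block
print(first, second, nftl.read(1019))
```

Note that `read` has to scan the replacement block to find the latest copy, which is where the longer address translation time of NFTL comes from.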
18. FTL or NFTL
                                 FTL      NFTL
  Memory Space Requirements      Large    Small
  Address Translation Time       Short    Long
  Garbage Collection Overhead    Less     More
  Space Utilization              High     Low
The memory-space requirements for one 1GB NAND (2KB/Page, 4B/Table Entry, 128 Pages/Block):
  FTL: 2MB (= 4*(1024*1024*1024)/2K)
  NFTL: 32KB (= 2*4*(1024*1024*1024)/(2K * 128))
19. Size of Translation Tables
          1GB     32GB    1TB     32TB
  FTL     2MB     64MB    2GB     64GB
  NFTL    32KB    1MB     32MB    1GB
No matter which granularity of address translation is adopted, fast-growing flash memory capacity will eventually make the translation table too large to fit in RAM.
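The table sizes follow directly from the two formulas given for the 1GB case (2KB pages, 4-byte table entries, 128 pages per block, and two entries per block for NFTL). A short check, with function names that are hypothetical and chosen only for this example:

```python
KB, MB, GB, TB = 2**10, 2**20, 2**30, 2**40

PAGE_SIZE = 2 * KB      # 2KB/Page
ENTRY_SIZE = 4          # 4B/Table Entry
PAGES_PER_BLOCK = 128   # 128 Pages/Block


def ftl_table_bytes(capacity):
    """Page-level mapping: one entry per page."""
    return ENTRY_SIZE * capacity // PAGE_SIZE


def nftl_table_bytes(capacity):
    """Block-level mapping: two entries (primary, replacement) per block."""
    return 2 * ENTRY_SIZE * capacity // (PAGE_SIZE * PAGES_PER_BLOCK)


for capacity in (1 * GB, 32 * GB, 1 * TB, 32 * TB):
    print(capacity // GB, "GB:", ftl_table_bytes(capacity), nftl_table_bytes(capacity))
```

Both tables grow linearly with capacity; NFTL only lowers the constant by a factor of 64 under these parameters.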
20. Adaptive Two-Level Mapping Mechanism
The coarse-grained hash table maintains the primary block and its replacement block.
  It is mainly used to identify the primary and replacement blocks.
The fine-grained hash table has a limited number of entries to store the excess live pages in the replacement block.
  When the fine-grained table is full, a replacement policy decides which pages to move to the coarse-grained table.
Improve the space utilization.
  Delaying the recycling of a primary block makes it likely that its free pages will be used in the future.
Chin-Hsien Wu and Tei-Wei Kuo, "An Adaptive Two-Level Management for the Flash Translation Layer in Embedded Systems," IEEE/ACM 2006 International Conference on Computer-Aided Design (ICCAD), November 5-9, 2006.
21. New Write Constraints of MLC Flash
Pages can only be written sequentially in a block.
Partial page write/programming is prohibited.
Impact on NFTL
  Data can’t be written to any free page of primary blocks.
    The space utilization in primary blocks is even lower.
    Most writes are forced to be placed in the replacement block.
  Pages of invalid data can’t be marked as dead.
    Each read operation should scan pages of the replacement block.
22. Intuitive and Practical Solution
Level-paging translation tables
  Pages of translation tables are stored in flash, and are cached in RAM.
Problems:
  Hit ratio of cached pages
  Extra page reads and writes for translation information
  Crash recovery for translation table and lost data
23. Performance Considerations
MLC flash has growing market share:
  Reason: low cost and high density
  Drawbacks: low speed, low endurance, and low reliability
Solutions:
  Hardware multi-channel programming
  Software multi-bank or multi-channel programming
25. Static Wear Leveling
Random policy
  It randomly selects a block to reclaim after a fixed number of block erases or write requests.
  It doesn’t track the locality of data accesses, so it might move hot data that would turn dead in the near future anyway.
26. Static Wear Leveling (Cont.)
Random policy with block-erasing table
  Each bit flag of the table indicates whether the corresponding blocks have been erased.
  Whenever the block erases are not even enough, select blocks whose corresponding bit flags are not set.
Pros and Cons:
  Pros: it can identify the locality of data accesses.
  Cons: the block-erasing table needs extra RAM space, even though it is comparatively small.
Yuan-Hao Chang, Jen-Wei Hsieh, Tei-Wei Kuo: Endurance Enhancement of Flash-Memory Storage Systems: An Efficient Static Wear Leveling Design. DAC 2007.
27. The Block Erasing Table (BET)
A bit array: each bit is for 2^k consecutive blocks.
  Small k: in favor of hot-cold data separation
  Large k: in favor of small RAM space
[Figure: a flash memory with blocks 1-15 and its BET, shown for k=0 and for k=2. As blocks are erased the counters advance, e.g. (ecnt=0, fcnt=0), (ecnt=1, fcnt=1), (ecnt=2, fcnt=2), (ecnt=3, fcnt=2), (ecnt=4, fcnt=2).]
Legend (the symbols themselves are not recoverable from the figure):
  a block that has been erased in the current resetting interval
  an index to a block that the Cleaner wants to erase
  fcnt: the number of 1’s in the BET
  ecnt: the total number of block erases done since the BET was reset
28. A Simple Static-Wear Leveler
Triggering of the SW Leveler: an unevenness level (ecnt / fcnt) >= T
Resetting of the BET: when all flags are set
  T: a threshold, T=1000 in this example
  ecnt: the total number of block erases since the BET was reset
  fcnt: the number of 1’s in the BET
[Figure: blocks 1-15 with a 4-flag BET (k=2). Legend entries (symbols not recoverable): a block that has been erased in the current resetting interval; an index in the selection of a block set; an index to a block that the Cleaner wants to erase; an index at which the SW Leveler triggers the Cleaner to do garbage collection.]
Example (animation states, in order of increasing ecnt):
  ecnt=0, fcnt=0
  ecnt=1998 and 1999, fcnt=2
  ecnt=2000, fcnt=2: 2000 / 2 = 1000 >= 1000 (ecnt / fcnt >= T)
    The Cleaner is triggered to
      1. copy valid data of the selected block set to a free area,
      2. erase the blocks in the selected block set, and
      3. inform the Allocator to update the address mapping between LBA and PBA.
  ecnt=2004, fcnt=3
  After a period of time, the total erase count reaches 2998 (ecnt=2998 and 2999, fcnt=3).
  ecnt=3000, fcnt=3: 3000 / 3 = 1000 >= 1000 (ecnt / fcnt >= T)
  ecnt=3004, fcnt=4
  After a period of time, the total erase count reaches 3999 (fcnt=4).
  ecnt=4000, fcnt=4: 4000 / 4 = 1000 >= 1000 (ecnt / fcnt >= T), but all flags in the BET are 1, so the BET is reset and the selection index is reset to a randomly selected block set (flag).
Notes: The sequential scanning of blocks in the selection of block sets for static wear leveling is very effective in the implementation. We surmise that in practice the design behaves much like a random selection policy, because cold data could exist in virtually any block in the physical address space of the flash memory.
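The trigger and reset rules can be replayed with a compact model of the BET. This is a minimal sketch, not the implementation from the cited DAC 2007 paper: the class `StaticWearLeveler`, the `erase_block_set` callback (standing in for the Cleaner), and the use of 16 blocks instead of 15 (so that four flags cover whole block sets) are assumptions made for this example.

```python
import random


class StaticWearLeveler:
    def __init__(self, num_blocks, k, threshold, rng=None):
        self.set_size = 2 ** k                           # blocks covered by one BET flag
        self.num_sets = -(-num_blocks // self.set_size)  # ceiling division
        self.threshold = threshold                       # T
        self.bet = [0] * self.num_sets
        self.ecnt = 0                                    # erases since the last BET reset
        self.fcnt = 0                                    # number of 1's in the BET
        self.findex = 0                                  # next block set to examine
        self.rng = rng or random.Random()

    def on_block_erased(self, block):
        """Record one block erase (by normal garbage collection or by the leveler)."""
        self.ecnt += 1
        flag = block // self.set_size
        if not self.bet[flag]:
            self.bet[flag] = 1
            self.fcnt += 1

    def level(self, erase_block_set):
        """Run the leveler. erase_block_set(i) must recycle every block of set i
        and report each erase through on_block_erased()."""
        while self.fcnt > 0 and self.ecnt / self.fcnt >= self.threshold:
            if self.fcnt == self.num_sets:               # all flags set: reset the BET
                self.bet = [0] * self.num_sets
                self.ecnt = self.fcnt = 0
                self.findex = self.rng.randrange(self.num_sets)
                return
            while self.bet[self.findex]:                 # sequential scan for an unerased set
                self.findex = (self.findex + 1) % self.num_sets
            erase_block_set(self.findex)
            self.findex = (self.findex + 1) % self.num_sets


leveler = StaticWearLeveler(num_blocks=16, k=2, threshold=1000, rng=random.Random(0))
recycled = []


def erase_block_set(index):
    """Stand-in for the Cleaner: erase the 4 blocks of the selected set."""
    recycled.append(index)
    for block in range(index * 4, index * 4 + 4):
        leveler.on_block_erased(block)


for i in range(2000):                                    # hot data keeps hitting two block sets
    leveler.on_block_erased(0 if i % 2 else 4)
leveler.level(erase_block_set)
print(recycled, leveler.ecnt, leveler.fcnt)
```

With these inputs, the first trigger at ecnt=2000, fcnt=2 forces one cold block set to be recycled and leaves the counters at ecnt=2004, fcnt=3, the same transition as in the example above.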
30. File-System Considerations - Observations
Archive: Linux source
  Number of files          19,535
  Number of directories    1,200
  Average file size        11 KB
  Archive size             215,666 KB
FAT is the default file system for removable storage devices.
Write the files to a removable storage device
  File system                 NTFS      FAT
  Number of write requests    24,513    179,670
  Time taken (min:sec)        4:33      54:21
FAT file systems introduce excessive write requests! Why?
Read the files from a removable storage device
  File system                 NTFS      FAT
  Number of read requests     14,568    23,528
  Time taken (min:sec)        2:53      3:19
31. File-System Considerations - Observations (Cont.)
Layout of FAT filesystem
[Figure: the FAT layout starting at LBA 0. For File #1 and again for File #2, the regions touched are FAT #1, FAT #2, Directory Entry, Content of File, Directory Entry.]
https://www.pjrc.com/tech/8051/ide/fat32.html
32. File-System Considerations
[Figure: the host software stack and two kinds of attached devices, labeled (a) and (b).
  Host: Applications, Operating System API, File System Drivers, Disk Drivers, Device Drivers, Filter Drivers, Bus Drivers
  Devices: Card Readers with Flash Memory Cards; USB Flash Drives
  Each device consists of a Device Controller and Storage Media.]
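33. Overview
[Figure: a filter driver placed between the USB Mass Storage Device Driver and the USB Bus Driver. Its components are the Cache (running the cohesive-caching algorithm), the Transport Unit, the Filesystem Identifier, the Dispatch Unit, and the Debug Unit (feeding a Viewer). Labeled flows: I/O Request; Read and Write IoPackets; Other IoPackets; Response; Partition Table & Boot Sectors; File System Layout; Trace; FIFO Queue; Notifications ("When?").]
Notes: This filter driver sits between the USB mass storage device driver and the USB bus driver and contains five components. The cohesive-caching policy is implemented in the Cache component. When a device is plugged in, the Transport Unit first accesses the device's boot sector and passes the information to the Filesystem Identifier, which detects the device's current partition and file system. The Filesystem Identifier then passes what it has detected to the Cache. When a read/write request arrives from the upper layer, it is first sent to the Cache component. Other control commands bypass the Cache and go to the Transport Unit. The Dispatch Unit also sends notifications that tell the Cache when to flush cached data to the device, in order to maintain data integrity. In addition, the Debug Unit sends debug messages to the console for the debug viewer. The question is: when should the Dispatch Unit notify the Cache?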
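34. The “Allow/Prevent Medium Removal” Operation Code in the USB Bulk-Only Transport
USB Mass Storage Class, Bulk-Only Transport Protocol
  CBW: Command Block Wrapper
  CSW: Command Status Wrapper
[Figure: the FAT file system writes files (hrtimer.c, intermodule.c, itimer.c, kallsyms.c, ...) through the Disk Driver as a series of writes (W W W W ...) to the 1st FAT, 2nd FAT, File Content, and Directory Entry. The USB Mass Storage Device Driver turns each write into an IoPacket on the USB bus:
  <CBW> Prevent Medium Removal
  <CBW> Write10 LBA=1446 LEN=1, <DATA>, <CSW> Success
  ...
  <CBW> Allow Medium Removal, <CSW> Success  -> Send a notification]
Notes: We can illustrate this with the following example. When the file system asks to write a file, it issues a series of write requests to the disk driver, but each write command that passes through the USB mass storage device driver requires three USB transfers: the command block wrapper, the data, and the command status wrapper. The write amplification ratio is therefore very high. However, the USB mass storage device driver issues a Prevent Medium Removal command before a series of commands, and issues an Allow Medium Removal command immediately after the series to signal that the flash drive may now be removed. This means that caching can start when Prevent Medium Removal is received, and the cached data can be flushed when Allow Medium Removal is received, which takes only a few milliseconds. Because ordinary users do not unplug the flash drive while a copy is still in progress, the probability of data loss is very low. Without this property we would not know when to cache data and when to flush it, and this is something a caching system inside a typical file system cannot do.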
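The trigger described in the notes, caching from Prevent Medium Removal until Allow Medium Removal, can be sketched as follows. The slides do not spell out the cohesive-caching policy itself, so this example covers only the trigger. The class `RemovalAwareCache`, the `device_write` callback, and the rule of coalescing repeated writes to the same LBA are assumptions made for illustration.

```python
class RemovalAwareCache:
    """Toy write cache driven by Prevent/Allow Medium Removal notifications."""

    def __init__(self, device_write):
        self.device_write = device_write  # callable(lba, data): one write to the device
        self.pending = {}                 # LBA -> latest data written while caching
        self.caching = False

    def prevent_medium_removal(self):
        self.caching = True               # a series of commands is about to start

    def write(self, lba, data):
        if self.caching:
            self.pending[lba] = data      # repeated writes to one LBA coalesce
        else:
            self.device_write(lba, data)  # no notification seen: write through

    def allow_medium_removal(self):
        for lba in sorted(self.pending):  # flush before the drive may be removed
            self.device_write(lba, self.pending[lba])
        self.pending.clear()
        self.caching = False


log = []
cache = RemovalAwareCache(lambda lba, data: log.append((lba, data)))
cache.prevent_medium_removal()
cache.write(1446, "FAT entry, first update")
cache.write(1446, "FAT entry, second update")
cache.write(2000, "file content")
writes_before_flush = len(log)
cache.allow_medium_removal()
print(writes_before_flush, log)
```

In this run three host writes reach the device as two, because the two updates of LBA 1446 collapse into one. Repeated updates of the same FAT sectors are where a FAT volume would be expected to benefit.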
35. Worsening Reliability - Narrow Threshold Voltage Window
MLC/TLC/QLC technology must squeeze the available window of threshold voltage for each logical state
  Higher Bit Error Rate
  Lower Endurance
[Figure: SLC vs. MLC]
Source:
36. Reliability Issues
The low-cost MLC flash
  Endures fewer erase cycles
  Has a higher and higher bit error rate
Ways to improve reliability
  Error correction coding (ECC), a passive measure
    Non-erasure codes such as BCH and RS
  Wear leveling, an active measure
    Distribute block erases as evenly as possible
Non-erasure code: errors in the received packet can be detected/corrected.
Erasure code: lost packets can be recovered from other received packets.
Note: Erasure codes are used in communication. Non-erasure codes are used in storage systems.
37. Error Correction Coding
The error correction code of a page is stored in its spare area.
The time spent on error correction might be affected by
  the location of error bits or
  the number of error bits.
The area of the ECC hardware grows as the number of supported error bits increases.
Trend of MLC flash
  The page size is getting larger
  The bit error rate is getting higher
[Figure: Vt axis, with labels "Fast erasing bits" and "The fast worn-out flash cells"]
38. Data Retention Problem
The guaranteed data retention: 10 years
As the cell size is getting smaller, the number of electrons in the floating gate of a flash cell is getting smaller.
For example:
  A programmed cell can store 10,000 electrons.
  A loss of only 10% of this number can lead to a wrong read.
  A loss of less than 2 electrons per week can be tolerated.
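The "less than 2 electrons per week" figure follows from the other numbers on this slide. A quick check, assuming 52 weeks per year (the variable names are chosen for this example):

```python
ELECTRONS_PROGRAMMED = 10_000   # electrons stored in a programmed cell
TOLERABLE_LOSS_FRACTION = 0.10  # a 10% loss can already cause a wrong read
RETENTION_YEARS = 10            # guaranteed data retention
WEEKS_PER_YEAR = 52             # assumption: approximate weeks per year

tolerable_loss = ELECTRONS_PROGRAMMED * TOLERABLE_LOSS_FRACTION  # 1,000 electrons
loss_per_week = tolerable_loss / (RETENTION_YEARS * WEEKS_PER_YEAR)
print(round(loss_per_week, 2))
```

The budget of 1,000 electrons spread over about 520 weeks gives a little under 2 electrons per week.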