1 FTL Design Exploration in Reconfigurable High-Performance SSD (RHPSSD) for Server Applications
International Conference on Supercomputing, June 12, 2009
Ji-Yong Shin1,2, Zeng-Lin Xia1, Ning-Yi Xu1, Rui Gao1, Xiong-Fei Cai1, Seungryoul Maeng2, and Feng-Hsiung Hsu1
1Microsoft Research Asia, 2Korea Advanced Institute of Science and Technology
Thank you, chairman. Good morning, ladies and gentlemen. I'm Ji-Yong Shin from KAIST. Today I'll talk about FTL design exploration in a reconfigurable high-performance SSD for server applications. In this paper we conducted simulation-based experiments to find the best configuration of the flash translation layer, or FTL, in an SSD for a fixed application environment. While carrying out the simulations on our Reconfigurable High-Performance SSD, or RHPSSD, architecture, we found tradeoffs and facts that can help in customizing FTLs. For example, we found that parallelizing operations is a more effective way to gain performance than reducing the number of erase operations in the SSD, and we show the tradeoff between large wear-leveling clusters and performance.

2 Introduction and Background (1/3)
- Growing popularity of flash memory and SSDs: low latency, low power, solid-state reliability
- SSDs are widening their range of applications: embedded devices, desktop and laptop PCs, servers and supercomputers
- SSDs are expected to revolutionize the storage subsystem

3 Introduction and Background (2/3)
- Flash memory requires an erase before data can be rewritten
- The units of read/write and erase differ
  - Read/write: page (typically 2 to 4KB)
  - Erase: block (typically 64 pages)
- Latencies for read, write, and erase differ: read (25us) < write (250us) < erase (500us)
- Erases are carried out on demand: cleaning, or garbage collection
- Wear leveling is necessary: memory cells wear out when erased, and a block typically endures 100K erase operations (see the sketch below)
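To make these constraints concrete, here is a minimal Python sketch of one flash block under the erase-before-write rule, using the typical geometry and latency values cited above; the class and names are illustrative, not taken from the paper.

```python
# Minimal model of the NAND flash constraints described above (illustrative
# only; geometry and latencies follow the typical values on this slide).

PAGE_SIZE_KB = 4          # read/write unit: one page (typically 2-4KB)
PAGES_PER_BLOCK = 64      # erase unit: one block (typically 64 pages)
LATENCY_US = {"read": 25, "write": 250, "erase": 500}

class Block:
    def __init__(self):
        self.programmed = [False] * PAGES_PER_BLOCK  # per-page write state
        self.erase_count = 0                         # tracked for wear leveling

    def write(self, page):
        # NAND allows programming a page only once per erase cycle:
        # overwriting in place requires erasing the whole block first.
        if self.programmed[page]:
            raise RuntimeError("erase-before-write: page already programmed")
        self.programmed[page] = True
        return LATENCY_US["write"]

    def erase(self):
        self.programmed = [False] * PAGES_PER_BLOCK
        self.erase_count += 1    # a block endures ~100K erases in total
        return LATENCY_US["erase"]
```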

4 Introduction and Background (3/3)
- Flash translation layer (FTL)
  - Provides an abstraction over flash memory characteristics
  - Maintains the logical-to-physical address mapping (see the sketch below)
  - Carries out cleaning operations
  - Conducts wear leveling
- FTL in a multi-chip environment: manages parallelism and wear level among the chips
(Diagram: the host machine issues IO requests to the FTL, which dispatches flash requests to multiple flash memory modules.)
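As an illustration of the mapping role described above, below is a minimal page-mapped FTL sketch in Python; the class name and structure are hypothetical, and a real FTL adds cleaning policies, wear leveling, and mapping-table persistence.

```python
# A minimal page-mapped FTL sketch (hypothetical names; out-of-place updates
# are what later force cleaning, i.e. garbage collection).

class PageMappedFTL:
    def __init__(self, num_physical_pages):
        self.l2p = {}                                  # logical -> physical page
        self.free = list(range(num_physical_pages))    # free physical pages
        self.invalid = set()                           # pages awaiting cleaning

    def write(self, lpn):
        # Out-of-place update: take a free page, remap, invalidate the old copy.
        if not self.free:
            raise RuntimeError("no free pages: cleaning (GC) must run first")
        ppn = self.free.pop()
        if lpn in self.l2p:
            self.invalid.add(self.l2p[lpn])  # old physical page becomes invalid
        self.l2p[lpn] = ppn
        return ppn

    def read(self, lpn):
        return self.l2p[lpn]  # translate logical page to physical page
```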

5 Motivation (1/2)
- Servers and supercomputing environments
  - A high-performance storage subsystem is required
  - Applications are usually fixed
- SSD performance characteristics are highly dependent on FTL design and workload
Customized SSDs can boost servers and supercomputers

6 Motivation (2/2)
- Related work
  - Flash memory for embedded systems or generic SSDs
  - Internal hardware organizational tradeoffs of SSDs [Agrawal et al., USENIX 08]
  - Configuring RAID systems considering disk and workload characteristics
- Our focus
  - A high-performance SSD with abundant resources
  - FTL design tradeoffs using different algorithms for each functionality
  - Customizing the FTL considering flash memory and workload characteristics
Based on the Reconfigurable High-Performance SSD architecture, we will explore FTL design considerations and tradeoffs and propose guidelines for customizing FTLs

7 Reconfigurable High-Performance SSD (RHPSSD)
Design goals:
1. Maintaining high parallelism for performance (see the sketch below)
2. Wear leveling for endurance: among all blocks; among chips, dies, and planes
RHPSSD architecture:
- High performance: 36 independent flash channels; 4GB/s PCI Express host-to-SSD interface
- Flexibility: an FPGA allows the FTL to be reconfigured
(Diagram: PCI Express (4GB/s) host interface; FPGA holding the FTL/flash controller and one flash channel controller per channel; random access memory; flash daughter boards carrying flash chips with independent channels, each chip organized into dies and planes.)
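To illustrate why independent channels matter, here is a toy Python model assuming one queue per channel and the flash latencies from the earlier slide; the workload and the per-channel serialization rule are assumptions for illustration, not the RHPSSD scheduler.

```python
# Toy model of channel-level parallelism: operations on the same channel
# serialize, operations on different channels overlap in time.

NUM_CHANNELS = 36
LATENCY_US = {"read": 25, "write": 250, "erase": 500}

def total_time(requests):
    """requests: list of (channel, op). Returns elapsed time in microseconds."""
    channel_busy_until = [0] * NUM_CHANNELS
    for channel, op in requests:
        channel_busy_until[channel] += LATENCY_US[op]  # serialize per channel
    return max(channel_busy_until)                     # channels run in parallel

# 36 writes queued on one channel take 36 * 250us; spread over all
# 36 channels they finish in a single write latency.
print(total_time([(0, "write")] * 36))                          # 9000
print(total_time([(c, "write") for c in range(NUM_CHANNELS)]))  # 250
```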

8 FTL Design Exploration and Analysis
Simulation-based method to discover:
- Logical page to physical flash plane allocation
- Effect of hot/cold data separation
- Wear leveling and cleaning
  - Cleaning analysis for different allocations
  - Wear leveling in different clusters

9 Simulation Environment and Workloads (1/2)
- Modified DiskSim 4.0 and the MSR SVC SSD plug-in; various FTL algorithms implemented
- Basic configuration
  - RHPSSD architecture
  - Flash chip latencies: read 25us, write 250us, erase 500us
  - Two chip types for different SSD capacities: a 4GB chip (2 dies with 2 planes each) and an 8GB chip (4 dies with 2 planes each)

10 Simulation Environment and Workloads (2/2)
Traces used for simulation; each workload is characterized in the paper as sequential or random, highly IO intensive, and/or high data locality:
- Postmark: 144GB SSD
- IOzone, WebDC: 288GB SSD
- TPC-C, SQL, Exchange: 2 x 288GB SSD

11 Logical Page to Physical Plane Allocation (1/2)
Allocation is directly related to parallelism (see the sketch below).
- Static allocation: binds a logical page address to a specific plane
  - Wide striping with a page-sized striping unit: high parallelism, more cleaning
  - Narrow striping with a block-sized striping unit: low parallelism, less cleaning
- Dynamic allocation: allocates a page request to an idle plane at runtime
  - Binding logical addresses within a chip: less degree of freedom
  - Binding anywhere in the SSD: maximum degree of freedom
(Diagram: wide striping vs. narrow striping across planes.)
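A rough Python sketch of these allocation policies follows; the plane count and stripe width are assumed for illustration and are not the paper's configuration.

```python
# Sketch of page-to-plane allocation policies (illustrative values).

NUM_PLANES = 144          # assumed total planes for illustration
PAGES_PER_BLOCK = 64

def static_wide_page(lpn):
    # Wide striping, page-sized unit: consecutive logical pages land on
    # consecutive planes -> maximum parallelism, but related data scatters,
    # which tends to increase cleaning work.
    return lpn % NUM_PLANES

def static_narrow_block(lpn, stripe_width=4):
    # Narrow striping, block-sized unit: a whole block's worth of pages
    # stays on one plane, rotating over a few planes -> less parallelism,
    # less cleaning.
    block = lpn // PAGES_PER_BLOCK
    return block % stripe_width

def dynamic(idle_planes):
    # Dynamic allocation: bind the request to any currently idle plane.
    # The binding scope (chip vs. whole SSD) sets the degree of freedom.
    return idle_planes.pop() if idle_planes else None
```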

12 Logical Page to Physical Plane Allocation (2/2)
(Chart: response time of each allocation scheme, normalized to static wide striping with a page-sized unit.)

13 Hot/Cold Data Separation (1/2)
- Pages are separated according to their temperature (update frequency) within each plane
- Blocks holding hot data are likely to fill up with invalid pages, while blocks holding cold data are likely to keep their contents valid
- Known to reduce erase operations and valid-page migration; also leads to smaller response times (see the sketch below)
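Below is a minimal Python sketch of the separation idea: a per-page write counter classifies temperature and routes writes to separate hot and cold active blocks. The threshold and helper names are assumptions for illustration, not the paper's classification algorithm.

```python
# Sketch of hot/cold separation inside one plane: hot and cold pages are
# appended to different active blocks, so blocks filled with hot data turn
# fully invalid quickly and can be erased with little valid-page migration.

WRITE_COUNT_HOT_THRESHOLD = 4   # assumed threshold for illustration

write_counts = {}               # per-logical-page write frequency

def is_hot(lpn):
    write_counts[lpn] = write_counts.get(lpn, 0) + 1
    return write_counts[lpn] >= WRITE_COUNT_HOT_THRESHOLD

def route_write(lpn, hot_block, cold_block):
    # Append the page to the active block matching its temperature.
    target = hot_block if is_hot(lpn) else cold_block
    target.append(lpn)
    return target

hot, cold = [], []
for lpn in [7, 7, 7, 7, 42]:    # page 7 is rewritten often, page 42 once
    route_write(lpn, hot, cold)
print(hot, cold)                 # [7] ends up hot, [42] stays cold
```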

14 Hot/Cold Data Separation (2/2)
(Chart: improvement after applying hot/cold separation, in percent.)

15 Wear Leveling and Cleaning
High performance and an even wear level in an SSD are two different stories.
- Static allocation: logical addresses are bound to planes, so no page migration can cross plane boundaries (only local wear leveling); selecting an allocation that wears each plane out evenly is important
- Dynamic allocation: wear leveling can be carried out over different clusters (chip, SSD)
  - A cluster is the scope within which block lifetimes are kept even
  - The larger the cluster, the more even the wear level across the SSD as a whole
  - The larger the cluster, the greater the overhead

16 Number of Cleaning and Erase Distribution without Wear Leveling
(Chart: number of cleaning operations and erase distribution without wear leveling, normalized to wide striping with a page-sized unit.)

17 Wear Leveling in Different Clusters
- Wear-leveling cluster: the group of blocks whose ages the wear-leveling algorithm keeps even (see the sketch below)
- The larger the cluster, the worse the performance becomes
- The larger the cluster, the more even the ages of the blocks are
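A small Python sketch below illustrates cluster scope: wear-aware block selection over a chip-sized cluster versus an SSD-sized cluster, with made-up erase counts. The function and data are illustrative only, not the paper's algorithm.

```python
# Sketch of cluster-scoped wear leveling: the cluster is the set of blocks
# whose erase counts (ages) the algorithm keeps even. Enlarging the cluster
# from one chip to the whole SSD evens the wear, at the cost of more
# bookkeeping and cross-chip migration.

from statistics import pstdev

def pick_block(cluster_erase_counts):
    # Wear-aware selection: prefer the youngest block in the cluster so that
    # heavily erased blocks are spared further erases.
    return min(cluster_erase_counts, key=cluster_erase_counts.get)

chip0 = {"b0": 120, "b1": 95}   # per-block erase counts on chip 0
chip1 = {"b2": 10, "b3": 14}    # chip 1 has seen far less wear

# A chip-scoped cluster levels wear only locally; an SSD-scoped cluster can
# steer new erases to chip 1 and shrink the SSD-wide age spread.
ssd_cluster = {**chip0, **chip1}
print(pick_block(chip0), pick_block(ssd_cluster))  # b1 b2
print(pstdev(ssd_cluster.values()))                # SSD-wide age unevenness
```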

18 Summary
- Static vs. dynamic allocation
  - Static wide striping: for workloads dominated by sequential IO
    - Page-sized striping unit: small response time, more cleaning
    - Block-sized striping unit: large response time, less cleaning
    - Tradeoff between response time and cleaning operations
  - Dynamic allocation: for workloads dominated by random IO
- Hot/cold data separation: effective for evenly distributed IO
- Wear-leveling cluster
  - Large cluster: large overhead, even distribution of wear
  - Small cluster: small overhead, uneven distribution of wear
  - Tradeoff between response time and an even wear level

19 Conclusion
We studied algorithms for each FTL functionality in a high-performance SSD, and presented tradeoffs and simple guidelines for designing customized FTLs under different workloads and SSD lifetime requirements. Please read the paper for more details.

20 Thank you. Questions?

