FTL Design Exploration in Reconfigurable High-Performance SSD (RHPSSD) for Server Applications
International Conference on Supercomputing, June 12, 2009
Ji-Yong Shin1,2, Zeng-Lin Xia1, Ning-Yi Xu1, Rui Gao1, Xiong-Fei Cai1, Seungryoul Maeng2, and Feng-Hsiung Hsu1
1Microsoft Research Asia, 2Korea Advanced Institute of Science and Technology

Thank you, chairman. Good morning, ladies and gentlemen. I'm Ji-Yong Shin from KAIST. Today I'll talk about FTL design exploration in a reconfigurable high-performance SSD for server applications. In this paper we conducted simulation-based experiments to find the best configuration of the flash translation layer, or FTL, in an SSD for a fixed application environment. While carrying out simulations based on our Reconfigurable High-Performance SSD, or RHPSSD, architecture, we found tradeoffs and facts that can be helpful for customizing FTLs. For example, we found that parallelizing operations is a more effective way to gain higher performance than reducing the number of erase operations in the SSD, and we show the tradeoff between large wear-leveling clusters and performance.

Introduction and Background (1/3)
Growing popularity of flash memory and SSDs: low latency, low power, solid-state reliability
SSDs widening their range of applications: embedded devices, desktop and laptop PCs, servers and supercomputers
SSDs expected to revolutionize the storage subsystem

Introduction and Background (2/3)
Flash memory
Erase needed before write
Units of read/write and erase differ: read/write operates on pages (typically 2 to 4KB), erase operates on blocks (typically 64 pages)
Latencies for read, write, and erase differ: read (25us) < write (250us) < erase (500us)
Erase is carried out on demand: cleaning, or garbage collection
Wear leveling is necessary: memory cells wear out when erased, and a block typically endures 100K erase operations
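To make these constraints concrete, here is a minimal Python sketch of a single flash block; the class and its interface are illustrative, not from the paper, using only the slide's typical values (64 pages per block, 25/250/500us latencies, erase-before-write).

```python
# Minimal model of a NAND flash block: reads/writes operate on pages,
# erases on whole blocks, and a page must be erased before it is rewritten.
PAGES_PER_BLOCK = 64          # typical value from the slide
READ_US, WRITE_US, ERASE_US = 25, 250, 500

class FlashBlock:
    def __init__(self):
        self.written = [False] * PAGES_PER_BLOCK  # page programmed?
        self.erase_count = 0                      # wear: ~100K erases per block

    def read(self, page):
        return READ_US                            # reading never changes state

    def write(self, page):
        if self.written[page]:
            raise RuntimeError("erase-before-write: page already programmed")
        self.written[page] = True
        return WRITE_US

    def erase(self):                              # whole-block operation
        self.written = [False] * PAGES_PER_BLOCK
        self.erase_count += 1
        return ERASE_US
```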

Introduction and Background (3/3)
Flash translation layer (FTL)
Provides an abstraction of flash memory characteristics
Maintains the logical-to-physical address mapping
Carries out cleaning operations
Conducts wear leveling
In a multi-chip environment, the FTL also manages parallelism and wear level among chips
[Diagram: the host machine issues IO requests to the FTL, which translates them into flash requests sent to multiple flash memory modules]
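The mapping and cleaning duties listed above can be sketched generically, as below; this is not the paper's RHPSSD FTL, and the free-page list and greedy victim policy are assumptions for illustration.

```python
# Sketch of a page-level FTL: logical pages map to (block, page) locations;
# updates invalidate the old copy, and cleaning reclaims mostly-invalid blocks.
class SimpleFTL:
    def __init__(self, num_blocks, pages_per_block):
        self.map = {}                              # logical page -> (block, page)
        self.valid = [[False] * pages_per_block for _ in range(num_blocks)]
        self.free = [(b, p) for b in range(num_blocks)
                            for p in range(pages_per_block)]

    def write(self, lpn):
        if lpn in self.map:                        # out-of-place update:
            b, p = self.map[lpn]
            self.valid[b][p] = False               # invalidate the old copy
        b, p = self.free.pop(0)                    # allocate a free flash page
        self.valid[b][p] = True
        self.map[lpn] = (b, p)

    def victim_block(self):
        # Greedy cleaning policy: reclaim the block with the fewest valid pages
        # (those valid pages must be migrated before the block is erased).
        return min(range(len(self.valid)), key=lambda b: sum(self.valid[b]))
```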

Motivation (1/2)
Servers and supercomputing environments: a high-performance storage subsystem is required, and applications are usually fixed
SSD performance characteristics: highly dependent on FTL design and workloads
Customized SSDs can boost servers and supercomputers

Motivation (2/2)
Related work: flash memory for embedded systems or generic SSDs; internal hardware organizational tradeoffs of SSDs [Agrawal et al., USENIX 08]; configuring RAID systems considering disk and workload characteristics
Our focus: a high-performance SSD with abundant resources; FTL design tradeoffs using different algorithms in each functionality; customizing the FTL considering flash memory and workload characteristics
Based on the Reconfigurable High-Performance SSD architecture, we will explore FTL design considerations and tradeoffs and propose guidelines for customizing FTLs

Reconfigurable High-Performance SSD (RHPSSD)
RHPSSD architecture
High performance: 36 independent flash channels, 4GB/s PCI Express host-to-SSD interface
Flexibility from an FPGA for reconfiguring the FTL
Wear leveling for endurance: among all blocks; among chips, dies, and planes
Maintaining high parallelism for performance!
[Diagram: PCI Express (4GB/s) host interface, FPGA holding the FTL/flash controller with a flash channel controller for each flash channel, random access memory, and flash daughter boards with each flash chip on an independent channel; each chip contains dies, each die contains planes]

FTL Design Exploration and Analysis
Simulation-based method to discover:
Logical page to physical flash plane allocation
Effect of hot/cold data separation
Wear leveling and cleaning: cleaning analysis for different allocations; wear leveling in different clusters

Simulation Environment and Workloads (1/2)
Modified DiskSim 4.0 and the SSD plug-in from MSR SVC; various FTL algorithms implemented
Basic configuration
RHPSSD architecture
Flash chip latencies: read 25us, write 250us, erase 500us
Two types of chip for different SSD capacities: 4GB (2 dies with 2 planes each) and 8GB (4 dies with 2 planes each)
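For concreteness, the basic configuration could be captured as a parameter set like the following; the dictionary layout and field names are assumptions (the real DiskSim/SSD plug-in uses its own parameter file format), while the values come from the slides.

```python
# Illustrative parameter set mirroring the basic configuration above.
RHPSSD_CONFIG = {
    "channels": 36,                  # independent flash channels
    "host_interface_GBps": 4,        # PCI Express host-to-SSD bandwidth
    "read_us": 25, "write_us": 250, "erase_us": 500,
    "chip_types": {
        "4GB": {"dies": 2, "planes_per_die": 2},
        "8GB": {"dies": 4, "planes_per_die": 2},
    },
}
```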

Simulation Environment and Workloads (2/2)
Traces used for simulation, each characterized by sequential vs. random access, IO intensity, and data locality:
Postmark: 144GB SSD
IOzone, WebDC: 288GB SSD
TPC-C, SQL, Exchange: 2 x 288GB SSD

Logical Page to Physical Plane Allocation (1/2)
Allocation is directly related to parallelism
Static allocation: binds a logical page address to a specific plane
Striping methods: wide striping with a page striping unit gives high parallelism but more cleaning; narrow striping with a block striping unit gives low parallelism but less cleaning
Dynamic allocation: allocates a page request to an idle plane at runtime
Binding a logical address to a chip gives less freedom; binding to the whole SSD gives the maximum degree of freedom
[Diagram: wide striping vs. narrow striping]
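The allocation policies reduce to simple address arithmetic; a minimal sketch, assuming consecutively numbered logical pages and interchangeable planes (function names are illustrative):

```python
# Page-to-plane allocation policies from the slide, as address arithmetic.
def wide_striping(lpn, num_planes):
    # Page striping unit: consecutive logical pages land on different planes,
    # maximizing parallelism but scattering related pages (more cleaning).
    return lpn % num_planes

def narrow_striping(lpn, num_planes, pages_per_block):
    # Block striping unit: a whole block's worth of pages stays on one plane,
    # lowering parallelism but keeping related pages together (less cleaning).
    return (lpn // pages_per_block) % num_planes

def dynamic_allocation(idle_planes):
    # Dynamic: bind the request to whichever plane is idle at runtime; the
    # binding scope (chip vs. whole SSD) bounds the degree of freedom.
    return next(iter(idle_planes))
```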

Logical Page to Physical Plane Allocation (2/2)
[Chart: response time normalized to STATIC W-PAGE]

Hot/Cold Data Separation (1/2)
Separates pages according to temperature within each plane
Blocks with hot data are likely to fill with invalid pages; blocks with cold data are likely to maintain their condition
Known to reduce erase operations and valid page migration
Also leads to smaller response times
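One common way to realize this separation, sketched below, is to count per-page updates and route each write to a hot or cold active block; the threshold-based classifier is an assumption for illustration, not necessarily the paper's method.

```python
# Sketch of hot/cold data separation: count per-page updates and route each
# write to a hot or cold active block, so hot blocks fill with invalid pages
# quickly (cheap to clean) while cold blocks stay stable.
from collections import defaultdict

class HotColdSeparator:
    def __init__(self, hot_threshold=4):
        self.update_count = defaultdict(int)
        self.hot_threshold = hot_threshold     # illustrative temperature cutoff
        self.active = {"hot": [], "cold": []}  # pages staged per active block

    def write(self, lpn):
        self.update_count[lpn] += 1
        temp = "hot" if self.update_count[lpn] >= self.hot_threshold else "cold"
        self.active[temp].append(lpn)          # hot and cold never share a block
        return temp
```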

Hot/Cold Data Separation (2/2)
[Chart: improvement after applying the separation (%)]

Wear Leveling and Cleaning
High performance and an even wear level are two different goals for an SSD
Static allocation: logical addresses are bound to a plane, so no page can migrate outside its dedicated plane (only local wear leveling); selecting an allocation that evenly wears out each plane is important
Dynamic allocation: wear leveling can be carried out in different clusters (chip, SSD); the cluster is the scope within which block lifetimes are kept even
The larger the cluster, the more even the wear level across the SSD as a whole, but the greater the overhead
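The cluster tradeoff can be made concrete with a sketch: the wear leveler only considers blocks inside its cluster, so a larger cluster evens wear more globally but scans and migrates across more blocks. The Block type, threshold, and swap policy below are assumptions for illustration.

```python
# Sketch of cluster-scoped wear leveling: within one cluster (plane, chip, or
# the whole SSD), move cold data onto the most-worn block so block ages stay
# even; a larger cluster evens wear globally but costs more to maintain.
from dataclasses import dataclass, field

@dataclass
class Block:
    erase_count: int = 0
    data: list = field(default_factory=list)

def level_wear(cluster_blocks, threshold=100):
    oldest = max(cluster_blocks, key=lambda b: b.erase_count)
    youngest = min(cluster_blocks, key=lambda b: b.erase_count)
    if oldest.erase_count - youngest.erase_count > threshold:
        # Swap contents so (presumably cold) data from the youngest block
        # lands on the worn block and lets it rest.
        oldest.data, youngest.data = youngest.data, oldest.data
```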

Number of Cleanings and Erase Distribution without Wear Leveling
[Chart: number of operations normalized to W-PAGE]

Wear Leveling in Different Clusters
Wear-leveling cluster: the group of blocks whose ages the wear-leveling algorithm keeps even
The larger the cluster, the worse the performance becomes, but the more even the block ages are

Summary
Static vs. dynamic allocation
Static wide striping: suited to dominantly sequential IO workloads; a page striping unit gives small response times but more cleaning, a block striping unit gives large response times but less cleaning; a tradeoff between response time and cleaning operations
Dynamic allocation: suited to dominantly random IO workloads
Hot/cold data separation: effective for evenly distributed IO
Wear-leveling cluster: a large cluster has large overhead but an even distribution of wear; a small cluster has small overhead but an uneven distribution of wear; a tradeoff between response time and even wear level

Conclusion
Algorithms for each FTL functionality were studied for a high-performance SSD
Tradeoffs and simple guidelines were presented for designing customized FTLs under different workloads and SSD lifetime requirements
Please read the paper for more details

Thank you. Questions?