Gecko Storage System
Tudor Marian, Lakshmi Ganesh, and Hakim Weatherspoon
Cornell University

Gecko
Save power by spinning/powering down disks
– E.g., a RAID-1 mirror scheme with 5 primary/mirror pairs
– The file system (FS) access pattern on disk is arbitrary: it depends on FS internals, and gets worse as the FS ages
– When to turn disks off? What if the prediction is wrong?
[Diagram: a block device receiving write(fd,…) and read(fd,…) requests]

Predictable Writes
Access the same disks predictably for long periods
– Amortize the cost of spinning disks down and up
Idea: log-structured storage/file system
– Writes go to the head of the log until the disk(s) fill up
[Diagram: a block device organized as a log, with write(fd,…) requests appended at the log head while the log tail trails behind]

Unpredictable Reads
What about reads? They may access any part of the log!
– Keep only the primary disks spinning
Trade off read throughput for power savings
– Can afford to spin up disks on demand as the load surges; the file/buffer cache absorbs most read traffic anyway
[Diagram: the log-structured block device, with write(fd,…) requests appended at the log head and read(fd,…) requests landing anywhere between the log tail and head]

Stable Throughput
Unlike LFS, reads do not interfere with writes
– Keep data from the head (recently written) disks in the file cache
– Log cleaning is not on the critical path
Can afford to incur the penalty of on-demand disk spin-up
Serve reads from the primary, clean the log from the mirror
[Diagram: the log-structured block device, with reads served between the log tail and head while writes append at the head]

Design
[Diagram: the Linux storage stack, top to bottom: Virtual File System (VFS), file/buffer cache, mapping layer with the disk filesystems, generic block layer, device mapper, I/O scheduling layer (anticipatory, CFQ, deadline, noop), block device drivers]

Design Overview
Log-structured storage at the block level
– Akin to SSD wear-leveling; actually, it supersedes the on-chip wear-leveling of SSDs
– The design works with RAID-1, RAID-5, and RAID-6
– RAID-5 effectively becomes RAID-4 due to the append-only nature of the log
– The parity drive(s) are not a bottleneck, since writes are appends
Prototyped as a Linux kernel dm (device-mapper) target
– A real, high-performance, deployable implementation

Challenges
dm-gecko
– All IO requests at this storage layer are asynchronous
– SMP-safe: leverages all available CPU cores
– Maintains large in-core (RAM) memory maps, backed by battery-backed NVRAM and persistently stored on SSD
– Map: virtual block → linear block → disk block (8 sectors)
– To keep the maps manageable, block size = page size (4 KB); the FS layered atop uses block size = page size
– Log cleaning/garbage collection (gc) runs in the background; efficient cleaning policy: clean when spare write IO capacity is available
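The virtual-to-linear step is a plain array lookup in the in-core map, while linear block to disk block is pure arithmetic, since a 4 KB block spans 8 sectors of 512 B. Below is a minimal sketch of that last translation step, assuming the linear (log) address space is simply the concatenation of identically sized disks; the constants, struct, and function names are illustrative, not taken from the dm-gecko source.

    /* Illustrative sketch: translating a 4 KB linear (log) block down to a
     * physical disk and a starting sector.  Assumes 2 TB member disks. */
    #include <stdint.h>
    #include <stdio.h>

    #define SECTOR_SIZE       512u
    #define BLOCK_SIZE        4096u                          /* block size == page size */
    #define SECTORS_PER_BLOCK (BLOCK_SIZE / SECTOR_SIZE)     /* 8 sectors per block */
    #define BLOCKS_PER_DISK   ((2ull << 40) / BLOCK_SIZE)    /* 2 TB disk */

    struct disk_block {
        unsigned disk;        /* which physical disk */
        uint64_t sector;      /* starting sector on that disk */
    };

    static struct disk_block linear_to_disk(uint64_t linear_block)
    {
        struct disk_block d;
        d.disk   = (unsigned)(linear_block / BLOCKS_PER_DISK);
        d.sector = (linear_block % BLOCKS_PER_DISK) * SECTORS_PER_BLOCK;
        return d;
    }

    int main(void)
    {
        struct disk_block d = linear_to_disk(600000000ull);
        printf("linear block 600000000 -> disk %u, sector %llu\n",
               d.disk, (unsigned long long)d.sector);
        return 0;
    }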

Commodity Architecture
– Dual-socket multi-core CPUs
– OCZ RevoDrive PCIe x4 SSD
– 2 TB Hitachi HDS72202 disks
– Battery-backed RAM
– Dell PowerEdge R710

dm-gecko
In-memory map (one level of indirection)
– Virtual block: the conventional block array exposed to the VFS
– Linear block: the collection of blocks structured as a log (a circular ring structure)
E.g., READs are simply indirected
[Diagram: a read of a virtual block is redirected through the map to its linear block on the circular log, which has a head, a tail, free blocks, and used blocks]
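A minimal sketch of that read indirection, assuming a flat in-memory array as the map; the names and sizes are illustrative and not taken from the dm-gecko source.

    /* Illustrative sketch of the one-level indirection used by READs. */
    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    #define NR_BLOCKS     16u
    #define INVALID_BLOCK UINT32_MAX       /* virtual block never written, or trimmed */

    static uint32_t v2l[NR_BLOCKS];        /* virtual block -> linear (log) block */

    /* A READ is simply indirected: look up the linear block that currently
     * holds the data and dispatch the I/O there; an unmapped virtual block
     * can be satisfied with zeroes. */
    static int gecko_read(uint32_t vblock, uint32_t *lblock_out)
    {
        if (vblock >= NR_BLOCKS || v2l[vblock] == INVALID_BLOCK)
            return -1;                     /* nothing mapped: return zeroes */
        *lblock_out = v2l[vblock];
        return 0;
    }

    int main(void)
    {
        uint32_t lblock;
        memset(v2l, 0xff, sizeof(v2l));    /* all entries start out unmapped */
        v2l[3] = 7;                        /* pretend virtual block 3 lives at linear block 7 */
        if (gecko_read(3, &lblock) == 0)
            printf("virtual block 3 -> linear block %u\n", lblock);
        return 0;
    }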

dm-gecko
WRITE operations are appended at the log head
– Allocate/claim the next free block; schedule log compacting/cleaning (gc) if necessary
– Dispatch the write IO on the new block; update the maps and the log on IO completion
[Diagram: a write of a virtual block claims the free block at the log head of the linear block device]
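A minimal sketch of that write path, under the same illustrative assumptions (flat in-memory maps, a tiny circular log). Overwriting an already-mapped virtual block turns its old linear block into garbage, which is what eventually makes log cleaning necessary.

    /* Illustrative sketch of the WRITE path: claim the block at the log
     * head, dispatch the I/O, and update the maps on completion. */
    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    #define NR_BLOCKS 16u
    #define INVALID   UINT32_MAX

    static uint32_t v2l[NR_BLOCKS];    /* virtual -> linear map */
    static uint32_t l2v[NR_BLOCKS];    /* linear -> virtual map (needed later by the gc) */
    static uint8_t  used[NR_BLOCKS];   /* is this linear block live? */
    static uint32_t head, tail;        /* circular log pointers */

    static void gecko_init(void)
    {
        memset(v2l, 0xff, sizeof(v2l));
        memset(l2v, 0xff, sizeof(l2v));
    }

    /* Returns the linear block the write I/O should be dispatched to,
     * or INVALID if the log is full and the gc must free space first. */
    static uint32_t gecko_write(uint32_t vblock)
    {
        uint32_t lblock;

        if ((head + 1) % NR_BLOCKS == tail)
            return INVALID;                 /* log full: schedule gc, retry later */

        lblock = head;                      /* claim the next free block */
        head = (head + 1) % NR_BLOCKS;

        if (v2l[vblock] != INVALID)         /* overwrite: the old copy becomes garbage */
            used[v2l[vblock]] = 0;

        /* ...dispatch the write I/O to lblock; then, on I/O completion: */
        v2l[vblock] = lblock;
        l2v[lblock] = vblock;
        used[lblock] = 1;
        return lblock;
    }

    int main(void)
    {
        gecko_init();
        printf("write vblock 3   -> lblock %u\n", gecko_write(3));
        printf("rewrite vblock 3 -> lblock %u\n", gecko_write(3));
        return 0;
    }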

dm-gecko
TRIM operations free the block
– Schedule log compacting/cleaning (gc) if necessary
– Fast-forward the log tail if the tail block was trimmed
[Diagram: a trim of a virtual block marks its linear block free; if that block was at the log tail, the tail advances]
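A minimal sketch of TRIM under the same illustrative assumptions: unmap the virtual block, mark its linear block as garbage, and if that block sits at the tail, fast-forward the tail over the whole run of free blocks so the log shrinks without any copying.

    /* Illustrative sketch of TRIM with a fast-forwarding log tail. */
    #include <stdint.h>
    #include <string.h>

    #define NR_BLOCKS 16u
    #define INVALID   UINT32_MAX

    static uint32_t v2l[NR_BLOCKS];    /* virtual -> linear map */
    static uint32_t l2v[NR_BLOCKS];    /* linear -> virtual map */
    static uint8_t  used[NR_BLOCKS];   /* is this linear block live? */
    static uint32_t head, tail;        /* circular log pointers */

    static void gecko_trim(uint32_t vblock)
    {
        uint32_t lblock = v2l[vblock];

        if (lblock == INVALID)
            return;                             /* nothing mapped */

        v2l[vblock] = INVALID;
        l2v[lblock] = INVALID;
        used[lblock] = 0;                       /* the block is now garbage */

        while (tail != head && !used[tail])     /* fast-forward over freed blocks */
            tail = (tail + 1) % NR_BLOCKS;
        /* if the log is still too fragmented, schedule the gc here */
    }

    int main(void)
    {
        memset(v2l, 0xff, sizeof(v2l));
        memset(l2v, 0xff, sizeof(l2v));
        v2l[3] = 0; l2v[0] = 3; used[0] = 1;    /* one live block, sitting at the tail */
        head = 1;
        gecko_trim(3);                          /* frees block 0; tail fast-forwards to 1 */
        return 0;
    }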

Log Cleaning
Garbage collection (gc) compacts the log
– Relocate the used block that is closest to the tail; repeat until the log is compact enough (e.g., below a watermark) or fully contiguous
– Use spare IO capacity; do not run when the IO load is high
– More than enough CPU cycles to spare (e.g., 2x quad-core)
[Diagram: the gc relocates the used block closest to the log tail up to the log head, so that the tail can advance over the freed space]
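A minimal sketch of one cleaning step under the same illustrative assumptions: find the used block closest to the tail, copy it to the head, remap its virtual block, and let the tail advance. The real module issues the copy as asynchronous I/O and applies a cleaning policy (e.g., a watermark and the current IO load), which the sketch omits.

    /* Illustrative sketch of one gc step over a tiny circular log. */
    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    #define NR_BLOCKS 16u
    #define INVALID   UINT32_MAX

    static uint32_t v2l[NR_BLOCKS];    /* virtual -> linear map */
    static uint32_t l2v[NR_BLOCKS];    /* linear -> virtual map */
    static uint8_t  used[NR_BLOCKS];   /* is this linear block live? */
    static uint32_t head, tail;        /* circular log pointers */

    /* Is there any garbage (free block) left between the tail and the head? */
    static int log_has_holes(void)
    {
        for (uint32_t b = tail; b != head; b = (b + 1) % NR_BLOCKS)
            if (!used[b])
                return 1;
        return 0;
    }

    /* One gc step; returns 0 when the log is already fully contiguous
     * (or there is no room at the head to relocate into). */
    static int gc_step(void)
    {
        uint32_t src = tail, dst;

        if (!log_has_holes() || (head + 1) % NR_BLOCKS == tail)
            return 0;

        while (src != head && !used[src])      /* the used block closest to the tail */
            src = (src + 1) % NR_BLOCKS;
        if (src == head) {                     /* everything before the head was garbage */
            tail = head;
            return 0;
        }

        dst = head;                            /* claim a block at the head */
        head = (head + 1) % NR_BLOCKS;

        /* ...copy the data from src to dst (read from the mirror, write at the head)... */
        v2l[l2v[src]] = dst;                   /* remap the owning virtual block */
        l2v[dst] = l2v[src];
        used[dst] = 1;
        l2v[src] = INVALID;
        used[src] = 0;

        while (tail != head && !used[tail])    /* the tail can now move forward */
            tail = (tail + 1) % NR_BLOCKS;
        return 1;
    }

    int main(void)
    {
        memset(v2l, 0xff, sizeof(v2l));
        memset(l2v, 0xff, sizeof(l2v));
        v2l[5] = 0; l2v[0] = 5; used[0] = 1;   /* one live block stuck at the tail */
        head = 3; tail = 0;                    /* blocks 1 and 2 are already garbage */
        while (gc_step())
            ;
        printf("after gc: tail=%u head=%u, vblock 5 -> lblock %u\n", tail, head, v2l[5]);
        return 0;
    }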

Gecko IO Requests
All IO requests at the storage layer are asynchronous
– The storage stack is allowed to reorder requests
– The VFS, file system mapping layer, and file/buffer cache play nice
– Uncooperative processes may trigger inconsistencies: read/write and write/write conflicts are fair game
Log cleaning interferes with storage stack requests
– SMP-safe solution that leverages all available CPU cores
– Request ordering is enforced as needed, at block granularity

Request Ordering
Block b has no prior pending requests
– Allow a read or write request to run; mark the block as having pending IO
– Allow the gc to run; mark the block as being cleaned
Block b has prior pending read/write requests
– Allow read or write requests; track the number of pending IOs
– If the gc needs to run on block b, defer it until all read/write requests have completed (zero pending IOs on block b)
Block b is being relocated by the gc
– Discard gc requests on the same block b (this doesn't actually occur)
– Defer all read/write requests until the gc has completed on block b
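The rules above amount to a small per-block state machine. The sketch below captures only the decision logic; the names are illustrative assumptions, and the real module additionally needs locking and per-block deferral queues.

    /* Illustrative per-block ordering decision for read/write vs. gc requests. */
    #include <stdio.h>

    enum req_type  { REQ_READ, REQ_WRITE, REQ_GC };
    enum decision  { RUN, DEFER, DISCARD };

    struct block_state {
        unsigned pending_io;      /* outstanding read/write requests on this block */
        int      gc_in_progress;  /* block currently being relocated by the gc */
    };

    static enum decision decide(struct block_state *b, enum req_type t)
    {
        if (b->gc_in_progress)                     /* block is being relocated */
            return t == REQ_GC ? DISCARD : DEFER;  /* gc-on-gc cannot occur; r/w must wait */

        if (t == REQ_GC) {                         /* gc may only start on an idle block */
            if (b->pending_io != 0)
                return DEFER;                      /* wait for zero pending IOs */
            b->gc_in_progress = 1;                 /* mark the block as being cleaned */
            return RUN;
        }

        b->pending_io++;                           /* reads/writes proceed, but are counted */
        return RUN;
    }

    /* Called from the I/O completion path. */
    static void complete_io(struct block_state *b)
    {
        if (b->pending_io > 0)
            b->pending_io--;
        /* when pending_io drops to zero, any deferred gc request may now run */
    }

    int main(void)
    {
        struct block_state b = {0, 0};
        printf("write: %d\n", decide(&b, REQ_WRITE));  /* 0 = RUN, pending_io becomes 1 */
        printf("gc:    %d\n", decide(&b, REQ_GC));     /* 1 = DEFER until pending_io == 0 */
        complete_io(&b);
        printf("gc:    %d\n", decide(&b, REQ_GC));     /* 0 = RUN, block marked as cleaned */
        return 0;
    }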

Limitations
In-core memory map (there are two maps)
– A simple, direct map requires lots of memory
– A multi-level map is complex (akin to virtual memory paging, only simpler): fetch large portions of the map on demand from the larger SSD
Current prototype uses two direct maps:

Linear (total) disk capacity | Block size | # of map entries | Size of map entry | Memory per map
6 TB                         | 4 KB       | 1.5 x 2^30       | 4 bytes / 32 bits | 6 GB
8 TB                         | 4 KB       | 2 x 2^30         | 4 bytes / 32 bits | 8 GB
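The table's figures follow directly from a flat map with one 32-bit entry per 4 KB block; a quick illustrative check:

    /* Sanity-checking the map-size table above: one 4-byte entry per 4 KB block. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        const uint64_t capacities_tb[] = { 6, 8 };
        const uint64_t block_size = 4096;          /* block size == page size */
        const uint64_t entry_size = 4;             /* 32-bit map entry */

        for (int i = 0; i < 2; i++) {
            uint64_t capacity = capacities_tb[i] << 40;        /* TB -> bytes */
            uint64_t entries  = capacity / block_size;
            uint64_t map_gb   = (entries * entry_size) >> 30;  /* bytes -> GB */
            printf("%llu TB -> %llu entries -> %llu GB per map\n",
                   (unsigned long long)capacities_tb[i],
                   (unsigned long long)entries,
                   (unsigned long long)map_gb);
        }
        return 0;
    }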