
1 Gecko Storage System
Tudor Marian, Lakshmi Ganesh, and Hakim Weatherspoon
Cornell University

2 Gecko
Save power by spinning/powering down disks
– E.g., a RAID-1 mirror scheme with 5 primary/mirror pairs
– The file system (FS) access pattern on disk is arbitrary: it depends on FS internals, and gets worse as the FS ages
– When to turn disks off? What if the prediction is wrong?
[Diagram: a block device receiving write(fd,…) and read(fd,…) requests]

3 Predictable Writes
Access the same disks predictably for long periods
– Amortize the cost of spinning disks down & up
Idea: log-structured storage/file system
– Writes go to the head of the log until the disk(s) fill up (see the sketch below)
[Diagram: a block device receiving write(fd,…), with the log head and log tail marked]
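
A minimal sketch, not code from the slides, of why log-structured writes keep disk activity predictable: the log head only moves forward, so the disk absorbing writes changes rarely and at known points, and the other disks can stay spun down in between. The 4-disk layout, block counts, and function name are illustrative assumptions.

    /* Minimal sketch (assumed layout): which disk must spin to absorb writes. */
    #include <stdint.h>
    #include <stdio.h>

    #define NR_DISKS        4
    #define BLOCKS_PER_DISK 1000000ULL

    /* The disk holding the log position `head`, i.e. the only disk that must
     * be spinning to absorb appends right now. */
    static unsigned active_write_disk(uint64_t head)
    {
        return (unsigned)((head / BLOCKS_PER_DISK) % NR_DISKS);
    }

    int main(void)
    {
        uint64_t head = 0;
        unsigned current = active_write_disk(head);

        /* Appending one million 4 KB blocks crosses a disk boundary only once. */
        for (uint64_t i = 0; i < 1000000; i++) {
            head++;
            unsigned d = active_write_disk(head);
            if (d != current) {
                printf("after %llu appends, writes move from disk %u to disk %u\n",
                       (unsigned long long)i + 1, current, d);
                current = d;
            }
        }
        return 0;
    }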

4 Unpredictable Reads
What about reads? They may access any part of the log!
– Keep only the primary disks spinning
Trade off read throughput for power savings
– Can afford to spin up disks on demand as load surges
– The file/buffer cache absorbs most read traffic anyway
[Diagram: write(fd,…) going to the log head while read(fd,…) hits arbitrary positions between log tail and log head]

5 Stable Throughput
Unlike an LFS, reads do not interfere with writes
– Keep data from the head (recently written) disks in the file cache
– Log cleaning is not on the critical path
– Can afford to incur the penalty of on-demand disk spin-up
– Return reads from the primary, clean the log from the mirror
[Diagram: write(fd,…) at the log head and read(fd,…) served away from it]

6 Design
[Diagram: the Linux storage stack, top to bottom]
– Virtual File System (VFS)
– File/Buffer Cache
– Mapping Layer (disk filesystems, block device files)
– Generic Block Layer
– Device Mapper
– I/O Scheduling Layer (anticipatory, CFQ, deadline, noop)
– Block Device Drivers

7 Design Overview
Log-structured storage at the block level
– Akin to SSD wear-leveling; actually, it supersedes the on-chip wear-leveling of SSDs
– The design works with RAID-1, RAID-5, and RAID-6
– RAID-5 → RAID-4 due to the append nature of the log
– The parity drive(s) are not a bottleneck since writes are appends (see the parity sketch below)
Prototype as a Linux kernel dm (device-mapper)
– Real, high-performance, deployable implementation
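
A minimal sketch, assuming a RAID-4-style 3-data-plus-1-parity stripe (the layout and names are illustrative, not from the slides): because the log appends full stripes, parity is computed once from in-memory data and written sequentially, with no read-modify-write of old data or old parity.

    /* Minimal sketch: full-stripe parity for appended data (assumed 3+1 layout). */
    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    #define BLOCK_SIZE 4096   /* block size == page size, as on slide 8 */
    #define DATA_DISKS 3      /* assumed 3 data disks + 1 parity disk */

    /* XOR all data blocks of a freshly appended stripe into the parity block. */
    static void stripe_parity(const uint8_t data[DATA_DISKS][BLOCK_SIZE],
                              uint8_t parity[BLOCK_SIZE])
    {
        memset(parity, 0, BLOCK_SIZE);
        for (int d = 0; d < DATA_DISKS; d++)
            for (int i = 0; i < BLOCK_SIZE; i++)
                parity[i] ^= data[d][i];
    }

    int main(void)
    {
        static uint8_t data[DATA_DISKS][BLOCK_SIZE];
        static uint8_t parity[BLOCK_SIZE];

        memset(data[0], 0xAA, BLOCK_SIZE);
        memset(data[1], 0x55, BLOCK_SIZE);
        memset(data[2], 0x0F, BLOCK_SIZE);

        stripe_parity(data, parity);                /* one sequential parity write per stripe */
        printf("parity[0] = 0x%02X\n", parity[0]);  /* 0xAA ^ 0x55 ^ 0x0F = 0xF0 */
        return 0;
    }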

8 Challenges
dm-gecko
– All IO requests at this storage layer are asynchronous
– SMP-safe: leverages all available CPU cores
– Maintains large in-core (RAM) memory maps, in battery-backed NVRAM, and persistently stored on SSD
– Map: virtual block → linear block → disk block (8 sectors), as sketched below
– To keep the maps manageable: block size = page size (4 KB); the FS layered atop uses block size = page size
– Log cleaning/garbage collection (gc) runs in the background; efficient cleaning policy: clean when spare write IO capacity is available
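
A minimal sketch of the mapping chain, assuming a flat array for the virtual → linear map plus a reverse map for gc; the struct, field names, and toy geometry are assumptions, not dm-gecko internals. A 4 KB block spans 8 sectors of 512 bytes, which is where the "disk block (8 sectors)" translation comes from.

    /* Minimal sketch: virtual block -> linear block -> (disk, sector). */
    #include <stdint.h>
    #include <stdio.h>

    #define SECTORS_PER_BLOCK 8          /* 4096 / 512 */

    struct gecko_map {
        uint32_t *v2l;        /* virtual block -> linear block */
        uint32_t *l2v;        /* linear block  -> virtual block (used by gc) */
        uint64_t  nr_blocks;  /* capacity in 4 KB blocks */
        uint64_t  blocks_per_disk;
    };

    /* Translate a virtual block into (disk, sector) coordinates. */
    static void resolve(const struct gecko_map *m, uint64_t vblock,
                        unsigned *disk, uint64_t *sector)
    {
        uint64_t lblock = m->v2l[vblock];
        *disk   = (unsigned)(lblock / m->blocks_per_disk);
        *sector = (lblock % m->blocks_per_disk) * SECTORS_PER_BLOCK;
    }

    int main(void)
    {
        uint32_t v2l[16] = {0}, l2v[16] = {0};
        struct gecko_map m = { v2l, l2v, 16, 4 };  /* toy: 4 disks x 4 blocks */

        m.v2l[3] = 9; m.l2v[9] = 3;                /* virtual 3 lives at linear 9 */

        unsigned disk; uint64_t sector;
        resolve(&m, 3, &disk, &sector);
        printf("virtual 3 -> disk %u, sector %llu\n",
               disk, (unsigned long long)sector);  /* disk 2, sector 8 */
        return 0;
    }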

9 Commodity Architecture
– Dell PowerEdge R710
– Dual-socket multi-core CPUs
– Battery-backed RAM
– OCZ RevoDrive PCIe x4 SSD
– 2 TB Hitachi HDS72202 disks

10 dm-gecko
In-memory map (one level of indirection)
– virtual block: the conventional block array exposed to the VFS
– linear block: the collection of blocks structured as a log (circular ring structure)
E.g., READs are simply indirected (sketch below)
[Diagram: a read on the virtual block device indirected through the map to a used block on the linear block device, between log tail and log head; free and used blocks shown]
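
A minimal sketch of the read path under the assumptions above (the struct, field names, and dispatch stub are illustrative): a read never moves data, it just follows the virtual → linear map and issues the IO against the mapped block.

    /* Minimal sketch: READs are simply indirected through the map. */
    #include <stdint.h>
    #include <stdio.h>

    struct gecko_dev {
        uint32_t *v2l;       /* virtual block -> linear block */
        uint64_t  head;      /* next free linear block (log head) */
        uint64_t  tail;      /* oldest live linear block (log tail) */
    };

    /* Stand-in for submitting an asynchronous block IO to the linear device. */
    static void dispatch_io(const char *op, uint64_t lblock)
    {
        printf("%s linear block %llu\n", op, (unsigned long long)lblock);
    }

    static void gecko_read(struct gecko_dev *g, uint64_t vblock)
    {
        dispatch_io("READ", g->v2l[vblock]);   /* look up, then forward the IO */
    }

    int main(void)
    {
        uint32_t v2l[8] = {0};
        struct gecko_dev g = { v2l, 5, 0 };
        g.v2l[2] = 4;                          /* virtual 2 currently lives at linear 4 */
        gecko_read(&g, 2);
        return 0;
    }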

11 dm-gecko
WRITE operations append to the log head
– Allocate/claim the next free block; schedule log compacting/cleaning (gc) if necessary
– Dispatch the write IO on the new block; update the maps & log on IO completion (sketch below)
[Diagram: a write on the virtual block device redirected to a freshly claimed free block at the log head of the linear block device]
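
A minimal write-path sketch under the same assumptions (struct and field names are illustrative, and the asynchronous IO is simulated): the block at the head is claimed up front, but the virtual block is only remapped when the IO completes, at which point the previous copy becomes garbage for the cleaner.

    /* Minimal sketch: WRITE appends at the log head and remaps on completion. */
    #include <stdint.h>
    #include <stdio.h>

    #define NR_BLOCKS 8
    #define INVALID   UINT32_MAX

    struct gecko_dev {
        uint32_t v2l[NR_BLOCKS];   /* virtual -> linear */
        uint32_t l2v[NR_BLOCKS];   /* linear  -> virtual, INVALID if free/garbage */
        uint64_t head;             /* next free linear block */
        uint64_t used;             /* blocks consumed; drives gc scheduling */
    };

    static void write_complete(struct gecko_dev *g, uint64_t vblock, uint64_t lblock)
    {
        uint32_t old = g->v2l[vblock];
        if (old != INVALID)
            g->l2v[old] = INVALID;          /* old copy becomes garbage for gc */
        g->v2l[vblock] = (uint32_t)lblock;  /* update maps on IO completion */
        g->l2v[lblock] = (uint32_t)vblock;
    }

    static void gecko_write(struct gecko_dev *g, uint64_t vblock)
    {
        uint64_t lblock = g->head;                  /* claim the next free block */
        g->head = (g->head + 1) % NR_BLOCKS;        /* circular log */
        g->used++;                                  /* gc scheduled here if space is low */
        printf("WRITE virtual %llu -> linear %llu\n",
               (unsigned long long)vblock, (unsigned long long)lblock);
        write_complete(g, vblock, lblock);          /* pretend the async IO finished */
    }

    int main(void)
    {
        struct gecko_dev g = { .head = 0 };
        for (int i = 0; i < NR_BLOCKS; i++) g.v2l[i] = g.l2v[i] = INVALID;
        gecko_write(&g, 3);
        gecko_write(&g, 3);     /* rewriting the same virtual block garbage-collects the old copy */
        return 0;
    }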

12 dm-gecko
TRIM operations free the block; schedule log compacting/cleaning (gc) if necessary
– Fast-forward the log tail if the tail block was trimmed (sketch below)
[Diagram: a trim on the virtual block device freeing the mapped block on the linear block device; the log tail advances past freed blocks]
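
A minimal TRIM sketch with the same assumed structures: the mapped linear block is marked free, and if it was the tail block, the tail is fast-forwarded over any run of already-free blocks.

    /* Minimal sketch: TRIM frees a block and may fast-forward the tail. */
    #include <stdint.h>
    #include <stdio.h>

    #define NR_BLOCKS 8
    #define INVALID   UINT32_MAX

    struct gecko_dev {
        uint32_t v2l[NR_BLOCKS];
        uint32_t l2v[NR_BLOCKS];   /* INVALID means free/garbage */
        uint64_t head, tail;
    };

    static void gecko_trim(struct gecko_dev *g, uint64_t vblock)
    {
        uint32_t lblock = g->v2l[vblock];
        if (lblock == INVALID)
            return;
        g->v2l[vblock] = INVALID;
        g->l2v[lblock] = INVALID;                      /* block is now free */

        /* Fast-forward the tail over freed blocks. */
        while (g->tail != g->head && g->l2v[g->tail] == INVALID)
            g->tail = (g->tail + 1) % NR_BLOCKS;
        printf("trimmed virtual %llu; tail now %llu\n",
               (unsigned long long)vblock, (unsigned long long)g->tail);
    }

    int main(void)
    {
        struct gecko_dev g = { .head = 3, .tail = 0 };
        for (int i = 0; i < NR_BLOCKS; i++) g.v2l[i] = g.l2v[i] = INVALID;
        for (uint32_t i = 0; i < 3; i++) { g.v2l[i] = i; g.l2v[i] = i; }  /* three live blocks */
        gecko_trim(&g, 0);        /* the tail block is trimmed -> tail fast-forwards to 1 */
        return 0;
    }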

13 Log Cleaning
Garbage collection (gc): block compacting
– Relocate the used block that is closest to the tail; repeat until compact (e.g., down to a watermark), or fully contiguous (sketch below)
– Use spare IO capacity; do not run when the IO load is high
– More than enough CPU cycles to spare (e.g., 2x quad core)
[Diagram: the used block nearest the log tail on the linear block device relocated to the log head]
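
A minimal sketch of one cleaning step, with assumed structures and policy details: the live block closest to the tail is copied to the head, remapped, and the tail is fast-forwarded; repeating the step compacts the log.

    /* Minimal sketch: one gc step relocates the live block closest to the tail. */
    #include <stdint.h>
    #include <stdio.h>

    #define NR_BLOCKS 8
    #define INVALID   UINT32_MAX

    struct gecko_dev {
        uint32_t v2l[NR_BLOCKS];
        uint32_t l2v[NR_BLOCKS];
        uint64_t head, tail;
    };

    static void gc_one_block(struct gecko_dev *g)
    {
        /* Find the live block closest to the tail. */
        uint64_t lblock = g->tail;
        while (lblock != g->head && g->l2v[lblock] == INVALID)
            lblock = (lblock + 1) % NR_BLOCKS;
        if (lblock == g->head)
            return;                                 /* nothing left to clean */

        /* Copy it to the head (data copy elided) and remap. */
        uint32_t vblock = g->l2v[lblock];
        uint64_t dst = g->head;
        g->head = (g->head + 1) % NR_BLOCKS;
        g->v2l[vblock] = (uint32_t)dst;
        g->l2v[dst]    = vblock;
        g->l2v[lblock] = INVALID;

        /* Fast-forward the tail over freed blocks. */
        while (g->tail != g->head && g->l2v[g->tail] == INVALID)
            g->tail = (g->tail + 1) % NR_BLOCKS;

        printf("gc: moved linear %llu -> %llu, tail now %llu\n",
               (unsigned long long)lblock, (unsigned long long)dst,
               (unsigned long long)g->tail);
    }

    int main(void)
    {
        struct gecko_dev g = { .head = 4, .tail = 0 };
        for (int i = 0; i < NR_BLOCKS; i++) g.v2l[i] = g.l2v[i] = INVALID;
        g.v2l[7] = 1; g.l2v[1] = 7;   /* live block near the tail */
        g.v2l[5] = 3; g.l2v[3] = 5;   /* another live block; linear 0 and 2 are garbage */
        gc_one_block(&g);             /* relocates linear 1, the live block closest to the tail */
        return 0;
    }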

14 Gecko IO Requests
All IO requests at the storage layer are asynchronous
– The storage stack is allowed to reorder requests
– The VFS, file system mapping, and file/buffer cache play nice
– Uncooperative processes may trigger inconsistencies; read/write and write/write conflicts are fair game
Log cleaning interferes with storage stack requests
– SMP-safe solution that leverages all available CPU cores
– Request ordering is enforced as needed, at block granularity

15 Request Ordering
Block b has no prior pending requests
– Allow a read or write request to run, mark the block with pending IO
– Allow gc to run, mark the block as being cleaned
Block b has prior pending read/write requests
– Allow read or write requests, track the number of pending IOs
– If gc needs to run on block b, defer it until all read/write requests have completed (zero pending IOs on block b)
Block b is being relocated by the gc
– Discard gc requests on the same block b (doesn't actually occur)
– Defer all read/write requests until gc has completed on block b
(A per-block state sketch follows.)
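
A minimal sketch of per-block bookkeeping that enforces these rules; the enum, struct, and function names are assumptions, and the real dm-gecko code is asynchronous and SMP-safe, which this single-threaded stand-in is not.

    /* Minimal sketch: per-block ordering between reads/writes and gc. */
    #include <stdbool.h>
    #include <stdio.h>

    enum block_state { IDLE, PENDING_IO, BEING_CLEANED };

    struct block_ctl {
        enum block_state state;
        int pending_ios;       /* outstanding reads/writes on this block */
        int deferred;          /* reads/writes parked while gc relocates the block */
    };

    /* Returns true if the read/write may be dispatched now. */
    static bool submit_rw(struct block_ctl *b)
    {
        if (b->state == BEING_CLEANED) { b->deferred++; return false; }
        b->state = PENDING_IO;
        b->pending_ios++;
        return true;
    }

    /* Returns true if gc may relocate this block now. */
    static bool submit_gc(struct block_ctl *b)
    {
        if (b->state == PENDING_IO)    return false;  /* defer until pending IOs drain */
        if (b->state == BEING_CLEANED) return false;  /* duplicate gc request: discard */
        b->state = BEING_CLEANED;
        return true;
    }

    static void rw_done(struct block_ctl *b)
    {
        if (--b->pending_ios == 0) b->state = IDLE;   /* a deferred gc may now run */
    }

    static void gc_done(struct block_ctl *b)
    {
        b->state = IDLE;
        printf("gc done, releasing %d deferred request(s)\n", b->deferred);
        b->deferred = 0;
    }

    int main(void)
    {
        struct block_ctl b = { IDLE, 0, 0 };
        submit_rw(&b);                 /* a read/write wins the block */
        printf("gc allowed now? %s\n", submit_gc(&b) ? "yes" : "no");   /* no */
        rw_done(&b);
        printf("gc allowed now? %s\n", submit_gc(&b) ? "yes" : "no");   /* yes */
        submit_rw(&b);                 /* arrives during cleaning -> deferred */
        gc_done(&b);
        return 0;
    }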

16 Limitations
In-core memory map (there are two maps)
– A simple, direct map requires lots of memory
– A multi-level map is complex
– Akin to virtual memory paging, only simpler: fetch large portions of the map on demand from the larger SSD
The current prototype uses two direct maps (worked sizing below):

  Linear (total) disk capacity | Block size | # of map entries | Size of map entry | Memory per map
  6 TB                         | 4 KB       | 3 x 2^29         | 4 bytes / 32 bits | 6 GB
  8 TB                         | 4 KB       | 2^31             | 4 bytes / 32 bits | 8 GB
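
A minimal sketch reproducing the table's arithmetic, assuming each map entry is a 32-bit linear block number: memory per map = (capacity / block size) x 4 bytes.

    /* Minimal sketch: map sizing for 6 TB and 8 TB of linear capacity. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        const uint64_t block_size = 4096;              /* 4 KB blocks */
        const uint64_t entry_size = 4;                 /* 32-bit map entries */
        const uint64_t capacities_tb[] = { 6, 8 };

        for (int i = 0; i < 2; i++) {
            uint64_t capacity = capacities_tb[i] << 40;        /* TB -> bytes */
            uint64_t entries  = capacity / block_size;         /* # of map entries */
            uint64_t map_mem  = entries * entry_size;          /* bytes per map */
            printf("%llu TB -> %llu entries -> %llu GB per map\n",
                   (unsigned long long)capacities_tb[i],
                   (unsigned long long)entries,
                   (unsigned long long)(map_mem >> 30));       /* 6 GB and 8 GB */
        }
        return 0;
    }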

