Properties of Layouts
Single failure correcting: no two units of the same stripe are mapped to the same disk
–Enables recovery from a single disk crash
Distributed parity: all disks have the same number of check units mapped to them
–Balances the accesses to check units during writes or when a failure has occurred
Distributed reconstruction: there is a constant K such that, for every pair of disks, K stripes have units mapped to both disks
–Ensures that accesses to surviving disks during on-line reconstruction are spread evenly
Large write optimization: each stripe contains a contiguous interval of the client's data, so a write of all (k-1) data units of a stripe can be processed without pre-reading the prior contents of any disk
Maximal parallelism: whenever a client reads n contiguous data units, where n is the number of disks in the array, all n disks are accessed in parallel
Efficient mapping: the functions that map client addresses to array locations are efficiently computable, with low time and space requirements
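The "efficient mapping" and "distributed parity" properties can be made concrete with a small sketch. The left-symmetric, RAID-5-style placement below (NDISKS, map_unit) is an illustrative assumption, not the specific layout the slide describes: parity rotates across disks, no two units of a stripe land on the same disk, and the mapping is computable in constant time and space.

```c
/* Minimal sketch of an "efficient mapping" for a rotated-parity
 * (RAID-5-style) layout.  NDISKS and the placement rule are
 * illustrative assumptions, not the layout from the slides. */
#include <stdio.h>

#define NDISKS 5                    /* k = NDISKS, so k-1 data units per stripe */

/* Map a client data-unit number to (disk, stripe) and report the
 * stripe's parity disk.  Parity rotates so every disk holds the same
 * number of check units (distributed parity). */
static void map_unit(long unit, int *disk, long *stripe, int *parity_disk)
{
    *stripe      = unit / (NDISKS - 1);
    *parity_disk = (int)(*stripe % NDISKS);            /* rotated parity */
    int slot     = (int)(unit % (NDISKS - 1));          /* position within stripe */
    *disk        = (slot + *parity_disk + 1) % NDISKS;  /* skip the parity disk */
}

int main(void)
{
    for (long u = 0; u < 12; u++) {
        int d, p; long s;
        map_unit(u, &d, &s, &p);
        printf("unit %2ld -> disk %d, stripe %ld (parity on disk %d)\n",
               u, d, s, p);
    }
    return 0;
}
```

Running it for a few units shows each stripe's data units landing on distinct disks, with the parity disk shifting by one per stripe.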

Disk Caching Disk
Disk caching disk (DCD)
–Uses inexpensive disk space as a cache
RAID provides high throughput through parallelism, but it does not reduce access latency
Caching is used to reduce latency
–DRAM is used for reads
–Non-volatile RAM can be used for writes, but buffer sizes are usually small because NVRAM is expensive
LFS can be used to optimize write performance

Drawbacks of LFS
Limited implementation
–Sprite LFS, BSD-LFS, RAID systems (e.g., HP)
–Extensive file system rewrite required
–Sensitivity to disk capacity utilization
–Read performance suffers when read and write access patterns differ significantly, because logically nearby blocks are dispersed across the log

I/O Is Bursty
Large difference between average and maximum I/O rates
Large I/O peaks can overwhelm DRAM caches
A large DRAM cache or large RAID is needed to rapidly process the large number of I/O requests in a bursty workload

DCD
A disk is used to cache logged write data
A small DRAM buffer captures background write requests
When a large write burst occurs, data blocks in the DRAM buffer are written to the cache disk in a single large transfer
The cache disk appears as an extension of the DRAM buffer
RAID systems are analogous to interleaved memory; a DCD system is analogous to a multilevel CPU cache
Exploits the temporal locality of I/O transfers and the log-structured idea of LFS to minimize seek time and rotational latency
In effect, this provides a fast "cache" disk

DCD Architecture
DCD can be implemented with a separate cache disk, or the cache disk can be a logical partition of the data disk (Figure 1)
Data organization on the data disk is a conventional file system
DCD's RAM buffer (Figure 2) contains
–Two data caches
–A physical block address (PBA) to logical block address (LBA) mapping table
–A destaging buffer
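A minimal C sketch of the three RAM-buffer components just listed; the sizes, field names, and the struct itself are assumptions for illustration rather than the actual DCD data structures.

```c
/* Sketch of the RAM-buffer components named on the slide (two data
 * caches, a PBA-LBA mapping table, a destaging buffer).  All sizes
 * and field names are illustrative assumptions. */
#include <stdio.h>
#include <stdint.h>

#define BLOCK_SIZE   512
#define CACHE_BLOCKS 256        /* per data cache, assumed */
#define MAP_ENTRIES  4096

struct map_entry {              /* one PBA <-> LBA translation */
    uint32_t lba, pba;
    int      valid;
};

struct dcd_ram_buffer {
    uint8_t  data_cache[2][CACHE_BLOCKS][BLOCK_SIZE];  /* only one is active */
    int      active;                                   /* 0 or 1 */
    struct map_entry map[MAP_ENTRIES];                 /* LBA -> PBA table */
    uint8_t  destage_buf[CACHE_BLOCKS][BLOCK_SIZE];    /* used by destaging */
};

int main(void)
{
    printf("RAM buffer sketch: %zu bytes\n", sizeof(struct dcd_ram_buffer));
    return 0;
}
```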

Writing
Small writes go directly to a data cache in the RAM buffer
–Only one data cache is active at a given time
Large requests are sent directly to the data disk
If the active data cache is full
–The controller makes the other data cache active
–The controller writes the entire contents of the previously active data cache to the cache disk as one large log
–When the log write finishes, that data cache is available again
Data is written to the cache disk whenever the cache disk is available (data does not wait until a data cache is full)
Data stays in RAM for only tens to hundreds of milliseconds
–The longest time data has to stay in RAM is the time required to write a full log
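The double-data-cache switch described above can be sketched as follows; the cache size, the flush trigger (switching only when the active cache fills, rather than whenever the cache disk is idle), and the function names are simplifying assumptions.

```c
/* Minimal simulation of the small-write path: small writes fill the
 * active data cache; when it fills, the controller switches caches and
 * flushes the previous one to the cache disk as a single log. */
#include <stdio.h>

#define CACHE_BLOCKS 4          /* tiny caches so the switch is visible */

static int fill[2];             /* blocks buffered in each data cache */
static int active;              /* which data cache receives writes */

static void flush_log(int cache)
{
    printf("  log write: %d blocks from cache %d -> cache disk\n",
           fill[cache], cache);
    fill[cache] = 0;            /* cache becomes available again */
}

static void small_write(int lba)
{
    if (fill[active] == CACHE_BLOCKS) {    /* active cache is full */
        int prev = active;
        active = 1 - active;               /* make the other cache active */
        flush_log(prev);                   /* one large log transfer */
    }
    fill[active]++;
    printf("write lba %d buffered in data cache %d\n", lba, active);
}

int main(void)
{
    for (int lba = 0; lba < 10; lba++)
        small_write(lba);
    return 0;
}
```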

Reading
First search the RAM buffer, then the cache disk
Simulation results: 99% of read requests go to the data disk
–The DCD sits downstream of a large read cache, which captures most requests for newly written data

Cache Data Organization
The host sends a request to the disk system to access a disk block
–The host provides an LBA to indicate the block's position on the data disk
–The DCD may cache the data on the cache disk at a different address, the PBA
–A hash table is indexed by LBAs
–Entries are also kept in a linked list, ordered by their PBA values
–A free-entry list is also maintained to keep track of empty blocks that can be used
–The CLP (current log position) indicates the end of the last log
–The CLP acts like a stack pointer: a new log is pushed onto the cache disk by writing the log to the disk starting at the CLP and appending the corresponding PBA-LBA mapping entries for the blocks in the log to the end of the mapping list
–When a log write finishes, the cache-disk head is located near the track indicated by the CLP
–Figure 3
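A hedged sketch of the LBA-to-PBA bookkeeping: a hash table indexed by LBA, with PBAs handed out sequentially from the CLP as each log is appended. The table size, the hashing scheme, and the omission of the PBA-ordered list and free-entry list are simplifications of what the slide describes.

```c
/* Sketch of the cache-disk mapping state: a hash table indexed by LBA,
 * with PBAs allocated sequentially from the current log position (CLP). */
#include <stdio.h>
#include <stdint.h>

#define TABLE_SIZE 64           /* small table for illustration */
#define INVALID    UINT32_MAX

static uint32_t lba_key[TABLE_SIZE];
static uint32_t pba_val[TABLE_SIZE];
static uint32_t clp;            /* current log position on the cache disk */

static unsigned slot_for(uint32_t lba)
{
    unsigned h = lba % TABLE_SIZE;
    while (lba_key[h] != INVALID && lba_key[h] != lba)
        h = (h + 1) % TABLE_SIZE;          /* linear probing */
    return h;
}

/* A log write appends the block at the CLP and records the mapping. */
static void log_block(uint32_t lba)
{
    unsigned h = slot_for(lba);
    lba_key[h] = lba;
    pba_val[h] = clp++;                    /* next free position in the log */
}

/* A read consults the table; a miss means the block is on the data disk. */
static void lookup(uint32_t lba)
{
    unsigned h = slot_for(lba);
    if (lba_key[h] == lba)
        printf("lba %u -> cache disk pba %u\n",
               (unsigned)lba, (unsigned)pba_val[h]);
    else
        printf("lba %u -> not cached, read data disk\n", (unsigned)lba);
}

int main(void)
{
    for (unsigned i = 0; i < TABLE_SIZE; i++) lba_key[i] = INVALID;
    log_block(10); log_block(74); log_block(11);   /* 10 and 74 collide */
    lookup(10); lookup(74); lookup(11); lookup(99);
    return 0;
}
```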

Destaging
Reads a large data block (a log) from the cache disk into the destaging buffer
Reorders the blocks by their LBAs to reduce seek overhead
Writes the blocks to their original locations on the data disk
Once a data block is written, the destaging process invalidates its entry in the PBA-LBA mapping table to indicate that the data is no longer on the cache disk
Destaging occurs during idle time while the cache disk still has ample free space; destaging priority increases as the cache disk fills
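A small sketch of one destaging pass, under the assumptions that a log's blocks have already been read into the destaging buffer and that a sort by LBA stands in for the reordering step; the block list itself is made up.

```c
/* Sketch of one destaging pass: reorder the staged blocks by LBA to
 * cut seek overhead, write them back to their home locations, then
 * invalidate the corresponding mappings. */
#include <stdio.h>
#include <stdlib.h>

struct destage_block { unsigned lba, pba; };

static int by_lba(const void *a, const void *b)
{
    const struct destage_block *x = a, *y = b;
    return (x->lba > y->lba) - (x->lba < y->lba);
}

int main(void)
{
    /* Blocks as they sit in the log (PBA order), with scattered LBAs. */
    struct destage_block buf[] = {
        { 900, 0 }, { 12, 1 }, { 530, 2 }, { 13, 3 }, { 531, 4 },
    };
    size_t n = sizeof buf / sizeof buf[0];

    qsort(buf, n, sizeof buf[0], by_lba);     /* reorder by LBA */
    for (size_t i = 0; i < n; i++) {
        printf("write lba %u back to the data disk\n", buf[i].lba);
        /* ...then invalidate the PBA-LBA entry for buf[i].pba... */
    }
    return 0;
}
```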

Enhancing DCD with an NVRAM Cache
The NVRAM buffer uses an LRU cache and a staging buffer instead of the double data caches
The staging buffer holds 64 KB to 256 KB
The PBA-LBA mapping table is also in NVRAM
The destaging buffer can be ordinary RAM, since non-volatile copies of the data are in NVRAM
The controller does not flush the LRU cache's contents to the cache disk
–Modified data is kept in the LRU cache to capture write-request locality
When a write request arrives and the LRU cache is full, the DCD controller copies LRU blocks into the staging buffer until the buffer is full or the LRU cache is empty
–Space in the LRU cache can be released immediately, because the staging buffer is NVRAM
–The write request is then satisfied by allocating a free block in the cache and copying the data
Data is written from the staging buffer to the cache disk in one large log write
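The LRU-cache-plus-staging-buffer behavior can be sketched as below; the slot counts, the timestamp-based LRU bookkeeping, and the eviction batch size are assumptions, not the paper's parameters.

```c
/* Sketch of the NVRAM variant: writes land in an LRU cache; when it is
 * full, least-recently-used blocks move to a staging buffer (freeing
 * cache space) and are later written to the cache disk as one log. */
#include <stdio.h>

#define LRU_SLOTS   4
#define STAGE_SLOTS 2

static int  lba_in_slot[LRU_SLOTS];     /* -1 = free slot */
static long last_use[LRU_SLOTS];
static long clock_tick;

static void evict_to_staging(void)
{
    int staged = 0;
    while (staged < STAGE_SLOTS) {
        int lru = -1;
        for (int i = 0; i < LRU_SLOTS; i++)            /* find the oldest block */
            if (lba_in_slot[i] >= 0 &&
                (lru < 0 || last_use[i] < last_use[lru]))
                lru = i;
        if (lru < 0) break;                            /* LRU cache already empty */
        printf("  stage lba %d for the next log write\n", lba_in_slot[lru]);
        lba_in_slot[lru] = -1;                         /* slot released at once */
        staged++;
    }
}

static void nvram_write(int lba)
{
    int slot = -1;
    for (int i = 0; i < LRU_SLOTS && slot < 0; i++)
        if (lba_in_slot[i] < 0) slot = i;
    if (slot < 0) {                                    /* LRU cache is full */
        evict_to_staging();
        for (int i = 0; i < LRU_SLOTS && slot < 0; i++)
            if (lba_in_slot[i] < 0) slot = i;
    }
    lba_in_slot[slot] = lba;
    last_use[slot] = ++clock_tick;
    printf("write lba %d held in NVRAM LRU cache\n", lba);
}

int main(void)
{
    for (int i = 0; i < LRU_SLOTS; i++) lba_in_slot[i] = -1;
    for (int lba = 100; lba < 107; lba++)
        nvram_write(lba);
    return 0;
}
```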

Normal DCD vs. Traditional Disk
Figure 5
Immediate-report mode
Physical DCD
–Simulates two physical drives at the same time
Logical DCD
–Two disk partitions on a single drive simulate two logical drives
DCD's RAM buffer: 512 KB
Baseline system
–RAM cache: 4 MB
Workload: HP traces of I/O requests made by different HP-UX systems
Workload traces are overlaid

Enhanced DCD with NVRAM
Compared the enhanced DCD with an LRU-cached baseline system
RAM buffer sizes from 256 KB to 4 MB
Figure 7

Serverless Network File System
Workstations cooperate as peers to provide all file system services
Any machine can assume the responsibilities of a failed component
A fast LAN can be used as an I/O backplane
Prototype file system: xFS
Uses cooperative caching
–Harvests portions of client memory as a large global file cache
–Uses ideas from non-uniform memory access shared-memory architectures (e.g., DASH) to provide scalable cache consistency
Dynamically distributes data storage across server disks via software RAID, using log-based striping

Multiprocessor/File System Cache Consistency
Provide a uniform view of the system
–Track where blocks are cached
–Invalidate stale cached copies
A common multiprocessor strategy is to divide control of the system's memory between processors
–Each processor manages cache consistency state for its own physical memory locations
In xFS, control can be located anywhere in the system and can be dynamically migrated during execution
xFS uses a token-based cache consistency scheme that manages consistency on a per-block basis (rather than per-file, as in Sprite or AFS)

xFS's Consistency Scheme
Before a client modifies a block, it must acquire write ownership of the block
–The client sends a message to the block's manager
–The manager invalidates any other cached copies of the block, updates the cache consistency information to indicate the new owner, and replies to the client, granting permission to write
–Once a client owns a block, it may write it repeatedly without asking the manager for ownership each time
–When another client reads or writes the block, the manager revokes ownership, forcing the owning client to stop writing, flush any changes to stable storage, and forward the data to the new client
The scheme supports cooperative caching
–Read requests can be forwarded to clients with valid cached copies of the data
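A toy model of the ownership exchange just described, tracking a single block; the client IDs, the in-memory state, and the printed "messages" are illustrative stand-ins for the real xFS protocol rather than its implementation.

```c
/* Toy model of per-block write ownership: a writer asks the block's
 * manager for ownership; the manager invalidates other cached copies,
 * records the new owner, and revokes ownership when another client
 * later touches the block. */
#include <stdio.h>

#define NCLIENTS 3

struct block_state {
    int cached[NCLIENTS];   /* which clients hold a copy */
    int owner;              /* client with write ownership, -1 = none */
};

static void acquire_write(struct block_state *b, int client)
{
    if (b->owner == client) {                      /* already owns the block */
        printf("client %d already owns the block; no manager traffic\n", client);
        return;
    }
    if (b->owner >= 0)
        printf("manager: revoke ownership from client %d, flush its changes\n",
               b->owner);
    for (int c = 0; c < NCLIENTS; c++)             /* invalidate stale copies */
        if (c != client && b->cached[c]) {
            printf("manager: invalidate copy at client %d\n", c);
            b->cached[c] = 0;
        }
    b->owner = client;
    b->cached[client] = 1;
    printf("manager: client %d may now write the block\n", client);
}

int main(void)
{
    struct block_state b = { { 1, 1, 0 }, -1 };    /* clients 0 and 1 cache it */
    acquire_write(&b, 0);   /* client 0 asks to write */
    acquire_write(&b, 0);   /* repeated writes need no new request */
    acquire_write(&b, 2);   /* client 2 writes: ownership moves */
    return 0;
}
```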

How xFS Distributes Key Data Structures
Manager map
–A globally replicated manager map is used to locate a file's manager from the file's index number
–Bits of the file's index number are extracted and used to index into the manager map
–The map is a table indicating which physical machines manage which groups of index numbers
–A file's manager tracks its disk-location metadata and cache consistency state
–The cache consistency state lists the clients caching each block, or the client that holds write ownership
Imap
–Each file's index number has an entry in the imap that points to that file's disk metadata in the log
–The imap is distributed among the managers according to the manager map, so each manager handles the imap entries and cache consistency state of the same files
–A file's imap entry contains the log address of the file's index node
–The index node contains the disk addresses of the file's data blocks
–For large files, the index node can contain the log addresses of indirect and double-indirect blocks
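A minimal sketch of the manager-map lookup: low-order bits of a file's index number select an entry in a small, globally replicated table naming the managing machine. The bit width, the table contents, and the hypothetical manager_for helper are assumptions for illustration.

```c
/* Sketch of the manager-map lookup: some bits of a file's index number
 * select an entry in a globally replicated table that names the
 * machine managing that group of index numbers. */
#include <stdio.h>
#include <stdint.h>

#define MAP_BITS    4
#define MAP_ENTRIES (1u << MAP_BITS)

/* manager_map[i] = machine that manages index-number group i (assumed). */
static const int manager_map[MAP_ENTRIES] = {
    0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3
};

static int manager_for(uint32_t index_number)
{
    uint32_t group = index_number & (MAP_ENTRIES - 1);  /* low-order bits */
    return manager_map[group];
}

int main(void)
{
    uint32_t files[] = { 7, 18, 4097, 65536 };
    for (size_t i = 0; i < sizeof files / sizeof files[0]; i++)
        printf("file index %u -> manager machine %d\n",
               (unsigned)files[i], manager_for(files[i]));
    return 0;
}
```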