Storage Networks – How to Handle Heterogeneity. Bálint Miklós, January 24th, 2005, ETH Zürich. External Memory Algorithms and Data Structures

What Are Storage Networks? Persistent storage – hard disks. Device capacity keeps doubling – data grows even faster. Use many disks. We need to protect, access, and manage the ever-growing volume of storage assets. Storage Networks – Motivation 2

Hardware Failures. [Figure: hardware-failure trace collected from the Internet Archive (March 2003), courtesy of David Pease (UCSC) & Kelly Gottlib.] Storage Networks – Motivation 3

Heterogeneous Storage Networks. To increase system speed and capacity: add new disks. New disks usually have different characteristics than the older disks in the system. Many modern storage systems are distributed: Ethernet, FibreChannel. How can we exploit this heterogeneity? Storage Networks – Motivation 4

Goal Storage system requirements: –space and access balance –availability –resource efficiency –access efficiency –heterogeneity –adaptivity –locality Very difficult to meet ALL requirements. Storage Networks – Motivation 5

Outline Model AdaptRaid HERA RIO Conclusions Storage Networks 6

What Model to Use? Why not use the model of external memory algorithms? –We need a solution for all the (sub)problems –One would have to bypass the operating system: a complex task. Therefore, a different abstraction level: –A set of disks characterized by capacity and bandwidth –The connection network is unrestricted: e.g. SCSI, P2P. Storage Networks – Model 7

Model Assumptions. Disk access patterns are generated by the file system (OS). They are difficult to predict and can change over time. We assume a uniform pattern; our goal is to distribute data evenly. Storage Networks – Model 8

Outline Model AdaptRaid HERA RIO Conclusions Storage Networks 9

Heterogeneous Storage Networks. Straightforward solution: –Cluster disks according to their characteristics –We can have many clusters –Easy to extend –New, faster disks do not improve the overall response time. Randomized batched solution [Sanders]: –Map data randomly to disks –Schedule a batch of accesses by solving a network-flow problem –Infeasible for large systems: many flow problems to be solved –The batch-like behavior is a disadvantage. Storage Networks – Heterogeneity 10

RAID – Redundant Array of Inexpensive Disks. RAID level 0: –Striping of data across a set of disks. RAID level 5: –Adds a redundancy block per stripe –Distributes the redundancy information evenly over all disks. Storage Networks – AdaptRaid 11
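A minimal sketch (in Python, with illustrative names and an assumed fixed block size) of how a logical block maps to a disk under RAID 0, and how RAID 5 rotates one parity block per stripe:

def raid0_map(block, num_disks):
    """RAID 0: stripe blocks round-robin across all disks."""
    return block % num_disks, block // num_disks  # (disk, offset)

def raid5_map(block, num_disks):
    """RAID 5: like RAID 0, but one block per stripe holds parity,
    and the parity position rotates so parity is spread evenly."""
    data_per_stripe = num_disks - 1
    stripe = block // data_per_stripe
    parity_disk = num_disks - 1 - (stripe % num_disks)  # rotating parity
    disk = block % data_per_stripe
    if disk >= parity_disk:  # skip over the parity position
        disk += 1
    return disk, stripe  # parity for this stripe lives on parity_disk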

AdaptRaid 0 – extends the RAID layout for heterogeneity [Cortes, Labarta]. Basic idea: –Load each disk depending on its characteristics. First solution: –Use all disks as in RAID 0 until the smallest disk is full –Then discard the full disks and continue the same way –The distribution continues until all disks are full. The lower portion of the address space has better access times. Storage Networks – AdaptRaid 12
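A minimal sketch of this first layout, assuming capacities are given in blocks (names are illustrative): it returns the phases of the distribution, each striping over the disks that still have free space.

def adaptraid0_layout(capacities):
    """Return a list of (participating_disks, num_stripes) phases."""
    order = sorted(range(len(capacities)), key=lambda d: capacities[d])
    alive = list(range(len(capacities)))  # disks that are not yet full
    used = 0  # stripes written so far (= blocks used on every live disk)
    phases = []
    for d in order:  # smallest disk fills up first
        depth = capacities[d] - used
        if depth > 0:
            phases.append((list(alive), depth))
            used += depth
        alive.remove(d)  # disk d is now full
    return phases

# Example: adaptraid0_layout([5, 10, 10])
# -> [([0, 1, 2], 5), ([1, 2], 5)]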

AdaptRaid 0 – Reducing Variance. To reduce the variance: –The algorithm temporarily assumes that the disks are smaller –The resulting pattern is repeated multiple times. Stripes in a Pattern (SIP) defines the size of the pattern and the degree of variance. Each disk holds the same number of blocks as before. Storage Networks – AdaptRaid 13

AdaptRaid 5. Similar idea, but one block per stripe is used for parity information. Difference: a write implies updating the parity. If not all blocks in the stripe are written, a write needs additional reads: the small-write problem. Storage Networks – AdaptRaid 14
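The parity update is an XOR read-modify-write; this sketch (illustrative names) shows why a small write costs four I/Os – read old data, read old parity, write new data, write new parity – instead of one:

def small_write_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """new_parity = old_parity XOR old_data XOR new_data."""
    return bytes(p ^ od ^ nd
                 for p, od, nd in zip(old_parity, old_data, new_data))

# The two reads (old_data, old_parity) are the extra I/Os that make
# partial-stripe writes expensive.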

AdaptRaid 5 – Small-write Solution. Reference stripe: the stripe size the OS assumes to be a full stripe. The size of every actual stripe is a divisor of the reference stripe. Logically three steps: –Decrease the stripe size –Distribute the empty space evenly over all disks –Apply a Tetris-like method to fill the empty blocks. Storage Networks – AdaptRaid 15

AdaptRaid 5 – Variance Reduction. We can use a variance reduction similar to AdaptRaid 0: –Repeat a smaller pattern more times. Storage Networks – AdaptRaid 16

AdaptRaid – Generalization. What if the bigger disks are not the faster ones? Until now we tried to use all blocks of a disk; now we want to use fewer blocks on the slow disks. Utilization Factor (UF): –a 0..1 value per disk. The UF can be set based on: –disk size (as until now) –performance. Storage Networks – AdaptRaid 17
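One plausible way to derive UFs, assuming we normalize against the best disk (function and parameter names are illustrative, not the paper's API): normalizing by capacity makes the per-disk block count track disk size (the behavior so far), while normalizing by bandwidth places fewer blocks on slow disks.

def utilization_factors(capacities, bandwidths, by="capacity"):
    """Return one UF in (0, 1] per disk, relative to the best disk."""
    metric = capacities if by == "capacity" else bandwidths
    best = max(metric)
    return [m / best for m in metric]

# utilization_factors([100, 200], [10, 10], by="capacity")  -> [0.5, 1.0]
# utilization_factors([100, 200], [20, 10], by="bandwidth") -> [1.0, 0.5]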

AdaptRaid – Summary. Decide a UF for every disk: –how much we want to load the disk. Decide the SIP for the system: –how big the pattern is. Performance (measured with simulators):
                Adaptivity    Speedup
  AdaptRaid 0:  RAID 0        8%-35%
  AdaptRaid 5:  ?             < 30%
Storage Networks – AdaptRaid 18

Outline Model AdaptRaid HERA RIO Conclusions Storage Networks 19

Heterogeneous Extension of RAID (HERA). Disk merging technique: –Physical disks are partitioned into logical disks –All logical disks have the same bandwidth and capacity –The logical disks are grouped into G parity groups –This yields G homogeneous subsystems. Storage Networks – HERA 20
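A minimal sketch of disk merging, assuming a fixed (capacity, bandwidth) unit per logical disk; both resources limit how many logical disks a physical disk can host. Names and unit sizes are assumptions for illustration.

def merge_disks(disks, unit_capacity, unit_bandwidth):
    """disks: list of (capacity, bandwidth) per physical disk.
    Returns how many logical disks each physical disk can host."""
    return [min(cap // unit_capacity, bw // unit_bandwidth)
            for cap, bw in disks]

# Example: a disk twice as big and twice as fast hosts two logical disks:
# merge_disks([(100, 10), (200, 20)], 100, 10) -> [1, 2]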

Heterogeneous Extension of RAID. Constraint: each logical disk in a parity group must map to a different physical disk. Storage Networks – HERA 21

Heterogeneous Extension of RAID. Read: an online load-balancing algorithm directs each block request to the least loaded disk. Every disk has a queue with all reads and their deadlines. Requested blocks are delivered based on deadline and on their location on disk (to minimize the seek-time overhead). Storage Networks – HERA 22
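A sketch of such a per-disk queue, assuming each request carries a deadline and an on-disk position: the most urgent requests are picked by deadline, then serviced in position order to reduce seeks. This is an illustration, not the paper's actual interface.

import heapq

class DiskQueue:
    def __init__(self):
        self._heap = []  # entries: (deadline, position, block_id)

    def add(self, deadline, position, block_id):
        heapq.heappush(self._heap, (deadline, position, block_id))

    def next_batch(self, k):
        """Pop the k most urgent requests, then order them by position."""
        n = min(k, len(self._heap))
        urgent = [heapq.heappop(self._heap) for _ in range(n)]
        return sorted(urgent, key=lambda req: req[1])  # SCAN-like pass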

Heterogeneous Extension of RAID. Availability is almost as good as in the homogeneous case (RAID 5), but the scheme is much more flexible than RAID 5. Performance relies on the logical-disk distribution, which is the task of the administrator. The authors recently proposed a configuration-planning algorithm that optimizes for bandwidth and storage: [Zimmermann, Ghandeharizadeh: Highly Available and Heterogeneous Continuous Media Storage Systems], December 2004. Storage Networks – HERA 23

Outline Model AdaptRaid HERA RIO Conclusions Storage Networks 24

Random I/O (RIO) Media Server. A randomized distribution strategy that concentrates on delivering multimedia objects. Optimized for real-time reading: –Video on demand –3D interactive virtual-world navigation –Interactive scientific visualization. Idea: place each data unit on a random disk at a random position. This ensures long-term load balance. Storage Networks – RIO 25

Homogeneous RIO – Data Placement. A multimedia object is composed of a sequence of constant-size data blocks. Each data block is placed on a random disk at a random location -> long-term load balancing. By replicating a fraction of the data blocks, we also allow short-term balancing. Storage Networks – RIO 26
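A minimal sketch of this placement with an illustrative replication fraction r (assumes at least two disks; names are not from the paper): each block gets a uniformly random primary disk, and a fraction r of the blocks gets a replica on a different disk.

import random

def place_blocks(num_blocks, num_disks, r, rng=random.Random(0)):
    """Return, per block, the list of disks holding a copy."""
    placement = []
    for _ in range(num_blocks):
        primary = rng.randrange(num_disks)
        copies = [primary]
        if rng.random() < r:  # replicate roughly a fraction r of blocks
            other = rng.randrange(num_disks - 1)
            copies.append(other if other < primary else other + 1)
        placement.append(copies)
    return placement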

Homogeneous RIO – Read Scheduler. All reads have a deadline; non-real-time requests have an infinite deadline. A request for a block is routed to the disk with the least load. A disk serves multiple block requests per cycle: –A number of blocks is selected from the disk's request queue –The selected requests are reordered according to their location on disk (to minimize the seek-time overhead) and serviced. Storage Networks – RIO 27

Heterogeneous RIO – Data Placement. Place data on a disk with probability proportional to its capacity: the probability of placing a block on disk i is p_i = C_i / sum_j C_j. Note that sum_i p_i = 1. Disk capacity is increasing faster than disk bandwidth -> the faster, bigger disks are going to be the bottleneck. Storage Networks – RIO 28
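In code, this capacity-proportional draw is a one-line weighted choice; a sketch assuming capacities are given as the weights:

import random

def pick_disk(capacities, rng=random.Random(0)):
    """Pick disk i with probability C_i / sum(C)."""
    return rng.choices(range(len(capacities)), weights=capacities, k=1)[0]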

Heterogeneous RIO – BSR. n disks D_i, each with capacity C_i and bandwidth B_i. Total capacity: C = sum_i C_i. Total bandwidth: B = sum_i B_i. Bandwidth-space ratio: BSR_i = B_i / C_i. The BSR is a hint for how much load a disk can take. Storage Networks – RIO 29

Heterogeneous RIO – Clusters. Goal: redirect load from low-BSR disks to higher-BSR disks. Group disks into clusters based on their BSRs. Low-BSR clusters would otherwise have a high load. How much replication do we need to sustain a certain load? Storage Networks – RIO 30

Heterogeneous RIO – Replication Factor. We want to sustain a maximum load equal to the aggregate bandwidth B. If a fraction r of the blocks is replicated, the data without replicas (fraction 1 - r) must be served by the cluster storing it. The maximum mandatory load on cluster i is then B * p_i * (1 - r). To use all the bandwidth we need B * p_i * (1 - r) <= B_i for every cluster -> r >= 1 - min_i B_i / (B * p_i). Storage Networks – RIO 31
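A sketch computing this bound, assuming per-cluster capacities and bandwidths as inputs (names are illustrative, and the formula follows the reconstruction above):

def min_replication(capacities, bandwidths):
    """Smallest replicated fraction r that lets the system run at full
    aggregate bandwidth, given the mandatory-load argument above."""
    C, B = sum(capacities), sum(bandwidths)
    r = max(1.0 - (b_i / B) / (c_i / C)  # r_i = 1 - B_i / (B * p_i)
            for c_i, b_i in zip(capacities, bandwidths))
    return max(0.0, r)  # homogeneous systems need no replication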

Heterogeneous RIO – Summary. Randomized data placement. A read scheduler to optimize the read bandwidth. Depending on the disk characteristics, a different replication factor is needed to sustain a given bandwidth. The authors claim that, for disk mixes spanning a few years of technology, 10% to 40% replication is sufficient to use the full aggregate bandwidth of the network. Storage Networks – RIO 32

Outline Model AdaptRaid HERA RIO Conclusions Storage Networks 33

Conclusions. All three methods concentrate on optimizing bandwidth and space utilization. Adaptivity is hard to achieve. AdaptRaid and HERA: –Deterministic –Extend homogeneous RAID –AdaptRaid 5 wastes space? RIO: –Randomized –How fast is the read scheduler? –The only one for which the authors showed a real-life implementation (Virtual World Data Center). Storage Networks – Conclusions 34

Storage Networks Thank You! Questions? Bálint Miklós 35