CS 295: Modern Systems Organizing Storage Devices


CS 295: Modern Systems Organizing Storage Devices FPGA-1

Redundant Array of Independent Disks (RAID)
- Technology for managing multiple storage devices
- Typically within a single machine/array, due to fault-tolerance limitations
- Multiple levels, depending on how fault tolerance is managed
- RAID 0 and RAID 5 are the most popular right now
RAID 0: no fault tolerance; blocks striped across however many drives are available
- Fastest performance
- Any single drive failure results in data loss
- Block size is configurable
- Similar in use case to the Linux Logical Volume Manager (LVM)
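The RAID 0 striping described above can be sketched as a simple address mapping. This is a toy illustration, not any real controller's layout: the function name is made up, and real arrays stripe in configurable chunk sizes rather than single blocks.

```python
# Hypothetical sketch of RAID 0 placement: logical blocks are laid out
# round-robin across N disks, so consecutive blocks hit different spindles.

def raid0_locate(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Return (disk index, block offset on that disk) for a logical block."""
    return logical_block % num_disks, logical_block // num_disks

# With 4 drives, block 0 goes to disk 0, block 1 to disk 1, and so on;
# block 4 wraps back around to disk 0 at the next offset.
layout = [raid0_locate(b, 4) for b in range(8)]
```

Because a large sequential read touches all four disks at once, throughput scales with the number of drives; but since no block is stored twice, losing any one disk loses data.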

Fault Tolerance in RAID 5
- RAID 5 stripes blocks across the available storage, but also stores a parity block
- The parity block is calculated using xor (AP = A1 ^ A2 ^ A3)
- A single disk failure can be recovered by re-calculating from parity: A1 = AP ^ A2 ^ A3, etc.
- A two-disk failure cannot be recovered
- Costs: slower writes, decreased effective capacity
Stripe layout (parity rotates across the disks):
  Storage 1  Storage 2  Storage 3  Storage 4
  A1         A2         A3         AP
  B1         B2         BP         B3
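The parity relationship above (AP = A1 ^ A2 ^ A3, and A1 = AP ^ A2 ^ A3 on failure) can be demonstrated directly, since xor is its own inverse. A minimal sketch, using 4-byte blocks purely for illustration:

```python
# Sketch of RAID 5 parity math: xor all blocks in a stripe to get parity,
# and xor parity with the survivors to recover any one lost block.

def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise xor of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)

a1 = b"\x01\x02\x03\x04"
a2 = b"\x10\x20\x30\x40"
a3 = b"\x0a\x0b\x0c\x0d"

ap = xor_blocks(a1, a2, a3)          # parity block: AP = A1 ^ A2 ^ A3
recovered = xor_blocks(ap, a2, a3)   # disk holding A1 fails: A1 = AP ^ A2 ^ A3
assert recovered == a1
```

This also shows why two failures are unrecoverable: with two unknowns in one xor equation, the survivors no longer determine the missing blocks.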

Degraded Mode in RAID 5
- When a disk fails, the array enters "degraded mode"
- Accesses to the failed disk are served by reading all the other disks and xor'ing them (slower performance)
- The failed disk must be replaced, and then "rebuilt": all other disks are read start-to-finish, and parity is calculated to recover the original data
- With many disks, reading everything takes a long time; "declustering" creates multiple parity domains to limit rebuild work
- Sometimes a "hot spare" disk is kept idle so it can quickly replace a failed device
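The rebuild process above can be sketched as a loop over stripes: for each stripe, read every surviving disk and xor the blocks to regenerate the lost one. A toy model, with each "disk" as a list of integer blocks (names and layout are assumptions for illustration):

```python
# Toy RAID 5 rebuild: reconstruct a failed disk, stripe by stripe,
# by xor'ing the corresponding block from every surviving disk.

def rebuild(disks, failed):
    """Return the contents of the failed disk, recovered from the survivors."""
    survivors = [d for i, d in enumerate(disks) if i != failed]
    rebuilt = []
    for stripe in range(len(survivors[0])):
        block = 0
        for disk in survivors:
            block ^= disk[stripe]   # same xor used for a degraded-mode read
        rebuilt.append(block)
    return rebuilt

data = [[1, 5], [2, 6], [3, 7]]                       # three data disks
parity = [d0 ^ d1 ^ d2 for d0, d1, d2 in zip(*data)]  # one parity disk
disks = data + [parity]
assert rebuild(disks, failed=1) == [2, 6]             # disk 1 recovered
```

Note that every surviving disk is read in full, which is why rebuilds get slow as arrays grow, and why declustering spreads stripes over multiple parity domains so each rebuild touches less data.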

Storage in the Network Prepare for lightning rounds of very high-level concepts!

Network-Attached Storage (NAS)
- Intuition: a server dedicated to serving files, i.e., a "file server"
- File-level abstraction: the NAS device owns the local RAID, file system, etc.
- Accessed via a network file protocol such as NFS (Network File System) or FTP
- Fixed functionality, using embedded systems with acceleration (hardware packet processing, etc.)
- Regular Linux servers can also be configured to act as NAS
- Each NAS node is a separate entity; a larger storage cluster needs additional management

Network-Attached Storage (NAS)
- Easy to scale and manage compared to direct-attached storage: buy a NAS box and plug it into an Ethernet port; need more storage? Plug more drives into the box
- Difficult to scale beyond the limits of a centralized single node: server performance, network performance

Storage-Area Networks (SAN)
- In the beginning: a separate network just for storage traffic
- Fibre Channel, etc., were first created because Ethernet was too slow; switches, hubs, and the usual network infrastructure
- Easier to scale and manage by adding storage to the network; performance is distributed across many storage devices
- Block-level access to individual storage nodes in the network
- Controversial opinion: the traditional separate SAN is dying out; Ethernet is unifying all networks in the datacenter, with 10 GbE and 40 GbE slowly subsuming Fibre Channel, InfiniBand, …

Converged Infrastructure
- Computation, memory, and storage converged into a single unit, which is then replicated
- Became easier to manage than separate storage domains as software improved (distributed file systems, MapReduce, etc.)
- Decreased complexity: when a node dies, simply replace the whole thing
- Cost-effective by using commercial off-the-shelf parts (PCs): economy of scale, no special equipment (e.g., a SAN)
Chris von Nieda, "How Does Google Work," 2010

Hyper-Converged Infrastructure
- Still (relatively) homogeneous units of compute, memory, and storage
- Each unit is virtualized and disaggregated via software; e.g., storage is accessed as a pool, as if on a SAN
- Each resource can be scaled independently: a cloud VM can be configured to access an arbitrary amount of virtual storage
- Example: VMware vSAN

Object Storage
- Instead of managing content-oblivious blocks, the file system manages objects with their own metadata
- Instead of directory/file hierarchies, each object is addressed via a global identifier
- Kind of like key-value stores; in fact, the difference is ill-defined (e.g., Lustre, the Ceph object store)
- An "Object Storage Device" is storage hardware that exposes an object interface
- Still mostly in the research phase
- High-level semantics of the stored data become available to the hardware controller for optimization
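The object interface described above can be sketched as a small put/get API: objects carry their own metadata and are addressed by a flat global identifier rather than a path. The class and method names below are hypothetical, not the Lustre or Ceph API.

```python
# Minimal sketch of an object-store interface (hypothetical names):
# no directory hierarchy, just a flat namespace of globally unique IDs,
# with per-object metadata stored alongside the data.
import uuid

class ObjectStore:
    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        oid = str(uuid.uuid4())            # global identifier, no path
        self._objects[oid] = (data, metadata)
        return oid

    def get(self, oid: str) -> bytes:
        return self._objects[oid][0]

    def stat(self, oid: str) -> dict:
        return self._objects[oid][1]

store = ObjectStore()
oid = store.put(b"hello", content_type="text/plain")
assert store.get(oid) == b"hello"
```

The metadata travelling with each object is what gives an Object Storage Device something to optimize on: the controller sees whole objects with semantics, not anonymous blocks.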

Research Object Storage Device: LightStore