236601 - Coding and Algorithms for Memories, Lecture 14

Coding and Algorithms for Memories, Lecture 14

Large Scale Storage Systems
Big Data players: Facebook, Amazon, Google, Yahoo, …
[Figure: cluster of machines running Hadoop at Yahoo! (Source: Yahoo!)]
Failures are the norm

Problem Setup
Disks are stored together in a group (rack)
Disk failures should be supported
Requirements:
– Support as many disk failures as possible
– And yet…
– Optimal and fast recovery
– Low complexity
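To make the recovery trade-off concrete, here is a minimal sketch (not taken from the lecture; names and values are illustrative) of the simplest redundancy scheme, a single XOR parity across a group of disks: it tolerates exactly one disk failure, but rebuilding that disk requires reading every surviving disk in the group.

    def xor_blocks(blocks):
        """XOR a list of equal-length byte blocks."""
        out = bytearray(len(blocks[0]))
        for blk in blocks:
            for i, b in enumerate(blk):
                out[i] ^= b
        return bytes(out)

    data_disks = [b"\x01\x02", b"\x10\x20", b"\x0a\x0b"]
    parity = xor_blocks(data_disks)        # stored on a dedicated parity disk

    # Disk 1 fails: rebuild it from the parity disk and all other data disks.
    rebuilt = xor_blocks([data_disks[0], data_disks[2], parity])
    assert rebuilt == data_disks[1]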

Reed-Solomon Codes
Advantages:
– Support the maximum number of disk failures
– Are very common in practice and have relatively efficient encoding/decoding schemes
Disadvantages:
– Require working over large fields
   Solution: EVENODD codes
– Need to read all the surviving disks in order to recover even a single disk failure – not an efficient rebuild
   Solution: ZigZag codes
   Solution: Locally Recoverable Codes (LRC)
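As a toy illustration of the rebuild problem (not the lecture's construction; the field size, code parameters, and data values are arbitrary choices for the example), the sketch below encodes a stripe with a small [n = 6, k = 4] Reed-Solomon code over the prime field GF(257). Each disk stores one evaluation of the data polynomial, so repairing a single failed disk still requires reading k = 4 of the surviving disks.

    p = 257          # a prime larger than n, so the n evaluation points are distinct
    n, k = 6, 4      # 6 disks per stripe, 4 of them worth of data

    def poly_eval(coeffs, x):
        """Evaluate the polynomial with the given coefficients at x, mod p."""
        y = 0
        for c in reversed(coeffs):
            y = (y * x + c) % p
        return y

    def encode(data):
        """Map k data symbols to n coded symbols, one per disk."""
        assert len(data) == k
        return [poly_eval(data, x) for x in range(n)]

    def interpolate_at(points, x_target):
        """Lagrange-interpolate the degree-<k polynomial through `points`
        and evaluate it at x_target (all arithmetic mod p)."""
        total = 0
        for xi, yi in points:
            num, den = 1, 1
            for xj, _ in points:
                if xj != xi:
                    num = num * (x_target - xj) % p
                    den = den * (xi - xj) % p
            total = (total + yi * num * pow(den, p - 2, p)) % p
        return total

    data = [10, 20, 30, 40]          # one stripe of data symbols
    disks = encode(data)             # what each of the 6 disks stores

    # Disk 2 fails: rebuilding it needs k = 4 of the 5 surviving disks.
    survivors = [(x, disks[x]) for x in (0, 1, 3, 4)]
    assert interpolate_at(survivors, 2) == disks[2]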

Locally Recoverable Codes (LRC)
[Figure: layout of an LRC codeword – symbols 1, 2, …, k, k+1, …, n arranged in local repair groups of size r]

Locally Recoverable Codes (LRC)
A locally recoverable code (n, k, r) is a code of length n, dimension k, and locality r
The problem: given n, k, r, what is the best minimum distance d of the code?
A code achieving the maximum d is called an optimal LRC code
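For reference (not stated on this slide), the question is answered by a Singleton-type bound: an (n, k, r) LRC satisfies d <= n - k - ceil(k/r) + 2, and optimal LRCs meet this bound with equality. The tiny sketch below only evaluates the bound; the function name and the parameter choice (n, k, r) = (16, 12, 6) are illustrative, in the spirit of codes used in cloud storage.

    import math

    def lrc_distance_bound(n, k, r):
        """Upper bound on the minimum distance of an (n, k, r) locally
        recoverable code: d <= n - k - ceil(k / r) + 2."""
        return n - k - math.ceil(k / r) + 2

    # Illustrative parameters: 12 data symbols, 4 redundant symbols, locality 6.
    print(lrc_distance_bound(16, 12, 6))   # prints 4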