A BigData Tour – HDFS, Ceph and MapReduce
These slides are possible thanks to these sources: Jonathan Dursi – SciNet Toronto – Hadoop Tutorial; Amir Payberah – Course in Data Intensive Computing – SICS; Yahoo! Developer Network MapReduce Tutorial

EXTRA MATERIAL

CEPH – An HDFS replacement

What is Ceph? Ceph is a distributed, highly available, unified object, block, and file storage system with no single point of failure (SPOF), running on commodity hardware.

Ceph Architecture – Host Level
At the host level we have Object Storage Devices (OSDs) and Monitors.
Monitors keep track of the components of the Ceph cluster (i.e. where the OSDs are).
The device, host, rack, row, and room are stored by the Monitors and used to compute a failure domain.
OSDs store the Ceph data objects.
A host can run multiple OSDs, but it needs to be appropriately provisioned.

Ceph Architecture – Block Level
At the block device level, an Object Storage Device (OSD) can be an entire drive, a partition, or a folder.
OSDs must be formatted with ext4, XFS, or btrfs (experimental).

Ceph Architecture – Data Organization Level
At the data organization level, data are partitioned into pools.
Pools contain a number of Placement Groups (PGs).
Ceph data objects map to PGs via a hash of the object name taken modulo the number of PGs in the pool (a simple sketch of this step follows below).
PGs then map to multiple OSDs.
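A minimal sketch of the object-to-PG step, under the assumption that hash-then-modulo is all that matters here (the real Ceph code uses the rjenkins hash and a "stable mod", and the PG count is a per-pool setting):

```python
import zlib

def object_to_pg(pool_id: int, object_name: str, pg_num: int) -> str:
    # Hash the object name and reduce it modulo the pool's PG count.
    # Illustrative only: Ceph uses rjenkins hashing and a "stable mod".
    pg = zlib.crc32(object_name.encode()) % pg_num
    return f"{pool_id}.{pg:x}"          # PG ids are written as <pool>.<pg in hex>

print(object_to_pg(pool_id=1, object_name="NYAN", pg_num=128))  # prints something like "1.5f"
```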

Ceph Placement Groups
Ceph shards a pool into placement groups that are distributed evenly and pseudo-randomly across the cluster.
The CRUSH algorithm assigns each object to a placement group, and assigns each placement group to a set of OSDs, creating a layer of indirection between the Ceph client and the OSDs storing the copies of an object.
This layer of indirection allows the Ceph storage cluster to re-balance dynamically when new Ceph OSDs come online or when Ceph OSDs fail.
RedHat Ceph Architecture v1.2.3

Ceph Architecture – Overall View
Source: …vities/tf-storage/ws16/slides/low_cost_storage_ceph-openstack_swift.pdf

Ceph Architecture – RADOS
An application interacts with a RADOS cluster.
RADOS (Reliable Autonomic Distributed Object Store) is a distributed object service that manages the distribution, replication, and migration of objects.
On top of that reliable storage abstraction, Ceph builds a range of services, including a block storage abstraction (RBD, or RADOS Block Device) and a cache-coherent distributed file system (CephFS).
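To make the object-service idea concrete, here is a minimal sketch using the Python librados bindings (assumptions: the python3-rados package is installed, /etc/ceph/ceph.conf and a keyring are in place, and a pool named "data" already exists):

```python
import rados

# Connect to the cluster using the local Ceph configuration (assumed path).
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

try:
    ioctx = cluster.open_ioctx("data")      # an I/O context is bound to one pool
    try:
        ioctx.write_full("hello-object", b"Hello, RADOS")   # store an object
        print(ioctx.read("hello-object"))                   # read it back
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```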

Ceph Architecture – RADOS Components

Ceph Architecture – Where Do Objects Live?

Ceph Architecture – Where Do Objects Live? Contact a Metadata server?

Ceph Architecture – Where Do Objects Live? Or calculate the placement via static mapping?

Ceph Architecture – CRUSH Maps

Ceph Architecture – CRUSH Maps
Data objects are distributed across Object Storage Devices (OSDs), which refer to either physical or logical storage units, using CRUSH (Controlled Replication Under Scalable Hashing).
CRUSH is a deterministic hashing function that allows administrators to define flexible placement policies over a hierarchical cluster structure (e.g. disks, hosts, racks, rows, datacenters).
The location of an object can be calculated from the object identifier and the cluster layout (similar to consistent hashing), so the RADOS object store needs no metadata index or lookup server.
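The following toy stand-in (not the real CRUSH algorithm; the OSD list, host names, and replica count are assumptions for illustration) shows the key property: given the same placement-group id and the same cluster map, every client computes the same set of OSDs, here via rendezvous (highest-random-weight) hashing with a host-level failure domain:

```python
import hashlib

OSDS = [  # (osd id, host) -- a hypothetical tiny cluster map
    ("osd.0", "host-a"), ("osd.1", "host-a"),
    ("osd.2", "host-b"), ("osd.3", "host-b"),
    ("osd.4", "host-c"), ("osd.5", "host-c"),
]

def score(pg_id: str, osd_id: str) -> int:
    # Deterministic pseudo-random weight for the (PG, OSD) pair.
    digest = hashlib.md5(f"{pg_id}/{osd_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def place_pg(pg_id: str, replicas: int = 3) -> list[str]:
    chosen, used_hosts = [], set()
    for osd_id, host in sorted(OSDS, key=lambda o: score(pg_id, o[0]), reverse=True):
        if host not in used_hosts:          # keep replicas on distinct hosts
            chosen.append(osd_id)
            used_hosts.add(host)
        if len(chosen) == replicas:
            break
    return chosen

print(place_pg("1.2a"))   # same input and cluster map -> same placement, on every client
```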

Ceph Architecture – CRUSH – 1/2

Ceph Architecture – CRUSH – 2/2

Ceph Architecture – librados
Source: …nz.dlr.de/pages/storage2014/present/2.%20Konferenztag/13_06_2014_06_Inktank.pdf

Ceph Architecture – RADOS Gateway
Source: …nz.dlr.de/pages/storage2014/present/2.%20Konferenztag/13_06_2014_06_Inktank.pdf

Ceph Architecture – RADOS Block Device (RBD) – 1/3

Ceph Architecture – RADOS Block Device (RBD) – 2/3 Virtual machine storage using RBD; live migration using RBD

Ceph Architecture – RADOS Block Device (RBD) – 3/3 Direct host access from Linux
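For reference, a minimal sketch of programmatic block access through the Python RBD bindings (assumptions: python3-rbd and python3-rados are installed, the cluster is reachable, and a pool named "rbd" exists; error handling is omitted):

```python
import rados
import rbd

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("rbd")                     # assumed pool name

try:
    rbd.RBD().create(ioctx, "demo-image", 1024 ** 3)  # create a 1 GiB image
    image = rbd.Image(ioctx, "demo-image")
    try:
        image.write(b"hello block device", 0)         # write at offset 0
        print(image.read(0, 18))                      # read the bytes back
    finally:
        image.close()
finally:
    ioctx.close()
    cluster.shutdown()
```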

Ceph Architecture – CephFS – POSIX F/S

Ceph – Read/Write Flows
Source: …m/en-us/blogs/2015/04/06/ceph-erasure-coding-introduction

Ceph Replicated I/O RedHat Ceph Architecture v1.2.3

Ceph – Erasure Coding – 1/5
Erasure coding is a theory that started in the 1960s; the most famous algorithm is Reed-Solomon. Many variations came out later, such as Fountain Codes, Pyramid Codes, and Locally Repairable Codes.
An erasure code is usually defined by the total number of disks (N) and the number of data disks (K); it can tolerate N – K failures with a storage overhead of N/K.
E.g. a typical Reed-Solomon scheme is RS(8, 5), where 8 is the total number of disks and 5 is the number of data disks. RS(8, 5) can tolerate 3 arbitrary failures: if some data chunks are missing, the remaining available chunks can be used to restore the original content.
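A quick way to compare the RS(8, 5) example above with plain replication, using only the N/K overhead formula from this slide (the second line is simply 3-way replication expressed in the same N/K notation):

```python
def ec_profile(n_total: int, k_data: int) -> tuple[float, int]:
    overhead = n_total / k_data              # raw bytes stored per user byte
    tolerated_failures = n_total - k_data    # N - K arbitrary failures survived
    return overhead, tolerated_failures

print(ec_profile(8, 5))   # RS(8, 5): (1.6, 3) -> 1.6x overhead, survives 3 failures
print(ec_profile(3, 1))   # 3-way replication: (3.0, 2) -> 3x overhead, survives 2 failures
```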

Ceph – Erasure Coding – 2/5
Like replicated pools, in an erasure-coded pool the primary OSD in the up set receives all write operations.
In replicated pools, Ceph makes a deep copy of each object in the placement group on the secondary OSD(s) in the set.
For erasure coding, the process is a bit different: an erasure-coded pool stores each object as K+M chunks, divided into K data chunks and M coding chunks. The pool is configured to have a size of K+M so that each chunk is stored on an OSD in the acting set, and the rank of the chunk is stored as an attribute of the object.
The primary OSD is responsible for encoding the payload into K+M chunks and sending them to the other OSDs. It is also responsible for maintaining an authoritative version of the placement group logs.

Ceph – Erasure Coding – 3/5
5 OSDs (K+M = 5); can sustain the loss of 2 of them (M = 2).
Object NYAN with data "ABCDEFGHI" is split into K = 3 data chunks; the content is padded if its length is not a multiple of K.
The two coding chunks are YXY and QGC.
RedHat Ceph Architecture v1.2.3
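A minimal sketch (not Ceph code) of the splitting step described above: the payload is padded so its length is a multiple of K and then cut into K equal data chunks.

```python
def split_into_chunks(payload: bytes, k: int, pad: bytes = b"\0") -> list[bytes]:
    if len(payload) % k:                         # pad up to a multiple of K
        payload += pad * (k - len(payload) % k)
    size = len(payload) // k
    return [payload[i * size:(i + 1) * size] for i in range(k)]

print(split_into_chunks(b"ABCDEFGHI", 3))        # [b'ABC', b'DEF', b'GHI']
```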

Ceph – Erasure Coding – 4/5
On reading object NYAN from an erasure-coded pool, the decoding function retrieves chunks 1, 2, 3 and 4.
If any two chunks are missing (i.e. an erasure is present), the decoding function can reconstruct them from the remaining chunks.
RedHat Ceph Architecture v1.2.3
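As a toy illustration of reconstruction (an assumption-heavy simplification: a single XOR parity chunk, i.e. M = 1, rather than Ceph's Reed-Solomon coding with M = 2), a missing data chunk can be rebuilt from the surviving chunks:

```python
def xor_chunks(chunks: list[bytes]) -> bytes:
    out = bytes(len(chunks[0]))
    for c in chunks:
        out = bytes(a ^ b for a, b in zip(out, c))
    return out

data = [b"ABC", b"DEF", b"GHI"]          # the K = 3 data chunks from the NYAN example
parity = xor_chunks(data)                # one coding chunk (M = 1 in this toy)

lost = data[1]                           # pretend chunk 2 (b"DEF") is an erasure
recovered = xor_chunks([data[0], data[2], parity])
assert recovered == lost                 # the missing chunk is reconstructed
```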
