
1 A BigData Tour – HDFS, Ceph and MapReduce These slides are possible thanks to the following sources: Jonathan Dursi – SciNet Toronto – Hadoop Tutorial; Amir Payberah – Course in Data Intensive Computing – SICS; Yahoo! Developer Network MapReduce Tutorial

2 EXTRA MATERIAL

3 CEPH – An HDFS replacement

4 What is Ceph? Ceph is a distributed, highly available, unified object, block, and file storage system with no single point of failure (SPOF), running on commodity hardware

5 Ceph Architecture – Host Level At the host level we have Object Storage Devices (OSDs) and Monitors. Monitors keep track of the components of the Ceph cluster (i.e. where the OSDs are); the device, host, rack, row, and room are stored by the Monitors and used to compute a failure domain. OSDs store the Ceph data objects. A host can run multiple OSDs, but it needs to be appropriately provisioned. http://konferenz-nz.dlr.de/pages/storage2014/present/2.%20Konferenztag/13_06_2014_06_Inktank.pdf
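To make the failure-domain idea concrete, here is a minimal Python sketch of such a hierarchy; the room/row/rack/host names and OSD layout are invented for illustration and do not reflect a real CRUSH map (which also carries weights and bucket types):

# A toy cluster map: the hierarchy the Monitors track (room > row > rack > host > OSD).
# All names here are hypothetical.
cluster_map = {
    "room-1": {
        "row-a": {
            "rack-1": {"host-1": ["osd.0", "osd.1"], "host-2": ["osd.2", "osd.3"]},
            "rack-2": {"host-3": ["osd.4", "osd.5"], "host-4": ["osd.6", "osd.7"]},
        }
    }
}

def osds_by_failure_domain(cmap, level="rack"):
    """Group OSDs by a chosen failure domain so replicas can avoid sharing it."""
    groups = {}
    for room, rows in cmap.items():
        for row, racks in rows.items():
            for rack, hosts in racks.items():
                for host, osds in hosts.items():
                    key = {"room": room, "row": row, "rack": rack, "host": host}[level]
                    groups.setdefault(key, []).extend(osds)
    return groups

print(osds_by_failure_domain(cluster_map, "rack"))
# {'rack-1': ['osd.0', 'osd.1', 'osd.2', 'osd.3'], 'rack-2': ['osd.4', 'osd.5', 'osd.6', 'osd.7']}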

6 Ceph Architecture – Block Level At the block device level, an Object Storage Device (OSD) can be an entire drive, a partition, or a folder. OSDs must be formatted as ext4, XFS, or btrfs (experimental). https://hkg15.pathable.com/static/attachments/112267/1423597913.pdf?1423597913

7 Ceph Architecture – Data Organization Level At the data organization level, data are partitioned into pools. Pools contain a number of Placement Groups (PGs). Ceph data objects map to PGs (via the hash of the object name, modulo the number of PGs), and PGs then map to multiple OSDs. https://hkg15.pathable.com/static/attachments/112267/1423597913.pdf?1423597913

8 Ceph Placement Groups Ceph shards a pool into placement groups that are distributed evenly and pseudo-randomly across the cluster. The CRUSH algorithm dynamically assigns each object to a placement group and then assigns each placement group to a set of OSDs, creating a layer of indirection between the Ceph client and the OSDs storing the copies of an object. This layer of indirection allows the Ceph storage cluster to re-balance dynamically when new Ceph OSDs come online or when Ceph OSDs fail. RedHat Ceph Architecture v1.2.3
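A minimal Python sketch of this two-step indirection (object name → PG, PG → set of OSDs). It uses MD5 and a toy round-robin pick purely for illustration; real Ceph uses the rjenkins hash and CRUSH, and the pool size, PG count, and replica count below are invented:

import hashlib

PG_NUM = 128                                  # PGs in a hypothetical pool
OSDS = [f"osd.{i}" for i in range(12)]        # toy cluster of 12 OSDs
REPLICAS = 3

def object_to_pg(obj_name: str) -> int:
    """Step 1: hash the object name into a placement group."""
    h = int(hashlib.md5(obj_name.encode()).hexdigest(), 16)
    return h % PG_NUM

def pg_to_osds(pg_id: int) -> list:
    """Step 2: deterministically pick REPLICAS OSDs for the PG (stand-in for CRUSH)."""
    h = int(hashlib.md5(f"pg-{pg_id}".encode()).hexdigest(), 16)
    start = h % len(OSDS)
    return [OSDS[(start + i) % len(OSDS)] for i in range(REPLICAS)]

pg = object_to_pg("my-object")
print(pg, pg_to_osds(pg))   # every client computes the same answer, no lookup needed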

9 Ceph Architecture – Overall View https://www.terena.org/activities/tf-storage/ws16/slides/140210-low_cost_storage_ceph-openstack_swift.pdf

10 Ceph Architecture – RADOS An application interacts with a RADOS cluster. RADOS (Reliable Autonomic Distributed Object Store) is a distributed object service that manages the distribution, replication, and migration of objects. On top of that reliable storage abstraction, Ceph builds a range of services, including a block storage abstraction (RBD, or RADOS Block Device) and a cache-coherent distributed file system (CephFS). http://konferenz-nz.dlr.de/pages/storage2014/present/2.%20Konferenztag/13_06_2014_06_Inktank.pdf

11 Ceph Architecture – RADOS Components http://konferenz-nz.dlr.de/pages/storage2014/present/2.%20Konferenztag/13_06_2014_06_Inktank.pdf

12 Ceph Architecture – Where Do Objects Live? http://konferenz-nz.dlr.de/pages/storage2014/present/2.%20Konferenztag/13_06_2014_06_Inktank.pdf

13 Ceph Architecture – Where Do Objects Live? Contact a Metadata server? http://konferenz-nz.dlr.de/pages/storage2014/present/2.%20Konferenztag/13_06_2014_06_Inktank.pdf

14 Ceph Architecture – Where Do Objects Live? Or calculate the placement via static mapping? http://konferenz-nz.dlr.de/pages/storage2014/present/2.%20Konferenztag/13_06_2014_06_Inktank.pdf

15 Ceph Architecture – CRUSH Maps http://konferenz-nz.dlr.de/pages/storage2014/present/2.%20Konferenztag/13_06_2014_06_Inktank.pdf

16 Ceph Architecture – CRUSH Maps Data objects are distributed across Object Storage Devices (OSDs), which refer to either physical or logical storage units, using CRUSH (Controlled Replication Under Scalable Hashing). CRUSH is a deterministic hashing function that allows administrators to define flexible placement policies over a hierarchical cluster structure (e.g., disks, hosts, racks, rows, datacenters). The location of an object can be calculated from the object identifier and the cluster layout (similar to consistent hashing), so there is no need for a metadata index or server in the RADOS object store. http://konferenz-nz.dlr.de/pages/storage2014/present/2.%20Konferenztag/13_06_2014_06_Inktank.pdf
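As an illustration of why no metadata server is needed, the following toy Python sketch computes placement as a pure function of the PG identifier and a hypothetical host/OSD layout, picking one OSD from each of three distinct hosts via deterministic "straw draws". This is a drastic simplification of CRUSH, with no weights, bucket types, or tunables:

import hashlib

# Hypothetical layout: hosts as failure domains, each with its OSDs.
LAYOUT = {
    "host-1": ["osd.0", "osd.1"],
    "host-2": ["osd.2", "osd.3"],
    "host-3": ["osd.4", "osd.5"],
    "host-4": ["osd.6", "osd.7"],
}

def draw(key: str, item: str) -> int:
    """Deterministic pseudo-random 'straw length' for a (key, item) pair."""
    return int(hashlib.sha256(f"{key}:{item}".encode()).hexdigest(), 16)

def crush_like_place(pg_id: int, replicas: int = 3) -> list:
    """Pick one OSD from each of `replicas` distinct hosts; longest straw wins."""
    hosts = sorted(LAYOUT, key=lambda h: draw(f"pg-{pg_id}", h), reverse=True)[:replicas]
    return [max(LAYOUT[h], key=lambda o: draw(f"pg-{pg_id}", o)) for h in hosts]

print(crush_like_place(42))   # same result on every client; changes only if LAYOUT changes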

17 Ceph Architecture – CRUSH – 1/2 http://konferenz-nz.dlr.de/pages/storage2014/present/2.%20Konferenztag/13_06_2014_06_Inktank.pdf

18 Ceph Architecture – CRUSH – 2/2 http://konferenz-nz.dlr.de/pages/storage2014/present/2.%20Konferenztag/13_06_2014_06_Inktank.pdf

19 Ceph Architecture – librados http://konferenz-nz.dlr.de/pages/storage2014/present/2.%20Konferenztag/13_06_2014_06_Inktank.pdf
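librados is exposed to applications through language bindings such as python-rados. A minimal sketch of storing and reading one object, assuming the binding is installed, the cluster is described by /etc/ceph/ceph.conf, and a pool named "mypool" exists (the pool and object names are placeholders):

import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")   # monitors and keys come from ceph.conf
cluster.connect()
try:
    ioctx = cluster.open_ioctx("mypool")                # I/O context bound to one pool
    try:
        ioctx.write_full("hello-object", b"Hello, RADOS")   # store an object
        print(ioctx.read("hello-object"))                   # read it back
    finally:
        ioctx.close()
finally:
    cluster.shutdown()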

20 Ceph Architecture – RADOS Gateway http://konferenz-nz.dlr.de/pages/storage2014/present/2.%20Konferenztag/13_06_2014_06_Inktank.pdf
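The RADOS Gateway exposes S3- and Swift-compatible REST APIs on top of RADOS, so standard object-storage clients can talk to it. A sketch using boto3 against a hypothetical gateway endpoint; the URL, credentials, and bucket name are placeholders:

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.example.com:7480",   # hypothetical RGW endpoint
    aws_access_key_id="ACCESS_KEY",               # keys created for an RGW user
    aws_secret_access_key="SECRET_KEY",
)

s3.create_bucket(Bucket="demo-bucket")
s3.put_object(Bucket="demo-bucket", Key="hello.txt", Body=b"stored via RGW")
obj = s3.get_object(Bucket="demo-bucket", Key="hello.txt")
print(obj["Body"].read())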

21 Ceph Architecture – RADOS Block Device (RBD) – 1/3 http://konferenz-nz.dlr.de/pages/storage2014/present/2.%20Konferenztag/13_06_2014_06_Inktank.pdf

22 Ceph Architecture – RADOS Block Device (RBD) – 2/3 Virtual machine storage using RBD; live migration using RBD http://konferenz-nz.dlr.de/pages/storage2014/present/2.%20Konferenztag/13_06_2014_06_Inktank.pdf
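RBD images can also be managed from Python through the python-rbd binding (VM stacks typically go through the QEMU/libvirt RBD driver instead). A minimal sketch assuming python-rados and python-rbd are installed and a pool named "rbd" exists; the image name and size are placeholders:

import rados
import rbd

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("rbd")                    # pool that will hold the image

rbd.RBD().create(ioctx, "vm-disk-1", 10 * 1024**3)   # 10 GiB thin-provisioned image
with rbd.Image(ioctx, "vm-disk-1") as image:
    image.write(b"boot sector bytes...", 0)          # write 20 bytes at offset 0
    print(image.read(0, 20))                         # read them back

ioctx.close()
cluster.shutdown()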

23 Ceph Architecture – RADOS Block Device (RBD) – 3/3 Direct host access from Linux http://konferenz-nz.dlr.de/pages/storage2014/present/2.%20Konferenztag/13_06_2014_06_Inktank.pdf

24 Ceph Architecture – CephFS – POSIX F/S http://konferenz-nz.dlr.de/pages/storage2014/present/2.%20Konferenztag/13_06_2014_06_Inktank.pdf

25 Ceph – Read/Write Flows https://software.intel.com/en-us/blogs/2015/04/06/ceph-erasure-coding-introduction

26 Ceph Replicated I/O RedHat Ceph Architecture v1.2.3

27 Ceph – Erasure Coding – 1/5 Erasure coding is a theory that dates back to the 1960s. The most famous algorithm is Reed-Solomon; many variations followed, such as Fountain Codes, Pyramid Codes, and Locally Repairable Codes. An erasure code usually defines the total number of disks (N) and the number of data disks (K), and it can tolerate N – K failures with a storage overhead of N/K. E.g., a typical Reed-Solomon scheme (8, 5), where 8 is the total number of disks and 5 is the number of data disks, can tolerate 3 arbitrary failures. If some data chunks are missing, the remaining available data can be used to restore the original content. https://software.intel.com/en-us/blogs/2015/04/06/ceph-erasure-coding-introduction
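A quick check of the overhead arithmetic from the slide, comparing the RS(8, 5) scheme and the (K=3, M=2) profile used in the next slides against plain 3-way replication:

def ec_summary(n: int, k: int) -> str:
    """Fault tolerance and raw-to-usable overhead for an (N, K) erasure code."""
    return f"RS({n},{k}): tolerates {n - k} failures, overhead {n / k:.2f}x"

print(ec_summary(8, 5))   # RS(8,5): tolerates 3 failures, overhead 1.60x
print(ec_summary(5, 3))   # the K=3, M=2 profile: tolerates 2 failures, overhead 1.67x
print("3x replication: tolerates 2 failures, overhead 3.00x")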

28 Ceph – Erasure Coding – 2/5 As with replicated pools, in an erasure-coded pool the primary OSD in the up set receives all write operations. In replicated pools, Ceph makes a deep copy of each object in the placement group on the secondary OSD(s) in the set. For erasure coding, the process is a bit different: an erasure-coded pool stores each object as K+M chunks, divided into K data chunks and M coding chunks. The pool is configured with a size of K+M so that each chunk is stored on an OSD in the acting set; the rank of the chunk is stored as an attribute of the object. The primary OSD is responsible for encoding the payload into K+M chunks and sending them to the other OSDs, and also for maintaining an authoritative version of the placement group logs. https://software.intel.com/en-us/blogs/2015/04/06/ceph-erasure-coding-introduction

29 Ceph – Erasure Coding – 3/5 5 OSDs (K+M = 5); sustains the loss of any 2 (M = 2). The object NYAN with data “ABCDEFGHI” is split into K = 3 data chunks (padded if the length is not a multiple of K); the coding chunks are YXY and QGC. RedHat Ceph Architecture v1.2.3
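The following sketch shows only the chunking step of this example: splitting the payload into K equal data chunks with padding. The M coding chunks (YXY and QGC in the figure) would be produced by the erasure-coding function, e.g. Reed-Solomon, which is not implemented here:

def split_into_data_chunks(payload: bytes, k: int) -> list:
    """Pad the payload to a multiple of k, then cut it into k equal data chunks."""
    chunk_len = -(-len(payload) // k)             # ceiling division
    padded = payload.ljust(k * chunk_len, b"\0")  # zero padding, as a stand-in
    return [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]

print(split_into_data_chunks(b"ABCDEFGHI", 3))    # [b'ABC', b'DEF', b'GHI']
# chunks 4 and 5 (the M coding chunks) come from the encoding function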

30 Ceph – Erasure Coding – 4/5 On reading the object NYAN from an erasure-coded pool, the decoding function retrieves K = 3 of the chunks; even if two chunks are missing (i.e. erasures are present), the decoding function can still reconstruct the original content. RedHat Ceph Architecture v1.2.3

31 Ceph – Erasure Coding – 5/5 Summary figure: 5 OSDs (K+M = 5) holding the object NYAN as K = 3 data chunks plus M = 2 coding chunks (YXY and QGC). RedHat Ceph Architecture v1.2.3

