Crossing the Chasm: Sneaking a Parallel File System into Hadoop
Wittawat Tantisiriroj, Swapnil Patil, Garth Gibson
Parallel Data Laboratory, Carnegie Mellon University


In this work …
- Compare and contrast large storage system architectures: Internet services vs. high performance computing
- Can we use a parallel file system for Internet service applications?
  - Hadoop, an Internet service software stack
  - HDFS, an Internet service file system for Hadoop
  - PVFS, a parallel file system

Wittawat Tantisiriroj © February 2009

Today’s Internet services
- Applications are becoming data-intensive
  - Large input data set (e.g., the entire web)
  - Distributed, parallel application execution
- Distributed file system is a key component
  - Defines new semantics for anticipated workloads
    - Atomic append in Google FS
    - Write-once in HDFS
  - Commodity hardware and network
    - Handles failures through replication

The HPC world
- Equally large applications
  - Large input data set (e.g., astronomy data)
  - Parallel execution on large clusters
- Use parallel file systems for scalable I/O
  - e.g., IBM’s GPFS, Sun’s Lustre, PanFS, and the Parallel Virtual File System (PVFS)

Why use parallel file systems?
- Handle a wide variety of workloads
  - Highly concurrent reads and writes
  - Small-file support, scalable metadata
- Offer performance vs. reliability tradeoffs
  - RAID-5 (e.g., PanFS)
  - Mirroring
  - Failover (e.g., Lustre)
- Standard Unix FS interface & POSIX semantics
  - pNFS standard (NFS v4.1)

Outline
► A basic shim layer & preliminary evaluation
  Three add-on features in a shim layer
  Evaluation

HDFS & PVFS: high-level design
- Metadata servers
  - Store all file system metadata
  - Handle all metadata operations
- Data servers
  - Store actual file data
  - Handle all read and write operations
- Files are divided into chunks
  - Chunks of a file are distributed across servers

PVFS shim layer under Hadoop
[diagram]

Preliminary evaluation
- Text search (“grep”), a common workload in Internet service applications
  - Search for a rare pattern in 100-byte records
  - 64 GB data set
- 32 nodes
  - Each node serves as both a storage and a compute node

Vanilla PVFS is disappointing …
[chart: unmodified PVFS is 2.5 times slower]

Outline
  A basic shim layer & preliminary evaluation
► Three add-on features in a shim layer
  - Readahead buffer
  - File layout information
  - Replication
  Evaluation

Read operation in Hadoop
- Typical read workload:
  - Small (less than 128 KB)
  - Sequential through an entire chunk
- HDFS prefetches an entire chunk
  - No cache coherence issue with its write-once semantics

Readahead buffer
- PVFS has no client buffer cache
  - Avoids cache coherence issues with concurrent writes
- A readahead buffer can be added to the PVFS shim layer
  - In Hadoop, a file becomes immutable after it is closed
  - No need for a cache coherence mechanism
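To make the idea concrete, here is a minimal sketch of such a readahead buffer in Python (illustrative only, not the shim’s actual code; `raw_read` stands in for a PVFS read call): small sequential reads are served out of one large buffered fetch, which is safe because a Hadoop file is immutable once closed.

```python
class ReadaheadReader:
    """Sketch of a shim-level readahead buffer (hypothetical names).

    Hadoop reads are small and sequential, so each small read is
    served from a large buffer fetched in one request to the file
    system. No coherence logic is needed: the file cannot change."""

    def __init__(self, raw_read, buffer_size=4 * 1024 * 1024):
        self.raw_read = raw_read      # raw_read(offset, length) -> bytes
        self.buffer_size = buffer_size
        self.buf = b""
        self.buf_offset = 0           # file offset of buf[0]

    def read(self, offset, length):
        # On a miss (seek outside the buffer, or past its end), refill
        # the buffer with one large request starting at `offset`.
        if offset < self.buf_offset or offset + length > self.buf_offset + len(self.buf):
            self.buf = self.raw_read(offset, self.buffer_size)
            self.buf_offset = offset
        start = offset - self.buf_offset
        return self.buf[start:start + length]
```

With a 64 KB buffer, 32 sequential 4 KB reads turn into just two large fetches from the underlying file system.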

PVFS with a 4 MB readahead buffer
[chart: still quite slow]


Collocation in Hadoop
- File layout information
  - Describes where chunks are located
- Collocate computation and data
  - Ship computation to where the data is located
  - Reduces network traffic

Hadoop without collocation
[diagram: chunks 1, 2, and 3 stored on nodes B, C, and A; computation placed without regard to data location requires 3 data transfers over the network]

Hadoop with collocation
[diagram: computation for each chunk is shipped to the node storing it; no data transfer over the network]

Expose file layout information
- File layout information in PVFS
  - Stored as extended attributes
  - Different format from Hadoop’s
- The shim layer converts file layout information from the PVFS format to the Hadoop format
  - Enables Hadoop to collocate computation and data
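As a sketch of that conversion, a round-robin PVFS stripe layout can be turned into the per-chunk host list that Hadoop’s `getFileBlockLocations()` expects (illustrative Python; `stripe_hosts` and its parameters are invented names, and the real shim reads the layout from PVFS extended attributes):

```python
def stripe_hosts(file_size, stripe_size, servers):
    """Map each fixed-size chunk of a round-robin striped file to the
    server that stores it, one host per chunk, in the shape Hadoop
    uses for collocation decisions. Illustrative only."""
    n_chunks = (file_size + stripe_size - 1) // stripe_size  # ceil division
    return [servers[i % len(servers)] for i in range(n_chunks)]
```

Given this list, Hadoop’s scheduler can place each map task on the node that already holds the task’s chunk.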

PVFS with file layout information
[chart: performance comparable to HDFS]


Replication in HDFS
- Rack-aware replication
- By default, 3 copies of each file (triplication):
  1. Write to the local storage node
  2. Write to a storage node in the local rack
  3. Write to a storage node in another rack
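The three-step placement above can be sketched as follows (illustrative Python; the `racks` map and function name are invented, and ties are broken arbitrarily rather than by HDFS’s real chooser):

```python
def place_replicas(local, racks):
    """Rack-aware 3-copy placement as described on the slide:
    the local node, another node in the local rack, then a node in a
    different rack. `racks` maps rack id -> list of node names."""
    local_rack = next(r for r, nodes in racks.items() if local in nodes)
    same_rack = next(n for n in racks[local_rack] if n != local)
    other_rack = next(r for r in racks if r != local_rack)
    return [local, same_rack, racks[other_rack][0]]
```

This keeps one copy rack-local for write speed while surviving the loss of an entire rack.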

Replication in PVFS
- No replication in the public release of PVFS
  - Relies on hardware-based reliability solutions
  - Per-server RAID inside logical storage devices
- Replication can be added in the shim layer
  - Write each file to three servers
  - No reconstruction/recovery in the prototype
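A minimal sketch of such shim-level triplication (illustrative Python; all names are invented, and, as in the prototype, there is no reconstruction or recovery path if a server later fails):

```python
def replicated_write(chunk_id, data, servers, write_fn, n_copies=3):
    """Write the same chunk to three distinct servers, chosen
    round-robin from the server list starting at the chunk's index.
    Returns the servers written to. Illustrative only."""
    n = len(servers)
    targets = [servers[(chunk_id + i) % n] for i in range(n_copies)]
    for s in targets:
        write_fn(s, chunk_id, data)  # one write per replica
    return targets
```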

PVFS with replication
[chart]

PVFS shim layer under Hadoop
[diagram]

Outline
  A basic shim layer & preliminary evaluation
  Three add-on features in a shim layer
► Evaluation
  - Micro-benchmark (non-MapReduce)
  - MapReduce benchmark

Micro-benchmark
- Cluster configuration
  - 16 nodes: Pentium D dual-core 3.0 GHz, 4 GB memory
  - One 7200 rpm SATA 160 GB disk (8 MB buffer)
  - Gigabit Ethernet
- Uses the file system API directly, without Hadoop involvement

N clients, each reading 1/N of a single file
[chart] Round-robin file layout in PVFS helps avoid contention

Why is PVFS better in this case?
- Without scheduling, clients read in a uniform pattern:
  - Client 1 reads A1 then A4
  - Client 2 reads A2 then A5
  - Client 3 reads A3 then A6
- PVFS: round-robin placement spreads each step’s reads across distinct servers
- HDFS: random placement can put two concurrently read chunks on the same server, causing contention
[diagram: chunk-to-server maps for round-robin vs. random placement]
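The contention argument can be checked with a small model (illustrative Python, not from the talk): count, at each read step, how many clients land on the same server under a given placement.

```python
def max_contention(schedule, placement):
    """Worst-case number of clients hitting the same data server at
    any read step. schedule: one chunk list per client, read left to
    right; placement: chunk name -> server index. Illustrative only."""
    worst = 0
    for step in zip(*schedule):  # chunks read simultaneously
        servers = [placement[c] for c in step]
        worst = max(worst, max(servers.count(s) for s in servers))
    return worst
```

With the uniform schedule from the slide, round-robin placement keeps every step contention-free, while a skewed (random-style) placement can make two clients collide on one server.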

HDFS with Hadoop’s scheduling
- Example 1: Client 1 reads A1 then A4; Client 2 reads A2 then A5; Client 3 reads A6 then A3
- Example 2: Client 1 reads A1 then A3; Client 2 reads A2 then A5; Client 3 reads A4 then A6
[diagram: chunk placements for both examples]

Read with Hadoop’s scheduling
[chart] Hadoop’s scheduling can mask a problem with a non-uniform file layout in HDFS

N clients writing to N distinct files
[chart] By writing one of the three copies locally, HDFS write throughput grows linearly

Concurrent writes to a single file
[chart] Because PVFS allows concurrent writes, “copy” completes faster by using multiple writers


MapReduce benchmark setting
- Yahoo! M45 cluster
  - Xeon quad-core 1.86 GHz nodes with 6 GB memory
  - One 7200 rpm SATA 750 GB disk (8 MB buffer)
  - Gigabit Ethernet
- Uses the Hadoop framework for MapReduce processing

MapReduce benchmarks
- Grep: search for a rare pattern in a hundred million 100-byte records (100 GB)
- Sort: sort a hundred million 100-byte records (100 GB)
- Never-Ending Language Learning (NELL) (J. Betteridge, CMU): count the occurrences of selected phrases in a 37 GB data set

Read-intensive benchmarks
[chart] PVFS’s performance is similar to HDFS’s

Write-intensive benchmark
[chart] By writing one of the three copies locally, HDFS does better than PVFS

Summary
- PVFS can be tuned to deliver promising performance for Hadoop applications
  - Simple shim layer in Hadoop; no modification to PVFS
- PVFS can expose file layout information
  - Enables Hadoop to collocate computation and data
- Hadoop applications can benefit from the concurrent-write support of parallel file systems

Acknowledgements
- Sam Lang and Rob Ross for help with PVFS internals
- Yahoo! for the M45 cluster
- Julio Lopez for help with M45 and Hadoop
- Justin Betteridge, Kevin Gimpel, Le Zhao, Jamie Callan, Shay Cohen, Noah Smith, U Kang, and Christos Faloutsos for their scientific applications