A Comparison of Join Algorithms for Log Processing in MapReduce. Spyros Blanas, Jignesh M. Patel (University of Wisconsin-Madison); Eugene J. Shekita, Yuanyuan Tian (IBM Almaden Research Center).

Presentation transcript:

A Comparison of Join Algorithms for Log Processing in MapReduce
Spyros Blanas, Jignesh M. Patel (University of Wisconsin-Madison); Eugene J. Shekita, Yuanyuan Tian (IBM Almaden Research Center)
SIGMOD 2010. Presented by Hyojin Song, August 1, 2010.

Contents
• Introduction
• Join Algorithms in MapReduce
• Experimental Evaluation
• Discussion
• Conclusion

Introduction (1/3)
• Log Processing
– an important type of data analysis commonly done with MapReduce
– a log of events: a click-stream, a log of phone call records, a sequence of transactions
– processed to compute various statistics for business insight: filtered, aggregated, mined for patterns
– often needs to be joined with reference data, e.g. log data joined with user information

[Figure: an example Log table of call records (call time, phone number) joined with a Reference table mapping each phone number to a user name.]

Introduction (2/3)
• MapReduce Framework
– used to analyze large volumes of data
– reasons for MapReduce's success: a simple programming framework that manages parallelization, fault tolerance, and load balancing
– common criticisms of MapReduce: lack of a schema, lack of a declarative query language, lack of indexes
– joins are difficult: MapReduce was not originally designed to combine information from several data sources, so applications resort to simple but inefficient join algorithms

Introduction (3/3)
• The benefits of MapReduce for log processing
– scalability: China Mobile gathers 5-8TB of phone call records per day; Facebook collects almost 6TB of new log data every day, 1.7PB in total
– schema-free: flexibility, since the format of a log record may change over time
– simple scans are preferable (vs. index scans)
– long, time-consuming jobs: graceful fault-tolerance support (vs. a parallel RDBMS)

• The goal of this paper
– implement several well-known join strategies in MapReduce
– run comprehensive experiments comparing these join techniques

Contents
• Introduction
• Join Algorithms in MapReduce
– Problem Statement
– 1. Repartition Join
– 2. Improved Repartition Join
– 3. Directed Join
– 4. Broadcast Join
– 5. Semi-Join
– 6. Per-Split Semi-Join
• Experimental Evaluation
• Discussion
• Conclusion

Join Algorithms in MR: Problem Statement
• An equi-join between a log table L and a reference table R on a single column, with |L| >> |R|
• The paper further improves join performance with preprocessing techniques
– well-known in the RDBMS literature
– adapting them to MapReduce is not always straightforward
– crucial implementation details of these join algorithms are covered
• Two additional functions are implemented: init() and close()
– called before and after each map or reduce task
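The init()/close() hooks can be pictured as a small wrapper around the per-record map() calls. A minimal sketch in plain Python, not Hadoop's actual API; the class and function names here are illustrative:

```python
# Sketch of the init()/close() extension: a task runner calls init() once
# before the first record and close() once after the last, letting close()
# emit any records the task buffered (as several of the joins require).
class Mapper:
    def init(self):
        pass                     # e.g. load R into a hash table

    def map(self, key, value):
        raise NotImplementedError

    def close(self):
        return []                # may emit final output records

def run_map_task(mapper, records):
    out = []
    mapper.init()
    for key, value in records:
        out.extend(mapper.map(key, value))
    out.extend(mapper.close())
    return out

# Example: a mapper that passes records through and reports a count on close.
class CountingMapper(Mapper):
    def init(self):
        self.seen = 0

    def map(self, key, value):
        self.seen += 1
        return [(key, value)]

    def close(self):
        return [("_count", self.seen)]
```

The sketch only shows where init() and close() sit relative to the map() calls; the same structure applies to reduce tasks.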

Join Algorithms in MR: 1. Repartition Join
• The most commonly used join strategy in the MapReduce framework
– L and R are dynamically partitioned on the join key
– the corresponding pairs of partitions are joined
– similar to a partitioned sort-merge join in a parallel RDBMS

• Example tables (Log table and User table)
– Log table: 500,000 records; each record holds a student ID plus a lecture name and grade (e.g. DB B, KRR A, ML C)
– User table: 10,000 records mapping student IDs to names
– the join key is the student ID

Join Algorithms in MR: 1. Repartition Join

[Figure: map phase of the standard repartition join. Map tasks read splits of R or L from the DFS, tag each record with its source table (R: or L:), and write the tagged intermediate records to local disk for the reduce phase.]

Join Algorithms in MR: 1. Repartition Join

[Figure: reduce phase of the standard repartition join. For each join key, both the R records (B_R) and the L records (B_L) are buffered before the joined output, e.g. (An In Seok, KRR A), is written to the output file on the DFS.]

Join Algorithms in MR: 1. Repartition Join
• Standard repartition join
– potential problem: all records for a join key have to be buffered
– the buffer may not fit in memory when the data is highly skewed or the key cardinality is small
– variants of the standard repartition join are used in Pig, Hive, and Jaql today; they all suffer from the buffering problem

• Improved repartition join
– the map output key is changed to a composite of the join key and a table tag
– the partitioning and grouping functions are customized
– records from the smaller table R are buffered while L records are streamed to generate the join output
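The composite-key trick can be simulated on in-memory lists. A sketch in plain Python, where a single sort stands in for MapReduce's shuffle and the tagging scheme (0 for R, 1 for L) is illustrative:

```python
# Sketch of the improved repartition join. The map output key is the
# composite (join_key, tag), tag 0 for R and tag 1 for L, so within each
# join key all R records sort ahead of the L records. Partitioning and
# grouping use the join key alone; here a single sort stands in for the
# shuffle.
def tag_records(table, tag):
    for key, rest in table:
        yield (key, tag), rest

def improved_repartition_join(L, R):
    shuffled = sorted(list(tag_records(R, 0)) + list(tag_records(L, 1)),
                      key=lambda kv: kv[0])
    out, r_buffer, current_key = [], [], None
    for (key, tag), rest in shuffled:
        if key != current_key:          # new reduce group
            current_key, r_buffer = key, []
        if tag == 0:
            r_buffer.append(rest)       # buffer only the small table R
        else:
            for r in r_buffer:          # stream each L record past the buffer
                out.append((key, r, rest))
    return out
```

Because partitioning and grouping use only the join key while sorting uses (join key, tag), each reduce group sees its R records first, so only the small side is ever buffered.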

Join Algorithms in MR: 2. Improved Repartition Join

[Figure: map phase of the improved repartition join. As before, map tasks tag records with their source table, but the tag is now part of the composite output key, so within each join key the R records sort ahead of the L records.]

Join Algorithms in MR: 2. Improved Repartition Join

[Figure: reduce phase of the improved repartition join. Only the R records (B_R) are buffered; the L records are streamed past them to produce the output.]

Join Algorithms in MR: 3. Directed Join
• Preprocessing for the repartition join (directed join)
– both L and R have already been partitioned on the join key: L is pre-partitioned, and at query time matching partitions of L and R can be joined directly
– a map-only MapReduce job
– during the init phase, R_i is retrieved from the DFS (if not already in local storage) and loaded into a main-memory hash table
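A sketch of the directed join's dataflow in plain Python, assuming both tables were pre-partitioned with the same hash function into the same number of partitions; partition() and directed_join() are illustrative helper names:

```python
# Sketch of the directed join. L and R are pre-partitioned on the join key
# with the same hash function; at query time a map-only task joins each
# matching pair (L_i, R_i), so no shuffle is needed.
def partition(table, n):
    parts = [[] for _ in range(n)]
    for key, rest in table:
        parts[hash(key) % n].append((key, rest))
    return parts

def directed_join(L, R, n=4):
    L_parts, R_parts = partition(L, n), partition(R, n)   # preprocessing
    out = []
    for L_i, R_i in zip(L_parts, R_parts):   # one map-only task per pair
        hash_table = {}                      # init(): load R_i into memory
        for key, rest in R_i:
            hash_table.setdefault(key, []).append(rest)
        for key, rest in L_i:                # map(): probe with L_i records
            for r in hash_table.get(key, []):
                out.append((key, r, rest))
    return out
```

The pre-partitioning cost is paid once; every subsequent join on the same key avoids moving L across the network.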

Join Algorithms in MR: 4. Broadcast Join
• Broadcast join
– in most applications, |R| << |L|
– instead of moving both R and L across the network, broadcast the smaller table R to avoid the network overhead of shuffling L
– a map-only job
– each map task builds a main-memory hash table on either L or R

Join Algorithms in MR: 4. Broadcast Join
• Broadcast join
– if R is smaller than a split of L: build the hash table on R
– if R is larger than a split of L: build the hash table on a split of L

• Preprocessing for the broadcast join
– most nodes in the cluster keep a local copy of R in advance
– this avoids retrieving R from the DFS in the init() function
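The build-side decision can be sketched in plain Python, assuming R has already been replicated so each map task sees the whole of R alongside its own split of L:

```python
# Sketch of the broadcast join's build-side choice. Each map task hashes the
# smaller input: the whole (replicated) table R, or just its own split of L.
def hash_join(build, probe, build_is_R):
    hash_table = {}
    for key, rest in build:
        hash_table.setdefault(key, []).append(rest)
    out = []
    for key, rest in probe:
        for b in hash_table.get(key, []):
            # emit (key, R record, L record) regardless of the build side
            out.append((key, b, rest) if build_is_R else (key, rest, b))
    return out

def broadcast_join_task(L_split, R):
    if len(R) <= len(L_split):
        return hash_join(R, L_split, True)    # R smaller: build on R
    return hash_join(L_split, R, False)       # else build on the L split
```

Either way, only the smaller input is held in memory; the larger one is streamed through the probe loop.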

Join Algorithms in MR: 5. Semi-Join
• Semi-join
– in some applications, |R| << |L|, yet most of R still does not join: Facebook's user table has hundreds of millions of records, but only a few million unique users are active per hour
– avoid sending the records of R over the network that will not join with L

• Preprocessing for the semi-join
– the first two phases of the semi-join can be done as preprocessing
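The three phases of the semi-join, collapsed into plain Python over in-memory lists (each phase is a separate MapReduce job in the actual system):

```python
# Sketch of the semi-join, with each phase collapsed to plain Python.
# Phase 1: extract the unique join keys appearing in L.
# Phase 2: filter R down to the records that will actually join.
# Phase 3: broadcast-join the (now small) filtered R against L.
def semi_join(L, R):
    keys_in_L = {key for key, _ in L}                       # phase 1
    R_filtered = [(k, v) for k, v in R if k in keys_in_L]   # phase 2
    hash_table = {}
    for k, v in R_filtered:                                 # phase 3
        hash_table.setdefault(k, []).append(v)
    return [(k, r, v) for k, v in L for r in hash_table.get(k, [])]
```

Running phases 1 and 2 ahead of time is exactly the preprocessing the slide describes: only the filtered R is shipped at query time.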

Join Algorithms in MR: 6. Per-Split Semi-Join
• Per-split semi-join
– the problem with the semi-join: not every record of the filtered R joins with a given split L_i
– each L_i can instead be joined with its own filtered R_i directly

• Preprocessing for the per-split semi-join
– also benefits from moving its first two phases into preprocessing

Contents
• Introduction
• Join Algorithms in MapReduce
• Experimental Evaluation
– 1. Environment
– 2. Datasets
– 3. MapReduce Time Breakdown
– 4. Experimental Results
• Discussion
• Conclusion

Experimental Evaluation: 1. Environment
• System specification
– all experiments ran on a 100-node cluster
– each node: a single 2.4GHz Intel Core 2 Duo processor
– 4GB of DRAM and two SATA disks
– Red Hat Enterprise Linux Server 5.2

• Network specification
– the 100 nodes were spread across two racks
– each node can execute two map and two reduce tasks concurrently
– each rack had its own gigabit Ethernet switch
– the rack-level bandwidth is 32Gb/s
– under full load, 35MB/s cross-rack node-to-node bandwidth

• Hadoop with HDFS (128MB block size)

Experimental Evaluation: 2. Datasets

                     Event Log (L)         User Info (R)
  Join column size   10 bytes              5 bytes
  Record size        100 bytes (average)   100 bytes (exactly)
  Total size         500GB                 10MB to 100GB

– the join result carries the 10-byte join key
– an n-to-1 join; many users are inactive
– all records in L always appear in the result
– the fraction of R referenced by L was fixed at 0.1%, 1%, or 10%
– to simulate some users being more active, a Zipf distribution was used
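A sketch of how such a skewed log could be generated, assuming a simple Zipf-like popularity law where user rank k is drawn with weight 1/k; zipf_log() is an illustrative helper, not the paper's actual data generator:

```python
# Sketch of generating a skewed event log: user rank k is referenced with
# weight 1/k (a simple Zipf-like law), so a handful of "active users"
# dominate the log while most users barely appear.
import random

def zipf_log(num_users, num_records, seed=42):
    rng = random.Random(seed)
    weights = [1.0 / k for k in range(1, num_users + 1)]
    user_ids = rng.choices(range(1, num_users + 1),
                           weights=weights, k=num_records)
    return [(uid, "event-%d" % i) for i, uid in enumerate(user_ids)]
```

With 100 users, roughly a fifth of all references land on the single most popular user, mimicking the few highly active users the slide mentions.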

Experimental Evaluation: 3. MapReduce Time Breakdown

[Figure: CPU, disk, and network utilization over the lifetime of a standard repartition join job.]

Experimental Evaluation: 3. MapReduce Time Breakdown
• MapReduce time breakdown
– shows what transpires during the execution of a MapReduce job
– reveals the overhead of the various execution components of MapReduce
– experimental setup: the standard repartition join algorithm; a 500GB log table and a 30MB reference table; 1% of R actually referenced by the log records; 4000 map tasks and 200 reduce tasks; each node assigned 40 map and 2 reduce tasks

Experimental Evaluation: 3. MapReduce Time Breakdown
• Interesting observations on MapReduce
– the map phase was clearly CPU-bound
– the reduce phase was limited by the network bandwidth (writing the three copies of the join result to HDFS)
– disk and network activity was moderate and periodic during the map phase; the peaks correspond to output generation in the map tasks and the shuffle phase in the reduce tasks
– the job was almost idle for about 30 seconds between the 9- and 10-minute marks, waiting for the slowest map task
– by running independent map tasks concurrently, almost all CPU, disk, and network activity can be overlapped

Experimental Evaluation: 4. Experimental Results

[Figure: running times of the join strategies, grouped into variants with no preprocessing and variants with preprocessing.]

Experimental Evaluation: 4. Experimental Results

[Figure: further experimental results.]

Contents
• Introduction
• Join Algorithms in MapReduce
• Experimental Evaluation
• Discussion
• Conclusion

Discussion
• Choosing the right strategy
– determine which join strategy is right for a given circumstance
– provides an important first step toward query optimization in MapReduce

Contents
• Introduction
• Join Algorithms in MapReduce
• Experimental Evaluation
• Discussion
• Conclusion

Conclusion
• Joining log data with reference data in MapReduce has emerged as an important operation
– in analytic workloads for enterprise customers
– at Web 2.0 companies

• The paper designs a series of join algorithms on top of MapReduce
– without requiring any modification to the actual framework
– and proposes many details for an efficient implementation: two additional functions, init() and close(), and practical preprocessing techniques

• Future work
– multi-way joins
– indexing methods to speed up join queries
– an optimization module (selecting the appropriate join algorithm)
– new programming models to extend the MapReduce framework