Part III: Big Data Analysis Tools (Dremel)
Yuan Xue
Part of the slides were made by the original authors.

Overview
► BigQuery is Google's database service based on Dremel; it is hosted by Google.
► Impala and Drill are open-source databases inspired by the Dremel paper.
► Impala is part of the Cloudera Hadoop distribution.
► Drill is part of the MapR Hadoop distribution.

Lineage: Dremel → BigQuery (Google hosted), Impala (Cloudera release), Drill (MapR release)

Motivation
► Google faced MapReduce's main problem: latency.
► The problem propagated to the engines built on top of MapReduce as well.
► Google first approached the problem by developing real-time query capability for big data.

Speed matters.

Philosophy
Dremel's philosophy:
► Support the subset of SQL that has a fast and scalable implementation.
► This is somewhat similar to other NoSQL systems: do what can be done very fast and at scale; the rest is the application's problem.
 The main idea is to harness a huge cluster of machines for a single query.

Why call it Dremel?
 Dremel is a brand of power tools that primarily rely on speed as opposed to torque.
 This is a data analysis tool that uses speed instead of raw power.

Widely used inside Google
 Analysis of crawled web documents
 Tracking install data for applications on Android Market
 Crash reporting for Google products
 OCR results from Google Books
 Spam analysis
 Debugging of map tiles on Google Maps
 Tablet migrations in managed Bigtable instances
 Results of tests run on Google's distributed build system
 Disk I/O statistics for hundreds of thousands of disks
 Resource monitoring for jobs run in Google's data centers
 Symbols and dependencies in Google's codebase

Dremel vs. MapReduce
► Dremel is not a replacement for MapReduce or Tenzing but a complement to them. (Tenzing is Google's equivalent of Hive.)
► An analyst can issue many fast queries using Dremel.
► After getting a good idea of what is needed, run a slow MapReduce job (or MapReduce-based SQL) to get precise results.

Example usage: data exploration
1. Googler Alice runs a MapReduce to extract billions of signals from web pages.
2. She explores the output with ad hoc SQL against Dremel:

   DEFINE TABLE t AS /path/to/data/*
   SELECT TOP(signal, 100), COUNT(*) FROM t ...

3. She follows up with more MR-based processing on her data (FlumeJava [PLDI'10], Sawzall [Sci.Pr.'05]).

Dremel system
 Trillion-record, multi-terabyte datasets at interactive speed
 Scales to thousands of nodes
 Fault- and straggler-tolerant execution
 Nested data model
   Complex datasets; normalization is prohibitive
 Columnar storage and processing
 Tree architecture (as in web search)
 Interoperates with Google's data management tools
   In situ data access (e.g., GFS, Bigtable)
   MapReduce pipelines

Impala

[Figure: software stack with MapReduce, Hive and Pig, and Impala all running directly on top of HDFS]

Details of Dremel Design
 Nested columnar storage
 Query processing
 Experiments
 Observations

Records vs. columns

[Figure: nested records r1, r2 with fields A-E, stored record-wise vs. column-striped]

 Challenge: preserve the record structure and reconstruct records from a subset of fields
 Benefit: read less data, cheaper decompression

Nested data model

message Document {
  required int64 DocId;              // multiplicity [1,1]
  optional group Links {
    repeated int64 Backward;         // multiplicity [0,*]
    repeated int64 Forward;
  }
  repeated group Name {
    repeated group Language {
      required string Code;
      optional string Country;       // multiplicity [0,1]
    }
    optional string Url;
  }
}

Sample records:

r1: DocId: 10
    Links { Forward: 20, Forward: 40, Forward: 60 }
    Name { Language { Code: 'en-us', Country: 'us' }, Language { Code: 'en' }, Url: '…' }
    Name { Url: '…' }
    Name { Language { Code: 'en-gb', Country: 'gb' } }

r2: DocId: 20
    Links { Backward: 10, Backward: 30, Forward: 80 }
    Name { Url: '…' }
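For concreteness, here is a minimal Python sketch of r1 and r2 as plain nested data: repeated fields become lists, and absent optional fields are simply omitted. The URL strings are hypothetical placeholders, since the actual values were lost in this transcript.

# Records r1 and r2 from the slide as nested Python dicts.
# Repeated fields (Links.Forward, Name, Language) are lists.
r1 = {
    'DocId': 10,
    'Links': {'Forward': [20, 40, 60]},
    'Name': [
        {'Language': [{'Code': 'en-us', 'Country': 'us'},
                      {'Code': 'en'}],
         'Url': 'http://A'},                     # placeholder URL
        {'Url': 'http://B'},                     # placeholder URL
        {'Language': [{'Code': 'en-gb', 'Country': 'gb'}]},
    ],
}

r2 = {
    'DocId': 20,
    'Links': {'Backward': [10, 30], 'Forward': [80]},
    'Name': [{'Url': 'http://C'}],               # placeholder URL
}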

Column-striped representation

DocId                Links.Backward        Links.Forward
value  r  d          value  r  d           value  r  d
10     0  0          NULL   0  1           20     0  2
20     0  0          10     0  2           40     1  2
                     30     1  2           60     1  2
                                           80     0  2

Name.Url             Name.Language.Code    Name.Language.Country
value  r  d          value  r  d           value  r  d
…      0  2          en-us  0  2           us     0  3
…      1  2          en     2  2           NULL   2  2
NULL   1  1          NULL   1  1           NULL   1  1
…      0  2          en-gb  1  2           gb     1  3
                     NULL   0  1           NULL   0  1

Repetition and definition levels

(records r1 and r2 as on the previous slides)

Name.Language.Code
value  r  d
en-us  0  2    (start of a new record: r=0)
en     2  2    (repeats at Language, the 2nd repeated field in the path: r=2)
NULL   1  1    (repeats at Name, the 1st repeated field: r=1; Name is present but has no Language)
en-gb  1  2
NULL   0  1

 r: at which repeated field in the field's path the value has repeated (r=0 marks a new record)
 d: how many fields in the path that could be undefined (optional or repeated) are actually present
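The following is a minimal sketch of how these (value, r, d) triples can be computed for one field path. It is my own illustration, not code from the paper; it assumes records as nested Python dicts (as in the earlier sketch, repeated compactly here) and annotates each path step as required, optional, or repeated.

# Dremel-style column striping for a single field path (sketch).
# Each path step is (field_name, mode), mode in {'required', 'optional', 'repeated'}.
def strip_column(node, path, r=0, d=0, rep_depth=0, out=None):
    """Append (value, repetition level, definition level) triples to out."""
    if out is None:
        out = []
    if not path:
        out.append((node, r, d))
        return out
    (name, mode), rest = path[0], path[1:]
    child = node.get(name) if isinstance(node, dict) else None
    if mode == 'repeated':
        if not child:
            out.append((None, r, d))          # repeated group entirely absent
        else:
            for i, item in enumerate(child):
                # The first occurrence keeps the inherited r; later
                # occurrences repeat at this field's repeated level.
                strip_column(item, rest,
                             r if i == 0 else rep_depth + 1,
                             d + 1, rep_depth + 1, out)
    elif mode == 'optional':
        if child is None:
            out.append((None, r, d))
        else:
            strip_column(child, rest, r, d + 1, rep_depth, out)
    else:                                     # required: does not add to d
        strip_column(child, rest, r, d, rep_depth, out)
    return out

# r1 and r2 as before (Links omitted; this path does not touch it).
r1 = {'DocId': 10,
      'Name': [{'Language': [{'Code': 'en-us', 'Country': 'us'},
                             {'Code': 'en'}], 'Url': 'http://A'},
               {'Url': 'http://B'},
               {'Language': [{'Code': 'en-gb', 'Country': 'gb'}]}]}
r2 = {'DocId': 20, 'Name': [{'Url': 'http://C'}]}

path = [('Name', 'repeated'), ('Language', 'repeated'), ('Code', 'required')]
print(strip_column(r1, path) + strip_column(r2, path))
# [('en-us', 0, 2), ('en', 2, 2), (None, 1, 1), ('en-gb', 1, 2), (None, 0, 1)]

The output matches the Name.Language.Code column in the table above.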

Record assembly FSM

[Figure: finite-state machine over the field readers DocId → Links.Backward → Links.Forward → Name.Language.Code → Name.Language.Country → Name.Url, with transitions labeled by repetition levels such as 0, 1, 2]

 Used for record-oriented data processing (e.g., MapReduce)
 Transitions are labeled with repetition levels: the repetition level of the next value determines which field reader the FSM jumps to

Reading two fields

[Figure: two-state FSM: from DocId, transition 0 goes to Name.Language.Country; Name.Language.Country loops on repetition levels 1, 2 and returns to DocId on 0]

Assembled records s1 and s2:

s1: DocId: 10
    Name { Language { Country: 'us' }, Language }
    Name
    Name { Language { Country: 'gb' } }
s2: DocId: 20
    Name

 Structure of the parent fields is preserved
 Useful for queries like /Name[3]/Language[1]/Country
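Here is a minimal sketch of the FSM logic for these two columns (my own illustration, not the paper's algorithm): after each Name.Language.Country value, the next value's repetition level decides whether assembly stays in the current record (r >= 1) or jumps back to the DocId reader to start a new record (r = 0). For brevity this flattens each record's Country values into a list instead of rebuilding the full nesting, which would additionally use the definition levels.

# Column stripes for DocId and Name.Language.Country as (value, r, d),
# taken from the earlier slides.
doc_id  = [(10, 0, 0), (20, 0, 0)]
country = [('us', 0, 3), (None, 2, 2), (None, 1, 1),
           ('gb', 1, 3), (None, 0, 1)]

records, i = [], 0
for value, _, _ in doc_id:
    rec = {'DocId': value, 'Country': []}
    first = True
    # Consume Country values until the next one has r == 0 (a new record).
    while i < len(country) and (first or country[i][1] > 0):
        rec['Country'].append(country[i][0])
        first, i = False, i + 1
    records.append(rec)

print(records)
# [{'DocId': 10, 'Country': ['us', None, None, 'gb']},
#  {'DocId': 20, 'Country': [None]}]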

Outline
 Nested columnar storage
 Query processing
 Experiments
 Observations

Query processing
 Optimized for select-project-aggregate queries
   A very common class of interactive queries
   Single scan
   Within-record and cross-record aggregation

SQL dialect for nested data

SELECT DocId AS Id,
       COUNT(Name.Language.Code) WITHIN Name AS Cnt,
       Name.Url + ',' + Name.Language.Code AS Str
FROM t
WHERE REGEXP(Name.Url, '^http') AND DocId < 20;

Output record t1:

Id: 10
Name { Cnt: 2, Language { Str: '…' }, Language { Str: '…' } }
Name { Cnt: 0 }

Output schema:

message QueryResult {
  required int64 Id;
  repeated group Name {
    optional uint64 Cnt;
    repeated group Language {
      optional string Str;
    }
  }
}
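A rough Python rendition of what this query computes on one nested record (my own sketch, not Dremel's evaluator): the WHERE clause filters repeated Name entries by their Url, COUNT ... WITHIN Name aggregates inside each Name, and Str concatenates a parent field with a child field. The URL values are hypothetical placeholders.

import re

def evaluate(record):
    """Evaluate the slide's query against a single nested record (sketch)."""
    if not record['DocId'] < 20:
        return None
    out = {'Id': record['DocId'], 'Name': []}
    for name in record.get('Name', []):
        url = name.get('Url')
        if url is None or not re.search('^http', url):
            continue                      # WHERE REGEXP(Name.Url, '^http')
        langs = [l for l in name.get('Language', []) if 'Code' in l]
        out['Name'].append({
            'Cnt': len(langs),            # COUNT(...) WITHIN Name
            'Language': [{'Str': url + ',' + l['Code']} for l in langs],
        })
    return out

r1 = {'DocId': 10,
      'Name': [{'Language': [{'Code': 'en-us', 'Country': 'us'},
                             {'Code': 'en'}], 'Url': 'http://A'},
               {'Url': 'http://B'},
               {'Language': [{'Code': 'en-gb', 'Country': 'gb'}]}]}
print(evaluate(r1))
# {'Id': 10, 'Name': [{'Cnt': 2, 'Language': [{'Str': 'http://A,en-us'},
#                                             {'Str': 'http://A,en'}]},
#                     {'Cnt': 0, 'Language': []}]}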

Serving tree

[Figure: client → root server → intermediate servers → leaf servers (with local storage) → storage layer (e.g., GFS); histogram of response times]

 Parallelizes scheduling and aggregation
 Fault tolerance
 Straggler handling
 Designed for "small" results (<1M records) [Dean WSDM'09]

Example: count()

Client query over table T, stored as 100,000 tablets:

SELECT A, COUNT(B) FROM T GROUP BY A
T = {/gfs/1, /gfs/2, …, /gfs/100000}

The root server rewrites it as a union of the partial results R¹₁ … R¹₁₀ returned by the level-1 servers:

SELECT A, SUM(c) FROM (R¹₁ UNION ALL … UNION ALL R¹₁₀) GROUP BY A

Each level-1 server computes a partial aggregate over its share of the tablets:

SELECT A, COUNT(B) AS c FROM T¹₁ GROUP BY A      T¹₁ = {/gfs/1, …, /gfs/10000}
SELECT A, COUNT(B) AS c FROM T¹₂ GROUP BY A      T¹₂ = {/gfs/10001, …, /gfs/20000}
…

Leaf servers at level 3 run the data access ops on individual tablets:

SELECT A, COUNT(B) AS c FROM T³₁ GROUP BY A      T³₁ = {/gfs/1}
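A minimal sketch of this rewrite in Python (an illustration assuming a single level of leaves under the root, not Dremel's code): leaves compute partial GROUP BY counts over their tablets and the root merges them, which is exactly why COUNT(B) at the leaves becomes SUM(c) at the root.

from collections import Counter

def leaf_query(tablet):
    """SELECT A, COUNT(B) AS c FROM tablet GROUP BY A (partial result)."""
    return Counter(a for a, b in tablet if b is not None)

def root_query(partials):
    """SELECT A, SUM(c) FROM (R_1 UNION ALL ...) GROUP BY A."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Hypothetical tablets holding (A, B) pairs.
tablets = [[('x', 1), ('y', 2)], [('x', 3), ('x', None)], [('y', 4)]]
print(root_query(leaf_query(t) for t in tablets))
# Counter({'x': 2, 'y': 2})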

Outline
 Nested columnar storage
 Query processing
 Experiments
 Observations

Experiments

Table  Records      Size (unrepl., compressed)  Fields  Data center  Repl. factor
T1     85 billion   87 TB                       270     A            3×
T2     24 billion   13 TB                       530     A            3×
T3     4 billion    70 TB                       1200    A            3×
T4     1+ trillion  105 TB                      50      B            3×
T5     1+ trillion  20 TB                       30      B            2×

 1 PB of real data (uncompressed, non-replicated)
 100K-800K tablets per table
 Experiments run during business hours

Read from disk

[Figure: time (sec) vs. number of fields read; columnar path: (a) read + decompress, (b) assemble records, (c) parse as C++ objects; record path: (d) read + decompress, (e) parse as C++ objects; "cold" time on local disk, averaged over 30 runs]

 Table partition: 375 MB (compressed), 300K rows, 125 columns
 About a 10x speedup using columnar storage when reading few fields
 2-4x overhead of using records instead of columns

MR and Dremel execution

Task: compute the average number of terms in txtField of the 85-billion-record table T1.

Sawzall program run on MR:

num_recs: table sum of int;
num_words: table sum of int;
emit num_recs <- 1;
emit num_words <- count_words(input.txtField);

Dremel query Q1:

SELECT SUM(count_words(txtField)) / COUNT(*) FROM T1

[Figure: execution time (sec) on 3000 nodes; MR on records reads 87 TB, MR on columns and Dremel read 0.5 TB]

 MR overheads: launching jobs, scheduling 0.5M tasks, assembling records

Impact of serving tree depth

Q2: SELECT country, SUM(item.amount) FROM T2 GROUP BY country   (returns 100s of records)
Q3: SELECT domain, SUM(item.amount) FROM T2 WHERE domain CONTAINS '.net' GROUP BY domain   (returns 1M records)

40 billion nested items

[Figure: execution time (sec) for Q2 and Q3 at different serving-tree depths]

Scalability

Q5 on a trillion-row table T4:

SELECT TOP(aid, 20), COUNT(*) FROM T4

[Figure: execution time (sec) vs. number of leaf servers]

Outline
 Nested columnar storage
 Query processing
 Experiments
 Observations

Interactive speed

[Figure: percentage of queries vs. execution time (sec), for the monthly query workload of one 3000-node Dremel instance]

 Most queries complete under 10 seconds

Observations
 It is possible to analyze large disk-resident datasets interactively on commodity hardware: 1T records, 1000s of nodes
 MR can benefit from columnar storage, just like a parallel DBMS
   But record assembly is expensive
 Interactive SQL and MR can be complementary
 Parallel DBMSes may benefit from a serving-tree architecture, just like search engines

BigQuery: powered by Dremel

1. Upload: upload your data to Google Storage
2. Process: import it into tables and run queries
3. Act: use the results in your apps

[Figure: Your Data → BigQuery → Your Apps]
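As a concrete taste of the "run queries" step, here is a minimal sketch using the google-cloud-bigquery Python client. It assumes a configured Google Cloud project and credentials, and uses the public Shakespeare sample table as a stand-in for a table you would import yourself.

from google.cloud import bigquery

client = bigquery.Client()        # uses your default project and credentials

query = """
    SELECT word, SUM(word_count) AS n
    FROM `bigquery-public-data.samples.shakespeare`
    GROUP BY word
    ORDER BY n DESC
    LIMIT 10
"""

# Submit the query job and wait for the result rows.
for row in client.query(query).result():
    print(row.word, row.n)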

Impala architecture

[Figure: Impala architecture]