Large-scale Incremental Processing Using Distributed Transactions and Notifications. Daniel Peng and Frank Dabek, Google, Inc. OSDI 2010. Presentation, 15 Feb 2012.


Large-scale Incremental Processing Using Distributed Transactions and Notifications  Daniel Peng and Frank Dabek, Google, Inc.  OSDI 2010  15 Feb 2012, IDB Lab. Seminar  Presented by Jee-bum Park

Outline  Introduction  Design –Bigtable overview –Transactions –Notifications  Evaluation  Conclusion  Good and Not So Good Things 2

Introduction  How can Google find the documents on the web so fast? 3

Introduction  Google uses an index, built by the indexing system, that can be used to answer search queries 4

Introduction  What does the indexing system do? –Crawling every page on the web –Parsing the documents –Extracting links –Clustering duplicates –Inverting links –Computing PageRank –... 5

Introduction  PageRank 6

Introduction  Compute PageRank using MapReduce  Job 1: compute R(1)  Job 2: compute R(2)  Job 3: compute R(3) ... 7 [figure: each job computes R(t) from R(t-1)]
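To make the batch nature concrete, one PageRank iteration can be written as a map phase that spreads each page's rank across its out-links and a reduce phase that sums the incoming shares. The tiny in-memory driver below only illustrates the idea; the graph, damping factor, and function names are assumptions, not the indexing system's code.

    # Minimal sketch of one PageRank iteration expressed as map/reduce steps.
    # The graph, damping factor, and the in-memory "driver" are illustrative
    # assumptions, not part of the original system.
    from collections import defaultdict

    DAMPING = 0.85

    def map_phase(page, rank, out_links):
        """Emit a rank share to every outgoing link."""
        share = rank / len(out_links) if out_links else 0.0
        for target in out_links:
            yield target, share
        yield page, 0.0  # make sure every page appears in the output

    def reduce_phase(shares, num_pages):
        """Combine incoming shares into the page's new rank."""
        return (1.0 - DAMPING) / num_pages + DAMPING * sum(shares)

    def pagerank_iteration(graph, ranks):
        """One full pass over the repository: R(t) -> R(t+1)."""
        contributions = defaultdict(list)
        for page, out_links in graph.items():
            for target, share in map_phase(page, ranks[page], out_links):
                contributions[target].append(share)
        return {page: reduce_phase(shares, len(graph))
                for page, shares in contributions.items()}

    # Job 1, Job 2, Job 3, ... each rerun this pass over the *entire* graph.
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
    ranks = {page: 1.0 / len(graph) for page in graph}
    for _ in range(3):
        ranks = pagerank_iteration(graph, ranks)
    print(ranks)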

Introduction  Now, consider how to update that index after recrawling some small portion of the web  Is it okay to run the MapReduces over just new pages?  Nope, there are links between the new pages and the rest of the web  Well, how about this?  MapReduces must be run again over the entire repository 12

Introduction  Google’s web search index was produced in this way –Running over the entire repository of pages  This was not a critical issue –Because, given enough computing resources, MapReduce’s scalability makes this approach feasible  However, reprocessing the entire web –Discards the work done in earlier runs –Makes latency proportional to the size of the repository, rather than the size of an update 13

Introduction  An ideal data processing system for the task of maintaining the web search index would be optimized for incremental processing  Incremental processing system: Percolator 14

Outline  Introduction  Design –Bigtable overview –Transactions –Notifications  Evaluation  Conclusion  Good and Not So Good Things 15

Design  Percolator is built on top of the Bigtable distributed storage system  A Percolator system consists of three binaries that run on every machine in the cluster –A Percolator worker –A Bigtable tablet server –A GFS chunkserver  All observers (user applications) are linked into the Percolator worker 16

Design  Dependencies 17 [figure: Observers → Percolator worker → Bigtable tablet server → GFS chunkserver]

Design  System architecture 18 [figure: Percolator workers with a timestamp oracle service and a lightweight lock service]

Design  The Percolator worker –Scans the Bigtable for changed columns –Invokes the corresponding observers as a function call in the worker process  The observers –Perform transactions by sending read/write RPCs to Bigtable tablet servers 19

Design  The timestamp oracle service –Provides strictly increasing timestamps  A property required for correct operation of the snapshot isolation protocol  The lightweight lock service –Workers use it to make the search for dirty notifications more efficient 23
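The paper describes the oracle as allocating ranges of timestamps in advance and persisting only the highest allocated value, so a restart never reissues a timestamp already handed out. A toy, single-process sketch of that strictly-increasing guarantee; the class name, block size, and persistence stub are illustrative assumptions, not Percolator's code.

    # Toy sketch of an oracle that hands out strictly increasing timestamps.
    # Reserving a block before handing any of it out means a restart can never
    # reissue a timestamp a client already received.
    import threading

    class TimestampOracle:
        BLOCK = 1000  # timestamps reserved per allocation (illustrative value)

        def __init__(self):
            self._mutex = threading.Lock()
            self._next = 0    # next timestamp to hand out
            self._limit = 0   # highest timestamp currently reserved

        def _persist_high_water_mark(self, value):
            # Stand-in for writing the reserved limit to stable storage.
            pass

        def get_timestamp(self):
            with self._mutex:
                if self._next >= self._limit:
                    self._limit += self.BLOCK
                    self._persist_high_water_mark(self._limit)
                ts = self._next
                self._next += 1
                return ts

    oracle = TimestampOracle()
    start_ts = oracle.get_timestamp()
    commit_ts = oracle.get_timestamp()
    assert commit_ts > start_ts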

Design  Percolator provides two main abstractions –Transactions  Cross-row, cross-table with ACID snapshot-isolation semantics –Observers  Similar to database triggers or events 24

Design – Bigtable overview  Percolator is built on top of the Bigtable distributed storage system  Bigtable presents a multi-dimensional sorted map to users –Keys are (row, column, timestamp) tuples  Bigtable provides lookup, update operations, and transactions on individual rows  Bigtable does not provide multi-row transactions 25
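As a quick illustration of the data model, a sorted map keyed by (row, column, timestamp) can be mimicked with an ordinary dictionary; this is only a stand-in for the shape of the model, not the Bigtable client API, and the row and column names are made up.

    # Stand-in illustrating Bigtable's data model: a sorted map keyed by
    # (row, column, timestamp), with lookups of the newest version at or
    # below a given timestamp.
    cells = {
        ("com.example/index.html", "contents:data", 5): "<html>v1</html>",
        ("com.example/index.html", "contents:data", 9): "<html>v2</html>",
        ("com.example/index.html", "anchor:cnn.com", 7): "Example link",
    }

    def lookup(row, column, max_ts):
        """Newest version of (row, column) at or below max_ts, as (ts, value)."""
        versions = [(ts, v) for (r, c, ts), v in sorted(cells.items())
                    if r == row and c == column and ts <= max_ts]
        return versions[-1] if versions else None

    print(lookup("com.example/index.html", "contents:data", 7))    # (5, '<html>v1</html>')
    print(lookup("com.example/index.html", "contents:data", 100))  # (9, '<html>v2</html>')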

Design – Transactions  Percolator provides cross-row, cross-table transactions with ACID snapshot-isolation semantics 26

Design – Transactions  Percolator stores multiple versions of each data item using Bigtable’s timestamp dimension –Multiple versions are required to provide snapshot isolation  [figure: snapshot isolation] 27
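Under snapshot isolation, every read in a transaction sees the database as of the transaction's start timestamp. A tiny illustration, loosely modeled on the paper's bank-account example; the timestamps and values below are made up for the example.

    # Conceptual sketch of snapshot-isolation visibility: a read at start_ts
    # sees only versions whose commit timestamp is <= start_ts.
    committed_versions = {
        "Bob:balance": [(5, 10), (8, 3)],   # (commit_ts, value)
        "Joe:balance": [(5, 2),  (8, 9)],
    }

    def snapshot_read(cell, start_ts):
        visible = [(ts, v) for ts, v in committed_versions[cell] if ts <= start_ts]
        return max(visible)[1] if visible else None

    # A transaction that started at ts=7 sees the state as of commit ts=5,
    # even though a later transaction committed both cells at ts=8.
    print(snapshot_read("Bob:balance", 7))  # 10
    print(snapshot_read("Joe:balance", 7))  # 2
    print(snapshot_read("Bob:balance", 9))  # 3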

Design – Transactions  Case 1: use exclusive locks 28

Design – Transactions  Case 2: do not use any locks 34

Design – Transactions  Case 3: use multiple versioning & timestamp 41

Design – Transactions  Percolator stores its locks in special in-memory columns in the same Bigtable 52
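The paper spells this out with per-cell data, lock, and write entries: a prewrite phase writes the new value and a lock (pointing at a designated primary cell), and a commit phase replaces the locks with write records at a commit timestamp obtained from the oracle. Below is a compressed, single-process sketch of that flow; the class names and in-memory table are stand-ins, and conflict cleanup, lock timeouts, and Bigtable's single-row atomicity are all omitted.

    # Compressed sketch of Percolator-style commit over per-cell data/lock/write
    # entries, following the structure of the protocol described in the paper.
    class Oracle:
        def __init__(self):
            self.ts = 0
        def next(self):
            self.ts += 1
            return self.ts

    class Cell:
        def __init__(self):
            self.data = {}    # start_ts  -> value
            self.lock = {}    # start_ts  -> key of the transaction's primary cell
            self.write = {}   # commit_ts -> start_ts of the committed data

    class Transaction:
        def __init__(self, table, oracle):
            self.table, self.oracle = table, oracle
            self.start_ts = oracle.next()
            self.writes = {}  # key -> value, buffered until commit

        def get(self, key):
            cell = self.table.setdefault(key, Cell())
            if any(ts <= self.start_ts for ts in cell.lock):
                raise RuntimeError("pending lock; a real client would clean up or retry")
            commits = [ts for ts in cell.write if ts <= self.start_ts]
            return cell.data[cell.write[max(commits)]] if commits else None

        def set(self, key, value):
            self.writes[key] = value

        def commit(self):
            primary = next(iter(self.writes))
            # Phase 1: prewrite, locking every written cell at start_ts.
            for key, value in self.writes.items():
                cell = self.table.setdefault(key, Cell())
                if any(ts >= self.start_ts for ts in cell.write) or cell.lock:
                    return False  # write-write conflict or concurrent lock: abort
                cell.data[self.start_ts] = value
                cell.lock[self.start_ts] = primary
            # Phase 2: record the commit (primary first), then erase the locks.
            commit_ts = self.oracle.next()
            for key in self.writes:
                cell = self.table[key]
                cell.write[commit_ts] = self.start_ts
                del cell.lock[self.start_ts]
            return True

    oracle, table = Oracle(), {}
    t = Transaction(table, oracle)
    t.set("Bob:balance", 3)
    t.set("Joe:balance", 9)
    assert t.commit()
    assert Transaction(table, oracle).get("Bob:balance") == 3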

Design – Transactions  Percolator transaction demo 53

Design – Notifications  In Percolator, the user writes code (“observers”) to be triggered by changes to the table  Each observer registers a function and a set of columns  Percolator invokes the functions after data is written to one of those columns in any row 58

Design – Notifications  Percolator applications are structured as a series of observers –Each observer completes a task and creates more work for “downstream” observers by writing to the table 59 [figure: a Percolator application as a chain of observers, Observer 1 through Observer 6]
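A rough sketch of how the registration and chaining on the two slides above could look: each observer names the columns it watches, a write to a watched column triggers it, and an observer creates downstream work by writing to another watched column. The registry and function names below are invented for illustration and are not Percolator's actual observer API.

    # Illustrative sketch of observer registration and chaining; not the real API.
    observers = []  # list of (watched_columns, callback)

    def register_observer(columns, callback):
        observers.append((set(columns), callback))

    def write_cell(row, column, value):
        """Stand-in for a transactional write; also triggers matching observers."""
        for watched, callback in observers:
            if column in watched:
                callback(row, column, value)

    def document_processor(row, column, value):
        # Parse the document, then create work for the downstream observer.
        write_cell(row, "cluster_input", f"parsed({value})")

    def clusterer(row, column, value):
        print(f"clustering {row}: {value}")

    register_observer(["raw_document"], document_processor)
    register_observer(["cluster_input"], clusterer)
    write_cell("com.example/index.html", "raw_document", "<html>...</html>")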

Design – Notifications  Google’s new indexing system 60 [figure: observer chain: Document Processor (parse, extract links, etc.) → Clustering → Exporter]

Design – Notifications  To implement notifications, Percolator needs to efficiently find dirty cells with observers that need to be run  To identify dirty cells, Percolator maintains a special “notify” Bigtable column, containing an entry for each dirty cell –When a transaction writes an observed cell, it also sets the corresponding notify cell 61

Design – Notifications  Each Percolator worker chooses a portion of the table to scan by picking a region of the table randomly –To avoid running observers on the same row concurrently, each worker acquires a lock from a lightweight lock service before scanning the row 62
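A crude sketch of that scan: pick a random starting point, and for each row with a notify entry try to take a per-row lock before running its observers. The in-process set standing in for the lock service, the notify dictionary, and all names here are invented for illustration.

    # Crude sketch of a worker scanning for dirty (notify) cells.
    import random

    notify_cells = {"row17", "row42", "row99"}   # rows with a notify entry set
    row_locks = set()                            # stand-in for the lock service

    def try_lock(row):
        if row in row_locks:
            return False
        row_locks.add(row)
        return True

    def unlock(row):
        row_locks.discard(row)

    def run_observers(row):
        print(f"running observers for {row}")
        notify_cells.discard(row)  # clear the notify cell once the work is done

    def scan_once(all_rows):
        start = random.randrange(len(all_rows))   # pick a random region to scan
        for row in all_rows[start:] + all_rows[:start]:
            if row in notify_cells and try_lock(row):
                try:
                    run_observers(row)
                finally:
                    unlock(row)

    scan_once(sorted(notify_cells) + ["row01", "row50"])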

Outline  Introduction  Design –Bigtable overview –Transactions –Notifications  Evaluation  Conclusion  Good and Not So Good Things 63

Evaluation  Experiences with converting a MapReduce-based indexing pipeline to use Percolator  Latency –100x faster than the previous system  Simplification –The number of observers in the new system: 10 –The number of MapReduces in the previous system: 100  Easier to operate –Far fewer moving parts: tablet servers, Percolator workers, chunkservers –In the old system, each of a hundred different MapReduces needed to be individually configured and could independently fail 64

Evaluation  Crawl rate benchmark on 240 machines 65

Evaluation  Versus Bigtable 66

Evaluation  Fault-tolerance 67

Outline  Introduction  Design –Bigtable overview –Transactions –Notifications  Evaluation  Conclusion  Good and Not So Good Things 68

Conclusion  Percolator provides two main abstractions –Transactions  Cross-row, cross-table with ACID snapshot-isolation semantics –Observers  Similar to database triggers or events 69

Outline  Introduction  Design –Bigtable overview –Transactions –Notifications  Evaluation  Conclusion  Good and Not So Good Things 70

Good and Not So Good Things  Good things –Simple and neat design –Purpose of use is clear –Detailed description based on real example: Google’s indexing system  Not so good things –Lack of observer examples (Google’s indexing system in particular) 71

Thank You! Any Questions or Comments?