Percolator: Incrementally Indexing the Web OSDI’10.

Outline
- Google Percolator
- Trigger-based computing engine

Problem: Index the web
- Input: raw crawled documents (times.com, mit.edu, g.cn, fark.com, nyt.com, ...)
- Output: documents ready for serving, stored as a table keyed by URL with columns for in-links, body, and PageRank (e.g. times.com has an in-link from mit.edu; g.cn has in-links from fark.com and times.com)

Duplicate Elimination with MapReduce
- The indexing system is a chain of many MapReduces: parse documents, cluster by checksum, invert links, ...

Index Refresh with MapReduce
How do we index a newly crawled document?
- The new document could be a duplicate of any previously crawled document
- So refreshing the index requires mapping over the entire repository again

Indexing System Goals
What do we want from an ideal indexing system?
- A large repository of documents (scalable, high throughput)
  - Upper bound on index size
  - Higher-quality index: e.g. more links
- A small delay between crawl and index: "freshness"
The MapReduce indexing system takes days from crawl to index.

Incremental Indexing
- Random R/W: maintain a random-access repository in Bigtable
- Global view: indices let us avoid a global scan
- Incremental computing: incrementally mutate state as URLs are crawled
The repository is a table keyed by URL:

    URL   | Contents | Pagerank | Checksum   | Language
    ...   | CFP, ... | 6        | 0xabcdef01 | ENGLISH
    ...   | Lede...  | 9        | 0xbeefcafe | ENGLISH

Incremental Indexing on Bigtable
Duplicate clustering keeps two tables: a per-URL table (URL | Checksum | PageRank | IsCanonical?) and a per-checksum table (Checksum | Canonical URL). In the example, nyt.com (checksum 0xabcdef01, PageRank 6) is initially the canonical URL; when nytimes.com arrives with the same checksum and PageRank 9, it becomes the canonical URL for 0xabcdef01 and nyt.com is marked non-canonical.
New challenge: what happens if we process both URLs simultaneously?

Percolator: Incremental Infrastructure
- Built on Bigtable: random-access reads and writes come for free
- Provides cross-row transactions
  - Bigtable itself only provides single-row transactions
- Snapshot isolation, with timestamps from a timestamp oracle server
- Notifications and observers
  - Notification: detects that data has changed
  - Observer: runs a transaction in response

Bigtable Background
- Bigtable is a sorted (row, column, timestamp) store: every cell can hold multiple timestamped versions (e.g. RowOne holds "I'm at timestamp 1" and "I'm at timestamp 2" as different versions)
- Data is partitioned into row ranges called tablets
- Tablets are spread across many machines
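
A minimal sketch of this data model (an illustration only, not Bigtable's implementation): an in-memory map keyed by (row, column, timestamp), ordered so that the newest version of a cell comes first, which makes "read the latest value at or below a timestamp" a single lower_bound:

    #include <cstdint>
    #include <iostream>
    #include <map>
    #include <optional>
    #include <string>
    #include <tuple>

    // Key: (row, column, timestamp). The comparator sorts timestamps in
    // descending order so the newest version of a cell sorts first.
    using CellKey = std::tuple<std::string, std::string, uint64_t>;

    struct CellCompare {
        bool operator()(const CellKey& a, const CellKey& b) const {
            if (std::get<0>(a) != std::get<0>(b)) return std::get<0>(a) < std::get<0>(b);
            if (std::get<1>(a) != std::get<1>(b)) return std::get<1>(a) < std::get<1>(b);
            return std::get<2>(a) > std::get<2>(b);  // newest timestamp first
        }
    };

    using Table = std::map<CellKey, std::string, CellCompare>;

    // Return the newest version of (row, column) with timestamp <= ts.
    std::optional<std::string> Read(const Table& t, const std::string& row,
                                    const std::string& col, uint64_t ts) {
        auto it = t.lower_bound({row, col, ts});
        if (it == t.end()) return std::nullopt;
        if (std::get<0>(it->first) != row || std::get<1>(it->first) != col)
            return std::nullopt;
        return it->second;
    }

    int main() {
        Table t;
        t[{"RowOne", "Column1", 1}] = "I'm at timestamp 1";
        t[{"RowOne", "Column2", 2}] = "I'm at timestamp 2";
        if (auto v = Read(t, "RowOne", "Column1", 3)) std::cout << *v << "\n";
    }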

Cross-Row Transactions
- Adds distributed transactions to Bigtable
- Simple API: Get(), Set(), Commit(), Iterate()

Implementing Distributed Transactions
- Provides snapshot isolation semantics
- Multi-version protocol (versions map onto Bigtable timestamps)
- Two-phase commit, coordinated by the client (the Bigtable servers themselves cannot coordinate the updates)
- Locks are stored in special Bigtable columns next to the data: a "balance" column becomes balance:data, balance:commit, and balance:lock
  (Example: Alice's balance:data holds $10 at timestamp 3, and balance:commit at timestamp 4 points back to the data written at timestamp 3; balance:lock is empty.)
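
To make the snapshot-read rule concrete, here is a toy model (an assumption-laden sketch, not Percolator's code) of one cell with its data, commit, and lock columns, plus a Get at a snapshot timestamp that fails if an older lock is pending and otherwise follows the newest commit record below the snapshot:

    #include <cstdint>
    #include <map>
    #include <optional>
    #include <string>

    // Toy model of one row/column family ("balance") with the three
    // Percolator columns, each keyed by timestamp.
    struct BalanceCell {
        std::map<uint64_t, std::string> data;    // balance:data
        std::map<uint64_t, uint64_t>    commit;  // balance:commit -> timestamp of the data it commits
        std::map<uint64_t, std::string> lock;    // balance:lock   -> reference to the primary lock
    };

    // Snapshot read at start_ts: if a lock from an earlier transaction is
    // still pending we cannot tell whether it committed, so fail (a real
    // client would wait or clean up the lock). Otherwise return the value
    // referenced by the newest commit record strictly below start_ts.
    std::optional<std::string> SnapshotGet(const BalanceCell& c, uint64_t start_ts) {
        if (c.lock.lower_bound(start_ts) != c.lock.begin())
            return std::nullopt;                          // pending lock below our snapshot
        auto cm = c.commit.lower_bound(start_ts);
        if (cm == c.commit.begin()) return std::nullopt;  // nothing committed yet
        --cm;                                             // newest commit below start_ts
        return c.data.at(cm->second);                     // follow its pointer to the data
    }

    int main() {
        BalanceCell alice;
        alice.data[3] = "$10";
        alice.commit[4] = 3;             // commit record at ts 4 points to data written at ts 3
        auto v = SnapshotGet(alice, 5);  // a transaction with snapshot ts 5 sees "$10"
        return v ? 0 : 1;
    }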

Transaction Commit
Example: transfer $5 from Bob to Alice. The slide animates snapshots of the Alice and Bob rows (balance:data, balance:commit, balance:lock) as the client reads both balances at its snapshot, writes new data and lock entries at timestamp 5, and finally replaces the locks with commit records at timestamp 6. The client code:

    Transaction t;
    int a_bal = t.Get("Alice", "balance");
    int b_bal = t.Get("Bob", "balance");
    t.Set("Alice", "balance", a_bal + 5);
    t.Set("Bob", "balance", b_bal - 5);
    t.Commit();
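
Continuing the toy model from the previous sketch, a hedged sketch of the client-driven two-phase commit behind Commit(): prewrite stages data and a lock at the start timestamp in every written cell, and the commit phase then replaces each lock with a commit record at a later timestamp (Percolator commits a designated primary cell first so that recovery after a client crash has a single place to look; this sketch just loops in order):

    #include <cstdint>
    #include <map>
    #include <string>
    #include <vector>

    struct BalanceCell {  // same toy layout as in the previous sketch
        std::map<uint64_t, std::string> data, lock;
        std::map<uint64_t, uint64_t> commit;
    };

    struct Write { BalanceCell* cell; std::string value; };

    // Phase 1: lock every touched cell and stage its new value at start_ts.
    // Abort on a conflicting lock or on a commit newer than our snapshot.
    bool Prewrite(std::vector<Write>& writes, uint64_t start_ts,
                  const std::string& primary_ref) {
        for (auto& w : writes) {
            if (!w.cell->lock.empty()) return false;                       // someone else is writing
            if (!w.cell->commit.empty() &&
                w.cell->commit.rbegin()->first >= start_ts) return false;  // write-write conflict
            w.cell->lock[start_ts] = primary_ref;  // every lock names the primary cell
            w.cell->data[start_ts] = w.value;
        }
        return true;
    }

    // Phase 2: make the staged writes visible.
    void Commit(std::vector<Write>& writes, uint64_t start_ts, uint64_t commit_ts) {
        for (auto& w : writes) {
            w.cell->commit[commit_ts] = start_ts;  // readers now find the staged data
            w.cell->lock.erase(start_ts);
        }
    }

    int main() {
        BalanceCell alice, bob;
        std::vector<Write> writes = {{&alice, "$15"}, {&bob, "$5"}};
        uint64_t start_ts = 5, commit_ts = 6;  // in Percolator both come from the timestamp oracle
        if (Prewrite(writes, start_ts, "Alice:balance"))
            Commit(writes, start_ts, commit_ts);
    }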

Notifications and Observers
- Users register "observers" on a column
  - Executed when any row in that column is written
  - Each observer runs in a new transaction
  - Run at most once per write: "message collapsing"
- Applications are structured as a chain of observers, e.g. RawDocumentLoader -> DocumentProcessor -> DocumentExporter and LinkInverter
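
A hedged sketch of what registering an observer amounts to (the class and method names here are illustrative, not Percolator's actual API): the framework keeps observers keyed by the column they watch and invokes each one, in a fresh transaction, when a row in that column changes:

    #include <functional>
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    // An observer is user code invoked with the row that changed.
    using Observer = std::function<void(const std::string& row)>;

    class Worker {
     public:
        void RegisterObserver(const std::string& column, Observer obs) {
            observers_[column].push_back(std::move(obs));
        }
        // Called by the framework when a write to (row, column) is noticed.
        // "Message collapsing": many writes between scans still yield at most
        // one run of each observer for that row.
        void Notify(const std::string& row, const std::string& column) {
            for (auto& obs : observers_[column])
                obs(row);  // in Percolator each run happens inside its own transaction
        }
     private:
        std::map<std::string, std::vector<Observer>> observers_;
    };

    int main() {
        Worker w;
        w.RegisterObserver("raw_document", [](const std::string& row) {
            // e.g. parse the document stored in `row` and write derived columns,
            // which in turn notifies downstream observers (DocumentProcessor, ...).
        });
        w.Notify("http://nyt.com", "raw_document");
    }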

Implementing Notifications
- Dirty column: set if observers must be run for that row
- A randomized distributed scan finds pending work and runs the observers in a thread pool
- The scan is efficient: it only scans over the dirty bits themselves
  (Example: Alice's row is dirty and will be processed; Bob's is not.)
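
A rough sketch of the randomized scan a worker could run over the dirty bits (the data layout here is assumed for illustration; in Percolator the dirty column is stored so that the scan really does touch only the bits): start at a random row so that workers spread out, and clear the bit once the observers for that row have run:

    #include <cstddef>
    #include <cstdlib>
    #include <ctime>
    #include <iterator>
    #include <map>
    #include <string>

    // Toy per-row state: the dirty bit plus the row's data.
    struct Row { bool dirty = false; std::string data; };
    using Tablet = std::map<std::string, Row>;  // sorted by row key

    // Placeholder for running the registered observers on one row
    // (in Percolator this happens inside a new transaction).
    void RunObservers(const std::string& key, Row& row) { /* user code */ }

    // One scan pass: pick a random starting key, walk the whole tablet once
    // (wrapping around), and process any row whose dirty bit is set.
    void RandomizedScan(Tablet& tablet) {
        if (tablet.empty()) return;
        auto it = tablet.begin();
        std::advance(it, std::rand() % tablet.size());  // random start: workers avoid clumping
        for (std::size_t n = 0; n < tablet.size(); ++n) {
            if (it->second.dirty) {
                RunObservers(it->first, it->second);
                it->second.dirty = false;  // cleared together with the observer's transaction
            }
            if (++it == tablet.end()) it = tablet.begin();  // wrap around
        }
    }

    int main() {
        std::srand(static_cast<unsigned>(std::time(nullptr)));
        Tablet t = {{"Alice", {true, "$15"}}, {"Bob", {false, "$5"}}};
        RandomizedScan(t);
    }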

Running Percolator
Each machine runs:
- a worker binary linked with the observer code (Percolator::RunWorker())
- a Bigtable tablet server
- a GFS chunkserver
This stack is replicated across N machines.

Optimizing an Incremental System
(Chart omitted: indexing throughput in documents per day.)

Experiment
- By how much does Percolator outperform MapReduce?
- What is the cost of Percolator's transactions?
- How well does Percolator scale?

Very Different Access Patterns
- Percolator: small, random disk I/O; many RPCs per phase, per document
- MapReduce: streaming I/O; many documents per RPC, per phase
- The infrastructure is much better suited to the MapReduce model: even though MapReduce does "extra" work, it does so very efficiently.

MR v. Percolator: Performance

MR v. Percolator: Experience
We converted an MR-based pipeline to Percolator.
Pros:
- Freshness: indexing delay dropped from days to minutes
- Scalability:
  - More throughput: just buy more CPUs
  - Bigger repository: only limited by disk space
- Utilization: immune to stragglers
Cons:
- Need to reason about concurrency
- More expensive per document processed (~2x)

The overhead of Percolator transaction operations

Percolator’s scalability

Conclusion
- Percolator now builds the "Caffeine" websearch index
  - 50% fresher results
  - 3x larger repository
- An existence proof for distributed transactions at web scale

Some Observations
- Strong consistency leads to high latency
  - Most applications do not require it, e.g. graph computing and iterative computing
- When a client fails, the locks it leaves behind block later transactions until they are cleaned up
- Notifications require actively scanning for data changes
- Percolator is more a database than a computing engine

Trigger-Based Computing Engine
Proposed components (from the architecture diagram): a trigger service driven by incoming updates, an in-memory distributed data structure for working state, persistent state storage on SSD/disk, and a pub/sub interface.

Why triggers?
- Tasks can be scheduled
- Resources can be allocated as needed

Example: PageRank
Pages are stored as Page { list out_edges; list in_edges; double rank; }. Events such as add/delete page and add/delete links feed the trigger service, which runs the following update for an affected page (pseudocode from the slide; a fuller sketch follows below):

    M = getValues(in_edges)
    new_r = compute(M)
    update(new_r, rank)
    if (|new_r - rank| > e)
        trigger(out_edges)
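
A minimal, self-contained sketch of that trigger in C++ (the graph representation, the damping factor, and the epsilon are assumptions, not taken from the talk): when a page's rank changes by more than epsilon, its out-neighbors are enqueued and recomputed, instead of re-running the whole graph:

    #include <cmath>
    #include <deque>
    #include <string>
    #include <unordered_map>
    #include <vector>

    struct Page {
        std::vector<std::string> out_edges, in_edges;
        double rank = 1.0;
    };
    using Graph = std::unordered_map<std::string, Page>;

    // Recompute one page's rank from its in-neighbors (standard PageRank
    // update; the 0.85 damping factor is an assumption).
    double Compute(const Graph& g, const Page& p) {
        double sum = 0.0;
        for (const auto& u : p.in_edges) {
            const Page& src = g.at(u);
            if (!src.out_edges.empty()) sum += src.rank / src.out_edges.size();
        }
        return 0.15 + 0.85 * sum;
    }

    // Trigger loop: process a pending page; if its rank moved by more than
    // eps, trigger its out-neighbors (the "trigger(out_edges)" step).
    void RunTriggers(Graph& g, std::deque<std::string> pending, double eps = 1e-4) {
        while (!pending.empty()) {
            std::string key = pending.front();
            pending.pop_front();
            Page& p = g[key];
            double new_r = Compute(g, p);
            if (std::fabs(new_r - p.rank) > eps) {
                p.rank = new_r;
                for (const auto& v : p.out_edges) pending.push_back(v);
            }
        }
    }

    int main() {
        Graph g;
        g["a"].out_edges = {"b"};
        g["b"].in_edges = {"a"};
        // An add-page / add-link event would enqueue the affected pages:
        RunTriggers(g, {"a", "b"});
    }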

Batch Systems
Input -> processing tasks -> output, e.g. MapReduce and Dryad.

Streaming Systems
- One-pass processing: once results land in the output pool and are available for use, the system cannot adapt them to further changes.