Percolator: Incrementally Indexing the Web OSDI’10.

Percolator: Incrementally Indexing the Web Google @ OSDI’10

Outline
o Google Percolator
o Trigger-based computing engine

Problem: Index the web
Input: raw documents (times.com, mit.edu, g.cn, fark.com, nyt.com)
Output: documents ready for serving, produced by indexing

URL         In Links              Body   PageRank
times.com   mit.edu               ...    1
mit.edu     times.com             ...    1
fark.com    times.com             ...    3
g.cn        fark.com, times.com   ...    7

Duplicate Elimination with MapReduce
The indexing system is a chain of many MapReduces: Parse Document → Cluster By Checksum → Invert Links
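As a rough single-process illustration of the "Cluster By Checksum" stage, the sketch below groups parsed documents by content checksum (map + shuffle) and picks one canonical URL per group (reduce). The Doc struct, the function names, and the rule that the highest-PageRank URL wins are assumptions for illustration, not details given on the slide; a real MapReduce would shard the grouping across many machines.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical record produced by the "Parse Document" stage.
struct Doc {
  std::string url;
  std::uint32_t checksum;  // content checksum; duplicates share it
  int pagerank;
};

// Map: emit (checksum, doc). Shuffle: group by key (std::map does the grouping).
std::map<std::uint32_t, std::vector<Doc>> MapAndShuffle(const std::vector<Doc>& docs) {
  std::map<std::uint32_t, std::vector<Doc>> groups;
  for (const Doc& d : docs) groups[d.checksum].push_back(d);
  return groups;
}

// Reduce: keep one canonical URL per checksum cluster
// (assumed policy: the highest-PageRank URL wins).
std::map<std::uint32_t, std::string> ReduceCanonical(
    const std::map<std::uint32_t, std::vector<Doc>>& groups) {
  std::map<std::uint32_t, std::string> canonical;
  for (const auto& [checksum, cluster] : groups) {
    assert(!cluster.empty());
    const Doc* best = &cluster.front();
    for (const Doc& d : cluster)
      if (d.pagerank > best->pagerank) best = &d;
    canonical[checksum] = best->url;
  }
  return canonical;
}
```

Because the reduce step needs every document with a given checksum, adding even one newly crawled page means re-running the stage over the whole repository.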

Index Refresh with MapReduce
How do we index a new document?
o The new document could be a duplicate of any previously crawled document
o So a refresh requires mapping over the entire repository

Indexing System Goals
What do we want from an ideal indexing system?
o A large repository of documents (scalability, throughput)
   – The repository size is an upper bound on the index size
   – A larger repository gives a higher-quality index, e.g., more links
o A small delay between crawl and index: "freshness"
The MapReduce indexing system takes days from crawl to index.

Incremental Indexing
o Random read/write: maintain a random-access repository in Bigtable
o Global view: indices let us avoid a global scan
o Incremental computing: incrementally mutate state as URLs are crawled

URL                        Contents   PageRank   Checksum     Language
http://usenix.org/osdi10   CFP, ...   6          0xabcdef01   ENGLISH
http://nyt.com/            Lede ...   9          0xbeefcafe   ENGLISH

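Incremental Indexing on Bigtable
Document table:
URL           Checksum     PageRank   IsCanonical?
nyt.com       0xabcdef01   6          yes → no
nytimes.com   0xabcdef01   9          yes

Checksum table:
Checksum     Canonical URL
0xabcdef01   nyt.com → nytimes.com

When nytimes.com is processed, it replaces nyt.com as the canonical URL for checksum 0xabcdef01.
New challenge: what happens if we process both URLs simultaneously?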
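The per-document update implied by the two tables can be sketched with in-memory maps standing in for the Bigtable document and checksum tables. The table and function names are hypothetical, and the rule that a higher-PageRank URL takes over as canonical is an assumption. The function reads and writes several rows; if two workers ran it for nyt.com and nytimes.com at the same time without cross-row transactions, both could read the same old checksum entry and leave the tables disagreeing about which URL is canonical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

struct DocRow {
  std::uint32_t checksum;
  int pagerank;
  bool is_canonical;
};

// Hypothetical in-memory stand-ins for two Bigtable tables.
std::map<std::string, DocRow> doc_table;              // URL -> document row
std::map<std::uint32_t, std::string> checksum_table;  // checksum -> canonical URL

// Process one newly crawled URL with a few point reads and writes,
// instead of a scan over the whole repository. NOT safe to run concurrently.
void ProcessDocument(const std::string& url, std::uint32_t checksum, int pagerank) {
  doc_table[url] = DocRow{checksum, pagerank, false};
  auto it = checksum_table.find(checksum);
  if (it == checksum_table.end() || it->second == url) {
    // First document with this content (or a recrawl of the canonical URL).
    checksum_table[checksum] = url;
    doc_table[url].is_canonical = true;
    return;
  }
  DocRow& current = doc_table.at(it->second);
  if (pagerank > current.pagerank) {
    // Assumed policy: the new URL wins, so demote the old canonical URL.
    current.is_canonical = false;
    doc_table[url].is_canonical = true;
    it->second = url;
  }
}
```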

Percolator: Incremental Infrastructure
o Built on Bigtable: random-access reads and writes for free
o Provides cross-row transactions (Bigtable provides only single-row transactions)
o Snapshot isolation, using a timestamp oracle server
o Notifications and observers
   – Notification: finds the change
   – Observer: executes a transaction in response
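A minimal sketch of a timestamp oracle, assuming the scheme described in the Percolator paper: timestamps are strictly increasing, and the oracle reserves them in batches so that stable storage is written once per batch rather than once per request. Here the "persisted" limit is only a member variable (an assumption standing in for durable storage), and kBatch is an arbitrary illustrative value.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Toy timestamp oracle: hands out strictly increasing timestamps.
class TimestampOracle {
 public:
  // recovered_limit: the last limit read back from (simulated) stable storage.
  explicit TimestampOracle(std::uint64_t recovered_limit = 0)
      : next_(recovered_limit), limit_(recovered_limit) {}

  std::uint64_t GetTimestamp() {
    std::lock_guard<std::mutex> guard(mu_);
    if (next_ >= limit_) {
      // Reserve a new batch. A real oracle would durably record the new
      // limit before handing out any timestamp from this batch.
      limit_ = next_ + kBatch;
      ++persist_writes_;
    }
    return next_++;
  }

  std::uint64_t persisted_limit() const {
    std::lock_guard<std::mutex> guard(mu_);
    return limit_;
  }

  int persist_writes() const {
    std::lock_guard<std::mutex> guard(mu_);
    return persist_writes_;
  }

 private:
  static constexpr std::uint64_t kBatch = 1000;  // illustrative batch size
  mutable std::mutex mu_;
  std::uint64_t next_;
  std::uint64_t limit_;  // stands in for the durably stored limit
  int persist_writes_ = 0;
};
```

After a restart the oracle resumes from the last persisted limit, so it may skip timestamps but never reissues one.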

Bigtable Background
Bigtable is a sorted (row, column, timestamp) store: each cell keeps multiple versions, one per timestamp (here, version slots 3:, 2:, 1:).

         Column1                    Column2                    Column3
RowOne   1: "I'm at timestamp 1"    2: "I'm at timestamp 2"    (empty)
RowTwo   (empty)                    (empty)                    (empty)

o Data is partitioned into row ranges called tablets
o Tablets are spread across many machines
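The data model can be illustrated with a toy in-memory class (hypothetical, not the Bigtable client API): a sorted map from (row, column) to versions ordered newest-first, where a read at timestamp ts returns the newest version at or before ts. Tablets and distribution across machines are not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Toy model of the (row, column, timestamp) -> value data model.
class ToyBigtable {
 public:
  void Write(const std::string& row, const std::string& col, std::uint64_t ts,
             const std::string& value) {
    cells_[{row, col}][ts] = value;
  }

  // Return the newest version with timestamp <= ts, if any.
  std::optional<std::string> Read(const std::string& row, const std::string& col,
                                  std::uint64_t ts) const {
    auto cell = cells_.find({row, col});
    if (cell == cells_.end()) return std::nullopt;
    const Versions& versions = cell->second;  // ordered newest-first
    auto it = versions.lower_bound(ts);       // first version with timestamp <= ts
    if (it == versions.end()) return std::nullopt;
    return it->second;
  }

 private:
  using Versions = std::map<std::uint64_t, std::string, std::greater<std::uint64_t>>;
  std::map<std::pair<std::string, std::string>, Versions> cells_;  // sorted by (row, column)
};
```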

Cross-Row Transactions
o Adds distributed transactions to Bigtable
o Simple API: Get(), Set(), Commit(), Iterate

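Implementing Distributed Transactions
o Provides snapshot isolation semantics
o Multi-version protocol (mapped to Bigtable timestamps)
o Two-phase commit, coordinated by the client
o Locks stored in special Bigtable columns, rather than in a separate lock service that would have to keep up with the rate of Bigtable updates

Example: the "balance" column is stored as three Bigtable columns:
        balance:data   balance:commit   balance:lock
Alice   3: $10         4: data @ 3      (none)

The commit record "data @ 3" at timestamp 4 means the value written at timestamp 3 was committed at timestamp 4.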
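Under this layout, a snapshot read at start timestamp ts consults the commit column rather than reading the data column directly. The sketch below follows the read rule described in the Percolator paper (check for a lock at or before ts, then follow the newest commit record at or before ts to its data). The Cell struct and SnapshotGet name are hypothetical, and waiting or lock cleanup is reduced to returning kLocked.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Versions keyed by timestamp, newest first.
template <typename V>
using Versions = std::map<std::uint64_t, V, std::greater<std::uint64_t>>;

struct Cell {                      // one row's "balance" column
  Versions<std::string> data;      // balance:data   start_ts -> value
  Versions<std::uint64_t> commit;  // balance:commit commit_ts -> start_ts ("data @ N")
  Versions<std::string> lock;      // balance:lock   start_ts -> lock marker
};

enum class ReadStatus { kOk, kLocked, kNotFound };

ReadStatus SnapshotGet(const Cell& cell, std::uint64_t start_ts, std::string* out) {
  assert(out != nullptr);
  // A lock at or before start_ts belongs to a writer that might still commit
  // below start_ts, so the reader must wait or clean up; here we only report it.
  if (cell.lock.lower_bound(start_ts) != cell.lock.end()) return ReadStatus::kLocked;
  auto w = cell.commit.lower_bound(start_ts);  // newest commit_ts <= start_ts
  if (w == cell.commit.end()) return ReadStatus::kNotFound;
  *out = cell.data.at(w->second);  // follow "data @ N" to the data version
  return ReadStatus::kOk;
}
```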

Transaction Commit
Example: move $5 from Bob to Alice (start timestamp 5, commit timestamp 6).

    Transaction t;
    int a_bal = t.Get("Alice", "balance");
    int b_bal = t.Get("Bob", "balance");
    t.Set("Alice", "balance", a_bal + 5);
    t.Set("Bob", "balance", b_bal - 5);
    t.Commit();

Initial state:
        balance:data   balance:commit   balance:lock
Alice   3: $10         4: data @ 3      (none)
Bob     3: $10         4: data @ 3      (none)

1. Prewrite the primary (Alice): write data 5: $15 and lock 5: lock
2. Prewrite the secondary (Bob): write data 5: $5 and lock 5: lock
3. Commit the primary: write commit 6: data @ 5 and remove Alice's lock
4. Commit the secondary: write commit 6: data @ 5 and remove Bob's lock

Final state:
        balance:data      balance:commit             balance:lock
Alice   5: $15, 3: $10    6: data @ 5, 4: data @ 3   (none)
Bob     5: $5, 3: $10     6: data @ 5, 4: data @ 3   (none)
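The commit steps can be sketched over the same three-column layout. This is a simplified single-process model, not Percolator's implementation: Store, Writes, and Commit are hypothetical names, the caller supplies start_ts and commit_ts in place of the timestamp oracle, and failures and concurrent lock cleanup are not modeled. The prewrite conflict checks (a commit record at or after start_ts, or any existing lock) follow the paper's description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

template <typename V>
using Versions = std::map<std::uint64_t, V, std::greater<std::uint64_t>>;  // newest first

struct Cell {                      // one row's "balance" column
  Versions<std::string> data;      // balance:data   start_ts -> value
  Versions<std::uint64_t> commit;  // balance:commit commit_ts -> start_ts
  Versions<std::string> lock;      // balance:lock   start_ts -> lock marker
};

using Store = std::map<std::string, Cell>;                        // hypothetical: row -> cell
using Writes = std::vector<std::pair<std::string, std::string>>;  // (row, new value)

// Client-coordinated two-phase commit; writes[0] is the primary. Rows are assumed distinct.
bool Commit(Store& store, std::uint64_t start_ts, std::uint64_t commit_ts, const Writes& writes) {
  assert(!writes.empty() && commit_ts > start_ts);
  const std::string& primary = writes.front().first;

  // Phase 1: prewrite every row, primary first.
  for (std::size_t i = 0; i < writes.size(); ++i) {
    Cell& c = store[writes[i].first];
    bool newer_commit = !c.commit.empty() && c.commit.begin()->first >= start_ts;
    if (newer_commit || !c.lock.empty()) {
      // Write-write conflict: abort. Simplification: undo our own prewrites
      // immediately instead of leaving them for later cleanup.
      for (std::size_t j = 0; j < i; ++j) {
        Cell& u = store[writes[j].first];
        u.data.erase(start_ts);
        u.lock.erase(start_ts);
      }
      return false;
    }
    c.data[start_ts] = writes[i].second;
    c.lock[start_ts] = (i == 0) ? std::string("primary") : "primary@" + primary;
  }

  // Phase 2: commit the primary (in Percolator, one single-row Bigtable
  // transaction that also checks the lock is still there). This is the commit point.
  Cell& p = store[primary];
  p.commit[commit_ts] = start_ts;
  p.lock.erase(start_ts);

  // Then commit each secondary the same way.
  for (std::size_t i = 1; i < writes.size(); ++i) {
    Cell& c = store[writes[i].first];
    c.commit[commit_ts] = start_ts;
    c.lock.erase(start_ts);
  }
  return true;
}
```

Replaying the slide's transfer (start timestamp 5, commit timestamp 6) leaves Alice at $15 and Bob at $5, each with a "data @ 5" commit record at timestamp 6.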

Notifications and Observers
o Users register "observers" on a column
   – An observer is executed when any row in that column is written
   – Each observer runs in a new transaction
   – Observers run at most once per write: "message collapsing"
o Applications are structured as a series of observers, e.g., RawDocumentLoader, DocumentProcessor, DocumentExporter, LinkInverter
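A toy single-process observer framework makes the collapsing rule concrete. ObserverRegistry and its methods are hypothetical names, and in Percolator each observer run is its own transaction, which this sketch omits. A write to an observed column marks the (row, column) pair as pending; a pending pair runs its observer once no matter how many writes landed before the run, and an observer that writes another observed column chains the next stage.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>

class ObserverRegistry {
 public:
  using Observer = std::function<void(ObserverRegistry&, const std::string& row)>;

  void RegisterObserver(const std::string& column, Observer fn) {
    observers_[column] = std::move(fn);
  }

  // Writing an observed column marks (row, column) as pending; repeated
  // writes before the next run collapse into one pending entry.
  void Write(const std::string& row, const std::string& column, const std::string& value) {
    table_[{row, column}] = value;
    if (observers_.count(column)) pending_.insert({row, column});
  }

  std::string Read(const std::string& row, const std::string& column) const {
    auto it = table_.find({row, column});
    return it == table_.end() ? "" : it->second;
  }

  // Drain notifications, returning how many observer runs happened.
  int RunPending() {
    int runs = 0;
    while (!pending_.empty()) {
      auto [row, column] = *pending_.begin();  // copy before erasing
      pending_.erase(pending_.begin());
      observers_.at(column)(*this, row);
      ++runs;
    }
    return runs;
  }

 private:
  std::map<std::string, Observer> observers_;
  std::map<std::pair<std::string, std::string>, std::string> table_;
  std::set<std::pair<std::string, std::string>> pending_;
};
```

For example, writing a row's raw column twice before RunPending() invokes the raw-column observer once, and that observer's write to a second observed column then runs the next stage once.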

Implementing Notifications
o Dirty column: set if observers must be run in that row
o Randomized distributed scan: finds pending work and runs observers in a thread pool
o The scan is efficient: it only scans over the dirty bits themselves

        Dirty?   balance:data
Alice   Yes      5: $15
Bob     No       5: $5
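The scan can be sketched with the dirty column modeled as a sorted set of row keys (an assumption; Percolator keeps it as a special Bigtable column). ScanDirty is a hypothetical name. It starts at a random row, wraps around, clears each row's bit, and runs that row's observers, so the work is proportional to the number of dirty rows rather than to the size of the data columns.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <random>
#include <set>
#include <string>
#include <vector>

// One randomized pass over the dirty rows. A random start spreads
// concurrent workers over different parts of the table.
std::vector<std::string> ScanDirty(std::set<std::string>& dirty, std::mt19937& rng,
                                   const std::function<void(const std::string&)>& run_observers) {
  std::vector<std::string> visited;
  if (dirty.empty()) return visited;
  std::uniform_int_distribution<std::size_t> pick(0, dirty.size() - 1);
  auto start = std::next(dirty.begin(), static_cast<std::ptrdiff_t>(pick(rng)));
  // Snapshot the scan order: from the random start to the end, then wrap around.
  std::vector<std::string> order(start, dirty.end());
  order.insert(order.end(), dirty.begin(), start);
  for (const std::string& row : order) {
    // Clear the bit first, so a write made by the observer itself
    // re-dirties the row for a later scan instead of being lost.
    dirty.erase(row);
    run_observers(row);  // in Percolator, each run is a transaction
    visited.push_back(row);
  }
  return visited;
}
```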

Running Percolator
Each machine runs:
o A worker binary linked with the observer code
o A Bigtable tablet server
o A GFS chunkserver
(Diagram: observer code on top of Percolator::RunWorker(); tablet servers x N; GFS.)

Optimizing an Incremental System
[Figure: documents per day]

Experiments
o How much does Percolator outperform MapReduce?
o What is the cost of Percolator's transactions?
o How well does Percolator scale?

Very Different Access Patterns
o Percolator: small, random disk I/O; many RPCs per phase, per document
o MapReduce: streaming I/O; many documents per RPC, per phase
The infrastructure is much better suited to the MR model: even though MapReduce does "extra" work, it does that work very efficiently.

MR v. Percolator: Performance

MR v. Percolator: Experience
We converted an MR-based pipeline to Percolator.
Pros:
o Freshness: indexing delay dropped from days to minutes
o Scalability:
   – More throughput: just buy more CPUs
   – Bigger repository: limited only by disk space
o Utilization: immune to stragglers
Cons:
o Need to reason about concurrency
o More expensive per document processed (~2x)

The overhead of Percolator transaction operations

Percolator’s scalability

Conclusion
o Percolator now builds the "Caffeine" web search index
   – 50% fresher results
   – 3x larger repository
o Existence proof for distributed transactions at web scale

Some Observations
o Strong consistency leads to high latency
   – Most applications do not require it, e.g., graph computing, iterative computing
o When a client fails, the locks it leaves behind block later transactions until they are cleaned up
o Notifications must actively scan for data changes
o More a database than a computing engine
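Per the Percolator paper, locks orphaned by a failed client are not permanent: a later transaction that runs into one cleans it up lazily, using the transaction's primary lock as the single source of truth. The sketch below shows that decision for one orphaned lock. ResolveLock and the data layout are hypothetical, and deciding that the original client is really dead (the paper uses liveness information for this) is not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

template <typename V>
using Versions = std::map<std::uint64_t, V, std::greater<std::uint64_t>>;  // newest first

struct Cell {
  Versions<std::string> data;      // start_ts -> value
  Versions<std::uint64_t> commit;  // commit_ts -> start_ts
  Versions<std::string> lock;      // start_ts -> name of the primary row
};
using Store = std::map<std::string, Cell>;

// Resolve the lock at (row, start_ts), assuming its client has failed.
void ResolveLock(Store& store, const std::string& row, std::uint64_t start_ts) {
  Cell& cell = store.at(row);
  const std::string primary = cell.lock.at(start_ts);  // copy: the lock is erased below
  Cell& p = store.at(primary);
  // If the primary has a commit record for this transaction, it committed: roll forward.
  for (const auto& [commit_ts, data_ts] : p.commit) {
    if (data_ts == start_ts) {
      cell.commit[commit_ts] = start_ts;
      cell.lock.erase(start_ts);
      return;
    }
  }
  // Otherwise roll back, primary first, so the transaction can never commit later.
  p.lock.erase(start_ts);
  p.data.erase(start_ts);
  cell.lock.erase(start_ts);
  cell.data.erase(start_ts);
}
```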

Trigger-Based Computing Engine
Components:
o Updates flowing into a Trigger Service
o An in-memory distributed data structure
o SSD/disk for persistent state storage
o A pub/sub interface

Why Triggers?
o Tasks can be scheduled
o Resources can be allocated as needed

Example: PageRank
Updates (add/delete links, add/delete pages) feed the PageRank Trigger Service, which runs the following for each triggered page:

    Page {
      list out_edges;
      list in_edges;
      double rank;
    }

    M = getValues(in_edges)
    new_r = compute(M)
    if (|new_r - rank| > e)    // compare before overwriting rank
      trigger(out_edges)
    update(new_r, rank)
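A runnable version of the trigger loop, assuming the standard damping-factor PageRank update for compute(M) (the slide does not say which formula it uses), a damping factor of 0.85, and a threshold e of 1e-10. PageRankTriggers and its methods are hypothetical. The threshold test compares against the old rank before it is overwritten, and adding a link re-triggers only the pages whose inputs changed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <deque>
#include <vector>

struct Page {
  std::vector<int> out_edges;
  std::vector<int> in_edges;
  double rank = 0.0;
};

class PageRankTriggers {
 public:
  explicit PageRankTriggers(std::size_t n, double damping = 0.85, double eps = 1e-10)
      : pages_(n), damping_(damping), eps_(eps) {
    for (Page& p : pages_) p.rank = 1.0 / static_cast<double>(n);
  }

  // "add link" update: from's out-degree changed, so every page it links to
  // receives a different contribution and must be recomputed.
  void AddLink(int from, int to) {
    pages_[from].out_edges.push_back(to);
    pages_[to].in_edges.push_back(from);
    for (int v : pages_[from].out_edges) Trigger(v);
  }

  void Trigger(int page) { pending_.push_back(page); }

  // Process triggers until no rank changes by more than eps.
  void Run() {
    while (!pending_.empty()) {
      int v = pending_.front();
      pending_.pop_front();
      Page& p = pages_[v];
      double sum = 0.0;  // M = getValues(in_edges), combined as in standard PageRank
      for (int u : p.in_edges)
        sum += pages_[u].rank / static_cast<double>(pages_[u].out_edges.size());
      double new_r = (1.0 - damping_) / static_cast<double>(pages_.size()) + damping_ * sum;
      bool changed = std::fabs(new_r - p.rank) > eps_;  // compare before updating
      p.rank = new_r;
      if (changed)
        for (int w : p.out_edges) Trigger(w);
    }
  }

  double rank(int i) const { return pages_[i].rank; }
  const Page& page(int i) const { return pages_[i]; }

 private:
  std::vector<Page> pages_;
  double damping_;
  double eps_;
  std::deque<int> pending_;
};
```

On a three-page cycle every rank settles at 1/3; adding one more link then re-triggers only the pages linked from the changed page, and propagation stops once changes fall below e.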

Batch Systems
input → processing tasks → output
o Examples: MapReduce, Dryad

Streaming Systems
o One-pass processing
   – Cannot adapt to later changes
o Output pool: results available for use
