Percolator: Incrementally Indexing the Web OSDI’10.

1 Percolator: Incrementally Indexing the Web Google @ OSDI’10

2 Outline Google Percolator Trigger based computing engine

3 Problem: Index the web
Input: Raw documents (times.com, mit.edu, g.cn, fark.com, nyt.com) → indexing → Output: Documents ready for serving

URL         In Links              Body   PageRank
times.com   mit.edu               ...    1
mit.edu     times.com             ...    1
fark.com    times.com             ...    3
g.cn        fark.com, times.com   ...    7

4 Duplicate Elimination with MapReduce Map Reduce Indexing system is a chain of many MapReduces Parse Document Cluster By Checksum Invert Links

5 Index Refresh with MapReduce How do we index a new document? o The new doc could be a duplicate of any previously crawled document o Requires that we map over the entire repository (Map: repository → refresh)
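To make the cost concrete, here is a minimal single-process sketch of the "cluster by checksum" step. The `Doc` struct, the function name, and the rule "highest PageRank wins" are illustrative assumptions, not taken from the slides. The point it shows is that the function needs the whole repository as input, even when only one document is new.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical document record; field names are illustrative only.
struct Doc {
    std::string url;
    std::uint32_t checksum;  // checksum of the document body
    int pagerank;
};

// Batch duplicate elimination: group documents by checksum and keep, per
// cluster, the URL with the highest PageRank (an assumed tie-breaking rule).
// The whole repository is scanned on every call.
std::map<std::uint32_t, std::string> ClusterByChecksum(
        const std::vector<Doc>& repository) {
    std::map<std::uint32_t, const Doc*> best;
    for (const Doc& d : repository) {
        auto it = best.find(d.checksum);
        if (it == best.end() || d.pagerank > it->second->pagerank) {
            best[d.checksum] = &d;
        }
    }
    std::map<std::uint32_t, std::string> canonical;
    for (const auto& kv : best) {
        canonical[kv.first] = kv.second->url;
    }
    return canonical;
}
```

Adding one document means calling `ClusterByChecksum` again over every document already crawled, which is the refresh cost this slide describes.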

6 Indexing System Goals What do we want from an ideal indexing system? Large repository of documents (scalability, throughput) o Upper bound on index size o Higher-quality index: e.g. more links Small delay between crawl and index: "freshness" MapReduce indexing system: days from crawl to index

7 Incremental Indexing Random-RW: Maintain a random-access repository in Bigtable Global view: Indices let us avoid a global scan Incremental computing: Incrementally mutate state as URLs are crawled

URL                        Contents   PageRank   Checksum     Language
http://usenix.org/osdi10   CFP,....   6          0xabcdef01   ENGLISH
http://nyt.com/            Lede...    9          0xbeefcafe   ENGLISH

8 Incremental Indexing on Bigtable

URL           Checksum     PageRank   IsCanonical?
nyt.com       0xabcdef01   6          yes → no
nytimes.com   0xabcdef01   9          yes

Checksum     Canonical URL
0xabcdef01   nyt.com → nytimes.com

New challenge: What happens if we process both URLs simultaneously?
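A rough sketch of the incremental version, assuming a checksum index like the second table above. All names here (`UrlRow`, `Repository`, `AddDocument`) and the "higher PageRank becomes canonical" rule are assumptions for illustration. Note that one new document can touch two URL rows plus the index entry, which is why concurrent processing of both URLs is a problem without cross-row transactions.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical row of the URL table.
struct UrlRow {
    std::uint32_t checksum;
    int pagerank;
    bool is_canonical;
};

// Toy, single-threaded stand-in for the two tables on the slide.
struct Repository {
    std::map<std::string, UrlRow> by_url;
    std::map<std::uint32_t, std::string> canonical_by_checksum;

    // Processes one newly crawled document without scanning the repository.
    void AddDocument(const std::string& url, std::uint32_t checksum,
                     int pagerank) {
        UrlRow row = {checksum, pagerank, false};
        auto it = canonical_by_checksum.find(checksum);
        if (it == canonical_by_checksum.end()) {
            // First document with this checksum: it is canonical.
            canonical_by_checksum[checksum] = url;
            row.is_canonical = true;
        } else if (it->second == url ||
                   pagerank > by_url[it->second].pagerank) {
            // The new document replaces the current canonical one.
            // This updates another URL's row and the index entry together.
            by_url[it->second].is_canonical = false;
            it->second = url;
            row.is_canonical = true;
        }
        by_url[url] = row;
    }
};
```

Calling `AddDocument("nyt.com", 0xabcdef01, 6)` and then `AddDocument("nytimes.com", 0xabcdef01, 9)` reproduces the end state in the tables. If two workers ran those calls at the same time without transactions, both could read the index before either wrote it.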

9 Percolator: Incremental Infrastructure Built on Bigtable: free random access (RW) Provides cross-row transactions – Bigtable only provides single-row transactions Snapshot isolation: timestamp oracle server Notifications and Observers – Notification: finds the change – Observer: executes the transaction

10 Bigtable Background Bigtable is a sorted (row, column, timestamp) store:

         Column1                   Column2                   Column3
RowOne   3:                        3:                        3:
         2:                        2: "I'm at timestamp 2"   2:
         1: "I'm at timestamp 1"   1:                        1:
RowTwo   3:                        3:                        3:
         2:                        2:                        2:
         1:                        1:                        1:

Data is partitioned into row ranges called tablets. Tablets are spread across many machines.
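The (row, column, timestamp) model can be illustrated with a toy in-memory table. This is only a sketch of the data model; `VersionedTable`, `Write`, and `ReadAt` are hypothetical names and not Bigtable's API. It shows how a read at a given timestamp returns the newest version at or before that timestamp.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy single-process model of a sorted (row, column, timestamp) store.
class VersionedTable {
public:
    void Write(const std::string& row, const std::string& column,
               std::uint64_t ts, const std::string& value) {
        cells_[Key(row, column)][ts] = value;
    }

    // Fills *value with the newest version whose timestamp is <= ts.
    // Returns false if the cell has no such version.
    bool ReadAt(const std::string& row, const std::string& column,
                std::uint64_t ts, std::string* value) const {
        auto cell = cells_.find(Key(row, column));
        if (cell == cells_.end()) return false;
        // Versions are ordered newest-first, so lower_bound(ts) lands on
        // the first version with timestamp <= ts.
        auto version = cell->second.lower_bound(ts);
        if (version == cell->second.end()) return false;
        *value = version->second;
        return true;
    }

private:
    typedef std::pair<std::string, std::string> Key;
    typedef std::map<std::uint64_t, std::string,
                     std::greater<std::uint64_t> > Versions;
    std::map<Key, Versions> cells_;  // sorted by (row, column)
};
```

Keeping old versions around is what lets a multi-version protocol map directly onto Bigtable timestamps.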

11 Cross-row transactions Adds distributed transactions to Bigtable Simple API: Get(), Set(), Commit(), Iterate

12 Implementing Distributed Transactions Provides snapshot isolation semantics Multi-version protocol (mapped to Bigtable timestamps) Two-phase commit, coordinated by the client, because Bigtable alone cannot apply multi-row updates atomically Locks stored in special Bigtable columns, shown here for the "balance" column:

        balance:data   balance:commit   balance:lock
Alice   5:             5:               5:
        4:             4: data @ 3      4:
        3: $10         3:               3:

13 Transaction Commit

State 1: both balances are $10, committed at timestamp 4.

        balance:data   balance:commit   balance:lock
Alice   3: $10         4: data @ 3
Bob     3: $10         4: data @ 3

State 2: Alice's new balance is written at timestamp 5 and her row is locked.

        balance:data   balance:commit   balance:lock
Alice   5: $15         4: data @ 3      5: lock
        3: $10
Bob     3: $10         4: data @ 3

State 3: Bob's new balance is written at timestamp 5 and his row is locked.

        balance:data   balance:commit   balance:lock
Alice   5: $15         4: data @ 3      5: lock
        3: $10
Bob     5: $5          4: data @ 3      5: lock
        3: $10

State 4: Alice's write is committed at timestamp 6 and her lock is released.

        balance:data   balance:commit   balance:lock
Alice   5: $15         6: data @ 5
        3: $10         4: data @ 3
Bob     5: $5          4: data @ 3      5: lock
        3: $10

State 5: Bob's write is committed at timestamp 6 and his lock is released.

        balance:data   balance:commit   balance:lock
Alice   5: $15         6: data @ 5
        3: $10         4: data @ 3
Bob     5: $5          6: data @ 5
        3: $10         4: data @ 3

The transaction that produces these states:

    Transaction t;
    int a_bal = t.Get("Alice", "balance");
    int b_bal = t.Get("Bob", "balance");
    t.Set("Alice", "balance", a_bal + 5);
    t.Set("Bob", "balance", b_bal - 5);
    t.Commit();
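The commit sequence above can be approximated in a few dozen lines. This is a simplified single-process sketch, not Percolator's implementation: `Cell`, `Store`, and this cut-down `Transaction` (one implicit "balance" column, integer values, a counter standing in for the timestamp oracle) are all assumptions for illustration. It also leaves out cleanup of locks left behind by failed clients.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// One row with the three columns from the slide: data, commit, lock.
struct Cell {
    // start_ts -> value
    std::map<std::uint64_t, int, std::greater<std::uint64_t> > data;
    // commit_ts -> start_ts of the data it points at ("data @ start_ts")
    std::map<std::uint64_t, std::uint64_t,
             std::greater<std::uint64_t> > commit;
    std::uint64_t lock;  // start_ts of the lock holder, 0 if unlocked
    Cell() : lock(0) {}
};

struct Store {
    std::map<std::string, Cell> rows;
    std::uint64_t clock;  // stand-in for the timestamp oracle
    Store() : clock(0) {}
    std::uint64_t NextTimestamp() { return ++clock; }
};

class Transaction {
public:
    explicit Transaction(Store* store)
        : store_(store), start_ts_(store->NextTimestamp()) {}

    // Snapshot read: newest version committed at or before start_ts.
    bool Get(const std::string& row, int* value) const {
        const Cell& c = store_->rows[row];
        // A pending writer at or before our snapshot: this sketch just
        // fails the read instead of waiting or cleaning up.
        if (c.lock != 0 && c.lock <= start_ts_) return false;
        auto w = c.commit.lower_bound(start_ts_);
        if (w == c.commit.end()) return false;
        *value = c.data.at(w->second);
        return true;
    }

    void Set(const std::string& row, int value) { writes_[row] = value; }

    bool Commit() {
        // Phase 1: lock each written row and stage its data at start_ts.
        for (const auto& w : writes_) {
            Cell& c = store_->rows[w.first];
            bool newer_commit = !c.commit.empty() &&
                                c.commit.begin()->first > start_ts_;
            if (c.lock != 0 || newer_commit) {  // conflict: abort
                Rollback();
                return false;
            }
            c.lock = start_ts_;
            c.data[start_ts_] = w.second;
        }
        // Phase 2: publish commit records and release the locks.
        std::uint64_t commit_ts = store_->NextTimestamp();
        for (const auto& w : writes_) {
            Cell& c = store_->rows[w.first];
            c.commit[commit_ts] = start_ts_;
            c.lock = 0;
        }
        return true;
    }

private:
    void Rollback() {
        for (const auto& w : writes_) {
            Cell& c = store_->rows[w.first];
            if (c.lock == start_ts_) {
                c.lock = 0;
                c.data.erase(start_ts_);
            }
        }
    }

    Store* store_;
    std::uint64_t start_ts_;
    std::map<std::string, int> writes_;
};
```

With the clock seeded at 2, a first transaction that sets both balances to 10 gets start timestamp 3 and commit timestamp 4, and the transfer then runs at 5 and commits at 6, matching the states above. A second writer that started before a competing commit sees a newer commit record in phase 1 and aborts, which is how snapshot isolation rejects write-write conflicts in this sketch.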

14 Notifications and Observers Users register "observers" on a column: Executed when any row in that column is written Each observer runs in a new transaction Run at most once per write: "message collapsing" Applications are structured as a series of Observers: RawDocumentLoader, DocumentProcessor, DocumentExporter, LinkInverter

15 Implementing Notifications Dirty column: set if observers must be run in that row Randomized distributed scan: Finds pending work, runs observers in thread pool Scan is efficient: only scans over the bits themselves

        Dirty?   balance:data   ...
Alice   Yes      5: $15
Bob     No       5: $5
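The dirty-column idea can be sketched as follows. This is a sequential, in-memory approximation under stated assumptions: `NotificationTable` and its methods are hypothetical names, the scan is not randomized or distributed, and observers run inline rather than in a thread pool or in their own transactions.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy table with a per-row dirty bit and registered observers.
class NotificationTable {
public:
    typedef std::function<void(const std::string& row)> Observer;

    void RegisterObserver(Observer observer) {
        observers_.push_back(std::move(observer));
    }

    // Every write sets the dirty bit. Several writes before the next scan
    // still leave a single bit set, so they collapse into one notification.
    void Write(const std::string& row, int value) {
        data_[row] = value;
        dirty_[row] = true;
    }

    // Scans only the dirty bits, clears them, then runs the observers for
    // each pending row. Returns the number of rows that had pending work.
    int ScanOnce() {
        std::vector<std::string> pending;
        for (auto& bit : dirty_) {
            if (bit.second) {
                pending.push_back(bit.first);
                bit.second = false;
            }
        }
        for (const std::string& row : pending) {
            for (const Observer& observer : observers_) observer(row);
        }
        return static_cast<int>(pending.size());
    }

private:
    std::map<std::string, int> data_;
    std::map<std::string, bool> dirty_;
    std::vector<Observer> observers_;
};
```

Because the bits are cleared before the observers run, an observer that writes to the table re-marks the row and is picked up by the next scan, which is how a chain of observers can be driven.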

16 Running Percolator Each machine runs: Worker binary linked with observer code (Percolator::RunWorker()) Bigtable tablet server GFS chunkserver [Diagram: Observer Code with Percolator::RunWorker(), Tablet Server, GFS, repeated x N machines]

17 Optimizing an Incremental System [Figure: axis label "Documents / day"]

18 Experiment By how much does Percolator outperform MapReduce? What is the cost of Percolator's transactions? How well does Percolator scale?

19 Very Different Access Patterns Percolator: small, random disk I/O many RPCs per phase, per document MapReduce: streaming I/O Many documents per RPC, per phase Infrastructure is much better suited to the MR model. Even though it does "extra" work, it does so very efficiently.

20 MR v. Percolator: Performance

21 MR v. Percolator: Experience We converted an MR-based pipeline to Percolator. Pros: Freshness: indexing delay dropped from days to minutes Scalability: o More throughput: Just buy more CPUs o Bigger repository: Only limited by disk space Utilization: immune to stragglers Cons: Need to reason about concurrency More expensive per document processed (~2x)

22 The overhead of Percolator transaction operations

23 Percolator’s scalability

24 Conclusion Percolator is now building the "Caffeine" web search index 50% fresher results 3x larger repository Existence proof for distributed transactions at web scale.

25 Some Observations Strong consistency leads to high latency – Most applications do not require it: e.g., graph computing, iterative computing When a client fails, the locks it leaves behind in the system block later transactions Notifications require actively checking for data changes More a database than a computing engine

26 Trigger based computing engine … Update Trigger Service In-memory distributed data structure SSD/Disk Persist State storage Pub/Sub Interface

27 Why trigger? Tasks can be scheduled Resources can be allocated as needed

28 Example: PageRank [Diagram labels: Update, PageRank, Trigger Service, add/delete links, add/delete page]

    Page {
        list out_edges;
        list in_edges;
        double rank;
    }

    M = getValues(in_edges);
    new_r = compute(M)
    update(new_r, rank)
    if (|new_r - rank| > e)
        trigger(out_edges)
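A possible reading of this trigger loop as runnable code. The slide does not define `compute`, so the damped PageRank formula below is an assumption, as are the `TriggerService` class, the FIFO work queue, and triggering both endpoints when a link is added. The sketch saves the old rank before overwriting it so that the threshold comparison is meaningful.

```cpp
#include <cmath>
#include <cstddef>
#include <deque>
#include <map>
#include <string>
#include <vector>

struct Page {
    std::vector<std::string> out_edges;
    std::vector<std::string> in_edges;
    double rank;
    Page() : rank(1.0) {}
};

// Toy single-process trigger service with a FIFO queue of pending pages.
class TriggerService {
public:
    std::map<std::string, Page> pages;

    // An "add link" update: record the edge, then trigger both endpoints.
    void AddLink(const std::string& from, const std::string& to) {
        pages[from].out_edges.push_back(to);
        pages[to].in_edges.push_back(from);
        Trigger(from);
        Trigger(to);
    }

    void Trigger(const std::string& url) { queue_.push_back(url); }

    // Runs queued triggers until the queue drains or max_updates is hit.
    // Returns the number of trigger executions.
    int Run(double epsilon, double damping = 0.85, int max_updates = 100000) {
        int updates = 0;
        while (!queue_.empty() && updates < max_updates) {
            std::string url = queue_.front();
            queue_.pop_front();
            Page& page = pages[url];
            // M = getValues(in_edges); new_r = compute(M)
            double sum = 0.0;
            for (std::size_t i = 0; i < page.in_edges.size(); ++i) {
                const Page& src = pages[page.in_edges[i]];
                sum += src.rank / static_cast<double>(src.out_edges.size());
            }
            double new_r = (1.0 - damping) + damping * sum;
            double old_r = page.rank;
            page.rank = new_r;  // update(new_r, rank)
            // Propagate only if the rank moved by more than epsilon.
            if (std::fabs(new_r - old_r) > epsilon) {
                for (std::size_t i = 0; i < page.out_edges.size(); ++i) {
                    Trigger(page.out_edges[i]);
                }
            }
            ++updates;
        }
        return updates;
    }

private:
    std::deque<std::string> queue_;
};
```

Work is only done for pages whose inputs changed, and propagation stops once changes fall below `epsilon`, in contrast to a batch job that recomputes every page on every pass.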

29 Batch system input Process task output MapReduce Dryad

30 Streaming System One-pass processing – Unable to adapt to further changes Output pool Available for use

