Nectar: Efficient Management of Computation and Data in Data Centers. Lenin Ravindranath, Pradeep Kumar Gunda, Chandu Thekkath, Yuan Yu, Li Zhuang.


1 Nectar: Efficient Management of Computation and Data in Data Centers
Lenin Ravindranath, Pradeep Kumar Gunda, Chandu Thekkath, Yuan Yu, Li Zhuang

2 Motivation
Resources are poorly managed in a data center
Computation: redundant computations
– Wasting resources
Storage: manually managed
– Unused files occupying space
– Redundant output files

3 Goal Efficiently manage resources in a cluster Computation Storage Nectar

4 Key Insight Data Center Computation Storage Single query interface for computation and data access DryadLINQ Query Interface User

5 Goal Efficiently manage resources in a cluster Computation Storage Nectar

6 Computation
PROBLEM: Redundant computation
– Programs share sub-queries, e.g. X.Select(…) and X.Select(…).Where(…)
– Programs share partial data sets, e.g. X.Select(…) and (X+X’).Select(…)
SOLUTION: Caching
– Cache results of popular sub-queries
– Automatically rewrite the user query to use the cache
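The caching idea on this slide can be sketched roughly as follows, in Python rather than DryadLINQ. This is only an illustration: the `fingerprint` function, the in-memory `cache` dictionary, and the `run_select` helper are hypothetical names, and hashing the query text together with the input is an assumption about how a cache key might be formed, not a description of Nectar's fingerprinting.

```python
import hashlib

# Hypothetical in-memory cache: fingerprint -> cached result.
cache = {}
executions = []  # records which queries actually ran (for illustration)


def fingerprint(query_text, data):
    """Hash the query text together with its input (an assumed key scheme)."""
    h = hashlib.sha256()
    h.update(query_text.encode("utf-8"))
    h.update(repr(data).encode("utf-8"))
    return h.hexdigest()


def run_select(query_text, func, data):
    """Apply func to every item, reusing a cached result when one exists."""
    key = fingerprint(query_text, data)
    if key in cache:
        return cache[key]  # cache hit: skip the computation
    executions.append(query_text)
    result = [func(x) for x in data]
    cache[key] = result
    return result


X = [1, 2, 3, 4, 5, 6, 7]
first = run_select("Select(x => x * 2)", lambda x: x * 2, X)
second = run_select("Select(x => x * 2)", lambda x: x * 2, X)  # served from cache
```

The second call returns the stored result without re-running the Select, which is the redundant work the slide proposes to eliminate.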

7 Does caching help?
Analyzed logs from production clusters
– Logs of 3 months (Oct – Dec 2008)
– 33 virtual clusters, 36,000 jobs
– Parsed SCOPE programs, extracted sub-queries
– Simulated caching

8 Caching helps
– About 50% cache hit on 10 clusters
– More than 30% cache hit on 20 clusters
– 35% on average

9 Goal Efficiently manage resources in a cluster Computation Storage Nectar

10 Storage
PROBLEM: Manually managed
– Unused files occupying space
– 50% of the data had not been accessed in the last 275 days

11 Storage
SOLUTION: Automatically manage data
– Track usage and delete infrequently used files
– Store the programs that recompute the data

12 Query Interface Data Center Computation Storage DryadLINQ Query Interface User

13 Goal Efficiently manage resources in a cluster Computation Storage Nectar

14 Data Center Computation Storage DryadLINQ Query Interface Nectar User

15 Nectar Architecture
[Diagram. Components: Nectar Client, Query Rewriter, Cache Server, DryadLINQ, Dryad, Cluster. Labels: DryadLINQ program, programs P and P’, Query, Cache entries, results R and T, "Add R to cache", "Add T to cache".]

16 Nectar Architecture Query Rewriter Nectar Client Cache Server

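17 Query Rewriter
[Diagram: Select over X produces R, which is stored in the cache. When the input becomes X+X’, Select runs only over X’ to produce R’, and Concat combines it with the cached R to give R+R’.]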

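18 Query Rewriter
[Diagram: the Order by case. The result R over X is in the cache; Select runs only over the new input X’, and Merge Sort combines its output R’ with the cached R to give R+R’ in order.]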
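The two rewrites shown on slides 17 and 18 can be illustrated with a small Python sketch. It assumes that the cached result R already exists and that the new input X’ is simply appended to X; the function and variable names here are illustrative and do not come from Nectar.

```python
import heapq


def select(func, data):
    """Stand-in for a Select operator: apply func to every element."""
    return [func(x) for x in data]


def times_ten(x):
    return x * 10


X = [5, 1, 4]
X_new = [3, 2]  # X', the data appended after R was cached

# Select case: reuse cached R, compute only over X', then concatenate.
cached_R = select(times_ten, X)      # assumed to be in the cache already
R_new = select(times_ten, X_new)     # only the new data is processed
concat_result = cached_R + R_new     # Concat (R + R')

# Order by case: both sides are already sorted, so a merge is enough.
cached_sorted = sorted(cached_R)     # assumed cached, ordered result
new_sorted = sorted(R_new)
merged_result = list(heapq.merge(cached_sorted, new_sorted))
```

In both cases the work over the original input X is skipped; only X’ is scanned, and the combining step (concatenation or a merge of two sorted runs) is cheap relative to recomputing from scratch.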

19 Query Rewriter
Generates multiple plans
– Using multiple cache entries
Selects the best plan
– Based on benefit: execution time, output size, whether the pipeline is broken
Operators supported
– Select, Where, Order by, Group by, Join
Example: X.Select(…), X.Select(…).Where(…)
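One way to picture "selects the best plan based on benefit" is a scoring function over candidate plans. The sketch below is an assumption-heavy illustration: the slide names the three inputs (execution time, output size, whether the pipeline is broken) but not how they are combined, so the `Plan` fields, the linear formula, and the constants `read_cost_per_byte` and `pipeline_penalty` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    saved_seconds: float       # execution time avoided by using cache entries
    cached_output_bytes: int   # size of the cached output that must be read
    breaks_pipeline: bool      # whether using the cache breaks a pipeline


def benefit(plan, read_cost_per_byte=1e-8, pipeline_penalty=5.0):
    """Hypothetical benefit: time saved minus read cost and pipeline penalty."""
    score = plan.saved_seconds - plan.cached_output_bytes * read_cost_per_byte
    if plan.breaks_pipeline:
        score -= pipeline_penalty
    return score


def best_plan(plans):
    """Pick the candidate with the highest estimated benefit."""
    return max(plans, key=benefit)


plans = [
    Plan("no cache", 0.0, 0, False),
    Plan("use Select cache", 40.0, 200_000_000, False),
    Plan("use Select+Where cache", 42.0, 50_000_000, True),
]
chosen = best_plan(plans)
```

With these made-up numbers the plan that reuses the Select result wins: the Select+Where entry saves slightly more time but pays the assumed penalty for breaking the pipeline.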

20 Nectar Architecture Query Rewriter Nectar Client Cache Server

21 [Diagram. Components: Cache Server, SQL Server, Cache Policy, Garbage Collector. Cache entry fields: URI, Query Fingerprint, Query + Data Fingerprint, Execution Time, Output Size. Other labels: Inquire, Stats, Usage Stats, Fingerprints.]

22 Cache policy
Insertion policy
– Always add program output to the cache
– Sub-query outputs are added to the cache when:
  – Popularity exceeds a threshold
  – Savings exceed a threshold
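The insertion policy can be written as a small predicate. This is a sketch under stated assumptions: the slide does not say whether the popularity and savings conditions must both hold or whether either suffices, so requiring both is a guess, and the parameter names and default thresholds are hypothetical.

```python
def should_cache(is_program_output, times_seen, saved_seconds,
                 popularity_threshold=3, savings_threshold=10.0):
    """Decide whether a result is inserted into the cache.

    Final program outputs are always cached. Sub-query outputs are cached
    only when they are popular enough and save enough time (assumption:
    both conditions are required; the thresholds are made-up values).
    """
    if is_program_output:
        return True
    return (times_seen > popularity_threshold
            and saved_seconds > savings_threshold)


always = should_cache(True, times_seen=0, saved_seconds=0.0)
popular_and_costly = should_cache(False, times_seen=5, saved_seconds=20.0)
popular_but_cheap = should_cache(False, times_seen=5, saved_seconds=1.0)
rare_but_costly = should_cache(False, times_seen=1, saved_seconds=20.0)
```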

23 Garbage Collector
Storage pressure
– Delete infrequently used files
Deletion policy
– Based on savings
– Cache type
Mark and sweep algorithm
– Delete cache entry
– Reachability analysis
– Delete files
[Diagram: Cache Server entries 1, 2, 3; Distributed FS files 1, 2]
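The three mark-and-sweep steps listed on this slide (delete cache entries, run a reachability analysis, delete files) might look like the following in miniature. The sketch assumes that every file in the set is managed by the cache, so anything no surviving entry points to is garbage; the `keep_entry` predicate, the savings numbers, and the file names are hypothetical.

```python
def mark_and_sweep(cache_entries, files, keep_entry):
    """Return (surviving_entries, remaining_files, deleted_files).

    cache_entries: dict mapping entry id -> path of the file it points to
    files: set of cache-managed file paths (an assumption of this sketch)
    keep_entry: predicate deciding whether an entry survives eviction
    """
    # Step 1: delete the cache entries the deletion policy rejects.
    surviving = {eid: path for eid, path in cache_entries.items()
                 if keep_entry(eid)}
    # Step 2: reachability analysis, i.e. which files are still referenced.
    reachable = set(surviving.values())
    # Step 3: delete files that no surviving entry refers to.
    deleted = files - reachable
    return surviving, files & reachable, deleted


savings = {1: 50.0, 2: 1.0, 3: 30.0}  # hypothetical per-entry savings
entries = {1: "a.pt", 2: "b.pt", 3: "c.pt"}
files = {"a.pt", "b.pt", "c.pt", "orphan.pt"}
surviving, remaining, deleted = mark_and_sweep(
    entries, files, keep_entry=lambda eid: savings[eid] >= 10.0)
```

Here entry 2 is evicted for low savings, so `b.pt` becomes unreachable and is swept together with `orphan.pt`, which no entry ever referenced.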

24 What if I try to access a garbage collected file?

25 Nectar Architecture Query Rewriter Nectar Client Cache Server Program store

26 Program Store
– Stores the programs executed in the cluster
– Each output file is tied to the program that generates it
– If a file has been deleted, its program is re-executed to regenerate the output
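A toy version of the program store idea: each output path remembers the program that produced it, and reading a deleted file re-runs that program. The `ProgramStore` class, its method names, and the use of a Python callable as a "program" are illustrative assumptions, and the dictionary `files` merely stands in for the distributed file system.

```python
class ProgramStore:
    """Toy model: outputs can be regenerated from their stored programs."""

    def __init__(self):
        self.programs = {}  # output path -> program (a callable here)
        self.files = {}     # stand-in for the distributed file system
        self.runs = 0       # how many times any program has executed

    def _execute(self, path):
        self.runs += 1
        self.files[path] = self.programs[path]()

    def run_and_store(self, path, program):
        """Run a program once and tie its output file to it."""
        self.programs[path] = program
        self._execute(path)

    def delete(self, path):
        """Simulate the garbage collector removing the output file."""
        self.files.pop(path, None)

    def read(self, path):
        """Return the file, regenerating it first if it was deleted."""
        if path not in self.files:
            self._execute(path)
        return self.files[path]


store = ProgramStore()
store.run_and_store("foo.pt", lambda: [x * x for x in range(4)])
store.delete("foo.pt")
regenerated = store.read("foo.pt")  # triggers re-execution
again = store.read("foo.pt")        # file exists now, no re-execution
```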

27 Managing Data
[Diagram. Components: Nectar Client, Program Store, Cache Server, Distributed FS, DryadLINQ, Dryad. Labels: programs P and P’, ToPartitionedTable(lenin\foo.pt), files foo.pt and A31E4.pt, FP, Program, usrNectar.]

28 Managing Data
[Diagram. Components: Nectar Client, Program Store, Cache Server, Distributed FS, DryadLINQ, Dryad. Labels: program P, FromPartitionedTable(lenin\foo.pt), files foo.pt and A31E4.pt, FP, Program, usrNectar.]

29 Managing Data
[Diagram. Components: Nectar Client, Program Store, Cache Server, Distributed FS, DryadLINQ, Dryad. Labels: program P, FromPartitionedTable(lenin\foo.pt), files foo.pt, A31E4.pt and KJ1LM.pt, FP, Program, usrNectar.]

30 Goal Efficiently manage resources in a cluster Computation Storage Nectar Computation Storage Unified computation and data

31 Distributed cache servers
[Diagram: a Nectar Client; several Cache Servers, each with a SQL Server, partitioned by a hash of the query fingerprint; a centralized Garbage Collector; a Program Store.]
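Partitioning cache servers by a hash of the query fingerprint can be sketched as a simple modulo lookup. The slide does not specify the hash function or the partitioning scheme, so SHA-256 with modulo placement is an assumption, and the `server_for` helper and server names are hypothetical.

```python
import hashlib


def server_for(query_fingerprint, servers):
    """Map a query fingerprint to one cache server (assumed modulo scheme)."""
    digest = hashlib.sha256(query_fingerprint.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["cache-0", "cache-1", "cache-2"]  # hypothetical server names
target = server_for("fingerprint-of-X.Select", servers)
same_target = server_for("fingerprint-of-X.Select", servers)
```

Because the mapping depends only on the fingerprint, every client that issues the same query is routed to the same cache server without any coordination.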

32 Summary
We built Nectar
– Automatically manages data
– Efficiently manages computation
Components
– Query Rewriter: automatically rewrites queries to use the cache
– Cache Server: popular sub-queries are cached; entries are garbage collected based on usage
– Program Store: stores the programs that regenerate the output

33 Status
Almost done with development
– Query Rewriter: including other operators
– Fingerprinter: program static analysis
– Cache Server
– Program Store
In the process of deploying

34 Can we do better?

35 Cluster Utilization
– Most clusters have more than 40% idle time
– Even the busiest clusters have 10-20% idle time

36 Exploiting idle time
Do speculative caching
– Cache popular data before the query is issued
– Run programs on new streams when available
No side effects
– Executed only when the cluster is idle
– Low-priority jobs
– Output garbage collected with high priority
– More electric bill? Not really!

37 Questions

38 Backup

39 Caching Results

