Ten Thousand SQLs. Kalmesh Nyamagoudar, 2010MCS3494


1 Ten Thousand SQLs. Kalmesh Nyamagoudar, 2010MCS3494

2 Contents: Example, Definitions, Algorithm, CN Generation, CN Evaluation, Sequential Algorithm, CLP: Naïve, CLP: New, OLP, DLP, Performance Studies. October 13, 2011

3 BANKS Model: Steiner Trees (figure: Author1 and Author2 connected through Paper1; Author1 and Author2 connected through Paper2)

4 DISCOVER Model. Schema: AUTHOR(TID, NAME), WRITES(TID, AID, PID), PAPER(TID, NAME), CITE(TID, PID1, PID2). Joining Network Of Tuples: Author1 - (Author1: Paper1) - Paper1 - (Author2: Paper1) - Author2; Author1 - (Author1: Paper2) - Paper2 - (Author2: Paper2) - Author2. Joining Network Of Tuple Sets: Author^Author1 ⋈ Writes{} ⋈ Paper{} ⋈ Writes{} ⋈ Author^Author2

5 Background: DISCOVER

6 Background: DISCOVER. Schema Graph (TPC-H)

7 Background: DISCOVER. Example Data. Source: Discover[3]

8 Background: DISCOVER. Query: "Smith, Miller". Source: Discover[3]

9 Background: DISCOVER. Query: "Smith, Miller". Results by size: size 2: O1 - C1 - O2. Source: Discover[3]

10 Background: DISCOVER. Query: "Smith, Miller". Joining Network Of Tuples, results by size: size 2: O1 - C1 - O2; size 4: O1 - C1 - N1 - C2 - O3. Source: Discover[3]

11 Background: DISCOVER. Joining Network Of Tuple Sets. Source: Discover[2]

12 Background: DISCOVER

13 Background: DISCOVER

14 Background: DISCOVER. Candidate Networks Generation
- Complete: every possible MTJNT is produced by a candidate network output by the algorithm
- Minimal: the algorithm does not produce any redundant candidate networks
- Example:
  - ORDERS^Smith ⋈ CUSTOMER{} ⋈ ORDERS^Miller
  - ORDERS^Smith ⋈ CUSTOMER{} ⋈ ORDERS^Miller ⋈ CUSTOMER{}
  - ORDERS^Smith ⋈ CUSTOMER{} ⋈ ORDERS{}
  - ORDERS^Smith ⋈ LINEITEM{} ⋈ ORDERS^Miller
- Tmax: maximum number of tuple sets in a CN
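
As a rough illustration of why each candidate network corresponds to one SQL statement, the sketch below evaluates the first CN in the example, ORDERS^Smith ⋈ CUSTOMER{} ⋈ ORDERS^Miller, with Python's built-in sqlite3 module. The table layout, column names (custkey, orderkey, clerk, name), and sample rows are assumptions made up for this sketch, not the TPC-H data shown on the slides. A free tuple set such as CUSTOMER{} is modelled here as "contains none of the keywords".

```python
import sqlite3

# Hypothetical miniature database; schema and rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (custkey INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (orderkey INTEGER PRIMARY KEY, custkey INTEGER, clerk TEXT);
INSERT INTO customer VALUES (1, 'Brad Lou'), (2, 'Ann Lee');
INSERT INTO orders VALUES (1, 1, 'John Smith'), (2, 1, 'Mike Miller'), (3, 2, 'Keith Miller');
""")

# One SQL statement for the CN  ORDERS^Smith JOIN CUSTOMER{} JOIN ORDERS^Miller.
# ORDERS^Smith  : order tuples containing Smith but not Miller
# CUSTOMER{}    : customer tuples containing neither keyword
# ORDERS^Miller : order tuples containing Miller but not Smith
CN_SQL = """
SELECT o1.orderkey, c.custkey, o2.orderkey
FROM orders AS o1
JOIN customer AS c ON o1.custkey = c.custkey
JOIN orders AS o2 ON o2.custkey = c.custkey
WHERE o1.clerk LIKE '%Smith%'  AND o1.clerk NOT LIKE '%Miller%'
  AND o2.clerk LIKE '%Miller%' AND o2.clerk NOT LIKE '%Smith%'
  AND c.name NOT LIKE '%Smith%' AND c.name NOT LIKE '%Miller%'
ORDER BY o1.orderkey, o2.orderkey
"""

rows = conn.execute(CN_SQL).fetchall()
print(rows)  # each row is one joining network of tuples (order, customer, order)
```

With many CNs per keyword query, one such statement is issued per CN, which is where the large number of SQLs in the title comes from.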

15 CN Generation. Source: Discover[2]

16 CN Generation. Source: Discover[2]

17 CN Generation. Source: Discover[2]

18 CN Evaluation

19 Sequential Algorithm: Example. Dataset: DBLP. Schema: AUTHOR(TID, NAME), WRITE(TID, AID, PID), PAPER(TID, NAME), CITE(TID, PID1, PID2). Source: TTS[1]

20 Sequential Algorithm: Example. Schema: AUTHOR(TID, NAME), WRITE(TID, AID, PID), PAPER(TID, NAME), CITE(TID, PID1, PID2). Source: TTS[1]

21 CN Evaluation: state-of-the-art sequential algorithm

22 Sequential Algorithm: Execution Graph. Source: TTS[1]

23 Sequential Algorithm: Execution Graph

24 New Solution
- Use of multi-core architecture
- Why not existing parallel multi-query processing?
  - Large number of queries
  - Large sharing between queries
  - Large intermediate results
- What do we need on multi-core architectures?
  - CNs on the same core should share the most computational cost
  - CNs on different cores should share the least computational cost
  - Handle high workload skew
  - Adaptively handle errors caused by estimation

25 CN Level Parallelism: Straightforward Approach. Largest-first rule: the CN with the largest cost is distributed first, to the partition with the least workload. Final cost: max(cost of each core) = 1949. Source: TTS[1]
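
The largest-first rule can be sketched as a simple greedy loop. This is a minimal sketch under assumptions: each CN has a single estimated cost, sharing between CNs is ignored, and the function name and sample costs are made up for illustration (they are not the figures behind the 1949 on the slide).

```python
def largest_first_partition(cn_costs, num_cores):
    """Greedy partitioning: take CNs in decreasing cost order and give each
    one to the core that currently has the least workload."""
    loads = [0] * num_cores
    assignment = [[] for _ in range(num_cores)]
    for cn, cost in sorted(cn_costs.items(), key=lambda kv: kv[1], reverse=True):
        # Linear scan over the cores to find the least-loaded one.
        core = min(range(num_cores), key=lambda c: loads[c])
        assignment[core].append(cn)
        loads[core] += cost
    # The final cost is determined by the most heavily loaded core.
    return assignment, max(loads)


# Hypothetical estimated costs for five CNs on two cores.
costs = {"CN1": 7, "CN2": 5, "CN3": 4, "CN4": 3, "CN5": 2}
assignment, final_cost = largest_first_partition(costs, 2)
print(assignment, final_cost)
```

Because sharing is ignored, two CNs with a common sub-expression may land on different cores and compute it twice, which is the weakness the sharing-aware variant addresses.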

26 CLP: Straightforward Approach. Selecting the core: O(n). Source: TTS[1]

27 CLP: Sharing-Aware CN Partitioning
- Which CN to distribute first? The one with the largest not-shared (extra) cost.
- To which partition? The one with maximum sharing, if that does not destroy the workload balance.
- Total cost of a partition = cost of all CNs in that partition after sharing common sub-expressions
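
The two questions above can be sketched as follows. This is a simplified sketch, not the algorithm from TTS[1]: each CN is modelled as a set of operator identifiers, a partition's cost is the cost of the union of its operators, and the balance test (a hypothetical `slack` parameter limiting how far a core may sit above the least-loaded core) is an assumption introduced here.

```python
def sharing_aware_partition(cns, op_cost, num_cores, slack):
    """cns: CN name -> set of operator ids; op_cost: operator id -> cost.
    `slack` is a hypothetical balance threshold, not taken from the source."""
    core_ops = [set() for _ in range(num_cores)]   # operators already placed per core
    core_cns = [[] for _ in range(num_cores)]
    unassigned = dict(cns)

    def load(core):
        # Cost after sharing: every operator on a core is counted once.
        return sum(op_cost[o] for o in core_ops[core])

    def extra(ops, core):
        # Not-shared cost: operators this core does not already compute.
        return sum(op_cost[o] for o in ops - core_ops[core])

    while unassigned:
        # Distribute first the CN whose cheapest placement is the most expensive.
        name = max(
            unassigned,
            key=lambda n: (min(extra(unassigned[n], c) for c in range(num_cores)), n),
        )
        ops = unassigned.pop(name)
        least = min(load(c) for c in range(num_cores))
        # Only cores close to the least-loaded one keep the workload balanced.
        balanced = [c for c in range(num_cores) if load(c) <= least + slack]
        # Among those, prefer maximum sharing, i.e. minimum extra cost.
        core = min(balanced, key=lambda c: (extra(ops, c), load(c)))
        core_ops[core] |= ops
        core_cns[core].append(name)
    return core_cns, max(load(c) for c in range(num_cores))


# Hypothetical operators and CNs; names and costs are made up for illustration.
op_cost = {"AW": 10, "WP": 8, "PC": 6, "CW": 4}
cns = {
    "CN1": {"AW", "WP"},
    "CN2": {"AW", "PC"},
    "CN3": {"WP", "CW"},
    "CN4": {"PC", "CW"},
}
parts, final_cost = sharing_aware_partition(cns, op_cost, num_cores=2, slack=5)
print(parts, final_cost)
```

In this toy input the four CNs cost 56 in total when evaluated independently, while the two partitions cost 22 and 20 after sharing.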

28 [Figure: sharing-aware partitioning example, step 1. Panels: execution graph over A, W, P, C nodes; Core 1, Core 2, Core 3; CN / MinCost table; MaxHeap; Non-Exec Graph of Core 3]

29 [Figure: sharing-aware partitioning example, step 2. Panels: execution graph over A, W, P, C nodes; Core 1, Core 2, Core 3; CN / MinCost table; MaxHeap]

30 [Figure: sharing-aware partitioning example, step 3. Panels: execution graph over A, W, P, C nodes; Core 1, Core 2, Core 3; CN / MinCost table; MaxHeap]

31 [Figure: sharing-aware partitioning example, step 4. Panels: execution graph over W, P, C nodes; Core 1, Core 2, Core 3; CN / MinCost table; MaxHeap]

32 [Figure: sharing-aware partitioning example, step 5. Panels: execution graph over W, P, C nodes; Core 1, Core 2, Core 3; CN / MinCost table; MaxHeap]

33 CLP: Sharing-Aware CN Partitioning. Source: TTS[1]

34 CLP: Sharing-Aware CN Partitioning. Initialization. Source: TTS[1]

35 CLP: Error Accumulation. Source: TTS[1]

36 Operator Level Parallelism

37 Operator Level Parallelism. Source: TTS[1]

38 OLP: Overcoming Error Accumulation

39 OLP: Overcoming Accumulated Cost (figure values: 643, 685). Source: TTS[1]

40 Operator Level Parallelism. Source: TTS[1]

41 Data Level Parallelism
- Each operation in the execution graph G_E can be performed on multiple cores
- Uses operator level parallelism if there is no workload skew
- Partitions data adaptively each time workload skew is about to happen
- Which node to partition? The most costly node, if it is dominant
- When to merge the sub-results? At the final phase
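
The idea of performing one operation on several cores and merging the sub-results at the end can be sketched with a partitioned join. This is a minimal sketch under assumptions: tuples are (join key, payload) pairs, the left input stands in for the dominant node, partitioning is by hash of the join key, and all function names are made up here. Threads are used only to keep the sketch short; in CPython they illustrate the structure rather than deliver a real multi-core speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def hash_join(left, right):
    """Join two lists of (key, payload) tuples on key."""
    index = {}
    for key, payload in right:
        index.setdefault(key, []).append(payload)
    return [(key, lp, rp) for key, lp in left for rp in index.get(key, [])]


def data_level_join(left, right, num_cores):
    # Divide the tuples of the (assumed dominant) left input by join key.
    parts = [[] for _ in range(num_cores)]
    for key, payload in left:
        parts[hash(key) % num_cores].append((key, payload))
    # Each partition is joined independently, one worker per partition.
    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        sub_results = list(pool.map(lambda part: hash_join(part, right), parts))
    # Merge the sub-results only at the final phase.
    return [row for sub in sub_results for row in sub]


# Hypothetical tuples: (paper id, author) joined with (paper id, title).
writes = [(1, "a1"), (2, "a2"), (1, "a3"), (3, "a4")]
papers = [(1, "p1"), (2, "p2")]
merged = sorted(data_level_join(writes, papers, num_cores=3))
print(merged)
```

The merged output equals the result of the unpartitioned join, since every left tuple lands in exactly one partition.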

42 Data Level Parallelism (Core 1, Core 2, Core 3). Source: TTS[1]

43 Data Level Parallelism. Algorithm annotations: divides the tuples of the child node; selects the child node to be partitioned; makes copies of the selected child node and all its father nodes; adds the corresponding edges; re-estimates. Source: TTS[1]

44 Performance Studies

45 Performance Studies. Source: TTS[1]

46 Source: TTS[1]

47 Source: TTS[1]

48 Source: TTS[1]

49 References
1. Lu Qin, Jeffrey Xu Yu, Lijun Chang. Ten Thousand SQLs: Parallel Keyword Queries Computing. Proceedings of the VLDB Endowment, Volume 3, Issue 1-2, September 2010, Singapore.
2. Vagelis Hristidis, Yannis Papakonstantinou. DISCOVER: Keyword Search in Relational Databases. VLDB '02: Proceedings of the 28th International Conference on Very Large Data Bases, Hong Kong.
3. [PPT] DISCOVER: Keyword Search in Relational Databases.
