Keyword Search over Relational Tables and Streams. Alexander Markowetz (University of Bonn), Yin Yang and Dimitris Papadias (Hong Kong University of Science and Technology).

1 Keyword Search over Relational Tables and Streams. Alexander Markowetz, University of Bonn; Yin Yang and Dimitris Papadias, Hong Kong University of Science and Technology. Doklea Meci (A.M 2152), May 2012, University of Crete, Department of Computer Science.

2 Outline
- Introduction
- Relational Keyword Search on Tables
  - Graph-Based Processing
  - Operator-Based Processing
- Optimizations for Continuous GB
  - Predecessor-KL
  - Time-KL
- Optimizations for Continuous OB
  - Operator Mesh
  - Demand-Driven Operator Execution
  - Partial-Mesh
- Experimental Evaluation
  - Snapshot R-KWS Queries over Tables
  - Continuous R-KWS Queries over Streams
  - Summary of Experimental Evaluation
- Conclusion

3 The Challenges of Accessing Structured Data
- Query languages: numerous complex SQL statements
- Schemas: complex or nontrivial schema
- R-KWS queries:
  - replace numerous complex SQL statements
  - liberate users from studying a database schema
  - allow querying for terms in unknown locations (tables/attributes)

4 Introduction
Keyword Search (KWS):
- each document/Web page constitutes one unit of information
- a document is a result if it contains a subset of the query's keywords
- has been applied to relational DBMSs
- allows data retrieval without SQL
Relational Keyword Search (R-KWS):
- the basic unit of information is a record/tuple
- queries cannot be answered by inspecting records individually
- results have to be constructed by joining tuples
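
The last point can be illustrated with a minimal sketch. The tables (authors, papers, writes), their contents, and the requirement that a result cover every keyword are assumptions made for this example and do not come from the presentation. No single tuple contains both keywords, but a join of tuples does.

```python
# Hypothetical toy database; table names and contents are illustrative only.
authors = {"a1": "Smith", "a2": "Jones"}
papers = {"p1": "Keyword search over streams", "p2": "Query optimization"}
writes = [("a1", "p1"), ("a2", "p2")]  # (author_id, paper_id) foreign-key pairs


def contains(text, keyword):
    """Case-insensitive keyword containment test."""
    return keyword.lower() in text.lower()


def single_tuple_hits(keywords):
    """Tuples that contain every keyword on their own (record-at-a-time view)."""
    texts = list(authors.values()) + list(papers.values())
    return [t for t in texts if all(contains(t, k) for k in keywords)]


def joined_hits(keywords):
    """Author-writes-paper joins whose tuples together contain every keyword."""
    hits = []
    for author_id, paper_id in writes:
        combined = authors[author_id] + " " + papers[paper_id]
        if all(contains(combined, k) for k in keywords):
            hits.append((author_id, paper_id))
    return hits


query = ["smith", "streams"]
print(single_tuple_hits(query))  # no individual tuple matches
print(joined_hits(query))        # the join of a1 and p1 matches
```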

6 Relational Keyword Search on Tables
Goal: methods for GB and OB processing that
- avoid the shortcomings of prior systems
- improve the performance of R-KWS in conventional databases

7 Graph-Based Processing
Basic idea: given an inverted index I (on disk), the method traverses an undirected data graph G (in memory), searching for MTJNT (Minimal Total Join Network of Tuples) results.
- Join Networks of Tuples (JNTs) are connected acyclic components of G
- A JNT is called a Minimal Total JNT (MTJNT) iff it is impossible to remove any node and find the remainder to be total
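
The MTJNT definition can be sketched as a small check. This is an illustration under stated assumptions, not the presentation's implementation: "total" is taken to mean that the nodes together cover every query keyword, the candidate node set is assumed to be acyclic already, and all names and data below are hypothetical.

```python
def neighbors(node, edges):
    """Nodes adjacent to `node` in an undirected edge list."""
    return [b if a == node else a for a, b in edges if node in (a, b)]


def is_connected(nodes, edges):
    """True if the subgraph induced by `nodes` is non-empty and connected."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbors(stack.pop(), edges):
            if nxt in nodes and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == nodes


def is_total(nodes, keywords_of, query):
    """Assumed meaning of 'total': the nodes together cover all query keywords."""
    covered = set()
    for n in nodes:
        covered |= keywords_of.get(n, set())
    return set(query) <= covered


def is_mtjnt(nodes, edges, keywords_of, query):
    """Total JNT from which no node can be removed while staying a total JNT."""
    nodes = set(nodes)
    if not (is_connected(nodes, edges) and is_total(nodes, keywords_of, query)):
        return False
    for n in nodes:
        rest = nodes - {n}
        if is_connected(rest, edges) and is_total(rest, keywords_of, query):
            return False  # n is redundant, so the network is not minimal
    return True


# Hypothetical data graph: author a1, link tuple w1, paper p1, extra tuple c1.
keywords_of = {"a1": {"smith"}, "p1": {"streams"}}
edges = [("a1", "w1"), ("w1", "p1"), ("p1", "c1")]
query = {"smith", "streams"}
print(is_mtjnt({"a1", "w1", "p1"}, edges, keywords_of, query))        # True
print(is_mtjnt({"a1", "w1", "p1", "c1"}, edges, keywords_of, query))  # False
```

In the second call, c1 carries no keyword and can be dropped without losing connectivity or coverage, so the network is total but not minimal.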

8 GSearch Algorithm
Basic idea: the algorithm enumerates all possible trees in G rooted at sn
Result: a tree that corresponds to an MTJNT

9 GSearch Algorithm

10 GSearch computes the set of MTJNTs containing node sn, and so GB answers an R-KWS query q correctly, completely, and without duplicates.

11 Operator-Based Processing

12 Example

13 Example

15 Optimizations for Continuous GB

16 Example

17 Benefits of a Min-Complete Labeling

19 Predecessor-KL Implementation

20 Predecessor-KL Example

21 Time-KL

22 Time-KL Example

24 Optimizations for Continuous OB

25 Operator Mesh (1/3)

26 Operator Mesh Example: shows the shared execution of four operator trees

27 Operator Mesh Example

28 Problems with the Operator Mesh Approach

29 Demand-Driven Operator Execution (2/3)
The mesh is maintained in main memory throughout the lifespan of the query.
A join is considered to be either:
- running: the operator processes input
- sleeping: the operator ignores input
A join operator is sent to sleep if:
- it has no input from the right child (a source), or
- all its parents are sleeping
Sending operators to sleep does not affect the results' correctness or completeness, because either:
- the operator cannot produce output, or
- its output would not be consumed

30 Demand-Driven Operator Execution - Example: shows the state diagram for a join operator

31 Demand-Driven Operator Execution - Example
- States are characterized by two binary flags: d, indicating that at least one parent operator is running, and r, specifying that the operator's right input is not empty.
- An operator only runs in the topmost state (d/r). When it leaves this state (transition 2 or 3), it goes to sleep (or halts), to wake up (or restart) later (transitions 9 and 10).
- Operators exchange messages regarding their state, in order to ensure that all d and r flags are up to date.
- A join operator communicates changes (running/sleeping) to its left child, which adjusts its d flag.
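
The two-flag rule can be sketched as follows. This is a simplified illustration, not the presentation's implementation: the flags are recomputed on demand instead of being kept up to date by messages, the class and helper names are hypothetical, and a root operator is assumed always to have demand because its output goes to the user.

```python
class JoinOperator:
    """Join operator with the two flags described above: d (demand) and r (right input)."""

    def __init__(self, name, is_root=False):
        self.name = name
        self.is_root = is_root       # assumption: a root's output is always wanted
        self.parents = []            # operators that consume this operator's output
        self.right_nonempty = False  # backs the r flag: the right input holds tuples

    @property
    def d(self):
        """d flag: at least one parent operator is running."""
        return self.is_root or any(p.running for p in self.parents)

    @property
    def r(self):
        """r flag: the operator's right input is not empty."""
        return self.right_nonempty

    @property
    def running(self):
        """An operator runs only when both flags are set; otherwise it sleeps."""
        return self.d and self.r


def link(child, parent):
    """Register `parent` as a consumer of `child` (child is parent's left input)."""
    child.parents.append(parent)


top = JoinOperator("top", is_root=True)
bottom = JoinOperator("bottom")
link(bottom, top)

bottom.right_nonempty = True
print(bottom.running)  # False: top sleeps (empty right input), so bottom has no demand

top.right_nonempty = True
print(bottom.running)  # True: top is running and bottom has right input
```

A message-based version would store d explicitly and update it whenever a parent reports a change, which avoids walking up the mesh on every check.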

32 Demand-Driven Operator Execution - Example

33 Demand-Driven Operator Execution - Example

35 Note: this method is not restricted to keyword search; it can equally benefit other data stream applications.

36 Partial-Mesh (3/3) - Basic Idea
- A Partial-Mesh (PM) is built at runtime and breaks the distinction between operator initialization and tuple processing
- The method maintains relatively few active operators in memory
- It is each operator's responsibility to create its parents before it can produce output
- An operator destroys its parents (and other operators up the tree) if it cannot supply them with input
- In large meshes, many operators are idle; their absence does not affect the results' completeness, but dramatically reduces memory consumption
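
The create-and-destroy behavior can be sketched in a few lines. The mesh layout, operator names, and function names below are hypothetical, and the sketch tracks only which operators exist in memory, not the tuples they process.

```python
# Hypothetical mesh layout: each operator name maps to the parents it may create.
PARENTS_OF = {
    "A": ["A+B"],
    "A+B": ["A+B+C", "A+B+D"],
    "A+B+C": [],
    "A+B+D": [],
}

active = {"A"}  # operators currently materialized in memory


def before_output(op):
    """An operator creates its parents before it produces output."""
    for parent in PARENTS_OF[op]:
        active.add(parent)


def cannot_supply(op):
    """An operator with no input for its parents destroys them and everything above them."""
    for parent in PARENTS_OF[op]:
        if parent in active:
            cannot_supply(parent)
            active.discard(parent)


before_output("A")
before_output("A+B")
print(sorted(active))  # ['A', 'A+B', 'A+B+C', 'A+B+D']

cannot_supply("A")
print(sorted(active))  # ['A']
```

Only the operators that can currently receive input stay in memory, which is where the memory saving described above comes from.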

37 Partial-Mesh Example

38 Partial-Mesh Example

39 Partial-Mesh Algorithm

40 Partial-Mesh Examples of Tree Gen.

42 Snapshot R-KWS Queries over Tables (1/3)

43 Example

44 Example - Seven Sets of R-KWS Queries QS1-QS7
- QS1, QS2: people's or companies' names (denoted as PeopleName), which appear in the columns Customer.Name, Supplier.Name, and Orders.Clerk (retrieve connections between multiple people)
- QS3, QS4: terms from the name of a part, for example, ivory, from the Part.Name attribute

45 Example - Seven Sets of R-KWS Queries QS1-QS7
- QS5, QS6: years, which are present in LineItem.ShipDate, LineItem.CommitDate, LineItem.ReceiptDate, and Orders.OrderDate
- QS7: terms from Part.Brand, Part.Mfgr, Part.Size, and Part.Container

46 Example - Processing Time for Queries QS1-QS7

47 Snapshot R-KWS Queries over Tables - Conclusion
GB:
(+) For conventional tables, GB is more efficient than OB
(+) Among GB methods, GSearch avoids duplicate results and reduces the total cost
(+) GB is preferable for datasets with frequent updates
(-) Not efficient for queries involving numerous keywords and/or a large value of Tmax
(-) Consumes a large amount of main memory to store the data graph
Conclusion: on servers dedicated to R-KWS queries, GB is the best choice due to its high performance
OB:
(+) OB utilizes the functionality provided by a DBMS and thus can answer R-KWS queries using much less memory than GB
Conclusion: on servers running multiple applications and only answering R-KWS queries infrequently, OB might be preferable due to its low memory footprint

48 Continuous R-KWS Queries over Streams (2/2)

49 Continuous R-KWS Queries over Streams

50 Continuous R-KWS Queries over Streams

51 Continuous R-KWS Queries over Streams

52 Continuous R-KWS Queries over Streams

53 Continuous R-KWS Queries over Streams

54 Continuous R-KWS Queries over Streams

55 Continuous R-KWS Queries over Streams

56 Continuous R-KWS Queries over Streams - Conclusion
- FM is usually the most CPU-efficient method for a single query
- GB and PM are more economical in terms of memory consumption
(FM: Full Mesh; PM: Partial Mesh)

58 Conclusion - Advantages of R-KWS
- R-KWS handles broad query tasks whose complexity does not permit hand-coded structured queries
- It presents considerable algorithmic challenges because query processing has to explore a vast search space
- The authors address these challenges through a series of contributions:
  - they provide R-KWS semantics that are well defined and easily extensible to streaming environments
  - they develop GB and OB processing techniques that match these semantics and remedy problems encountered in previous systems
  - they adapt their framework to relational streams and propose a wide range of optimizations
  - they support their claims through an extensive set of experiments

59 Conclusion - Future Work
- They plan to further improve R-KWS performance by means of indexing
- They intend to integrate ranking into continuous R-KWS query processing
- Example: if there is a sudden burst of results, it may be desirable to report only the top-k answers for the affected period
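
The top-k example can be sketched as below. Ranking is left as future work in the presentation, so the score attached to each result, the burst threshold, and the function name are all hypothetical choices made for this illustration.

```python
import heapq


def report(results, k, burst_threshold):
    """Return all results of a period, or only the k best if the period is a burst.

    `results` is a list of (score, answer) pairs; the score is a hypothetical
    ranking value, since the presentation does not define one.
    """
    if len(results) <= burst_threshold:
        return list(results)
    return heapq.nlargest(k, results, key=lambda pair: pair[0])


quiet = [(0.4, "r1"), (0.9, "r2")]
burst = [(0.1, "r3"), (0.8, "r4"), (0.5, "r5"), (0.7, "r6")]

print(report(quiet, k=2, burst_threshold=3))  # both results, no burst
print(report(burst, k=2, burst_threshold=3))  # only the two highest-scoring results
```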

