CS246 Query Translation. Mind Your Vocabulary Q: What is the problem? A: How to integrate heterogeneous sources when their schema & capability are different.


1 CS246 Query Translation

2 Mind Your Vocabulary. Q: What is the problem? A: How to integrate heterogeneous sources when their schemas and capabilities are different.

3 How to integrate? [Figure: Bestbookbuys.com as the mediator over the sources Amazon.com and bn.com. Candidate queries shown: [au = “Clancy, Tom”]; [fn = “Tom”] [ln = “Clancy”]; [au = “Tom, Clancy”]?; [fn = “Tom”] [ln = “Clancy”].]

4 Framework. The user expresses a query using the mediator schema. The mediator translates the query into source-supported queries. The mediator collects and postprocesses results from the sources. [Figure: the mediator receives [fn = “Tom”] [ln = “Clancy”] and sends [fn = “Tom”] [ln = “Clancy”] or [au = “Clancy, Tom”] to the sources Amazon.com and bn.com.]

5 Difference From Previous Studies? Heterogeneous attributes: different “vocabularies”, so semantic translation is necessary; previous studies assumed homogeneous attributes for all sources. Complex Boolean queries: not just conjunctive queries.

6 Main Challenge. How best to translate a query when the mediator and the source use different models or schemas? Author ↔ lastname, firstname. Western calendar ↔ Chinese lunar calendar.

7 Query Translation Example. Mediator schema: publisher = “publisher”, title = “title”, last = “lastname”, first = “firstname”, year = “2002”, month = “may”. Amazon.com schema: publisher = “publisher”, title = “title”, author = “last, first?”, date = “spring, 2002”. Q: For the above schemas, what is the best translation for [last = “Clancy” & year = “1998” & month = “Jan”]? A: [author = “Clancy” & date = “winter, 1998”]?

8 More Translation Examples. More translations for the same schemas: [publisher = “p” & last = “l” & first = “f”] → [publisher = “p” & author = “l, f”]; [title = “t” & last = “l” & first = “f”] → [title = “t” & author = “l, f”]. Do we have to translate every possible query manually? Is it necessary to have separate rules for the above translations? Can the system automatically translate queries? Any idea?

9 Observations. The system cannot figure out [last = “l” & first = “f”] → [author = “l, f”] on its own: it has no semantic knowledge, so the user needs to provide these types of mappings. There seem to exist “basic” mappings, however, and the system may compose a “correct” translation from “basic” translations: [last = “l” & first = “f”] → [author = “l, f”]; [year = “yy” & month = “Jan”] → [date = “winter, yy”].

10 Framework. A human expert provides a set of “basic” rules: [last = “l” & first = “f”] → [author = “l, f”]; [year = “yy” & month = “Jan”] → [date = “winter, yy”]. [Figure: basic rules connect the mediator context to the source context.]

11 Framework. Given a query, the system automatically translates it using the basic rules. [Figure: the translation algorithm takes the basic rules and Qm: First = “Tom”, Last = “Clancy”, and produces Qs: Author = “Clancy, Tom”.]

12 Advantage of the Proposed Framework. Minimizes manual intervention: human input is needed only for the initial rule writing. Can translate any query, not just “template” queries.

13 Questions How do we know whether a translation is “good” or “correct”? What basic rules are necessary? Do we need a rule for [last = ‘l’ | first = ‘f’]? How do we translate? Algorithm for “good” translation?

14 Good Translation? Q: Why do we think these are good translations? [last = “Clancy” & first = “Tom”] → [author = “Clancy, Tom”]; [year = “2002” & month = “Jan”] → [date = “winter, 2002”]. A: Results for the translated queries are “close” to those of the original queries.

15 Minimum Superset Translation. Definition of “closeness” in the paper. Q: original query; S(Q): translated query. We also use Q and S(Q) to represent their results. S(Q): the minimal superset of Q expressed in the source terms. [Figure: Venn diagram of Q, S1(Q), and S2(Q); caption: Minimum superset translation.]

16 Minimum Superset Translation. Find the minimum superset translation of the original query, then “filter out” false positives by applying a filtering condition at the mediator.
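The two steps on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the catalog, the `source_query` and `mediator_query` helpers, and the month-to-season table are all hypothetical, and the sketch assumes the source returns enough detail (a full publication date) for the mediator to re-check the original condition.

```python
from datetime import date

# Hypothetical source catalog; the source can only be queried by
# (author last name, season, year), not by month.
CATALOG = [
    {"title": "A", "author": "Clancy, Tom", "published": date(1998, 1, 15)},
    {"title": "B", "author": "Clancy, Tom", "published": date(1998, 2, 3)},
    {"title": "C", "author": "Clancy, Tom", "published": date(1998, 7, 1)},
    {"title": "D", "author": "King, Stephen", "published": date(1998, 1, 20)},
]

# Assumed month-to-season mapping (index 0 is January).
SEASONS = ["winter", "winter", "spring", "spring", "spring", "summer",
           "summer", "summer", "fall", "fall", "fall", "winter"]


def source_query(last, season, year):
    """S(Q): what the source can evaluate (last name, season, year)."""
    return [b for b in CATALOG
            if b["author"].startswith(last + ",")
            and SEASONS[b["published"].month - 1] == season
            and b["published"].year == year]


def mediator_query(last, year, month):
    """Q: send the superset query, then filter false positives locally."""
    superset = source_query(last, SEASONS[month - 1], year)
    return [b for b in superset if b["published"].month == month]


superset_titles = [b["title"] for b in source_query("Clancy", "winter", 1998)]
exact_titles = [b["title"] for b in mediator_query("Clancy", 1998, 1)]
```

Here the source returns books A and B for “winter, 1998”, and the mediator's filter drops B (February), which is the false positive.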

17 Any Alternative for “Closeness”? What about maximum subset translation? This is the definition used in previous studies. It may be a good definition when the result is large or filtering is impossible… [Figure: Venn diagram of Q, S1(Q), and S2(Q); caption: Maximum subset translation.]

18 Any Alternative for “Closeness”? Consider both false positives and false negatives: maximize |S(Q) ∩ Q| / |S(Q) ∪ Q|. Other definitions are possible depending on the scenario. [Figure: Venn diagram of Q and S(Q) marking the false positives and false negatives.]
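As a small illustration of this alternative measure, the ratio of the intersection to the union of the two result sets can be computed directly. The function name `closeness` and the convention for two empty results are assumptions made for this sketch.

```python
def closeness(original_results, translated_results):
    """Ratio |S(Q) ∩ Q| / |S(Q) ∪ Q| over two collections of result ids."""
    q = set(original_results)
    s = set(translated_results)
    union = q | s
    if not union:
        # Assumed convention: two empty results are treated as identical.
        return 1.0
    return len(q & s) / len(union)


# Q returns {1, 2, 3}; S(Q) returns {2, 3, 4}:
# one false negative (1) and one false positive (4).
score = closeness({1, 2, 3}, {2, 3, 4})
```

A translation with no false positives and no false negatives scores 1.0; each extra or missing result lowers the score.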

19 Questions. How do we know whether a translation is “good” or “correct”? Minimal subsuming translation. What basic rules are necessary? Do we need a rule for [last = ‘l’ | first = ‘f’]? How do we translate? Algorithm for “good” translation?

20 Three Main Concepts. Query separability, query safety, cross matching.

21 Query Separability. Q = [ln = “Clancy”] & [fn = “Tom”] & [p = “Wiley”]: we still get the minimum superset translation if we separately translate [ln = “Clancy”] & [fn = “Tom”] and [p = “Wiley”]. Q = C1 ⊙ C2 ⊙ C3 (⊙: & or |) is separable if S(Q) = S(C1) ⊙ S(C2) ⊙ S(C3).

22 Disjunction Separability Theorem [CGM96]. Disjunctions are always separable: Q = C1 | C2 | C3 ⇒ S(Q) = S(C1) | S(C2) | S(C3) for any C1, C2 and C3, assuming minimum superset translation semantics. Implication: basic rules are necessary only for conjunctions, e.g., [c1 & c2], but not [c1 | c2]. Why? Any complex query can be transformed to DNF. This is a significant simplification for a rule writer.

23 Basic Rules. Only conjunctions of constraints. Separability of conjunctions is determined by a human expert: [ln & fn] but not [ln & publisher]. User-provided basic rules should be sound and complete. Soundness: all mappings are correct (minimal subsuming translation). Completeness: the rules contain all inseparable simple conjunctions.

24 Questions How do we know whether a translation is “good” or “correct”? What basic rules are necessary? Do we need a rule for [last = ‘l’ | first = ‘f’]? How do we translate? Algorithm for “good” translation?

25 Translation Algorithm. Simple conjunction query. Step 1: Find all matching rules. Rules: ln = “l” → au = “l”; ln = “l” & fn = “f” → au = “l, f”; p = “p” → p = “p”. Q: ln = “l” & fn = “f” & p = “p”. Matches: au = “l”, au = “l, f”, p = “p”.

26 Translation Algorithm. Simple conjunction query. Step 2: Remove subset matchings; a superset matching is more “precise”. Rules: ln = “l” → au = “l”; ln = “l” & fn = “f” → au = “l, f”; p = “p” → p = “p”. Q: ln = “l” & fn = “f” & p = “p”. Matches: au = “l” (subset matching, removed), au = “l, f”, p = “p”.

27 Translation Algorithm. Simple conjunction query. Step 3: Generate the translated query. Rules: ln = “l” → au = “l”; ln = “l” & fn = “f” → au = “l, f”; p = “p” → p = “p”. Q: ln = “l” & fn = “f” & p = “p”. Translated query: au = “l, f” & p = “p”.
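The three steps for a simple conjunction can be sketched as follows. This is a minimal illustration, not the paper's implementation: the rule representation (a set of mediator attributes plus an emitter function) and the names `RULES` and `translate_conjunction` are assumptions made for the example.

```python
# Each hypothetical basic rule: (mediator attributes it matches,
# function producing the source constraint from the query values).
RULES = [
    (frozenset({"ln"}), lambda q: ("au", q["ln"])),
    (frozenset({"ln", "fn"}), lambda q: ("au", f'{q["ln"]}, {q["fn"]}')),
    (frozenset({"p"}), lambda q: ("p", q["p"])),
]


def translate_conjunction(query):
    """Translate a simple conjunction given as {attribute: value}."""
    attrs = set(query)
    # Step 1: find all rules whose attributes all appear in the query.
    matching = [rule for rule in RULES if rule[0] <= attrs]
    # Step 2: drop a matching if another matching strictly contains it.
    kept = [rule for rule in matching
            if not any(rule[0] < other[0] for other in matching)]
    # Step 3: emit the source constraints; the result is their conjunction.
    return [emit(query) for _, emit in kept]


translated = translate_conjunction({"ln": "Clancy", "fn": "Tom", "p": "Wiley"})
```

With these rules, the query above yields `au = "Clancy, Tom"` and `p = "Wiley"`; a query with only `ln` falls back to the single-attribute rule.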

28 Translation Algorithm. Complex Boolean query? [Figure: query tree Q with & and | nodes.]

29 Solution 1 (Algorithm DNF). Convert to DNF and translate. Disjunctions are always separable, so we can individually translate each disjunct. [Figure: Q is converted to DNF, and each disjunct is translated: (au = “l, f1” & p = “p”) | (au = “l, f2” & p = “p”).]
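A minimal sketch of the DNF conversion step, assuming a query tree encoded as nested tuples `("and", ...)` / `("or", ...)` with strings as leaf constraints. The encoding and the name `to_dnf` are assumptions for illustration; each resulting disjunct would then be translated on its own as a simple conjunction.

```python
from itertools import product


def to_dnf(node):
    """Return a list of disjuncts; each disjunct is a tuple of leaves."""
    if isinstance(node, str):
        return [(node,)]
    op, *children = node
    child_dnfs = [to_dnf(child) for child in children]
    if op == "or":
        # A disjunction simply concatenates its children's disjuncts.
        return [d for dnf in child_dnfs for d in dnf]
    # op == "and": distribute over every combination of child disjuncts.
    return [sum(combo, ()) for combo in product(*child_dnfs)]


query = ("and", ("or", "fn=f1", "fn=f2"), "ln=l", "p=p")
disjuncts = to_dnf(query)

# Three binary disjunctions under one conjunction already give 2**3 terms.
wide = ("and", ("or", "a1", "a2"), ("or", "b1", "b2"), ("or", "c1", "c2"))
wide_size = len(to_dnf(wide))
```

The second example hints at the cost: every extra disjunction under a conjunction multiplies the number of disjuncts.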

30 What’s Wrong with DNF? DNF conversion is exponential. The DNF parse tree is not compact. Global conversion is often not necessary: the translation of C3 is independent of the others. [Figure: query tree with conjuncts C1, C2, C3 over x: [fn …], y: [fn …], z: [ln...], and [p...]; C3 is marked independent.]

31 Conjunction Partitioning. Partition conjuncts into independent groups and translate each group separately by rewriting local groups. The top-level “AND” of C3 is preserved. Group 1: G1 = {C1, C2}. Group 2: G2 = {C3}. [Figure: query tree with conjuncts C1, C2, C3 over x: [fn …], y: [fn …], z: [ln...], and [p...]; C3 is marked independent.]

32 Independent Groups? Q: How do we know G1 and G2 are “independent”? A: Q = G1 & G2 is separable Q: How do we know Q = G1 & G2 is separable?

33 Safety Condition. Query separability is difficult to check directly. The safety condition is a practical way to check query separability: it is a sufficient condition for query separability, but not a necessary one.

34 Safety Condition for Simple Conjunction. M(Q): matching rules for Q. Q = G1 & G2, where G1 and G2 are simple conjunctions: G1 = [C1 & C2], G2 = [C3 & C4]. Q is safe iff M(Q) = M(G1) ∪ M(G2). That is, Q is safe if there is no “cross matching” among G1 and G2. Cross matching: a rule that matches some constraints in G1 and some constraints in G2. Example: G1: [fn = “f1” & fn = “f2”], G2: [ln = “ln”]; Q = G1 & G2 is unsafe because of the cross matching of “fn & ln → au”.
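The safety test M(Q) = M(G1) ∪ M(G2) can be checked mechanically once rules are reduced to the attribute sets they match. This sketch makes that simplification (rules match on attribute names only); `RULES`, `matching_rules`, and `is_safe` are hypothetical names.

```python
# Hypothetical basic rules, reduced to the mediator attributes they match.
RULES = [
    frozenset({"ln"}),        # ln -> au
    frozenset({"ln", "fn"}),  # ln & fn -> au
    frozenset({"p"}),         # p -> p
]


def matching_rules(attrs):
    """M(Q): the rules whose attributes all occur in the conjunction."""
    attrs = set(attrs)
    return {rule for rule in RULES if rule <= attrs}


def is_safe(g1, g2):
    """Q = G1 & G2 is safe iff M(Q) == M(G1) | M(G2)."""
    combined = matching_rules(set(g1) | set(g2))
    return combined == (matching_rules(g1) | matching_rules(g2))


unsafe_case = is_safe({"fn"}, {"ln"})     # "ln & fn" rule crosses the groups
safe_case = is_safe({"ln", "fn"}, {"p"})  # no rule spans both groups
```

In the first case the rule on `ln & fn` matches the combined query but neither group alone, which is exactly a cross matching.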

35 Safety Condition for Complex Disjunction. M(Q): matching rules for Q. Q = G1 & G2, where G1 and G2 are complex disjunctions: G1 = [C1 | C2], G2 = [C3 | C4]. 1. Disjunctivize Q: Q = [C1 & C3] | [C1 & C4] | [C2 & C3] | [C2 & C4]. 2. Q is safe iff every disjunct is safe, i.e., if all of [C1 & C3], [C1 & C4], [C2 & C3], and [C2 & C4] are safe.

36 Important Theorem. A query is separable if it is safe (i.e., query separability ⇐ safety). A query is safe if there is no cross matching (i.e., safety ⇐ no cross matching). If there is a cross matching between conjuncts, we cannot translate them separately: put them into the same group.

37 Algorithm TDQM. Recursively traverse the query tree in top-down order. At a disjunction node: separately translate its children. At a conjunction node: put the children with cross matching into the same group and rewrite the query locally in each group.

38 Algorithm TDQM. Recursively traverse the tree top-down. At a disjunction node: separately apply TDQM to each child (disjunction separability theorem). [Figure: query tree over x: [fn…], y: [fn…], z: [ln...], v: [p...], w: [y...].]

39 Algorithm TDQM. At a conjunction node: group children by identifying “cross-matchings”, so that there is no cross-matching between groups (safety condition). [Figure: query tree over x: [fn…], y: [fn…], z: [ln...], v: [p...], w: [y...] with conjuncts C1, C2, C3 and groups G1, G2; cross-matchings: {x, z}, {y, z}.]
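Grouping conjuncts so that no cross-matching remains between groups amounts to finding connected components, where two conjuncts are linked whenever some rule spans both. The sketch below is an attribute-level approximation under assumed names (`RULES`, `cross_matches`, `group_conjuncts`); each conjunct is represented only by the set of attributes it mentions.

```python
# Hypothetical basic rules, reduced to the mediator attributes they match.
RULES = [frozenset({"ln"}), frozenset({"ln", "fn"}), frozenset({"p"})]


def cross_matches(rule, a, b):
    """True if the rule needs attributes from both conjuncts a and b."""
    return rule <= (a | b) and not rule <= a and not rule <= b


def group_conjuncts(conjuncts, rules=RULES):
    """Partition {name: attribute set} into groups linked by cross-matching."""
    names = list(conjuncts)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the representative of the group.
        while parent[name] != name:
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if any(cross_matches(r, conjuncts[a], conjuncts[b]) for r in rules):
                parent[find(b)] = find(a)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())


# C1 mentions fn (both x and y), C2 mentions ln, C3 mentions p.
groups = group_conjuncts({"C1": {"fn"}, "C2": {"ln"}, "C3": {"p"}})
```

With these rules, C1 and C2 land in one group because of the `ln & fn` rule, and C3 stays alone, mirroring G1 = {C1, C2} and G2 = {C3}.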

40 Algorithm TDQM. For groups with more than one conjunct, locally rewrite into a disjunctive form (not DNF). [Figure: group G1 = {C1, C2} is rewritten to (x ∧ z) ∨ (y ∧ z); group G2 = {C3}, containing v: [p...] and w: [y...], is left unchanged.]

41 Algorithm TDQM. For groups with more than one conjunct, locally rewrite into a disjunctive form (not DNF). [Figure: the rewritten tree, with G1 as (x ∧ z) ∨ (y ∧ z) and G2 over v: [p...] and w: [y...].]
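The local rewriting can be sketched as distributing AND over the top-level ORs inside one group only, leaving the other groups untouched. The tuple encoding of the tree and the helper names `rewrite_group` and `rewrite_conjunction` are assumptions for this illustration, not the paper's notation.

```python
from itertools import product


def top_level_disjuncts(node):
    """Children of an "or" node, or the node itself as a single disjunct."""
    if isinstance(node, tuple) and node[0] == "or":
        return list(node[1:])
    return [node]


def rewrite_group(group):
    """Distribute AND over the top-level ORs of the conjuncts in one group."""
    combos = product(*(top_level_disjuncts(c) for c in group))
    terms = [("and", *combo) for combo in combos]
    return terms[0] if len(terms) == 1 else ("or", *terms)


def rewrite_conjunction(groups):
    """Rewrite multi-conjunct groups locally; keep the top-level AND."""
    rewritten = [rewrite_group(g) if len(g) > 1 else g[0] for g in groups]
    return rewritten[0] if len(rewritten) == 1 else ("and", *rewritten)


# G1 = {x | y, z} needs rewriting; G2 = {v | w} is kept as it is.
result = rewrite_conjunction([[("or", "x", "y"), "z"], [("or", "v", "w")]])
```

Only G1 is expanded, so the result has two conjunctive terms for G1 next to the original disjunction of G2, instead of the four terms a full DNF conversion would produce.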

42 Algorithm TDQM. Continue the tree traversal until we reach a simple conjunction, then apply the basic mappings. [Figure: the rewritten tree, with (x ∧ z) ∨ (y ∧ z) alongside v: [p...] and w: [y...].]

43 Algorithm TDQM. Generates the minimum superset translation. The resulting translation is “compact”, assuming the original query is “compact”: the tree is converted only when necessary.

44 TDQM Summary. Key concepts: separability ⇐ safety ⇐ no cross matching. Local rewriting for compact translation.

45 A Few Remarks. The final algorithm is straightforward: simply put, separately translate each term if there is no “cross-matching”. Many people can come up with the algorithm, but the author developed an amazing theory by carefully studying basic questions. The initial problem looks rather “trivial”, but it is a minefield of interesting research topics…

46 Questions?

