Published by Cael Judson. Modified about 1 year ago.

1

2
MONADIC QUERIES over TREE-STRUCTURED DATA
Georg Gottlob, TU Wien & Oxford University
Joint work with Christoph Koch, Robert Baumgartner, Marcus Herzog, and Reinhard Pichler

3
Talk Outline
- Semistructured data: HTML, XML
- Monadic Queries
- Monadic datalog over trees
- XPath
- Web information extraction (wrapping)
- Lixto

4
Strings, Trees, Graphs, & Logic
A few well-known results:
- Büchi: MSO = REG over strings
- Rabin: decidability of S2S
- Thatcher and Wright: MSO = REG over ranked trees (tree automata)
- Brüggemann-Klein/Wood/Murata: MSO = REG over unranked trees
- Fagin: ESO = NP
- Note: over graphs, ESO is NP-hard and MSO is hard for the Polynomial Hierarchy.
- Grädel/Immerman/Vardi: ESO(Horn) = Datalog = LFP = PTIME (on ordered structures)
- Courcelle: MSO in linear time on tree-like structures (treewidth <= k)
- Clarke, Emerson, Pnueli, et al.: CTL, LTL, …

5
Web documents are trees!
- HTML: Hypertext Markup Language
- XML: Extensible Markup Language
- HTML, XML: context-free languages. Represent a document by its parse tree.
- Tags: vertex labels
- Result: labeled trees.

6
HTML Example
[Figure: an HTML page with the heading "DBAI" and table entries "Georg Gottlob" and "Christoph Koch", shown next to its parse tree with nodes html, body, h1, table, tr, td.]

9
XML Example
[Figure: XML document tree rooted at paperDB. Each paper node has an author subtree (chandra, merlin) and a title (“Conjunctive Queries”); further papers are elided (…).]

10
Ordered Trees as finite structures
The child relation is a priori unordered. Order is encoded by two relations:
fc = first child
ns = next sibling
[Figure: the tree paper(author(chandra, merlin), title(“Conj. Queries”)) redrawn with fc and ns edges.]
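The fc/ns encoding can be illustrated with a small Python sketch. The nested-tuple input format, the `encode` helper, and the preorder node ids are assumptions made for this illustration; they are not part of the talk.

```python
def encode(tree):
    """Encode an ordered tree as firstchild/nextsibling relations.

    tree is a (label, [children]) tuple; node ids are assigned in preorder.
    Returns (labels, firstchild, nextsibling) as dictionaries.
    """
    labels, fc, ns = {}, {}, {}
    counter = [0]

    def visit(node):
        label, children = node
        nid = counter[0]
        counter[0] += 1
        labels[nid] = label
        prev = None
        for child in children:
            cid = visit(child)
            if prev is None:
                fc[nid] = cid      # first child of nid
            else:
                ns[prev] = cid     # next sibling of the previous child
            prev = cid
        return nid

    visit(tree)
    return labels, fc, ns


paper = ("paper", [("author", [("chandra", []), ("merlin", [])]),
                   ("title", [("Conjunctive Queries", [])])])
labels, fc, ns = encode(paper)
```

Both relations are partial functions, so each holds at most one pair per node. The later linear-time arguments rely on this.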

11
Core XPath
- simple location steps: paper/title
- location steps with explicit axes: paper/descendant::merlin
- qualifiers: paper[…..]
- Boolean logic: ...[chandra and merlin and (not harel)]
Full XPath:
- node set comparisons and operations
- order functions (first, last, position), etc.
- arithmetic and string operations
Implementations: in the context of XSLT processors: Xalan, XT, MS Internet Explorer (IE6)

12
XPath Examples
/descendant::a/child::b
/descendant::a/child::b[descendant::c and not(following-sibling::d)]
/descendant::a/child::b[following-sibling::d]
[Figure: each query is shown next to the same example tree, with nodes labeled a, d, a, b, b, b, c, c, c.]

13
Ordered Trees as finite structures (continued)
fc = first child, ns = next sibling
The resulting tree structure, with the relations fc and ns and the unary label predicates, is referred to as U.

14
Monadic Queries over Trees
Two important applications:
- Web Information Extraction (later)
- Monadic XML Queries
Select some nodes of a tree.
Unary query f: Trees → 2^dom
Example: Select titles of articles authored by Chandra and Merlin.
No joins or combinations of objects.
Yardstick: Monadic Second Order Logic (MSO)

15
Monadic Datalog over Trees
Select titles of articles authored by Chandra and Merlin.
[Figure: data tree paperDB with paper children (author: chandra, merlin; title: “Conj. Queries”), drawn with fc and ns edges.]

16
Monadic Datalog over Trees
[Figure: data tree paperDB with paper children (author: chandra, merlin; title: “Conj. Queries”), drawn with fc and ns edges.]
paper(X) <- root(R) & firstchild(R,X).
paper(X) <- paper(Y) & nextsibling(Y,X).
output(X) <- paper(P) & firstchild(P,A) & firstchild(A,Z) & label_Chandra(Z) & nextsibling(Z,V) & label_Merlin(V) & nextsibling(A,T) & firstchild(T,X).
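As a rough illustration, the three rules above can be hand-evaluated in Python over a tiny instance. The node ids and the single-paper tree are made up for this sketch, and the rules are coded directly rather than run through a datalog engine.

```python
# Hypothetical data tree: paperDB with one paper (node ids are illustrative).
label = {0: "paperDB", 1: "paper", 2: "author", 3: "chandra", 4: "merlin",
         5: "title", 6: "Conjunctive Queries"}
firstchild = {0: 1, 1: 2, 2: 3, 5: 6}
nextsibling = {3: 4, 2: 5}
root = 0

# paper(X) <- root(R) & firstchild(R,X).
# paper(X) <- paper(Y) & nextsibling(Y,X).
paper = set()
x = firstchild.get(root)
while x is not None:
    paper.add(x)
    x = nextsibling.get(x)

# output(X) <- paper(P) & firstchild(P,A) & firstchild(A,Z) & label_Chandra(Z)
#              & nextsibling(Z,V) & label_Merlin(V)
#              & nextsibling(A,T) & firstchild(T,X).
output = set()
for p in paper:
    a = firstchild.get(p)
    z = firstchild.get(a)
    v = nextsibling.get(z)
    t = nextsibling.get(a)
    x = firstchild.get(t)
    if label.get(z) == "chandra" and label.get(v) == "merlin" and x is not None:
        output.add(x)

selected = [label[n] for n in sorted(output)]
```

Because firstchild and nextsibling are functional, every variable in the output rule is determined once P is fixed. The loop therefore does constant work per paper node.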

17
How expressive is monadic Datalog?
It was known that:
- Monadic Datalog 1 -MSO
- Full Datalog = P
Theorem [G. & Koch 2002]: Over U, Monadic Datalog = MSO. A unary query is definable in MSO iff it is definable via a monadic datalog program.

18
Proof idea: Simulate Unranked Query Automata (UQA) by Neven and Schwentick in monadic Datalog.
UQA = Unary MSO Queries [Neven & Schwentick 01]

21
Proof idea: Simulate Unranked Query Automata (UQA) by Neven and Schwentick in monadic Datalog.
Example: “Even-query”
[Figure: tree nodes annotated with automaton states (0010 and 01), illustrating an up transition.]
Up transition:
qodd(X) :- 0(Y), lastchild(X, Y).

22
How complex is Monadic Datalog?
Theorem [G. & Koch 2002]:
- Monadic Datalog over U has combined complexity O(|data| * |query|).
- Data complexity: P-complete and linear-time.
Previously known facts on full Datalog over graphs:
- Data complexity of Datalog: P-complete (implicit in [Vardi 88])
- Combined complexity: EXPTIME-complete (implicit in [Vardi 88])
- Combined complexity of sirups: EXPTIME-complete ([G. & Papadimitriou 99])

23
Proof idea:
1.) Transform datalog program + input tree in linear time into a “ground” propositional logic program.
- Exploit functional dependencies: nextsibling(X,Y) has only a linear number of ground instances: nextsibling(n_i, n_j), etc.
- Decouple independent atoms of rule bodies:
  p(X) <- q(X) & r(Y) & nextsibling(X,Z) & s(Z).
  becomes
  p(X) <- q(X) & r & nextsibling(X,Z) & s(Z).
  r <- r(Y).
2.) Execute the ground program in linear time by using well-known algorithms: [Dowling & Gallier], [Minoux]
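Step 2 can be sketched as counter-based forward chaining over a ground Horn program, in the spirit of the cited linear-time algorithms. The function name `horn_least_model`, the rule encoding, and the two-node ground instance are assumptions for illustration only.

```python
from collections import defaultdict, deque


def horn_least_model(rules):
    """Least model of a ground Horn program.

    rules is a list of (head, [body atoms]); facts have an empty body.
    Each rule keeps a counter of body atoms not yet derived, so every
    (rule, body atom) pair is touched at most once.
    """
    remaining = [len(set(body)) for _, body in rules]
    watchers = defaultdict(list)          # atom -> indices of rules using it
    for i, (_, body) in enumerate(rules):
        for atom in set(body):
            watchers[atom].append(i)

    true = set()
    queue = deque(head for (head, _), n in zip(rules, remaining) if n == 0)
    while queue:
        atom = queue.popleft()
        if atom in true:
            continue
        true.add(atom)
        for i in watchers[atom]:
            remaining[i] -= 1
            if remaining[i] == 0:         # whole body derived: fire the rule
                queue.append(rules[i][0])
    return true


# Ground instance of  p(X) <- q(X) & r & nextsibling(X,Z) & s(Z).  r <- r(Y).
# over two hypothetical nodes n1, n2.
rules = [
    ("q(n1)", []), ("nextsibling(n1,n2)", []), ("s(n2)", []), ("r(n2)", []),
    ("r", ["r(n1)"]), ("r", ["r(n2)"]),
    ("p(n1)", ["q(n1)", "r", "nextsibling(n1,n2)", "s(n2)"]),
]
model = horn_least_model(rules)
```

The decoupled proposition `r` is derived once from `r(n2)` and then shared by every ground instance of the p-rule, which keeps the ground program linear in size.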

24
XPath
[Figure: data tree paperDB with paper children (author: chandra, merlin; title: “Conj. Queries”), drawn with fc and ns edges.]
//paper[author[chandra and merlin]]/title
Unabbreviated syntax with explicit axes:
/descendant::paper[child::author[child::chandra and child::merlin]]/child::title
/descendant::chandra/following-sibling::merlin/ancestor::paper/child::title
W3C standard; kernel of XSLT, XQuery, etc.

25
Core XPath: A tree morphism problem
/descendant::chandra/following-sibling::merlin/ancestor::paper/child::title
[Figure: the query tree with location steps (root, chandra, merlin, paper, title, connected by desc., foll-s., anc., and child edges) is mapped onto the data tree (paperDB, paper, author, title, chandra, merlin, “Conj. Queries”, with fc and ns edges).]

26
Core XPath: A tree morphism problem (continued)
/descendant::chandra/nextsibling::merlin/ancestor::paper/child::title
[Figure: the same query tree with location steps (desc., foll-s., anc., child) next to the data tree.]

28
Core XPath
- simple location steps: paper/title
- location steps with explicit axes: paper/descendant::merlin
- qualifiers: paper[…..]
- Boolean logic: ...[chandra and merlin and (not harel)]
Full XPath:
- node set comparisons and operations
- order functions (first, last), etc.
- arithmetic and string operations
Implementations: Xalan, XT, MS Internet Explorer 6 (IE6)
Complexity, efficiency? [G., Koch, Pichler, VLDB 02]

29
Core XPath on Xalan and XT
Queries: a/b/parent::a/b/…parent::a/b
Document: [figure]
Exponential!
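The blow-up can be reproduced in miniature. The sketch below assumes a document with one a node and two b children (the slide's actual document is not preserved here) and compares a naive node-at-a-time evaluator with a set-at-a-time one. Node tests are omitted because every child is a b and every parent is an a; all names are illustrative.

```python
# Assumed document: one 'a' node with two 'b' children (ids are made up).
children = {"a": ["b1", "b2"], "b1": [], "b2": []}
parent = {"b1": "a", "b2": "a"}


def step(node, axis):
    """Nodes reachable from `node` along one axis (child or parent)."""
    if axis == "child":
        return children[node]
    return [parent[node]] if node in parent else []


def naive(node, steps):
    """Recursive per-node evaluation; returns the number of step applications."""
    if not steps:
        return 0
    calls = 1
    for nxt in step(node, steps[0]):
        calls += naive(nxt, steps[1:])   # re-evaluates the rest for each node
    return calls


def set_based(start, steps):
    """Apply each step once to a node set; returns (result, step applications)."""
    current, work = {start}, 0
    for axis in steps:
        work += len(current)
        current = {m for n in current for m in step(n, axis)}
    return current, work


def query(k):
    """child::b followed by k repetitions of parent::a/child::b."""
    return ["child"] + ["parent", "child"] * k
```

On this document the naive count satisfies T(k) = 3 + 2·T(k-1) with T(0) = 1, so it doubles with every added parent::a/child::b pair, while the set-based count grows by 3 per pair.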

30
Core XPath on Microsoft IE6: polynomial combined complexity, quadratic data complexity

31
Full XPath on IE6: Exponential combined complexity! Exponential query complexity

32
Axes and regular expressions
Observation: All XPath axes can be expressed as regular expressions over the U-axes firstchild and nextsibling:
child := firstchild.nextsibling*
parent := (nextsibling^-1)*.firstchild^-1
descendant := firstchild.(firstchild ∪ nextsibling)*
etc. …
General definition of “axis”: a relation definable via a regular expression (with inversion) from the primitive relations of U.
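These definitions can be checked directly by treating relations as sets of pairs. The helpers `compose`, `inverse`, and `star` and the six-node tree are assumptions for this sketch. The closure computation is deliberately naive and only illustrates the definitions, not the linear-time evaluation discussed elsewhere in the talk.

```python
def compose(r, s):
    """Relational composition r.s"""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}


def inverse(r):
    return {(y, x) for (x, y) in r}


def star(r, nodes):
    """Reflexive-transitive closure of r over the given nodes."""
    closure = {(n, n) for n in nodes} | set(r)
    while True:
        bigger = closure | compose(closure, closure)
        if bigger == closure:
            return closure
        closure = bigger


# Hypothetical tree: 0=paper, 1=author, 2=chandra, 3=merlin, 4=title, 5=text.
nodes = range(6)
firstchild = {(0, 1), (1, 2), (4, 5)}
nextsibling = {(2, 3), (1, 4)}

# child := firstchild.nextsibling*
child = compose(firstchild, star(nextsibling, nodes))
# parent := (nextsibling^-1)*.firstchild^-1
parent = compose(star(inverse(nextsibling), nodes), inverse(firstchild))
# descendant := firstchild.(firstchild ∪ nextsibling)*
descendant = compose(firstchild, star(firstchild | nextsibling, nodes))
```

As expected, `parent` comes out as the inverse of `child`, and the descendants of node 0 are all other nodes.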

34
Conjunctive queries with axes
CQ: conjunction of U-atoms and of atoms corresponding to derived axes
Example: nextsibling(X,Z) & descendant(Z,U) & ancestor(U,V) & label_a(V) & child(V,X) & (firstchild.firstchild ∪ firstchild^-1)(U,X)
Theorem: Evaluating conjunctive queries with axes over trees is NP-complete (query complexity).
However: XPath is more akin to acyclic conjunctive queries!

36
Acyclic conjunctive queries with axes
Theorem: Evaluating acyclic conjunctive queries with axes over trees is feasible in time O(|data| * |query|).
Proof idea: translate the acyclic query into a monadic datalog program over U.
Query: descendant(X,Y) & child(A,X) & descendant(Y,Z) & label_a(Z) & label_b(Y)
Ear: an atom which contains an ear variable that otherwise occurs in monadic atoms only. It is definable as a (unary) MSO query and thus expressible by a monadic datalog program.

37
Acyclic conjunctive queries with axes (continued)
Query: descendant(X,Y) & child(A,X) & descendant(Y,Z) & label_a(Z) & label_b(Y)
Monadic datalog program for the ear descendant(Y,Z) & label_a(Z):
d(Y) <- firstchild(Y,Z) & aa(Z).
aa(Z) <- label_a(Z).
aa(Z) <- aa(V) & nextsibling(Z,V).
aa(Z) <- aa(V) & firstchild(Z,V).
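The four rules can be evaluated in a single backward pass. The sketch assumes node ids are assigned in document (preorder) order, so that first children and next siblings always have larger ids than the node they hang off. The example tree and its labels are made up.

```python
# Hypothetical tree in preorder: r(b(c(a)), b(c)).
label = {0: "r", 1: "b", 2: "c", 3: "a", 4: "b", 5: "c"}
firstchild = {0: 1, 1: 2, 2: 3, 4: 5}
nextsibling = {1: 4}

# aa(Z) <- label_a(Z).
# aa(Z) <- aa(V) & nextsibling(Z,V).
# aa(Z) <- aa(V) & firstchild(Z,V).
aa = {}
for z in sorted(label, reverse=True):        # larger ids are already decided
    aa[z] = (label[z] == "a"
             or aa.get(nextsibling.get(z), False)
             or aa.get(firstchild.get(z), False))

# d(Y) <- firstchild(Y,Z) & aa(Z).
d = {y for y, z in firstchild.items() if aa[z]}

# Remaining monadic part of the query: d(Y) & label_b(Y).
candidates_for_y = {y for y in d if label[y] == "b"}
```

Here `d` holds exactly the nodes with a descendant labeled a, so the binary atom descendant(Y,Z) has been replaced by a unary predicate on Y.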

38
Acyclic conjunctive queries with axes (continued)
Query after eliminating the ear: descendant(X,Y) & child(A,X) & d(Y) & label_b(Y)
d(Y) <- firstchild(Y,Z) & aa(Z).
aa(Z) <- label_a(Z).
aa(Z) <- aa(V) & nextsibling(Z,V).
aa(Z) <- aa(V) & firstchild(Z,V).

39
Acyclic conjunctive queries with axes (continued)
Query: descendant(X,Y) & child(A,X) & d(Y) & label_b(Y)
Ear atom. Continue eliminating ear atoms until the query is entirely monadic.

40
Acyclic Monadic Datalog with Axes
AMX-Datalog: monadic datalog programs whose rule bodies are acyclic and may contain arbitrary axes.
Theorem: Evaluating AMX-Datalog programs over trees is feasible in time O(|data| * |program|).
Remarks:
- Same bound for stratified AMX-Datalog
- AMX-Datalog expresses MSO over U (both without and with stratification)

41
Core XPath in Linear Time
Corollary: Evaluating Core XPath queries over trees is feasible in time O(|data| * |query|).
Proof: Linear translation from Core XPath to stratified Monadic Datalog + axes.

42
Core XPath in Linear Time
Corollary: Evaluating Core XPath queries over trees is feasible in time O(|data| * |query|).
Example: //paper[author[chandra and not merlin]]/title
output(X) <- root(R) & descendant(R,P) & label_paper(P) & qual1(P) & child(P,X) & label_title(X).
qual1(X) <- child(X,Y) & label_author(Y) & qual2(Y).
qual2(X) <- child(X,Y) & label_chandra(Y) & not qual3(X).
qual3(X) <- child(X,M) & label_merlin(M).
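A set-at-a-time reading of this program is sketched below: each predicate becomes one node set, computed with a single pass over the tree. The two-paper document, the node ids, and the helper names are assumptions for illustration; stratification shows up as computing qual3 before the rule that negates it.

```python
# Hypothetical document: two papers, one by chandra & merlin, one by chandra & harel.
label = {0: "paperDB", 1: "paper", 2: "author", 3: "chandra", 4: "merlin",
         5: "title", 6: "paper", 7: "author", 8: "chandra", 9: "harel",
         10: "title"}
parent = {1: 0, 2: 1, 3: 2, 4: 2, 5: 1, 6: 0, 7: 6, 8: 7, 9: 7, 10: 6}
root = 0


def labeled(name):
    return {n for n, l in label.items() if l == name}


def parents_of(nodes):
    """Nodes with at least one child in `nodes`."""
    return {parent[n] for n in nodes if n in parent}


def children_of(nodes):
    """Nodes whose parent is in `nodes`."""
    return {n for n, p in parent.items() if p in nodes}


# qual3(X) <- child(X,M) & label_merlin(M).
qual3 = parents_of(labeled("merlin"))
# qual2(X) <- child(X,Y) & label_chandra(Y) & not qual3(X).
qual2 = parents_of(labeled("chandra")) - qual3
# qual1(X) <- child(X,Y) & label_author(Y) & qual2(Y).
qual1 = parents_of(labeled("author") & qual2)
# output(X) <- root(R) & descendant(R,P) & label_paper(P) & qual1(P)
#              & child(P,X) & label_title(X).
papers = (labeled("paper") - {root}) & qual1   # every non-root node descends from R
output = children_of(papers) & labeled("title")
```

Each line costs time linear in the tree and there is one line per query step, which matches the O(|data| * |query|) shape of the bound.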

43
Full XPath in Polynomial Time
Theorem [G., Koch, Pichler, VLDB 2002]: Evaluating full XPath queries over XML documents is feasible in polynomial time (combined complexity).
Proof: Extends the logic programming evaluation paradigm to all “nasty” features of full XPath.
Implementation (main memory): XML-Taskforce XPath
To our knowledge the only XPath system that scales.

44
Combined Complexity of XPath PODS’03, JACM’05

45
Data and Query Complexity
Theorem. XPath is in L (data complexity).
Theorem. PF is L-hard under NC1-reductions (data complexity).
Theorem. XPath w/o multiplication, concatenation is in L w.r.t. query complexity.
[Figure: data complexity diagram placing PF and XPath in L, L-complete (NC1-reductions).]

46
Core XPath and CTL
Straightforward translation from Core XPath with vertical axes to CTL with past modalities. (On graphs with child relation: order independent!)
//paper[author[chandra and merlin]]/title
first normalize to:
//title[parent::paper[author[chandra and merlin]]]
then translate to:
title & EX^-1(paper & EX(author & EXchandra & EXmerlin))
Core XPath requires multimodal CTL: X , X , etc.

47
General conjunctive queries with axes
We know they are NP-complete, but…
Research programme:
- Find interesting sets of axes for which CQs are tractable.
- Trace the “tractability frontier”, i.e., determine all maximal sets of axes for which CQs are tractable.
- Extend tractability results to datalog.
PODS 2004: G., Koch, Schulz: Solved for all XPath axes

48
Cyclic Query Example (from Computational Linguistics)

49
Complexity Results (Partition of set of axes!) (combined complexity)

50
Some simple tractability results:
CQs with U-atoms and additional axis sets {child} or {child+, child*} can be answered in time O(|data| * |query|).
Proof idea for {child}: cycles involving child are either unsatisfiable (easy to check) or rewritable in linear time into acyclic CQs.

51

Proof idea for {child+, child*}
[Figure sequence: a data tree T with nodes labeled a, b, c, c, c and a cyclic query Q over the variables X:a, Y:b, Z:c, U:c, connected by child+ (+) and child* (*) edges. Candidate assignments for X, Y, Z, U are marked on the tree nodes and reduced step by step.]
- U must have an ancestor labeled b!
- Z must have U as “descendant-or-self”.
- The result, Reduct(Q,T), is locally arc-consistent (morphism).
Lemma: T ⊨ Q iff Reduct(Q,T) is well-labeled.

61
Web wrapping
Goal: Make web contents accessible to electronic data processing
[Figure: WEB (HTML pages, layout), WRAPPER, corporate EDP apps (structured data, databases, XML)]
Wrappers: select, extract, annotate
Monadic datalog ideally suited, but … who wants to do it?
LiXto: a graphical wrapper generator for ELOG

62
[Screenshot residue of an example listing page: "Degrees - Notebook - New 2.99 $", "Notebook - Compaq Presario AU $ [...]"]

63
Lixto Architecture
[Figure: components Web, example page(s), Visual Wrapper Generator, ELOG extraction program, similarly structured pages, Extraction Module, XML.]
Further processing: tracking changes, delivering ( ,sms)... (Infopipe system)

64
Elog Program for eBay pages

65
Expressive power of LiXto
Elog⁻: monadic kernel of Elog
Theorems [G., Koch PODS 2002]:
- Elog⁻ expresses monadic datalog.
- All of Elog⁻ is graphically programmable via LiXto.
Corollary: LiXto expresses all MSO wrapping tasks.

66
Comparison to other Wrapper Generators
- Lixto more powerful than regular path queries
- Lixto more powerful than HEL (Sahuguet, Azavant)

67
The Lixto Suite
Components: Visual Wrapper, Transformation Server
- Automated navigation to target pages
- Automated data extraction from target pages
- Automated data analysis, transformation & integration
- Automated data personalization
- Automated data delivery

68
Product Architecture LiXto Extraction Engine Transformation Server

69
Oracle 9 Marketing Department BI Tool Business Objects report Marketing & Business Intelligence

70
Major Customers of LiXto:
