Presentation is loading. Please wait.

Presentation is loading. Please wait.

Managing XML and Semistructured Data

Similar presentations


Presentation on theme: "Managing XML and Semistructured Data"— Presentation transcript:

1 Managing XML and Semistructured Data
Lecture 4: Path Expressions Prof. Dan Suciu Spring 2001

2 In this lecture Path expressions Regular path expressions
Evaluation techniques Resources: Data on the Web Abiteboul, Buneman, Suciu : section 4.1

3 Path Expressions Examples: Bib.paper Bib.book.publisher
Bib.paper.author.lastname Given an OEM instance, the answer of a path expression p is a set of objects

4 Path Expressions Examples: DB = Bib.paper={&o12,&o29}
&96 &243 &206 &25 “Serge” “Abiteboul” 1997 “Victor” “Vianu” 122 133 paper book references author title year http publisher page firstname lastname first last Bib &o44 &o45 &o46 &o47 &o48 &o49 &o50 &o51 &o52 Examples: DB = Bib.paper={&o12,&o29} Bib.book.publisher={&o51} Bib.paper.author.lastname={&o71,&206}

5 Answer of a Path Expression
Simple evaluation algorithms for Answer(P,DB): Runs in PTIME in size(P), size(db): PTIME complexity Answer(P, DB) = f(P, root(DB)) Where: f(e, x) = {x} f(L.P, x) = {f(P,y) | (x,L,y) edges(DB)}

6 Regular Path Expressions
R ::= label | _ | R.R | (R|R) | R* | R+ | R? Examples: Bib.(paper|book).author Bib.book.author.lastname? Bib.book.(references)*.author Bib.(_)*.zip

7 Applications of Regular Path Expressions
Navigating uncertain structure: Bib.book.author.lastname? Syntactic substitute for inheritance: Bib.(paper|book).author Better: Bib.publication.author, but we don’t have inheritance

8 Applications of Regular Path Expressions
Computing transitive closure: Bib.(_)*.zip = everything accessible Bib.book.(references)*.author = everything accessible via references Some regular expressions of doubtful practical use: (references.references)* = a path with an even number of references (_._)* = paths of even length (_._._.(_)?)* = paths of length (3m + 4n) for some m,n But make great examples for illustration 

9 Answer of a Regular Path Expression
Recall: Lang(R) = the set of words P generated by R Answer of regular path expressions: Answer(R,DB) = {Answer(P,DB) | P  Lang(R)} Need an evaluation algorithm that copes with cycles

10 Regular Path Expressions
Recall: each regular expression  NDFA Example: R = (a.a)*.a.b A = a states(A) = {s1,s2,s3,s4} initial(A) = s1 terminal(A) = {s4} s1 s2 a a b s3 s4

11 Regular Path Expressions
Canonical Evaluation Algorithm Answer(R,DB): construct A from R construct product automaton G = A x DB: nodes(G) = states(A) x nodes(db) edges(G) = {((s,x),L,(s’,x’) | (s,L,s’)  edges(A), (x,L,x’)  edges(DB)} root(G) = (initial(A), root(DB)) compute Gacc = set of nodes accessible from root(G) return {x | s  terminal(A) s.t. (s,x)  Gacc}

12 Regular Path Expressions
Example: R = _.(_._)*.a A = DB = s1 s2 s3 _ a &o1 &o2 &o3 &o4 a b Answer of R on DB = { &o2, &o3}

13 Compute Product Automaton G
_ a _ s1,&o1 s2,&o1 s3,&o1 a a a s1,&o2 s2,&o2 s3,&o2 a a a a a a s1,&o3 s1,&o4 s2,&o3 s2,&o4 s3,&o3 s3,&o4 b b b

14 Compute Accessible Part Gacc
_ a _ s1,&o1 s2,&o1 s3,&o1 a a a s1,&o2 s2,&o2 s3,&o2 a a a a a a s1,&o3 s1,&o4 s2,&o3 s2,&o4 s3,&o3 s3,&o4 b b b Answer(R,DB) = {&o2, &o3}

15 Complexity of Regular Path Expressions
The evaluation algorithm runs in PTIME in size(R), size(DB) Even when there are cycles in DB


Download ppt "Managing XML and Semistructured Data"

Similar presentations


Ads by Google