Download presentation
Presentation is loading. Please wait.
Published byMiguel Lynch Modified over 10 years ago
1
Representing and Querying XML with Incomplete Information Serge Abiteboul INRIA Luc Segoufin INRIA Victor Vianu UCSD
2
pods 2001Abiteboul-Segoufin-Vianu2 Organization Motivations Simplifying assumptions Model of incompleteness Answering queries Results Discussion Conclusion
3
Motivations
4
pods 2001Abiteboul-Segoufin-Vianu4 The Web is a world of incompleteness Information you get from the web is seldom complete: Queries return you some - not all - data Limited storage capability Documents change on the Web: expiration Sites are unavailable… Context: A warehouse of XML documents from the Web, Xyleme
5
pods 2001Abiteboul-Segoufin-Vianu5 This work This work: simple, practically appealing approach to managing incomplete information Sequence of queries to the web (q1,A1)+(q2,A2)+… Answers are cached Process a new query without access to the web Give an incomplete answer Explain incompleteness to user Seek additional information, i.e., find minimal set of queries to fully answer
6
pods 2001Abiteboul-Segoufin-Vianu6 Related works Semantic caching Answering queries using views keep (Q i,A i ) try to rewrite query Q into Q(A 1,...,A n ) reject if you cannot Incomplete database (Q i,A i ) is some incomplete knowledge of DB Related to querying incomplete information – e.g. Lipski-Imielinski
7
pods 2001Abiteboul-Segoufin-Vianu7 Challenge: balance expressiveness and tractability Choice of data model Choice of the query language Choice of a representation of incompleteness Results Simple, practical solution Extra features lead to serious problems
8
Simplifying Assumptions
9
pods 2001Abiteboul-Segoufin-Vianu9 Data is XML: trees dealer UsedCarsNewCars ad modelyear model Honda 96 Acura Honda 96Acura
10
pods 2001Abiteboul-Segoufin-Vianu10 Simplified XML =can =444 =electronique =nik =234 =electronic =camera=camera =c.jpg value function unordered trees name price cat picture catalogproduct subcategory product name price category subcategory labelling function
11
pods 2001Abiteboul-Segoufin-Vianu11 Simple XML types catalog product name price cat picture subcategory * * 1 : 1 child (default) * : 0 or more + : 1 or more ? : 0 or 1
12
pods 2001Abiteboul-Segoufin-Vianu12 Prefix Selection Queries (ps-queries) catalog product name price cat=elec subcategory <200 Query1 catalog product name Query2 picture
13
pods 2001Abiteboul-Segoufin-Vianu13 Simplifications Data No order No distinction attribute/element No recursion No links Query No complex path expressions No join No repeated child product name cat=elec cat=toy NO
14
pods 2001Abiteboul-Segoufin-Vianu14 Crucial assumption: XIDprod canon 120 elec camera &245prod&245c.jpg + c.jpgprod &245camera = URLs URLs ID/IDrefs ID/IDrefs
15
Representation of incomplete information: Incomplete trees
16
pods 2001Abiteboul-Segoufin-Vianu16 Document Type Definition (DTD) are used to represent incompleteness Set of rules: e r e element name r regular expression Set of trees satisfying a DTD d: tree(d) Shortcoming of DTDs An element has a single definition independently of the context Type of ad depends on the context dealer newxarusedcar adad modelyearmodel
17
pods 2001Abiteboul-Segoufin-Vianu17 Solution: specialization (decoupled tags) ad used and ad new h(ad used )=h(ad new )=ad dealer newxarusedcar ad ad new ad ad used modelyearmodel dealer newxarusedcar adad modelyearmodel h
18
pods 2001Abiteboul-Segoufin-Vianu18 DTDs + Specialization The sets of trees that can be specified: the regular unranked tree languages [BruggemanKlein+Murata+Wood] Same closure properties: intersection, union, complement Same complexity
19
pods 2001Abiteboul-Segoufin-Vianu19 Example Q1: name, subcat, price of electronic products with price less than $200 Q2: name, pictures of cameras at least pictured once ---------------------------- Q3: name, price, pictures of cameras costing less than $100 and at least pictured once can be completely answered using A1, A2 Q4: list all cameras can be partially answered using A1, A2
20
pods 2001Abiteboul-Segoufin-Vianu20 catalog cdplayer product canon 120 elec camera product nikon 199 elec camera product sony 175 elec product1product2 * * Q1: name, subcat, price of electronic products with price less than 200 missing
21
pods 2001Abiteboul-Segoufin-Vianu21 Missing data after Q1 product1 name price cat picture subcategory * product2 subcategory * !=elec =elec >200
22
pods 2001Abiteboul-Segoufin-Vianu22 catalog product canon 120 elec camera product nikon 199 elec camera product sony 175 elec cdplayer product2a Q2: name, pictures of cameras at least pictured once product1 missing product2c product2** product2b * c.jpg akai a.jpg elec camera 33
23
pods 2001Abiteboul-Segoufin-Vianu23 Incomplete information Known information Prefix of the real data tree Missing information Extended tree type Conditions on data values Specializations, disjunctions
24
pods 2001Abiteboul-Segoufin-Vianu24 product1 name price cat picture subcategory * !=elec product2a name price cat picture subcategory =elec >200 name price cat product3 elec elec product2b name price cat picture * =elec >200 product2c name price cat subcategory =elec >200 subcategory !=camera subcategory !=camera no picture product + Known data Missing data
25
Answering Queries
26
pods 2001Abiteboul-Segoufin-Vianu26 Complete answer to Q3 Q3: name, price, pictures of cameras costing less than $150 and having at least one pictureQ3: name, price, pictures of cameras costing less than $150 and having at least one picture Can be fully answered using available informationCan be fully answered using available information Need to check whether answer is completeNeed to check whether answer is complete catalog prod canon 120 c.jpg
27
pods 2001Abiteboul-Segoufin-Vianu27 Incomplete answer to Q4 Provide known cameras Explain incompleteness canonnikonsony akai more products nameprice>200and no picture
28
pods 2001Abiteboul-Segoufin-Vianu28 Completing answer to Q4 It suffices to ask: product name price cat sub=camera =elec >200 picture 0
29
pods 2001Abiteboul-Segoufin-Vianu29 Revisit the types DTD Conditions Specialization: same element name may have several types Not sufficient Need to extend again the types: disjunctions product2b * =elec >200 subcategory !=camera name price cat picture
30
pods 2001Abiteboul-Segoufin-Vianu30 Disjunction ? ? vehicle data engine description sail vehicle data description vehicle data engine sailQuery1Query2vehicle data=…. description=…. Empty! &322
31
pods 2001Abiteboul-Segoufin-Vianu31 Disjunction continued Type of &322 vehicle1 + vehicle2 vehicle2 data description sail vehicle1 data engine description The type of &322 can not be described independently of that of data below
32
Results
33
pods 2001Abiteboul-Segoufin-Vianu33 Representation System: Lipskis+Imielinskis rep rep(T) Set of possible worlds q(rep(T))=rep(q(T)) q answers TRepresentation of information q(T) rep q Representation of result
34
pods 2001Abiteboul-Segoufin-Vianu34 Representation System for PS-queries Incomplete tree T to represent q 1 -1 (A 1 ) … q k -1 (A k ) PS-query q q(T) can be computed in ptime (representation of the answer can be computed in ptime)
35
pods 2001Abiteboul-Segoufin-Vianu35 Querying Incomplete Trees Given T and a query q, one can Give in ptime the sure answers up to our current knowledge Check in ptime whether query q can be fully anwered Generate in ptime queries to complete answer
36
pods 2001Abiteboul-Segoufin-Vianu36 Comparison with IL Relational model Relational calculus/algebra Conditional table Closed or open world Representation system XML tree model Weaker language (no join) Weaker system (no variable) + Closed and open World Representation system
37
pods 2001Abiteboul-Segoufin-Vianu37 Drawback: exponential blowup Incomplete information may become exponential w.r.t the sequence of query/answer q 1 /A 1 ;q 2 /A 2 … 11 qi:qi:qi:qi: Answers are empty database a=i b=i database a b Type:
38
pods 2001Abiteboul-Segoufin-Vianu38 Dealing with exponential blowup Make the representation more complex using disjunctions of types Size of representation stays polynomial Manipulations much more complex Restrict tree types and PS-queries Already very/too? simple Accept to loose some information Ask extra queries to simplify representation
39
Discussion
40
pods 2001Abiteboul-Segoufin-Vianu40 Discussion: extend language Some results in paper Extensions often lead to intractability E.G. : K-pebble transducers [Milo,Suciu,Vianu] that somehow subsume XML-QL and XSL No (known) representation system Testing rep(T) is empty is non-elementary
41
pods 2001Abiteboul-Segoufin-Vianu41 Discussion : node Ids Without node Ids much less information to integrate results more complex tedious case analysis
42
pods 2001Abiteboul-Segoufin-Vianu42 Discussion: ordering Ordering in XML, DTD, queries Problem is totally different and very complex Example: Q1/A1: list of males; Q2/A2: list of females; Q3: list all Depending on the type of input (Male)*(Female)* A3= A1 || A2 (Male Female)* A3= shuffle(A1,A2) (Male + Female)* we cannot answer A3 Regular expression processing
43
pods 2001Abiteboul-Segoufin-Vianu43 Conclusion Framework for acquiring, maintaining, querying incomplete XML data Limitations: simple queries no order and Id assumption small extensions lead to problems Possible to represent the incompleteness Possible to answer with incompleteness Possible to obtain queries to provide full answer
Similar presentations
© 2024 SlidePlayer.com Inc.
All rights reserved.