Representing and Querying XML with Incomplete Information Serge Abiteboul INRIA Luc Segoufin INRIA Victor Vianu UCSD.


1 Representing and Querying XML with Incomplete Information Serge Abiteboul INRIA Luc Segoufin INRIA Victor Vianu UCSD

2 Organization (PODS 2001, Abiteboul-Segoufin-Vianu)
  Motivations
  Simplifying assumptions
  Model of incompleteness
  Answering queries
  Results
  Discussion
  Conclusion

3 Motivations

4 The Web is a world of incompleteness
Information you get from the Web is seldom complete:
  Queries return some, not all, of the data
  Limited storage capability
  Documents change on the Web: expiration
  Sites are unavailable…
Context: Xyleme, a warehouse of XML documents from the Web

5 This work
A simple, practically appealing approach to managing incomplete information:
  Sequence of queries to the Web: (q1,A1) + (q2,A2) + …
  Answers are cached
  Process a new query without access to the Web
  Give an incomplete answer
  Explain the incompleteness to the user
  Seek additional information, i.e., find a minimal set of queries needed to answer fully

6 Related work
Semantic caching
Answering queries using views:
  keep (Qi, Ai)
  try to rewrite query Q into Q(A1, ..., An)
  reject if you cannot
Incomplete databases:
  (Qi, Ai) is some incomplete knowledge of the DB
  Related to querying incomplete information, e.g. Imielinski-Lipski

7 Challenge: balance expressiveness and tractability
  Choice of data model
  Choice of query language
  Choice of a representation of incompleteness
Results:
  Simple, practical solution
  Extra features lead to serious problems

8 Simplifying Assumptions

9 Data is XML: trees
[Figure: tree rooted at dealer with children UsedCars and NewCars; ad nodes below them have children model and year (values Honda, 96) or model only (value Acura)]

10 Simplified XML
Unordered trees with:
  a labelling function (element names: catalog, product, name, price, cat, picture, subcategory)
  a value function (data values, shown in the figure as =can, =444, =electronique, =nik, =234, =electronic, =camera, =c.jpg)

11 Simple XML types
[Figure: type tree with catalog, product, and below product the children name, price, cat, picture, subcategory; two edges carry the multiplicity *]
Multiplicities:
  1 : exactly 1 child (default)
  * : 0 or more
  + : 1 or more
  ? : 0 or 1
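The multiplicities of a simple XML type can be checked mechanically against an unordered tree. The following is a minimal sketch, not the authors' implementation: the tuple encoding of nodes, the names BOUNDS, CATALOG_TYPE and conforms, and the placement of "*" on product and picture are all assumptions made for illustration.

```python
from collections import Counter

# Multiplicity symbols from the slide: "1" (exactly one, the default), "*", "+", "?".
BOUNDS = {"1": (1, 1), "*": (0, None), "+": (1, None), "?": (0, 1)}

# Hypothetical catalog type; which edges carry "*" is an assumption.
CATALOG_TYPE = {
    "catalog": {"product": "*"},
    "product": {"name": "1", "price": "1", "cat": "1",
                "subcategory": "1", "picture": "*"},
}

def conforms(node, tree_type):
    """Check a node (label, value, children) against a simple type.

    Children are treated as an unordered collection; labels that have
    no rule in tree_type must be leaves.
    """
    label, _value, children = node
    allowed = tree_type.get(label, {})
    counts = Counter(child[0] for child in children)
    # Every child label must be mentioned by the rule of the parent.
    if any(child_label not in allowed for child_label in counts):
        return False
    # Every multiplicity constraint must hold.
    for child_label, mult in allowed.items():
        low, high = BOUNDS[mult]
        n = counts[child_label]
        if n < low or (high is not None and n > high):
            return False
    return all(conforms(child, tree_type) for child in children)

# Example data: one well-typed product and one that lacks most children.
CANON = ("product", None, [
    ("name", "canon", []), ("price", 120, []), ("cat", "elec", []),
    ("subcategory", "camera", []), ("picture", "c.jpg", []),
])
NO_NAME = ("product", None, [("price", 120, [])])
GOOD_CATALOG = ("catalog", None, [CANON])
BAD_CATALOG = ("catalog", None, [NO_NAME])
```

Because the trees are unordered, counting children per label is enough here; no regular-expression matching over child sequences is needed.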

12 Prefix Selection Queries (ps-queries)
[Figure: Query1 is a tree pattern with nodes catalog, product, name, price (<200), cat (=elec), subcategory; Query2 is a tree pattern with nodes catalog, product, name, picture]
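A ps-query can be read as a tree pattern with conditions on data values whose answer is a prefix of the data tree. The sketch below is one plausible reading under stated assumptions, not the paper's formal definition: nodes are (label, value, children) tuples, node identifiers are omitted, the helper names are hypothetical, the price of akai (250) is invented for the example, and QUERY2 adds a subcategory = camera condition taken from the wording of Q2 later in the deck.

```python
ANY = lambda value: True  # condition that accepts every data value

def leaf(label, value):
    return (label, value, [])

def product(name, price, cat, sub, pictures=()):
    kids = [leaf("name", name), leaf("price", price),
            leaf("cat", cat), leaf("subcategory", sub)]
    kids += [leaf("picture", p) for p in pictures]
    return ("product", None, kids)

CATALOG = ("catalog", None, [
    product("canon", 120, "elec", "camera", ["c.jpg"]),
    product("nikon", 199, "elec", "camera"),
    product("sony", 175, "elec", "cdplayer"),
    product("akai", 250, "elec", "camera", ["a.jpg"]),  # price is hypothetical
])

def evaluate(query, node):
    """Return the answer prefix rooted at node, or None if node does not match.

    A node matches when its label and value satisfy the pattern node and
    every child pattern is matched by at least one child of the node.
    """
    q_label, cond, q_children = query
    label, value, children = node
    if label != q_label or not cond(value):
        return None
    kept = []
    for q_child in q_children:
        matches = [m for m in (evaluate(q_child, c) for c in children)
                   if m is not None]
        if not matches:
            return None
        kept.extend(matches)
    return (label, value, kept)

QUERY1 = ("catalog", ANY, [("product", ANY, [
    ("name", ANY, []),
    ("price", lambda v: v < 200, []),
    ("cat", lambda v: v == "elec", []),
    ("subcategory", ANY, []),
])])

QUERY2 = ("catalog", ANY, [("product", ANY, [
    ("name", ANY, []),
    ("subcategory", lambda v: v == "camera", []),
    ("picture", ANY, []),
])])

def names(answer):
    """List the name values of the products in an answer prefix."""
    return [c[1] for p in answer[2] for c in p[2] if c[0] == "name"]
```

On this data, names(evaluate(QUERY1, CATALOG)) lists canon, nikon and sony, and the products in that answer carry no picture children because Query1 does not ask for them.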

13 Simplifications
Data:
  No order
  No distinction between attributes and elements
  No recursion
  No links
Query:
  No complex path expressions
  No join
  No repeated child (e.g. a product pattern with both cat=elec and cat=toy is not allowed)

14 Crucial assumption: XID
Node identifiers: URLs, ID/IDrefs
[Figure: two partial descriptions of the node prod &245, one with canon, 120, elec, camera and one with c.jpg, are combined (+, =) into a single prod &245 node]

15 Representation of incomplete information: Incomplete trees

16 Document Type Definitions (DTDs) are used to represent incompleteness
Set of rules e -> r, where e is an element name and r is a regular expression
Set of trees satisfying a DTD d: tree(d)
Shortcoming of DTDs:
  An element has a single definition, independent of the context
  But the type of ad depends on the context
[Figure: dealer with children newcar and usedcar, each with ad children; one kind of ad has children model and year, the other only model]

17 Solution: specialization (decoupled tags)
Two types, ad_used and ad_new, with h(ad_used) = h(ad_new) = ad
[Figure: a tree typed with dealer, newcar, usedcar, ad_new, ad_used, model, year is mapped by h to the tree labelled dealer, newcar, usedcar, ad, ad, model, year]
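Specialization can be sketched as a set of rules over type names plus the mapping h from type names to element labels. This is an illustrative sketch only: the rule set, the assumption that used-car ads carry model and year while new-car ads carry only model, and the name has_type are not taken from the slides. It also assumes that the child types within one rule map to distinct labels, so the type of a child is determined by its label and its parent's type.

```python
from collections import Counter

BOUNDS = {"1": (1, 1), "*": (0, None), "+": (1, None), "?": (0, 1)}

# h maps each (specialized) type name to the element label it stands for.
H = {"dealer": "dealer", "newcar": "newcar", "usedcar": "usedcar",
     "ad_new": "ad", "ad_used": "ad", "model": "model", "year": "year"}

# Rules over type names; types without a rule are leaves.
RULES = {
    "dealer": {"newcar": "1", "usedcar": "1"},
    "newcar": {"ad_new": "*"},
    "usedcar": {"ad_used": "*"},
    "ad_new": {"model": "1"},
    "ad_used": {"model": "1", "year": "1"},
}

def has_type(node, type_name):
    """Check that node = (label, children) can be given the type type_name."""
    label, children = node
    if H[type_name] != label:
        return False
    rule = RULES.get(type_name, {})
    # Assumption: within one rule, child types have pairwise distinct labels.
    by_label = {H[t]: t for t in rule}
    counts = Counter()
    for child in children:
        child_type = by_label.get(child[0])
        if child_type is None or not has_type(child, child_type):
            return False
        counts[child_type] += 1
    for t, mult in rule.items():
        low, high = BOUNDS[mult]
        if counts[t] < low or (high is not None and counts[t] > high):
            return False
    return True

NEW_AD = ("ad", [("model", [])])
USED_AD = ("ad", [("model", []), ("year", [])])
GOOD = ("dealer", [("newcar", [NEW_AD]), ("usedcar", [USED_AD])])
BAD = ("dealer", [("newcar", [USED_AD]), ("usedcar", [])])
```

Both kinds of ad carry the same label ad in the data; only the context (newcar or usedcar) decides which specialized type applies, which a plain DTD with one rule per element name cannot express.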

18 DTDs + Specialization
The sets of trees that can be specified are the regular unranked tree languages [Brüggemann-Klein, Murata, Wood]
Same closure properties: intersection, union, complement
Same complexity

19 Example
Q1: name, subcategory, price of electronic products with price less than $200
Q2: name, pictures of cameras pictured at least once
Q3: name, price, pictures of cameras costing less than $150 and pictured at least once
  can be completely answered using A1, A2
Q4: list all cameras
  can be partially answered using A1, A2

20 Q1: name, subcategory, price of electronic products with price less than 200
[Figure: answer A1 under catalog: product (canon, 120, elec, camera), product (nikon, 199, elec, camera), product (sony, 175, elec, cdplayer); the missing products are typed product1 * and product2 *]

21 Missing data after Q1
  product1: name, price, cat !=elec, picture *, subcategory
  product2: name, price >200, cat =elec, picture *, subcategory

22 Q2: name, pictures of cameras pictured at least once
[Figure: known data under catalog: product (canon, 120, elec, camera, c.jpg), product (nikon, 199, elec, camera), product (sony, 175, elec, cdplayer), product (akai, a.jpg, elec, camera); the missing products are typed product1 *, product2a *, product2b * and product2c *]

23 Incomplete information
Known information:
  Prefix of the real data tree
Missing information:
  Extended tree type
  Conditions on data values
  Specializations, disjunctions

24 Known data and missing data
[Figure: incomplete tree with a known-data part (product +) and a missing-data part. Missing-data types: product1 (name, price, cat !=elec, picture *, subcategory); product2a, product2b and product2c (name, price >200, cat =elec, differing in their picture and subcategory constraints, which include picture *, subcategory !=camera and no picture); product3 (name, price, cat, with elec values)]

25 Answering Queries

26 Complete answer to Q3
Q3: name, price, pictures of cameras costing less than $150 and having at least one picture
Can be fully answered using the available information
Need to check whether the answer is complete
[Figure: answer catalog, prod (canon, 120, c.jpg)]

27 Incomplete answer to Q4
Provide the known cameras
Explain the incompleteness
[Figure: canon, nikon, sony, akai, plus more products with name, price >200 and no picture]

28 Completing the answer to Q4
It suffices to ask:
[Figure: query pattern product with children name, price >200, cat =elec, sub =camera, picture 0]

29 Revisit the types
  DTD
  Conditions
  Specialization: the same element name may have several types
Not sufficient: the types need to be extended again, with disjunctions
[Figure: product2b with children name, price >200, cat =elec, picture *, subcategory !=camera]

30 Disjunction
[Figure: Query1 and Query2 are patterns over a vehicle node &322 with children data and description; the patterns mention engine and sail below data, data and description carry value conditions, and an answer is marked Empty!]

31 Disjunction continued
Type of &322: vehicle1 + vehicle2
  vehicle1: data (engine), description
  vehicle2: data (sail), description
The type of &322 cannot be described independently of that of the data below it

32 Results

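33 Representation system (Imielinski-Lipski)
rep(T): the set of possible worlds represented by T
q(rep(T)) = rep(q(T))
[Figure: commuting diagram. T, the representation of the information, is mapped by rep to its possible worlds and by q to q(T), the representation of the result; applying q to rep(T) gives the answers]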
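The commuting condition on the slide can be spelled out as follows. This is the standard way of reading a representation system, written under the assumption that a query is applied to a set of possible worlds pointwise; the symbol I for a possible world is introduced here for illustration.

```latex
\[
  q(\mathrm{rep}(T)) \;=\; \{\, q(I) \mid I \in \mathrm{rep}(T) \,\}
\]
\[
  q(\mathrm{rep}(T)) \;=\; \mathrm{rep}(q(T))
\]
```

The first line defines the left-hand side; the second is the requirement that evaluating q directly on the representation T yields a representation q(T) of exactly the possible answers.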

34 Representation system for ps-queries
The incomplete tree T represents q1^-1(A1) ∩ … ∩ qk^-1(Ak)
For a ps-query q, q(T) can be computed in PTIME (i.e. a representation of the answer can be computed in PTIME)

35 Querying incomplete trees
Given T and a query q, one can:
  Give in PTIME the sure answers with respect to our current knowledge
  Check in PTIME whether q can be fully answered
  Generate in PTIME queries to complete the answer

36 Comparison with IL (Imielinski-Lipski)
Imielinski-Lipski:
  Relational model
  Relational calculus/algebra
  Conditional tables
  Closed or open world
  Representation system
This work:
  XML tree model
  Weaker language (no join)
  Weaker system (no variables)
  Closed and open world
  Representation system

37 Drawback: exponential blowup
The incomplete information may become exponential w.r.t. the sequence of queries/answers q1/A1; q2/A2; …
[Figure: query qi is the pattern database with children a =i and b =i; all answers are empty; the resulting type of database with children a and b is shown]
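One way to see where the blowup comes from, under one reading of the example: an empty answer to qi says that the database node does not have both a child a = i and a child b = i, so for each i either a = i or b = i is absent. A representation that records one specialized root type per combination of these choices needs 2^n types after n queries. The sketch below only enumerates those combinations; the function name and the encoding are hypothetical, and the naive one-type-per-combination representation is an assumption.

```python
from itertools import product

def root_specializations(n):
    """Enumerate, for queries q1..qn with empty answers, every way of
    choosing per i which child is ruled out: "a" (no a = i) or "b" (no b = i).
    """
    return [dict(zip(range(1, n + 1), choice))
            for choice in product("ab", repeat=n)]

# Number of specializations after 1, 2, ..., 5 queries.
SIZES = [len(root_specializations(n)) for n in range(1, 6)]
```

Each additional query doubles the number of combinations, which is the growth the slide warns about.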

38 Dealing with exponential blowup
Make the representation more complex, using disjunctions of types:
  Size of the representation stays polynomial
  Manipulations are much more complex
Restrict tree types and ps-queries:
  Already very (too?) simple
Accept losing some information
Ask extra queries to simplify the representation

39 Discussion

40 Discussion: extending the language
Some results in the paper
Extensions often lead to intractability
E.g. k-pebble transducers [Milo, Suciu, Vianu], which somehow subsume XML-QL and XSL:
  No (known) representation system
  Testing whether rep(T) is empty is non-elementary

41 Discussion: node Ids
Without node Ids:
  much less information to integrate
  results are more complex
  tedious case analysis

42 Discussion: ordering
Ordering in XML, DTDs, queries
The problem is totally different and very complex
Example: Q1/A1: list of males; Q2/A2: list of females; Q3: list all
Depending on the type of the input:
  (Male)*(Female)*: A3 = A1 || A2
  (Male Female)*: A3 = shuffle(A1, A2)
  (Male + Female)*: we cannot determine A3
Requires regular expression processing
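The three cases of the ordering example can be illustrated on lists. This is a sketch under assumptions: A1 || A2 is read as concatenation, shuffle(A1, A2) for the type (Male Female)* is read as strict alternation, and the function names are hypothetical. For (Male + Female)* the sketch enumerates every order-preserving interleaving to show how many documents remain possible.

```python
from itertools import chain

def concat(males, females):
    """Input type (Male)*(Female)*: all males precede all females."""
    return males + females

def alternate(males, females):
    """Input type (Male Female)*: males and females strictly alternate."""
    if len(males) != len(females):
        raise ValueError("(Male Female)* requires as many males as females")
    return list(chain.from_iterable(zip(males, females)))

def interleavings(a, b):
    """Input type (Male + Female)*: every merge that keeps the relative
    order inside each list is a possible document order."""
    if not a:
        return [list(b)]
    if not b:
        return [list(a)]
    return ([[a[0]] + rest for rest in interleavings(a[1:], b)] +
            [[b[0]] + rest for rest in interleavings(a, b[1:])])

A1 = ["m1", "m2"]
A2 = ["f1", "f2"]
```

With two males and two females the first two types each fix a single order, while the third leaves six candidate orders, so A1 and A2 alone do not determine A3.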

43 Conclusion
Framework for acquiring, maintaining and querying incomplete XML data
Limitations:
  simple queries
  no order, and the Id assumption
  small extensions lead to problems
Possible to represent the incompleteness
Possible to answer with incompleteness
Possible to obtain queries that provide the full answer

