
1 Intelligent Querying of Web Documents Using a Deductive XML Repository
Nick Bassiliades, Ioannis Vlahavas
Dept. of Informatics, Aristotle University of Thessaloniki

2 Abstract
• X-DEVICE is a deductive OODB system
• It is used for storing XML documents as objects
• X-DEVICE has a powerful rule-based query language for
  ◦ intelligently querying stored XML documents
  ◦ publishing the results
• The rule language features:
  ◦ second-order syntax
  ◦ generalized path and ordering expressions
• Metadata are used to translate the extended features into first-order rules

3 Object Model of XML Data
• DTD definitions are automatically translated into a class schema
• XML documents are automatically translated into objects
• Generated classes and objects are stored within the underlying OODB ADAM
• ADAM is an OODB built on Prolog (Norman Paton, Peter M.D. Gray, Univ. of Aberdeen)

4 Object Model of XML Data
W3C XQuery: TEXT Use Case
    company: name, ticker_symbol?, description?, business_code, partners?, competitors?
    partners: partner+
    competitors: competitor+
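The translation from content models to a class schema can be sketched as follows. This is a minimal illustration, not X-DEVICE's actual implementation: the names CONTENT_MODELS, Attribute, parse_particle and build_schema are hypothetical, and the mapping of "?" to an optional attribute and "+" to a multi-valued attribute is an assumption. The elem_order entry mirrors the meta-attribute of the same name that later slides read from xml_seq.

```python
from dataclasses import dataclass

# Content models transcribed from the element tree on this slide.
CONTENT_MODELS = {
    "company": ["name", "ticker_symbol?", "description?",
                "business_code", "partners?", "competitors?"],
    "partners": ["partner+"],
    "competitors": ["competitor+"],
}


@dataclass
class Attribute:
    name: str
    optional: bool      # assumed meaning of "?" (and "*")
    multi_valued: bool  # assumed meaning of "+" (and "*")


def parse_particle(particle: str) -> Attribute:
    """Map one content particle such as 'partner+' to an attribute description."""
    suffix = particle[-1] if particle[-1] in "?+*" else ""
    name = particle.rstrip("?+*")
    return Attribute(name=name,
                     optional=suffix in ("?", "*"),
                     multi_valued=suffix in ("+", "*"))


def build_schema(models):
    """Return {class_name: {'elem_order': [...], 'attributes': {...}}}."""
    schema = {}
    for cls, particles in models.items():
        attrs = [parse_particle(p) for p in particles]
        schema[cls] = {
            # Order of sub-elements, kept as metadata for later rule translation.
            "elem_order": [a.name for a in attrs],
            "attributes": {a.name: a for a in attrs},
        }
    return schema


SCHEMA = build_schema(CONTENT_MODELS)
```

Keeping elem_order next to the attribute descriptions is what lets later translation steps enumerate the sub-elements of a class without inspecting the documents themselves.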

5 Object Model of XML Data
Alternation
    content: content_alt1+ (par …, figure …)
    content_alt1: par | figure

6 Deductive XML Query Language
• The X-DEVICE language is an extension of DEVICE, the basic deductive rule language
  ◦ N. Bassiliades, I. Vlahavas, A.K. Elmagarmid, E-DEVICE: An extensible active knowledge base system with multiple rule type support, IEEE TKDE, 12(5), 824-844, 2000.
• X-DEVICE rules are pre-compiled into DEVICE deductive rules
• Deductive rules are compiled into production rules
  ◦ ECA rules with one complex event
  ◦ Matching through RETE network

7 X-DEVICE Language
Basic first-order deductive rules
    if C@company(name=‘XYZ Ltd’, partner.partners ∋ P)
    then partner_of_xyz(partner:P)
• Selects company C with name ‘XYZ Ltd’
• Iterates over partners P through navigation
• Paths are written in inverse notation: partner.partners, NOT partners.partner
• Defines a new derived class of partners of company XYZ
• Derived objects are materialized

8 X-DEVICE Language
Recursion
    if P@partner_of_xyz(partner:P1) and
       C@company(name=P1, partner.partners ∋ P2)
    then partner_of_xyz(partner:P2)
• Rule processing uses semi-naïve evaluation
• Negation is allowed (safety, stratification)
• Single-valued attributes use : for instantiation
• Multi-valued attributes use ∋ for instantiation
• Prolog lists guarantee correct ordering
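Semi-naïve evaluation of the recursive partner rule can be sketched as follows. This is only an illustration of the general technique, not the DEVICE engine: the function partners_of and the dictionary layout (company name mapped to a list of partner names) are hypothetical. The key point is that each round joins the recursive rule only against the facts derived in the previous round (the delta), not against everything derived so far.

```python
def partners_of(root, partners):
    """Return every company reachable from `root` through partner links.

    `partners` maps a company name to the list of its partner names
    (an assumed in-memory stand-in for the stored company objects).
    """
    derived = set()
    # Non-recursive rule: the direct partners of the root company.
    delta = set(partners.get(root, []))
    while delta:
        derived |= delta
        new_facts = set()
        # Recursive rule, applied only to facts that are new in this round.
        for company in delta:
            new_facts.update(partners.get(company, []))
        # Keep only facts that were not derived before; stop at the fixpoint.
        delta = new_facts - derived
    return derived


# Made-up sample data with a cycle back to the root.
SAMPLE = {"XYZ Ltd": ["A", "B"], "A": ["C"], "C": ["XYZ Ltd", "A"]}
REACHABLE = partners_of("XYZ Ltd", SAMPLE)
```

Because the loop stops as soon as a round produces nothing new, cyclic partner data still terminates. Note that, like the rule on the slide, the sketch does not exclude the root company from its own derived partners.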

9 X-DEVICE Language
Variable-Attribute Expressions
    if C@company(A $ ‘XYZ’)
    then a_xyz_comp(company:list(C))
• We don’t know which attribute of company contains the string ‘XYZ’
• A is a second-order variable (meta-variable)
• list is an aggregation function (collects company OIDs in a multi-valued attribute)
• The $ operator performs string search

10 X-DEVICE Language
Translation of Variable-Attributes
    if company@xml_seq(elem_order ∋ A)
    then new_rule(‘ if C@company(A $ ‘XYZ’)
                    then a_xyz_comp(company:list(C)) ’)
         => deductive_rule
• Iterate over meta-class xml_seq to find all attributes (sub-elements) of class company
• A production rule creates one deductive rule for each instantiation of A
• A is now a first-order variable in the condition and a constant in the action
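The effect of this translation can be sketched as a simple expansion over the class metadata. This is an illustration under stated assumptions: expand_variable_attribute is a hypothetical helper, the rules are produced as plain strings (X-DEVICE creates rule objects instead), and the attribute list is copied from the company element tree shown earlier in the deck.

```python
# Sub-elements of class company, in document order (from the company element tree).
COMPANY_ELEM_ORDER = ["name", "ticker_symbol", "description",
                      "business_code", "partners", "competitors"]


def expand_variable_attribute(cls, elem_order, needle, target):
    """Create one first-order rule text per attribute of `cls`.

    The second-order variable A of the original rule is replaced by each
    concrete attribute name in turn, so A becomes a constant in every
    generated rule.
    """
    return [
        f"if C@{cls}({attr} $ '{needle}') then {target}(company:list(C))"
        for attr in elem_order
    ]


RULES = expand_variable_attribute("company", COMPANY_ELEM_ORDER, "XYZ", "a_xyz_comp")
```

All generated rules share the same conclusion, so their results accumulate in the same derived class.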

11 X-DEVICE Language
Generalized Path Expressions
    if C@company(* $ ‘XYZ’)
    then a_xyz_comp(company:list(C))
• The search for string ‘XYZ’ must be performed
  ◦ not only on attributes of company
  ◦ but also on attributes of objects contained within company
  ◦ at all levels of nesting

12 X-DEVICE Language
Translation of Generalized Paths
• Iterate over all immediate elements of class company
• Store them into an auxiliary derived class
    if company@xml_seq(elem_order ∋ X1)
    then tmp_elem1(cnd_elem:X1, path:[X1])
[Figure: company element tree, as on slide 4]

13 X-DEVICE Language
Translation of Generalized Paths
• Recursively iterate over all elements and sub-elements stored in the auxiliary class
• The path-so-far from the root company element is accumulated
    if X1@tmp_elem1(cnd_elem:X2, path:X3) and
       X2@xml_seq(elem_order ∋ X4)
    then tmp_elem1(cnd_elem:X4, path:[X4|X3])
[Figure: company element tree, as on slide 4]

14 X-DEVICE Language
Translation of Generalized Paths
• Terminate the recursion if no more nested elements can be found
• Create one deductive rule for each “discovered” concrete path
    if X1@tmp_elem1(cnd_elem:X2, path:X3) and
       not X2@xml_seq and
       prolog{create_path(X3,PATH)}
    then new_rule(‘ if C@company(PATH $ ‘XYZ’)
                    then a_xyz_comp(company:list(C)) ’)
         => deductive_rule

15 X-DEVICE Language
Translation of Generalized Paths
• The following deductive rules are created
    C@company(name $ ‘XYZ’)
    C@company(ticker_symbol $ ‘XYZ’)
    C@company(description $ ‘XYZ’)
    C@company(business_code $ ‘XYZ’)
    C@company(partner.partners $ ‘XYZ’)
    C@company(competitor.competitors $ ‘XYZ’)
• Optimization of multiple rules is achieved through common parts of the RETE network
• The DEVICE system takes care of that
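The three translation steps (seed with the immediate elements, recurse while accumulating the path, emit a path when no nested elements remain) can be sketched as a worklist traversal. This is an approximation under stated assumptions: XML_SEQ is a hypothetical dictionary standing in for the xml_seq meta-class, concrete_paths is an illustrative name, and the traversal assumes a non-recursive DTD, since a cyclic content model would make it loop forever.

```python
from collections import deque

# Stand-in for the xml_seq meta-class: element -> ordered sub-elements.
# Elements without an entry have no nested elements.
XML_SEQ = {
    "company": ["name", "ticker_symbol", "description",
                "business_code", "partners", "competitors"],
    "partners": ["partner"],
    "competitors": ["competitor"],
}


def concrete_paths(root, xml_seq):
    """Enumerate all leaf paths below `root` in inverse (leaf-first) notation."""
    # Step 1: every immediate element of the root, with a one-element path.
    worklist = deque((elem, [elem]) for elem in xml_seq[root])
    paths = []
    while worklist:
        elem, path = worklist.popleft()
        children = xml_seq.get(elem)
        if children:
            # Step 2: prepend each sub-element, like path:[X4|X3] in the rule.
            worklist.extend((child, [child] + path) for child in children)
        else:
            # Step 3: no nested elements, so the path is complete.
            paths.append(".".join(path))
    return paths


PATHS = concrete_paths("company", XML_SEQ)
```

Prepending each newly visited element keeps the deepest element first, which is why the result comes out directly in the inverse notation used by the rules (partner.partners rather than partners.partner).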

16 X-DEVICE Language
Ordering Expressions
• W3C TEXT Use Case, Query 5
  ◦ For each news item that is relevant to the “Gorilla Corp”, create an “item summary” element.
  ◦ The content of the item summary is the content of the title, date, and first paragraph of the news item
    if N@news_item(*.content $ ‘Gorilla Corp’,
                   par.content ∋1 PAR,
                   title:T, date:D)
    then item_summary(title:T, date:D, par:PAR)

17 X-DEVICE Language
Translation of Ordering
• Collect all the paragraphs that satisfy the condition
• Store them in a list of an auxiliary derived class
    if N@news_item(*.content $ ‘Gorilla Corp’,
                   par.content ∋ X1,
                   title:T, date:D)
    then tmp_elem1(tmp_var1:T, tmp_var2:D, tmp_obj:list(X1))

18 X-DEVICE Language
Translation of Ordering
• Isolate a sub-list of all the paragraphs that satisfy the ordering expression ∋1
• There is one Prolog goal for each ordering expression
    if X3@tmp_elem1(tmp_var1:T, tmp_var2:D, tmp_obj:X1) and
       prolog{length(X2,1), append(X2,_,X1)}
    then tmp_elem2(tmp_var1:T, tmp_var2:D, tmp_obj:X2)
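The Prolog goal length(X2,1), append(X2,_,X1) binds X2 to the prefix of X1 that has exactly one element, and fails when X1 is too short. A minimal sketch of the same behaviour, where prefix_of_length is a hypothetical helper and None stands in for goal failure:

```python
def prefix_of_length(items, n):
    """Return the first `n` elements of `items`, or None if there are fewer.

    Mirrors length(X2, n), append(X2, _, X1): X2 must be a list of exactly
    n elements that is a prefix of X1, so a shorter X1 makes the goal fail.
    """
    if len(items) < n:
        return None
    return items[:n]


# Made-up paragraph identifiers, in document order.
PARAGRAPHS = ["par1", "par2", "par3"]
FIRST = prefix_of_length(PARAGRAPHS, 1)
```

Because the paragraphs are collected into an ordered list first, selecting by position reduces to plain list slicing.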

19 X-DEVICE Language
Translation of Ordering
• Iterate over all qualifying results and return them into the target element
    if X1@tmp_elem2(tmp_var1:T, tmp_var2:D, tmp_obj ∋ PAR)
    then item_summary(title:T, date:D, par:PAR)

20 X-DEVICE Language
Building Result Documents
• The top-level element of the XML result document is identified with the keyword xml_result
• The DTD of the result document is identified through object references
• W3C TEXT Use Case, Query 2
  ◦ Find news items where the “Foo Corp” company and one or more of its partners are mentioned in the same paragraph and/or title
  ◦ List each news item by its title and date

21 X-DEVICE Language
Building Result Documents
• Find the “Foo” company and iterate over its partners
• For each partner, iterate over news items and search for “Foo” and its partner inside the title of the same news item
    if C@company(name=‘Foo Corp’, partner.partners ∋ P) and
       N@news_item(title:T $ ‘Foo Corp’ & $ P, date:D)
    then xml_result(news_item1(title:T, date:D))

22 X-DEVICE Language
Building Result Documents
• Find the “Foo” company and iterate over its partners
• For each partner, iterate over news items and search for “Foo” and its partner inside the nested paragraphs of the same item
    if C@company(name=‘Foo Corp’, partner.partners ∋ P) and
       N@news_item(*.par.content $ ‘Foo Corp’ & $ P, title:T, date:D)
    then news_item1(title:T, date:D)
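Taken together, the title rule and the paragraph rule select a news item when the company and at least one of its partners appear in the same title or the same paragraph. A rough sketch of that combined condition, assuming a hypothetical in-memory layout (dictionaries with title, date and paragraphs keys) and made-up sample data; the $ search is approximated here by a plain substring test:

```python
def mentions_together(text, company, partners):
    """True when `text` names the company and at least one of its partners."""
    return company in text and any(partner in text for partner in partners)


def query2(company, partners, news_items):
    """Return (title, date) for each item whose title or some single
    paragraph mentions the company together with one of its partners."""
    results = []
    for item in news_items:
        candidates = [item["title"], *item["paragraphs"]]
        if any(mentions_together(text, company, partners) for text in candidates):
            results.append((item["title"], item["date"]))
    return results


# Made-up sample data for illustration only.
NEWS = [
    {"title": "Foo Corp and Bar Inc sign deal", "date": "1999-01-20",
     "paragraphs": ["Details were not disclosed."]},
    {"title": "Market update", "date": "1999-01-21",
     "paragraphs": ["Foo Corp shares rose.", "Bar Inc was flat."]},
    {"title": "Quarterly results", "date": "1999-01-22",
     "paragraphs": ["Foo Corp and partner Bar Inc both reported gains."]},
]
RESULT = query2("Foo Corp", ["Bar Inc"], NEWS)
```

The second sample item is skipped because the two names occur in different paragraphs. The sketch also reports each item once, even when several partners match.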

23 X-DEVICE Language
Building Result Documents
    <!DOCTYPE news_item1 [
      <!ELEMENT news_item1 (title, date)>
    ]>
• The structure of the title and date elements is automatically determined by the type of the corresponding rule variables

24 Advantages of X-DEVICE
• Logic-based query languages have
  ◦ well-understood mathematical properties
  ◦ declarative nature
  ◦ advanced optimization techniques (magic-sets)
• X-DEVICE compared to XQuery (functional)
  ◦ more high-level, declarative syntax
  ◦ more compact and comprehensible
  ◦ general path expressions, due to fixpoint semantics and second-order variables

25 Advantages of X-DEVICE
• Users can express complex XML document views
  ◦ Information customization for e-commerce, e-learning, etc.
• X-DEVICE offers multiple knowledge representation formalisms
  ◦ Deductive, Production, and Active rules
  ◦ Structured objects
• Production and Active rules can be used to update XML documents
• All the above can play an important role as an infrastructure for the Semantic Web

26 Intelligent Querying of Web Documents Using a Deductive XML Repository
Nick Bassiliades, Ioannis Vlahavas
Dept. of Informatics, Aristotle University of Thessaloniki
X-DEVICE site: www.csd.auth.gr/~lpis/systems/x-device.html

