Published by Karen Figgs. Modified over 9 years ago.
1
Intelligent Querying of Web Documents Using a Deductive XML Repository
Nick Bassiliades, Ioannis Vlahavas
Dept. of Informatics, Aristotle University of Thessaloniki
2
Abstract
X-DEVICE is a deductive OODB system.
It is used for storing XML documents as objects.
X-DEVICE has a powerful rule-based query language for:
  intelligently querying stored XML documents
  publishing the results
The rule language features:
  second-order syntax
  generalized path and ordering expressions
Metadata are used to translate the extended features into first-order rules.
3
Object Model of XML Data
DTD definitions are automatically translated into a class schema.
XML documents are automatically translated into objects.
Generated classes and objects are stored within the underlying OODB, ADAM.
ADAM is an OODB built on Prolog (Norman Paton, Peter M.D. Gray, Univ. of Aberdeen).
4
Object Model of XML Data
W3C XQuery: TEXT Use Case
  company: name, ticker_symbol?, description?, business_code, partners?, competitors?
  partners: partner+
  competitors: competitor+
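As a rough illustration of how a DTD content model like the one above could drive a class schema, the following Python sketch maps each child element to an attribute descriptor. The function name parse_content_model and the optional / multi_valued flags are assumptions made for this example; the slides do not describe the actual X-DEVICE translation in this detail.

```python
import re


def parse_content_model(model):
    """Turn a flat sequence content model into attribute descriptors.

    Assumption: children are whitespace-separated names with an optional
    DTD occurrence suffix (?, + or *); alternation is not handled here.
    """
    attributes = {}
    for token in model.split():
        match = re.fullmatch(r"([A-Za-z_][\w.-]*)([?+*]?)", token)
        if match is None:
            raise ValueError(f"unsupported token: {token!r}")
        name, suffix = match.groups()
        attributes[name] = {
            "optional": suffix in ("?", "*"),      # element may be absent
            "multi_valued": suffix in ("+", "*"),  # element may repeat
        }
    return attributes


# Content models taken from the slide; the dict layout is hypothetical.
COMPANY = parse_content_model(
    "name ticker_symbol? description? business_code partners? competitors?"
)
PARTNERS = parse_content_model("partner+")
```

Under these assumptions, partner becomes a mandatory multi-valued attribute of class partners, while ticker_symbol becomes an optional single-valued attribute of class company.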
5
Object Model of XML Data
Alternation
  content  content_alt1+
  par  …
  figure  …
  content_alt1  par  figure
6
Deductive XML Query Language
The X-DEVICE language is an extension of DEVICE, the basic deductive rule language.
  N. Bassiliades, I. Vlahavas, A.K. Elmagarmid, E-DEVICE: An extensible active knowledge base system with multiple rule type support, IEEE TKDE, 12(5), 824-844, 2000.
X-DEVICE rules are pre-compiled into DEVICE deductive rules.
Deductive rules are compiled into production rules.
  ECA rules with one complex event
  Matching through RETE network
7
X-DEVICE Language
Basic first-order deductive rules
  if C@company(name='XYZ Ltd', partner.partners P)
  then partner_of_xyz(partner:P)
Selects company C with name 'XYZ Ltd'.
Iterates over partners P through navigation.
  Paths are written in inverse notation: partner.partners, not partners.partner.
Defines a new derived class of partners of company XYZ.
  Derived objects are materialized.
8
X-DEVICE Language
Recursion
  if P@partner_of_xyz(partner:P1) and C@company(name=P1, partner.partners P2)
  then partner_of_xyz(partner:P2)
Rule processing uses semi-naïve evaluation.
Negation is allowed (safety, stratification).
Single-valued attributes use : for instantiation.
Multi-valued attributes use a separate operator for instantiation.
Prolog lists guarantee correct ordering.
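The idea behind semi-naïve evaluation of the recursive partner rule can be sketched in Python as follows. This is only an illustration of the general technique, not the DEVICE implementation: the partners_of dictionary and the company names are made up, and derived objects are modelled as a plain set of names.

```python
def partners_closure(partners_of, root):
    """Compute all direct and indirect partners of root.

    Semi-naive evaluation: each round joins only the facts derived in the
    previous round (delta) with the base data, instead of re-joining
    everything derived so far.
    """
    derived = set(partners_of.get(root, ()))  # base rule: direct partners
    delta = set(derived)
    while delta:
        new_facts = set()
        for company in delta:
            for partner in partners_of.get(company, ()):
                if partner not in derived:
                    new_facts.add(partner)
        derived |= new_facts
        delta = new_facts  # only the new facts feed the next round
    return derived


# Hypothetical data: company name -> names of its partners.
PARTNERS_OF = {
    "XYZ Ltd": ["Alpha", "Beta"],
    "Alpha": ["Gamma"],
    "Gamma": ["XYZ Ltd"],
    "Beta": [],
}
ALL_PARTNERS = partners_closure(PARTNERS_OF, "XYZ Ltd")
```

The loop stops when a round derives nothing new, which is the fixpoint. Because Gamma lists XYZ Ltd as a partner, XYZ Ltd itself ends up in the derived set, as the recursive rule would also conclude.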
9
X-DEVICE Language
Variable-Attribute Expressions
  if C@company(A $ 'XYZ')
  then a_xyz_comp(company:list(C))
We don't know which attribute of company contains the string 'XYZ'.
A is a second-order variable (meta-variable).
list is an aggregation function (collects company OIDs in a multi-valued attribute).
The $ operator performs string search.
10
X-DEVICE Language
Translation of Variable-Attributes
  if company@xml_seq(elem_order A)
  then new_rule('if C@company(A $ 'XYZ') then a_xyz_comp(company:list(C))') => deductive_rule
Iterate over meta-class xml_seq to find all attributes (sub-elements) of class company.
A production rule creates one deductive rule for each instantiation of A.
A is now a first-order variable in the condition and a constant in the action.
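A small Python sketch may help picture this translation: the meta-level step instantiates A with each attribute name and emits one concrete rule text, and the generated rules together behave like a search over all listed attributes. The helper names, the dictionary-based object store and the sample companies are assumptions made for illustration only.

```python
RULE_TEMPLATE = "if C@company({attr} $ 'XYZ') then a_xyz_comp(company:list(C))"


def expand_variable_attribute(elem_order, template=RULE_TEMPLATE):
    """Create one concrete rule text per attribute name bound to A."""
    return [template.format(attr=attr) for attr in elem_order]


def companies_matching(companies, elem_order, needle):
    """Collect OIDs of companies where any listed string attribute contains needle."""
    matched = []
    for oid, obj in companies.items():
        for attr in elem_order:
            value = obj.get(attr)
            # Only string-valued attributes are searched in this sketch.
            if isinstance(value, str) and needle in value:
                matched.append(oid)
                break
    return matched


# Attribute names from the slide's schema; object contents are hypothetical.
ELEM_ORDER = ["name", "ticker_symbol", "description", "business_code"]
COMPANIES = {
    "c1": {"name": "XYZ Ltd", "business_code": "123"},
    "c2": {"name": "Acme", "description": "Supplier of XYZ parts"},
    "c3": {"name": "Other"},
}
RULES = expand_variable_attribute(ELEM_ORDER)
MATCHED = companies_matching(COMPANIES, ELEM_ORDER, "XYZ")
```

In each generated rule text the attribute name is a constant, which corresponds to the slide's remark that A is a constant in the action of the production rule.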
11
X-DEVICE Language
Generalized Path Expressions
  if C@company(* $ 'XYZ')
  then a_xyz_comp(company:list(C))
The search for the string 'XYZ' must be performed not only on the attributes of company but also on the attributes of objects contained within company, at all levels of nesting.
12
X-DEVICE Language
Translation of Generalized Paths
Iterate over all immediate elements of class company.
Store them into an auxiliary derived class.
  if company@xml_seq(elem_order X1)
  then tmp_elem1(cnd_elem:X1, path:[X1])
Schema:
  company: name, ticker_symbol?, description?, business_code, partners?, competitors?
  partners: partner+
  competitors: competitor+
13
X-DEVICE Language
Translation of Generalized Paths
Recursively iterate over all elements and sub-elements stored in the auxiliary class.
The path-so-far from the root company element is accumulated.
  if X1@tmp_elem1(cnd_elem:X2, path:X3) and X2@xml_seq(elem_order X4)
  then tmp_elem1(cnd_elem:X4, path:[X4|X3])
Schema:
  company: name, ticker_symbol?, description?, business_code, partners?, competitors?
  partners: partner+
  competitors: competitor+
14
X-DEVICE Language
Translation of Generalized Paths
Terminate the recursion if no more nested elements can be found.
Create one deductive rule for each "discovered" concrete path.
  if X1@tmp_elem1(cnd_elem:X2, path:X3) and not X2@xml_seq and prolog{create_path(X3,PATH)}
  then new_rule('if C@company(PATH $ 'XYZ') then a_xyz_comp(company:list(C))') => deductive_rule
15
X-DEVICE Language
Translation of Generalized Paths
The following deductive rules are created:
  C@company(name $ 'XYZ')
  C@company(ticker_symbol $ 'XYZ')
  C@company(description $ 'XYZ')
  C@company(business_code $ 'XYZ')
  C@company(partner.partners $ 'XYZ')
  C@company(competitor.competitors $ 'XYZ')
The multiple rules are optimized by sharing common parts of the RETE network.
  The DEVICE system takes care of that.
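The three translation steps for generalized paths (seed with the immediate elements, extend each path recursively, emit a concrete path when no nested elements remain) can be sketched in Python as a worklist over the schema. The XML_SEQ dictionary stands in for the xml_seq meta-class, and joining the accumulated list with dots is an assumption about what create_path does; both are illustrative only.

```python
# Stand-in for the xml_seq meta-class: element -> ordered sub-elements.
XML_SEQ = {
    "company": ["name", "ticker_symbol", "description",
                "business_code", "partners", "competitors"],
    "partners": ["partner"],
    "competitors": ["competitor"],
}


def concrete_paths(root, xml_seq):
    """Enumerate all concrete paths below root, innermost element first.

    Assumption: the schema is acyclic; a recursive DTD would make this
    simple worklist run forever.
    """
    # Step 1: seed with the immediate elements of the root class.
    pending = [(child, [child]) for child in xml_seq[root]]
    paths = []
    while pending:
        elem, path = pending.pop(0)
        if elem in xml_seq:
            # Step 2: extend the path-so-far, like path:[X4|X3].
            pending.extend((child, [child] + path) for child in xml_seq[elem])
        else:
            # Step 3: no nested elements, so emit a concrete path.
            paths.append(".".join(path))
    return paths


PATHS = concrete_paths("company", XML_SEQ)
```

Because new elements are prepended to the path list, the joined result comes out in the inverse notation used by the rules, for example partner.partners.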
16
X-DEVICE Language
Ordering Expressions
W3C TEXT Use Case, Query 5
  For each news item that is relevant to the "Gorilla Corp", create an "item summary" element. The content of the item summary is the content of the title, date, and first paragraph of the news item.
  if N@news_item(*.content $ 'Gorilla Corp', par.content 1 PAR, title:T, date:D)
  then item_summary(title:T, date:D, par:PAR)
17
X-DEVICE Language
Translation of Ordering
Collect all the paragraphs that satisfy the condition.
Store them in a list of an auxiliary derived class.
  if N@news_item(*.content $ 'Gorilla Corp', par.content X1, title:T, date:D)
  then tmp_elem1(tmp_var1:T, tmp_var2:D, tmp_obj:list(X1))
18
X-DEVICE Language
Translation of Ordering
Isolate the sub-list of paragraphs that satisfies the ordering expression 1.
There is one Prolog goal for each ordering expression.
  if X3@tmp_elem1(tmp_var1:T, tmp_var2:D, tmp_obj:X1) and prolog{length(X2,1), append(X2,_,X1)}
  then tmp_elem2(tmp_var1:T, tmp_var2:D, tmp_obj:X2)
19
X-DEVICE Language
Translation of Ordering
Iterate over all qualifying results and return them into the target element.
  if X1@tmp_elem2(tmp_var1:T, tmp_var2:D, tmp_obj PAR)
  then item_summary(title:T, date:D, par:PAR)
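The three-step translation of the ordering expression can be mirrored in Python as a short pipeline. This is a sketch under simplifying assumptions: each news item is a dictionary whose content is a plain list of paragraph strings, and the helper names take_prefix and item_summaries are invented for the example.

```python
def take_prefix(items, n):
    """Mirror prolog{length(X2, n), append(X2, _, X1)}.

    X2 is the length-n prefix of X1; the goal fails (None here) when X1
    has fewer than n elements.
    """
    if len(items) < n:
        return None
    return items[:n]


def item_summaries(news_items, needle, n=1):
    # Step 1: collect the paragraphs of every item that mentions needle.
    tmp_elem1 = [
        (item["title"], item["date"], list(item["content"]))
        for item in news_items
        if any(needle in par for par in item["content"])
    ]
    # Step 2: isolate the sub-list selected by the ordering expression.
    tmp_elem2 = []
    for title, date, pars in tmp_elem1:
        prefix = take_prefix(pars, n)
        if prefix is not None:
            tmp_elem2.append((title, date, prefix))
    # Step 3: iterate over the sub-list and build the target elements.
    return [
        {"title": title, "date": date, "par": par}
        for title, date, prefix in tmp_elem2
        for par in prefix
    ]


# Hypothetical news items.
NEWS = [
    {"title": "A", "date": "d1",
     "content": ["Intro paragraph.", "Gorilla Corp announced a product."]},
    {"title": "B", "date": "d2", "content": ["Nothing relevant here."]},
]
SUMMARIES = item_summaries(NEWS, "Gorilla Corp")
```

For item A the summary carries the first paragraph, even though the match occurs in the second one, which is what Query 5 asks for.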
20
X-DEVICE Language
Building Result Documents
The top-level element of the XML result document is identified with the keyword xml_result.
The DTD of the result document is identified through object references.
W3C TEXT Use Case, Query 2
  Find news items where the "Foo Corp" company and one or more of its partners are mentioned in the same paragraph and/or title.
  List each news item by its title and date.
21
X-DEVICE Language
Building Result Documents
Find the "Foo" company and iterate over its partners.
For each partner, iterate over news items and search for "Foo" and its partner inside the title of the same news item.
  if C@company(name='Foo Corp', partner.partners P) and N@news_item(title:T $ 'Foo Corp' & $ P, date:D)
  then xml_result(news_item1(title:T, date:D))
22
X-DEVICE Language
Building Result Documents
Find the "Foo" company and iterate over its partners.
For each partner, iterate over news items and search for "Foo" and its partner inside the nested paragraphs of the same item.
  if C@company(name='Foo Corp', partner.partners P) and N@news_item(*.par.content $ 'Foo Corp' & $ P, title:T, date:D)
  then news_item1(title:T, date:D)
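Taken together, the two Query 2 rules (one for titles, one for nested paragraphs) compute roughly what the following Python sketch does. The data layout, the function name news_mentioning_partners and the choice to emit each news item at most once are assumptions for illustration; they are not taken from the X-DEVICE system.

```python
def news_mentioning_partners(companies, news_items, company_name):
    """List title and date of items naming the company and a partner together.

    "Together" means inside the same title or the same paragraph.
    """
    partners = companies[company_name]["partners"]
    results = []
    for item in news_items:
        texts = [item["title"]] + list(item["paragraphs"])
        for partner in partners:
            if any(company_name in text and partner in text for text in texts):
                results.append({"title": item["title"], "date": item["date"]})
                break  # assumption: report each news item only once
    return results


# Hypothetical companies and news items.
COMPANIES = {"Foo Corp": {"partners": ["Bar Inc", "Baz Ltd"]}}
NEWS_ITEMS = [
    {"title": "Foo Corp and Bar Inc merge", "date": "2001-01-01",
     "paragraphs": []},
    {"title": "Markets", "date": "2001-01-02",
     "paragraphs": ["Foo Corp rose.", "Bar Inc fell."]},
    {"title": "Deals", "date": "2001-01-03",
     "paragraphs": ["Baz Ltd signs with Foo Corp."]},
]
RESULT = news_mentioning_partners(COMPANIES, NEWS_ITEMS, "Foo Corp")
```

The second item is excluded because the company and its partner appear in different paragraphs, which mirrors the conjunction of the two string searches on the same element.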
23
X-DEVICE Language
Building Result Documents
  <!DOCTYPE news_item1 [
    <!ELEMENT news_item1 (title, date)>
  ]>
The structure of the title and date elements is automatically determined by the type of the corresponding rule variables.
24
Advantages of X-DEVICE
Logic-based query languages have:
  well-understood mathematical properties
  declarative nature
  advanced optimization techniques (magic-sets)
X-DEVICE compared to XQuery (functional):
  more high-level, declarative syntax
  more compact and comprehensible
  general path expressions due to fixpoint semantics and second-order variables
25
Advantages of X-DEVICE
Users can express complex XML document views.
  Information customization for e-commerce, e-learning, etc.
X-DEVICE offers multiple knowledge representation formalisms.
  Deductive, Production, and Active rules
  Structured objects
Production and Active rules can be used to update XML documents.
All the above can play an important role as an infrastructure for the Semantic Web.
26
Intelligent Querying of Web Documents Using a Deductive XML Repository
Nick Bassiliades, Ioannis Vlahavas
Dept. of Informatics, Aristotle University of Thessaloniki
X-DEVICE site: www.csd.auth.gr/~lpis/systems/x-device.html