TOSS: An Extension of TAX with Ontologies and Similarity Queries Edward Hung, Yu Deng, V.S. Subrahmanian Department of Computer Science University of Maryland,

1 TOSS: An Extension of TAX with Ontologies and Similarity Queries Edward Hung, Yu Deng, V.S. Subrahmanian Department of Computer Science University of Maryland, College Park SIGMOD, Paris, France, June, 2004

2 Outline Introduction Ontologies and Integration Similarity Enhanced Ontology (SEO) TOSS Algebra Implementation and Experiments Related Work

3 Introduction [Jagadish et al., TAX: A Tree Algebra for XML, in DBPL, 2001] one of the best algebras developed for XML databases

4 DBLP SIGMOD Problems!

5 Problems Lack of lexical semantics in answering queries. Find papers written by “J. Ullman”: J.D. Ullman? Jeffrey Ullman? Find papers with at least one author from “U.S. government”: U.S. Census Bureau? U.S. Army? High precision, poor recall. Quality = (recall × precision)^(1/2)
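The quality measure on this slide can be illustrated with a minimal sketch. The function name `quality` and the paper identifiers are hypothetical; only the formula (the square root of recall times precision) comes from the slide.

```python
from math import sqrt


def quality(returned: set, relevant: set) -> float:
    """Square root of precision * recall, per the slide's definition."""
    if not returned or not relevant:
        return 0.0  # assumption: define quality as 0 when either set is empty
    hits = len(returned & relevant)
    precision = hits / len(returned)
    recall = hits / len(relevant)
    return sqrt(precision * recall)


# Hypothetical data: exact matching on "J. Ullman" returns one correct paper
# out of four that are actually relevant (high precision, poor recall).
relevant = {"p1", "p2", "p3", "p4"}
returned = {"p1"}
print(quality(returned, relevant))  # precision 1.0, recall 0.25
```

Because the measure is a geometric mean, a query with perfect precision still scores poorly when recall is low, which is the situation the slide describes.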

6 Our approach Goal: extend and enhance the semantics of TAX to return high-quality answers using ontologies and similarity measures. 1. Capture inter-term lexical relationships with an ontology and integrate the ontologies of different DBs. 2. Use existing similarity measures to enhance the integrated ontology. 3. TOSS: extend the TAX algebra to query with ontology and similarity.

7 Motivating Examples and TAX: DBLP and SIGMOD bibliographies in XML. TAX operators: selection, projection, product

8 DBLP

9 Pattern tree Selection

10 Pattern tree Selection

11 Pattern tree Selection

12 Pattern tree Projection

13 Product The product of two instances (two sets of trees) contains, for each pair of trees (one from each instance), a tree whose root is a new node (called tax_prod_root).
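A minimal sketch of the product described on this slide. The `Node` type and the function name `tax_product` are hypothetical, and attaching the two input trees as the left and right children of the new root is an assumption; the slide only states that each result tree has a new root called tax_prod_root.

```python
from itertools import product as cartesian
from typing import List, NamedTuple, Tuple


class Node(NamedTuple):
    """A toy ordered tree: a tag plus a tuple of child nodes."""
    tag: str
    children: Tuple["Node", ...] = ()


def tax_product(left: List[Node], right: List[Node]) -> List[Node]:
    # One output tree per pair of input trees, joined under a fresh root.
    # Assumption: the pair becomes the root's two children, in order.
    return [Node("tax_prod_root", (a, b)) for a, b in cartesian(left, right)]


dblp = [Node("inproceedings"), Node("article")]
sigmod = [Node("paper")]
for tree in tax_product(dblp, sigmod):
    print(tree.tag, [child.tag for child in tree.children])
```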

14 DBLP SIGMOD Problems!

15 Architecture

16

17 Ontology a set S: S = {article, author, title}; a partially ordered set (S, ≤_S) for the part_of relation: ≤_S = {(author, article), (title, article), (article, article), (author, author), (title, title)}; a hierarchy (H, ≤_H) is the Hasse diagram for (S, ≤_S): a DAG with a minimal set of edges s.t. there is a path from u to v iff u ≤_S v; ≤_H = {(author, article), (title, article)}
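A minimal sketch of how the hierarchy edges could be derived from the part_of ordering on this slide. It assumes the ordering is given as a transitively closed set of pairs; `hasse_edges` is a hypothetical helper name, not part of TOSS.

```python
def hasse_edges(order):
    """Return the edges of the Hasse diagram of a partial order.

    `order` is a set of (u, v) pairs meaning u <= v, assumed to be
    reflexive and transitively closed.
    """
    strict = {(u, v) for (u, v) in order if u != v}  # drop reflexive pairs
    elements = {x for pair in order for x in pair}
    # Keep (u, v) only if no intermediate w satisfies u < w < v.
    return {
        (u, v)
        for (u, v) in strict
        if not any((u, w) in strict and (w, v) in strict for w in elements)
    }


part_of = {
    ("author", "article"), ("title", "article"),
    ("article", "article"), ("author", "author"), ("title", "title"),
}
print(sorted(hasse_edges(part_of)))
```

For the slide's example the reflexive pairs disappear and the two remaining pairs are exactly the hierarchy edges; a pair implied by transitivity (for instance a hypothetical name ≤ author ≤ article) would also be dropped.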

18 Ontology Suppose Σ is some finite set of strings and S is some set. An ontology w.r.t. Σ is a partial mapping Θ from Σ to hierarchies for S. Σ = {part_of}, Θ(part_of) = (H, ≤_H)

19 Ontology Suppose Σ is some finite set of strings and S is some set. An ontology w.r.t. Σ is a partial mapping Θ from Σ to hierarchies for S. S = {article, author, title}, Σ = {part_of}, ≤_H = {(author, article), (title, article)}, Θ(part_of) = (H, ≤_H)

20 Ontology Integration SIGMOD DBLP

21 Ontology Integration SIGMOD DBLP IC (interoperation constraints)

22 Ontology Integration Hierarchy graph associated with SIGMOD and DBLP

23 Ontology Integration Fusion of ontologies of SIGMOD and DBLP

24 Architecture

25 Similarity Enhanced Ontology A string similarity measure d_S is any function that takes two strings X, Y and returns a non-negative real number such that ∀X, d_S(X,X) = 0 and ∀X,Y, d_S(X,Y) = d_S(Y,X). Any string similarity measure can be used. For example: Levenshtein distance, which assigns a unit cost to every edit operation: d_S(“relation”, “relational”) = 2
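The unit-cost edit distance named on this slide can be sketched with the standard dynamic-programming recurrence. This is a generic textbook implementation, not the code used in the TOSS system.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]


print(levenshtein("relation", "relational"))  # the slide's example: 2
```

The function satisfies both properties required of a string similarity measure: the distance from a string to itself is 0, and the distance is symmetric.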

26 Similarity Enhanced Ontology A similarity measure is any function that takes nodes A, B as input and returns a non-negative real number such that d(A,B) = min_{X ∈ S, Y ∈ T} d_S(X,Y), where d_S is a string similarity measure and S, T are the sets of strings contained in nodes A, B.
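A minimal sketch of the node-level measure defined on this slide, using Levenshtein distance as the string measure d_S. Representing a node as a plain set of strings and the names `node_distance` and `similar` are assumptions made for illustration; treating "similar" as distance at most the threshold is also an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance, used here as the string measure d_S."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def node_distance(s: set, t: set) -> int:
    """d(A, B): minimum string distance over all pairs from the two nodes."""
    return min(levenshtein(x, y) for x in s for y in t)


def similar(s: set, t: set, eps: int) -> bool:
    # Assumption: two nodes count as similar when d(A, B) <= eps.
    return node_distance(s, t) <= eps


a = {"relation", "table"}
b = {"relational", "DB"}
print(node_distance(a, b), similar(a, b, 2), similar(a, b, 1))
```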

27 Similarity Enhanced Ontology Suppose H is an integrated hierarchy, d is a similarity measure and ε ≥ 0. (H’, μ) is a similarity enhancement of H w.r.t. d, ε iff H’ is a hierarchy and μ is a function from H to 2^H’ such that: (1) the original partial orderings in H are preserved, and no unwarranted orderings are included; (2) all nodes mapped into the same node are similar to each other (by the threshold ε); (3) two strings are similar iff they are jointly present in some node in (H’, μ); (4) there is no redundant node whose string set is a subset of that of some other node.

28 Similarity Enhanced Ontology An example ontology and its similarity enhancement

29 Similarity Enhanced Ontology (H, d, ε) is similarity consistent iff there exists a similarity enhancement of H w.r.t. d, ε. Theorem: If (H, d, ε) is similarity consistent, then all similarity enhancements of H are equivalent.

30 Architecture

31 TOSS Algebra A simple selection condition has the form X op Y, where op ∈ { =, ≠, ≤, ≥, <, >, ~, instance_of, isa, part_of, subtype_of, above, below } and X, Y are terms, i.e., attributes (tag, content), types, or typed values v:τ with v ∈ dom(τ). A selection condition is a simple selection condition or a conjunction/disjunction of two selection conditions.

32 TOSS Algebra The pattern tree to find the titles of all papers in DBLP related to Microsoft (independently of the field in which Microsoft appears): #1.tag = inproceedings & #2.tag = title & #3.tag part_of inproceedings & #3.content ~ “Microsoft”
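A rough sketch of how the conditions of this pattern tree might be evaluated over a small XML fragment. Everything here is an illustrative assumption rather than the TOSS implementation: the pattern edges (nodes #2 and #3 taken as children of #1), the hard-coded `PART_OF_INPROCEEDINGS` set standing in for the ontology, the sample data, and the reading of `~` as "some whitespace-separated token is within edit distance eps of the target".

```python
import xml.etree.ElementTree as ET

# Assumption: tags that the ontology would report as part_of inproceedings.
PART_OF_INPROCEEDINGS = {"author", "title", "booktitle", "year"}


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def similar(content: str, target: str, eps: int) -> bool:
    # Stand-in for "~": any token within edit distance eps of the target.
    return any(levenshtein(tok, target) <= eps for tok in content.split())


def titles_related_to(xml_text: str, target: str, eps: int) -> list:
    root = ET.fromstring(xml_text)
    titles = []
    for paper in root.iter("inproceedings"):       # #1.tag = inproceedings
        title = paper.find("title")                 # #2.tag = title
        if title is None:
            continue
        for field in paper:                         # candidate bindings for #3
            if (field.tag in PART_OF_INPROCEEDINGS  # #3.tag part_of inproceedings
                    and similar(field.text or "", target, eps)):  # #3.content ~ target
                titles.append(title.text)
                break
    return titles


SAMPLE = """<dblp>
  <inproceedings><author>Microsft Research</author><title>Query Tuning</title></inproceedings>
  <inproceedings><author>J. Ullman</author><title>Datalog</title></inproceedings>
</dblp>"""

print(titles_related_to(SAMPLE, "Microsoft", 1))
print(titles_related_to(SAMPLE, "Microsoft", 0))
```

With a threshold of 0 the misspelled affiliation is missed, which mirrors the exact-match behaviour the earlier slides attribute to TAX; a threshold of 1 recovers it.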

33 TOSS Algebra To ensure that an embedding is correct w.r.t. a semistructured DB with an associated similarity enhanced ontology: we define a selection condition to be well-typed if X and Y have a least common supertype τ and there exists a function to convert their types to τ; we define (1) the type and value of a term w.r.t. a mapping h, and (2) the satisfaction of a selection condition. We extend the following algebraic operations: selection, projection, product, union, intersection, difference.

34 Implementation and Experiments TOSS system implemented in Java, built on top of the Xindice DBMS. Experiments: recall and precision; scalability (selection, join)

35 Recall and Precision (plot legend: TAX; X = TOSS (ε=2); + = TOSS (ε=3))

36 Quality of Answers
Query   TAX    TOSS (ε=2)   TOSS (ε=3)
1       0.3    0.69         0.86
2       0.31   0.61         0.93
3       0.32   0.64         0.54
4       0.52   0.74         0.79
5       0.56                0.96
6       0.59   0.69         0.97
7       0.61   0.69         1
8       0.67   0.79         0.85
9       0.71   0.69         0.75
10      1      1            1
11      1      1            1
12      1      1            1

37 Quality of Answers (plot legend: TAX; X = TOSS (ε=2); + = TOSS (ε=3)) Quality = (recall × precision)^(1/2)

38

39

40

41 Related Work Wiederhold et al. [ICOT’94, EDBT’00, …]: ontology algebra (LISP-style logical statements). IC (interoperation constraints) are not considered. A concept similar to IC is considered in EDBT’00, but their integrated ontologies were not concise. In addition, we deal with XML documents.

42 Related Work [Jagadish et al., TAX: A Tree Algebra for XML, in DBPL, 2001]: algebra to query XML documents; ontology is not used. [Al-Khalifa et al., Querying structured text in an XML database, in SIGMOD 2003]: IR-style queries to find relevant results, with weighting and ranking support at run time. We use ontologies and similarity measures; we consider integration of ontologies and precompute the SEO.

43 Questions and Answers Thank you very much!

