Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the oMAP Framework. Wednesday 14th of June, 2006.


1 Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the oMAP Framework
Raphaël Troncy, Umberto Straccia
Raphael.Troncy@cwi.nl
Wednesday 14th of June, 2006

2 Motivation
- Various SW repositories, using different vocabularies, are distributed on the web
- Large amounts of data are already out there: Swoogle hits 1.5M unique Semantic Web documents (05/06/2006)
- Problem: how to search and retrieve information in such an environment?

3 Example scenario
- Images: Kim Clijsters (courtesy of AFP); Montenegro independence (courtesy of Euronews)
- NewsML, SportsML, EventsML, iCalendar, TimeML, ...
- EXIF, EBU P/Meta, MXF, MPEG-7

4 Distributed Search in the SW
- Resource selection: select a subset of relevant resources
- Query reformulation: reformulate the information need into the vocabulary used by each resource
- Data fusion and rank aggregation: merge and rank all the results together

5 Resource Selection
- Compute an approximation of the content of each resource
- For some random queries, an approximation consists of:
  - The ontology the resource relies on
  - Some instances (sampling annotated documents)

6 Query reformulation
- Transformation rules: from the query vocabulary to the vocabularies used by the resources
- Semantic Web: ontology alignment
  - Establishing the relationships holding between the entities (subsumption, equivalence, disjointness, …)
  - With a confidence measure
  - Automatically computed → an ontology alignment framework

7 oMAP: Ontology Alignment Tool

8 oMAP: A Formal Framework
- Sources of inspiration:
  - Formal work in data exchange [Fagin et al., 2003]
  - GLUE: combining several specialized components for finding the best set of mappings [Doan et al., 2003]
- Notation: a mapping is a triple M = (T, S, Σ)
  - S and T are the source and target ontologies
  - S_i is an OWL entity (class, datatype property, object property) of the ontology S
  - Σ is a set of mapping rules: α_ij T_j ← S_i

9 oMAP: Combining Classifiers
- Weight of a mapping rule: α_ij = w(S_i, T_j, Σ)
- Using different classifiers: w(S_i, T_j, CL_k) is the classifier's approximation of the rule T_j ← S_i
- Combining the approximations:
  - Use of a priority list: CL_1, CL_2, …, CL_n
  - Weighted average of the classifiers' predictions
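
The two combination strategies named on this slide can be sketched as follows. This is a minimal illustration, not the oMAP implementation: the two toy classifiers, the function names, and the rule that a classifier "abstains" when its weight is at or below a threshold are all assumptions made for the example.

```python
from typing import Callable, Sequence

# A classifier maps a (source entity, target entity) pair to a weight in [0, 1].
Classifier = Callable[[str, str], float]


def exact_name(source: str, target: str) -> float:
    """Toy classifier (hypothetical): 1.0 when the names match ignoring case."""
    return 1.0 if source.lower() == target.lower() else 0.0


def shared_prefix(source: str, target: str) -> float:
    """Toy classifier (hypothetical): common-prefix length over the longer name."""
    a, b = source.lower(), target.lower()
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / max(len(a), len(b), 1)


def combine_priority(source: str, target: str,
                     classifiers: Sequence[Classifier],
                     threshold: float = 0.0) -> float:
    """Return the weight of the first classifier, in priority order, above the threshold.

    Assumption: a classifier abstains when its weight is <= threshold.
    """
    for classifier in classifiers:
        weight = classifier(source, target)
        if weight > threshold:
            return weight
    return 0.0


def combine_weighted(source: str, target: str,
                     classifiers: Sequence[Classifier],
                     weights: Sequence[float]) -> float:
    """Return the weighted average of all classifier predictions."""
    total = sum(weights)
    return sum(w * cl(source, target) for cl, w in zip(classifiers, weights)) / total
```

With the priority list, `combine_priority("Author", "Authors", [exact_name, shared_prefix])` falls through to the second classifier because the first one returns 0; with the weighted average, both predictions always contribute.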

10 Terminological Classifiers
- Same entity names (or URIs)
- Same entity name stems
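
A rough sketch of these two name-based tests is given below. The slides do not say which stemmer is used, so the suffix stripping here is a deliberately crude stand-in, and the helper names and example URIs are hypothetical.

```python
def local_name(uri: str) -> str:
    """Return the part of a URI after '#' or, failing that, after the last '/'."""
    for separator in ("#", "/"):
        if separator in uri:
            uri = uri.rsplit(separator, 1)[1]
    return uri


def crude_stem(name: str) -> str:
    """Very rough stemmer (assumption): lower-case and strip one common suffix."""
    word = name.lower()
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def same_name(source_uri: str, target_uri: str) -> float:
    """1.0 when both entities have the same local name, else 0.0."""
    return 1.0 if local_name(source_uri) == local_name(target_uri) else 0.0


def same_stem(source_uri: str, target_uri: str) -> float:
    """1.0 when the local names share the same (crude) stem, else 0.0."""
    source_stem = crude_stem(local_name(source_uri))
    target_stem = crude_stem(local_name(target_uri))
    return 1.0 if source_stem == target_stem else 0.0
```

The stem test is the more permissive of the two: it still fires when one ontology names a class `Author` and the other `Authors`.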

11 Terminological Classifiers
- String distance name
- Iterative substring matching
- See [Stoilos et al., ISWC'05]
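
Iterative substring matching can be illustrated by repeatedly removing the longest common substring from both names and summing the matched lengths. The sketch below covers only this "commonality" part; the metric of Stoilos et al. also includes difference and Winkler-style terms, which are left out here. Function names and the lower-casing step are assumptions.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous substring shared by a and b (first one on ties)."""
    best_length, best_end = 0, 0
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                current[j] = previous[j - 1] + 1
                if current[j] > best_length:
                    best_length, best_end = current[j], i
        previous = current
    return a[best_end - best_length:best_end]


def commonality(source: str, target: str) -> float:
    """Share of both names covered by iteratively matched common substrings."""
    a, b = source.lower(), target.lower()
    total_length = len(a) + len(b)
    if total_length == 0:
        return 0.0
    matched = 0
    while True:
        common = longest_common_substring(a, b)
        if not common:
            break
        matched += len(common)
        # Remove the matched substring once from each name, then look again.
        a = a.replace(common, "", 1)
        b = b.replace(common, "", 1)
    return 2 * matched / total_length
```

For `hasAuthor` and `authorName`, the first pass matches `author`, the second matches the single letter `a`, and nothing is left to match after that.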

12 Terminological Classifiers
- WordNet distance name
- lcs is the longest common substring between S_i and T_j
- sim =

13 Machine Learning-Based Classifiers
- Collecting a bag of words:
  - label for the named individuals
  - data value for the datatype properties
  - type for the anonymous individuals and the range of object properties
  - …
- Recursion on the OWL definition: depth parameter
- Use statistical methods on the collected bag of words

14 Machine Learning-Based Classifiers: Example
Individual (x1 type (Conference) value (label "European SW Conf") value (location x2))
Individual (x2 type (Address) value (city "Budva") value (country "Montenegro"))
u1 = ("European SW Conf", "Address")
u2 = ("Address", "Budva", "Montenegro")
- Naïve Bayes text classifier
- kNN text classifier
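
To make the Naïve Bayes step concrete, the sketch below scores the bag u2 from this slide against two target entities. The target-side training bags, the entity names `Event` and `Place`, the uniform prior, and the Laplace smoothing are assumptions chosen for illustration; the slides do not give these details.

```python
import math
from collections import Counter


def tokens(values):
    """Lower-case and split the collected strings into a bag of words."""
    return " ".join(values).lower().split()


def train(bags_by_entity):
    """Count words per target entity; returns (vocabulary, per-entity counts)."""
    counts = {entity: Counter(bag) for entity, bag in bags_by_entity.items()}
    vocabulary = set().union(*counts.values())
    return vocabulary, counts


def classify(bag, model):
    """Return P(entity | bag) under a uniform prior with Laplace smoothing.

    Words never seen during training are ignored.
    """
    vocabulary, counts = model
    known_words = [word for word in bag if word in vocabulary]
    log_scores = {}
    for entity, word_counts in counts.items():
        total = sum(word_counts.values())
        log_scores[entity] = sum(
            math.log((word_counts[word] + 1) / (total + len(vocabulary)))
            for word in known_words
        )
    top = max(log_scores.values())
    unnormalised = {e: math.exp(s - top) for e, s in log_scores.items()}
    norm = sum(unnormalised.values())
    return {e: value / norm for e, value in unnormalised.items()}


# Toy target-side bags (hypothetical data, not from the slides).
model = train({
    "Event": tokens(["International Semantic Web Conf", "Workshop"]),
    "Place": tokens(["Address", "Podgorica", "Montenegro"]),
})
u2 = tokens(["Address", "Budva", "Montenegro"])  # bag u2 from the slide
probabilities = classify(u2, model)
```

The resulting probability for each target entity could then serve as that classifier's weight for the corresponding mapping rule.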

15 Structural and Semantics-Based Classifier
- Σ is a set of mapping rules: α_ij T_j ← S_i
- Σ sets are computed by taking the OWL definition of the entities to align
  - recursively in the OWL structure...
  - without looping, thanks to cycle detection
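
The "recursion without looping" idea can be sketched as a depth-first walk over entity definitions that remembers what it has already visited. This shows only the cycle guard, not the weight computation; the dictionary encoding of definitions and the entity names are hypothetical.

```python
def reachable_definitions(entity, definitions, visited=None):
    """List the entities reached from `entity` through its definition, each once.

    `definitions` maps an entity name to the names used in its definition.
    An entity that was already visited is skipped, which breaks cycles.
    """
    if visited is None:
        visited = []
    if entity in visited:
        return visited  # cycle detected or entity already handled
    visited.append(entity)
    for referenced in definitions.get(entity, ()):
        reachable_definitions(referenced, definitions, visited)
    return visited


# Toy definitions with a cycle: Conference -> Address -> Conference.
definitions = {
    "Conference": ["Event", "Address"],
    "Address": ["Conference"],
    "Event": [],
}
order = reachable_definitions("Conference", definitions)
```

Without the `visited` check, the walk from `Conference` would bounce between `Conference` and `Address` indefinitely.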

16 Structural and Semantics-Based Classifier
- If S_i and T_j are property names:
- If S_i and T_j are concept names [1]:
[1] Where D = D(S_i) × D(T_j); D(S_i) represents the set of concepts that are direct parents of S_i

17 Structural and Semantics-Based Classifier
- Let C_S = (Q R.C) and D_T = (Q' R'.D), then [1]:
- Let C_S = (op C_1 … C_m) and D_T = (op' D_1 … D_m), then [2]:
[1] Where Q, Q' are quantifiers, R, R' are property names and C, D are concept expressions
[2] Where op, op' are concept constructors and n, m ≥ 1

18 Evaluation
- OAEI Contests (2004, 2005, 2006): http://oaei.ontologymatching.org/
- Systematic benchmark tests on bibliographic data
  - Tests 2xx: aligning an ontology with variations of itself, where each OWL construct is discarded or modified one at a time
  - Tests 3xx: four real bibliographic ontologies
- Web categories alignment
- http://oaei.ontologymatching.org/2005/results/

19 Benchmark Tests

Precision            Recall
dublin20   0.92      Falcon     0.89
Falcon     0.91      OLA        0.74
FOAM       0.90      dublin20   0.72
oMAP       0.85      FOAM       0.69
CMS        0.81      oMAP       0.68
OLA        0.80      edna       0.61
ctxMatch   0.72      ctxMatch   0.20
edna       0.45      CMS        0.18

- oMAP: 4th with the global F-Measure
- 1st on 3xx tests (real ontologies to align)

20 Aligning Web Categories
- Aligning Google, Looksmart and Yahoo web categories [Avesani et al., ISWC'05]
- Blind tests: only recall results are available

System     Recall
ctxMatch    9.4%
FOAM       11.9%
CMS        14.1%
Dublin20   26.5%
Falcon     31.2%
OLA        32.0%
oMAP       34.4%

21 Distributed Search in the SW
- Q: retrieve university course material dealing with the history of the Americas and "Columbus"
- query(d) <- History_Americas(d, "Columbus") is rewritten as two queries:
  - 0.63 query(d) <- Latin_American_History(d, "Columbus")
  - 0.84 query(d) <- American_History(d, "Columbus")
- Each document score is then multiplied by the confidence score of the rule.
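
The rewriting and score-weighting step on this slide can be sketched as below, using the two rule confidences shown (0.63 and 0.84). The document identifiers and their retrieval scores are made up, and keeping the best weighted score when a document is returned by several rewritten queries is an assumption; the slides only state that each score is multiplied by the rule confidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRule:
    """A weighted rule `confidence target <- source`."""
    target: str
    source: str
    confidence: float


def reformulate(predicate, keyword, rules):
    """Rewrite predicate(d, keyword) into one weighted query per matching rule."""
    return [(rule.confidence, rule.target, keyword)
            for rule in rules if rule.source == predicate]


def fuse(weighted_results):
    """Merge result lists, multiplying each score by its rule confidence.

    `weighted_results` is a list of (confidence, [(doc_id, score), ...]).
    Assumption: a document found several times keeps its best weighted score.
    """
    merged = {}
    for confidence, hits in weighted_results:
        for doc_id, score in hits:
            merged[doc_id] = max(confidence * score, merged.get(doc_id, 0.0))
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)


rules = [
    MappingRule("Latin_American_History", "History_Americas", 0.63),
    MappingRule("American_History", "History_Americas", 0.84),
]
rewrites = reformulate("History_Americas", "Columbus", rules)

# Hypothetical hits returned by the two resources for the rewritten queries.
hits_by_predicate = {
    "Latin_American_History": [("d1", 0.9), ("d2", 0.5)],
    "American_History": [("d3", 0.8), ("d1", 0.4)],
}
ranked = fuse([(confidence, hits_by_predicate[predicate])
               for confidence, predicate, _ in rewrites])
```

Here `d3` ends up ahead of `d1` even though `d1` had the higher raw score, because the rule that retrieved `d3` carries the higher confidence.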

22 Conclusion
- Distributed Search in the SW: resource selection / query reformulation / data fusion and rank aggregation
- oMAP: a formal framework for automatically aligning OWL ontologies
- Combining several specific classifiers:
  - Terminological classifiers
  - Machine learning-based classifiers
  - Structural and semantics-based classifier

23 Future Work
- Implementing the three steps proposed
  - Keyword-based or structured (SPARQL) queries
  - Ranked list of results
- oMAP
  - Using additional classifiers: KL-distance, other resources, background knowledge, etc.
    - Straightforward theoretically but practically difficult!
  - Finding complex alignments: name = firstName + lastName
  - OWL and rule-based languages: take into account this additional expressivity

24 Any questions?
http://www.cwi.nl/~troncy/oMAP/

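26 Structural and Semantics-Based Classifier
Possible values for the w_op and w_Q weights

w_op   ⊓     ⊔     ¬
⊓      1     1/4   0
⊔            1     0
¬                  1

w_Q: table over quantifiers and number restrictions (n, m); the quantifier symbols are not preserved in this transcript. Surviving values: 1, 1, 1, 1/3, 1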

