Indexing Source Descriptions based on Defined Classes
Ralph Lange, Frank Dürr, Kurt Rothermel
Institute of Parallel and Distributed Systems (IPVS), Universität Stuttgart

Universität Stuttgart, Institute of Parallel and Distributed Systems (IPVS), Universitätsstraße, Stuttgart, Germany
Collaborative Research Center 627

2 Motivation
[Figure: a query (SELECT … FROM) posed against many sources]
Which entities are described by a source? Which information is given about the entities?
Heterogeneous information systems (HIS)
◦ Areas: logistics, finance, context management, …
◦ Types: FDBMS, mediator-based IS, PDMS
Problem: source discovery in large HIS
◦ Schema mappings only give coarse descriptions
1. Formalism for concise source descriptions
2. Index structure for their efficient retrieval
Focus: ontology-based HIS

3 Motivation (2)
Example: scalable context management (e.g., Nexus)
◦ Millions of providers of sensor data, maps, 3D building models, street maps, …
Well-known idea: exclude sources from processing a query by means of constraints
Contributions
1. Advanced description formalism based on defined classes
▪ Alternative descriptions, constraints on relations, …
2. Adjustable matching semantics
3. Source Description Class Tree (SDC-Tree)
[Figure: example sources described by location = 44 Gt Russell St, London, UK; location = Berlin, Germany; and name = “Pergamon Museum”]

4 Overview
◦ Motivation
◦ Description formalism
◦ Matching
◦ Source Description Class Tree (SDC-Tree)
◦ Evaluation
◦ Summary

5 Describing Sources
Assumption: a (simple) shared ontology
◦ Classes $C_i$, attributes $a_j$, relations $r_k$
Sources provide information about coherent clippings of the domain of discourse
◦ Their entities share characteristic properties, which can be characterized by a defined class
◦ Relations are resolved recursively
◦ Alternative defined classes can be differentiated, which requires expert knowledge
$D_1$ = 〈 BuildingPart : location ∈ {44 Gt Russell St, London, UK} 〉
$D_2$ = 〈 BuildingPart : partOf ∈ 〈 Museum : name ∈ {“British Museum”} 〉 〉

6 Definition of Defined Classes
Formal definition: $D = \langle C : a_1 \in X_1 \land a_2 \in X_2 \land \dots \land r_1 \in D_1 \land r_2 \in D_2 \land \dots \rangle$
◦ $\mathrm{Base}(D)$ returns $C$
◦ $\mathrm{isCon}_{a_i}(D)$ returns whether $D$ has a constraint on $a_i$
◦ $\mathrm{Con}_{a_i}(D)$ returns the constraint range for $a_i$, i.e., $\mathrm{Con}_{a_i}(D) = X_i \subseteq \mathrm{Rng}(a_i)$
◦ Of course, $\mathrm{Dom}(a_i) \succeq C$
◦ The same holds for relations $r_j$
⇒ Expressive and self-contained
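As an illustration, the formal definition can be mirrored by a small Python data structure. This is a minimal sketch under our own assumptions: the class `DefinedClass` and its methods `is_con` and `con` are hypothetical stand-ins for Base, isCon, and Con, and attribute ranges are modeled as finite sets of strings instead of the concrete domains used in the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, FrozenSet


@dataclass
class DefinedClass:
    """Hypothetical sketch of D = 〈 C : a_1 ∈ X_1 ⋀ ... ⋀ r_1 ∈ D_1 ⋀ ... 〉."""

    base: str  # Base(D) = C
    # a_i -> X_i ⊆ Rng(a_i); ranges simplified to finite sets of strings (assumption)
    attr_cons: Dict[str, FrozenSet[str]] = field(default_factory=dict)
    # r_j -> D_j, i.e. relation constraints are nested defined classes
    rel_cons: Dict[str, DefinedClass] = field(default_factory=dict)

    def is_con(self, name: str) -> bool:
        """isCon_name(D): does D constrain this attribute or relation?"""
        return name in self.attr_cons or name in self.rel_cons

    def con(self, name: str):
        """Con_name(D): a value set for attributes, a nested class for relations."""
        if name in self.attr_cons:
            return self.attr_cons[name]
        return self.rel_cons[name]  # raises KeyError if D has no such constraint

    def __str__(self) -> str:
        # Render in the slide notation, e.g. 〈 Museum : name ∈ {British Museum} 〉
        parts = [f"{a} ∈ {{{', '.join(sorted(x))}}}" for a, x in self.attr_cons.items()]
        parts += [f"{r} ∈ {d}" for r, d in self.rel_cons.items()]
        return f"〈 {self.base} : {' ⋀ '.join(parts)} 〉"


D1 = DefinedClass("BuildingPart", {"location": frozenset({"44 Gt Russell St, London, UK"})})
D2 = DefinedClass("BuildingPart",
                  rel_cons={"partOf": DefinedClass("Museum", {"name": frozenset({"British Museum"})})})
print(D2)  # 〈 BuildingPart : partOf ∈ 〈 Museum : name ∈ {British Museum} 〉 〉
```

Keeping relation constraints as nested `DefinedClass` objects makes the recursion in the definition explicit, which the matching predicates below rely on.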

7 Matching against Queries
Queries consist of only one defined class
Possible matching semantics, illustrated with query class $Q$ and source description $\{D_1, \dots, D_n\}$:
◦ Positive: overlapping constraints serve as matching indicators, like keywords
◦ Negative: sources are excluded by disjoint ranges of corresponding constraints
$D_1$ = 〈 BuildingPart : location ∈ {44 Gt Russell St, London, UK} 〉
$D_2$ = 〈 BuildingPart : partOf ∈ 〈 Museum : name ∈ {“British Museum”} 〉 〉
Example queries:
$Q$ = 〈 ExhibitionHall : location ∈ {44 Gt Russell St, London, UK} ⋀ partOf ∈ 〈 Museum : name ∈ {“British Museum”} 〉 〉
$Q$ = 〈 ExhibitionHall : partOf ∈ 〈 Museum : name ∈ {“Brit*”} 〉 〉
$Q$ = 〈 ExhibitionHall : location ∈ {London, UK} ⋀ partOf ∈ 〈 Museum : name ∈ {“Churchill Mus*”} 〉 〉

8 Matching against Queries
◦ Positive: necessary condition for matching, $\leadsto_Q$
◦ Negative: disjoint ranges form a sufficient condition for dismatching, $\mathbin{/\!/_Q}$
Does $D_1$ = 〈 BuildingPart : location ∈ {44 Gt Russell St, London, UK} 〉 match $Q$ = 〈 ExhibitionHall : partOf ∈ 〈 Museum : name ∈ {“Brit*”} 〉 〉?
Adjusted query with the pseudo constraint location ∈ * (any location):
$Q$ = 〈 ExhibitionHall : location ∈ * ⋀ partOf ∈ 〈 Museum : name ∈ {“Brit*”} 〉 〉

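9 Predicates
Query matching predicate: source class $D$ matches query class $Q$, denoted by $D \leadsto_Q Q$, iff
1. $(\mathrm{Base}(D) \succeq \mathrm{Base}(Q)) \lor (\mathrm{Base}(D) \preceq \mathrm{Base}(Q))$
2. $\forall$ attributes $a$ with $(\mathrm{Dom}(a) \succeq \mathrm{Base}(Q)) \land (\mathrm{Dom}(a) \succeq \mathrm{Base}(D))$: $\mathrm{isCon}_a(D) \Rightarrow (\mathrm{isCon}_a(Q) \land (\mathrm{Con}_a(D) \cap \mathrm{Con}_a(Q) \neq \emptyset))$
3. $\forall$ relations $r$ with $(\mathrm{Dom}(r) \succeq \mathrm{Base}(Q)) \land (\mathrm{Dom}(r) \succeq \mathrm{Base}(D))$: $\mathrm{isCon}_r(D) \Rightarrow (\mathrm{isCon}_r(Q) \land (\mathrm{Con}_r(D) \leadsto_Q \mathrm{Con}_r(Q)))$
Visually: $D$ and $Q$ each span a cuboid
◦ $Q$ must have the same or more dimensions than $D$
◦ … and the cuboids must overlap
[Figure: overlapping cuboids spanned by D and Q]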
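The matching predicate can be sketched in Python as follows. Everything concrete here is an assumption for illustration, not the paper's implementation: the toy hierarchy `PRNT`, the domains in `ATTR_DOM` and `REL_DOM`, and the range encoding, where a pattern ending in `*` stands for all strings with that prefix, a lone `*` stands for the whole range, and locations are written coarse-to-fine (e.g. `UK/London/...`) so that regional containment becomes prefix containment.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical toy ontology (assumption): class -> parent class, i.e. Prnt(C).
PRNT = {"Spatial": "Thing", "Building": "Spatial", "Museum": "Building",
        "BuildingPart": "Spatial", "ExhibitionHall": "BuildingPart"}
ATTR_DOM = {"location": "Spatial", "name": "Thing"}  # Dom(a), assumed
REL_DOM = {"partOf": "BuildingPart"}                 # Dom(r), assumed


def subsumes(c, d):
    """Class subsumption c ≽ d: d is c or a (transitive) subclass of c."""
    while d is not None:
        if d == c:
            return True
        d = PRNT.get(d)
    return False


def overlap(p, q):
    """Do two patterns intersect? 'x*' = all strings starting with x, else one exact string."""
    if p.endswith("*") and q.endswith("*"):
        return p[:-1].startswith(q[:-1]) or q[:-1].startswith(p[:-1])
    if p.endswith("*"):
        return q.startswith(p[:-1])
    if q.endswith("*"):
        return p.startswith(q[:-1])
    return p == q


@dataclass
class DefinedClass:
    base: str
    attrs: dict = field(default_factory=dict)  # attribute -> set of patterns (Con_a)
    rels: dict = field(default_factory=dict)   # relation -> nested DefinedClass (Con_r)


def query_matches(d, q):
    """D ⇝_Q Q, following the three conditions of the matching predicate."""
    # 1. Base classes must be comparable: Base(D) ≽ Base(Q) or Base(D) ≼ Base(Q).
    if not (subsumes(d.base, q.base) or subsumes(q.base, d.base)):
        return False
    # 2. Attributes applicable to both classes: a constraint in D requires an
    #    overlapping constraint on the same attribute in Q.
    for a, dom in ATTR_DOM.items():
        if subsumes(dom, q.base) and subsumes(dom, d.base) and a in d.attrs:
            if a not in q.attrs or not any(overlap(x, y) for x in d.attrs[a] for y in q.attrs[a]):
                return False
    # 3. Relations: same rule, with the nested classes compared recursively.
    for r, dom in REL_DOM.items():
        if subsumes(dom, q.base) and subsumes(dom, d.base) and r in d.rels:
            if r not in q.rels or not query_matches(d.rels[r], q.rels[r]):
                return False
    return True


ADDR = "UK/London/44 Gt Russell St"  # location encoded coarse-to-fine (assumption)
D1 = DefinedClass("BuildingPart", {"location": {ADDR}})
D2 = DefinedClass("BuildingPart", rels={"partOf": DefinedClass("Museum", {"name": {"British Museum"}})})
Q = DefinedClass("ExhibitionHall", rels={"partOf": DefinedClass("Museum", {"name": {"Brit*"}})})
Q_pseudo = DefinedClass("ExhibitionHall", {"location": {"*"}}, Q.rels)  # adds location ∈ *
print(query_matches(D1, Q), query_matches(D2, Q), query_matches(D1, Q_pseudo))  # False True True
```

Under these assumptions, $D_1$ fails the “Brit*” query only because it constrains location while the query does not; the pseudo constraint `location ∈ *` lifts that requirement, and $D_2$ matches through the nested museum name.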

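10 Predicates (2)
Query dismatching predicate: source class $D$ dismatches query class $Q$, denoted by $D \mathbin{/\!/_Q} Q$, iff
◦ $\exists$ attribute $a$ with $(\mathrm{Dom}(a) \succeq \mathrm{Base}(Q)) \land (\mathrm{Dom}(a) \succeq \mathrm{Base}(D))$: $\mathrm{isCon}_a(D) \land \mathrm{isCon}_a(Q) \land (\mathrm{Con}_a(D) \cap \mathrm{Con}_a(Q) = \emptyset)$, or
◦ $\exists$ relation $r$ with $(\mathrm{Dom}(r) \succeq \mathrm{Base}(Q)) \land (\mathrm{Dom}(r) \succeq \mathrm{Base}(D))$: $\mathrm{isCon}_r(D) \land \mathrm{isCon}_r(Q) \land (\mathrm{Con}_r(D) \mathbin{/\!/_Q} \mathrm{Con}_r(Q))$
Matching: source description $\{D_1, \dots, D_n\}$ matches query class $Q$ iff
1. $\exists D_i : D_i \leadsto_Q Q$
2. $\nexists D_i : D_i \mathbin{/\!/_Q} Q$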
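A compact, self-contained sketch of the dismatching predicate and the description-level matching rule follows. It uses the same hypothetical toy ontology and prefix-pattern encoding as assumptions (not the paper's implementation); `query_matches` is a condensed restatement of the matching predicate so that the block runs on its own.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical toy ontology and encodings, assumed for illustration only.
PRNT = {"Spatial": "Thing", "Building": "Spatial", "Museum": "Building",
        "BuildingPart": "Spatial", "ExhibitionHall": "BuildingPart"}
ATTR_DOM = {"location": "Spatial", "name": "Thing"}
REL_DOM = {"partOf": "BuildingPart"}


def subsumes(c, d):
    """c ≽ d in the class hierarchy."""
    while d is not None:
        if d == c:
            return True
        d = PRNT.get(d)
    return False


def overlap(p, q):
    """'x*' = every string starting with x; other patterns are exact strings."""
    if p.endswith("*") and q.endswith("*"):
        return p[:-1].startswith(q[:-1]) or q[:-1].startswith(p[:-1])
    if p.endswith("*"):
        return q.startswith(p[:-1])
    if q.endswith("*"):
        return p.startswith(q[:-1])
    return p == q


@dataclass
class DefinedClass:
    base: str
    attrs: dict = field(default_factory=dict)  # attribute -> set of patterns
    rels: dict = field(default_factory=dict)   # relation -> nested DefinedClass


def applicable(dom, d, q):
    """Dom ≽ Base(Q) ⋀ Dom ≽ Base(D)."""
    return subsumes(dom, q.base) and subsumes(dom, d.base)


def ranges_overlap(x, y):
    return any(overlap(a, b) for a in x for b in y)


def query_matches(d, q):
    """D ⇝_Q Q (condensed version of the matching predicate)."""
    if not (subsumes(d.base, q.base) or subsumes(q.base, d.base)):
        return False
    ok_attrs = all(a in q.attrs and ranges_overlap(d.attrs[a], q.attrs[a])
                   for a, dom in ATTR_DOM.items() if applicable(dom, d, q) and a in d.attrs)
    ok_rels = all(r in q.rels and query_matches(d.rels[r], q.rels[r])
                  for r, dom in REL_DOM.items() if applicable(dom, d, q) and r in d.rels)
    return ok_attrs and ok_rels


def query_dismatches(d, q):
    """D //_Q Q: a constraint present in both classes has disjoint ranges."""
    if any(a in d.attrs and a in q.attrs and not ranges_overlap(d.attrs[a], q.attrs[a])
           for a, dom in ATTR_DOM.items() if applicable(dom, d, q)):
        return True
    return any(r in d.rels and r in q.rels and query_dismatches(d.rels[r], q.rels[r])
               for r, dom in REL_DOM.items() if applicable(dom, d, q))


def description_matches(description, q):
    """{D_1, ..., D_n} matches Q iff some D_i ⇝_Q Q and no D_i //_Q Q."""
    return (any(query_matches(d, q) for d in description)
            and not any(query_dismatches(d, q) for d in description))


ADDR = "UK/London/44 Gt Russell St"
source = [DefinedClass("BuildingPart", {"location": {ADDR}}),
          DefinedClass("BuildingPart", rels={"partOf": DefinedClass("Museum", {"name": {"British Museum"}})})]
churchill = DefinedClass("ExhibitionHall", {"location": {"UK/London/*"}},
                         {"partOf": DefinedClass("Museum", {"name": {"Churchill Mus*"}})})
british = DefinedClass("ExhibitionHall", rels={"partOf": DefinedClass("Museum", {"name": {"Brit*"}})})
print(description_matches(source, churchill), description_matches(source, british))  # False True
```

Under these assumptions, the “Churchill Mus*” query is rejected even though $D_1$ matches it: $D_2$ dismatches because the museum names are disjoint, and a single dismatching class excludes the whole source.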

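11 Predicates (3)
Query subsumption predicate: defined class $D$ subsumes defined class $Q$, denoted by $D \succeq_Q Q$, iff
1. $\mathrm{Base}(D) \succeq \mathrm{Base}(Q)$
2. $\forall$ attributes $a$ with $\mathrm{Dom}(a) \succeq \mathrm{Base}(D)$: $\mathrm{isCon}_a(D) \Rightarrow (\mathrm{isCon}_a(Q) \land (\mathrm{Con}_a(D) \supseteq \mathrm{Con}_a(Q)))$
3. $\forall$ relations $r$ with $\mathrm{Dom}(r) \succeq \mathrm{Base}(D)$: $\mathrm{isCon}_r(D) \Rightarrow (\mathrm{isCon}_r(Q) \land (\mathrm{Con}_r(D) \succeq_Q \mathrm{Con}_r(Q)))$
Visually: $Q$ must have the same or more dimensions than $D$
◦ … and $Q$ has to be contained in $D$ (in the dimensions of $D$)
[Figure: cuboid of Q contained in the cuboid of D]
The predicate $\succeq_Q$ is transitive by construction, since $\succeq$ and $\supseteq$ are transitive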
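The subsumption predicate can be sketched in the same hypothetical setting. The toy ontology and the prefix-pattern ranges are assumptions; range containment is checked pattern by pattern, which is sufficient for $\supseteq$ but more conservative than an exact set-containment test on unions of patterns.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical toy ontology (assumption), same shape as in the matching sketch.
PRNT = {"Spatial": "Thing", "Building": "Spatial", "Museum": "Building",
        "BuildingPart": "Spatial", "ExhibitionHall": "BuildingPart"}
ATTR_DOM = {"location": "Spatial", "name": "Thing"}
REL_DOM = {"partOf": "BuildingPart"}


def subsumes(c, d):
    """c ≽ d in the class hierarchy."""
    while d is not None:
        if d == c:
            return True
        d = PRNT.get(d)
    return False


def pattern_contains(p, q):
    """Is the string set of q a subset of the string set of p? ('x*' = prefix set)."""
    if p.endswith("*"):
        return q[:-1].startswith(p[:-1]) if q.endswith("*") else q.startswith(p[:-1])
    return not q.endswith("*") and p == q


def range_contains(x, y):
    """Con(D) ⊇ Con(Q), checked conservatively: each pattern of y inside one pattern of x."""
    return all(any(pattern_contains(p, q) for p in x) for q in y)


@dataclass
class DefinedClass:
    base: str
    attrs: dict = field(default_factory=dict)  # attribute -> set of patterns
    rels: dict = field(default_factory=dict)   # relation -> nested DefinedClass


def query_subsumes(d, q):
    """D ≽_Q Q: Base(D) ≽ Base(Q) and every constraint of D contains Q's constraint."""
    if not subsumes(d.base, q.base):
        return False
    # Only attributes/relations applicable to Base(D) are checked (Dom ≽ Base(D)).
    for a, dom in ATTR_DOM.items():
        if subsumes(dom, d.base) and a in d.attrs:
            if a not in q.attrs or not range_contains(d.attrs[a], q.attrs[a]):
                return False
    for r, dom in REL_DOM.items():
        if subsumes(dom, d.base) and r in d.rels:
            if r not in q.rels or not query_subsumes(d.rels[r], q.rels[r]):
                return False
    return True


world = DefinedClass("Spatial", {"location": {"*"}})
uk = DefinedClass("Spatial", {"location": {"UK/*"}})
hall = DefinedClass("ExhibitionHall", {"location": {"UK/London/44 Gt Russell St"}})
print(query_subsumes(world, uk), query_subsumes(uk, hall), query_subsumes(hall, uk))  # True True False
```

The chain `world` ≽ `uk` ≽ `hall` also gives `world` ≽ `hall`, which is the transitivity the slide derives from $\succeq$ and $\supseteq$.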

12 SDC-Tree
Large HIS require an index structure for efficient search of source descriptions
Defined classes may differ in three aspects:
◦ Base class
◦ Existence of constraints
◦ Ranges of constraints
Source Description Class Tree
◦ Indexes descriptions by source classes
◦ Split types for all differentiating aspects

13 SDC-Tree (2)
Nodes are associated with node classes $N_i$
◦ Hierarchy defined by the index subsumption predicate $\succeq_I$, which implies $\succeq_Q$
◦ Base split
◦ Existence split
◦ Range split
$D$ is indexed at the leaf nodes where $N_i \leadsto_I D$
◦ The index matching predicate $\leadsto_I$ implies $\leadsto_Q$
Queries are passed down by $\leadsto_Q$
◦ Post-filtering for $\mathbin{/\!/_Q}$
[Figure: example SDC-Tree with node classes 〈 Thing, True : 〉, 〈 Thing, False : 〉, 〈 Spatial, True : 〉, 〈 LegalBody, True : 〉, 〈 Spatial, True : loc. ∈ NULL 〉, 〈 Spatial, True : loc. ∈ [-90,-180]×[90,180] 〉, 〈 Spatial, True : loc. ∈ [-90,-180]×[0,180] 〉, and 〈 Spatial, True : loc. ∈ [0,-180]×[90,180] 〉, with source classes such as 〈 BuildingPart : loc. ∈ [7,8]×[11,10] 〉 and 〈 BuildingPart : loc. ∈ [6,7]×[9,11] 〉]
Splits can also be performed on nested classes, e.g., 〈 BuildingPart : partOf ∈ 〈 Museum : name ∈ {[A*,Z*]} 〉 〉
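The query path (pass the query down by matching, collect leaf entries, then post-filter) can be sketched as a tree traversal. This is a hypothetical simplification, not the SDC-Tree implementation: defined classes are flattened to dictionaries of numeric intervals (no class hierarchy, no relations), the tree is assumed to be given rather than built with $\succeq_I$ and $\leadsto_I$, and the full description of each candidate source is assumed to be available for the $\mathbin{/\!/_Q}$ post-filter.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Toy stand-in for defined classes (assumption): attribute -> closed interval (lo, hi).
ToyClass = Dict[str, Tuple[float, float]]


def overlaps(x, y):
    return x[0] <= y[1] and y[0] <= x[1]


def toy_matches(d: ToyClass, q: ToyClass) -> bool:
    """Flattened analogue of ⇝_Q: each constraint of d is constrained in q and overlaps."""
    return all(a in q and overlaps(r, q[a]) for a, r in d.items())


def toy_dismatches(d: ToyClass, q: ToyClass) -> bool:
    """Flattened analogue of //_Q: some shared constraint has disjoint ranges."""
    return any(a in q and not overlaps(r, q[a]) for a, r in d.items())


@dataclass
class Node:
    node_class: ToyClass
    children: List[Node] = field(default_factory=list)
    entries: List[Tuple[str, ToyClass]] = field(default_factory=list)  # leaf: (source id, class)


def search(root, q, matches, dismatches, descriptions):
    """Descend into every node whose class matches q, collect matching leaf entries,
    then drop sources whose full description contains a dismatching class."""
    candidates = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if not matches(node.node_class, q):
            continue  # prune the whole subtree
        if node.children:
            stack.extend(node.children)
        else:
            candidates.update(sid for sid, d in node.entries if matches(d, q))
    return {sid for sid in candidates
            if not any(dismatches(d, q) for d in descriptions[sid])}


descriptions = {"s1": [{"x": (1, 1)}], "s2": [{"x": (6, 7)}], "s3": [{"x": (2, 3)}, {"y": (0, 0)}]}
root = Node({}, children=[
    Node({"x": (0, 5)}, entries=[("s1", {"x": (1, 1)}), ("s3", {"x": (2, 3)})]),
    Node({"x": (5, 10)}, entries=[("s2", {"x": (6, 7)})]),
    Node({"y": (0, 1)}, entries=[("s3", {"y": (0, 0)})]),
])
print(search(root, {"x": (0, 4), "y": (5, 9)}, toy_matches, toy_dismatches, descriptions))  # {'s1'}
```

In the example, `s3` reaches the candidate set through its `x` class but is removed in post-filtering because its `y` class is disjoint from the query, mirroring the negative semantics.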

14 SDC-Tree (2)
Implications between predicates:
[Figure: implications between the predicates $\succeq_I$, $\leadsto_I$, $\succeq_Q$, $\leadsto_Q$, and not $\mathbin{/\!/_Q}$, shown next to the example SDC-Tree]
◦ Extensions for node classes are evaluated by $\leadsto_I$ and $\succeq_I$ only
Completeness of indexing
◦ If $D \leadsto_Q Q$, then there is a path $N_1, \dots, N_k$ such that $\forall N_i : (N_i \leadsto_I D) \land (N_i \leadsto_Q Q)$
◦ See the paper for the proof

15 Split Algorithm
The actual structure of the SDC-Tree depends on the split operations
◦ Different split strategies are feasible
Generic split algorithm (GSAlg)
◦ Triggered by the overflow of a leaf node (threshold $n_{split}$)
1. Compute all possible splits
▪ Recursive operation for nested classes
▪ Adapted partitioning algorithm of the R*-Tree for range splits
2. Rate each split from 1 (good) to 0 (bad), depending on the distribution of entries to the potential child nodes
3. Apply the split with the highest rating
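The control flow of the generic split algorithm (enumerate candidate splits, rate them in [0, 1], apply the best) can be sketched as below. The rating formula (smallest child size times number of children, divided by the total), the overflow test `len(entries) > n_split`, the toy entries with a point-valued attribute `x`, and the simple sorted sweep standing in for the R*-Tree partitioning are all assumptions; the paper's actual rating is not reproduced here.

```python
from typing import Dict, List, Tuple

Entry = Dict[str, object]        # toy source class: {"base": ..., optional attribute values}
Partition = Tuple[List[Entry], ...]


def rate(partition: Partition) -> float:
    """Hypothetical rating in [0, 1]: 1 for an even distribution, 0 if a child stays empty."""
    sizes = [len(child) for child in partition]
    if len(sizes) < 2 or sum(sizes) == 0:
        return 0.0  # a "split" into a single child does not help
    return min(sizes) * len(sizes) / sum(sizes)


def base_split(entries):
    """Base split: one child per base class."""
    groups: Dict[object, List[Entry]] = {}
    for e in entries:
        groups.setdefault(e["base"], []).append(e)
    return [tuple(groups.values())]


def existence_split(entries, attr):
    """Existence split: entries constraining attr vs. entries that do not."""
    return [([e for e in entries if attr in e], [e for e in entries if attr not in e])]


def range_splits(entries, attr, min_fill=1):
    """Range splits: every cut along the sorted values (simplified R*-Tree-like sweep)."""
    if not all(attr in e for e in entries):
        return []
    s = sorted(entries, key=lambda e: e[attr])
    return [(s[:i], s[i:]) for i in range(min_fill, len(s) - min_fill + 1)]


def split_leaf(entries, n_split, generators):
    """GSAlg sketch: on overflow, compute all candidate splits, rate them, return the best."""
    if len(entries) <= n_split:
        return None  # no overflow, nothing to split
    candidates = [p for gen in generators for p in gen(entries)]
    return max(candidates, key=rate, default=None)


leaf = [{"base": "BuildingPart", "x": v} for v in (1, 2, 8, 9)]
gens = [base_split, lambda es: existence_split(es, "x"), lambda es: range_splits(es, "x")]
best = split_leaf(leaf, n_split=3, generators=gens)
print([[e["x"] for e in child] for child in best])  # [[1, 2], [8, 9]]
```

Here the base and existence splits would leave a child empty and score 0, so the balanced range split wins; with heterogeneous entries the same loop lets any split type win, which is the point of a generic algorithm.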

16 Evaluation Setup
Implemented the Simple Ontology Language (SOL)
◦ Attribute types with concrete domains and interval/set algebras
Implemented the SDC-Tree as a main-memory index with GSAlg
Created a spatial context ontology (see paper)
◦ Inspired by ADL Feature Types, SUMO, and PROTON
Created templates of source classes for typical spatial context providers
◦ E.g., building parts of a public building, or streets and regions of a city
◦ Generated $1.1 \cdot 10^6$ source classes using the OpenStreetMap database
$n_{split} = 10$ (see paper)

17 Results on Searching
Search cost grows logarithmically from ≈ 1000 source classes onward
Bulk insertion outperforms successive insertion by ≈ 1%

18 Results on Insertion
Conclusion: logarithmic cost for search and insertion
◦ … despite the heterogeneity of split types and predicates
The cost of splitting amounts to ≈ 4 evaluations of $\leadsto_I$

19 Related Work
Integration systems (Information Manifold, Infomaster, Quete, …)
◦ Query processing excludes sources with unrelated attributes/relations
◦ Mappings can be enhanced by constraints (e.g., price > 20000)
⇒ Not sufficient for large HIS
Discovery services for text sources (GlOSS, …)
◦ Keyword-based search and ranking
⇒ Do not incorporate the underlying ontology
P2P discovery services for ontology-based HIS (SCS, GloServ, …)
◦ Organize sources according to the class hierarchy and selected attributes
⇒ Large HIS require higher expressiveness and flexibility

20 Summary
Source discovery in large HIS requires a specific approach
Proposed an advanced description formalism for ontology-based HIS
◦ Based on nested defined classes
◦ Adjustable matching semantics using pseudo constraints
Source Description Class Tree (SDC-Tree) for efficient matching
◦ Extended defined classes to reflect three different split types
◦ Generic split algorithm for arbitrary ontologies
◦ Logarithmic search/matching cost
Which entities are described by a source? Which information is given about the entities?

21 Thank you for your attention!
Ralph Lange
Institute of Parallel and Distributed Systems (IPVS)
Universität Stuttgart
Universitätsstraße 38 · Stuttgart · Germany

BACKUP SLIDES

23 Assumptions for shared ontology
Classes $\{C_1, C_2, \dots\}$ such as Building, BuildingPart, and ExhibitionHall
◦ $\mathrm{Prnt}(C_i)$ gives the parent class of $C_i$
◦ $C_i$ is a subclass of $C_j$, denoted by $C_i \prec C_j$
Relations $\{r_1, r_2, \dots\}$ such as ownedBy
◦ $\mathrm{Dom}(r_i) = C_j$ gives the domain
◦ $\mathrm{Rng}(r_i) = C_k$ gives the range, where possibly $C_j = C_k$
Attributes $\{a_1, a_2, \dots\}$ such as name and location
◦ $\mathrm{Dom}(a_i) = C_j$ gives the domain
◦ $\mathrm{Rng}(a_i)$ gives the range, such as integer, string, $\mathbb{R}^2$, {“N”, “E”, “S”, “W”}, and [0,99]
⇒ Compatible with prevalent ontology languages (e.g., OWL)
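These ontology assumptions can be written down as a small Python structure. This is a sketch only: the class `Ontology`, its methods, and the concrete domains and ranges in the usage example are hypothetical choices made for illustration, not taken from the paper or from an OWL API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Tuple


@dataclass
class Ontology:
    prnt: Dict[str, str] = field(default_factory=dict)                       # C -> Prnt(C)
    relations: Dict[str, Tuple[str, str]] = field(default_factory=dict)      # r -> (Dom(r), Rng(r))
    attributes: Dict[str, Tuple[str, object]] = field(default_factory=dict)  # a -> (Dom(a), Rng(a))

    def ancestors(self, c: str) -> Iterator[str]:
        """Prnt(c), Prnt(Prnt(c)), ... up to the root class."""
        while c in self.prnt:
            c = self.prnt[c]
            yield c

    def is_subclass(self, ci: str, cj: str) -> bool:
        """C_i ≺ C_j: C_j is a proper ancestor of C_i."""
        return cj in self.ancestors(ci)

    def subsumes(self, cj: str, ci: str) -> bool:
        """C_j ≽ C_i, i.e. C_i ≺ C_j or C_i = C_j."""
        return ci == cj or self.is_subclass(ci, cj)

    def dom(self, prop: str) -> str:
        return (self.relations.get(prop) or self.attributes[prop])[0]

    def rng(self, prop: str):
        return (self.relations.get(prop) or self.attributes[prop])[1]


# Hypothetical example content; domains and ranges are assumptions.
onto = Ontology(
    prnt={"Building": "Thing", "BuildingPart": "Thing",
          "ExhibitionHall": "BuildingPart", "LegalBody": "Thing"},
    relations={"ownedBy": ("Building", "LegalBody")},
    attributes={"name": ("Thing", "string"), "location": ("Thing", "R^2")},
)
print(onto.is_subclass("ExhibitionHall", "Thing"), onto.subsumes("BuildingPart", "ExhibitionHall"),
      onto.dom("ownedBy"))  # True True Building
```

Keeping only `Prnt` and deriving ≺ and ≽ from it matches the slide's minimal assumptions; richer OWL constructs would need more than a single parent per class.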

24 Spatial Context Ontology

25 Results on Searching (2)

26 Results on Tree Size

27 Results on Split Rating

28 Results on Nesting Depth