Ameet N Chitnis, Abir Qasem and Jeff Heflin 11 November 2007.


1 Ameet N Chitnis, Abir Qasem and Jeff Heflin 11 November 2007

2 Talk Organization
Motivation (a.k.a. why yet another benchmark?) and Influences
The Workload: domain ontologies, map ontologies, data sources, queries
The Metrics
How do we generate things?
  Domain ontology generation
  Map ontology generation: parameters and relationships, map generator algorithm
  Data source generation
  Query generation
Sample Workload
Conclusion and Future Work

3 Motivation As the Semantic Web matures: OWL ontologies and data from various organizations will gain commercial value. Alignment of different ontologies, and integration of the data that commit to them, will be a viable business enterprise. Quite possibly we will have post-development alignments between ontologies (alignment tools, third parties, etc.). Currently DBpedia and Hawkeye provide some form of non-commercial third-party alignment. We wanted to develop a benchmark that reflects this reality.

4 Influences
Lehigh University Benchmark (LUBM) by Y. Guo, Z. Pan, and J. Heflin (ISWC 2004)
Extended LUBM (can support both OWL Lite and OWL DL) by L. Ma, Y. Yang, Z. Qiu, G. Xie, and Y. Pan (ESWC 2006)
Statistical analysis of the available Semantic Web ontologies by C. Tempich and R. Volz (ISWC 2003)
Benchmarking DL systems by I. Horrocks and P. Patel-Schneider (DL Workshop 1998)
Internet topology generator by J. Winick and S. Jamin (University of Michigan)

5 The Workload (1)
Domain ontologies: "simple" ontologies. We can control the number of classes, the number of properties, and the branching factor of the hierarchies.
Data sources: we can control the number of data sources that commit to a given ontology, the number of classes that will have individuals, the number of properties that will connect those individuals, and the number of triples.
Queries: extensional queries in SPARQL. We can control the mix of classes, properties, and individuals, and we can control selectivity.

6 The Workload (2)
Map ontologies: the main focus of this work. In our work a map ontology consists solely of "mapping" axioms that establish an alignment between two domain ontologies. This is just for convenience of generation and analysis; semantically, map ontologies are not much different from domain ontologies.
Macro level: we generate a directed acyclic graph of domain ontologies. Every edge represents a map ontology.
Micro level: we can control the type of axioms that are used to map two domain ontologies.

7 Metrics
Initialization Time. Systems with a centralized approach: time taken to load the knowledge base. Systems with a distributed approach: time taken to read the index (e.g. meta-data).
Query Response Time. Centralized: reasoning time. Distributed: load time + reasoning time.
Query Completeness. Only queries that entail at least one answer are considered; completeness is determined relative to a reference set.
Repository Size. Centralized: number of triples. Distributed: N/A.

8 Domain Ontology Generation Simple taxonomy. The number of terms to generate varies according to a normal distribution with a user-supplied mean. Given a branching factor and a number of terms, we generate a balanced tree. Complex axioms are left for map ontologies.
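A minimal sketch of the taxonomy generation described on this slide, not the authors' generator. The function names, the `C` term prefix, and the `std_dev` parameter are assumptions for illustration (the slide only mentions a user-supplied mean). A balanced tree is approximated here by a complete k-ary tree, where term i has parent (i - 1) // k.

```python
import random


def sample_term_count(mean, std_dev, rng):
    """Draw the number of terms from a normal distribution (std_dev is an assumed parameter)."""
    return max(1, round(rng.gauss(mean, std_dev)))


def generate_taxonomy(num_terms, branching_factor, prefix="C"):
    """Return a child -> parent mapping forming a complete k-ary tree rooted at <prefix>0."""
    return {
        f"{prefix}{i}": f"{prefix}{(i - 1) // branching_factor}"
        for i in range(1, num_terms)
    }


rng = random.Random(42)
num_terms = sample_term_count(mean=20, std_dev=3, rng=rng)
taxonomy = generate_taxonomy(num_terms, branching_factor=3)
for child, parent in taxonomy.items():
    print(f"{child} rdfs:subClassOf {parent}")
```

Each entry in the mapping corresponds to one subclass axiom, so the output contains no axioms more complex than a plain hierarchy.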

9 Map Ontology Generation
Inputs: the number of ontologies we want in the workload (onts), the average out-degree (out), and the diameter.
The number of maps created is approximately: maps ≈ (onts - terminal onts) * out
However, the number of terminal ontologies is not a parameter. A reasonable approximation is: terminal onts ≈ (onts * out) / (diameter + out)
Thus we have: maps ≈ (onts * out * diameter) / (diameter + out)
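The two approximations on this slide can be checked with a few lines of arithmetic. The function names below are hypothetical; the formulas are the ones stated on the slide.

```python
def estimate_terminal_onts(onts, out, diameter):
    """Approximate number of terminal ontologies: (onts * out) / (diameter + out)."""
    return onts * out / (diameter + out)


def estimate_maps(onts, out, diameter):
    """Approximate number of map ontologies: (onts * out * diameter) / (diameter + out)."""
    return onts * out * diameter / (diameter + out)


onts, out, diameter = 100, 2, 6
terminal = estimate_terminal_onts(onts, out, diameter)
maps = estimate_maps(onts, out, diameter)
# Substituting the terminal estimate into (onts - terminal) * out gives the same value.
print(terminal, maps, (onts - terminal) * out)
```

With 100 ontologies, an average out-degree of 2, and a diameter of 6, the estimates are 25 terminal ontologies and 150 maps.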

10 Map Generator Algorithm
1. Determine and mark the number of terminal nodes.
2. Create a path of diameter length.
3. Choose targets for every non-terminal ontology. Constraints: (a) no cycles; (b) no path longer than the diameter; (c) non-terminal nodes should not become terminal. Create the corresponding map ontologies by generating mapping axioms.
4. Update the parameters of the source and the target.
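One way to satisfy the three constraints above is sketched below. It is not the authors' algorithm: it assumes that "diameter" means the longest directed path, and it enforces the constraints by giving every ontology a depth between 0 and the diameter and only allowing edges from a lower depth to a higher one. The function name and the integer node IDs are hypothetical. Generation of the mapping axioms themselves is omitted.

```python
import random


def generate_map_graph(num_onts, out_degree, diameter, seed=0):
    """Return (edges, terminal): a DAG over ontologies 0..num_onts-1 and its terminal nodes."""
    if diameter < 1 or num_onts < diameter + 1:
        raise ValueError("need diameter >= 1 and at least diameter + 1 ontologies")
    rng = random.Random(seed)

    # Step 1: estimate the number of terminal nodes with the approximation from the slides.
    num_terminal = max(1, round(num_onts * out_degree / (diameter + out_degree)))
    num_terminal = min(num_terminal, num_onts - diameter)

    # Step 2: nodes 0..diameter form a path; its last node is terminal.
    depth = {i: i for i in range(diameter + 1)}
    terminal = {diameter}
    edges = {(i, i + 1) for i in range(diameter)}

    # Assign depths to the remaining nodes (assumed layering scheme).
    rest = list(range(diameter + 1, num_onts))
    rng.shuffle(rest)
    for node in rest:
        if len(terminal) < num_terminal:
            terminal.add(node)
            depth[node] = rng.randint(1, diameter)
        else:
            depth[node] = rng.randint(0, diameter - 1)

    # Step 3: choose targets for every non-terminal node. Edges only go to a
    # strictly greater depth, so there are no cycles and no path exceeds the diameter.
    for node in range(num_onts):
        if node in terminal:
            continue
        existing = {t for s, t in edges if s == node}
        candidates = [m for m in range(num_onts)
                      if depth[m] > depth[node] and m not in existing]
        needed = max(0, out_degree - len(existing))
        for target in rng.sample(candidates, min(needed, len(candidates))):
            edges.add((node, target))
    return edges, terminal


edges, terminal = generate_map_graph(num_onts=20, out_degree=2, diameter=4)
print(len(edges), sorted(terminal))
```

Every non-terminal node keeps at least one outgoing edge, because the path always contains a node one level deeper.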

11 Mapping axioms Given two domain ontologies and a desired distribution of OWL constructors and restrictions, we choose terms from the domain ontologies and create an axiom that connects them. We can generate fairly complex axioms, e.g. O1:A ⊔ O1:B ⊑ ∃ O2:P.O2:C ⊓ ∀ O2:Q.O2:D. Currently the algorithm is restricted to generating axioms that keep the ontology within OWLII (a subset of OWL used by OBII; Qasem et al. 2007, ISWC NFR workshop), but this is NOT a limitation of our approach.

12 Source Generation Choose an ontology. Choose the number of classes for which to create individuals. Generate triples: we can either generate random individuals or use the domain and range information to connect the individuals with properties.
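The source generation steps can be sketched as follows. This is an illustration under assumptions, not the authors' generator: the function name, the `ind<N>` individual names, the `(domain, range)` dictionary layout, and the 50/50 split between type triples and property triples are all hypothetical. The sketch assumes at least one class is populated.

```python
import random


def generate_source(classes, properties, num_populated, num_triples, seed=0):
    """Generate triples for one data source.

    properties maps a property name to its (domain_class, range_class) pair.
    """
    rng = random.Random(seed)
    # Choose which classes will have individuals.
    populated = rng.sample(sorted(classes), min(num_populated, len(classes)))
    members = {c: [] for c in populated}
    usable = [(p, d, r) for p, (d, r) in sorted(properties.items())
              if d in members and r in members]
    triples = []
    counter = 0
    while len(triples) < num_triples:
        # Properties whose domain and range classes both have individuals already.
        linkable = [(p, d, r) for p, d, r in usable if members[d] and members[r]]
        if linkable and rng.random() < 0.5:  # assumed mix of triple kinds
            p, d, r = rng.choice(linkable)
            triples.append((rng.choice(members[d]), p, rng.choice(members[r])))
        else:
            # Create a new individual in a randomly chosen populated class.
            c = rng.choice(populated)
            individual = f"ind{counter}"
            counter += 1
            members[c].append(individual)
            triples.append((individual, "rdf:type", c))
    return triples


triples = generate_source(
    classes=["A", "B", "C", "D"],
    properties={"p": ("A", "B"), "q": ("C", "D")},
    num_populated=2,
    num_triples=75,
)
print(len(triples))
```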

13 Query Generation SPARQL Queries (SELECT)
1. Choose the first predicate from the classes of an ontology.
2. We bias the next predicate with a 75% chance of being one of the properties from the ontology.
3. We make use of shared variables in order to implement "joins". A shared variable is equally likely to be in the subject position or the object position.
4. For single-predicate queries all the variables are distinguished. For others, on average two-thirds of the variables are distinguished and the rest are non-distinguished.
5. There is a 10% chance of a constant.
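The steps above can be sketched as a small generator. It is a rough illustration, not the authors' implementation: the function name, the variable naming scheme, and the example IRIs are hypothetical, and the slide does not say where the 10% constant applies, so the sketch assumes it replaces the non-shared term of a property pattern.

```python
import random


def generate_query(classes, properties, individuals, num_predicates, seed=0):
    """Build a SPARQL SELECT query string with num_predicates triple patterns."""
    rng = random.Random(seed)
    variables = ["?x0"]
    # Step 1: the first predicate is a class membership pattern.
    patterns = [f"?x0 a <{rng.choice(classes)}> ."]
    for _ in range(num_predicates - 1):
        shared = rng.choice(variables)  # step 3: reuse a variable to form a join
        if rng.random() < 0.75:  # step 2: 75% chance of a property pattern
            prop = rng.choice(properties)
            if rng.random() < 0.10:  # step 5: 10% chance of a constant (assumed placement)
                other = f"<{rng.choice(individuals)}>"
            else:
                other = f"?x{len(variables)}"
                variables.append(other)
            # Shared variable is equally likely to be the subject or the object.
            if rng.random() < 0.5:
                patterns.append(f"{shared} <{prop}> {other} .")
            else:
                patterns.append(f"{other} <{prop}> {shared} .")
        else:
            patterns.append(f"{shared} a <{rng.choice(classes)}> .")
    # Step 4: all variables are distinguished for single-predicate queries,
    # otherwise about two-thirds of them.
    if num_predicates == 1:
        selected = variables
    else:
        count = max(1, round(len(variables) * 2 / 3))
        selected = sorted(rng.sample(variables, count))
    return "SELECT " + " ".join(selected) + " WHERE { " + " ".join(patterns) + " }"


print(generate_query(
    classes=["http://example.org/o1#A", "http://example.org/o1#B"],
    properties=["http://example.org/o1#p"],
    individuals=["http://example.org/data#i1"],
    num_predicates=3,
))
```

Selectivity control, which the slides mention as a workload parameter, is not modeled in this sketch.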

14 A Sample Workload We used the benchmark to evaluate OBII, a distributed query answering system. We compared it with a "baseline" system, which was essentially a KAON2 wrapper. Some characteristics of the workload: 50% of classes had individuals; on average we generated 75 triples per source; we generated configurations as large as 100 domain ontologies with about 1000 data sources.

15 Conclusion and Future Work
A focus on a workload that accounts for post-development alignments
Micro level: controlling mapping axioms
Macro level: controlling how ontologies are mapped
Domain ontology synthesis can be expanded to support complex axioms
Experiment with different characteristics
Hubs and authorities (different in-degree / out-degree patterns)

