
1 Ontology Evaluation 2009-10-22

2 Outline Motivation Evaluation Criteria Evaluation Measures Evaluation Approaches

3 Motivation Ontologies are expected to be widely adopted in the Semantic Web and in other semantics-aware applications. Users facing a multitude of ontologies need a way of assessing them and deciding which one best fits their requirements. People constructing an ontology need a way to evaluate the resulting ontology, and possibly to guide the construction process and any refinement steps.

4 Evaluation Criteria Criteria – Correctness – Consistency – Expandability – Clarity – Minimal Ontological Commitment – Minimal Encoding Bias

5 Criteria Correctness – The definitions in the ontology should be correct with respect to the real world. Consistency – The ontology should support inferences that are consistent with its definitions; the axioms it defines and the accompanying natural-language documentation should be mutually consistent. Clarity – The ontology must effectively communicate the intended meaning of its defined terms. Definitions should be objective and independent of context; when a definition can be stated in logical axioms, it should be formalized; definitions should be as complete as possible; and every definition should be documented in natural language.

6 Criteria(2) Expandability – New terms can be defined on the basis of the existing vocabulary to meet special needs, without requiring revision of the existing definitions. Minimal Ontological Commitment – The ontological commitment should be the minimum needed to support the intended knowledge-sharing activities. Minimal Encoding Bias – The description of concepts should not depend on any particular symbol-level encoding, because real systems may adopt different knowledge-representation methods.

7 Outline Motivation Evaluation Criteria Evaluation Measures Evaluation Approaches

8 Evaluation Measures Schema Metrics – Address the design of the ontology schema – The schema can be hard to evaluate: judging it requires domain-expert consensus and involves subjectivity, etc. Metrics – Relationship diversity – Depth / Breadth / Fanout – Tangledness

9 Evaluation Measures (1) Relationship diversity – This measure differentiates an ontology that contains mostly inheritance relationships (≈ a taxonomy) from an ontology that contains a diverse set of relationships. Schema Depth – This measure describes the distribution of classes across the levels of the ontology's inheritance tree.

10 Fanout Factor and Tangledness
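The transcript does not reproduce the formulas behind these schema metrics, so the following is a minimal Python sketch of how they could be computed over a toy class hierarchy. The dictionary-based schema encoding, the class names, and the precise definitions of relationship diversity, depth, fanout, and tangledness are illustrative assumptions, not the slides' own formulas.

```python
# Illustrative schema metrics over a toy class graph; encoding and formulas are assumptions.
from collections import defaultdict

# Toy schema: subclass edges (child -> primary parent) plus extra parents and
# non-inheritance relationships.
subclass_of = {
    "Student": "Person",
    "Professor": "Person",
    "GraduateStudent": "Student",
    "TeachingAssistant": "Student",
}
extra_parents = {"TeachingAssistant": ["Employee"]}   # second parent -> tangledness
other_relations = ["advises", "teaches", "enrolledIn"]

def parents(c):
    ps = [subclass_of[c]] if c in subclass_of else []
    ps.extend(extra_parents.get(c, []))
    return ps

classes = set(subclass_of) | set(subclass_of.values()) | {p for ps in extra_parents.values() for p in ps}

# Relationship diversity: share of non-inheritance relationships among all relationships.
inheritance_edges = sum(len(parents(c)) for c in classes)
relationship_diversity = len(other_relations) / (len(other_relations) + inheritance_edges)

# Depth of a class = length of the longest path up to a root.
def depth(c):
    ps = parents(c)
    return 0 if not ps else 1 + max(depth(p) for p in ps)

depths = {c: depth(c) for c in classes}

# Fanout of a class = number of direct subclasses.
fanout = defaultdict(int)
for c in classes:
    for p in parents(c):
        fanout[p] += 1

# Tangledness: fraction of classes with more than one direct parent.
tangledness = sum(1 for c in classes if len(parents(c)) > 1) / len(classes)

print(relationship_diversity, depths, dict(fanout), tangledness)
```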

11 Evaluation Measures (2) Instance-level Metrics – Evaluate the placement, distribution, and relationships of instance data, which can indicate the effectiveness of the schema design and the amount of knowledge contained in the ontology.

12 Evaluation Measures (3) Class Utilization – Evaluates how the classes defined in the schema are being utilized in the KB.

13 Evaluation Measures (4) Class Importance (popularity) – This metric evaluates the importance of a class based on the number of instances it contains, compared with the other classes in the ontology. Relationship Utilization – This metric evaluates how the relationships defined for each class in the schema are being used at the instance level.
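As with the schema metrics, the slides' formulas are not preserved in the transcript, so here is a small Python sketch of one plausible reading of class utilization, class importance, and relationship utilization over a toy knowledge base. The KB encoding and the exact ratios are assumptions made for illustration.

```python
# Illustrative instance-level metrics; the KB encoding and formulas are assumptions.
schema_classes = ["Person", "Student", "Professor", "Course"]
schema_relations = {"Student": ["enrolledIn"], "Professor": ["teaches"]}

# Toy KB: instances per class and the relationships actually asserted for them.
instances = {
    "Person": ["p1"],
    "Student": ["s1", "s2", "s3"],
    "Professor": ["prof1"],
    "Course": [],                      # defined in the schema, never populated
}
asserted = {"s1": ["enrolledIn"], "s2": ["enrolledIn"], "prof1": []}

# Class utilization: fraction of schema classes with at least one instance.
class_utilization = sum(1 for c in schema_classes if instances.get(c)) / len(schema_classes)

# Class importance (popularity): a class's share of all instances in the KB.
total = sum(len(v) for v in instances.values())
class_importance = {c: len(v) / total for c, v in instances.items()}

# Relationship utilization: for each class, the share of its schema-defined
# relationships that are used by at least one of its instances.
relationship_utilization = {}
for c, rels in schema_relations.items():
    used = {r for i in instances.get(c, []) for r in asserted.get(i, [])}
    relationship_utilization[c] = len(used & set(rels)) / len(rels)

print(class_utilization, class_importance, relationship_utilization)
```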

14 Evaluation Measures ( 5 ) Ontology Score Calculation
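The slide's actual scoring formula is not included in the transcript. The snippet below is only a hypothetical illustration of the common approach of combining normalized metrics into a single score with a weighted average; the metric names and weights are invented for the example.

```python
# Hypothetical overall ontology score as a weighted average of metrics in [0, 1].
def ontology_score(metrics, weights):
    """Weighted average of normalized metric values."""
    return sum(weights[m] * v for m, v in metrics.items()) / sum(weights.values())

metrics = {"relationship_diversity": 0.43, "class_utilization": 0.75, "tangledness": 0.17}
weights = {"relationship_diversity": 1.0, "class_utilization": 1.0, "tangledness": 0.5}
print(ontology_score(metrics, weights))
```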

15 Criteria & Measures Some measures cover a range of criteria, but not completely; for example, depth and breadth are related to expandability, but the relation is limited. Some criteria are difficult to quantify, for example conciseness, completeness, and clarity. Some measures do not map to any criterion, for example connectivity and importance.

16 Outline Motivation Evaluation Criteria Evaluation Measures Evaluation Approaches

17 Evaluation Approaches Golden Standard Evaluation – Evaluating lexicon and taxonomy Criteria Based Evaluation Task Based Evaluation

18 Golden Standard Evaluation Rationale – Compare an ontology with another ontology that is deemed to be the benchmark Methods – Measure similarities between the ontologies both lexically and conceptually

19 Lexicon Similarity Measure

20 Concepts Similarity Measure
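Slides 19-21 apparently showed the similarity formulas and an example as figures, which did not survive in the transcript. The sketch below shows one plausible way to compare an ontology against a gold standard at the lexical level (best-match normalized edit distance) and at the concept level (Jaccard overlap of concept names); these particular measures are illustrative stand-ins, not necessarily the ones used on the slides.

```python
# Illustrative lexical and concept similarity against a gold-standard ontology.
def edit_distance(a, b):
    """Classic Levenshtein distance via a single-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def string_similarity(a, b):
    """1 - normalized edit distance, in [0, 1]."""
    return 1 - edit_distance(a, b) / max(len(a), len(b), 1)

def lexicon_similarity(lexicon, gold_lexicon):
    """Average best-match string similarity of each term against the gold standard."""
    return sum(max(string_similarity(t, g) for g in gold_lexicon) for t in lexicon) / len(lexicon)

def concept_overlap(concepts, gold_concepts):
    """Jaccard overlap of the two concept sets (matched by name)."""
    a, b = set(concepts), set(gold_concepts)
    return len(a & b) / len(a | b)

print(lexicon_similarity({"Author", "Paper"}, {"Author", "Article"}))
print(concept_overlap({"Author", "Paper"}, {"Author", "Article"}))
```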

21 Example

22 Lexicon Measure Premise – Term frequency is an indication of naturalness: users use natural terms more frequently than their synonyms. Method – Based on term-frequency counts in the results of a search engine.
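A minimal sketch of the premise above: among candidate synonyms, take the most frequent term as the more natural label. The term_frequency function is only a placeholder for a real search-engine hit count or corpus lookup, and the counts shown are invented.

```python
# Pick the most "natural" label among synonyms by (placeholder) frequency counts.
def term_frequency(term):
    # Placeholder for a search-engine hit count or corpus frequency lookup.
    fake_counts = {"car": 12_000_000, "automobile": 1_900_000}
    return fake_counts.get(term, 0)

def most_natural(synonyms):
    """Return the synonym assumed to be used most often."""
    return max(synonyms, key=term_frequency)

print(most_natural(["car", "automobile"]))   # -> "car"
```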

23 Measure of Taxonomies Rationale – If an ontology contains the relationship X IS-A Y, this is a natural statement if many Web documents contain both X and Y. Methods – For each concept pair (Xi, Yj) in an IS-A relationship, generate a non-related concept pair ((Xi or Yj), Zk), where Zk is a randomly selected concept. – The naturalness of an IS-A relationship can then be approximated via a Google search.
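A hedged sketch of this method: score the candidate IS-A pair by web co-occurrence and compare it against control pairs built with randomly chosen concepts. The hit_count function is a stand-in for a real search-engine query (the counts below are invented), and the PMI-like scoring rule is an illustrative choice rather than the slide's exact formula.

```python
# Approximate the naturalness of "X IS-A Y" from (placeholder) web hit counts.
def hit_count(query):
    # Placeholder: a real implementation would ask a search engine how many
    # documents match `query`; these numbers are invented for the example.
    fake_counts = {
        '"dog" "animal"': 900_000, '"dog"': 5_000_000, '"animal"': 7_000_000,
        '"dog" "invoice"': 1_200, '"invoice"': 2_000_000,
        '"dog" "galaxy"': 3_000, '"galaxy"': 1_500_000,
        '"dog" "violin"': 800, '"violin"': 1_000_000,
    }
    return fake_counts.get(query, 1)

def naturalness(x, y):
    """Co-occurrence of X and Y normalized by their individual frequencies (PMI-like)."""
    joint = hit_count(f'"{x}" "{y}"')
    return joint / max(hit_count(f'"{x}"') * hit_count(f'"{y}"'), 1)

def is_a_plausible(x, y, random_concepts):
    """Compare the candidate pair against pairs with randomly chosen concepts Z."""
    controls = [naturalness(x, z) for z in random_concepts]
    return naturalness(x, y) > max(controls)

print(is_a_plausible("dog", "animal", ["invoice", "galaxy", "violin"]))   # -> True
```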

24 Evaluating Ontological Decisions with OntoClean. Nicola Guarino, Christopher Welty. Communications of the ACM, 2002.

25 Correctness Evaluation Rationale – Every class in the ontology is annotated with labels for metaproperties (essence, rigidity, identity, and unity), and the constraints over these metaproperties are used to decide whether the ontology is modeled correctly. Metaproperties – Essence, rigidity (rigidity, semi-rigidity, anti-rigidity) – Identity, unity

26 Metaproperties (1) Essence – A property of an entity is essential to that entity if it must be true of it in every possible world. Rigidity – A special form of essentiality: a property is rigid if it is essential to all its possible instances. – Semi-rigidity: properties that are essential to some entities and not essential to others. – Anti-rigidity: properties that are not essential to any of their instances.

27 Metaproperties (2) Identity – Identity refers to the problem of being able to recognize individual entities in the world as being the same (or different). Unity – Unity refers to being able to recognize all the parts that form an individual entity.

28 Constraints Given two properties, p and q, when q subsumes p the following constraints hold: – If q is anti-rigid, then p must be anti-rigid; – If q carries an identity criterion, then p must carry the same criterion; – If q carries a unity criterion, then p must carry the same criterion; – If q has anti-unity, then p must also have anti-unity.
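These constraints lend themselves to a mechanical check. Below is a small sketch that encodes the metaproperty labels of two properties and reports which of the four constraints a subsumption (q subsumes p) would violate; the dataclass encoding and the example labels are assumptions made for illustration.

```python
# Check the four OntoClean-style constraints listed above for a pair (p, q).
from dataclasses import dataclass
from typing import Optional

@dataclass
class Metaproperties:
    anti_rigid: bool = False
    identity_criterion: Optional[str] = None   # identity criterion carried, if any
    unity_criterion: Optional[str] = None      # unity criterion carried, if any
    anti_unity: bool = False

def violations(p, q):
    """Return the constraints violated when q subsumes p."""
    problems = []
    if q.anti_rigid and not p.anti_rigid:
        problems.append("q is anti-rigid but p is not")
    if q.identity_criterion and p.identity_criterion != q.identity_criterion:
        problems.append("p does not carry q's identity criterion")
    if q.unity_criterion and p.unity_criterion != q.unity_criterion:
        problems.append("p does not carry q's unity criterion")
    if q.anti_unity and not p.anti_unity:
        problems.append("q has anti-unity but p does not")
    return problems

# Hypothetical example: putting the rigid class Person under the anti-rigid
# class Student violates the rigidity constraint.
student = Metaproperties(anti_rigid=True, identity_criterion="SSN")
person = Metaproperties(anti_rigid=False, identity_criterion="SSN")
print(violations(p=person, q=student))
```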

29 Automatic Evaluation of Ontologies (AEON). Johanna Völker, Denny Vrandečić, York Sure. ISWC 2005.

30 Automatic Evaluation of Correctness Methods – Based on the meaning of the metaproperties, lexico-syntactic patterns are defined and matched against the Web to obtain positive and negative evidence for the rigidity, identity, and unity of a concept. Architecture

31 Automatic Evaluation of Correctness (2) Patterns – Rigidity (negative evidence) – Unity (negative evidence) – Unity (positive evidence)
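To make the pattern idea concrete, here is an illustrative sketch of gathering negative evidence for rigidity with a lexico-syntactic pattern such as "X is no longer a C". This specific pattern and the in-memory snippets are examples chosen here, not the exact patterns or the Web-query machinery of the AEON paper.

```python
# Count pattern matches that count as negative evidence for the rigidity of a concept.
import re

def rigidity_negative_evidence(concept, snippets):
    """Count snippets suggesting instances can stop being a `concept`,
    i.e. negative evidence that `concept` is rigid."""
    pattern = re.compile(rf"\bis no longer an? {re.escape(concept)}\b", re.IGNORECASE)
    return sum(1 for s in snippets if pattern.search(s))

snippets = [
    "After graduating, Anna is no longer a student.",
    "He is no longer a student at this university.",
    "A person is a human being.",
]
print(rigidity_negative_evidence("student", snippets))   # -> 2
```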

32 Task-Based Evaluation Rationale – Based on the competency of the ontology in completing tasks, measured quantitatively by the ontology's performance within the context of the application. Strength – Shows, in a quantitative manner, whether an ontology is suitable for the application or task by measuring its performance within the context of that application. Weakness – An evaluation for one application or task may not be comparable with an evaluation for another; hence evaluations need to be carried out for each task being considered.

33 Ontology Evaluation Using Wikipedia Categories for Browsing Metrics – Depth / Breadth / Fanout – Tangledness Method – Compare the original Wikipedia category tree with a category tree generated by a clustering method and with an untangled category tree. – For a given browsing task, measure the efficiency and effectiveness of browsing from the recorded browsing history.
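The transcript does not spell out how efficiency and effectiveness were computed from the browsing history, so the sketch below shows one plausible reading: effectiveness as the share of visited pages that were relevant to the task, and backtracking as revisits of already-seen nodes. Both the log format and the two measures are assumptions for illustration, not the study's exact protocol.

```python
# Illustrative browsing measures computed from a (hypothetical) browsing log.
def effectiveness(history, targets):
    """Fraction of visited categories/articles that were relevant to the task."""
    return sum(1 for page in history if page in targets) / len(history)

def backtrack_rate(history):
    """Share of steps that return to an already-visited node (backtracking)."""
    seen, backtracks = set(), 0
    for page in history:
        if page in seen:
            backtracks += 1
        seen.add(page)
    return backtracks / len(history)

history = ["Science", "Physics", "Science", "Biology", "Genetics"]
targets = {"Genetics", "Biology"}
print(effectiveness(history, targets), backtrack_rate(history))
```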

34 Browsing Task Subtree variation from Wikipedia

35 Wikipedia tree


38 Experimental Conclusions User behaviour – Users exhibited exploratory behaviour if the subtree did not help – Users tended to backtrack more if the subtree was not helping with the task at hand Original Wikipedia subtree (a) – Helped users perform better on tasks that were less broad Wikipedia untangled subtree (b) – Users generally backtracked less on broader tasks – Users generally found more definitely relevant articles in the broadest task in both domains Generated subtree (c) – Users obtained more mostly-relevant but not definitely-relevant articles in Domain X – Users tended to perform better in Domain Y than in Domain X

39 Q/A?

