ASWC08 Semantically Conceptualizing and Annotating Tables Stephen Lynn & David W. Embley Data Extraction Research Group Department of Computer Science.


1 ASWC08 Semantically Conceptualizing and Annotating Tables Stephen Lynn & David W. Embley Data Extraction Research Group Department of Computer Science Brigham Young University Supported by the

2 Overview
Context
  WoK: Web of Knowledge
  TANGO: Table ANalysis for Generating Ontologies
  MOGO: Mini-Ontology GeneratOr
Semantic Enrichment via MOGO
  Implementation
  Experimentation
  Enhancements
Challenges & Opportunities

3 WoK: a Web of Knowledge

4 TANGO
fleckvelter  gonsity (ld/gg)  hepth (gd)
burlam       1.2              120
falder       2.3              230
multon       2.5              400
TANGO repeatedly turns raw tables into conceptual mini-ontologies and integrates them into a growing ontology.
Growing Ontology

5 MOGO
fleckvelter  gonsity (ld/gg)  hepth (gd)
burlam       1.2              120
falder       2.3              230
multon       2.5              400
TANGO repeatedly turns raw tables into conceptual mini-ontologies and integrates them into a growing ontology.
Growing Ontology
MOGO generates mini-ontologies from interpreted tables.

6 MOGO Overview
Table Interpretation
  Yields a canonical table
Canonical Table
Concept/Value Recognition
Relationship Discovery
Constraint Discovery
  Yields a semantically enriched conceptual model
Mini-ontology
Integration into a growing ontology
MOGO

7 Sample Input
Region and State Information
Location    Population (2000)  Latitude  Longitude
Northeast   2,122,869
Delaware    817,376            45        -90
Maine       1,305,493          44        -93
Northwest   9,690,665
Oregon      3,559,547          45        -120
Washington  6,131,118          43        -120
Sample Output

8 Concept/Value Recognition
Lexical Clues
  Labels as data values
  Data value assignment
Data Frame Clues
  Labels as data values
  Data value assignment
Default
  Recognize concepts and values by syntax and layout

9 Concept/Value Recognition
Lexical Clues
  Labels as data values
  Data value assignment
Data Frame Clues
  Labels as data values
  Data value assignment
Default
  Recognize concepts and values by syntax and layout
Concepts and Value Assignments
  Location: Region, State
  Region: Northeast; Northwest
  State: Delaware; Maine; Oregon; Washington

10 Concept/Value Recognition
Lexical Clues
  Labels as data values
  Data value assignment
Data Frame Clues
  Labels as data values
  Data value assignment
Default
  Recognize concepts and values by syntax and layout
Concepts and Value Assignments
  Location: Region, State
  Region: Northeast; Northwest
  State: Delaware; Maine; Oregon; Washington
  Population: 2,122,869; 817,376; 1,305,493; 9,690,665; 3,559,547; 6,131,118
  Latitude: 45; 44; 45; 43
  Longitude: -90; -93; -120
  Year: 2002; 2003
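The "Default" rule (recognize concepts and values by syntax and layout) can be illustrated with a small sketch. This is not MOGO's implementation: the regular expressions, the function names `looks_like_value`, `split_label`, and `recognize`, and the dictionary layout are all assumptions made for illustration. It treats each column header as a concept, assigns the numeric-looking cells below it as values, and pulls a parenthesized token such as "(2000)" out of a label as a candidate label-as-data-value.

```python
import re

# Assumption: a "value by syntax" is a signed, plain or comma-grouped number.
NUMBER = re.compile(r"^-?(\d{1,3}(,\d{3})+|\d+)(\.\d+)?$")
# Assumption: a trailing parenthesized token in a label may itself be a data value.
LABEL_WITH_VALUE = re.compile(r"^(.*?)\s*\((.+)\)$")


def looks_like_value(cell):
    """Return True when the cell is syntactically a number, e.g. 2,122,869 or -90."""
    return bool(NUMBER.match(cell.strip()))


def split_label(label):
    """Split 'Population (2000)' into ('Population', '2000'); else (label, None)."""
    m = LABEL_WITH_VALUE.match(label.strip())
    return (m.group(1), m.group(2)) if m else (label.strip(), None)


def recognize(header, rows):
    """Map each column label to a concept with its non-empty cell values."""
    concepts = {}
    for col, label in enumerate(header):
        name, embedded = split_label(label)
        values = [row[col] for row in rows if row[col]]
        concepts[name] = {
            "values": values,
            "label_value": embedded,
            "numeric": bool(values) and all(looks_like_value(v) for v in values),
        }
    return concepts


# The sample input table from the slides; empty strings are empty cells.
HEADER = ["Location", "Population (2000)", "Latitude", "Longitude"]
ROWS = [
    ["Northeast", "2,122,869", "", ""],
    ["Delaware", "817,376", "45", "-90"],
    ["Maine", "1,305,493", "44", "-93"],
    ["Northwest", "9,690,665", "", ""],
    ["Oregon", "3,559,547", "45", "-120"],
    ["Washington", "6,131,118", "43", "-120"],
]

concepts = recognize(HEADER, ROWS)
for name, info in concepts.items():
    print(name, info)
```

A purely syntactic pass like this cannot tell that "2000" is a year or that Location splits into Region and State; in the slides that is the role of the lexical and data-frame clues.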

11 Relationship Discovery
Dimension Tree Mappings
Lexical Clues
  Generalization/Specialization
  Aggregation
Data Frames
Ontology Fragment Merge
2000

12 Relationship Discovery
Dimension Tree Mappings
Lexical Clues
  Generalization/Specialization
  Aggregation
Data Frames
Ontology Fragment Merge

13 Constraint Discovery
Generalization/Specialization
Computed Values
Functional Relationships
Optional Participation
Region and State Information
Location    Population (2000)  Latitude  Longitude
Northeast   2,122,869
Delaware    817,376            45        -90
Maine       1,305,493          44        -93
Northwest   9,690,665
Oregon      3,559,547          45        -120
Washington  6,131,118          43        -120
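Three of the constraint kinds listed on this slide can be checked directly against the sample table. The sketch below is illustrative only and does not reproduce MOGO's algorithms: the row layout, the explicit parent-region column (inferred here from the table layout, where states follow their region), and the function names are assumptions.

```python
from collections import defaultdict

# (location, parent region, population, latitude, longitude); None = empty cell.
# Assumption: each state belongs to the region row listed above it.
ROWS = [
    ("Northeast", None, 2_122_869, None, None),
    ("Delaware", "Northeast", 817_376, 45, -90),
    ("Maine", "Northeast", 1_305_493, 44, -93),
    ("Northwest", None, 9_690_665, None, None),
    ("Oregon", "Northwest", 3_559_547, 45, -120),
    ("Washington", "Northwest", 6_131_118, 43, -120),
]


def computed_value_holds(rows):
    """Computed values: does each region's population equal the sum of its states?"""
    totals = defaultdict(int)
    stated = {}
    for name, parent, population, _lat, _lon in rows:
        if parent is None:
            stated[name] = population
        else:
            totals[parent] += population
    return {region: totals[region] == pop for region, pop in stated.items()}


def optional_attributes(rows, names=("Population", "Latitude", "Longitude")):
    """Optional participation: attributes with at least one empty cell."""
    return [n for i, n in enumerate(names, start=2) if any(r[i] is None for r in rows)]


def is_functional(rows, key_index=0, value_index=2):
    """Functional relationship: no key maps to two different non-empty values."""
    seen = {}
    for row in rows:
        key, value = row[key_index], row[value_index]
        if value is None:
            continue
        if seen.setdefault(key, value) != value:
            return False
    return True


print(computed_value_holds(ROWS))   # {'Northeast': True, 'Northwest': True}
print(optional_attributes(ROWS))    # ['Latitude', 'Longitude']
print(is_functional(ROWS))          # True
```

In the sample data each region's population is exactly the sum of its states' populations (817,376 + 1,305,493 = 2,122,869 and 3,559,547 + 6,131,118 = 9,690,665), which is the kind of regularity a computed-value check looks for; the empty latitude and longitude cells for regions are what make those attributes optional.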

14 Validation
Concept/Value Recognition
  Correctly identified concepts
  Missed concepts
  False positives
  Data values assignment
Relationship Discovery
  Valid relationship sets
  Invalid relationship sets
  Missed relationship sets
Constraint Discovery
  Valid constraints
  Invalid constraints
  Missed constraints
                        Precision  Recall  F-measure
Concept Recognition     87%        94%     90%
Relationship Discovery  73%        81%     77%
Constraint Discovery    89%        91%     90%
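The slides do not state which F-measure was used. Assuming the standard balanced F-measure (the harmonic mean of precision and recall), the short check below recomputes it from the reported precision and recall; the function name `f_measure` and the `REPORTED` dictionary are illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall (assumed definition)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) as given in the table above.
REPORTED = {
    "Concept Recognition": (0.87, 0.94, 0.90),
    "Relationship Discovery": (0.73, 0.81, 0.77),
    "Constraint Discovery": (0.89, 0.91, 0.90),
}

for task, (p, r, f) in REPORTED.items():
    print(f"{task}: computed F = {f_measure(p, r):.2f}, reported F = {f:.2f}")
```

Under that assumption the recomputed values round to the reported 90%, 77%, and 90%.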

15 Concept Recognition
Counted:
  Correct/Incorrect/Missing Concepts
  Correct/Incorrect/Missing Labels
  Data value assignments

16 Relationship Discovery
Counted:
  Correct/incorrect/missing relationship sets
  Correct/incorrect/missing aggregations and generalization/specializations

17 Constraint Discovery
Counted: Correct/Incorrect/Missing:
  Generalization/Specialization constraints
  Computed value constraints
  Functional constraints
  Optional constraints

18 Concept Recognition
Successes
  98% of concepts identified
  Missing label identification
  97% of values assigned to correct concept
Common problems
  Finding an appropriate label
  Duplicate concepts

19 Relationship Discovery
Recall of 92% for relationship sets
Missing aggregations and generalization/specializations (only found in label nesting)
Unnecessary relationship sets generated (they are computable)

20 Constraint Discovery
F-measure of 98% for functional relationship sets
Computed value discovery
Functional/non-functional lists in cells

21 MOGO Contributions
Tool to generate mini-ontologies
Accuracy encouraging
                        Precision  Recall  F-measure
Concept Recognition     87%        94%     90%
Relationship Discovery  73%        81%     77%
Constraint Discovery    89%        91%     90%

22 Opportunities & Challenges: MOGO
Enhancements
  Check for inter-label relationships
  Check for more complex computations
  Check for lists in cells
  …
Wish List
  Data-frame library
  Atomic knowledge components
  Instance recognizers
  Library of molecular components
  Semi-automatic construction of a WordNet-like resource for knowledge components

23 Summary
MOGO
  Semantic Enrichment
  Encouraging Results
  But More Possible
Broader Implications ~ Vision & Challenges
  TANGO
  WoK
  Web of Data
  Semantic Annotation
  User-friendly Query Answering
www.deg.byu.edu
embley@cs.byu.edu

24 Opportunities & Challenges: TANGO
Table Interpretation
  Transforming tables to F-logic [Pivk07]
  Layout-independent table representation [Jha08]
  Table interpretation by sibling tables [Tao07]
Semantic Enhancement / Ontology Generation
  Naming unnamed table concepts [Pivk07]
  MOGO [Lynn09]
Semi-automatic Ontology Integration
  Ontology Matching [Euzenat07]
  Ontology-mapping tools [Falconer07]
  Direct and indirect schema mappings for TANGO [Xu06]

25 Opportunities & Challenges: WoK
Web of Data
  The Semantic Web is a web of data. [W3C]
  Upcoming special issue of Journal of Web Semantics
  Enabling a Web of Knowledge [Tao09]
Information Extraction
  Domain-independent IE from web tables [Gatterbauer07]
  Open IE [Banko07]
…

26 Opportunities & Challenges: WoK
…
Semantic Annotation wrt Ontologies
  Linking Data to Ontologies [Poggi08]
  TISP [Tao07]
  FOCIH [Tao09]
Reasoning & Query Answering
  Description Logics [Baader03]
  NLIDB Community
  AskOntos [Ding06]
  SerFR [Al-Muhammed07]

27 References
[Al-Muhammed07] Al-Muhammed and Embley, Ontology-Based Constraint Recognition for Free-Form Service Requests, Proceedings of the 23rd International Conference on Data Engineering, 2007.
[Baader03] Baader, Calvanese, McGuinness, Nardi and Patel-Schneider, The Description Logic Handbook, Cambridge University Press, 2003.
[Banko07] Banko, Cafarella, Soderland, Broadhead and Etzioni, Open Information Extraction from the Web, Proceedings of the International Joint Conference on Artificial Intelligence, 2007.
[Ding06] Ding, Embley and Liddle, Automatic Creation and Simplified Querying of Semantic Web Content: An Approach Based on Information-Extraction Ontologies, Proceedings of the First Asian Semantic Web Conference, 2006.
[Euzenat07] Euzenat and Shvaiko, Ontology Matching, Springer Verlag, 2007.
[Falconer07] Falconer, Noy and Storey, Ontology Mapping: A User Survey, Proceedings of the Second International Workshop on Ontology Mapping, 2007.
[Gatterbauer07] Gatterbauer, Bohunsky, Herzog and Pollak, Towards Domain-Independent Information Extraction from Web Tables, Proceedings of the Sixteenth International World Wide Web Conference, 2007.
[Jha08] Jha and Nagy, Wang Notation Tool: Layout Independent Representation of Tables, Proceedings of the 19th International Conference on Pattern Recognition, 2008.
[Pivk07] Pivk, Sure, Cimiano, Gams, Rajkovič and Studer, Transforming Arbitrary Tables into Logical Form with TARTAR, Data & Knowledge Engineering, 2007.
[Poggi08] Poggi, Lembo, Calvanese, DeGiacomo, Lenzerini and Rosati, Linking Data to Ontologies, Journal on Data Semantics, 2008.
[Tao07] Tao and Embley, Automatic Hidden-Web Table Interpretation by Sibling Page Comparison, Proceedings of the 26th International Conference on Conceptual Modeling, 2007.
[Tao09] Tao, Embley and Liddle, Enabling a Web of Knowledge, Technical Report: tango.byu.edu/papers, 2009.
[Xu06] Xu and Embley, A Composite Approach to Automating Direct and Indirect Schema Mappings, Information Systems, 2006.

