Serena Sorrentino: Label Normalization and Lexical Annotation for Schema and Ontology Matching


Slide 1: Label Normalization and Lexical Annotation for Schema and Ontology Matching
Serena Sorrentino
International Doctorate School in Information and Communication Technologies, Università degli Studi di Modena e Reggio Emilia
XXIII Cycle, Computer Engineering and Science
Advisor: Prof. Sonia Bergamaschi
Co-Advisor: Prof. Sanda Harabagiu

Slide 2: Outline
- Overview
  - Schema Matching
  - Lexical Annotation
  - The MOMIS Data Integration System
  - Open Problems and Contributions
- Semi-Automatic Lexical Annotation
- Schema Label Normalization
- Uncertainty in Automatic Annotation
- Conclusion & Future Work

Slide 3: Schema Matching - Definition
Schema matching is the task of finding the semantic correspondences (mappings) between elements of two schemata.
Input:
- Schema information: element names, data types, constraints, ...
- Instance information: used to characterize the content and semantics of schema elements
- Auxiliary information: dictionaries, thesauri, user input, ...
Output:
- Match result: a set of mapping elements, each of which specifies that certain elements of S1 are mapped to certain elements of S2

Slide 4: Lexical Annotation for Schema Matching
Lexical annotation of schema labels is the explicit assignment of meanings with respect to a reference lexical thesaurus (WordNet in our case).
DBGroup approach: starting from the "hidden" meanings associated with schema labels (i.e. class and attribute names, also called terms), it is possible to discover lexical relationships among schema elements.
Lexical relationships (inter-schema knowledge):
- SYN (Synonym-of): between two synonymous terms
- BT (Broader Term): between two terms where the first generalizes the second (the opposite is NT, Narrower Term)
- RT (Related Term): between two terms that are generally used together in the same context
[S. Bergamaschi, S. Castano, M. Vincini, D. Beneventano. Semantic integration of heterogeneous information sources. DKE Journal, 2001]
Schema-derived relationships (intra-schema knowledge):
- BT/NT: from ISA relationships, and from a Foreign Key (FK) in relational sources when it is a Primary Key in both the original and the referenced relation
- RT: from nested elements in XML files and from FKs in relational sources
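The rules for schema-derived relationships in relational sources can be sketched as follows. This is only an illustration: the `ForeignKey` record, the function name and the direction of the BT relationship (the referenced relation is taken as the broader term) are assumptions, not details given in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignKey:
    """Hypothetical description of a foreign key (names are illustrative)."""
    table: str           # relation that holds the FK
    columns: tuple       # FK columns in that relation
    ref_table: str       # referenced relation
    ref_columns: tuple   # referenced columns


def schema_derived_relationships(primary_keys, foreign_keys):
    """Return (term, relationship, term) triples derived from foreign keys.

    primary_keys maps each relation name to a tuple of its PK columns.
    """
    relationships = []
    for fk in foreign_keys:
        fk_is_pk_in_both = (
            bool(fk.columns)
            and set(fk.columns) == set(primary_keys.get(fk.table, ()))
            and set(fk.ref_columns) == set(primary_keys.get(fk.ref_table, ()))
        )
        if fk_is_pk_in_both:
            # Assumption: the referenced relation generalizes the referencing one.
            relationships.append((fk.ref_table, "BT", fk.table))
        else:
            relationships.append((fk.table, "RT", fk.ref_table))
    return relationships


pks = {"person": ("id",), "student": ("id",), "purchase_order": ("po_id",)}
fks = [
    ForeignKey("student", ("id",), "person", ("id",)),
    ForeignKey("purchase_order", ("client_id",), "person", ("id",)),
]
print(schema_derived_relationships(pks, fks))
```

ISA relationships and nested XML elements would add BT/NT and RT triples in the same way; they are left out to keep the sketch short.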

Slide 5: Lexical Annotation - Example
Lexical annotation: the labels "Customer" and "Client" are annotated with WordNet senses, chosen among the candidates Customer#1 and Client#1, Client#2, Client#3. The selected senses belong to the same synset, so the relationship Customer SYN Client is discovered.
[Figure: WordNet fragment showing the shared synset and its hypernym, hyponym, meronym and holonym links]
Lexical relationship discovery:
- SYN: synonym in WordNet
- BT/NT: hypernym/hyponym relationship in WordNet
- RT: meronym relationship (part of) or sibling in WordNet
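A minimal sketch of the discovery rules, run on a toy thesaurus instead of WordNet. All synset identifiers and the `HYPERNYM`/`PART_OF` tables are invented for illustration, and following the hypernym chain transitively for BT/NT is an assumption.

```python
# Toy excerpt of a thesaurus; every identifier below is invented for illustration.
HYPERNYM = {
    "client#2": "consumer#1",
    "consumer#1": "person#1",
    "vendor#1": "person#1",
}
PART_OF = {("wheel#1", "car#1")}  # (part, whole) pairs


def ancestors(synset):
    """Follow hypernym links upwards and return the chain of ancestors."""
    chain = []
    while synset in HYPERNYM:
        synset = HYPERNYM[synset]
        chain.append(synset)
    return chain


def lexical_relationship(a, b):
    """Return SYN, BT, NT, RT or None for two annotated senses."""
    if a == b:
        return "SYN"                      # same synset
    if a in ancestors(b):
        return "BT"                       # a is a hypernym of b
    if b in ancestors(a):
        return "NT"                       # a is a hyponym of b
    if (a, b) in PART_OF or (b, a) in PART_OF:
        return "RT"                       # meronym (part-of)
    parent_a, parent_b = HYPERNYM.get(a), HYPERNYM.get(b)
    if parent_a is not None and parent_a == parent_b:
        return "RT"                       # siblings
    return None


annotations = {"Customer": "client#2", "Client": "client#2", "Person": "person#1"}
print(lexical_relationship(annotations["Customer"], annotations["Client"]))  # SYN
```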

Slide 6: The MOMIS Data Integration System
The MOMIS system (Mediator EnvirOnment for Multiple Information Sources) is an I3 framework designed for the integration of structured and semi-structured data sources.
[Figure: MOMIS architecture] The integration process has four phases:
- Wrapping: each source (e.g. a relational database) is wrapped into a local schema (local schema 1 ... local schema N)
- Lexical annotation: manual or automatic annotation of schema labels with WordNet synsets
- Common Thesaurus generation: the Common Thesaurus collects lexical relationships, schema-derived relationships, user-supplied relationships and inferred relationships
- Global schema generation: cluster generation, producing the global classes and the mapping tables

Slide 7: Open Problems and Contributions: Automatic Lexical Annotation
[Figure: two example schemata, S1 and S2, with labels such as CLIENT, CLIENT_ID, NAME, ADDRESS, COUNTRY, CITY, STREET_ADDRESS, PURCHASE_ORDER, PO_ID, PRODUCT_CODE, QTY, PRICE, TSP_INFO, INVOCE_NR]
- Manual annotation is a tedious task that does not scale: a method for automatic or semi-automatic annotation is needed
- Non-dictionary words, i.e. Compound Nouns (CNs), abbreviations and acronyms: schema labels need to be normalized
- Fully automatic annotation (i.e. "on-the-fly") is intrinsically uncertain: uncertain annotations need to be dealt with


Slide 9: Word Sense Disambiguation for Semi-Automatic Lexical Annotation
WSD (Word Sense Disambiguation) is the ability to identify the meanings of words in context by a computational technique [R. Navigli, Word sense disambiguation: A survey. ACM Comput. Surv., 2009].
The semi-automatic CWSD (Combined Word Sense Disambiguation) method:
- associates one or more WordNet meanings with each label
- combines two WSD algorithms:
  - SD (Structural Disambiguation), which exploits the schema-derived relationships
  - WND (WordNet Domains Disambiguation), which exploits WordNet Domains [B. Magnini, et al., The role of domain information in Word Sense Disambiguation, Journal of Natural Language Engineering, 2002]

Slide 10: The CWSD method
[Figure: CWSD workflow]
- Automatic wrapping of the sources: extraction of class and attribute names, and extraction of schema-derived relationships
- The integration designer selects the relevant domains
- CWSD runs the SD and WND algorithms and produces the annotated schemata
- The lexical relationships are derived and stored in the Common Thesaurus

Slide 11: Experimental Evaluation
CWSD was evaluated on a real data set: three levels of a subtree of the Yahoo and Google directories ("society and culture" and "society", respectively).
[Table: columns WSD Algorithm, Recall, Precision, F-Measure; rows SD, WND, CWSD; values not preserved]
Publications related to CWSD:
- S. Bergamaschi, L. Po, S. Sorrentino. Automatic Annotation in Data Integration Systems. OTM Workshops 2007
- S. Bergamaschi, L. Po, A. Sala, S. Sorrentino. Data source annotation in data integration systems. DBISP2P 2007


Slide 13: Schema Label Normalization
Schema label normalization is the reduction of each label to some standardized form that can be easily recognized. In our case: the process of abbreviation expansion and CN (Compound Noun) annotation.
[Figure: (a) relationships discovered without schema normalization; (b) relationships discovered with schema normalization. Legend: right relationship, false negative relationship, false positive relationship. Example: PO SYN PurchaseOrder]

Slide 14: The Schema Label Normalization method
We propose a semi-automatic schema label normalization method composed of three phases:
- Label preprocessing
  - Selecting the labels to be normalized
  - Tokenizing labels into separate words
  - Identifying abbreviations and CNs among the tokenized words
- Abbreviation expansion (see Maciej Gawinecki's presentation)
- CN annotation
  - Interpreting CNs
  - Creating new WordNet entries and meanings for the CNs
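The label preprocessing phase can be illustrated with a short sketch. The splitting rules, the tiny `DICTIONARY` set (standing in for a WordNet lookup) and the function names are assumptions made for this example, not the rules used by the method itself.

```python
import re

# Hypothetical stand-in for a thesaurus lookup; a real system would query WordNet.
DICTIONARY = {"client", "name", "address", "purchase", "order",
              "street", "product", "code", "price", "country", "city"}


def tokenize(label):
    """Split a schema label on separators, camelCase boundaries and digits."""
    tokens = []
    for part in re.split(r"[_\W]+", label):
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [t.lower() for t in tokens]


def preprocess(label):
    """Return the tokens, the non-dictionary tokens and a compound-noun flag."""
    tokens = tokenize(label)
    non_dictionary = [t for t in tokens if t not in DICTIONARY]
    # Assumption: a label made of two or more dictionary words is a CN candidate.
    is_compound = len(tokens) > 1 and not non_dictionary
    return {"tokens": tokens, "to_expand": non_dictionary, "compound": is_compound}


for label in ("PurchaseOrder", "STREET_ADDRESS", "PO_ID", "QTY"):
    print(label, preprocess(label))
```

Tokens listed under `to_expand` would be handed to the abbreviation expansion phase, while labels flagged as compounds would go to CN annotation.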

Slide 15: CN Annotation
Compound Noun (CN): a term composed of two or more words, called constituents.
Endocentric CNs consist of a head (i.e. the part that contains the basic meaning of the CN) and modifiers, which restrict this meaning. E.g. "delivery company".
Our method can be summed up in four main steps.

Slide 16: CN constituent disambiguation & pruning
1. CN constituent disambiguation
   - Head and modifier disambiguation: by applying CWSD
2. Redundant constituent identification and pruning
   - Redundant words: words that do not contribute new information, i.e. information that can be derived from the schema or from the lexical thesaurus
   - E.g. for the attribute "company address" of the class "company", the constituent "company" is not considered, because the relationship holding between a class and its attributes is implicit in the schema
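The schema-based part of the pruning step can be sketched as below. The function name is hypothetical, only redundancy with respect to the enclosing class is covered (not redundancy derived from the thesaurus), and keeping the original tokens when everything would be pruned is an assumption.

```python
def prune_redundant(attribute_tokens, class_tokens):
    """Drop attribute constituents that merely repeat the enclosing class name."""
    class_words = set(class_tokens)
    kept = [t for t in attribute_tokens if t not in class_words]
    # Assumption: never prune a label down to nothing.
    return kept or list(attribute_tokens)


print(prune_redundant(["company", "address"], ["company"]))  # ['address']
```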

Slide 17: CN interpretation via semantic relationships
3. CN interpretation: selecting, from a set of predefined semantic relationships, the one that best captures the relationship between the head and the modifier. In our case the set consists of the nine Levi's relationships (CAUSE, HAVE, MAKE, IN, FOR, ABOUT, USE, BE, FROM) [Levi, J. N., The Syntax and Semantics of Complex Nominals. Academic Press, 1978].
Intuition: the semantic relationship between head and modifier is the same as the one holding between their unique beginners (i.e. the 25 top concepts in the WordNet noun hierarchy). The correct Levi's relationship is therefore selected manually only for the pairs of unique beginners.
[Figure: for "delivery company", Company#1 descends through Institution#1 from the unique beginner Group#1, and Delivery#1 descends through Transport#1 from the unique beginner Act#2; the relationship selected for the pair is MAKE]
Why Levi's relationships?
- they are suitable for interpreting pairs of unique beginners
- a detailed, fine-grained interpretation is not required in our context
- they can be used during the CN gloss definition
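The intuition amounts to a table lookup keyed by unique beginners. In the sketch below the hypernym chains are shortened (the slide elides intermediate steps), the table holds only the one pair shown on the slide, and all names are illustrative assumptions.

```python
# Shortened, illustrative hypernym chains; the real WordNet paths are longer.
HYPERNYM = {
    "company#1": "institution#1",
    "institution#1": "group#1",
    "delivery#1": "transport#1",
    "transport#1": "act#2",
}

# Manually filled table: (head beginner, modifier beginner) -> Levi relationship.
LEVI_BY_BEGINNERS = {("group#1", "act#2"): "MAKE"}


def unique_beginner(sense):
    """Climb the hypernym chain until the top concept is reached."""
    while sense in HYPERNYM:
        sense = HYPERNYM[sense]
    return sense


def interpret_cn(head_sense, modifier_sense):
    """Return the Levi relationship chosen for the pair of unique beginners."""
    key = (unique_beginner(head_sense), unique_beginner(modifier_sense))
    return LEVI_BY_BEGINNERS.get(key)  # None if the pair was never labeled


print(interpret_cn("company#1", "delivery#1"))  # MAKE
```

With 25 unique beginners the table has at most 25 x 25 entries, which is what makes manual selection feasible.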

Slide 18: Creation of a new WN meaning for a CN
4.a Gloss definition: the gloss of the CN is obtained by joining the gloss of the head and the gloss of the modifier through the selected Levi's relationship.
- Head: Company#1, gloss "an institution created to conduct business"
- Relationship: MAKE
- Modifier: Delivery#1, gloss "the act of delivering or distributing something"
- Delivery_Company gloss: "an institution created to conduct business make the act of delivering or distributing something"
4.b Inclusion of the new CN meaning in WN: Delivery_Company#1 is added as a new synset and linked to the synsets of its constituents (Company#1 and Delivery#1) by a hypernym/hyponym relationship and a related-term relationship.
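Step 4 can be sketched as plain string composition plus a small record for the new entry. The dictionary layout and function names are hypothetical; attaching the CN as a hyponym of its head and as a related term of its modifier is an assumption about which link goes where.

```python
def compose_gloss(head_gloss, relationship, modifier_gloss):
    """Join head gloss, Levi relationship and modifier gloss into the CN gloss."""
    return f"{head_gloss} {relationship.lower()} {modifier_gloss}"


def new_cn_entry(lemma, head_sense, modifier_sense, gloss):
    """Build an illustrative record for the new thesaurus entry."""
    return {
        "sense": f"{lemma}#1",
        "gloss": gloss,
        "hypernym": head_sense,        # assumption: the CN specializes its head
        "related": [modifier_sense],   # assumption: RT link to the modifier
    }


gloss = compose_gloss(
    "an institution created to conduct business",
    "MAKE",
    "the act of delivering or distributing something",
)
entry = new_cn_entry("Delivery_Company", "Company#1", "Delivery#1", gloss)
print(entry["gloss"])
```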

Slide 19: Experimental Evaluation
Evaluation over five different data sets (including relational and XML schemata).
Evaluating the lexical annotation process:
[Table: columns Precision, Recall, F-Measure; rows Lexical Annotation without Normalization, Lexical Annotation with Normalization; values not preserved]
Evaluating the discovered lexical relationships:
[Table: columns Precision, Recall, F-Measure; rows Relationships discovered without Normalization, Relationships discovered with Normalization; values not preserved]
Publications related to Schema Label Normalization:
- S. Sorrentino, S. Bergamaschi, M. Gawinecki, L. Po, Schema Label Normalization for Improving Schema Matching, DKE Journal
- S. Sorrentino, S. Bergamaschi, M. Gawinecki, L. Po, Schema Label Normalization for Improving Schema Matching, ER 2009


Slide 21: Uncertainty in Automatic Annotation
In automatic lexical annotation, uncertainty is assessed in terms of probability.
The PWSD (Probabilistic Word Sense Disambiguation) algorithm:
- automatically associates one or more WordNet meanings with schema labels
- automatically assigns to each annotation a probability value that indicates the reliability of the annotation itself
- is based on a probabilistic combination of different WSD algorithms
- uses the Dempster-Shafer theory [Shafer, G., A Mathematical Theory of Evidence, Princeton 1976] to combine the results of the different WSD algorithms

Slide 22: Example
[Figure: PWSD workflow]
- The schema labels are given as input to N WSD algorithms, each with an associated confidence (e.g. WSD Algorithm 1: 70%, WSD Algorithm 2: 60%, WSD Algorithm 3: 50%)
- Each algorithm returns the terms annotated with the meanings it selected; for each label, a table records which of the candidate meanings (label#1, label#2, label#3, ...) were chosen by WSD 1, WSD 2, ..., WSD N
- The outputs are combined through the Dempster-Shafer theory, producing for each schema element (e.g. Source1.Book, Source2.Brochure, Source2.Book Heading) its annotations (e.g. book#1, book#3, brochure#1, heading#2), each with a probability value

Slide 23: Probabilistic Lexical Relationships
Starting from the probabilistic annotation, PWSD derives a set of probabilistic lexical relationships between schema elements.
[Figure: relationships obtained with WordNet First Sense compared with those obtained with PWSD]

Slide 24: Experimental Results
Evaluation on 2 relational schemata of the Amalgam integration benchmark and 3 ontologies from the OAEI'06 benchmark.
Evaluating the lexical annotation process:
[Table: columns WSD method, Precision, Recall, F-Measure; rows WordNet First Sense, PWSD*; values not preserved]
Evaluating the discovered lexical relationships:
[Table: columns WSD method, Precision, Recall, F-Measure; rows WordNet First Sense, PWSD*; values not preserved]
* The table footnotes give the PWSD thresholds: 0.2 and 0.15.
Publications related to PWSD:
- L. Po, S. Sorrentino, Automatic generation of probabilistic relationships for improving schema matching, Information Systems Journal, 2011
- L. Po, S. Sorrentino, S. Bergamaschi, D. Beneventano, Lexical knowledge extraction: an effective approach to schema and ontology matching, ECKM 2009

Slide 25: NORMS and ALA
- The Schema Label Normalization functionalities have been implemented in a tool called NORMS (NORMalizer of Schemata), which allows the designer to refine the normalized labels by correcting potential errors [S. Sorrentino, S. Bergamaschi, M. Gawinecki, NORMS: an automatic tool to perform schema label normalization, ICDE 2011]
- CWSD and PWSD have been implemented in a tool called ALA (Automatic Lexical Annotator), which has been integrated within the MOMIS system [S. Bergamaschi, L. Po, S. Sorrentino, A. Corni, Dealing with Uncertainty in Lexical Annotation, ERPD 2009]

Slide 26: Conclusion
- Automatic and semi-automatic methods to perform label normalization and lexical annotation have been presented: CWSD, Schema Label Normalization, PWSD
- Automatic methods to extract (probabilistic) lexical relationships have been proposed, and their effectiveness in improving schema matching has been shown
- All the methods have been implemented in the context of the MOMIS Data Integration System; however, they can be applied in the general contexts of schema and ontology matching

Slide 27: Future Work
New research lines:
- Inclusion and integration of other knowledge resources for automatic lexical annotation:
  - Domain-specific resources, such as domain ontologies and domain thesauri, to address the problem of domain-specific terms in schemata (e.g. the biomedical term "aromatase", an enzyme involved in the production of estrogen)
  - Generic resources: Wikipedia, dictionaries, etc.
- Inclusion of instance-information extraction techniques to improve the automatic annotation and relationship discovery processes and to solve the problem of non-informative schema labels
  - A promising direction is the use of RELEVANT [S. Bergamaschi, C. Sartori, F. Guerra, M. Orsini, Extracting Relevant Attribute Values for Improved Search. IEEE Internet Computing 2007], a tool that extracts metadata about the relevant instance values of an attribute and adds it to the schema

Slide 28: Publications
Journals:
- Po, L. and Sorrentino, S. (2011). Automatic generation of probabilistic relationships for improving schema matching. Information Systems Journal, Special Issue on Semantic Integration of Data, Multimedia, and Services, 36(2).
- Sorrentino, S., Bergamaschi, S., Gawinecki, M., and Po, L. (2010). Schema label normalization for improving schema matching. DKE Journal, 69(12).
International Conferences and Workshops:
- [ICDE 2011] Sorrentino, S., Bergamaschi, S., and Gawinecki, M. (2011). NORMS: an automatic tool to perform schema label normalization. In Press, Accepted Manuscript (Demo Paper), IEEE International Conference on Data Engineering, ICDE 2011, April 11-16, Hannover.
- [ER 2009] Sorrentino, S., Bergamaschi, S., Gawinecki, M., and Po, L. (2009). Schema normalization for improving schema matching. In Proceedings of the 28th International Conference on Conceptual Modeling, ER 2009, Gramado, Brazil, 9-12 November.
- [IEEE NLP-KE] Beneventano, D., Bergamaschi, S., and Sorrentino, S. (2009). Extending WordNet with compound nouns for semi-automatic annotation in data integration systems. In Proceedings of the IEEE NLP-KE Conference, Dalian, China, September.
- [ER 2009 Poster and Demonstrations] Bergamaschi, S., Po, L., Sorrentino, S., and Corni, A. (2009). Dealing with Uncertainty in Lexical Annotation. Revista de Informática Teórica e Aplicada, RITA, ER 2009 Poster and Demonstrations Session, 16(2).

Slide 29: Publications
- [IEEE ICSC 2009] Beneventano, D., Orsini, M., Po, L., Sala, A., and Sorrentino, S. (2009). An ontology-based data integration system for data and multimedia sources. In Proceedings of the Third International Conference on Semantic Computing, IEEE ICSC 2009, Berkeley, CA, USA, September 14-16. IEEE Computer Society.
- [ISDSI 2009] Beneventano, D., Orsini, M., Po, L., and Sorrentino, S. (2009). The MOMIS-STASIS approach for Ontology-Based Data Integration. In Proceedings of the 1st International Workshop on Interoperability through Semantic Data and Service Integration, ISDSI 2009, Camogli (GE), Italy, June 25.
- [ECKM 2009] Po, L., Sorrentino, S., Bergamaschi, S., and Beneventano, D. (2009). Lexical knowledge extraction: an effective approach to schema and ontology matching. Proceedings of the European Conference on Knowledge Management, ECKM 2009, 3-4 September, Vicenza.
- [DBISP2P] Bergamaschi, S., Po, L., Sala, A., and Sorrentino, S. (2007). Data source annotation in data integration systems. In Proceedings of the Fifth International Workshop on Databases, Information Systems and Peer-to-Peer Computing, DBISP2P, at the 33rd International Conference on Very Large Data Bases (VLDB 2007), University of Vienna, Austria, September 24.
- [OTM Workshops] Bergamaschi, S., Po, L., and Sorrentino, S. (2007). Automatic Annotation in Data Integration Systems. In Proceedings of the OTM Workshops, Portugal, November.

Slide 30: Publications
National Conferences:
- [ITAIS] S. Bergamaschi, L. Po, S. Sorrentino, A. Corni, "Uncertainty in data integration systems: automatic generation of probabilistic relationships", VI Conference of the Italian Chapter of AIS, ITAIS 2009, Costa Smeralda, Italy, October.
- [SEBD] S. Bergamaschi, S. Sorrentino, "Semi-automatic compound nouns annotation for data integration systems", Proceedings of the 17th Italian Symposium on Advanced Database Systems, SEBD 2009, Camogli (Genova), Italy, June.
- [SEBD] S. Bergamaschi, L. Po, and S. Sorrentino, "Automatic annotation for mapping discovery in data integration systems", Proceedings of the Sixteenth Italian Symposium on Advanced Database Systems, SEBD 2008, Mondello (Palermo), Italy, June 2008.
Book Chapters:
- Bergamaschi, S., Beneventano, D., Po, L., Sorrentino, S. (2011). Automatic Schema Mapping through Normalization and Annotation. In Press, in Second Search Computing Workshop: Challenges and Directions, 2010, LNCS State-of-the-Art Survey.
- Bergamaschi S., Po L., Sorrentino S., Corni A. "Uncertainty in data integration systems: automatic generation of probabilistic relationships", to appear in Management of the Interconnected World (A. D'Atri, M. De Marco, A. Maria Braccini, F. Cariddu eds.), Springer.

Slide 31: Projects
- NeP4B - Networked Peers for Business, MIUR funded research project, FIRB 2005
- STASIS - SofTware for Ambient Semantic Interoperable Services, Project FP IST
- "Searching for a needle in mountains of data!", project funded by the Fondazione Cassa di Risparmio di Modena within the Bando di Ricerca Internazionale

Slide 32: Thanks for your attention!

Slide 33: Evaluation Measures
- TP: True Positive
- FP: False Positive
- FN: False Negative
- TN: True Negative
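The formulas on this slide did not survive in the transcript. The sketch below uses the standard definitions of precision, recall and F-measure in terms of the counts above; the function names and the convention of returning 0.0 on an empty denominator are choices made for this example.

```python
def precision(tp, fp):
    """Fraction of returned items that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Fraction of correct items that are returned: TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0


def f_measure(p, r):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    return 2 * p * r / (p + r) if p + r else 0.0


# Illustrative counts, not results from the experiments.
p, r = precision(tp=8, fp=2), recall(tp=8, fn=8)
print(p, r, f_measure(p, r))
```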

Slide 34: Unique beginners
The top-level concepts of the WordNet hierarchy are the 25 unique beginners (e.g. act, animal, artifact) for WordNet English nouns, defined in [Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., and Miller, K., WordNet: An on-line lexical database. International Journal of Lexicography, 1990].

Slide 35: Levi's relationships set
[Table: the set of Levi's relationships, where M = Modifier and H = Head]
[Levi, J. N., The Syntax and Semantics of Complex Nominals. Academic Press, 1978]

Slide 36: Dempster-Shafer theory
The Dempster-Shafer theory is a mathematical theory of evidence. It allows evidence from different sources to be combined:
- for each algorithm, a probability mass function m(·) is assigned over the set of all possible meanings of the term under consideration
- the mass functions of the WSD algorithms are combined by using Dempster's rule of combination
- in the end, to obtain the probability assigned to each meaning, the belief mass assigned to a set of meanings is split among the meanings in that set
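A minimal sketch of these three steps, using Dempster's rule of combination. Two details are assumptions rather than statements from the slides: each algorithm's confidence is placed as mass on the meanings it selected, with the remainder on the full set of meanings, and the mass of a set is split uniformly among its members. The meanings and confidences are illustrative.

```python
from itertools import product


def mass_from_wsd(chosen, all_meanings, confidence):
    """Assumed mass assignment: `confidence` on the chosen meanings, rest on all."""
    mass = {frozenset(all_meanings): 1.0 - confidence}
    key = frozenset(chosen)
    mass[key] = mass.get(key, 0.0) + confidence
    return mass


def combine(m1, m2):
    """Dempster's rule: multiply masses, intersect sets, renormalize by conflict."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + x * y
        else:
            conflict += x * y
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


def split_to_meanings(mass):
    """Assumed uniform split of each set's mass among its meanings."""
    probabilities = {}
    for meanings, value in mass.items():
        for meaning in meanings:
            probabilities[meaning] = probabilities.get(meaning, 0.0) + value / len(meanings)
    return probabilities


meanings = {"book#1", "book#2", "book#3"}
m1 = mass_from_wsd({"book#1"}, meanings, 0.7)            # algorithm 1, 70% confidence
m2 = mass_from_wsd({"book#1", "book#3"}, meanings, 0.6)  # algorithm 2, 60% confidence
print(split_to_meanings(combine(m1, m2)))
```

With these inputs the combined masses are 0.70 on {book#1}, 0.18 on {book#1, book#3} and 0.12 on the full set, which the split turns into roughly 0.83, 0.13 and 0.04 for book#1, book#3 and book#2.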