Using Ontologies to Enable Access to Multiple Heterogeneous Databases CARDGIS Eduard Hovy Information Sciences Institute University of Southern California.

Presentation transcript:

Using Ontologies to Enable Access to Multiple Heterogeneous Databases CARDGIS Eduard Hovy Information Sciences Institute University of Southern California (in collaboration with Columbia University)

CARDGIS2 Context: CARDGIS Project
Enable access to multiple, heterogeneous Federal agency data sources through a single interface using standardized nomenclature, while accounting for semantic variability.
Sources:
–Energy Information Administration (quarterly CD ROM).
–Bureau of Labor Statistics (
–Census Bureau (CD ROM for 1992 data).
–California Energy Commission (weekly data at

CARDGIS3 System Architecture
–Construction phase (deploy DBs, extend ontology): Ontology Construction, with DB analysis and text analysis.
–User phase (compose query): User Interface, with ontology browser and query constructor.
–Access phase (create DB query, retrieve data): Query Processor, with reformulation and cost optimization.
–Integrated Ontology: global terminology, source descriptions, integration axioms.
–Sources
–R S T (diagram labels)

CARDGIS4 So What is an Ontology?
Desiderata:
–‘anchor points’ for terminology variants (salary, income…),
–wide coverage,
–some degree of taxonomic organization for inference/program behavior control.
Terminological (not domain) ontology.
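
One way to picture the 'anchor point' idea: each source's own term is attached to a single ontology concept, so a query posed in the global terminology can be rewritten per source. The Python sketch below is illustrative only. The concept name, the source labels, and the assignment of "salary" and "income" to particular sources are hypothetical and are not taken from the CARDGIS system.

```python
# Hypothetical anchor table: ontology concept -> {source name: source-specific term}.
ANCHORS = {
    "earnings": {"BLS": "salary", "Census": "income"},
}

def concept_for(term):
    """Return the ontology concept that a global or source-specific term anchors to."""
    term = term.lower()
    for concept, variants in ANCHORS.items():
        if term == concept or term in (v.lower() for v in variants.values()):
            return concept
    return None

def reformulate(term):
    """Map a user's term to the matching term in every source that has one."""
    concept = concept_for(term)
    return dict(ANCHORS[concept]) if concept else {}
```

A call such as reformulate("income") would return the per-source terms for the shared concept, while an unknown term yields an empty mapping.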

CARDGIS5 ISI’s SENSUS Ontology
–Taxonomy, multiple superclass links.
–Approx. 90,000 items.
–Top level: Penman Upper Model (ISI).
–Body: WordNet (Princeton), rearranged.
–Used at ISI for machine translation, text summarization, database access.
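
A taxonomy with multiple superclass links is a directed acyclic graph, not a tree, so inference over superconcepts has to follow every parent link. The sketch below shows one minimal way to represent that in Python. The concept names are invented for illustration and are not SENSUS entries.

```python
# Hypothetical fragment: each concept lists all of its direct superclasses.
PARENTS = {
    "thing": [],
    "object": ["thing"],
    "vehicle": ["object"],
    "toy": ["object"],
    "toy_car": ["vehicle", "toy"],  # two superclass links
}

def ancestors(concept, parents=PARENTS):
    """Collect every superconcept reachable through any superclass link."""
    seen = set()
    stack = list(parents.get(concept, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen
```

The visited set keeps shared superconcepts (here "object") from being expanded twice when two parent paths converge.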

CARDGIS6 3 Ways of Building Ontologies
1. Combine existing knowledge resources: ontology alignment.
2. Learn from texts and Web: extract word families for thousands of concepts.
3. Parse dictionary definitions: extract information and place into ontology.

CARDGIS7 1. Cross-Ontology Alignment
Why create a new ontology? Merge and reuse existing ones! Problem: automatically find corresponding concepts.
1. Text Matches
–concept names (cognates; reward for delimiter confluence...)
–textual definitions (string matching, demorphing, stop words...) [Knight & Luk 94, Dalianis & Hovy 98]
2. Hierarchy Matches
–shared superconcepts, to filter ambiguity [Knight & Luk 94]
–semantic distance [Agirre et al. 94]
3. Data Item and Form Matches
–inter-concept relations [Ageno et al. 94; Rigau & Agirre 95]
–slot-filler restrictions [Okumura & Hovy 94]
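
To make the first two match types concrete, the sketch below combines a name-similarity score with a shared-superconcept score and ranks candidate concept pairs. It is a simplified stand-in, not the cited heuristics: the string similarity comes from Python's difflib, the hierarchy score is a plain Jaccard overlap, the weights are arbitrary, and it assumes superconcept names are already comparable across the two ontologies.

```python
from difflib import SequenceMatcher

def name_score(a, b):
    """String similarity of two concept names in [0, 1] (stand-in for cognate matching)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def hierarchy_score(supers_a, supers_b):
    """Jaccard overlap of two sets of superconcept names."""
    a, b = set(supers_a), set(supers_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def suggest(onto_a, onto_b, w_name=0.5, w_hier=0.5):
    """Rank all cross-ontology concept pairs by a weighted combined score.

    Each ontology maps a concept name to a collection of its superconcept names.
    """
    pairs = []
    for ca, sa in onto_a.items():
        for cb, sb in onto_b.items():
            score = w_name * name_score(ca, cb) + w_hier * hierarchy_score(sa, sb)
            pairs.append((score, ca, cb))
    return sorted(pairs, reverse=True)
```

The hierarchy term plays the filtering role the slide describes: two concepts with similar names but no shared superconcepts end up lower in the ranking.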

CARDGIS8 Cross-Ontology Alignment Results
Ontologies:
–SENSUS Upper Model (350)
–CYC top region (2400) [Lenat; Lehmann 96]
–MIKROKOSMOS (4790 concepts) [Mahesh 96]
–SENSUS top region (6768)
Recall (how many links were missed?): difficult to count! … 32.4 million pairs.
Precision (how many suggested links are correct?):
–0.252 (strict)
–0.517 (lenient)
After 5 runs: 883 suggestions (= 13% of SENSUS candidates)
–correct: 244 (= 3.6%)
–near miss: 256 (= 3.8%)
–wrong: 383 (= 5.6%)
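
The figures on this slide can be cross-checked with a few lines of Python. This is one reading of the numbers, not something the slide states: it assumes the 32.4 million pairs are MIKROKOSMOS concepts times SENSUS top-region concepts, and that the percentages are fractions of the 6,768 SENSUS top-region concepts.

```python
sensus_top = 6768      # SENSUS top region concepts
mikrokosmos = 4790     # MIKROKOSMOS concepts

# Every cross-ontology concept pair is a potential link.
candidate_pairs = sensus_top * mikrokosmos

# Outcome counts after 5 runs, as given on the slide.
counts = {"correct": 244, "near miss": 256, "wrong": 383}
suggestions = sum(counts.values())

# Each outcome as a fraction of the SENSUS top-region concepts.
share_of_sensus = {label: n / sensus_top for label, n in counts.items()}
suggested_share = suggestions / sensus_top
```

Under that reading the three counts add up to the 883 suggestions, the pair count is about 32.4 million, and the first two percentages round to the slide's values. The third comes out at about 5.66%, close to the 5.6% shown.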

CARDGIS9 2. The Websucker
Corpus
–Training set WSJ 1987: 16,137 texts (32 topics).
–Test set WSJ 1988: 12,906 texts (31 topics).
–Texts indexed into categories by humans.
Signature data
–300 terms each, using tf.idf.
–Word forms: single words, demorphed words, multi-word phrases.
How many terms in signatures?
–5, 10, 15, …, 300 terms.
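
A topic signature of this kind can be sketched as the top-ranked terms per topic under a tf.idf weighting. The slide does not give the exact formula, so the version below is an assumption: term frequency is counted over all texts of a topic, and idf is log(number of topics / number of topics containing the term). The function and parameter names are hypothetical.

```python
import math
from collections import Counter

def topic_signatures(docs_by_topic, size=300):
    """Build one signature per topic.

    docs_by_topic: {topic: [token list, token list, ...]}
    Returns {topic: [top `size` terms by tf.idf]}, treating each topic as one 'document'.
    """
    # Term frequency per topic, pooled over all of the topic's texts.
    tf = {t: Counter(tok for doc in docs for tok in doc)
          for t, docs in docs_by_topic.items()}
    n_topics = len(tf)
    # Document frequency: in how many topics does each term occur?
    df = Counter(term for counts in tf.values() for term in counts)
    signatures = {}
    for topic, counts in tf.items():
        scored = {term: c * math.log(n_topics / df[term]) for term, c in counts.items()}
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scored, key=lambda term: (-scored[term], term))
        signatures[topic] = ranked[:size]
    return signatures
```

Varying `size` from 5 to 300 corresponds to the signature-length question raised on the slide.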

CARDGIS10 Pollution on the Web
Cleanup: try various methods: tf.idf, χ², Latent Semantic Analysis...

CARDGIS11 3. Dictionary Extraction
Step 1: find unencumbered dictionary (Webster 1913).
Step 2: reformat and then parse entries (
Step 3: identify individual propositions and their heads.
Step 4: convert prepositions to semantic relations (EM algorithm).
Step 5: place entries into ontology (not yet done).
Example parse:
Babel n 2 [ SENT [ NP OR [ NP A/DT place/NN ] [ NP scene/NN ] ] [ PP of/IN [ NP AND [ NP noise/NN ] [ NP confusion/NN ] ] ] ] ;/: [ SENT [ NP a/DT confused/JJ mixture/NN ] [ PP of/IN [ NP sounds/NNS ] ] ,/, as/IN [ PP of/IN [ NP languages/NNS ] ] ] ./.

CARDGIS12 Disambiguating Extracted Info.
Identify propositions and their parts:
–Impression: “A communicating [of a mold or trait] [by an external force or influence]”
–Reflection: “The return [of light or sound waves] [by or as if by a mirror]”
by = AGENT or PATH? communication by force; return by mirror; return by road
of = OWNER or NUMBER-PART or SOURCE or …? the car of Joe; 1 of 15 people smoke; man of La Mancha
Apply EM algorithm to disambiguate.
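
The slide does not say how the EM model is set up, so the following is only a sketch under stated assumptions: for one preposition, each instance is a (head, object) pair, the relation is a hidden variable, and head and object are treated as independent given the relation. The function name, the random initialization, and the toy data are all hypothetical. Because the procedure is unsupervised, the relation names act as cluster labels; tying a cluster to AGENT or PATH would still need seeds or inspection.

```python
import math
import random

def em_relations(instances, relations, iters=20, seed=0):
    """Soft-assign a latent relation to each (head, object) pair of one preposition.

    Model: P(rel) * P(head | rel) * P(obj | rel).
    Returns (posteriors per instance, log-likelihood at each iteration).
    """
    rng = random.Random(seed)
    heads = sorted({h for h, _ in instances})
    objs = sorted({o for _, o in instances})

    def random_dist(keys):
        # Random, strictly positive starting distribution to break symmetry.
        weights = {k: rng.random() + 0.5 for k in keys}
        total = sum(weights.values())
        return {k: w / total for k, w in weights.items()}

    prior = {r: 1.0 / len(relations) for r in relations}
    p_head = {r: random_dist(heads) for r in relations}
    p_obj = {r: random_dist(objs) for r in relations}
    post, trace = [], []
    for _ in range(iters):
        # E-step: posterior over relations for every instance.
        post, loglik = [], 0.0
        for h, o in instances:
            joint = {r: prior[r] * p_head[r][h] * p_obj[r][o] for r in relations}
            z = sum(joint.values())
            loglik += math.log(z)
            post.append({r: joint[r] / z for r in relations})
        trace.append(loglik)
        # M-step: re-estimate all parameters from the expected counts.
        for r in relations:
            prior[r] = sum(p[r] for p in post) / len(instances)
            for table, idx in ((p_head, 0), (p_obj, 1)):
                counts = {}
                for inst, p in zip(instances, post):
                    counts[inst[idx]] = counts.get(inst[idx], 0.0) + p[r]
                total = sum(counts.values())
                table[r] = {k: c / total for k, c in counts.items()}
    return post, trace

# Toy instances for "by", loosely modeled on the slide's examples.
BY_INSTANCES = [
    ("communicating", "force"), ("communicating", "influence"),
    ("communicating", "force"), ("return", "mirror"),
    ("return", "road"), ("return", "road"),
]
posteriors, loglik_trace = em_relations(BY_INSTANCES, ["AGENT", "PATH"])
```

Each entry of `posteriors` is a distribution over the two relation labels for one instance, and `loglik_trace` should never decrease from one iteration to the next, which is a quick sanity check on an EM implementation.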

CARDGIS13 Dictionary Extraction Results
Ambiguity reduction (chart axes: Readings, Instances)
Evaluation for sentence #1: "As a prefix to english words."
: NIL relation<abst PHRASAL speech_act
Score: 1/1 = 1
Evaluation for sentence #13: "Gives up to underwriters."
: create,make NIL RECIPIENT capitalist<so
: transmit_thou NIL RECIPIENT capitalist<so
Score: 1/2 = 0.5
Evaluation for sentence #14: "Gives all claim to the property."
: emit,utter human_action PHRASAL possessn>tr
: chnge_pos human_action PHRASAL possessn>tr
: create,make human_action PHRASAL possessn>tr
: cogitate human_action PHRASAL possessn>tr
: utilize human_action PHRASAL possessn>tr
: transmit_thou human_act PHRASAL possessn>tr
: transfer>comm human_act PHRASAL possessn>tr
: chnge>go_mad human_act PHRASAL possessn>tr
Score: 1/8 = 0.125

CARDGIS14 The Future: Terminology Standard?
Reasons for terminology standardization:
1. Non-duplication: similar domain models built for many applications
2. Consistency: across experts within domain, and across domains
3. Efficient model building (complex: many decisions required simultaneously)
ANSI Ad Hoc Group on Ontology Standards (NCITS): draw together Ontology work worldwide
–IBM (Santa Teresa), Stanford, ISI, CYC, TextWise, EDR, CSLI, NMSU, Lawrence Livermore, OnTek, Government...
–Meetings: 3/96, 9/96, 3/97, 11/97, 1/98, (6/98)…

CARDGIS15 Questions?