Building a Large-Scale Knowledge Base for Machine Translation
Kevin Knight and Steve K. Luk
Presenter: Cristina Nicolae

Linguistic resources combined into PANGLOSS
PENMAN Upper Model (Bateman 1990)
– top-level network of 200 nodes, implemented in the LOOM knowledge representation (KR) language
– makes extensive use of syntactic-semantic correspondences (between the taxonomy and the grammar)
ONTOS (Carlson & Nirenburg 1990)
– top-level ontology designed to support machine translation
Longman Dictionary of Contemporary English (LDOCE)
– words with definitions, usage notes, syntactic codes ([B3] for adj+to), semantic codes ([H] for human) and pragmatic codes ([ECZB] for economics/business)
WordNet (Miller 1990)
– semantic word database
Collins Bilingual Dictionary
– Spanish-English dictionary

Merging resources

Merging resources – contributions
LDOCE: syntax and subject area
WordNet: synonyms and hierarchical structure
the upper structures (PENMAN Upper Model, ONTOS): organize the knowledge for NLP in general and for English generation in particular
the bilingual dictionary: lets us index the ontology from a second language

Definition Match Algorithm
two word senses should be matched if their definitions share words
the algorithm also looks at related words and senses (e.g. synonyms)
LDOCE
– (batter_2_0) “mixture of flour, eggs and milk, beaten together and used in cooking”
– (batter_3_0) “a person who bats, esp. in baseball – compare BATSMAN”
WordNet
– (BATTER-1) “ballplayer who bats”
– (BATTER-2) “a flour mixture thin enough to pour or drop from a spoon”
Match:
– (batter_2_0) with (BATTER-2)
– (batter_3_0) with (BATTER-1)
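The overlap idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the stopword list, the optional `related` map of related words, and the rule "pick the WordNet sense whose definition shares the most content words, and leave the sense unmatched if nothing is shared" are all assumptions. Run on the batter definitions from this slide, it produces the same two matches.

```python
import re

# Hypothetical stopword list; the slides do not say which function words were ignored.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "or", "from", "who", "esp", "compare"}

def content_words(definition):
    """Lowercase the definition, split it into alphabetic tokens and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", definition.lower()) if w not in STOPWORDS}

def expand(words, related):
    """Add related words (e.g. synonyms) from a hypothetical word -> set-of-words map."""
    out = set(words)
    for w in words:
        out |= related.get(w, set())
    return out

def match_senses(ldoce, wordnet, related=None):
    """Map each LDOCE sense to the WordNet sense whose definition shares the most words."""
    related = related or {}
    matches = {}
    for l_sense, l_def in ldoce.items():
        l_words = expand(content_words(l_def), related)
        best, best_score = None, 0
        for w_sense, w_def in wordnet.items():
            score = len(l_words & expand(content_words(w_def), related))
            if score > best_score:
                best, best_score = w_sense, score
        if best is not None:  # no shared words at all: leave the sense unmatched
            matches[l_sense] = best
    return matches

ldoce = {
    "batter_2_0": "mixture of flour, eggs and milk, beaten together and used in cooking",
    "batter_3_0": "a person who bats, esp. in baseball - compare BATSMAN",
}
wordnet = {
    "BATTER-1": "ballplayer who bats",
    "BATTER-2": "a flour mixture thin enough to pour or drop from a spoon",
}
print(match_senses(ldoce, wordnet))
# {'batter_2_0': 'BATTER-2', 'batter_3_0': 'BATTER-1'}
```

Here batter_2_0 shares "flour" and "mixture" with BATTER-2, and batter_3_0 shares "bats" with BATTER-1; a real system would need a finer score (for example, weighting rare words) to separate senses whose definitions overlap only slightly.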

Definition Match Algorithm – Results
[Results chart with separate panels for low-ambiguity words and high-ambiguity words; figures not preserved]
The algorithm was run on all nouns in LDOCE and WordNet.

Hierarchy Match Algorithm
uses the sense hierarchies inside LDOCE and WordNet
once two senses are matched, it is a good idea to look at their respective ancestors and descendants for further matches
Match:
– animal_1_2 with ANIMAL-1
– and their respective animal subhierarchies
start with unambiguous words and match them, then look downward and upward in the hierarchies rooted at them and match those too
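A minimal sketch of the propagation step, assuming each hierarchy is stored as a parent-to-children dict and that sense IDs carry their headword (`dog_1_1` in LDOCE, `DOG-1` in WordNet). The acceptance rule, matching a neighbouring LDOCE sense only when exactly one unused neighbouring WordNet sense has the same headword, is a hypothetical choice; the slides do not say how candidates are compared. Apart from animal_1_2 and ANIMAL-1, the sense IDs and toy hierarchies are invented.

```python
from collections import deque

def lemma(sense_id):
    """Headword of a sense ID; assumes hypothetical forms like 'dog_1_1' or 'DOG-1'."""
    return sense_id.replace("-", "_").split("_")[0].lower()

def parents_of(children):
    """Invert a parent -> children map into a child -> parent map."""
    return {c: p for p, cs in children.items() for c in cs}

def neighbours(sense, children, parents):
    """Senses one step away: direct children plus the parent, if any."""
    near = list(children.get(sense, []))
    if sense in parents:
        near.append(parents[sense])
    return near

def hierarchy_match(seeds, l_children, w_children):
    """Spread matches outward (down and up) from seed pairs to senses sharing a headword."""
    l_parents, w_parents = parents_of(l_children), parents_of(w_children)
    matches = dict(seeds)
    used = set(seeds.values())
    queue = deque(seeds.items())
    while queue:
        l_sense, w_sense = queue.popleft()
        w_near = neighbours(w_sense, w_children, w_parents)
        for l_next in neighbours(l_sense, l_children, l_parents):
            if l_next in matches:
                continue
            # Assumed rule: accept only if exactly one unused WordNet neighbour has the same headword.
            cands = [w for w in w_near if w not in used and lemma(w) == lemma(l_next)]
            if len(cands) == 1:
                matches[l_next] = cands[0]
                used.add(cands[0])
                queue.append((l_next, cands[0]))
    return matches

# Toy hierarchies (parent -> children); "dog" is ambiguous in the WordNet data (DOG-1, DOG-2).
l_children = {"organism_1_0": ["animal_1_2"], "animal_1_2": ["dog_1_1", "cat_1_1"],
              "dog_1_1": ["puppy_1_0"]}
w_children = {"ORGANISM-1": ["ANIMAL-1", "PERSON-1"], "ANIMAL-1": ["DOG-1", "CAT-1"],
              "DOG-1": ["PUPPY-1"], "PERSON-1": ["DOG-2"]}

# Seed with one unambiguous match, e.g. one produced by the definition match.
print(hierarchy_match({"animal_1_2": "ANIMAL-1"}, l_children, w_children))
# {'animal_1_2': 'ANIMAL-1', 'dog_1_1': 'DOG-1', 'cat_1_1': 'CAT-1', 'organism_1_0': 'ORGANISM-1', 'puppy_1_0': 'PUPPY-1'}
```

Although "dog" has two WordNet senses in the toy data, DOG-2 (under PERSON-1) is never a candidate, because the search only considers senses adjacent to a pair that is already matched.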

Hierarchy Match Algorithm – Results
In the end, the algorithm produced 11,128 noun sense matches at 96% accuracy.

Bilingual Match Algorithm
goal: annotate the ontology with a large Spanish lexicon, using:
– mappings between Spanish and English words (from Collins)
– mappings between English words and ontological entities (from WordNet)
– conceptual relations between ontological entities
we obtain:
– direct links between Spanish words and ontological entities
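Composing the two mappings can be sketched as follows. The slide lists the inputs but not the combination rule, so the rule here is an assumption: keep the concepts shared by all English translations of a Spanish word, accept their union only when every translation is unambiguous, and otherwise set the word aside. The concept IDs and dictionary entries are toy data, and the sketch leaves out the conceptual relations between entities that the slide also lists.

```python
def link_spanish(collins, english_concepts):
    """Link Spanish words to ontology concepts through their English translations.

    Assumed rule (not taken from the slides): keep concepts shared by all
    translations; if none are shared, accept the union only when every
    translation is unambiguous; otherwise leave the word for human review.
    """
    links, unresolved = {}, []
    for spanish, translations in collins.items():
        # Concept sets of the translations that the ontology actually knows about.
        sets = [english_concepts[e] for e in translations if english_concepts.get(e)]
        if not sets:
            unresolved.append(spanish)
            continue
        common = set.intersection(*sets)
        if common:
            links[spanish] = common
        elif all(len(s) == 1 for s in sets):
            links[spanish] = set.union(*sets)
        else:
            unresolved.append(spanish)
    return links, unresolved

# Toy data with hypothetical concept IDs.
collins = {"orilla": ["bank", "shore"], "pato": ["duck"], "banco": ["bank", "bench"]}
english_concepts = {
    "bank": {"RIVER-EDGE", "MONEY-INSTITUTION"},
    "shore": {"RIVER-EDGE"},
    "duck": {"DUCK"},
    "bench": {"SEAT"},
}
links, unresolved = link_spanish(collins, english_concepts)
print(links)       # {'orilla': {'RIVER-EDGE'}, 'pato': {'DUCK'}}
print(unresolved)  # ['banco']
```

The second translation "shore" disambiguates "bank" for orilla, while banco stays unresolved because "bank" is ambiguous and shares no concept with "bench". Setting such words aside instead of guessing fits the human verification step described on the Discussion slide.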

Discussion
each merge algorithm presented above is verified by humans afterwards (humans are faster at verifying information than at generating it from scratch)
semi-automatic merging brings together complementary sources of information
it also allows us to detect errors and omissions where the resources are redundant