

NLP Midterm Solution 2006

#1 Bilingual corpora:
– parallel corpus (document-aligned, sentence-aligned, word-aligned) (4)
– comparable corpus (4)
Sources of corpora (2):
– Association for Computational Linguistics' Data Collection Initiative (ACL/DCI)
– European Corpus Initiative (ECI)
– International Computer Archive of Modern English (ICAME)
– Linguistic Data Consortium (LDC)
– Consortium for Lexical Research (CLR)
– Electronic Dictionary Research (EDR)
– Text Encoding Initiative (TEI)
– European Language Resources Distribution Agency (ELDA)
– Association for Computational Linguistics and Chinese Language Processing (ROCLING)

#2 … (15)

#3 Fundamental rule (4)
If the chart contains the edges ⟨i, j, A → W1 • B W2⟩ and ⟨j, k, B → W3 •⟩, where A and B are categories and W1, W2, and W3 are (possibly empty) sequences of categories or words, then add the edge ⟨i, k, A → W1 B • W2⟩ to the chart.

#3 cont.
Use the chart to avoid redundancy (2)
Bottom-up rule (4)
If you are adding the edge ⟨i, j, C → W1 •⟩ to the chart, then for every rule in the grammar of the form B → C W2, add the edge ⟨i, i, B → • C W2⟩ to the chart.
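The two rules above can be sketched in code. This is an illustrative sketch, not the exam's reference implementation: the edge representation (i, j, lhs, found, remaining), standing for lhs → found • remaining spanning positions i..j, is an assumed encoding of the dotted rules.

```python
# An edge (i, j, lhs, found, remaining) stands for lhs -> found . remaining
# spanning positions i..j; an edge is complete when `remaining` is empty.

def fundamental_rule(active, complete):
    """Fundamental rule: combine (i, j, A -> W1 . B W2) with
    (j, k, B -> W3 .) into (i, k, A -> W1 B . W2)."""
    i, j, a, w1, rem = active
    j2, k, b, _w3, rem2 = complete
    if j == j2 and not rem2 and rem and rem[0] == b:
        return (i, k, a, w1 + (b,), rem[1:])
    return None  # the edges do not combine

def bottom_up_rule(edge, grammar):
    """Bottom-up rule: for a complete edge (i, j, C -> W1 .), predict an
    empty active edge (i, i, B -> . C W2) for every rule B -> C W2."""
    i, _j, c, _w1, rem = edge
    if rem:  # only complete edges trigger bottom-up prediction
        return []
    return [(i, i, b, (), tuple(rhs))
            for b, rhs in grammar if rhs and rhs[0] == c]
```

For example, with the single rule NP → Det N, a complete Det edge over positions 0..1 predicts an empty NP edge at position 0, which the fundamental rule then advances across the Det.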

#4 Whatever's reasonable… (10)
– Consider the sample questions that are going to be asked.
– Decide how to translate each question into SQL.
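One "reasonable" approach to the second step can be sketched as pattern-based translation. The employees(name, dept, salary) table and the question patterns below are hypothetical illustrations, not part of the exam:

```python
import re

# Toy pattern-based question-to-SQL translation: each anticipated question
# shape maps to an SQL template; captured groups fill the template slots.
PATTERNS = [
    (re.compile(r"who works in (\w+)\?", re.IGNORECASE),
     "SELECT name FROM employees WHERE dept = '{0}';"),
    (re.compile(r"what is the salary of (\w+)\?", re.IGNORECASE),
     "SELECT salary FROM employees WHERE name = '{0}';"),
]

def to_sql(question):
    for pattern, template in PATTERNS:
        match = pattern.fullmatch(question)
        if match:
            return template.format(*match.groups())
    return None  # question falls outside the anticipated patterns
```

For example, to_sql("Who works in sales?") yields SELECT name FROM employees WHERE dept = 'sales'; any question outside the anticipated set returns None, which is why considering the sample questions first matters.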

#5 Well-formedness constraints and types of ambiguity, by level:
– Morphological analysis (3). Well-formedness constraints: rules of inflection and derivation. Ambiguity: analysis: structural, morpheme boundaries [prefix, stem, suffix], morpheme identity.
– Lexical (+3): …
– Syntactic analysis (3). Well-formedness constraints: grammar rules. Ambiguity: analysis: structural, word category [POS].
– Semantic interpretation (3). Well-formedness constraints: selection restrictions. Ambiguity: analysis: word sense, quantifier scope; generation: synonymy.
– Pragmatic interpretation (3). Well-formedness constraints: ?principles of cooperative conversation? Ambiguity: analysis: ?pragmatic function? (speaker, listener, context); generation: ?realization of pragmatic function?

#6 a (6) b (4)
Smaller H: the approximate model is closer to the correct model.
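The slide leaves H undefined; assuming it denotes the cross entropy of an approximate model m with respect to the correct model p (a standard reading in this setting), the claim follows from:

```latex
H(p, m) = -\sum_{x} p(x)\,\log_2 m(x) \;\ge\; H(p) = -\sum_{x} p(x)\,\log_2 p(x)
```

with equality if and only if m = p, so a smaller H means the approximate model is closer to the correct one.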

#7 a (5) Pointwise Mutual Information is roughly a measure of how much one word tells us about the other.

#7 cont. b (5) X = word 1, Y = word 2 (3)
Physical meaning: a higher value means higher dependence. Pointwise mutual information is roughly a measure of how much one word tells us about the other. (2)
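The formula itself appears to have been lost in extraction; the standard definition of pointwise mutual information is:

```latex
I(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}
```

I(x, y) = 0 when the two words are independent, and it grows as co-occurrence exceeds what independence would predict.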

#7 cont. c (5) Perfect dependence (3)
As perfectly dependent bigrams get rarer, their mutual information increases, so MI is a bad measure of dependence. (2)
With MI, bigrams composed of low-frequency words receive higher scores than bigrams composed of high-frequency words. Higher frequency means more evidence, and we prefer to rank a bigram higher when we have more evidence for it.
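The rarity effect is easy to demonstrate numerically. A small sketch (the probabilities are made-up illustrations): for a perfectly dependent pair, the two words only ever occur together, so P(x, y) = P(x) = P(y) and the PMI reduces to -log2 P(x), which grows as the pair gets rarer.

```python
import math

def pmi(p_xy, p_x, p_y):
    """Pointwise mutual information in bits."""
    return math.log2(p_xy / (p_x * p_y))

# Perfectly dependent pairs: P(x, y) = P(x) = P(y).
common = pmi(1e-3, 1e-3, 1e-3)  # frequent, perfectly dependent pair
rare = pmi(1e-6, 1e-6, 1e-6)    # rare, perfectly dependent pair
print(common, rare)  # the rarer pair gets the higher score
```

Here the rare pair scores roughly 19.9 bits against roughly 10.0 for the common one, even though both are perfectly dependent, which is exactly the objection raised above.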

#8 a (5) sentence pairs b (5)

#8 Cont. c (5) do for all words