Computational Linguistics INTroduction


Computational Linguistics INTroduction Lecture 1 Computers and Language

Course Information
Course Website: http://staff.um.edu.mt/mros1/lin2160
Lecturers: mike.rosner@um.edu.mt, ray.fabri@um.edu.mt
Book: Jurafsky & Martin, Speech and Language Processing, Prentice Hall 2009, ISBN 978-0-13-504196-3
Natural Language Toolkit (NLTK): http://www.nltk.org/
Feb 2010 -- MR CLINT - Lecture 1

CL: Two Main Disciplines
[Diagram: LINGUISTICS and COMP SCI, joined by "language and computers"]

Language and Computers includes …
Natural Language Processing (NLP): computational models of language analysis, interpretation, and generation; the syntax/semantics interface.
Human Language Technology: emphasis on large-scale performance. Example 1: Google search. Example 2: speech technology.
Computational Linguistics: emphasis on mechanised linguistic theories. Grew out of early Machine Translation efforts.

Linguistics
Phonetics: the study of speech sounds
Phonology: the study of sound systems
Morphology: the study of word structure
Syntax: the study of sentence structure
Semantics: the study of meaning
Pragmatics: the study of language use

Noam Chomsky
Noam Chomsky’s work in the 1950s radically changed linguistics, making syntax central.
Chomsky has been the dominant figure in linguistics ever since.
Chomsky invented the generative approach to grammar.

Generative Grammar: Some Key Points
The theory of grammar includes a mathematical definition of what a grammar is.
A language is a (possibly infinite) set of sentences, but a grammar is finite.
A grammar generates all and only the sentences of a language.
Two ways a grammar can fall short: undergeneration and overgeneration.
[source: Sag & Wasow]

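Generative Power of a Grammar
(G = the set of sentences generated by the grammar, L = the set of sentences of the language)
Overgeneration: G contains all but not only the sentences of L (L lies inside G).
Undergeneration: G contains only but not all the sentences of L (G lies inside L).
Target: G contains all and only the sentences of L (G = L).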
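The three cases can be checked mechanically when both sets are small enough to list. The sketch below is illustrative only: the toy language and the two "generated" sets are invented for this example, and `diagnose` is a hypothetical helper name, not something from the course materials.

```python
# Hypothetical toy language L (an assumption for illustration only).
LANGUAGE = {"John kicks Bill", "Bill kicks John"}

def diagnose(generated, language):
    """Compare the set a grammar generates (G) with the language (L)."""
    extra = generated - language      # sentences in G that are not in L
    missing = language - generated    # sentences in L that G cannot produce
    if extra and missing:
        return "overgenerates and undergenerates"
    if extra:
        return "overgeneration: all but not only"
    if missing:
        return "undergeneration: only but not all"
    return "all and only"

print(diagnose(LANGUAGE | {"kicks John Bill"}, LANGUAGE))
print(diagnose({"John kicks Bill"}, LANGUAGE))
print(diagnose(set(LANGUAGE), LANGUAGE))
```

Real languages are infinite, so this kind of exhaustive comparison is only possible for toy fragments.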

Formal Grammar
A grammar is a set of rewrite rules.
Rules have the form LHS → RHS: LHS can be rewritten as RHS.
LHS and RHS are sequences made of words or symbols.
The lexicon specifies words and their categories.
Lexical entries have the form Category → word: Category can be rewritten as word.

A Simple Grammar/Lexicon
Grammar:
S → NP VP
NP → N
VP → V NP
Lexicon:
V → kicks
N → John
N → Bill
Tree for "John kicks Bill": [S [NP [N John]] [VP [V kicks] [NP [N Bill]]]]
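As a rough illustration of "generates all and only", the grammar and lexicon on this slide can be written as a Python dictionary and expanded exhaustively. This is a minimal sketch, not the course's implementation: the names `RULES` and `expand` are assumptions, and the exhaustive expansion only terminates because this particular grammar has no recursive rules.

```python
from itertools import product

# The slide's rewrite rules and lexicon, keyed by left-hand side.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["N"]],
    "VP": [["V", "NP"]],
    "V": [["kicks"]],
    "N": [["John"], ["Bill"]],
}

def expand(symbol):
    """Return every word sequence that can be derived from `symbol`."""
    if symbol not in RULES:          # a word: nothing left to rewrite
        return [[symbol]]
    results = []
    for rhs in RULES[symbol]:
        # Expand each right-hand-side symbol, then combine the alternatives in order.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results

sentences = sorted(" ".join(words) for words in expand("S"))
for sentence in sentences:
    print(sentence)
```

Note that the grammar also licenses "John kicks John" and "Bill kicks Bill", since nothing in the rules forbids reusing a noun.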

Formal v. Natural Languages
Formal Languages
Arithmetic: 3290 1 1010101
Logic: ∀x man(x) → mortal(x)
URL: http://www.cs.um.edu.mt
Natural Languages
English: John saw the dog
German: Johann hat den Hund gesehen
Maltese: Ġianni ra kelb

Some Points of Similarity
Sentences are sequences of words (or symbols).
Rules determine which sequences are valid sentences.
Sentences have a definite structure.
Sentence structure is systematically related to meaning.

Structure Affects Meaning
I shot an elephant in my trousers
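The two readings correspond to two different tree structures: "in my trousers" can attach to the noun phrase "an elephant" or to the verb phrase "shot an elephant". The sketch below counts the trees with a small CKY-style chart. The grammar, the lexicon and the function name `count_parses` are assumptions made for this illustration; they are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical binary grammar (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP attaches to a noun phrase
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to a verb phrase
    ("PP", "P", "NP"),
]
# Hypothetical lexicon: word -> categories.
LEXICON = {
    "I": ["NP"], "shot": ["V"], "an": ["Det"], "my": ["Det"],
    "elephant": ["N"], "trousers": ["N"], "in": ["P"],
}

def count_parses(words, root="S"):
    """Count the distinct trees rooted in `root` that span all of `words`."""
    n = len(words)
    chart = defaultdict(int)   # (start, end, category) -> number of trees
    for i, word in enumerate(words):
        for category in LEXICON.get(word, []):
            chart[i, i + 1, category] += 1
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for parent, left, right in BINARY:
                    chart[start, end, parent] += (
                        chart[start, mid, left] * chart[mid, end, right]
                    )
    return chart[0, n, root]

print(count_parses("I shot an elephant in my trousers".split()))
print(count_parses("I shot an elephant".split()))
```

Under this toy grammar the full sentence receives two trees and the shorter one receives one, which is the structural source of the joke.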

Points of Difference
Formal Languages: the grammar defines the language; restricted application; non-ambiguous.
Natural Languages: the language defines the grammar; universal application; highly ambiguous.

Ambiguity
Morphological ambiguity: en-large-ment
Lexical ambiguity: Iraqi Head Seeks Arms
Syntactic ambiguity: small animals and children laugh
Semantic ambiguity: every girl loves a sailor
Pragmatic ambiguity: can you pass the salt?
The management of ambiguity is central to the success of CL.

I made her duck
I cooked a duck for her
I cooked a duck belonging to her
I created a duck for her
I created a duck that now belongs to her
I caused her to lower her head
I turned her into a duck

Computer Science
The study of basic concepts: information, data, algorithm, program.
The application of these concepts to practical tasks.
Implementation of computational models from other fields (meteorology, ..., linguistics).

Information
Information is a theoretical concept invented by Shannon in 1948 to measure uncertainty.
The units of this measure are called bits. Length: metres. Weight: kilos. Information: bits.
1 bit is the amount of uncertainty inherent in a situation with exactly two (equally likely) possible outcomes.
Example: for breakfast I will have coffee or I will have tea (nothing else). When I tell you that I have tea, I have conveyed one bit of information.
The greater the number of possible outcomes, the more bits of information are involved in the statement that indicates the actual outcome.
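The relationship between the number of outcomes and the number of bits can be made concrete. Assuming all outcomes are equally likely (an assumption the breakfast example makes implicitly), the information conveyed by naming one outcome out of n is log2(n) bits. The function name `bits` below is a hypothetical choice for this sketch.

```python
import math

def bits(outcomes):
    """Bits conveyed by naming one of `outcomes` equally likely possibilities."""
    if outcomes < 1:
        raise ValueError("need at least one possible outcome")
    return math.log2(outcomes)

print(bits(2))   # coffee or tea
print(bits(8))   # one choice among eight breakfast options
```

With only one possible outcome there is no uncertainty to remove, so `bits(1)` is zero.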

Data
A formalized representation of facts or concepts suitable for communication, interpretation, or processing by people or automated means.
Example: a telephone directory.
Unlike information, which is abstract, data is concrete.
Data has a certain level of structure. In the telephone directory, for example, we have the structure of a list of entries, each of which has a name, an address, and a number.
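The directory structure described here, a list of entries with a name, an address and a number, maps directly onto a small data structure. The sketch below is only an illustration: the class name `Entry`, the helper `lookup`, and every name, address and number in it are invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    """One directory entry: the three fields named on the slide."""
    name: str
    address: str
    number: str

# Hypothetical sample data, invented for this example.
directory = [
    Entry("A. Borg", "1 Example Street", "555-0100"),
    Entry("B. Vella", "2 Example Street", "555-0101"),
]

def lookup(entries, name):
    """Return the numbers of all entries listed under `name`."""
    return [entry.number for entry in entries if entry.name == name]

print(lookup(directory, "B. Vella"))
```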

Algorithm
A completely defined procedure for the solution of a given problem in a finite number of steps.
Designed for a well-defined task.
Finite description length.
Guaranteed to terminate.
Abstract.

Algorithm for Chocolate Cake

Program to Add X and Y
[Flowchart]
Read X and Y (example: X = 2, Y = 3).
Subtract 1 from X.
Add 1 to Y.
X = 0? If yes, output Y. If no, go back to "subtract 1 from X".

Computer Program
A set of instructions, written in a specific programming language, which a computer follows in processing data, performing an operation, or solving a logical problem.
Concrete.
A program can implement an algorithm.
More than one program may implement the same algorithm.
Not all programs express good algorithms!

Instructions vs. Execution Steps
1 Read X
2 Read Y
3 X = X-1
4 Y = Y+1
5 If X = 0 then Print(Y) else goto 3
How many instructions? How many execution steps?
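One way to explore the two questions is to simulate the program and count each instruction as it is executed. The sketch below assumes the program is meant to output Y, as in the "Program to Add X and Y" flowchart, and it rejects X < 1 because the loop would otherwise never reach X = 0. The function name `run_add_program` is a hypothetical choice.

```python
def run_add_program(x, y):
    """Simulate the five-instruction program; return (output, steps executed)."""
    if x < 1:
        raise ValueError("the loop only terminates for x >= 1")
    steps = 2              # instructions 1 and 2: Read X, Read Y
    while True:
        x -= 1             # instruction 3
        y += 1             # instruction 4
        steps += 3         # instructions 3 and 4, plus the test in 5
        if x == 0:         # instruction 5: print and stop, or go back to 3
            return y, steps

result, steps = run_add_program(2, 3)
print(result, steps)
```

The program text always has five instructions, but the number of execution steps grows with the input: under this counting it is 2 + 3X.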

Algorithms and Linguistics
Do linguistic theories in the abstract make sense?
Linguistic theory explains linguistic knowledge in the form of grammar rules and theories about grammar rules.
But performance involves processing issues:

Computational Linguistics – Issues
How are a grammar and a lexicon represented?
How is the structure of a given sentence actually discovered?
How can we actually generate a sentence to express a particular intended meaning?
How can linguistic theory be made concrete enough to test algorithmically?
Can an artificial system learn a language with limited exposure to grammatical sentences?

Computers and Language: Twin Goals
Scientific goal: contribute to Linguistics by adding a computational dimension.
Technological goal: develop machinery capable of handling human language that can support “language engineering”.

Computers and Language: Tools & Resources
Grammar formalisms, e.g. Definite Clause Grammars
Parsing algorithms: sentence → structure
Generation algorithms: structure → sentence
Statistical methods
Linguistic corpora

Computers and Language: Applications
Information Retrieval/Extraction
Document Classification
Question Answering
Style and Spell Checking
Multimodal Interaction
Machine Translation

LECTURES
1 Overview
2 Chomsky Hierarchy
3 4 5 Computational Syntax
6 Agreement & Subcategorisation
7 8 9 Corpora, Tools and Techniques
10 Morphology
11 Computational Morphology
12 13 14 Revision