Computational Linguistics INTroduction


1 Computational Linguistics INTroduction
Lecture 1 Computers and Language

2 Course Information
Course Website: http://staff.um.edu.mt/mros1/lin2160
Lecturers
Book: Jurafsky & Martin, Speech and Language Processing, Prentice Hall 2009, ISBN
Natural Language Toolkit (NLTK)
Feb MR CLINT - Lecture 1

3 CL: Two Main Disciplines
Linguistics and Computer Science (Comp Sci); where they overlap: language and computers.

4 Language and Computers includes …
Natural Language Processing (NLP): computational models of language analysis, interpretation, and generation, including the syntax/semantics interface.
Human Language Technology: emphasis on large-scale performance. Example 1: Google search. Example 2: speech technology.
Computational Linguistics: emphasis on mechanised linguistic theories; grew out of early Machine Translation efforts.

5 Linguistics
Phonetics: the study of speech sounds
Phonology: the study of sound systems
Morphology: the study of word structure
Syntax: the study of sentence structure
Semantics: the study of meaning
Pragmatics: the study of language use

6 Noam Chomsky
Noam Chomsky's work in the 1950s radically changed linguistics, making syntax central.
Chomsky has been the dominant figure in linguistics ever since.
Chomsky invented the generative approach to grammar.

7 Generative Grammar: Some Key Points
A theory of grammar includes a mathematical definition of what a grammar is.
A language is a (possibly infinite) set of sentences, but a grammar is finite.
A grammar should generate all and only the sentences of a language.
Undergeneration: the grammar fails to generate some sentences of the language.
Overgeneration: the grammar generates strings that are not sentences of the language.
[source: Sag & Wasow]

8 Generative Power of a Grammar
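Overgeneration: G generates all but not only the sentences of L (L is properly contained in the set of strings G generates).
Undergeneration: G generates only but not all the sentences of L (the set of strings G generates is properly contained in L).
Ideal: G generates all and only the sentences of L (the two sets coincide).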
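The three cases can be checked mechanically if we make a strong simplifying assumption: that both the target language L and the output of the grammar G are finite sets of sentences (real natural languages are not, so in practice one can only compare samples). A minimal Python sketch; the function name `classify` and the toy sets are hypothetical, and a fourth outcome is added for grammars that both under- and overgenerate.

```python
def classify(generated: set[str], language: set[str]) -> str:
    """Compare what a grammar G generates with the target language L.

    Both arguments are assumed to be finite sets of sentences (a toy simplification).
    """
    missing = language - generated   # sentences of L that G fails to generate
    extra = generated - language     # strings G generates that are not in L
    if not missing and not extra:
        return "all and only"
    if missing and not extra:
        return "undergeneration (only but not all)"
    if extra and not missing:
        return "overgeneration (all but not only)"
    return "both undergeneration and overgeneration"


L = {"John kicks Bill", "Bill kicks John"}   # hypothetical target language
print(classify({"John kicks Bill"}, L))       # → undergeneration (only but not all)
print(classify(L | {"kicks John Bill"}, L))   # → overgeneration (all but not only)
print(classify(set(L), L))                    # → all and only
```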

9 Formal Grammar
A grammar is a set of rewrite rules.
Rules have the form LHS → RHS: LHS can be rewritten as RHS.
LHS and RHS are sequences made of words or symbols.
The lexicon specifies words and their categories.
Lexical rules have the form Category → word: Category can be rewritten as word.

10 A Simple Grammar/Lexicon
Grammar:
S → NP VP
NP → N
VP → V NP
Lexicon:
V → kicks
N → John
N → Bill
Tree for "John kicks Bill": [S [NP [N John]] [VP [V kicks] [NP [N Bill]]]]
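Because this grammar has no recursive rules, we can list every sentence it generates by rewriting symbols until only words remain. A minimal Python sketch, assuming a hypothetical dict encoding of the rules (each category maps to its alternative right-hand sides); it is one straightforward way to enumerate a finite language, not a standard toolkit API.

```python
from itertools import product

# The slide's grammar and lexicon: each category maps to its possible right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"]],
    "VP": [["V", "NP"]],
    "V": [["kicks"]],
    "N": [["John"], ["Bill"]],
}


def generate(symbol: str) -> list[list[str]]:
    """Return every word sequence that symbol can be rewritten as.

    Assumes the grammar has no recursive rules, so the set of sentences is finite.
    """
    if symbol not in GRAMMAR:  # a word: nothing left to rewrite
        return [[symbol]]
    sentences = []
    for rhs in GRAMMAR[symbol]:
        # Rewrite each RHS symbol independently, then combine every choice.
        for parts in product(*(generate(s) for s in rhs)):
            sentences.append([word for part in parts for word in part])
    return sentences


for words in generate("S"):
    print(" ".join(words))
```

Run as is, it prints four sentences: John kicks John, John kicks Bill, Bill kicks John, Bill kicks Bill.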

11 Formal v. Natural Languages
Formal languages:
Arithmetic
Logic: ∀x man(x) → mortal(x)
URL
Natural languages:
English: John saw the dog
German: Johann hat den Hund gesehen
Maltese: Ġianni ra kelb

12 Some Points of Similarity
Sentences are sequences of words (or symbols).
Rules determine which sequences are valid sentences.
Sentences have a definite structure.
Sentence structure is systematically related to meaning.

13 Structure Affects Meaning
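I shot an elephant in my trousers
The prepositional phrase "in my trousers" can attach to the verb phrase (I was wearing my trousers when I shot the elephant) or to the noun phrase "an elephant" (the elephant was in my trousers).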
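The sentence has two readings because it has two parse trees, and a parser can find both mechanically. A minimal Python sketch, assuming a small hypothetical grammar in Chomsky normal form (each rule has either two categories or one word on the right-hand side): the CKY algorithm fills a chart with the number of distinct trees of each category over each span of words. For the whole sentence it finds two trees, one attaching "in my trousers" to the verb phrase and one attaching it to "an elephant".

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP attached to a noun phrase: "an elephant in my trousers"
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attached to the verb phrase: "shot ... in my trousers"
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": ["NP"], "shot": ["V"], "an": ["Det"], "my": ["Det"],
    "elephant": ["N"], "trousers": ["N"], "in": ["P"],
}


def count_parses(words: list[str], start: str = "S") -> int:
    """CKY recognition that counts distinct parse trees instead of just yes/no."""
    n = len(words)
    if n == 0:
        return 0
    # chart[i][j][cat] = number of trees of category cat covering words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n)]
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, []):  # unknown words get no category
            chart[i][i + 1][cat] += 1
    for length in range(2, n + 1):          # build longer spans from shorter ones
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):       # split point between the two children
                for parent, left, right in BINARY_RULES:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]


print(count_parses("I shot an elephant in my trousers".split()))  # → 2
```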

14 Points of Difference
Formal languages: the grammar defines the language; restricted application; unambiguous.
Natural languages: the language defines the grammar; universal application; highly ambiguous.

15 Ambiguity
Morphological ambiguity: en-large-ment
Lexical ambiguity: Iraqi Head Seeks Arms
Syntactic ambiguity: small animals and children laugh
Semantic ambiguity: every girl loves a sailor
Pragmatic ambiguity: can you pass the salt?
The management of ambiguity is central to the success of CL.

16 I made her duck
I cooked a duck for her.
I cooked a duck belonging to her.
I created a duck for her.
I created a duck that now belongs to her.
I caused her to lower her head.
I turned her into a duck.

17 Computer Science
The study of basic concepts: information, data, algorithm, program.
The application of these concepts to practical tasks.
Implementation of computational models from other fields (meteorology, .., linguistics).

18 Information Data Algorithm Program
Information is a theoretical concept introduced by Shannon in 1948 to measure uncertainty.
The unit of this measure is the bit: length is measured in metres, weight in kilograms, information in bits.
1 bit is the amount of uncertainty inherent in a situation with exactly two equally likely outcomes.
Example: for breakfast I will have coffee or I will have tea (nothing else). When I tell you that I am having tea, I have conveyed one bit of information.
The greater the number of possible outcomes, the more bits of information are conveyed by the statement that indicates the actual outcome: with n equally likely outcomes, it conveys log2 n bits.
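Shannon's measure can be computed directly once each outcome's probability is known. A short Python sketch of entropy in bits, H = -Σ p·log2 p (the function name `entropy_bits` is illustrative). For n equally likely outcomes it reduces to log2 n, so the coffee-or-tea choice gives 1 bit and eight outcomes give 3 bits; if one outcome is much more likely, the uncertainty drops below 1 bit.

```python
import math


def entropy_bits(probabilities: list[float]) -> float:
    """Shannon entropy H = -sum(p * log2(p)) in bits; outcomes with p = 0 are skipped."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


print(entropy_bits([0.5, 0.5]))   # coffee or tea, equally likely → 1.0
print(entropy_bits([1 / 8] * 8))  # eight equally likely outcomes → 3.0
print(entropy_bits([0.9, 0.1]))   # two outcomes, but one very likely: under 1 bit
```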

19 Information Data Algorithm Program
Data: a formalized representation of facts or concepts suitable for communication, interpretation, or processing by people or automated means.
Example: a telephone directory.
Unlike information, which is abstract, data is concrete.
Data has a certain level of structure: a telephone directory, for example, is a list of entries, each of which has a name, an address, and a number.
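The directory's structure can be written down directly as a record type. A minimal Python sketch in which the `Entry` class, the `lookup` helper, and the sample entries are all hypothetical: every entry has exactly the three fields the slide lists, and looking up a number is an operation that relies on that structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One directory entry: the fixed structure every record shares."""
    name: str
    address: str
    number: str


# Hypothetical sample data, for illustration only.
directory = [
    Entry("Borg, Anna", "1 Example Street", "0000 0001"),
    Entry("Camilleri, Paul", "2 Sample Road", "0000 0002"),
]


def lookup(entries: list[Entry], name: str) -> list[str]:
    """Return the numbers of all entries whose name matches exactly."""
    return [e.number for e in entries if e.name == name]


print(lookup(directory, "Camilleri, Paul"))  # → ['0000 0002']
```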

20 Information Data Algorithm Program
Algorithm: a completely defined procedure for solving a given problem in a finite number of steps.
Designed for a well-defined task.
Finite description length.
Guaranteed to terminate.
Abstract.

21 Algorithm for Chocolate Cake

22 Program to Add X and Y
Read X and Y (e.g. X = 2, Y = 3).
Subtract 1 from X.
Add 1 to Y.
X = 0? No: go back to "subtract 1 from X". Yes: output Y.
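The flowchart translates directly into a loop. A minimal Python sketch (the function name `add_by_counting` is hypothetical). Note that, as drawn, the test X = 0 comes after the first decrement, so the procedure only terminates for X ≥ 1; the sketch rejects other inputs rather than loop forever, which is exactly the kind of detail the termination requirement on an algorithm is about.

```python
def add_by_counting(x: int, y: int) -> int:
    """Add X and Y by moving 1 from X to Y until X reaches 0 (the slide's flowchart)."""
    if x < 1:
        # As drawn, the X = 0 test comes after the first decrement, so the
        # flowchart would never stop for X <= 0; reject such inputs instead.
        raise ValueError("the flowchart only terminates for X >= 1")
    while True:
        x -= 1        # subtract 1 from X
        y += 1        # add 1 to Y
        if x == 0:    # X = 0? yes: output Y; no: loop back
            return y


print(add_by_counting(2, 3))  # → 5
```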

23 Computer Program
A set of instructions, written in a specific programming language, which a computer follows in processing data, performing an operation, or solving a logical problem.
Concrete.
A program can implement an algorithm.
More than one program may implement the same algorithm.
Not all programs express good algorithms!

24 Instructions vs. Execution Steps
1 Read X
2 Read Y
3 X = X-1
4 Y = Y+1
5 If X = 0 then Print(Y) else goto 3
How many instructions? How many execution steps?
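One way to answer the two questions is to simulate the program and count. The sketch below is a tiny, hypothetical Python interpreter for the five instructions, numbered 1 to 5 so that "goto 3" refers to X = X-1, and reading instruction 5 as printing Y as in the flowchart of slide 22 (printing X would always print 0). It counts every instruction executed: the program has 5 instructions, but for X = 2, Y = 3 it executes 8 steps, and in general 2 + 3X steps for X ≥ 1.

```python
def run(inputs: list[int]) -> tuple[int, int]:
    """Execute the 5-instruction program; return (printed value, execution steps)."""
    values = iter(inputs)        # stands in for the Read instructions
    x = y = 0
    pc = 1                       # number of the next instruction to execute
    steps = 0
    while True:
        steps += 1
        if pc == 1:              # 1 Read X
            x = next(values)
            pc = 2
        elif pc == 2:            # 2 Read Y
            y = next(values)
            pc = 3
        elif pc == 3:            # 3 X = X-1
            x -= 1
            pc = 4
        elif pc == 4:            # 4 Y = Y+1
            y += 1
            pc = 5
        else:                    # 5 If X = 0 then Print(Y) else goto 3
            if x == 0:
                return y, steps  # printing Y is the last step; the program halts
            pc = 3


print(run([2, 3]))  # → (5, 8)
```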

25 Algorithms and Linguistics
Do linguistic theories make sense in the abstract?
Linguistic theories explain linguistic knowledge in the form of grammar rules, or of theories about grammar rules.
But performance involves processing issues.

26 Computational Linguistics – Issues
How are a grammar and a lexicon represented?
How is the structure of a given sentence actually discovered?
How can we actually generate a sentence to express a particular intended meaning?
How can linguistic theory be made concrete enough to test algorithmically?
Can an artificial system learn a language with limited exposure to grammatical sentences?

27 Computers and Language: Twin Goals
Scientific goal: contribute to linguistics by adding a computational dimension.
Technological goal: develop machinery capable of handling human language that can support “language engineering”.

28 Computers and Language: Tools & Resources
Grammar formalisms, e.g. Definite Clause Grammars
Parsing algorithms: sentence → structure
Generation algorithms: structure → sentence
Statistical methods
Linguistic corpora
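To make "sentence → structure" and "structure → sentence" concrete, here is a minimal recursive-descent sketch in Python over the toy grammar of slide 10. The dict encoding and the function names are hypothetical, not a particular toolkit's API. `parses` returns every tree whose S spans the whole sentence, as nested tuples; `leaves` goes the other way only in the simplest sense, reading the words back off a tree. Real generation starts from an intended meaning rather than a tree, and plain recursive descent loops forever on left-recursive rules, which this grammar happens not to have.

```python
# Hypothetical dict encoding of the toy grammar and lexicon from slide 10.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"]],
    "VP": [["V", "NP"]],
    "V": [["kicks"]],
    "N": [["John"], ["Bill"]],
}


def parse(symbol, words, pos):
    """Yield (tree, next_pos) for each way symbol can cover words starting at pos."""
    if symbol not in GRAMMAR:                      # a word must match the input
        if pos < len(words) and words[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in GRAMMAR[symbol]:                    # try each rule for this category
        yield from match(symbol, rhs, words, pos, ())


def match(symbol, rhs, words, pos, children):
    """Match the remaining RHS symbols left to right, collecting subtrees."""
    if not rhs:
        yield (symbol, *children), pos
        return
    for child, nxt in parse(rhs[0], words, pos):
        yield from match(symbol, rhs[1:], words, nxt, children + (child,))


def parses(sentence):
    """Sentence -> structure: all trees whose root S spans the whole sentence."""
    words = sentence.split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]


def leaves(tree):
    """Structure -> sentence, in the simplest sense: read the words off the tree."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


trees = parses("John kicks Bill")
print(trees[0])
print(" ".join(leaves(trees[0])))  # → John kicks Bill
```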

29 Computers and Language: Applications
Information Retrieval/Extraction
Document Classification
Question Answering
Style and Spell Checking
Multimodal Interaction
Machine Translation

30 LECTURES
1 Overview
2 Chomsky Hierarchy
3-5 Computational Syntax
6 Agreement & Subcategorisation
7-9 Corpora, Tools and Techniques
10 Morphology
11 Computational Morphology
12-14 Revision

