CSA2050 Introduction to Computational Linguistics Lecture 3 Examples.


1 CSA2050 Introduction to Computational Linguistics Lecture 3 Examples

2 Course Contents (Mar 2005 -- MR, CSA2050 - Lecture III: Examples)
  1 (MR) Overview
  2 (RF) Chomsky Hierarchy
  3 (MR) Examples
  4 (RF) Grammatical Categories
  5, 6 (MR) Tagging
  7 (RF) Morphology
  8, 9, 10 (MR) Comp Morphology
  11 (RF) Syntax
  12, 13, 14 (MR) Grammar Formalism

3 Outline
  Examples in the areas of:
  Tokenisation
  Morphological Analysis
  Tagging
  Syntactic Analysis

4 Information Extraction
  raw text
  tokenisation
  morphological analysis
  named entity recognition
  tagged text
  syntactic analysis

5 Tokenisation
  The basic idea of tokenisation is to identify the basic tokens that are present in a text.
  Mostly, tokens are the same as words, but not always.
  Why should this be a problem?
  John’s car cost €10,000.00. “And it’s worth every penny”, he exclaimed.
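
One way to see the problem is to compare plain whitespace splitting with a slightly smarter tokeniser on the example sentence. The sketch below is illustrative only: the `TOKEN_RE` pattern and the `tokenise` function are assumptions made for this example, not the tokeniser used in the course.

```python
import re

TEXT = "John’s car cost €10,000.00. “And it’s worth every penny”, he exclaimed."

# Hypothetical token pattern; alternatives are tried left to right.
TOKEN_RE = re.compile(
    r"""
    [€$£]?\d+(?:,\d{3})*(?:\.\d+)?   # amounts such as 10,000.00 with an optional currency sign
    | \w+(?:[’']\w+)*                # words, keeping internal apostrophes together
    | \S                             # any other single non-space character
    """,
    re.VERBOSE,
)

def tokenise(text):
    """Return the list of tokens found by TOKEN_RE, in text order."""
    return TOKEN_RE.findall(text)

naive = TEXT.split()      # punctuation stays glued to neighbouring words
tokens = tokenise(TEXT)   # the amount stays whole, punctuation is split off
```

Whitespace splitting yields tokens such as `€10,000.00.` and `penny”,`, whereas the pattern keeps `€10,000.00` in one piece and separates the sentence-final period and the quotation marks. Whether `John’s` should be one token or two is a design decision this sketch does not settle.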

6 Tokenisation Problems
  Punctuation
  novel forms: .net, Micro$oft, :-)
  hyphenation:
    linebreaks vs word-internal: e-mail, 898-0587
    multi-word: the 90-cent-an-hour raise
    confusion with dash
  apostrophes in contractions: we'll
  periods:
    part of names: Amazon.com
    numerical expressions: $1.99
    abbreviations, end of sentence, haplology
  commas: 1,000,000

7 Other Problems
  Token-internal whitespace: 898 0464
  Interaction: the New York-New Haven railroad
  Mixed language tokens: automated language guesser
  Token equivalence (when are two tokens the same?)
  Case-normalization.
  Sentence boundary detection.
  Inconsistency: database, data-base, data base
  Demo: Xerox tokeniser
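
The token-equivalence and inconsistency points can be sketched as a normalisation step that maps spelling variants to a single key. The function name `equivalence_key` and the choice to drop hyphens and spaces are assumptions for illustration.

```python
def equivalence_key(token):
    """Map spelling variants to one key: fold case, drop hyphens and internal spaces."""
    return "".join(ch for ch in token.casefold() if ch not in "- ")

variants = ["database", "data-base", "data base", "Database"]
keys = {equivalence_key(v) for v in variants}   # all four variants share one key
```

The same rule also makes `US` and `us` equivalent, which shows why case-normalization cannot be applied blindly.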

8 Morphology
  Simple versus complex words: dog, dogs
  Complex words are formed by concatenation of morphemes.
  Morpheme: the smallest unit in a word that bears some meaning, such as dog and s.

9 Morphological Analysis
  Morphological analysis of a word involves a segmentation problem.
  Segmentation: discovery of the component morphemes
    dogs → dog + s
    enlargement → en + large + ment
  Possible ambiguities:
    enlargement → enlarge + ment
    enlargement → en + largement
  Role of lexicon
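
The role of the lexicon in segmentation can be sketched as follows: enumerate every split of the word whose pieces are all lexicon entries. The tiny `LEXICON` below is a made-up example; because it does not contain "largement", the split en + largement is never proposed.

```python
def segmentations(word, lexicon):
    """Return every way to split `word` into a sequence of lexicon entries."""
    if not word:
        return [[]]
    results = []
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        if prefix in lexicon:
            # Keep this prefix and segment the remainder recursively.
            for rest in segmentations(word[i:], lexicon):
                results.append([prefix] + rest)
    return results

# Hypothetical lexicon of morphemes and stems.
LEXICON = {"dog", "s", "en", "large", "ment", "enlarge"}

dogs = segmentations("dogs", LEXICON)
enlargement = segmentations("enlargement", LEXICON)
```

With this lexicon, `dogs` has one segmentation and `enlargement` has two, so the lexicon narrows the ambiguity without removing it entirely.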

10 Morphological Analysis
  John has a couple of rabbits
  rabbits → rabbit + s
  s indicates plural of noun rabbit
  Is this the only possibility?

11 Morphological Analysis
  John rabbits on and on
  rabbits → rabbit + s
  s indicates 3rd person singular of verb rabbit
  The suffix “s” is a realisation of two entirely different morphemes.
  The morpheme is something more abstract than the string which realises it.

12 Morphological Analysis
  [Diagram] morpheme world: +PL, +3S; suffix world: -s, -a

13 Morphological Analysis
  Morphological Parser
  Input: Word: rabbits
  Output: Analysis: rabbit N PL, rabbit V 3S
  Output is a string of morphemes.
  Morpheme is employed in a loose sense that is useful for further processing.
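
The input/output behaviour of such a parser can be sketched with a toy stem lexicon and a single suffix rule. The names `STEMS`, `SUFFIX_S` and `analyse` are assumptions for this example; real analysers such as those demonstrated in the lecture are far more elaborate.

```python
# Hypothetical stem lexicon: each stem lists the categories it can belong to.
STEMS = {"rabbit": ["N", "V"], "dog": ["N"]}

# The string "s" realises a different morpheme depending on the category.
SUFFIX_S = {"N": "PL", "V": "3S"}

def analyse(word):
    """Return all analyses of `word` as 'stem category morpheme' strings."""
    analyses = []
    if word.endswith("s") and word[:-1] in STEMS:
        stem = word[:-1]
        for category in STEMS[stem]:
            analyses.append(f"{stem} {category} {SUFFIX_S[category]}")
    return analyses

result = analyse("rabbits")   # ['rabbit N PL', 'rabbit V 3S']
```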

14 Morphological Analysis: ENGTWOL & Xerox
  Atro Voutilainen, Juha Heikkilä, Timo Järvinen and Lingsoft, Inc. 1993-1995
  ENGTWOL demo
  Xerox morphological analysis

15 Morphological Synthesis
  Morphological Parser
  Input: rabbit N PL, rabbit V 3S
  Output: Word: rabbits
  Input is a string of morphemes.
  Output is a word.

16 Reversibility
  Lookup
    APPLY UP> left
    left    leave+Verb+PastBoth+123SP
    left    left+Adv
    left    left+Adj
    left    left+Noun+Sg
  Lookdown
    APPLY DOWN> leave+Adj
    left
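
Reversibility means the same relation between surface forms and analyses serves both analysis and synthesis. A minimal sketch, assuming the relation is stored as an explicit list of pairs (a plain table standing in for a finite-state transducer); `PAIRS`, `apply_up` and `apply_down` are names invented for this example, and the pairs are the four lookup results shown on the slide.

```python
# (surface form, lexical analysis) pairs taken from the lookup output above.
PAIRS = [
    ("left", "leave+Verb+PastBoth+123SP"),
    ("left", "left+Adv"),
    ("left", "left+Adj"),
    ("left", "left+Noun+Sg"),
]

def apply_up(surface):
    """Lookup: surface form -> all lexical analyses."""
    return [lexical for s, lexical in PAIRS if s == surface]

def apply_down(lexical):
    """Lookdown: lexical analysis -> all surface forms."""
    return [s for s, lex in PAIRS if lex == lexical]

analyses = apply_up("left")
surface_forms = apply_down("leave+Verb+PastBoth+123SP")   # ['left']
```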

17 POS Tagging
  In POS tagging, the task is to assign the most appropriate morphosyntactic label from amongst those listed in the lexicon, given the context.
  John leaves presents.
  Proper Names
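
The example sentence is ambiguous word by word: "leaves" and "presents" can each be a noun or a verb. The sketch below picks the tag sequence with the best score under a table of tag-to-tag preferences. The tag names and every number in `BIGRAM` are invented for illustration and are not estimated from any corpus.

```python
from itertools import product

# Tags the lexicon allows for each word.
LEXICON = {
    "John": ["PROPN"],
    "leaves": ["NOUN", "VERB"],
    "presents": ["NOUN", "VERB"],
}

# Hypothetical transition scores (higher is better).
BIGRAM = {
    ("<s>", "PROPN"): 1.0,
    ("PROPN", "VERB"): 0.8,
    ("PROPN", "NOUN"): 0.2,
    ("VERB", "NOUN"): 0.7,
    ("VERB", "VERB"): 0.1,
    ("NOUN", "VERB"): 0.5,
    ("NOUN", "NOUN"): 0.3,
}

def best_tagging(words):
    """Score every tag sequence the lexicon permits and return the best one."""
    best, best_score = None, -1.0
    for tags in product(*(LEXICON[w] for w in words)):
        score, prev = 1.0, "<s>"
        for tag in tags:
            score *= BIGRAM.get((prev, tag), 0.0)
            prev = tag
        if score > best_score:
            best, best_score = list(zip(words, tags)), score
    return best

tagged = best_tagging(["John", "leaves", "presents"])
```

With these scores the chosen reading is proper name, verb, noun. Enumerating every sequence is only feasible for short sentences; practical taggers use dynamic programming instead.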

18 Semantic Tagging
  Named Entity Recognition
  Basic idea is to recognise and tag named entities and classify them as being of type:
    Persons
    Locations
    Organisations
  Named Entity Recognition - Demo
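
The simplest form of this idea is a gazetteer lookup: match token sequences against a list of known names, preferring the longest match. The `GAZETTEER` entries and the example sentence below are assumptions chosen for illustration; real systems also use context to handle names that are not listed.

```python
# Hypothetical gazetteer: token tuples mapped to entity types.
GAZETTEER = {
    ("John",): "PERSON",
    ("New", "York"): "LOCATION",
    ("Xerox",): "ORGANISATION",
}
MAX_LEN = max(len(name) for name in GAZETTEER)

def tag_entities(tokens):
    """Return (entity text, type) pairs using longest-match gazetteer lookup."""
    entities, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in GAZETTEER:
                entities.append((" ".join(span), GAZETTEER[span]))
                i += n
                break
        else:
            i += 1   # no entity starts at this token
    return entities

found = tag_entities("John visited Xerox in New York".split())
```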

19 Syntactic Analysis
  Problem: given sentence and grammar/lexicon, discover assigned tree structure.
  XIP Parser Demo
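
The problem statement can be made concrete with a toy top-down parser. The grammar, lexicon and function names below are assumptions for this sketch and are unrelated to the XIP parser; the grammar has no left recursion, which this naive strategy requires.

```python
# Hypothetical context-free grammar: each symbol maps to its alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["PROPN"], ["NOUN"]],
    "VP": [["VERB", "NP"], ["VERB"]],
}
LEXICON = {
    "John": {"PROPN"},
    "leaves": {"NOUN", "VERB"},
    "presents": {"NOUN", "VERB"},
}

def parse(symbol, words, start):
    """Yield (tree, end) for every way `symbol` can cover words[start:end]."""
    if symbol not in GRAMMAR:
        # Preterminal: it must be a category of the next word.
        if start < len(words) and symbol in LEXICON.get(words[start], ()):
            yield (symbol, words[start]), start + 1
        return
    for rhs in GRAMMAR[symbol]:
        for children, end in parse_sequence(rhs, words, start):
            yield (symbol, *children), end

def parse_sequence(symbols, words, start):
    """Yield (list of subtrees, end) for a sequence of symbols."""
    if not symbols:
        yield [], start
        return
    for first, mid in parse(symbols[0], words, start):
        for rest, end in parse_sequence(symbols[1:], words, mid):
            yield [first] + rest, end

def parse_sentence(words):
    """Return all trees rooted in S that cover the whole sentence."""
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]

trees = parse_sentence(["John", "leaves", "presents"])
```

For this grammar the sentence receives a single tree, in which "leaves" is the verb and "presents" its noun object.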

