Morphology & Finite-State Transducers
Jing-Shin Chang

Presentation transcript:

Slide 1: Morphology & Finite-State Transducers

Morphology: the study of the constituents of words
Word = {a set of morphemes, combined in language-dependent ways}
- morpheme: the smallest meaning-bearing unit
- e.g., books = book + s, cats = cat + s

Classes of Morphemes
- stem (root)
- affixes

Morphological Parsing (or Analysis):
- breaking down surface forms (or input forms) into stem and affixes
- e.g., foxes = "fox" + "-es" (+N, +PL)
- stemming: mapping a surface form to its stem (extracting the stem from the surface form)

Morphological Generation:
- generating surface forms from a stem and morphological features
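The three tasks on this slide (parsing, stemming, generation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lexicon `STEMS`, the function names, and the feature strings are hypothetical, and the suffix handling covers only regular noun plurals.

```python
# Toy lexicon of noun stems (illustrative assumption, not from the slides).
STEMS = {"fox", "cat", "book"}

def parse_noun(surface):
    """Morphological parsing: return (stem, affix, features) analyses."""
    analyses = []
    if surface in STEMS:
        analyses.append((surface, "", "+N +SG"))
    for suffix in ("es", "s"):
        if surface.endswith(suffix):
            stem = surface[: -len(suffix)]
            if stem in STEMS:
                analyses.append((stem, "-" + suffix, "+N +PL"))
    return analyses

def stem_of(surface):
    """Stemming: keep only the stem of the first analysis, if any."""
    analyses = parse_noun(surface)
    return analyses[0][0] if analyses else surface

def generate_noun(stem, features):
    """Morphological generation: build a surface form from stem + features."""
    if features == "+N +PL":
        sibilant = stem.endswith(("s", "x", "z", "ch", "sh"))
        return stem + ("es" if sibilant else "s")
    return stem

print(parse_noun("foxes"))
print(stem_of("books"))
print(generate_noun("fox", "+N +PL"))
```

Note that the parser overgenerates (it would also accept "foxs"), because it checks only the lexicon and not the spelling rules; that gap is what orthographic rules are meant to close.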

Slide 2: Morphology & Finite-State Transducers

Applications:
- spell checking, tokenization for parsing

Knowledge for Morphological Analysis
- morphological rules (morphotactics): the constituents of words and their order
- spelling rules (orthographic rules): spelling changes

Dictionary/Lexicon:
- list of stems and affixes
- stems of regular words (plus irregular variants) serve as indexing keys
- enumerating all morphological variants is not efficient
- some morphemes are productive: they can be applied to all words or to new words (impossible to list all of them)
- morphological variants depend on spelling as well as pronunciation
- morphologically complex languages (e.g., Turkish) may have a large number of morphological variants
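A lexicon organized this way might look like the sketch below: regular stems and irregular variants are the only listed keys, and regular plurals are recovered by rule instead of being enumerated. The dictionary contents, the class labels `reg-noun` / `irreg-noun`, and the function name are assumptions made for illustration.

```python
# Stems are index keys; each carries a (hypothetical) stem-class label.
STEM_CLASS = {
    "cat": "reg-noun",
    "book": "reg-noun",
    "mouse": "irreg-noun",
    "ox": "irreg-noun",
}

# Irregular variants are listed explicitly because no rule produces them.
IRREGULAR_FORMS = {
    "mice": ("mouse", "+N +PL"),
    "oxen": ("ox", "+N +PL"),
}

def analyze(surface):
    """Look up listed forms first, then fall back to the productive -s rule."""
    if surface in IRREGULAR_FORMS:
        return IRREGULAR_FORMS[surface]
    if surface in STEM_CLASS:
        return (surface, "+N +SG")
    # Productive plural: only regular stems may take -s.
    if surface.endswith("s") and STEM_CLASS.get(surface[:-1]) == "reg-noun":
        return (surface[:-1], "+N +PL")
    return None

for word in ("mice", "books", "ox", "mouses"):
    print(word, analyze(word))
```

Storing a class label with each stem is one simple way to block forms such as "mouses" without listing every valid plural.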

Slide 3: Morphology & Finite-State Transducers

Models for morphological analysis/generation
- generate-and-test: enumerate all possibilities and test them against constraints
- FSA / two-level FST model: model the lexicon, morphological rules, and orthographic rules as finite-state automata or transducers
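To make the FST idea concrete, here is a minimal sketch of a deterministic transducer that reads a lexical tape (stem letters followed by feature symbols) and writes a surface tape. The arc representation, the function names, and the symbols `+N`, `+SG`, `+PL` as single tape symbols are assumptions for illustration; it models only the lexicon and morphotactics and deliberately leaves out orthographic rules such as e-insertion.

```python
EPS = ""  # empty output on an arc

def build_noun_fst(stems):
    """Build arcs: (state, lexical symbol) -> (surface output, next state)."""
    arcs = {}
    counter = 0          # state 0 is the start state
    final = "FINAL"
    for stem in stems:
        state = 0
        for ch in stem:  # stem letters are copied to the surface tape
            if (state, ch) not in arcs:
                counter += 1
                arcs[(state, ch)] = (ch, counter)
            state = arcs[(state, ch)][1]
        counter += 1
        arcs[(state, "+N")] = (EPS, counter)    # category tag: no output
        arcs[(counter, "+SG")] = (EPS, final)   # singular: no suffix
        arcs[(counter, "+PL")] = ("s", final)   # plural: emit -s
    return arcs, final

def transduce(arcs, final, lexical_symbols):
    """Map a lexical symbol sequence to a surface string, or None."""
    state, output = 0, []
    for symbol in lexical_symbols:
        if (state, symbol) not in arcs:
            return None  # no arc: the input is rejected
        out, state = arcs[(state, symbol)]
        output.append(out)
    return "".join(output) if state == final else None

arcs, final = build_noun_fst(["cat", "book"])
print(transduce(arcs, final, list("cat") + ["+N", "+PL"]))
print(transduce(arcs, final, list("dog") + ["+N", "+PL"]))
```

Running the same arcs in the opposite direction (surface to lexical) would give a parser; that inversion is not implemented here.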

Slide 4: English Morphology

Morphology:
- the study of the way words are built up from smaller meaning-bearing units (morphemes)
- morpheme: the minimal meaning-bearing unit in a language

Classes of Morphemes
- stem (root): the main morpheme of the word, supplying the main meaning
- affixes: add additional meanings

Affixes:
- prefixes: un-happy
- suffixes: eat-s
- infixes: inserted inside the stem
  - Tagalog (a Philippine language): hingi ("borrow") => h-um-ingi (agent of borrow)
- circumfixes:
  - German: sagen ("to say") => ge-sag-t ("said") [past participle]
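The four affix types differ only in where the affix material attaches to the stem, which a short sketch can show. The function names and the default infix position (after the first character) are assumptions chosen to fit the examples on this slide, not general rules of Tagalog or German.

```python
def add_prefix(stem, prefix):
    """Prefix: attaches before the stem."""
    return prefix + stem

def add_suffix(stem, suffix):
    """Suffix: attaches after the stem."""
    return stem + suffix

def add_infix(stem, infix, position=1):
    """Infix: inserted inside the stem (here after the first character)."""
    return stem[:position] + infix + stem[position:]

def add_circumfix(stem, before, after):
    """Circumfix: one part before the stem and one part after it."""
    return before + stem + after

print(add_prefix("happy", "un"))         # un-happy
print(add_suffix("eat", "s"))            # eat-s
print(add_infix("hingi", "um"))          # h-um-ingi
print(add_circumfix("sag", "ge", "t"))   # ge-sag-t
```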

Slide 5: English Morphology

Affixes:
- concatenative: prefixes & suffixes
- non-concatenative: infixes & templatic morphology

Templatic: root-and-pattern
- Arabic, Hebrew, and other Semitic languages
- Hebrew: lmd ("learn", "study") (tri-consonantal root)
  - active voice template CaCaC => lamad ('he studied')
  - intensive template CiCeC => limed ('he taught')
  - intensive passive template CuCaC => lumad ('he was taught')

Multiple affixes: un-believ-abl-y

Agglutinative languages:
- languages that tend to string affixes together (Turkish, Japanese, Korean)
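Root-and-pattern morphology can be sketched as filling the C slots of a template with the consonants of the root, in order. The function name `apply_template` and the convention that an uppercase "C" marks a consonant slot are assumptions for this example.

```python
def apply_template(root, template):
    """Interleave root consonants into the C slots of a template."""
    if template.count("C") != len(root):
        raise ValueError("template needs one C slot per root consonant")
    consonants = iter(root)
    # Each 'C' consumes the next root consonant; vowels are copied as-is.
    return "".join(next(consonants) if slot == "C" else slot
                   for slot in template)

for template in ("CaCaC", "CiCeC", "CuCaC"):
    print(template, "=>", apply_template("lmd", template))
```

Because the root and the vowel pattern are interleaved, this process cannot be expressed as simple concatenation of a stem and an affix, which is why it is classed as non-concatenative.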

Slide 6: English Morphology

Inflection:
- stem + morphemes => same class
- e.g., book + s => books (same meaning, same part of speech)

Derivation:
- stem + morphemes => different class
- e.g., computerize + ation => computerization [verb => noun]

Slide 7: English Morphology

Inflectional Morphology
- only nouns, verbs, adjectives, and adverbs can be inflected

Noun: plural, possessive
- Regular: plural (+s / +es / -y+ies), possessive (+'s, +s')
- Irregular: ox => oxen, mouse => mice

Verb (main, modal/auxiliary, primary/be):
- Forms: stem (present / infinitive), -s (present, third-person singular), -ing (gerund / present participle), -ed (past / past participle / perfect)
- Regular: +s / +es / -y+ies; -e+ing / +ing / consonant doubling + ing; +d / +ed / consonant doubling + ed
- Irregular: e.g., eat => ate, eaten (+en), catch => caught
- Consonant doubling: (short vowel) + single consonant => double the consonant
- -c => -ck (picnicked)

Adjective/Adverb: comparative/superlative
- happy => happier, happiest, happily
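The regular spelling rules listed above can be approximated in code. The sketch below is a simplification under stated assumptions: the function names are hypothetical, the consonant-doubling test is a consonant-vowel-consonant heuristic that is only reliable for one-syllable verbs (it ignores stress, so it would wrongly double in "visit"), and irregular forms are not handled at all.

```python
VOWELS = "aeiou"

def plural(noun):
    """Regular noun plural: +es after sibilants, -y+ies, otherwise +s."""
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    return noun + "s"

def doubles_final_consonant(verb):
    """Heuristic: consonant + single vowel + consonant (not w, x, y)."""
    return (
        len(verb) >= 3
        and verb[-1] not in VOWELS + "wxy"
        and verb[-2] in VOWELS
        and verb[-3] not in VOWELS
    )

def inflect_verb(verb, suffix):
    """Attach 'ing' or 'ed' to a regular verb with spelling changes."""
    if suffix not in ("ing", "ed"):
        raise ValueError("suffix must be 'ing' or 'ed'")
    if verb.endswith("c"):                       # picnic -> picnicked
        return verb + "k" + suffix
    if verb.endswith("e"):
        if suffix == "ed":                       # bake -> baked
            return verb + "d"
        if verb.endswith("ee"):                  # see -> seeing
            return verb + "ing"
        return verb[:-1] + "ing"                 # bake -> baking
    if (suffix == "ed" and len(verb) > 1
            and verb.endswith("y") and verb[-2] not in VOWELS):
        return verb[:-1] + "ied"                 # try -> tried
    if doubles_final_consonant(verb):            # stop -> stopped
        return verb + verb[-1] + suffix
    return verb + suffix                         # walk -> walked

print(plural("fox"), plural("city"), plural("cat"))
print(inflect_verb("stop", "ing"), inflect_verb("picnic", "ed"))
```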

Slide 8: English Morphology

Derivational Morphology
- usually results in a different word class
- the part of speech (POS) of the result must be derived from the POS of the root and the affixes

Nominalization: V/A => N
- computerize => computerization
- more examples …

N/V => A
- computation => computational
- more examples …
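The POS conversion step can be sketched as a table that records, for each derivational suffix, the POS it attaches to and the POS it produces. The table `SUFFIX_RULES` contains only the two suffixes from this slide; the table layout, the function name, and the e-deletion rule before a vowel-initial suffix are assumptions for illustration.

```python
# suffix -> (POS the suffix attaches to, POS of the derived word)
SUFFIX_RULES = {
    "-ation": ("V", "N"),   # computerize (V) -> computerization (N)
    "-al": ("N", "A"),      # computation (N) -> computational (A)
}

def derive(stem, pos, suffix):
    """Return (derived word, derived POS) or raise if the POS does not fit."""
    source_pos, target_pos = SUFFIX_RULES[suffix]
    if pos != source_pos:
        raise ValueError(f"{suffix} attaches to {source_pos}, not {pos}")
    ending = suffix.lstrip("-")
    # Simplified spelling rule: drop a final 'e' before a vowel-initial suffix.
    if stem.endswith("e") and ending[0] in "aeiou":
        stem = stem[:-1]
    return stem + ending, target_pos

print(derive("computerize", "V", "-ation"))
print(derive("computation", "N", "-al"))
```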

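Slide 9: Chinese Morphology

Chinese Morphemes
- hard to distinguish from characters, words, and compound words
- free morphemes
- bound morphemes

Examples
- 副-總統 (vice president), 前-妻 (ex-wife), 非-經濟 (因素) (non-economic (factors))
- 學生-們 (students)
- 哈日-族 (Japanophiles), 銀髮-族 (the silver-haired, i.e., seniors)
- 工業-化 (industrialization), 綠-化 (greening), 藍-化 (turning blue), 腐-化 (corruption, decay), 石-化 (petrification), 神-化 (deification)
- 公務-員 (civil servant), 業務-員 (sales representative), 推銷-員 (salesperson), 運動-員 (athlete)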
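As a rough illustration of how the bound morphemes in these examples could be separated from their stems, the sketch below strips one known prefix or suffix from a word. The affix lists contain only the morphemes shown on this slide, and the function name and return shape are assumptions. A list-based approach like this will mis-split ordinary words that merely begin or end with the same character, which reflects the difficulty noted above.

```python
# Bound morphemes taken from the slide's examples only.
PREFIXES = ("副", "前", "非")
SUFFIXES = ("們", "族", "化", "員")

def split_affix(word):
    """Return (prefix, stem, suffix); at most one affix is stripped."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix):
            return (prefix, word[len(prefix):], "")
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return ("", word[: -len(suffix)], suffix)
    return ("", word, "")

for word in ("副總統", "學生們", "工業化", "運動員"):
    print(word, split_affix(word))
```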
