Pushpak Bhattacharyya CSE Dept. IIT Bombay 1st Nov, 2012

Slides:



Advertisements
Similar presentations
Syllable Structure in English
Advertisements

Pushpak Bhattacharyya CSE Dept., IIT Bombay 31st March, 2011
Why prioritise marked consonants?
The sound patterns of language
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 35– Phonetics and phonology; syllabification) Pushpak Bhattacharyya CSE Dept.,
The Sound Patterns of Language: Phonology
3. Suprasegmentals Suprasegmental features are those aspects of speech that involve more than single sound segments. The principal suprasegmentals are:
POLISH SYLLABLE IDALIA SMOCZYK. Outline n VARIOUS DEFINITIONS OF A SYLLABLE n SYLLABLES CAN BE DIVIDED INTO : n AMBIGUITY OF DIVISIONS n PHONOLOGICAL.
Phonetics II Marga Vinagre
Syllables Most of us have an intuitive feeling about syllables No doubt about the number of syllables in the majority of words. However, there is no agreed.
Syllables and Stress, part II October 22, 2012 Potentialities There are homeworks to hand back! Production Exercise #2 is due at 5 pm today! First off:
Lecture 4 The Syllable.
Prosodics, Part 1 LIN Prosodics, or Suprasegmentals Remember, from our first discussions in class, that speech is really a continuous flow of initiation,
Syllable. Definition A syllable is a unit of sound composed of a central peak of sonority (usually a vowel), and the consonants that cluster around this.
CS : Speech, NLP and the Web/Topics in AI
SYLLABLE Pertemuan 6 Matakuliah: G0332/English Phonology Tahun: 2007.
1 L103: Introduction to Linguistics Phonetics—vowels.
Digital Systems: Hardware Organization and Design
Introduction to Linguistics Ms. Suha Jawabreh Lecture 10.
Contrastive Analysis The Syllable Structure Ling Prepared by: Ms. Sahar Deknash.
Syllable.
FIRE 2013 By:- Hardik Joshi 1, Apurva Bhatt 1, Honey Patel 2 1 Department of Computer Science, Gujarat.
Phonology Phonology is essentially the description of the systems and patterns of speech sounds in a language. It is, in effect, based on a theory of.
Syllabification Principles
Phonology, part 6: Syllables and Phonotactics November 7, 2012.
Lecture 3Part 1 Phonology Suprasegmental phonology the syllable
The sound patterns of language
Chapter three Phonology
Phonology LI Nathalie F. Martin.
Chapter 2 Introduction to articulatory phonetics
T RANSLITERATION INVOLVING E NGLISH AND H INDI LANGUAGES USING S YLLABIFICATION A PPROACH By Ankit Aggarwal (03d05009) Guided by Dr. Pushpak Bhattacharyya.
Last minute Phonetics questions?
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 36–syllabification and transliteration) Pushpak Bhattacharyya CSE Dept., IIT Bombay.
MTP I Stage Project Presentation Guided by- Presented by- Prof. Pushpak Bhattacharyya Abhijeet Padhye Department of Computer Science and Engineering Indian.
Entropy in Machine Transliteration & Phonology Bhargava Reddy B.Tech Project.
Phonology, phonotactics, and suprasegmentals
…not the study of telephones!
Phonetics and Phonology
Transliteration Linguistic Enrichment of Statistical
© Crown copyright 2004 Bingo  Smallest unit of sound in a word, 44 in English, it can be represented by 1,2,3 or 4 letters phoneme.
CS460/626 : Natural Language Processing/Speech, NLP and the Web Lecture 33: Transliteration Pushpak Bhattacharyya CSE Dept., IIT Bombay 8 th Nov, 2012.
Phonology The sound patterns of language Nuha Alwadaani March, 2014.
English Linguistics: An Introduction
CS : Speech, NLP and the Web/Topics in AI Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture-27: Phonology (quiz took place on 12/10/09; Lect 26.
Introduction to Linguistics Ms. Suha Jawabreh Lecture 9.
Phonology Moats Ch. 3. Phonetics vs. Phonology  Remember, phonetics is the ability to pronounce individual speech sounds  Phonology is the awareness.
Phonological Representations
CS460/626 : Natural Language Processing/Speech, NLP and the Web Lecture 28, 29: Phonetics, Phonology and Speech; introduce transliteration Pushpak Bhattacharyya.
Ch 3 Slide 1 Is there a connection between phonemes and speakers’ perception of phonetic differences? (audibility of fine distinctions) Due to phonology,
Phonology, Part VI: Syllables and Phonotactics November 4, 2009.
Chapter II phonology II. Classification of English speech sounds Vowels and Consonants The basic difference between these two classes is that in the production.
Introduction to Language Phonetics 1. Explore the relationship between sound and spelling Become familiar with International Phonetic Alphabet (IPA )
THE SOUND PATTERNS OF LANGUAGE
Ch4 – Features Features are partly acoustic partly articulatory aspects of sounds but they are used for phonology so sometimes they are created to distinguish.
The syllable. Early generative phonology didn't recognize the syllable as a relevant unit.
Technische Universität München Introduction to English Pronunciation Syllable Structure.
King Faisal University جامعة الملك فيصل Deanship of E-Learning and Distance Education عمادة التعلم الإلكتروني والتعليم عن بعد 1 [ ] 1 King Faisal University.
Syllable.
English Vowels and diphthongs
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 31–Inside and Outside probabilities; PCFG training; start of phonetics and phonology)
King Faisal University [ ] 1 E-learning and Distance Education Deanship Department of English Language College of Arts King Faisal University Introduction.
Lecture 4 The Syllable.
Introduction to Linguistics
Syllable Structure, Stress
2016 January 14 More on the syllable.
PHONOLOGY Arranged By: Sapitra Rizky Tiara Ayu Yosi Ariani SyLLABLES
Introduction to Linguistics
Quichua The maximal template is CVC Is there an obligatory onset?
Review.
What is syllable?.
Presentation transcript:

Pushpak Bhattacharyya CSE Dept. IIT Bombay 1st Nov, 2012 CS626 : Natural Language Processing/Speech, NLP and the Web Lecture 30: Phonology, syllables; introduce transliteration Pushpak Bhattacharyya CSE Dept. IIT Bombay 1st Nov, 2012

Phonology: Syllables

Basic of syllables “Syllable is a unit of spoken language consisting of a single uninterrupted sound formed generally by a Vowel and preceded or followed by one or more consonants.” Vowels are the heart of a syllable (Most Sonorous Element) (svayam raajate iti svaraH) Consonants act as sounds attached to vowels.

Syllable structure A syllable consists of 3 major parts:- Onset (C) Nucleus (V) Coda (C) Vowels sit in the Nucleus of a syllable Consonants may get attached as Onset or Coda. Basic structure - CV

Possible syllable structures The Nucleus is always present Onset and Coda may be absent Possible structures V CV VC CVC

syllable theories Prominence Theory Chest Pulse Theory Sonority Theory E.g. entertaining /entəteɪnɪŋ/ The peaks of prominence: vowels /e ə eɪ ɪ/ Number of syllables: 4 Chest Pulse Theory Based on muscular activities Sonority Theory Based on relative soundness of segment within words

Introduction to sonority theory “The Sonority of a sound is its loudness relative to other sounds with the same length, stress and speech.” Some sounds are more sonorous Words in a language can be divided into syllables Sonority theory distinguishes syllables on the basis of sounds.

Sonority hierarchy Defined on the basis of amount of sound associated The sonority hierarchy is as follows:- Vowels (a, e, i, o, u) Liquids (y, r, l, v) Nasals (n, m) Fricatives (s, z, f,…..sh, th etc.) Affricates (ch, j) Stops (b, d, g, p, t, k)

Sonority scale Obstruents can be further classified into:- Fricatives Affricates Stops

Sonority theory & syllables “A Syllable is a cluster of sonority, defined by a sonority peak acting as a structural magnet to the surrounding lower sonority elements.” Represented as waves of sonority or Sonority Profile of that syllable Nucleus Onset Coda

Sonority sequencing principle “The Sonority Profile of a syllable must rise until its Peak(Nucleus), and then fall.” Peak (Nucleus) Onset Coda

examples ABHIJEET Profile-1 A JEET BHI Profile-2 JEET ABHI

Maximal onset principle “The Intervocalic consonants are maximally assigned to the Onsets of syllables in conformity with Universal and Language- Specific Conditions.” Determines underlying syllable division Example DIPLOMA DIP LO MA & DI PLO MA

Syllable Structure: a more detailed look Count of no. of syllables in a word is roughly/intuitively the no. of vocalic segments in a word. Thus, presence of a vowel is an obligatory element in the structure of a syllable. This vowel is called “nucleus”. Basic Configuration: (C)V(C). Part of syllable preceding the nucleus is called the onset. Elements coming after the nucleus are called the coda. Nucleus and coda together are referred to as the rhyme. S ≡ Syllable, O ≡ Onset R ≡ Rhyme, N ≡ Nucleus Co ≡ Coda

Syllable Structure: Examples ‘word’ ‘sprint’

Syllable Structure: Examples ‘may’ ‘opt’ ‘air’  No Coda.  No Onset.  No Coda, No Onset.

Syllable Structure Open Syllable: ends in vowel Closed syllable: ends in consonant or consonant cluster Light Syllable: A syllable which is open and ends in a short vowel General Description – CV. Example, ‘air’. Heavy Syllable: Closed syllables or syllables ending in diphthong Example: ‘opt’ Example, ‘may’

Syllabification: Determining Syllable Boundaries Given a string of syllables (word), what is the coda of one and the onset of another? In a sequence such as VCV, where V is any vowel and C is any consonant, is the medial C the coda of the first syllable (VC.V) or the onset of the second syllable (V.CV)? To determine the correct groupings, there are some rules, two of them being the most important and significant: Maximal Onset Principle, Sonority Hierarchy

Constraints: Phonotactics Determines possible comb. of onsets and codas which can occur. Deals with restriction on the permissible comb. Of phonemes. Defines permissible syllable structure, consonant clusters and vowel sequence by means of phonotactical constraints. In general, rules operate around the sonority hierarchy. Fricative /s/ is lower on the sonority hierarchy than the lateral /l/, so the combination /sl/ is permitted in onsets and /ls/ is permitted in codas. Opposite is not allowed. Thus, ‘slips’ and ‘pulse’ are possible English words. ‘lsips’ and ‘pusl’ are not possible.

Constraints on Onsets One-consonant: Only /ŋ/ can’t be distributed in syllable-initial position. Two-consonant: We refer to the scale of sonority. Sequence ‘rn’ is ruled out since there is a decrease of sonority. Minimal Sonority Distance: Distance in sonority between the first and the second element in the onset must be of at least 2 degrees. Thus, on the basis of Sonority Hierarchy and Minimal Sonority Distance, only a limited no. of possible two-consonant clusters. Three-consonant: Restricted to licensed two-consonant onsets preceded by /s/. Also, /s/ can only be followed by a voiceless sound. Therefore, only /spl/, /spr/, /str/, /skr/, /spj/, /stj/, /skj/, /skw/, /skl/, /smj/ will be allowed. (splinter, spray, strong etc.) While /sbl/, /sbr/, /sdr/, /sgr/, /sθr/ will be ruled out.

Constraints on Onsets Possible 2-consonant clusters in an Onset

Constraints on Coda

Constraints on Coda

Other Constraints Nucleus: The following can occur as nucleus: All vowel sounds (monophthongs as well as diphthongs). /m/, /n/ and /l/ in certain situations (for example, ‘bottom’, ‘apple’) Syllabic: Both the onset and the coda are optional (as seen previously). /j/ at the end of an onset (/pj/, /bj/, /tj/, /dj/, /kj/, /fj/, /vj/, /θj/, /sj/, /zj/, /hj/, /mj/, /nj/, /lj/, /spj/, /stj/, /skj/) must be followed by /uɪ/ or /ʊə/. Long vowels and diphthongs are not followed by /ŋ/. /ʊ/ is rare in syllable-initial position. Stop + /w/ before /uɪ, ʊ, ʌ, aʊ/ are excluded.

Challenges in Machine Transliteration Lot of ambiguities at the grapheme level esp. while dealing with non-phonetic languages Example: Devanagari letter क has multiple grapheme mappings in English {ca, ka, qa, c, k, q, ck} Presence of silent letters Pneumonia – Difference of scripts causes spelling variations esp. for loan words नूमोनिया रिलीस, रिलीज, जार्ज, जॉर्ज, बैंक, बॅंक

Introducing Transliteraion Query युरोमधील वाढ Marathi Stemmer Stemmed Query यूरो  वाढ Translation Not Found Dictionary Lookup यूरो Found Translation Options Transliteration Inflation, rise, increase Euro Translation Disambiguation English IR Engine Final Translated Query Euro Inflation

Transliteration for OOV words Name searching (people, places, organizations) constitutes a large proportion of search Words of foreign origin in a language - Loan Words Example: Such words not found in the dictionary are called “Out Of Vocabulary (OOV) words” in CLIR OOV words are usually automatically “Transliterated” बस (bus), स्कूल (school)

Machine Transliteration – The Problem Graphemes – Basic units of written language (English – 26 letters, Devanagari – 92 matraas) Definition “The process of automatically mapping an given grapheme sequence in source language to a valid grapheme sequence in the target language such that it preserves the pronunciation of the original source word”

Redefining Machine Transliteration Transliteration so far has been considered as an independent module used in Machine Translation, CLIR etc. In CLIR, important for term to be present in index In the above context, we redefine machine transliteration as “The process of automatically mapping an given grapheme sequence in source language to an index item in the target language index such that it preserves the pronunciation of the original source word” Pronunciation usually difficult to model – we only work with graphemes