Automatic Grapheme-Phoneme Conversion for Spoken British English Corpora
C. AURAN, C. BOUZON & D.J. HIRST
Laboratoire Parole et Langage, CNRS UMR6057, Université de Provence

Summary
1. The Aix-MARSEC Project
   - Building Aix-MARSEC
   - Availability of the database
   - Methodology
2. Grapheme-Phoneme Conversion and Alignment
   - The Aix-MARSEC Methodology
   - Integration into PCE
3. Conclusion and Perspectives

The Aix-MARSEC Project

An evolution from the SEC and MARSEC corpora: SEC → MARSEC (Machine Readable SEC) → Aix-MARSEC

SEC (Spoken English Corpus)
- 55,000 words, 339 min 18 s
- BBC recordings from the 1980s
- 11 speaking styles
- 53 speakers (17 female and 36 male)
- Orthographic transcription
- Syntactic tagging and parsing
- Prosodic annotation: 14 tonetic stress marks (TSM)

Building Aix-MARSEC
- Alignment of words and tone groups with the signal
- Conversion of all the TSM to ASCII characters
- Automatic grapheme-to-phoneme conversion
- Automatic phoneme level alignment
- Automatic intonation annotation using the Momel-Intsint methodology
- 8 annotation levels aligned: phonemes, syllable constituents, syllables, words, feet and rhythmic units, tone groups, Intsint coding
- Tagging and parsing alignment under way

The Aix-MARSEC Project

Availability of the database

Online version (www.lpl.univ-aix.fr/~EPGA/):
- Annotation files (TextGrids)
- Phoneme data tables
- Perl and Praat scripts

CD-ROM version:
- Annotation files (TextGrids)
- Phoneme data tables
- Perl and Praat scripts
- Sound files (.wav format)

Methodology
[Pipeline diagram: orthographic transcription → G2P conversion → raw phonemic transcription → elision prediction → optimised phonemic transcription → automatic alignment → aligned phonemic transcription. Annotation levels shown: syllable constituent (SC) annotation, syllable annotation, word annotation, TSM annotation, rhythmic annotation.]

Grapheme-Phoneme Conversion and Alignment

The Aix-MARSEC Methodology
[Pipeline diagram: orthographic transcription → G2P conversion → raw phonemic transcription → elision prediction → optimised phonemic transcription → automatic alignment → aligned phonemic transcription, followed by SC, syllable and word annotation.]

[Pipeline stage: orthographic transcription → G2P conversion → raw phonemic transcription]

G2P Conversion: General principles
- Dictionary-based method (4 dictionaries used)
- Specific processing for numbers, abbreviations, etc.
- Syntagmatic effects (linking r, definite article)
- Output: raw transcription

G2P Conversion: The 4 dictionaries
- Primary pronunciation dictionary (Advanced Learners Dictionary, Oxford University Press; 71,000 entries)
- Complementary dictionary (700 entries)
- Problematic forms dictionary (for hesitations, partial words, etc.; 26 entries)
- Reduced forms dictionary (75 entries)
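A dictionary-based method of this kind can be pictured as a cascade of lookups. The sketch below is only an illustration: the entries are made-up SAMPA-like samples, the lookup order is an assumption (the slides do not state one), and the function names `lookup` and `transcribe` are hypothetical. The reduced forms dictionary is left out here because its use depends on the prosodic annotation.

```python
# Hypothetical miniature dictionaries in SAMPA-like notation; the real ones
# (71,000 / 700 / 26 entries) are not reproduced here.
PRIMARY = {"the": "D@", "cat": "k{t", "and": "{nd"}
COMPLEMENTARY = {"marsec": "mA:sek"}
PROBLEMATIC = {"erm": "3:m"}  # hesitations, partial words, ...


def lookup(word):
    """Return a transcription for `word`, or None if no dictionary has it.

    The order (problematic, complementary, primary) is an assumption
    made for this sketch, not something stated in the presentation.
    """
    key = word.lower()
    for dictionary in (PROBLEMATIC, COMPLEMENTARY, PRIMARY):
        if key in dictionary:
            return dictionary[key]
    return None


def transcribe(words):
    """Transcribe a list of words, flagging out-of-dictionary items."""
    return [lookup(w) or "<unk:" + w + ">" for w in words]


raw = transcribe(["The", "cat", "erm", "xyzzy"])
```

Flagging unknown words explicitly, rather than guessing, makes it easy to see which items would need to be added to a complementary dictionary.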

G2P Conversion: Specific issues
- Abbreviations
- Numbers
- Sequences of numbers and capitals (post codes)
- Genitives and contractions
- 3rd person and plural forms
- Preterite and past participle forms
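Inflected forms such as plurals, 3rd person forms and regular preterites can often be derived from the base transcription using the standard English allomorphy rules. The sketch below applies those textbook rules to SAMPA-like phoneme lists; it is an illustration under that assumption, not a description of the scripts actually used, and the names `add_s` and `add_ed` are hypothetical.

```python
# Phoneme classes in SAMPA-like notation.
SIBILANTS = {"s", "z", "S", "Z", "tS", "dZ"}
VOICELESS = {"p", "t", "k", "f", "T", "s", "S", "tS"}


def add_s(phonemes):
    """Append the regular -s ending (plural, genitive, 3rd person)."""
    last = phonemes[-1]
    if last in SIBILANTS:
        return phonemes + ["I", "z"]   # buses
    if last in VOICELESS:
        return phonemes + ["s"]        # cats
    return phonemes + ["z"]            # dogs


def add_ed(phonemes):
    """Append the regular -ed ending (preterite, past participle)."""
    last = phonemes[-1]
    if last in {"t", "d"}:
        return phonemes + ["I", "d"]   # wanted
    if last in VOICELESS:
        return phonemes + ["t"]        # worked
    return phonemes + ["d"]            # played
```

For example, `add_s(["k", "{", "t"])` yields the phonemes of "cats" and `add_ed(["w", "3:", "k"])` those of "worked".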

[Pipeline stage: raw phonemic transcription → elision prediction → optimised phonemic transcription]

Elision Prediction: General principles
- Raw transcription: citation forms
- Continuous speech: specific phenomena (elisions, epenthesis, metathesis, etc.)

Elision prediction: Constraints
- Intonation constraints (TSM)
- Temporal constraints:
  - Minimal threshold: 5 ms
  - Thresholds for specific phonemes (Klatt, 1979): /t – d/ = 55 ms; /@/ = 55 ms; /T/ = 110 ms
  - Lengthening "z" factor: z < 0: elision; z ≥ 0: no elision
- Phonotactic constraints (rules)

Elision prediction: Rules
[Table: elision rules]
Note: Th. = duration threshold

Elision prediction: Evaluation
- 4,077 elided phonemes out of 199,770 in the corpus (2 %)
- Half of all elisions are correctly predicted
- ¾ of predicted elisions are correct
- Global quality of the algorithm
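In retrieval terms, "half of all elisions are correctly predicted" corresponds to a recall of about 0.5, and "¾ of predicted elisions are correct" to a precision of about 0.75. The sketch below recomputes the elision rate and combines the two rounded figures into an F-measure. The F-measure is introduced here only as one common way of summarising precision and recall; the transcript does not say which global quality measure the authors used.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Figures taken from the slide.
elided, total = 4077, 199770
elision_rate = 100.0 * elided / total   # percentage of elided phonemes

# Rounded values read off the slide ("half", "three quarters").
recall, precision = 0.5, 0.75
f1 = f_measure(precision, recall)

print("elision rate: %.2f %%" % elision_rate)
print("F1 from rounded precision and recall: %.2f" % f1)
```

Because the inputs are rounded, the resulting F1 should be read as an order of magnitude rather than a reported result.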

[Pipeline stage: optimised phonemic transcription → automatic alignment → aligned phonemic transcription]

Alignment: General principles
HMM and Viterbi based alignment by Christophe Lévy (LIA, France)
- HMM trained on the TIMIT corpus of American English
- Gaussian Mixture Model (8 components and diagonal covariance matrices, estimated through the Expectation-Maximisation algorithm optimising the Maximum-Likelihood criterion)
- 12 MFCC (filter bank analysis) augmented with energy, delta and delta-delta coefficients: 39-coefficient vector per speech frame
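The 39 coefficients follow from the arithmetic (12 MFCC + 1 energy) × 3 (static, delta, delta-delta). The sketch below shows that stacking on dummy data in plain Python. The delta computation uses simple central differences, which is an assumption made for brevity; real front ends commonly use a regression window, and the slides do not specify which variant was used.

```python
def deltas(frames):
    """Time derivative of each coefficient, by central differences
    (one-sided at the first and last frame). A simplification of the
    regression-based deltas used by many speech front ends."""
    n = len(frames)
    out = []
    for t in range(n):
        lo, hi = max(t - 1, 0), min(t + 1, n - 1)
        span = max(1, hi - lo)  # avoid division by zero for a single frame
        out.append([(b - a) / span for a, b in zip(frames[lo], frames[hi])])
    return out


def stack_features(static):
    """Concatenate static, delta and delta-delta coefficients per frame."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Dummy input: 5 frames of 12 MFCC + 1 energy term (values are arbitrary).
static = [[float(t + k) for k in range(13)] for t in range(5)]
features = stack_features(static)
print(len(features), len(features[0]))
```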

Alignment: Evaluation
- Absolute mean error: 22 ms
- Mean error: -6.29 ms
- Kurtosis: 8.15 (narrow distribution)
- Skewness: -0.94 (left bias)

Alignment: Evaluation

Acceptance threshold   Optimised transcription
64 ms                  93.25 %
32 ms                  82.02 %
20 ms                  68.37 %
16 ms                  59.97 %
15 ms                  57.40 %
10 ms                  42.43 %
5 ms                   23.72 %
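Statistics of this kind can be derived from a list of signed boundary errors. The sketch below assumes that an error is the automatic boundary time minus the manual one, in milliseconds, and that a boundary is accepted when its absolute error is at most the threshold; neither convention is stated in the slides. The input values are made up for illustration and the function name `summarise` is hypothetical.

```python
def summarise(errors_ms, thresholds_ms=(5, 10, 20)):
    """Mean error, absolute mean error, skewness and acceptance rates."""
    n = len(errors_ms)
    mean = sum(errors_ms) / n
    abs_mean = sum(abs(e) for e in errors_ms) / n
    m2 = sum((e - mean) ** 2 for e in errors_ms) / n   # variance
    m3 = sum((e - mean) ** 3 for e in errors_ms) / n   # third central moment
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    accepted = {
        th: 100.0 * sum(abs(e) <= th for e in errors_ms) / n
        for th in thresholds_ms
    }
    return {"mean": mean, "abs_mean": abs_mean,
            "skewness": skewness, "accepted": accepted}


# Made-up signed errors in ms (not corpus data).
stats = summarise([-30, -12, -4, 0, 3, 8, 15, -20])
print(stats)
```

A negative mean together with a negative skewness, as on the slide, would indicate under this sign convention that automatic boundaries tend to fall slightly earlier than the manual ones.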

Integration into PCE

Integration: Motivations
- Double focus: segmental phenomena and prosodic phenomena
- Formant charts
- Tonal alignment
- Phoneme level alignment
- For phoneticians and phonologists

Integration: 2 possible policies
- Direct integration: exact Aix-MARSEC methodology; requires word level manual alignment
- Alternative integration: adaptation of the Aix-MARSEC methodology; optional elisions predicted on the basis of phonotactic rules only, with the decision made during the alignment phase

Conclusions and Perspectives

- A fully automatic methodology that is easy to extend
- Diverse types of exploitation, phonological or phonetic, segmental or prosodic (formant charts; temporal, intonational and metrical studies; etc.)
- Full interactivity with other ProZEd modules (Momel-Intsint, etc.)
- Realistic integration into PCE (2 options)

Well… This time it's for good!
Presentation available from www.lpl.univ-aix.fr/~EPGA/

14 ASCII prosodic annotation symbols (Roach, 1994):
_   low level
~   high level
<   step-down
>   step-up
/ (high) rise-fall /high \high fall fall-rise /high rise,low rise low fall,\(low rise-fall – not used) \,low fall-rise
*   stressed but unaccented
|   minor intonation unit boundary
||  major intonation unit boundary

Reduced forms processing
- Creation of a reduced forms dictionary based on O'Connor (1967) and Faure (1975)
- Reduction constraint: TSM absence
- Aim: improving G2P conversion

Example:
- TSM: "/and" converted into /{nd/
- No TSM: "and" converted into /@nd/
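The reduction constraint can be sketched as a lookup that consults the reduced forms dictionary only when the token carries no tonetic stress mark. In the sketch below, the set of TSM characters, the assumption that marks are prefixed to the word, and the names `split_tsm` and `transcribe_token` are all illustrative choices; only the "and" example comes from the slide.

```python
# Characters assumed to act as TSM prefixes in this sketch.
TSM_CHARS = set("_~<>/\\,*")

# Hypothetical one-entry dictionaries (SAMPA-like notation).
FULL = {"and": "{nd"}
REDUCED = {"and": "@nd"}


def split_tsm(token):
    """Split a token into its leading TSM characters and the bare word."""
    i = 0
    while i < len(token) and token[i] in TSM_CHARS:
        i += 1
    return token[:i], token[i:]


def transcribe_token(token):
    """Use the reduced form only when the word carries no TSM."""
    marks, word = split_tsm(token)
    key = word.lower()
    if not marks and key in REDUCED:
        return REDUCED[key]
    return FULL.get(key)


print(transcribe_token("/and"), transcribe_token("and"))
```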
