Introduction to Speech Neal Snider, for LIN110, April 12th, 2005 (adapted from slides by Florian Jaeger)

Before we get to the real stuff…
This presentation will be available online at:
• aterial/ling110/
Local support
Where are our corpora?
Setting up your account on AFS

Local support
Where can you get help with your project?
• Your TA
• The website
• The list (you have to subscribe)
• The corpus TA

Where are our corpora? (1)
AFS:
• AFS is Stanford’s file sharing system
• The linguistic corpora are stored at: /afs/ir/data/linguistic-data/
• You need to register for AFS access
• You need to set up your account

Where are our corpora? (2)
Corpus computer:
• The computer is the one closest to the printer in the linguistics department’s computer cluster (MJH, 1st floor)
• The corpora are stored on partition D:\
• Mapping the drive via a network:

The real part
• Example projects
• Overview of available corpora
• Where to find them
• What does the annotation look like?
• How to search speech corpora

Example projects (1)
Differences in the realization of phonemes depending on their context
• ‘Context’ can be segmental:
[1] How does the realization of syllabic /m/ differ depending on the preceding onset?
Word-final vowel aspiration
• ‘Context’ can be supra-segmental:
[3] How does the realization of syllabic /m/ differ at the beginning/end of conversations/utterances/sentences?
Reduction of complex clusters

Example projects (2)
• ‘Context’ could also include the register, style (formal vs. informal), genre (reading a fairy tale vs. reading an article), different dialects, etc.
[2] Pitch contours related to specific meanings [1]
• Steady-state pitch contours

Available corpora
Handout in ra/material/X_speech_corpora/X_phonetic corpora.doc ra/material/X_speech_corpora/
See also:

Switchboard – spontaneous American English (AE) speech
• Transcripts uploaded to AFS: /afs/ir/data/linguistic-data/Switchboard/
• Sound files available on CD
• Available in several formats:
  – All in one file
  – Separate files for: syllables, words, orthographic transcription

Example annotations (Switchboard) Some files in Switchboard

Switchboard – all in one file
Annotation key (1)
Key:
SENTENCE: word1 word2 ... (2005_A_0041)
WORD: word canonical? [lm-probs] [rates] [positions] [morebigrams] part-of-speech phone1 phone2 ...
SYL: baseform transcribed syl_structure stress length [lm-probs] [rates] [positions]
PHONE: baseform stress syl_part [lm-probs] [rates] [positions] tran1 tran2 ...

Switchboard – all in one file
Annotation key (2)
[lm-probs] = trigram unigram trigram-unigram
[rates] = seg_tr_syl seg_tr_phn lex_syl lex_phn enrate vrate nvrate mrate mfrate enmmfrate mmfrate
[positions] = word_num_in_utterance word_num_in_turn
[morebigrams] = bigram reverse-bigram reverse-trigram center-trigram
part-of-speech = syntactic part of speech (currently only done for the word "to")
wordX = word number X in acoustically segmented `sentence'
canonical? = can if canonical (pronlex) pronunciation, alt otherwise
trigram = p(word | previous two words)
unigram = p(word)
trigram-unigram = difference between the two probabilities
seg_tr_syl = transcribed syllable rate between closest two pauses
seg_tr_phn = transcribed phone rate between closest two pauses
lex_syl = lexical syllabic rate (i.e. as determined from the word transcription)
lex_phn = lexical phone rate (i.e. as determined from the word transcription)

Switchboard – all in one file
Annotation key (3)
enrate = old enrate measure
vrate = voicing rate
nvrate = another voicing rate
mrate = sub-part of mrate measure
mfrate = sub-part of mrate measure
enmmfrate = average of enrate, mrate, mfrate (*this is what we call mrate*)
mmfrate = average of mrate, mfrate
baseform = pronunciation as written in dictionary
transcribed = transcribed syllable
syl_structure = onset/nucleus/coda markings from dictionary
stress = syllable stress marking from dictionary (P = primary, S = secondary, N = none)
length = syllable length
tranX = transcribed phone X corresponding to baseform phone
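The two averaged rate measures in the key can be written out directly. The following is a minimal sketch in Python, assuming the component rates have already been read from the [rates] fields as numbers; the function names and the sample values are hypothetical and not part of the corpus tools.

```python
def enmmfrate(enrate: float, mrate: float, mfrate: float) -> float:
    """Average of enrate, mrate and mfrate (the value the key calls mrate)."""
    return (enrate + mrate + mfrate) / 3.0


def mmfrate(mrate: float, mfrate: float) -> float:
    """Average of the two sub-parts mrate and mfrate."""
    return (mrate + mfrate) / 2.0


# Made-up rate values, for illustration only.
print(enmmfrate(4.0, 5.0, 6.0))
print(mmfrate(5.0, 6.0))
```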

Arpabet

Example annotations (Switchboard – all in one file)
SENTENCE: like finding a proper nursing home (2005_A_0041)
WORD: like 1 can l ay k
SYL: l_ay_k l_ay_k O_N_C P
PHONE: l P O l
PHONE: ay P N ay
PHONE: k P C k
WORD: finding 2 alt f ay n ih ng
SYL: f_ay_n f_ay_n O_N_C P
PHONE: f P O f
PHONE: ay P N ay
PHONE: n P C n
SYL: d_ih_ng NULL_ih_ng O_N_C N
PHONE: d N O NULL 1 27
PHONE: ih N N ih
PHONE: ng N C ng
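To work with records like this one programmatically, the WORD, SYL and PHONE lines can be grouped into a small tree. The sketch below is in Python and assumes one tagged entry per line and the abbreviated field layout of the example above (without the [lm-probs] and [rates] columns of the full key); the class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Syllable:
    baseform: str
    transcribed: str
    phones: list = field(default_factory=list)  # (baseform, transcribed) pairs


@dataclass
class Word:
    text: str
    canonical: bool
    syllables: list = field(default_factory=list)


def parse_record(lines):
    """Group SYL and PHONE lines under the WORD line that precedes them."""
    words = []
    for line in lines:
        tag, _, rest = line.partition(":")
        fields = rest.split()
        if tag == "WORD":
            # 'can' marks a canonical pronunciation, 'alt' an alternative one.
            words.append(Word(fields[0], "can" in fields[1:]))
        elif tag == "SYL":
            words[-1].syllables.append(Syllable(fields[0], fields[1]))
        elif tag == "PHONE":
            # Assumed layout: baseform, stress, syllable part, transcribed phone.
            words[-1].syllables[-1].phones.append((fields[0], fields[3]))
    return words


SAMPLE = """\
SENTENCE: like finding a proper nursing home (2005_A_0041)
WORD: like 1 can l ay k
SYL: l_ay_k l_ay_k O_N_C P
PHONE: l P O l
PHONE: ay P N ay
PHONE: k P C k
WORD: finding 2 alt f ay n ih ng
SYL: f_ay_n f_ay_n O_N_C P
PHONE: f P O f
PHONE: ay P N ay
PHONE: n P C n
SYL: d_ih_ng NULL_ih_ng O_N_C N
PHONE: d N O NULL 1 27
PHONE: ih N N ih
PHONE: ng N C ng
""".splitlines()

# List baseform phones whose transcription is NULL.
for word in parse_record(SAMPLE):
    for syl in word.syllables:
        for base, tran in syl.phones:
            if tran == "NULL":
                print(word.text, base)  # prints: finding d
```

Keeping the baseform and the transcribed phone side by side makes questions such as "which dictionary phones have a NULL transcription?" a simple loop rather than a regular expression.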

Boston Radio Transcripts
• Includes read news etc. (i.e. non-spontaneous read speech)
• Transcripts uploaded to AFS at: /afs/ir/data/linguistic-data/Boston-University-Radio
• Sound files available on CD

Example annotations (Boston Radio)
Boston News Corpus
H# 0 4
>endsil
DH 4 5
IH
S 19 9
>This
HH 28 5
AA
L 42 12
AX 54 4
DCL 58 3
D 61 1
EY 62 16
>holiday
S 78 11
IY
Z 103 7
EN 110 20
…
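An annotation in this style can be grouped into words with a few lines of Python. The sketch below assumes one entry per line and that each `>word` marker closes the phones listed before it, as the example suggests. It also treats the two numbers after a phone as a start and a duration, which is an inference from the example (each first number equals the previous first number plus the previous second) rather than something the slide states. The function name is hypothetical.

```python
def parse_phone_lines(lines):
    """Group phone entries under the '>word' marker that follows them."""
    words, pending = [], []
    for line in lines:
        line = line.strip()
        if not line or line == "…":
            continue
        if line.startswith(">"):
            words.append((line[1:], pending))
            pending = []
            continue
        label, *nums = line.split()
        # Entries without both numbers keep None placeholders.
        if len(nums) == 2:
            start, dur = int(nums[0]), int(nums[1])
        else:
            start, dur = None, None
        pending.append((label, start, dur))
    return words


sample = ["H# 0 4", ">endsil", "DH 4 5", "IH", "S 19 9", ">This"]
for word, phones in parse_phone_lines(sample):
    print(word, [label for label, _, _ in phones])
```

Phones that are not yet followed by a word marker (for example at a truncated end of file) are dropped by this sketch.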

Example annotations (Boston Radio)
XWAVES/PRAAT readable:
signal st43/f3ast43p1
type 1
color 76
font -*-times-medium-r-*-*-17-*-*-*-*-*-*-*
separator ;
nfields 1
#
H#
DH
IH+1
S
HH
AA+1
L
AX
DCL
D
EY
S
…
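The header of such a label file can be separated from the labels by splitting at the first line that consists only of `#`. This is a minimal Python sketch under the assumption that the file has one entry per line; the function name is hypothetical, and the label lines are returned unparsed because the example shows only the labels themselves.

```python
def split_label_file(text):
    """Return (header dict, label lines) for a label file with a '#' separator line."""
    header, labels, in_header = {}, [], True
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if in_header:
            if line == "#":
                in_header = False  # everything after this line is a label entry
            else:
                key, _, value = line.partition(" ")
                header[key] = value
        else:
            labels.append(line)
    return header, labels


sample = """signal st43/f3ast43p1
type 1
color 76
separator ;
nfields 1
#
H#
DH
IH+1
S
"""
header, labels = split_label_file(sample)
print(header["signal"], labels)
```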

CALLHOME Mandarin – Transcripts
CALLHOME – Mandarin
• Transcripts uploaded to AFS: /afs/ir/data/linguistic-data/CALLHOME/CALLHOME-Mandarin-Transcripts/
• Lexicon with pronunciation information available at: /afs/ir/data/linguistic-data/CALLHOME/CALLHOME-Mandarin-Lexicon/
• Sound files only available on CD/DVD, but I could put them on the corpus computer

TIMIT – dialect variation
• Recordings of read speech from 8 major dialects of American English
• (Orthographic) transcripts on AFS, sound files available on CD
• Comparable dialect corpora exist for the British Isles (IViE; stored on the corpus computer)

Example annotations (TIMIT)
TIMIT
• Word label (.wrd): she had your dark suit in greasy
• Phonetic label (.phn): h# sh iy hv ae dcl jh axr
(Note: beginning and ending silence regions are marked with h#)
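In the distributed TIMIT files, each line of a .wrd or .phn file holds a start sample, an end sample and the label, which the slide omits. The Python sketch below converts such lines to times in seconds. The function name is hypothetical, the sample offsets are made up for illustration, and the 16 kHz default should be checked against the audio files actually used.

```python
def read_labels(lines, sample_rate=16000):
    """Return (label, start_sec, end_sec) for lines of the form 'start end label'."""
    out = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip anything that is not a three-column label line
        start, end, label = int(parts[0]), int(parts[1]), parts[2]
        out.append((label, start / sample_rate, end / sample_rate))
    return out


# Made-up sample offsets; only the labels come from the example above.
phn = ["0 1600 h#", "1600 3200 sh", "3200 4800 iy"]
for label, start, end in read_labels(phn):
    print(f"{label}\t{end - start:.3f} s")
```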

How to search transcribed corpora?
• Either load the files into your favorite text editor
• Or use a command from the ‘grep’ family (run in a UNIX shell)
  – This allows you to search many files at once for patterns that are described by regular expressions
  – For help, see our tutorial page at: grep.html
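Where a UNIX shell is not at hand, the same many-files-at-once search can be approximated in Python. This is a sketch using only the standard library, not a replacement for grep; the function name `grep_files` and the two throwaway files in the demo are hypothetical.

```python
import re
import tempfile
from pathlib import Path


def grep_files(pattern, root, glob="*"):
    """Yield (path, line_number, line) for each matching line in files under root."""
    regex = re.compile(pattern)
    for path in sorted(Path(root).rglob(glob)):
        if not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if regex.search(line):
                    yield path, number, line.rstrip("\n")


if __name__ == "__main__":
    # Build two tiny throwaway files so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as root:
        Path(root, "a.txt").write_text("WORD: like 1 can l ay k\n", encoding="utf-8")
        Path(root, "b.txt").write_text("WORD: finding 2 alt f ay n ih ng\n", encoding="utf-8")
        # Find words with a non-canonical ('alt') pronunciation.
        for path, number, line in grep_files(r"^WORD: \S+ \d+ alt", root):
            print(f"{path.name}:{number}:{line}")
```

Like `grep -n`, the generator reports the file and line number with each hit, so matches can be traced back to the transcript they came from.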

Demo search
egrep '^SYL: [a-z_]+ [a-z_]*ow.{1,3}m[a-z_]* '
Actual phonological pattern
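The demo pattern can be tried out on a few lines before running it over the corpus. In the Python sketch below the regular expression is the one from the demo, unchanged. The leading `^SYL: [a-z_]+ ` skips the tag and the baseform field, so `ow.{1,3}m` is presumably the part that carries the phonological pattern, searched for in the transcribed field. Only the first test line comes from the example annotation; the other two are hypothetical.

```python
import re

# The egrep pattern from the demo; Python's re module accepts the same syntax here.
SYL_OW_M = re.compile(r"^SYL: [a-z_]+ [a-z_]*ow.{1,3}m[a-z_]* ")

lines = [
    "SYL: l_ay_k l_ay_k O_N_C P",    # from the example annotation: no 'ow'
    "SYL: hh_ow_m hh_ow_m O_N_C P",  # hypothetical: 'ow' then 'm' in the transcription
    "SYL: hh_ow_m hh_ow O_N P",      # hypothetical: 'm' only in the baseform
]

hits = [line for line in lines if SYL_OW_M.search(line)]
print(hits)  # only the second line matches
```

Because the pattern requires a space after the second field, it tests the transcribed syllable and not the baseform, which is why the third line is rejected.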