Introduction to Speech Neal Snider, for LIN110, April 12th, 2005 (adapted from slides by Florian Jaeger)
Before we get to the real stuff…
  This presentation will be available online at: aterial/ling110/
  Local support
  Where are our corpora?
  Setting up your account on AFS
Local support
  Where can you get help with your project?
  Your TA
  The website
  The list (you have to subscribe)
  The corpus TA
Where are our corpora? (1) AFS
  AFS is Stanford’s file sharing system
  The linguistic corpora are stored at: /afs/ir/data/linguistic-data/
  You need to register for AFS access
  You need to set up your account
Where are our corpora? (2) Corpus Computer
  The computer is the one closest to the printer in the linguistics department’s computer cluster (MJH, 1st floor)
  The corpora are stored on partition D:\
  Mapping the drive via a network:
The real part
  Example project
  Overview of available corpora
  Where to find them
  What does the annotation look like?
  How to search speech corpora
Example projects (1)
  Differences in the realization of phonemes depending on their context
  ‘Context’ can be segmental:
  [1] How does the realization of syllabic /m/ differ depending on the preceding onset?
  Word-final vowel aspiration
  ‘Context’ can be supra-segmental:
  [3] How does the realization of syllabic /m/ differ at the beginning/end of conversations/utterances/sentences?
  Reduction of complex clusters
Example projects (2)
  ‘Context’ could also include the register, style (formal vs. informal), genre (reading a fairy tale vs. reading an article), different dialects, etc.
  [2] Pitch contours related to specific meanings
  [1] Steady-state pitch contours
Available corpora
  Handout in ra/material/X_speech_corpora/X_phonetic corpora.doc
  See also:
Switchboard – spontaneous AE speech
  Transcripts uploaded to AFS: /afs/ir/data/linguistic-data/Switchboard/
  Sound files available on CD
  Available in several formats:
    All in one file
    Separate files for syllables, words, and the orthographic transcription
Example annotations (Switchboard) Some files in Switchboard
Switchboard – all in one file: Annotation key (1)
  Key:
  SENTENCE: word1 word2 ... (2005_A_0041)
  WORD: word canonical? [lm-probs] [rates] [positions] [morebigrams] part-of-speech phone1 phone2 ...
  SYL: baseform transcribed syl_structure stress length [lm-probs] [rates] [positions]
  PHONE: baseform stress syl_part [lm-probs] [rates] [positions] tran1 tran2 ...
Switchboard – all in one file: Annotation key (2)
  [lm-probs] = trigram unigram trigram-unigram
  [rates] = seg_tr_syl seg_tr_phn lex_syl lex_phn enrate vrate nvrate mrate mfrate enmmfrate mmfrate
  [positions] = word_num_in_utterance word_num_in_turn
  [morebigrams] = bigram reverse-bigram reverse-trigram center-trigram
  part-of-speech = syntactic part of speech (currently only done for the word "to")
  wordX = word number X in acoustically segmented `sentence'
  canonical? = can if canonical (pronlex) pronunciation, alt otherwise
  trigram = p(word | previous two words)
  unigram = p(word)
  trigram-unigram = difference between two probabilities
  seg_tr_syl = transcribed syllable rate between closest two pauses
  seg_tr_phn = transcribed phone rate between closest two pauses
  lex_syl = lexical syllabic rate (i.e. as determined from wd transcription)
  lex_phn = lexical phone rate (i.e. as determined from wd transcription)
Switchboard – all in one file: Annotation key (3)
  enrate = old enrate measure
  vrate = voicing rate
  nvrate = another voicing rate
  mrate = sub-part of mrate measure
  mfrate = sub-part of mrate measure
  enmmfrate = *this is what we call mrate* average of enrate, mrate, mfrate
  mmfrate = average of mrate, mfrate
  baseform = pronunciation as written in dictionary
  transcribed = transcribed syllable
  syl_structure = onset/nucleus/coda markings from dictionary
  stress = syllable stress marking from dictionary: P=primary, S=secondary, N=none
  length = syllable length
  tranX = transcribed phone X corresponding to baseform phone
Arpabet
Example annotations (Switchboard – all in one file)
  SENTENCE: like finding a proper nursing home (2005_A_0041)
  WORD: like 1 can l ay k
  SYL: l_ay_k l_ay_k O_N_C P
  PHONE: l P O l
  PHONE: ay P N ay
  PHONE: k P C k
  WORD: finding 2 alt f ay n ih ng
  SYL: f_ay_n f_ay_n O_N_C P
  PHONE: f P O f
  PHONE: ay P N ay
  PHONE: n P C n
  SYL: d_ih_ng NULL_ih_ng O_N_C N
  PHONE: d N O NULL 1 27
  PHONE: ih N N ih
  PHONE: ng N C ng
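The nesting in the example (a SENTENCE contains WORDs, a WORD contains SYLs, a SYL contains PHONEs) can be recovered with a few lines of Python. This is only a sketch: the function name and the dictionary layout are made up for illustration, and the fields after each tag are kept as raw strings rather than interpreted.

```python
def parse_all_in_one(lines):
    """Group WORD / SYL / PHONE records under the SENTENCE they follow."""
    sentences = []
    for raw in lines:
        tag, _, rest = raw.strip().partition(":")
        fields = rest.split()
        if tag == "SENTENCE":
            sentences.append({"text": " ".join(fields), "words": []})
        elif tag == "WORD":
            sentences[-1]["words"].append({"fields": fields, "syllables": []})
        elif tag == "SYL":
            word = sentences[-1]["words"][-1]
            word["syllables"].append({"fields": fields, "phones": []})
        elif tag == "PHONE":
            syllable = sentences[-1]["words"][-1]["syllables"][-1]
            syllable["phones"].append(fields)
        # Any other line (e.g. a blank one) is ignored in this sketch.
    return sentences


example = """\
SENTENCE: like finding a proper nursing home (2005_A_0041)
WORD: like 1 can l ay k
SYL: l_ay_k l_ay_k O_N_C P
PHONE: l P O l
PHONE: ay P N ay
PHONE: k P C k
WORD: finding 2 alt f ay n ih ng
SYL: f_ay_n f_ay_n O_N_C P
PHONE: f P O f
PHONE: ay P N ay
PHONE: n P C n
SYL: d_ih_ng NULL_ih_ng O_N_C N
PHONE: d N O NULL 1 27
PHONE: ih N N ih
PHONE: ng N C ng
""".splitlines()

sentences = parse_all_in_one(example)
for word in sentences[0]["words"]:
    # First field of a WORD record is the orthographic word.
    print(word["fields"][0], [syl["fields"][1] for syl in word["syllables"]])
```

Printing the second field of each SYL record shows the transcribed syllables per word, which makes deletions such as the NULL onset in "finding" easy to spot.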
Boston Radio Transcripts
  Includes read news etc. (i.e. non-spontaneous read speech)
  Transcripts uploaded to AFS at: /afs/ir/data/linguistic-data/Boston-University-Radio
  Sound files available on CD
Example annotations (Boston Radio) Boston News Corpus
  H# 0 4 >endsil
  DH 4 5
  IH
  S 19 9 >This
  HH 28 5
  AA
  L 42 12
  AX 54 4
  DCL 58 3
  D 61 1
  EY 62 16 >holiday
  S 78 11
  IY
  Z 103 7
  EN 110 20
  …
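A possible way to regroup these phone records into words, sketched in Python. It rests on assumptions read off the example rather than on corpus documentation: each record starts with a phone label, and a trailing `>word` marker sits on the last phone of that word. The numeric columns are ignored, and the record list below is a hand-split version of the slide (the slide gives no numbers for IH, AA, and IY).

```python
def group_phones_into_words(records):
    """Collect phone labels until a '>word' marker closes the current word."""
    words, current = [], []
    for record in records:
        parts = record.split()
        word = None
        if parts[-1].startswith(">"):
            word = parts.pop()[1:]  # strip the leading '>'
        current.append(parts[0])    # first field is taken to be the phone label
        if word is not None:
            words.append((word, current))
            current = []
    # 'current' holds phones whose word marker has not appeared yet.
    return words, current


records = [
    "H# 0 4 >endsil", "DH 4 5", "IH", "S 19 9 >This",
    "HH 28 5", "AA", "L 42 12", "AX 54 4", "DCL 58 3", "D 61 1",
    "EY 62 16 >holiday", "S 78 11", "IY", "Z 103 7", "EN 110 20",
]

words, leftover = group_phones_into_words(records)
for word, phones in words:
    print(word, phones)
print("unfinished:", leftover)
```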
Example annotations (Boston Radio) XWAVES/PRAAT readable:
  signal st43/f3ast43p1
  type 1
  color 76
  font -*-times-medium-r-*-*-17-*-*-*-*-*-*-*
  separator ;
  nfields 1
  #
  H# DH IH+1 S HH AA+1 L AX DCL D EY S …
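The example suggests a header of `key value` lines, a line containing only `#`, and then the labels. A minimal Python sketch that splits the two parts, assuming exactly that layout; the function name is hypothetical, and real label files may carry extra fields per label (such as a time) that this sketch does not handle.

```python
def split_label_file(text):
    """Return (header dict, list of labels) for a '#'-separated label file."""
    header, labels, in_body = {}, [], False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if in_body:
            labels.extend(line.split())
        elif line == "#":
            in_body = True  # everything after this line is label data
        else:
            key, _, value = line.partition(" ")
            header[key] = value
    return header, labels


label_text = """\
signal st43/f3ast43p1
type 1
color 76
font -*-times-medium-r-*-*-17-*-*-*-*-*-*-*
separator ;
nfields 1
#
H# DH IH+1 S HH AA+1 L AX DCL D EY S
"""

header, labels = split_label_file(label_text)
print(header["signal"], len(labels), labels[:4])
```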
CALLHOME Mandarin – Transcripts
  Transcripts uploaded to AFS: /afs/ir/data/linguistic-data/CALLHOME/CALLHOME-Mandarin-Transcripts/
  Lexicon with pronunciation information available at: /afs/ir/data/linguistic-data/CALLHOME/CALLHOME-Mandarin-Lexicon/
  Sound files only available on CD/DVD, but I could put them on the corpus computer
TIMIT – dialect variation
  Telephone recording of 8 major dialects of American English
  (Orthographic) transcripts on AFS, sound files available on CD
  Comparable dialect corpora exist for the British Isles (IViE; stored on the corpus computer)
Example annotations (TIMIT)
  Word label (.wrd): she had your dark suit in greasy
  Phonetic label (.phn): h# sh iy hv ae dcl jh axr
  (Note: beginning and ending silence regions are marked with h#)
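The slide shows only the labels. A sketch for reading such a label file in Python, assuming each line of a .wrd or .phn file holds a begin sample, an end sample, and a label, and assuming a 16 kHz sampling rate. The sample offsets below are invented for illustration and are not taken from the corpus.

```python
SAMPLE_RATE = 16000  # assumed sampling rate in Hz


def read_labels(text, sample_rate=SAMPLE_RATE):
    """Turn 'begin end label' lines into (label, start_sec, end_sec) tuples."""
    segments = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        begin, end, label = int(parts[0]), int(parts[1]), parts[2]
        segments.append((label, begin / sample_rate, end / sample_rate))
    return segments


# Invented offsets; only the label sequence follows the slide.
phn_text = """\
0 3000 h#
3000 4500 sh
4500 5700 iy
"""

segments = read_labels(phn_text)
for label, start, end in segments:
    print(f"{label}\t{start:.4f}\t{end:.4f}")
```

Converting to seconds makes it easier to line the labels up with what you see in a waveform editor.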
How to search transcribed corpora?
  Either load the files into your favorite text editor
  Or use a command from the ‘grep’ family (run in a UNIX shell)
  This allows you to search many files at once for patterns that are described by regular expressions
  For help, see our tutorial page at: grep.html
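If you prefer a script to the shell, the same idea (one regular expression, many files) can be sketched in Python with the standard `re` module. The function name `grep_files` and the two throwaway files are hypothetical and stand in for real corpus files.

```python
import re
import tempfile
from pathlib import Path


def grep_files(paths, pattern):
    """Return (file name, line number, line) for every line matching pattern."""
    regex = re.compile(pattern)
    hits = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if regex.search(line):
                    hits.append((Path(path).name, number, line.rstrip("\n")))
    return hits


# Demo on two temporary files with made-up names.
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "a.txt"
    second = Path(tmp) / "b.txt"
    first.write_text("WORD: like 1 can l ay k\nSYL: l_ay_k l_ay_k O_N_C P\n",
                     encoding="utf-8")
    second.write_text("PHONE: l P O l\nWORD: finding 2 alt f ay n ih ng\n",
                      encoding="utf-8")
    demo_hits = grep_files([first, second], r"^WORD: ")

for name, number, line in demo_hits:
    print(f"{name}:{number}:{line}")
```

The output format (file, line number, line) mimics what grep prints with line numbers switched on, so results from both routes can be compared directly.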
Demo search
  egrep '^SYL: [a-z_]+ [a-z_]*ow.{1,3}m[a-z_]* '
  Actual phonological pattern: ow.{1,3}m
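The same pattern works unchanged in Python's `re` module, which makes it easy to try out on a few lines before running it on the corpus. In the sketch below the first test line comes from the example annotation; the other two are hypothetical SYL lines written for illustration.

```python
import re

# '^SYL: '        only syllable records
# '[a-z_]+ '      skip the first field (baseform)
# '[a-z_]*ow'     second field (transcribed) contains 'ow' ...
# '.{1,3}m'       ... followed by 'm' after one to three characters
# '[a-z_]* '      rest of the transcribed field, then a space
SYL_OW_M = re.compile(r"^SYL: [a-z_]+ [a-z_]*ow.{1,3}m[a-z_]* ")

lines = [
    "SYL: l_ay_k l_ay_k O_N_C P",    # from the example annotation
    "SYL: hh_ow_m hh_ow_m O_N_C P",  # hypothetical: 'ow' + 'm' transcribed
    "SYL: hh_ow_m hh_ax_m O_N_C P",  # hypothetical: 'ow' only in the baseform
]

matches = [line for line in lines if SYL_OW_M.search(line)]
for line in matches:
    print(line)
```

Only the second line is selected: the first has no "ow" at all, and in the third the "ow" occurs in the baseform but not in the transcribed syllable, which is the field the pattern targets.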