IIT Bombay ISTE, IITB, Mumbai, 28 March
SPEECH SYNTHESIS
P. C. Pandey, EE Dept., IIT Bombay, March ‘03
Speech units
Sentences & phrases, words, syllables, phonemes, subphonemic acoustic segments
Speech features
Prosodic (suprasegmental) features: intensity variation, pitch variation
Phonemic features: articulatory, acoustic, perceptual
Classification of phonemes
Vowels: pure vowels, diphthongs
Consonants: semivowels, whisper, stops, nasals, fricatives, affricates
Speech production system
Schematic of speech production
Vowel spectrum
Speech synthesis: generation of speech by a machine
Applications
Voice response systems (limited vocabulary)
Text-to-speech synthesis (unlimited vocabulary)
Analysis-by-synthesis (speech research)
Generation of speech-like test signals
Analysis-synthesis systems:
* channel capacity reduction
* secure communication
* speech enhancement
* voice transformation
* processing for hearing aids
Development of speech synthesizers
Mechanical / electro-mechanical ( )
Electronic analog with keyboard input (1930s)
Electronic analog analysis-synthesis systems ( )
Digital synthesizers (1950 onward):
* software based
* hardware based
Mechanical synthesizers
Von Kempelen, 1780
Wheatstone’s speaking machine
Riesz, 1930s: speaking machine
Dudley, 1930s: Voder, an electronic analog synthesizer with a mechanical keyboard
Fant, 1950s: OVE
Holmes, 1960s: parallel formant synthesizer
Klatt, 1970s: cascade/parallel formant synthesizer
Modern synthesis approaches
Waveform based: high-quality, natural output; limited vocabulary; large storage requirement
Speech-model based: synthesis of unlimited speech with small storage; difficulty in parameter generation & concatenation
Text-to-speech synthesis:
* text pre-processing & phonetic transcription
* parsing for syntactic & semantic structure
* prosodic information & sound units
* speech waveform generation
Speech-model based approaches
Articulatory
Source-filter:
* channel vocoder
* LPC vocoder
* homomorphic vocoder
* formant-based synthesizer
Acoustic:
* phase vocoder
* sinusoidal model
* harmonic plus noise model (HNM)
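The source-filter idea behind a formant-based synthesizer can be sketched as follows: an excitation (here an idealized impulse train at the pitch frequency) is passed through second-order digital resonators, one per formant, connected in cascade. This is a minimal Python/NumPy sketch, not a reproduction of any synthesizer listed above. The resonator coefficients follow the widely used Klatt-style form; the function names, formant frequencies and bandwidths are illustrative assumptions, and glottal pulse shaping and lip radiation are omitted.

```python
import numpy as np


def resonator(x, freq, bw, fs):
    """Second-order digital resonator (Klatt-style), normalized to unity gain at DC."""
    T = 1.0 / fs
    c = -np.exp(-2.0 * np.pi * bw * T)
    b = 2.0 * np.exp(-np.pi * bw * T) * np.cos(2.0 * np.pi * freq * T)
    a = 1.0 - b - c
    y = np.zeros(len(x))
    y1 = y2 = 0.0  # previous two output samples
    for n, xn in enumerate(x):
        yn = a * xn + b * y1 + c * y2
        y[n] = yn
        y2, y1 = y1, yn
    return y


def impulse_train(f0, n_samples, fs):
    """Idealized voiced excitation: one unit impulse per pitch period."""
    x = np.zeros(n_samples)
    x[::int(round(fs / f0))] = 1.0
    return x


def cascade_formant_synth(formants, bandwidths, f0=120.0, n_samples=4800, fs=16000):
    """Pass the excitation through the formant resonators in series (cascade)."""
    y = impulse_train(f0, n_samples, fs)
    for freq, bw in zip(formants, bandwidths):
        y = resonator(y, freq, bw, fs)
    return y / np.max(np.abs(y))  # normalize peak amplitude to 1


# Hypothetical formant values roughly like an /a/ vowel (not taken from the slides)
vowel = cascade_formant_synth([730.0, 1090.0, 2440.0], [60.0, 90.0, 120.0])
```

A parallel formant synthesizer would instead feed the excitation to each resonator separately and sum the individually weighted outputs.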
HARMONIC PLUS NOISE MODEL (Stylianou, 1995; 2001)
Speech signal divided into a harmonic part and a noise part
[Equations: harmonic part; noise part]
Parameters:
* harmonic amplitudes and phases
* maximum voiced frequency
* V/UV & pitch
* noise parameters
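In the usual HNM formulation, s(t) = h(t) + n(t), with the harmonic part h(t) = Σ_{k=1}^{L} A_k cos(2πk f0 t + φ_k), where L = ⌊F_m / f0⌋ and F_m is the maximum voiced frequency; the noise part occupies the band above F_m. The minimal sketch below synthesizes one frame from constant parameters f0, F_m, A_k, φ_k. Stylianou's noise part is spectrally shaped, time-modulated noise; here, as a simplifying assumption, it is flat Gaussian noise restricted to frequencies at or above F_m with a single gain. An unvoiced frame is represented by f0 = 0 and F_m = 0 (noise only). The function names and the 16 kHz / 20 ms frame defaults are hypothetical.

```python
import numpy as np


def harmonic_part(f0, fm, amps, phases, fs, n_samples):
    """h(t) = sum_k A_k cos(2*pi*k*f0*t + phi_k) over harmonics k*f0 <= fm."""
    n_harm = int(np.floor(fm / f0)) if f0 > 0 else 0
    amps = np.asarray(amps, dtype=float)[:n_harm]
    phases = np.asarray(phases, dtype=float)[:n_harm]
    t = np.arange(n_samples) / fs
    k = np.arange(1, len(amps) + 1)[:, None]  # harmonic numbers as a column
    return np.sum(amps[:, None] * np.cos(2 * np.pi * k * f0 * t + phases[:, None]), axis=0)


def noise_part(fm, gain, fs, n_samples, rng):
    """Simplified noise part: white Gaussian noise with all content below fm removed."""
    spec = np.fft.rfft(rng.standard_normal(n_samples))
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spec[freqs < fm] = 0.0  # keep only the band at or above the max voiced frequency
    return gain * np.fft.irfft(spec, n=n_samples)


def hnm_frame(f0, fm, amps, phases, noise_gain, fs=16000, n_samples=320, seed=0):
    """One synthesized frame: harmonic part plus noise part."""
    rng = np.random.default_rng(seed)
    return (harmonic_part(f0, fm, amps, phases, fs, n_samples)
            + noise_part(fm, noise_gain, fs, n_samples, rng))
```

For example, `hnm_frame(200.0, 1000.0, [1.0] * 8, [0.0] * 8, noise_gain=0.0)` keeps only the first five harmonics (200 to 1000 Hz), because the rest lie above F_m.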
IMPLEMENTATION OF HNM
ANALYSIS
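One way to obtain the harmonic amplitudes and phases in the analysis stage, assuming f0 and the maximum voiced frequency F_m have already been estimated for the frame, is a least-squares fit of cosine and sine terms at the harmonic frequencies. The sketch below illustrates that idea under these assumptions and is not necessarily the analysis used in this implementation. Phases are referenced to the first sample of the frame, and the optional window acts as a least-squares weighting; the function name is hypothetical.

```python
import numpy as np


def estimate_harmonics(frame, f0, fm, fs, window=None):
    """Least-squares fit of A_k cos(2*pi*k*f0*t + phi_k), k = 1..floor(fm/f0).

    Returns (amplitudes, phases); phases are referenced to t = 0 (first sample).
    """
    frame = np.asarray(frame, dtype=float)
    n_harm = int(np.floor(fm / f0))
    if n_harm == 0:
        return np.zeros(0), np.zeros(0)
    t = np.arange(len(frame)) / fs
    arg = 2 * np.pi * np.outer(t, f0 * np.arange(1, n_harm + 1))  # shape (N, L)
    basis = np.hstack([np.cos(arg), np.sin(arg)])                 # shape (N, 2L)
    w = np.ones(len(frame)) if window is None else np.asarray(window, dtype=float)
    coef, *_ = np.linalg.lstsq(basis * w[:, None], frame * w, rcond=None)
    a, b = coef[:n_harm], coef[n_harm:]
    # A cos(x + phi) = A cos(phi) cos(x) - A sin(phi) sin(x), so a = A cos(phi), b = -A sin(phi)
    return np.hypot(a, b), np.arctan2(-b, a)
```

On a noise-free frame built from a few harmonics, the fit recovers the amplitudes and phases exactly (to rounding error); on real speech it gives the best harmonic approximation in the weighted least-squares sense.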
SYNTHESIS
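Frame-wise parameters can be turned into a continuous signal by overlap-add. The sketch below uses an assumed scheme for illustration, not necessarily the one on this slide. Each frame is synthesized with constant harmonic parameters around its centre at sample i·hop, weighted by a triangular window of length 2·hop, and added to the output. Triangular windows overlapping by hop sum to one, so a stationary signal is reproduced exactly. Phases are assumed to be measured at the frame centres, and the noise part is omitted; the function name is hypothetical.

```python
import numpy as np


def ola_synthesis(frames, hop, fs, n_total):
    """Overlap-add harmonic frames centred at samples 0, hop, 2*hop, ...

    frames: list of (f0, amps, phases), with phases measured at each frame centre.
    """
    out = np.zeros(n_total)
    m = np.arange(2 * hop)
    win = 1.0 - np.abs(m - hop) / hop  # triangular window; overlapping copies sum to 1
    t = (m - hop) / fs                 # time relative to the frame centre
    for i, (f0, amps, phases) in enumerate(frames):
        amps = np.asarray(amps, dtype=float)[:, None]
        phases = np.asarray(phases, dtype=float)[:, None]
        k = np.arange(1, len(amps) + 1)[:, None]
        seg = np.sum(amps * np.cos(2 * np.pi * k * f0 * t + phases), axis=0)
        idx = i * hop - hop + m        # output samples covered by this frame
        ok = (idx >= 0) & (idx < n_total)
        out[idx[ok]] += (win * seg)[ok]
    return out
```

If the frames describe a steady 200 Hz sinusoid (phase at each centre equal to the signal's phase there), the output matches that sinusoid sample for sample.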
SEGMENT CONCATENATION
Used to generate longer units from smaller ones. Steps:
1) Parsing of the phonetic transcript
2) Fetching the parameters of the required units
3) Pitch and intensity modifications for prosody
4) Smoothing of the parameter tracks at unit boundaries
5) Interpolation of the parameters over the frame length from the end-point values
6) Synthesis
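Steps 4 and 5 can be sketched as follows, assuming each unit's parameters are stored as a frames × parameters array sampled every hop samples. The slide does not specify the smoothing rule. Here the jump at each boundary is split between the two units and spread over a few frames, and parameters are then linearly interpolated between frame instants. Both choices, the function names, and n_smooth are hypothetical.

```python
import numpy as np


def join_with_smoothing(tracks, n_smooth=2):
    """Concatenate per-unit parameter tracks (frames x parameters) and smooth each boundary.

    Half of the jump is spread over the last n_smooth frames of the left unit and
    half over the first n_smooth frames of the right unit, so both meet at the midpoint.
    """
    out = np.asarray(tracks[0], dtype=float).copy()
    for nxt in tracks[1:]:
        nxt = np.asarray(nxt, dtype=float).copy()
        n = min(n_smooth, len(out), len(nxt))
        if n > 0:
            jump = nxt[0] - out[-1]
            ramp = (np.arange(1, n + 1) / n)[:, None]  # 1/n, ..., 1
            out[-n:] += 0.5 * jump * ramp              # left unit rises towards the midpoint
            nxt[:n] -= 0.5 * jump * ramp[::-1]         # right unit starts at the midpoint
        out = np.vstack([out, nxt])
    return out


def interpolate_to_samples(track, hop):
    """Linearly interpolate each parameter from frame instants 0, hop, 2*hop, ... to every sample."""
    track = np.asarray(track, dtype=float)
    frame_pos = np.arange(len(track)) * hop
    samples = np.arange(frame_pos[-1] + 1)
    return np.column_stack([np.interp(samples, frame_pos, track[:, p])
                            for p in range(track.shape[1])])
```

For two constant one-parameter tracks of five frames at 100 and 200, `join_with_smoothing` with n_smooth=2 gives 100, 100, 100, 125, 150, 150, 175, 200, 200, 200.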
RESULTS
All VCV syllables and vowels were natural and intelligible when synthesized using the harmonic part only, except /aʃa/ and /asa/.
HNM preserves speaking styles (anger, high articulation rate).
[Figures: synthesized /aʃa/; synthesized /asa/]
RESULTS (continued)
Glottal closure instants (GCIs) obtained from the glottal signal give better synthesis than GCIs estimated from speech.
[Figures: pitch contours for /ap kʌhœn ja rʌhE hœn/, from the glottal signal and from speech (Childers and Hu, 1994)]
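When GCIs are available (for example, detected from the glottal signal), a pitch contour follows directly from their spacing: each pair of consecutive GCIs gives one pitch period T, and f0 = 1/T. This minimal sketch assumes GCIs given as strictly increasing sample indices. Treating intervals longer than a hypothetical max_period_s (20 ms, i.e. f0 below 50 Hz) as unvoiced gaps is an assumption, as is the function name.

```python
import numpy as np


def pitch_from_gcis(gci_samples, fs, max_period_s=0.02):
    """Pitch contour from strictly increasing GCI sample indices.

    Each pair of consecutive GCIs gives one period; f0 = 1/period is placed at the
    midpoint of the pair. Intervals longer than max_period_s are marked unvoiced (f0 = 0).
    """
    g = np.asarray(gci_samples, dtype=float)
    periods = np.diff(g) / fs
    times = (g[:-1] + g[1:]) / (2.0 * fs)
    f0 = np.where(periods <= max_period_s, 1.0 / periods, 0.0)
    return times, f0
```

For GCIs at samples 0, 100, 200, 300, 1000, 1080 with fs = 10 kHz, this gives f0 values of 100, 100, 100, 0 and 125 Hz.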
RESULTS (continued)
Larger units constructed from the parameters of smaller units are of good quality.
[Figures: recorded /ʌbʰImani/; synthesized from /ʌbʰI/, /Ima/, /ani/]
DEMONSTRATIONS (R: recorded; HN: synthesized with harmonic and noise parts; H: synthesized with harmonic part only)
Cardinal vowels /a/, /I/, /u/, …: R, HN, H
Stops: /aka/, /aga/, /akʰa/, /agʰa/
Fricatives: /aʃa/, /asa/
Affricates: /atʃa/, /adza/, /atʃʰa/, /adzʰa/
Word /ʌbʰImani/: R, HN
Sentence /mœn dzʌmmu ja rʌha hun/: R, HN
Further developments
High-quality multilingual / multi-dialect text-to-speech synthesis
Voice transformations
Processing in aids for the hearing impaired