
1 Multimedia Specification Design and Production 2013 / Semester 2 / week 3 Lecturer: Dr. Nikos Gazepidis gazepidis@ist.edu.gr

2 Outline
- Introduction
- Topics in speech processing: speech coding, speech recognition, speech synthesis, speaker verification/recognition
- Audio Elements
- Conclusion

3 Introduction
- Speech is our most basic communication tool.
- We have long hoped to be able to communicate with machines using speech.

4 Speech Production Model
[Figures: anatomy of the speech organs; mechanical model of speech production]

5 Speech Production Model
[Figures: speech waveform and its spectrogram]

6 Voiced and Unvoiced Speech
[Figure: waveform segmented into silence, unvoiced, and voiced regions]

7 Short-Time Parameters
[Figure: speech waveform with its short-time power envelope]

8 Short-Time Parameters (cont.)
[Figures: zero-crossing rate; pitch period]
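A minimal sketch (not from the slides) of the two short-time parameters above, computed frame by frame with NumPy; the frame length and hop size are illustrative choices for 8 kHz audio:

```python
import numpy as np

def short_time_params(x, frame_len=400, hop=160):
    """Return per-frame short-time power and zero-crossing rate."""
    powers, zcrs = [], []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        powers.append(np.mean(frame ** 2))                 # short-time power
        signs = np.sign(frame)
        zcrs.append(np.mean(np.abs(np.diff(signs)) > 0))   # fraction of sign changes
    return np.array(powers), np.array(zcrs)

# Toy usage: a periodic (voiced-like, low ZCR) segment followed by a
# noise-like (unvoiced-like, high ZCR) segment.
t = np.arange(8000) / 8000.0
voiced = np.sin(2 * np.pi * 120 * t)
unvoiced = 0.1 * np.random.randn(8000)
power, zcr = short_time_params(np.concatenate([voiced, unvoiced]))
```

Voiced regions tend to show high short-time power and low zero-crossing rate; unvoiced regions show the opposite, which is what the voiced/unvoiced decision on the previous slide relies on.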

9 Linear Predictive Coding (LPC) Speech Coder
[Block diagram: input speech → speech buffer (frame n, frame n+1) → speech analysis producing pitch, voiced/unvoiced decision, vocal-tract parameters, and energy parameters → quantizer → code generation → code stream]

10 LPC and the Vocal Tract
- Mathematically, speech can be modeled with the following generation model:
  x(n) = Σ_{p=1}^{k} a_p x(n−p) + e(n)
- {a_1, a_2, …, a_k} are called the linear prediction coefficients (LPC); they model the shape of the vocal tract.
- e(n) is the excitation that generates the speech.
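A minimal sketch, assuming NumPy, of estimating the coefficients {a_1, …, a_k} for one frame by the autocorrelation method: build the frame's autocorrelation values and solve the resulting normal equations. The frame here is a random placeholder, not real speech:

```python
import numpy as np

def lpc(frame, order=10):
    """Return order-p linear prediction coefficients for one speech frame."""
    # Autocorrelation values r[0..order]
    r = np.array([np.dot(frame[:len(frame) - lag], frame[lag:])
                  for lag in range(order + 1)])
    # Toeplitz normal equations R a = r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])   # x(n) ≈ sum_p a[p-1] * x(n-p)

frame = np.random.randn(400)           # placeholder frame
a = lpc(frame, order=10)
# The prediction residual approximates the excitation e(n).
pred = np.array([np.dot(a, frame[n - 10:n][::-1]) for n in range(10, len(frame))])
residual = frame[10:] - pred
```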

11 An Example of Synthesizing Speech
[Figure: glottal pulse train (with blending region) passed through a vocal-tract filter with gain control, then through a radiation filter]
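A minimal sketch of the same idea (my own illustration, not the slide's exact pipeline): excite an all-pole vocal-tract filter with a periodic pulse train standing in for the glottal pulses. The sample rate, pitch period, coefficients, and gain are all illustrative values:

```python
import numpy as np
from scipy.signal import lfilter

fs = 8000                          # assumed sample rate
pitch_period = 80                  # 100 Hz pitch at 8 kHz
excitation = np.zeros(fs)          # one second of excitation
excitation[::pitch_period] = 1.0   # impulse train as a crude glottal pulse

a = np.array([1.3, -0.8, 0.2])     # illustrative (stable) LPC coefficients a_1..a_3
gain = 0.5
# All-pole filter implementing x(n) = sum_p a_p x(n-p) + gain * e(n)
speech = lfilter([gain], np.concatenate(([1.0], -a)), excitation)
```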

12 Speech Recognition
- Speech recognition is the foundation of human-computer interaction using speech.
- Speech recognition varies with the context:
  - Speaker-dependent or speaker-independent
  - Discrete words or continuous speech
  - Small vocabulary or large vocabulary
  - Quiet environment or noisy environment
[Block diagram: speech → parameter analyzer → comparison and decision algorithm (using reference patterns and a language model) → words]

13 How Does Speech Recognition Work?
Words: "grey whales" → Phonemes: g r ey w ey l z
Each phoneme has different characteristics (for example, its power distribution).

14 Speech Recognition (cont.)
Frame-by-frame phoneme labels: g g r ey ey ey ey w ey ey l l z
How do we "match" the word when there are time and other variations?

15 Dynamic Programming in Decoding
[Figure: trellis of states versus time]
We can find the path corresponding to the most probable phoneme sequence generating the observed "feature" sequence (one feature vector extracted per speech frame).
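A minimal sketch of the dynamic-programming (Viterbi) idea: score every state at every time step and keep back-pointers, so the best path through the trellis can be read off at the end. The transition, initial, and emission probabilities below are made-up placeholders, not values from the lecture:

```python
import numpy as np

def viterbi(log_emit, log_trans, log_init):
    """log_emit: (T, S) per-frame log-likelihoods; returns best state path."""
    T, S = log_emit.shape
    score = log_init + log_emit[0]          # best log-prob ending in each state
    back = np.zeros((T, S), dtype=int)      # back-pointers
    for t in range(1, T):
        cand = score[:, None] + log_trans   # (from_state, to_state)
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(S)] + log_emit[t]
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy usage with 2 states and 4 frames (all probabilities invented):
log_trans = np.log(np.array([[0.9, 0.1], [0.2, 0.8]]))
log_init = np.log(np.array([0.5, 0.5]))
log_emit = np.log(np.array([[0.8, 0.2], [0.7, 0.3], [0.3, 0.7], [0.1, 0.9]]))
print(viterbi(log_emit, log_trans, log_init))   # -> [0, 0, 1, 1]
```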

16 Speech Synthesis
- Speech synthesis is the generation of (arbitrary) speech with desired properties (pitch, speed, loudness, articulation mode, etc.).
- Speech synthesis is widely used in text-to-speech systems and various telephone services.
- The easiest and most commonly used speech synthesis method is waveform concatenation; a minimal sketch follows below.
[Figure: increasing the pitch without changing the speed]
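A minimal sketch of waveform concatenation with a short linear cross-fade (a "blending region") at each join to avoid clicks. The overlap length is an illustrative assumption, and each unit is assumed to be longer than the overlap:

```python
import numpy as np

def concatenate_units(units, overlap=64):
    """Join waveform units, cross-fading `overlap` samples at each boundary."""
    out = units[0].astype(float).copy()
    fade = np.linspace(0.0, 1.0, overlap)
    for unit in units[1:]:
        unit = unit.astype(float)
        # Blend the tail of the output with the head of the next unit.
        out[-overlap:] = out[-overlap:] * (1.0 - fade) + unit[:overlap] * fade
        out = np.concatenate([out, unit[overlap:]])
    return out
```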

17 Speaker Recognition
- Identifying or verifying the identity of a speaker is an application where computers exceed human beings.
- Vocal-tract parameters can be used as features for speaker recognition.
[Figure: vocal-tract parameter plots for speaker one and speaker two]
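A minimal sketch of the idea, not the lecture's method: summarize each recording by the average magnitude spectrum of its frames (a crude stand-in for vocal-tract features) and compare two recordings by Euclidean distance. The threshold is an illustrative placeholder:

```python
import numpy as np

def spectral_template(frames):
    """frames: iterable of equal-length 1-D arrays (one per analysis frame)."""
    return np.mean([np.abs(np.fft.rfft(f)) for f in frames], axis=0)

def same_speaker(frames_a, frames_b, threshold=1.0):
    d = np.linalg.norm(spectral_template(frames_a) - spectral_template(frames_b))
    return d < threshold
```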

18 Applications
- Speech recognition: call routing, directory assistance, operator services, document input
- Speaker recognition: personalized services, fraud control
- Text-to-speech synthesis: speech interfaces, document correction, voice commands
- Speech coding: wireless telephony, voice over Internet

19 Audio Elements in Speech
- Auditory icons rely on an intuitive link between the model world of sonically represented objects and events and sounds familiar to listeners from the everyday world.
- Earcons are short, structured musical phrases that can be parameterized to communicate information in an auditory display.

20 Audio Elements in Speech: Earcons
An earcon is the audio equivalent of an icon, and just like visual icons, we hear earcons throughout the day. Its job is to communicate meaning through the use of sound. What is powerful about this, and about sound in general, is that even though light travels faster than sound, we process sound more quickly.
Some examples of earcons:
- The empty-trash sound on your computer
- Microwave end-of-cycle beeps (some models now play a tune)
- The seatbelt warning signal in your car
- The horn honk when car doors lock
- Beeps when you press a button on your phone

21 Audio Elements in Speech: Auditory Icons
Auditory icons are caricatures of naturally occurring sounds and can be used to provide information about sources of data.
Some examples of auditory icons:
- A car horn warning
- Water splashing
- A flowing river
- Filling a bottle with water
- A car engine starting and idling
- A door opening or closing

22 Audio Elements in Speech: Sound Filter Effects
http://manual.audacityteam.org/man/Effect_Menu
1. Volume Normalization
Use the Normalize effect to set the peak amplitude of single or multiple tracks and to equalize the peak amplitude of the left and right channels of stereo tracks.
2. Noise Reduction
This effect is ideal for removing constant background noise such as fans, tape noise, or hums. It will not work very well for removing talking or music in the background.
3. Amplify
This effect increases or decreases the volume of a track or set of tracks. When you open the dialog, Audacity automatically calculates the maximum amount by which you could amplify the selected audio without causing clipping (from being too loud).
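A minimal sketch (not Audacity's code) of what peak normalization and the amplify calculation do: scale samples so the peak hits a target level, and report the largest gain that avoids clipping. Samples are assumed to be floats in [-1, 1]:

```python
import numpy as np

def normalize_peak(x, target_peak=0.9):
    """Scale x so its absolute peak equals target_peak."""
    peak = np.max(np.abs(x))
    return x if peak == 0 else x * (target_peak / peak)

def max_gain_without_clipping_db(x):
    """Largest gain (in dB) that keeps the peak at or below 1.0."""
    peak = np.max(np.abs(x))
    return float("inf") if peak == 0 else 20 * np.log10(1.0 / peak)
```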

23 Audio Elements in Speech: Sound Filter Effects (cont.)
4. Fade In
Applies a fade-in to the selected audio, so that the amplitude changes gradually from silence at the start of the selection to the original amplitude at the end of the selection. The shape of the fade is linear.
5. Fade Out
Applies a fade-out to the selected audio, so that the amplitude changes gradually from the original amplitude at the start of the selection down to silence at the end of the selection. The shape of the fade is linear.
6. Equalizer
Equalization is a way of manipulating sounds by frequency. It allows you to adjust the volume levels of particular frequencies.
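A minimal sketch of the linear fade-in and fade-out described above, applied over a whole selection (here, the whole array):

```python
import numpy as np

def fade_in(x):
    """Linear ramp from silence to the original amplitude."""
    return x * np.linspace(0.0, 1.0, len(x))

def fade_out(x):
    """Linear ramp from the original amplitude down to silence."""
    return x * np.linspace(1.0, 0.0, len(x))
```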

