Multimedia Specification Design and Production 2013 / Semester 2 / week 3 Lecturer: Dr. Nikos Gazepidis

2 Outline (Speech in Multimedia)
- Introduction
- Topics in speech processing
  - Speech coding
  - Speech recognition
  - Speech synthesis
  - Speaker verification/recognition
- Audio Elements
- Conclusion

3 Introduction
- Speech is our basic communication tool.
- We have long hoped to be able to communicate with machines using speech.

4 Speech Production Model
[Figure panels: anatomical structure; mechanical model]

5 Speech Production Model
[Figure panels: speech waveform; spectrogram]

6 Voiced and Unvoiced Speech
[Figure labels: silence; unvoiced; voiced]

7 Short Time Parameters
[Figure panels: waveform; short-time power; envelope]
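The slide only names short-time power. As a minimal sketch, assuming the usual definition (mean squared amplitude over a short frame), it could be computed as below. The function name, frame length and hop size are illustrative choices, not values from the slides.

```python
def short_time_power(samples, frame_len, hop):
    """Return the mean squared amplitude of each frame (assumed definition)."""
    powers = []
    # Slide a window of frame_len samples forward by hop samples at a time.
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        powers.append(sum(s * s for s in frame) / frame_len)
    return powers


# A loud frame followed by a silent frame.
signal = [1.0, -1.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0]
print(short_time_power(signal, frame_len=4, hop=4))  # [1.0, 0.0]
```

Frames with high power are typically voiced speech, and frames with power near zero are silence.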

8 Short Time Parameters (cont.)
[Figure panels: zero-crossing rate; pitch period]
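The two parameters named on this slide can be illustrated with a short sketch. The zero-crossing rate is taken here as the fraction of adjacent sample pairs that change sign. The pitch period is estimated with a plain autocorrelation peak search, which is one common approach and not necessarily the one used in the lecture. All names and the lag range are assumptions.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def pitch_period(frame, min_lag, max_lag):
    """Return the lag (in samples) with the largest autocorrelation."""
    def autocorr(lag):
        return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))

    return max(range(min_lag, max_lag + 1), key=autocorr)


print(zero_crossing_rate([1.0, -1.0, 1.0, -1.0]))  # 1.0

# A toy signal that repeats every 4 samples.
periodic = [1.0, 0.0, -1.0, 0.0] * 8
print(pitch_period(periodic, min_lag=2, max_lag=10))  # 4
```

Unvoiced sounds tend to show a high zero-crossing rate and no clear autocorrelation peak, while voiced sounds show the opposite.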

9 Linear Predictive Coding (LPC) Speech Coder
[Block diagram labels: speech; speech buffer (frame n, frame n+1); speech analysis (pitch, voiced/unvoiced, vocal tract parameters, energy parameter); quantizer; code generation; code stream]

10 LPC and Vocal Tract
- Mathematically, speech can be modeled by the following generation model:
  $$x(n) = \sum_{p=1}^{k} a_p \, x(n-p) + e(n)$$
- $\{a_1, a_2, \ldots, a_k\}$ are called the linear prediction coefficients (LPC); they can be used to model the shape of the vocal tract.
- $e(n)$ is the excitation that generates the speech.
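The slide gives the model but not how the coefficients are obtained. One standard way, shown here as a sketch and not necessarily the lecture's method, is the autocorrelation method solved with the Levinson-Durbin recursion. The function names are hypothetical, and the sketch assumes a non-silent frame (r[0] > 0).

```python
def autocorrelation(x, max_lag):
    """Return r[0..max_lag], the unnormalized autocorrelation of x."""
    return [
        sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Estimate a_1..a_k in x(n) ~ sum_p a_p x(n-p) from autocorrelation r.

    Returns (coefficients, residual prediction error). Assumes r[0] > 0.
    """
    a = []          # a[j] holds a_(j+1)
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(len(a)))
        k = acc / error  # reflection coefficient for this order
        a = [a[j] - k * a[i - 2 - j] for j in range(len(a))] + [k]
        error *= 1.0 - k * k
    return a, error


# Toy signal generated by x(n) = 0.5 x(n-1) + e(n) with a single impulse e.
x = [0.5 ** n for n in range(50)]
coeffs, err = levinson_durbin(autocorrelation(x, 1), order=1)
print(coeffs)  # approximately [0.5]
```

In an LPC coder this analysis would run once per frame, and the coefficients, rather than the waveform, would be quantized and transmitted.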

11 An Example for Synthesizing Speech
[Figure labels: blending region; glottal pulse; pass through vocal tract filter with gain control; pass through radiation filter]
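The synthesis chain on this slide can be sketched in a very reduced form: a periodic impulse train stands in for the glottal pulses, and the all-pole filter x(n) = sum_p a_p x(n-p) + gain * e(n) stands in for the vocal tract filter with gain control. The radiation filter and the blending region are omitted, and all names and values are assumptions made for illustration.

```python
def impulse_train(length, period):
    """Crude stand-in for glottal pulses: one unit impulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


def lpc_synthesize(coeffs, excitation, gain=1.0):
    """Run the excitation through the all-pole vocal tract filter."""
    out = []
    for n, e in enumerate(excitation):
        # Weighted sum of previous output samples (the linear prediction).
        predicted = sum(
            a * out[n - p] for p, a in enumerate(coeffs, start=1) if n - p >= 0
        )
        out.append(predicted + gain * e)
    return out


print(lpc_synthesize([0.5], [1.0, 0.0, 0.0]))  # [1.0, 0.5, 0.25]

# A "voiced" frame: pulses every 8 samples through a one-pole filter.
voiced = lpc_synthesize([0.5], impulse_train(32, period=8), gain=0.8)
```

Changing `period` changes the pitch of the output, while changing `coeffs` changes its spectral shape.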

12 Speech Recognition
- Speech recognition is the foundation of human-computer interaction using speech.
- Speech recognition operates in different contexts:
  - speaker-dependent or speaker-independent
  - discrete words or continuous speech
  - small vocabulary or large vocabulary
  - quiet environment or noisy environment
[Block diagram labels: speech; parameter analyzer; comparison and decision algorithm; language model; reference patterns; words]

13 How does Speech Recognition Work?
Words: grey whales
Phonemes: g r ey w ey l z
Each phoneme has different characteristics (for example, the power distribution).

14 Speech Recognition
g g r ey ey ey ey w ey ey l l z
How do we "match" the word when there are time and other variations?
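One classical answer to the time-variation question, not named on the slide, is dynamic time warping (DTW), which aligns two sequences of different lengths before comparing them. The sketch below uses scalar features and an absolute-difference cost purely for illustration. Real systems compare feature vectors per frame.

```python
def dtw_distance(a, b):
    """Minimum cumulative alignment cost between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] = best cost of aligning the first i items of a with the first j of b.
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


# The same pattern spoken "slower" (one element held longer) still matches.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
print(dtw_distance([0, 0], [1, 1]))           # 2.0
```

A template matcher would compute this distance against each reference pattern and pick the word with the smallest value.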

15 Dynamic Programming in Decoding
[Figure axes: time; states]
We can find the path through the states that corresponds to the most probable phoneme sequence for generating the observed feature sequence (one feature extracted from each speech frame).
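The path search described here is commonly implemented with the Viterbi algorithm. The slide does not name it, so treat the following as an illustrative sketch. The two-phoneme model, the symbolic observations "A" and "B", and every probability below are invented for the example. Production decoders work with log probabilities to avoid underflow.

```python
def viterbi(states, start_p, trans_p, emit_p, observations):
    """Return the most probable state sequence for the observations."""
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor of state s at this time step.
            prev = max(states, key=lambda q: best[-1][q] * trans_p[q][s])
            scores[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return list(reversed(path))


states = ["g", "r"]
start_p = {"g": 1.0, "r": 0.0}
trans_p = {"g": {"g": 0.5, "r": 0.5}, "r": {"g": 0.0, "r": 1.0}}
emit_p = {"g": {"A": 0.9, "B": 0.1}, "r": {"A": 0.2, "B": 0.8}}
print(viterbi(states, start_p, trans_p, emit_p, ["A", "A", "B", "B"]))
# ['g', 'g', 'r', 'r']
```

Each row of the time/states grid corresponds to one entry of `best`, and the back-pointers record the chosen path through it.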

16 Speech Synthesis
- Speech synthesis generates (arbitrary) speech with desired properties (pitch, speed, loudness, articulation mode, etc.).
- Speech synthesis has been widely used in text-to-speech systems and various telephone services.
- The easiest and most often used speech synthesis method is waveform concatenation.
[Figure caption: increase the pitch without changing the speed]

17 Speaker Recognition
- Identifying or verifying the identity of a speaker is an application in which computers outperform humans.
- Vocal tract parameters can be used as features for speaker recognition.
[Figure panels: speaker one; speaker two]

18 Applications
- Speech recognition: call routing, directory assistance, operator services, document input
- Speaker recognition: personalized service, fraud control
- Text-to-speech synthesis: speech interface, document correction, voice commands
- Speech coding: wireless telephone, voice over Internet

19 Audio Elements in Speech
Auditory icons: Auditory icons use an intuitive linkage between the model world of sonically represented objects and events, using sounds familiar to listeners from the everyday world.
Earcons: Earcons are short, structured musical phrases that can be parameterized to communicate information in an auditory display.

20 Audio Elements in Speech: Earcons
An earcon is the audio equivalent of an icon and, just like visual icons, we hear earcons throughout the day. Its job is to communicate meaning through sound. What is powerful about this, and about sound in general, is that even though light travels faster than sound, we process sound more quickly.
Some examples of earcons:
- Empty-trash sound on your computer
- Microwave end beeps (some models now play a tune)
- Seatbelt warning signal in your car
- Horn honk when the car doors lock
- Beeps when you press a button on your phone

21 Audio Elements in Speech: Auditory Icons
Auditory icons are caricatures of naturally occurring sounds that can be used to provide information about sources of data.
Some examples of auditory icons:
- Car horn warning
- Water splashing
- A flowing river
- Filling a bottle with water
- A car engine starting and idling
- A door opening or closing

22 Audio Elements in Speech: Sound Filter Effects
1. Volume Normalization: Use the Normalize effect to set the peak amplitude of a single track or of multiple tracks, and to equalize the peak amplitude of the left and right channels of stereo tracks.
2. Noise Reduction: This effect is ideal for removing constant background noise such as fans, tape noise, or hums. It does not work well for removing talking or music in the background.
3. Amplify: This effect increases or decreases the volume of a track or set of tracks. When you open the dialog, Audacity automatically calculates the maximum amount by which you could amplify the selected audio without causing clipping (that is, without it becoming too loud).
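The arithmetic behind peak normalization and the "maximum amplification without clipping" figure can be sketched as follows. This is not Audacity's implementation. It assumes floating-point samples in the range -1.0 to 1.0, a non-silent selection, and hypothetical function names. Noise reduction is not covered.

```python
import math


def peak(samples):
    """Largest absolute sample value in the selection."""
    return max(abs(s) for s in samples)


def normalize(samples, target_peak=1.0):
    """Scale the selection so its peak equals target_peak."""
    gain = target_peak / peak(samples)
    return [s * gain for s in samples]


def max_gain_db(samples, full_scale=1.0):
    """Largest gain in decibels that keeps every sample within full scale."""
    return 20.0 * math.log10(full_scale / peak(samples))


def amplify(samples, gain_db):
    """Apply a gain given in decibels."""
    gain = 10.0 ** (gain_db / 20.0)
    return [s * gain for s in samples]


clip = [0.1, -0.5, 0.25]
print(normalize(clip))              # [0.2, -1.0, 0.5]
print(round(max_gain_db(clip), 2))  # 6.02
```

Amplifying by more than `max_gain_db` would push the loudest sample past full scale, which is what the clipping warning refers to.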

23 Audio Elements in Speech: Sound Filter Effects (cont.)
4. Fade In: Applies a fade-in to the selected audio, so that the amplitude changes gradually from silence at the start of the selection to the original amplitude at the end of the selection. The shape of the fade is linear.
5. Fade Out: Applies a fade-out to the selected audio, so that the amplitude changes gradually from the original amplitude at the start of the selection down to silence at the end of the selection. The shape of the fade is linear.
6. Equalizer: Equalization is a way of manipulating sounds by frequency. It allows you to adjust the volume levels of particular frequencies.
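A linear fade, as described in items 4 and 5, multiplies each sample by a gain that ramps in a straight line between 0 and 1 across the selection. The sketch below is illustrative only. The function names are assumptions, and equalization is left out because it requires frequency-domain filtering.

```python
def fade_in(samples):
    """Linear ramp from silence at the start to full amplitude at the end."""
    n = len(samples)
    if n < 2:
        return list(samples)
    return [s * i / (n - 1) for i, s in enumerate(samples)]


def fade_out(samples):
    """Linear ramp from full amplitude at the start to silence at the end."""
    n = len(samples)
    if n < 2:
        return list(samples)
    return [s * (n - 1 - i) / (n - 1) for i, s in enumerate(samples)]


tone = [1.0] * 5
print(fade_in(tone))   # [0.0, 0.25, 0.5, 0.75, 1.0]
print(fade_out(tone))  # [1.0, 0.75, 0.5, 0.25, 0.0]
```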