Speech Signal Processing


Presentation transcript:

Speech Signal Processing Lecturer: Jonas Samuelsson TAs: Barbara Resch and Jan Plasberg Speech Processing Group (TSB) Dept. Signals, Sensors, and Systems (S3)

Speech Processing and related fields: Acoustics (psychoacoustics, room acoustics, speech production). Signal Processing (Fourier transforms, discrete time filters, AR(MA) models). Information Theory (entropy, communication theory, rate-distortion theory). Statistical SP (stochastic models). Phonetics. Algorithms (programming).

Topics, part I Analysis of speech signals. Fourier analysis; spectrogram. Autocorrelation; pitch estimation. Linear prediction; compression, recognition. Cepstral analysis; pitch estimation, enhancement.

Topics, part II Speech compression. Scalar quantization (PCM, DPCM). (Transform Coding.) Vector quantization. State-of-the-art speech coders: CELP, sinusoidal.

Topics, part III Statistical modeling of speech. Gaussian mixtures; speaker identification. Hidden Markov models; speech recognition.

Topics, part IV Speech enhancement: Microphone array processing. Beamforming. Blind signal separation (cocktail party). Echo cancellation. The LMS algorithm. Noise suppression. Spectral subtraction. The Wiener filter.

Practicalities 12 lectures, 12 exercises (48 h altogether). 4 compulsory (graded) assignments. 1 written exam. 4 study points are awarded on successful completion. 4 pts = 17 h/week. “Spoken Language Processing. A guide…” by Huang et al. is available at Kårbokhandeln. Headphones can be borrowed against a 200 SEK deposit. More info in the syllabus and on http://www.s3.kth.se/speech/courses/2E1400/

Tools for Speech Processing: Prerequisites Fourier transform (continuous and discrete time, periodic and aperiodic signals). Digital filter theory. Z-transform. Random processes. Innovation processes, AR, MA. Filtering of stochastic signals. Probability theory. ML and MMSE estimation. And more… cf. chapters 3 and 5 in Huang.

Speech Production On board: Presentation of source-filter model. [Figure: diagram with the label "Lungs".]
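The source-filter idea can be illustrated with a minimal sketch, assuming the simplest textbook form of the model: a periodic impulse train stands in for the glottal source, and a cascade of two-pole resonators stands in for the vocal tract. The function names, the 100 Hz pitch, and the formant frequencies and bandwidths (500 Hz and 1500 Hz) are illustrative assumptions, not values from the lecture.

```python
import numpy as np

def impulse_train(f0_hz, fs_hz, duration_s):
    """Source stand-in: unit impulses spaced one pitch period apart."""
    n = int(round(duration_s * fs_hz))
    period = int(round(fs_hz / f0_hz))
    source = np.zeros(n)
    source[::period] = 1.0
    return source

def resonator(x, freq_hz, bandwidth_hz, fs_hz):
    """Two-pole filter standing in for one vocal tract resonance (formant)."""
    r = np.exp(-np.pi * bandwidth_hz / fs_hz)   # pole radius from bandwidth
    theta = 2.0 * np.pi * freq_hz / fs_hz       # pole angle from center frequency
    a1, a2 = -2.0 * r * np.cos(theta), r * r
    y = np.zeros(len(x))
    for n in range(len(x)):
        # y[n] = x[n] - a1*y[n-1] - a2*y[n-2]
        y[n] = x[n]
        if n >= 1:
            y[n] -= a1 * y[n - 1]
        if n >= 2:
            y[n] -= a2 * y[n - 2]
    return y

fs = 8000                                   # sampling rate in Hz (assumed)
source = impulse_train(100.0, fs, 0.1)      # 100 Hz pitch, 0.1 s
# Assumed formants: 500 Hz and 1500 Hz, applied in cascade.
speech_like = resonator(resonator(source, 500.0, 80.0, fs), 1500.0, 120.0, fs)
```

Changing the impulse spacing alters the pitch without moving the resonances, and changing the resonator frequencies alters the spectral envelope without changing the pitch. That independence is what the model is meant to capture.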

Speech Sounds Coarse classification with phonemes. A phone is the acoustic realization of a phoneme. Allophones are context-dependent variants of a phoneme.

Phoneme Hierarchy
Speech sounds: language dependent, about 50 in English.
Vowels: iy, ih, ae, aa, ah, ao, ax, eh, er, ow, uh, uw
Diphthongs: ay, ey, oy, aw
Consonants:
  Plosive: p, b, t, d, k, g
  Nasal: m, n, ng
  Fricative: f, v, th, dh, s, z, sh, zh, h
  Retroflex liquid: r
  Lateral liquid: l
  Glide: w, y

Speech Waveform Characteristics Loudness. Voiced/Unvoiced. Pitch. Fundamental frequency. Spectral envelope. Formants.

Speech Waveform Characteristics Cont. [Figure: waveforms of voiced speech (/ih/) and unvoiced speech (/s/).]

Short-Time Speech Analysis Segments (or frames, or vectors) are typically of length 20 ms. Speech characteristics are approximately constant within a segment. This allows for relatively simple modeling. Overlapping segments are often extracted. On board: Windowing of signals. Short time Fourier transform. Relationship between analog spectrum and DFT based spectrum. Example with a pulse train. Compromise in choice of frame length.
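The framing, windowing, and short time Fourier transform steps can be sketched as follows. The 20 ms frame length comes from the slide. The 10 ms hop, the Hamming window, the 8 kHz sampling rate, and the function names are assumptions made for illustration.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames, one per row.

    Trailing samples that do not fill a whole frame are dropped.
    """
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def stft(x, fs_hz, frame_ms=20.0, hop_ms=10.0):
    """Short time Fourier transform: one DFT per windowed frame."""
    frame_len = int(round(frame_ms * 1e-3 * fs_hz))
    hop = int(round(hop_ms * 1e-3 * fs_hz))
    frames = frame_signal(x, frame_len, hop) * np.hamming(frame_len)
    return np.fft.rfft(frames, axis=1)

fs = 8000                              # sampling rate in Hz (assumed)
t = np.arange(fs) / fs                 # one second of samples
x = np.sin(2.0 * np.pi * 1000.0 * t)   # 1 kHz test tone
X = stft(x, fs)                        # rows: frames, columns: DFT bins
```

With these settings each frame holds 160 samples, so the DFT bins are 50 Hz apart. A longer frame gives finer frequency resolution but smears changes over time, which is the frame length compromise the slide mentions.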

[Figure: plots annotated with B = 1/N.]

The Spectrogram A classic analysis tool. Consists of DFTs of overlapping, windowed frames. Displays the distribution of energy in time and frequency. [expression missing] is typically displayed.
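A minimal sketch of a spectrogram computation, assuming a Hann window, 20 ms frames with a 10 ms hop at 8 kHz, and a display in decibels relative to the strongest bin. The dB scaling and the floor value are common conventions assumed here, not details taken from the slide, and the function name is hypothetical.

```python
import numpy as np

def spectrogram_db(x, fs_hz, frame_len=160, hop=80, floor_db=-80.0):
    """Return frame times, bin frequencies, and power in dB (peak at 0 dB)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    power_db = 10.0 * np.log10(np.maximum(power, 1e-12))  # avoid log of zero
    power_db -= power_db.max()                            # strongest bin at 0 dB
    power_db = np.maximum(power_db, floor_db)             # clip the display range
    times = (np.arange(n_frames) * hop + frame_len / 2) / fs_hz  # frame centers
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs_hz)
    return times, freqs, power_db

fs = 8000
t = np.arange(fs) / fs
# Test signal: 500 Hz tone for the first half second, 2000 Hz afterwards.
x = np.where(t < 0.5, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 2000 * t))
times, freqs, S = spectrogram_db(x, fs)
```

Each row of `S` is one time slice and each column one frequency bin, so the array can be shown as an image with time on one axis and frequency on the other.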

The Spectrogram Cont.

Short time ACF On board: Definition of short time ACF. Discussion on application to pitch estimation. [Figure: ACF and |DFT| panels for /m/, /ow/, and /s/.]
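One common way to use the short time ACF for pitch estimation is sketched below. It assumes the biased autocorrelation estimate and picks the strongest peak inside a plausible pitch lag range. The exact definition given on the board is not in the transcript, so the estimator, the 60 to 400 Hz search range, the 40 ms frame, and the function names are all assumptions.

```python
import numpy as np

def short_time_acf(frame):
    """Biased autocorrelation estimate of one frame for lags 0..len(frame)-1."""
    n = len(frame)
    full = np.correlate(frame, frame, mode="full")
    return full[n - 1:] / n   # keep the non-negative lags

def estimate_pitch_hz(frame, fs_hz, f_min_hz=60.0, f_max_hz=400.0):
    """Pick the lag of the largest ACF value within the allowed pitch range."""
    acf = short_time_acf(frame - np.mean(frame))
    lag_min = int(fs_hz / f_max_hz)
    lag_max = min(int(fs_hz / f_min_hz), len(acf) - 1)
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return fs_hz / lag

fs = 8000
n = np.arange(320)   # one 40 ms frame, long enough to hold several pitch periods
# Synthetic voiced frame: 125 Hz fundamental plus two weaker harmonics.
frame = sum(np.sin(2 * np.pi * k * 125.0 * n / fs) / k for k in (1, 2, 3))
f0_est = estimate_pitch_hz(frame, fs)
```

The sketch always returns a value, so it does not tell voiced frames from unvoiced ones. A practical detector would also compare the peak height with the zero-lag value before accepting the estimate.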