Convex Optimization in Sinusoidal Modeling for Audio Signal Processing
Michelle Daniels
PhD Student, University of California, San Diego

Outline
• Introduction to sinusoidal modeling
• Existing approach
• Proposed optimization post-processing
• Testing and results
• Conclusions
• Future work

Analysis of Audio Signals
• Audio signals have rapid variations
  • Speech
  • Music
  • Environmental sounds
• Assume minimal change over short segments (frames)
• Analyze on a frame-by-frame basis
  • Constant-length frames (46 ms)
  • Frames typically overlap
• Any audio signal can be represented as a sum of sinusoids (deterministic components) and noise (stochastic components)

Sinusoidal Modeling of Audio Signals
• Given a signal y of length N, represent it as K component sinusoids plus noise e
  • y and e are N-dimensional vectors
  • Each sinusoid has frequency (ω), magnitude (a), and phase (φ) parameters
  • K is determined during the analysis process
• Higher-resolution frequencies than DFT bins, no harmonic relationship required
• Model, encode, and/or process these components independently
• Applications:
  • Effects processing (time-scale modification, pitch shifting)
  • Audio compression
  • Feature extraction for machine listening
  • Auditory scene analysis
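
The model equation on this slide did not survive in the transcript. A plausible reconstruction, assuming the standard real-valued sinusoidal model with frequency ω_k, magnitude a_k, and phase φ_k for the k-th sinusoid (the sample index n, the subscript k, and the radians-per-sample frequency unit are notation assumed here, not taken from the slide):

```latex
y[n] = \sum_{k=1}^{K} a_k \cos(\omega_k n + \phi_k) + e[n],
\qquad n = 0, 1, \dots, N-1
```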

Estimation Algorithm
• Using frequency-domain analysis (e.g. FFT), iterate up to K times, until the residual signal is small and/or has a flat spectrum:
  • Identify the highest-magnitude sinusoid in the signal
  • Estimate its frequency ω
  • Given ω, estimate its magnitude a and phase φ
  • Reconstruct the sinusoid
  • Subtract the reconstructed sinusoid to produce a residual signal
• After all sinusoids have been removed, the final residual contains only noise
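
The loop above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the analysis used in the talk: it assumes a rectangular window, restricts frequencies to DFT bin centers (so there is no high-resolution frequency estimate), and stops only on residual energy. The names `dft_bin`, `analyze`, and `tol` are hypothetical.

```python
import cmath
import math


def dft_bin(x, k):
    """Return the k-th DFT coefficient of the real sequence x."""
    n_len = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_len)
               for n in range(n_len))


def analyze(y, max_peaks, tol=1e-9):
    """Greedy peak picking: estimate one sinusoid, subtract it, repeat.

    Simplifying assumptions (not from the slides): rectangular window,
    frequencies restricted to DFT bin centers, bins 1..N/2-1 only.
    """
    n_len = len(y)
    residual = list(y)
    peaks = []
    for _ in range(max_peaks):
        # Stop once the mean residual energy is negligible.
        if sum(v * v for v in residual) / n_len < tol:
            break
        # Identify the highest-magnitude bin.
        spectrum = {k: dft_bin(residual, k) for k in range(1, n_len // 2)}
        k_max = max(spectrum, key=lambda k: abs(spectrum[k]))
        # Frequency from the bin index, magnitude and phase from the bin value.
        omega = 2 * math.pi * k_max / n_len
        a = 2 * abs(spectrum[k_max]) / n_len
        phi = cmath.phase(spectrum[k_max])
        # Reconstruct the sinusoid and subtract it from the residual.
        for n in range(n_len):
            residual[n] -= a * math.cos(omega * n + phi)
        peaks.append((omega, a, phi))
    return peaks, residual


N = 64
y = [1.0 * math.cos(2 * math.pi * 5 * n / N + 0.3)
     + 0.5 * math.cos(2 * math.pi * 12 * n / N - 1.0) for n in range(N)]
peaks, residual = analyze(y, max_peaks=4)
```

Because both test frequencies sit exactly on bin centers, the two estimates here are essentially exact. With off-bin or closely spaced frequencies the same loop produces the biased estimates that motivate the optimization step discussed later in the talk.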

Sinusoidal Analysis Example
[Figures spanning four slides; not preserved in the transcript]

Estimation Challenges
• Energy in any DFT bin can come from:
  • Multiple sinusoids with similar frequency
  • Both sinusoids and noise
• Interference from other sinusoids and/or noise results in inaccurate estimates
• Incorrect estimation of a single sinusoid corrupts the residual signal and affects all subsequent estimates

Possible Solution
• Optimize frequency, magnitude, and phase to minimize the energy in the residual signal
• The original parameter estimates are initial estimates for the optimization
• Sinusoidal approximation:
• Residual:
• Optimization problem:
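
The three expressions on this slide were lost in extraction. Assuming the standard sinusoidal model, and assuming that "energy in the residual" means the squared 2-norm (the symbol r and the index notation are assumptions; the slide's exact form may differ), they could be written as:

```latex
\hat{y}[n] = \sum_{k=1}^{K} a_k \cos(\omega_k n + \phi_k), \qquad
r = y - \hat{y}, \qquad
\min_{\{\omega_k,\, a_k,\, \phi_k\}} \; \lVert y - \hat{y} \rVert_2^2
```

Minimizing the 2-norm or its square gives the same optimal parameters, so either form matches the stated goal.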

Is it Convex?
• Want convexity so the problem is practical to solve
• Not a convex optimization problem because each element of ŷ is a sum of cosine functions of ω and φ
• Want convex function inside of the 2-norm instead
• With fixed frequencies, can reformulate optimization of magnitudes and phases as convex problem
• Fix frequencies to initial estimates

Convex Optimization Problem
• Classic least-squares problem:
• Magnitude and phase recovered as:
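
A minimal sketch of the fixed-frequency least-squares step, using only the Python standard library. It relies on the identity a·cos(ωn + φ) = a·cos(φ)·cos(ωn) − a·sin(φ)·sin(ωn), so the unknowns become linear coefficients of a cosine and a sine column. The variable ordering and sign convention, and the names `solve_linear` and `fit_fixed_frequencies`, are assumptions made for this example; the talk itself cites CVX for its optimization, and in practice a library routine such as NumPy's `lstsq` would replace the hand-written solver.

```python
import math


def solve_linear(m, b):
    """Solve m x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(m)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for i in reversed(range(size)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, size))
        x[i] = (aug[i][size] - tail) / aug[i][i]
    return x


def fit_fixed_frequencies(y, omegas):
    """Least-squares magnitudes and phases for known frequencies (rad/sample)."""
    n_len = len(y)
    # Row n of the basis: [cos(w1 n), sin(w1 n), cos(w2 n), sin(w2 n), ...]
    basis = [[f(w * n) for w in omegas for f in (math.cos, math.sin)]
             for n in range(n_len)]
    cols = 2 * len(omegas)
    # Normal equations: (A^T A) x = A^T y
    gram = [[sum(row[i] * row[j] for row in basis) for j in range(cols)]
            for i in range(cols)]
    rhs = [sum(basis[n][i] * y[n] for n in range(n_len)) for i in range(cols)]
    x = solve_linear(gram, rhs)
    # Per sinusoid x = [a cos(phi), -a sin(phi)], so invert that mapping.
    return [(math.hypot(x[2 * k], x[2 * k + 1]),
             math.atan2(-x[2 * k + 1], x[2 * k]))
            for k in range(len(omegas))]


omegas = [0.30, 0.33]  # two closely spaced frequencies, assumed known
truth = [(1.0, 0.5), (0.8, -1.2)]  # (magnitude, phase) pairs
y = [sum(a * math.cos(w * n + p) for w, (a, p) in zip(omegas, truth))
     for n in range(256)]
params = fit_fixed_frequencies(y, omegas)
```

With exact frequencies and no noise, `params` reproduces the true magnitudes and phases to within rounding error.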

Related Work
• Petre Stoica, Hongbin Li, and Jian Li. “Amplitude estimation of sinusoidal signals: Survey, new results, and an application”
  • Mentions least-squares as one approach to estimate amplitude of complex exponentials
  • No discussion of phase estimation
• Hing-Cheung So. “On linear least squares approach for phase estimation of real sinusoidal signals”
  • Focuses on phase estimation
  • Theoretical analysis
  • Not applied specifically to audio signals

Constraints
• Analytic least-squares solution frequently results in unrealistic magnitude values
  • This is possibly the result of errors in frequency estimates
• Constraints on magnitudes were required
  • Ideal constraint:
  • Relaxed constraint:
• Result is a constrained least-squares problem that can be solved using a generic quadratic program (QP) solver

Final Formulation
• Quadratic Program:
• Magnitude and phase recovered from x as:
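
The transcript does not preserve the constraints or the quadratic program itself. Purely as an illustration of what a constrained least-squares problem of this kind looks like, the sketch below assumes a hypothetical box bound |x_i| ≤ bound on each linear coefficient and solves it with projected gradient descent, swapped in here for the generic QP solver the slides mention. The function name `box_constrained_lstsq`, the bound values, and the choice of solver are all assumptions.

```python
import math


def box_constrained_lstsq(a_mat, y, bound, iterations=200):
    """Minimize 0.5 * ||a_mat x - y||^2 subject to |x_i| <= bound.

    Projected gradient descent: take a gradient step, then clip each
    coordinate back into the box [-bound, bound].
    """
    rows, cols = len(a_mat), len(a_mat[0])
    # The squared Frobenius norm upper-bounds the largest eigenvalue of
    # A^T A, so its reciprocal is a safe step size for this objective.
    step = 1.0 / sum(v * v for row in a_mat for v in row)
    x = [0.0] * cols
    for _ in range(iterations):
        resid = [sum(a_mat[r][c] * x[c] for c in range(cols)) - y[r]
                 for r in range(rows)]
        grad = [sum(a_mat[r][c] * resid[r] for r in range(rows))
                for c in range(cols)]
        x = [min(bound, max(-bound, x[c] - step * grad[c]))
             for c in range(cols)]
    return x


N = 64
omega = 2 * math.pi * 4 / N  # bin-centered frequency, assumed known
a_mat = [[math.cos(omega * n), math.sin(omega * n)] for n in range(N)]
y = [math.cos(omega * n + 0.5) for n in range(N)]  # magnitude 1, phase 0.5

loose = box_constrained_lstsq(a_mat, y, bound=2.0)  # bound inactive
tight = box_constrained_lstsq(a_mat, y, bound=0.6)  # bound active
```

With the loose bound the result matches the unconstrained least-squares solution; with the tight bound the cosine coefficient is held at the limit, which caps the recovered magnitude.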

Test Signals
• Model test signals that reproduce challenging aspects of real-world signals
• Reconstruct signal based on original model parameters and optimized parameters
• Compare both reconstructions to original test signal and to each other

Test Signal 1: Overlapping Sinusoids
• Signal consists of two sinusoids close in frequency
• There is no additive noise, so the residual (the noise component of the model) should be zero

Results 1: Overlapping Sinusoids
• Without optimization, there is significant energy left in the residual (very audible)
• With optimization, the residual power at individual frequencies is reduced by as much as 50 dB (now barely audible)
• The improvement with optimization generally decreases as the frequency separation is increased

Test Signal 2: Sudden Onset
• A single sinusoid starts half-way through an analysis frame (the first half is silence)

Results 2: Sudden Onset
• Original: MSE* = 2.76 × 10^-5
• Optimized: MSE* = 4.13 × 10^-6
*MSE = Mean Squared Error
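
For reference, a mean squared error of this kind can be computed as below. The sketch assumes the MSE is taken sample by sample between the test signal and a reconstruction; the slides do not say exactly which signals were compared, and the function name and sample values are made up for the example.

```python
def mean_squared_error(reference, estimate):
    """Average of the squared sample-wise differences."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


# Hypothetical four-sample signals, only to show the call.
original = [0.0, 0.0, 1.0, -1.0]
reconstruction = [0.0, 0.1, 0.9, -1.0]
error = mean_squared_error(original, reconstruction)
```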

Test Signal 3: Chirp
• A single sinusoid with constant magnitude and continuously-increasing frequency
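
A chirp of this kind can be generated by accumulating the instantaneous frequency into the phase. The sketch assumes a linear frequency sweep, which the slide does not specify; the function name and the start and end frequencies are hypothetical.

```python
import math


def linear_chirp(n_samples, omega_start, omega_end, magnitude=1.0, phase=0.0):
    """Constant-magnitude sinusoid whose frequency rises linearly (rad/sample)."""
    rate = (omega_end - omega_start) / n_samples  # frequency change per sample
    # Phase is the integral of the instantaneous frequency
    # omega_start + rate * n, i.e. omega_start * n + 0.5 * rate * n^2.
    return [magnitude * math.cos(phase + omega_start * n + 0.5 * rate * n * n)
            for n in range(n_samples)]


chirp = linear_chirp(2048, omega_start=0.05, omega_end=0.5)
```

Within any one analysis frame the frequency is not constant, which violates the stationarity assumption of the frame-based model.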

Results 3: Chirp
• Non-optimized peak magnitudes are close to constant between consecutive frames
• Optimized peak magnitudes vary significantly from frame to frame
• The optimization produces peak parameters that do not reflect the underlying real-world phenomenon

Conclusions
• Problem can be formulated using convex programming
• For several classic challenging signals, optimization produces a more accurate model
• Constraints are necessary to ensure parameter estimates reflect possible real-world phenomena
• Final formulation is a quadratic program
• Parameters obtained via optimization may still not represent the underlying real-world phenomenon as well as the original analysis (e.g. the chirp)

Future Work
• Explore robust optimization techniques to compensate for errors in frequency estimates
• Integrate optimization into original analysis instead of a post-processing stage
• Experiment with more real-world signals
• Further investigate constraints
• The ultimate goal: three-way joint optimization of frequency, magnitude, and phase

References
• M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version May
• R. McAulay and T. Quatieri. Speech analysis/synthesis based on a sinusoidal representation. IEEE Transactions on Acoustics, Speech, and Signal Processing, 34(4): , Aug
• Xavier Serra. A System for Sound Analysis/Transformation/Synthesis Based on a Deterministic Plus Stochastic Decomposition. PhD thesis, Stanford University,
• Kevin M. Short and Ricardo A. Garcia. Accurate low-frequency magnitude and phase estimation in the presence of DC and near-DC aliasing. In Proceedings of the 121st Convention of the Audio Engineering Society,
• Kevin M. Short and Ricardo A. Garcia. Signal analysis using the complex spectral phase evolution (CSPE) method. In Proceedings of the 120th Convention of the Audio Engineering Society,
• Hing-Cheung So. On linear least squares approach for phase estimation of real sinusoidal signals. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E88-A(12): , December
• Petre Stoica, Hongbin Li, and Jian Li. Amplitude estimation of sinusoidal signals: Survey, new results, and an application. IEEE Transactions on Signal Processing, 48(2): ,

Thanks for your attention! For further information:

THE END

Convex Reformulation
• Define:
• Change of variables:
• Define:
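
The definitions on this backup slide were lost in extraction. One standard way to carry out such a reformulation, given here as an assumed reconstruction rather than the author's exact notation (the matrix name A, the ordering of x, and the sign convention are all assumptions), starts from the angle-sum identity with the frequencies ω_k held fixed:

```latex
a_k \cos(\omega_k n + \phi_k)
  = \underbrace{a_k \cos\phi_k}_{x_{2k-1}} \cos(\omega_k n)
  + \underbrace{(-a_k \sin\phi_k)}_{x_{2k}} \sin(\omega_k n)

A \in \mathbb{R}^{N \times 2K}, \qquad
A_{n,\,2k-1} = \cos(\omega_k n), \qquad
A_{n,\,2k} = \sin(\omega_k n)

\hat{y} = A x, \qquad
\min_{x \in \mathbb{R}^{2K}} \; \lVert y - A x \rVert_2^2

a_k = \sqrt{x_{2k-1}^2 + x_{2k}^2}, \qquad
\phi_k = \operatorname{atan2}(-x_{2k},\; x_{2k-1})
```

Because ŷ is now linear in x, the objective is a convex quadratic, which is what makes the fixed-frequency problem a least-squares problem.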

Test Signal: Sinusoid in noise
• A single sinusoid with stationary frequency and corrupted by additive white Gaussian noise
• Noise is present at all frequencies, including that of the sinusoid, corrupting magnitude and phase estimates
• Test repeated using different variances for the noise (varying signal-to-noise ratios)
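
A test signal like this can be generated by choosing the noise variance from a target signal-to-noise ratio. The sketch assumes SNR is defined as average sinusoid power (a²/2) over noise variance, expressed in dB; the slides do not state the definition or the values used, and the function name and parameters are hypothetical.

```python
import math
import random


def sinusoid_in_noise(n_samples, omega, magnitude, snr_db, seed=0):
    """Return (clean, noisy, noise_variance) for a sinusoid in white Gaussian noise."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    signal_power = magnitude ** 2 / 2  # average power of a * cos(...)
    noise_var = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_var)
    clean = [magnitude * math.cos(omega * n) for n in range(n_samples)]
    noisy = [c + rng.gauss(0.0, sigma) for c in clean]
    return clean, noisy, noise_var


clean, noisy, noise_var = sinusoid_in_noise(4096, omega=0.2, magnitude=1.0, snr_db=10.0)
# Measured noise energy, useful for comparison against the model's residual energy.
measured_var = sum((v - c) ** 2 for v, c in zip(noisy, clean)) / len(clean)
```

Repeating the call with different `snr_db` values gives the family of signals described above.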

Results: Sinusoid in noise
• Without optimization, the sinusoid’s magnitude is over-estimated and the noise’s energy is under-estimated
• The optimization gives residual energy slightly closer to the true noise energy

Results: Overlapping Sinusoids
• The optimization is able to compensate for some of the errors in initial magnitude and phase estimation, resulting in a lower MSE