EE 551/451, Fall 2006, Communication Systems. Zhu Han, Department of Electrical and Computer Engineering. Class 15, Oct. 10th, 2006.

Outline
Homework
Exam format
Second half schedule
– Chapter 7
– Chapter 16
– Chapter 8
– Chapter 9
– Standards
Estimation and detection (this class): Chapter 14, not required
– Estimation theory, methods, and examples
– Detection theory, methods, and examples
Information theory (next Tuesday): Chapter 15, not required

Estimation Theory
Consider a linear process y = Hθ + n
– y = observed data
– θ = sent information
– n = additive noise
If θ is known but H is unknown, estimation is the problem of finding the statistically optimal H, given y, θ, and knowledge of the noise properties.
If H is known, detection is the problem of finding the most likely sent information θ, given y, H, and knowledge of the noise properties.
In a practical system, the two steps are performed iteratively: estimate the channel to track its changes, then detect the transmitted data.

Different Approaches for Estimation
Minimum variance unbiased estimators
Subspace estimators
Least Squares: has no statistical basis
Maximum likelihood: uses knowledge of the noise PDF
Maximum a posteriori: uses prior information about θ

Least Squares Estimator
Least Squares: θ_LS = argmin_θ ||y - Hθ||^2
Natural estimator: we want the solution to match the observation.
Does not use any information about the noise.
There is a simple closed-form solution (the pseudo-inverse): θ_LS = (H^T H)^-1 H^T y
What if we know something about the noise? Say we know Pr(n)…
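As a quick illustration (not part of the original slides), the sketch below computes θ_LS for a small simulated linear model in NumPy; the dimensions, true parameters, and noise level are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear model y = H*theta + n with assumed dimensions (20 observations, 3 parameters)
H = rng.standard_normal((20, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = H @ theta_true + 0.1 * rng.standard_normal(20)

# Least-squares estimate via the pseudo-inverse: theta_LS = (H^T H)^-1 H^T y
theta_ls = np.linalg.inv(H.T @ H) @ H.T @ y

# Equivalent, numerically safer form
theta_ls2, *_ = np.linalg.lstsq(H, y, rcond=None)

print(theta_ls, theta_ls2)
```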

Maximum Likelihood Estimator
Simple idea: we want to maximize Pr(y|θ).
Write Pr(n) = e^(-L(n)) with n = y - Hθ, so Pr(n) = Pr(y|θ) = e^(-L(y,θ))
If n is white Gaussian, Pr(n) = e^(-||n||^2 / (2σ^2)) and L(y,θ) = ||y - Hθ||^2 / (2σ^2)
θ_ML = argmax_θ Pr(y|θ) = argmin_θ L(y,θ)
– L(y,θ) is called the (negative log-) likelihood function
θ_ML = argmin_θ ||y - Hθ||^2 / (2σ^2)
This is the same as Least Squares!

Maximum Likelihood Estimator
But if the noise is jointly Gaussian with covariance matrix C = E(nn^T), then
Pr(n) = e^(-(1/2) n^T C^-1 n) and L(y,θ) = (1/2) (y - Hθ)^T C^-1 (y - Hθ)
θ_ML = argmin_θ (1/2) (y - Hθ)^T C^-1 (y - Hθ)
This also has a closed-form solution: θ_ML = (H^T C^-1 H)^-1 H^T C^-1 y
If n is not Gaussian, the ML estimator becomes complicated and non-linear.
Fortunately, in most channels the noise is Gaussian.
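A minimal sketch of the closed-form Gaussian ML (generalized least-squares) estimate, assuming a made-up correlated noise covariance C; none of the numbers come from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

n_obs, n_par = 50, 2
H = rng.standard_normal((n_obs, n_par))
theta_true = np.array([0.7, -1.3])

# Assumed noise covariance: correlated (non-white) Gaussian noise
C = 0.05 * np.eye(n_obs) + 0.02 * np.ones((n_obs, n_obs))
n = rng.multivariate_normal(np.zeros(n_obs), C)
y = H @ theta_true + n

# ML estimate for jointly Gaussian noise: theta_ML = (H^T C^-1 H)^-1 H^T C^-1 y
Cinv = np.linalg.inv(C)
theta_ml = np.linalg.solve(H.T @ Cinv @ H, H.T @ Cinv @ y)
print(theta_ml)
```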

Estimation Example: Denoising
Suppose we have a noisy signal y and wish to recover the noiseless signal x, where y = x + n
Can we use estimation theory to find x?
Try H = I, θ = x in the linear model.
Both the LS and ML estimators simply give x = y! We need a more powerful model.
Suppose x can be approximated by a polynomial, i.e., a mixture of the first p powers of r: x = Σ_{i=0..p} a_i r^i

Example: Denoising
Stack the model as y = Hθ + n, where y = [y_1, y_2, …, y_n]^T, θ = [a_0, a_1, …, a_p]^T, n = [n_1, n_2, …, n_n]^T, and H is the n×(p+1) matrix whose i-th row is [1, r_i, r_i^2, …, r_i^p].
Least Squares estimate: θ_LS = (H^T H)^-1 H^T y
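The polynomial fit can be carried out with the same pseudo-inverse formula; below is a hedged NumPy sketch in which the underlying signal, the grid r, the order p, and the noise level are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed 1-D example: smooth signal sampled on r in [0, 1], corrupted by noise
r = np.linspace(0.0, 1.0, 100)
x_clean = 2.0 - 3.0 * r + 1.5 * r**2            # underlying signal (unknown in practice)
y = x_clean + 0.2 * rng.standard_normal(r.size)

p = 2                                           # assumed polynomial order
H = np.vander(r, p + 1, increasing=True)        # columns: 1, r, r^2, ..., r^p

# a_LS = (H^T H)^-1 H^T y, then the denoised signal is x_hat = H a_LS
a_ls, *_ = np.linalg.lstsq(H, y, rcond=None)
x_hat = H @ a_ls
print(a_ls)
```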

Maximum a Posteriori (MAP) Estimate
This is an example of using prior information about the signal.
Priors are generally expressed in the form of a PDF Pr(x).
Once the likelihood L(x) and the prior are known, we have complete statistical knowledge.
LS/ML are suboptimal in the presence of a prior; MAP (a.k.a. Bayesian) estimates are optimal.
Bayes' theorem: Pr(x|y) = Pr(y|x) Pr(x) / Pr(y)
– Pr(x|y): posterior; Pr(y|x): likelihood; Pr(x): prior

Maximum a Posteriori (Bayesian) Estimate
Consider the class of linear systems y = Hx + n
Bayesian methods maximize the posterior probability: Pr(x|y) ∝ Pr(y|x) · Pr(x)
– Pr(y|x) (likelihood function) = exp(-||y - Hx||^2)
– Pr(x) (prior PDF) = exp(-G(x))
Non-Bayesian: maximize only the likelihood, x_est = argmin ||y - Hx||^2
Bayesian: x_est = argmin ||y - Hx||^2 + G(x), where G(x) is obtained from the prior distribution of x
If G(x) = ||Gx||^2, this is Tikhonov regularization.
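A minimal sketch of the Tikhonov-regularized (MAP with a Gaussian smoothness prior) solution, assuming H = I (denoising), a first-difference operator for G, and an arbitrary weight λ; these choices are illustrative, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(3)

n = 100
x_true = np.sin(np.linspace(0, 3 * np.pi, n))       # assumed smooth signal
H = np.eye(n)                                       # denoising model: y = x + noise
y = H @ x_true + 0.3 * rng.standard_normal(n)

# Assumed prior term G(x) = lam * ||Gx||^2 with G a first-difference operator (favors smooth x)
G = np.eye(n) - np.eye(n, k=1)
lam = 5.0                                           # assumed regularization weight

# MAP / Tikhonov solution: x = argmin ||y - Hx||^2 + lam * ||Gx||^2
x_map = np.linalg.solve(H.T @ H + lam * G.T @ G, H.T @ y)
print(np.linalg.norm(x_map - x_true), np.linalg.norm(y - x_true))
```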

Expectation-Maximization (EM)
The Expectation-Maximization (EM) algorithm alternates between an expectation (E) step, which computes the expectation of the log-likelihood with the latent variables treated as if they were observed, and a maximization (M) step, which computes maximum-likelihood estimates of the parameters by maximizing the expected log-likelihood found in the E step. The parameters found in the M step are then used to begin another E step, and the process is repeated.
– E-step: estimate the unobserved event (e.g., which Gaussian component generated each sample), conditioned on the observations, using the parameter values from the last M step.
– M-step: maximize the expected log-likelihood of the joint event over the parameters.
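For concreteness, here is a minimal EM sketch for a two-component 1-D Gaussian mixture (an assumed toy problem, not an example from the slides), showing the alternation between the E and M steps.

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed data: samples drawn from two 1-D Gaussians with unknown means
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 200)])

# Initial guesses for the mixture parameters
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])

for _ in range(50):
    # E-step: posterior probability that each sample came from each component
    dens = (pi / (sigma * np.sqrt(2 * np.pi))
            * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2))
    resp = dens / dens.sum(axis=1, keepdims=True)

    # M-step: re-estimate parameters by maximizing the expected log-likelihood
    Nk = resp.sum(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / Nk
    sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)
    pi = Nk / x.size

print(mu, sigma, pi)
```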

Minimum-Variance Unbiased Estimator
Biased and unbiased estimators
An unbiased estimator of the parameters whose variance is minimized for all values of the parameters.
The Cramer-Rao Lower Bound (CRLB) sets a lower bound on the variance of any unbiased estimator.
A biased estimator may nevertheless outperform an unbiased one in terms of variance.
Subspace methods
– MUSIC
– ESPRIT
– Widely used in radar
– Helicopter and weapon detection (from features)
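A small numerical illustration (assumed setup, not from the slides): for N white Gaussian samples with unknown mean A and known variance σ², the sample mean is unbiased and its variance essentially attains the CRLB σ²/N.

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumed setup: N white Gaussian samples with unknown mean A and known variance sigma^2
A, sigma, N, trials = 1.5, 2.0, 50, 20000

x = rng.normal(A, sigma, size=(trials, N))
A_hat = x.mean(axis=1)                 # sample mean: an unbiased estimator of A

crlb = sigma**2 / N                    # Cramer-Rao lower bound for any unbiased estimator
print(A_hat.var(), crlb)               # empirical variance vs. the bound
```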

What is Detection
Deciding whether, and when, an event occurs
a.k.a. decision theory, hypothesis testing
Presence/absence of a signal, for example:
– Radar
– Received signal is 0 or 1
– Stock goes up or not
– Criminal is convicted or set free
Measures whether a statistically significant change has occurred or not

Detection: "Spot the Money"

Hypothesis Testing with Matched Filter
Let the received signal be y(t) and the model (template) be h(t).
Hypothesis testing:
– H0: y(t) = n(t) (no signal)
– H1: y(t) = h(t) + n(t) (signal present)
The optimal decision is given by the likelihood ratio test (Neyman-Pearson theorem):
Select H1 if L(y) = Pr(y|H1) / Pr(y|H0) > γ (a threshold); otherwise select H0.
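A hedged sketch of this test for a known template in white Gaussian noise, where the log-likelihood ratio reduces to correlating the observation with the template and comparing against a threshold; the template, noise level, and threshold are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)

# Assumed known template h and white Gaussian noise of standard deviation sigma
h = np.sin(2 * np.pi * np.arange(64) / 16.0)
sigma = 1.0
gamma = 0.5 * (h @ h)                  # assumed threshold (half the template energy)

def decide(y):
    """Log-likelihood ratio test: for white Gaussian noise it reduces to
    comparing the correlation y^T h against a threshold."""
    stat = y @ h
    return 1 if stat > gamma else 0    # 1 -> choose H1 (signal present)

y0 = sigma * rng.standard_normal(h.size)          # H0: noise only
y1 = h + sigma * rng.standard_normal(h.size)      # H1: signal plus noise
print(decide(y0), decide(y1))
```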

Signal Detection Paradigm
[Figure: distributions of the detection statistic for signal trials and noise trials]

Signal Detection

Receiver Operating Characteristic (ROC) Curve
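As a rough sketch of how an ROC curve is traced (assuming Gaussian detection statistics under both hypotheses, which is an assumption rather than something stated in the slides), sweep the decision threshold and record the false-alarm and detection probabilities.

```python
import numpy as np

rng = np.random.default_rng(7)

# Assumed detection statistics: unit-variance Gaussian under H0, mean shift d under H1
d = 1.5
s0 = rng.standard_normal(100000)          # statistic under H0 (noise only)
s1 = d + rng.standard_normal(100000)      # statistic under H1 (signal present)

# Sweep the threshold and record false-alarm (Pfa) and detection (Pd) probabilities
thresholds = np.linspace(-4, 6, 200)
pfa = [(s0 > t).mean() for t in thresholds]
pd = [(s1 > t).mean() for t in thresholds]

for t, fa, det in zip(thresholds[::50], pfa[::50], pd[::50]):
    print(f"threshold={t:+.2f}  Pfa={fa:.3f}  Pd={det:.3f}")
```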

Matched Filters
The optimal linear filter for maximizing the signal-to-noise ratio (SNR) at the sampling time in the presence of additive stochastic noise.
Given a transmitter pulse shape g(t) of duration T, the matched filter is h_opt(t) = k g*(T - t) for any constant k.
[Block diagram: pulse signal g(t) plus noise w(t) gives x(t); x(t) passes through the matched filter h(t) to produce y(t), which is sampled at t = T to give y(T).]
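A minimal discrete-time sketch of the matched filter, assuming a rectangular pulse of T samples and white Gaussian noise (both assumptions for illustration); sampling the convolution output at t = T yields the correlation of the received signal with the pulse.

```python
import numpy as np

rng = np.random.default_rng(8)

# Assumed rectangular transmit pulse g[n] of duration T samples
T = 32
g = np.ones(T)

# Matched filter: h[n] = g*(T - n)  (time-reversed, conjugated copy of the pulse)
h = np.conj(g[::-1])

# Received signal: pulse plus white Gaussian noise
x = g + 0.5 * rng.standard_normal(T)

# Filter output; sampling at t = T picks out the peak correlation value
y = np.convolve(x, h)
print(y[T - 1], g @ g)    # output sampled at t = T vs. the noiseless pulse energy
```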

Questions?