# SYSTEMS Identification

Ali Karimpour, Assistant Professor, Ferdowsi University of Mashhad. Reference: “System Identification Theory For The User”, Lennart Ljung.

SYSTEMS Identification Ali Karimpour Assistant Professor Ferdowsi University of Mashhad Reference: “System Identification Theory For The User” Lennart Ljung

Lecture 7: Parameter Estimation Method (Ali Karimpour, Nov 2009)

Topics to be covered include:

- Guiding Principles Behind Parameter Estimation Method
- Minimizing Prediction Error
- Linear Regressions and the Least-Squares Method
- A Statistical Framework for Parameter Estimation and the Maximum Likelihood Method
- Correlating Prediction Errors with Past Data
- Instrumental Variable Methods

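## Guiding Principles Behind Parameter Estimation Method

Suppose that we have selected a certain model structure $\mathcal{M}$. The set of models is defined as

$$\mathcal{M}^* = \{\mathcal{M}(\theta) \mid \theta \in D_{\mathcal{M}}\}$$

For each $\theta$, the model represents a way of predicting future outputs. The predictor is a linear filter:

$$\mathcal{M}(\theta):\quad \hat{y}(t|\theta) = W_y(q,\theta)\,y(t) + W_u(q,\theta)\,u(t)$$

Suppose the system is

$$y(t) = G(q,\theta)\,u(t) + H(q,\theta)\,e(t)$$

where

$$W_y(q,\theta) = 1 - H^{-1}(q,\theta), \qquad W_u(q,\theta) = H^{-1}(q,\theta)\,G(q,\theta)$$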

Suppose that we collect a set of data from the system:

$$Z^N = [y(1), u(1), y(2), u(2), \ldots, y(N), u(N)]$$

Formally, we are going to find a map from the data $Z^N$ to the set $D_{\mathcal{M}}$:

$$Z^N \to \hat{\theta}_N \in D_{\mathcal{M}}$$

Such a mapping is a parameter estimation method.

### Evaluating the candidate model

Let us define the prediction error as

$$\varepsilon(t,\theta) = y(t) - \hat{y}(t|\theta)$$

When the data set $Z^N$ is known, these errors can be computed for $t = 1, 2, \ldots, N$.

A guiding principle for parameter estimation is: based on $Z^t$ we can compute the prediction error $\varepsilon(t,\theta)$. Select $\hat{\theta}_N$ so that the prediction errors $\varepsilon(t,\hat{\theta}_N)$, $t = 1, 2, \ldots, N$, become as small as possible.

What should "small" mean? We describe two approaches:

- Form a scalar-valued criterion function that measures the size of $\varepsilon$ (Sections 7.2 to 7.4).
- Make $\varepsilon(t,\hat{\theta}_N)$ uncorrelated with a given data sequence (Sections 7.5 and 7.6).

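## Minimizing Prediction Error

The prediction-error sequence $\varepsilon(t,\theta)$, $t = 1, \ldots, N$, has the same length as the data record $Z^N$, so it can be viewed as a vector in $\mathbb{R}^N$ whose size is measured by a norm.

First filter the prediction error through a stable linear filter $L(q)$:

$$\varepsilon_F(t,\theta) = L(q)\,\varepsilon(t,\theta)$$

Then use the following norm:

$$V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N} l\big(\varepsilon_F(t,\theta)\big)$$

where $l(\cdot)$ is a scalar-valued positive function. The estimate is then defined by

$$\hat{\theta}_N = \hat{\theta}_N(Z^N) = \arg\min_{\theta \in D_{\mathcal{M}}} V_N(\theta, Z^N)$$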
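
A minimal sketch of this criterion, assuming a first-order ARX model $y(t) + a\,y(t-1) = b\,u(t-1) + e(t)$ with $L(q) = 1$ and $l(\varepsilon) = \varepsilon^2/2$. The function names, the simulated "true" system and the crude grid search are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def prediction_errors(theta, y, u):
    """eps(t, theta) = y(t) - yhat(t|theta), with yhat = -a*y(t-1) + b*u(t-1)."""
    a, b = theta
    y_hat = -a * y[:-1] + b * u[:-1]
    return y[1:] - y_hat

def criterion(theta, y, u):
    """V_N(theta, Z^N) = (1/N) * sum of l(eps), with l(eps) = eps^2 / 2 and L(q) = 1."""
    eps = prediction_errors(theta, y, u)
    return np.mean(0.5 * eps**2)

# Simulated data from an assumed system with a0 = -0.7, b0 = 1.5 (hypothetical values)
rng = np.random.default_rng(0)
N = 500
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.7 * y[t - 1] + 1.5 * u[t - 1] + e[t]

# Crude minimization of V_N over a parameter grid (stand-in for a real optimizer)
a_grid = np.linspace(-1.0, 0.0, 101)
b_grid = np.linspace(1.0, 2.0, 101)
V = np.array([[criterion((a, b), y, u) for b in b_grid] for a in a_grid])
i, j = np.unravel_index(np.argmin(V), V.shape)
theta_hat = (a_grid[i], b_grid[j])
print("theta_hat =", theta_hat)
```

The grid search only makes the "arg min" explicit; for this particular model the minimizer is also available in closed form through least squares.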

The term prediction-error identification methods (PEM) is generally used for this family of approaches. Particular methods with specific names arise according to:

- Choice of $l(\cdot)$
- Choice of $L(\cdot)$
- Choice of model structure
- Method by which the minimization is realized

### Choice of L

The effect of $L$ is best understood in a frequency-domain interpretation: $L$ acts like a frequency weighting. See also Section 14.4 (Prefiltering).

Exercise: Consider the system

$$y(t) = G(q,\theta)\,u(t) + H(q,\theta)\,e(t)$$

Show that the effect of prefiltering by $L$ is identical to changing the noise model from $H(q,\theta)$ to $L^{-1}(q)\,H(q,\theta)$.

### Choice of l

A standard choice, which is convenient both for computation and analysis, is the quadratic norm

$$l(\varepsilon) = \tfrac{1}{2}\varepsilon^2$$

See also Section 15.2 (Choice of norms: robustness against bad data). One can also parameterize the norm independently of the model parameterization.

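## Linear Regressions and the Least-Squares Method

We introduced linear regressions earlier as

$$\hat{y}(t|\theta) = \varphi^T(t)\,\theta + \mu(t)$$

where $\varphi$ is the regression vector; for the ARX structure it is

$$\varphi(t) = [-y(t-1)\ \ \ldots\ \ -y(t-n_a)\ \ \ u(t-1)\ \ \ldots\ \ u(t-n_b)]^T$$

$\mu(t)$ is a known data-dependent vector. For simplicity, let it be zero in the remainder of this section.

### Least-squares criterion

The prediction error becomes $\varepsilon(t,\theta) = y(t) - \varphi^T(t)\,\theta$. Now let $L(q) = 1$ and $l(\varepsilon) = \varepsilon^2/2$; then

$$V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N} \frac{1}{2}\big[y(t) - \varphi^T(t)\,\theta\big]^2$$

This is the least-squares criterion for the linear regression.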

The least-squares estimate (LSE) is

$$\hat{\theta}_N^{LS} = \arg\min_{\theta} V_N(\theta, Z^N) = \left[\frac{1}{N}\sum_{t=1}^{N}\varphi(t)\varphi^T(t)\right]^{-1}\frac{1}{N}\sum_{t=1}^{N}\varphi(t)\,y(t)$$
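
The LSE can be sketched for an ARX model as follows. This is an illustration under stated assumptions: the helper names `arx_regressors` and `least_squares`, the second-order test system and its coefficients are hypothetical, and `np.linalg.lstsq` is used to minimize the same quadratic criterion instead of forming the matrix inverse explicitly.

```python
import numpy as np

def arx_regressors(y, u, na, nb):
    """Stack phi(t) = [-y(t-1) ... -y(t-na)  u(t-1) ... u(t-nb)]^T row by row."""
    n = max(na, nb)
    rows = []
    for t in range(n, len(y)):
        past_y = [-y[t - k] for k in range(1, na + 1)]
        past_u = [u[t - k] for k in range(1, nb + 1)]
        rows.append(past_y + past_u)
    return np.array(rows), y[n:]

def least_squares(Phi, Y):
    """Minimize sum of [y(t) - phi(t)^T theta]^2; equivalent to the normal equations."""
    theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return theta

# Assumed test system: y(t) - 1.5 y(t-1) + 0.7 y(t-2) = u(t-1) + 0.5 u(t-2) + e(t)
rng = np.random.default_rng(0)
N = 1000
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)   # white equation error
y = np.zeros(N)
for t in range(2, N):
    y[t] = 1.5 * y[t - 1] - 0.7 * y[t - 2] + 1.0 * u[t - 1] + 0.5 * u[t - 2] + e[t]

Phi, Y = arx_regressors(y, u, na=2, nb=2)
theta_hat = least_squares(Phi, Y)   # estimates [a1, a2, b1, b2]
print("theta_hat =", theta_hat)
```

Because the equation error here is white, the estimate should land close to the assumed values $[-1.5,\ 0.7,\ 1.0,\ 0.5]$.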

### Properties of LSE

The least-squares method is a special case of PEM (the prediction-error method), so the general properties of PEM carry over to the LSE.

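### Weighted Least Squares

Different measurements could be assigned different weights:

$$V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\alpha_t\big[y(t) - \varphi^T(t)\,\theta\big]^2$$

or

$$V_N(\theta, Z^N) = \sum_{t=1}^{N}\beta(N,t)\big[y(t) - \varphi^T(t)\,\theta\big]^2$$

The resulting estimate has the same form as before:

$$\hat{\theta}_N^{LS} = \left[\sum_{t=1}^{N}\beta(N,t)\,\varphi(t)\varphi^T(t)\right]^{-1}\sum_{t=1}^{N}\beta(N,t)\,\varphi(t)\,y(t)$$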
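
A small sketch of weighted least squares on a static regression. Choosing the weights as inverse noise variances is an assumption of this example (the lecture leaves the weights free), and the data, the true parameter `theta0` and the function name are hypothetical.

```python
import numpy as np

def weighted_least_squares(Phi, Y, weights):
    """theta = [sum w_t phi phi^T]^{-1} sum w_t phi y, for per-sample weights w_t."""
    W = weights[:, None]
    R = Phi.T @ (W * Phi)        # weighted sum of phi(t) phi(t)^T
    f = Phi.T @ (weights * Y)    # weighted sum of phi(t) y(t)
    return np.linalg.solve(R, f)

rng = np.random.default_rng(0)
N = 200
theta0 = np.array([2.0, -1.0])
Phi = rng.standard_normal((N, 2))
# First half of the record is accurate, second half is very noisy
noise_std = np.where(np.arange(N) < N // 2, 0.1, 2.0)
Y = Phi @ theta0 + noise_std * rng.standard_normal(N)

theta_ls = weighted_least_squares(Phi, Y, np.ones(N))            # ordinary LS
theta_wls = weighted_least_squares(Phi, Y, 1.0 / noise_std**2)   # down-weights noisy samples
print("LS :", theta_ls)
print("WLS:", theta_wls)
```

With unit weights the function reduces to the ordinary LSE, which is one way to check an implementation.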

### Colored Equation-error Noise

Suppose that in the difference equation

$$y(t) + a_1 y(t-1) + \cdots + a_{n_a} y(t-n_a) = b_1 u(t-1) + \cdots + b_{n_b} u(t-n_b) + v(t)$$

the disturbance $v(t)$ is not white noise. Then the LSE will not converge to the true values $a_i$ and $b_i$. To deal with this problem, we may incorporate further modeling of the equation error $v(t)$ as discussed in Chapter 4; let us say

$$v(t) = k(q)\,e(t)$$

Now $e(t)$ is white noise, but the new model takes us out of the LS environment, except in two cases:

- Known noise properties
- High-order models

Known noise properties: suppose the values of $a_i$ and $b_i$ are unknown, but $k$ is a known filter (not too realistic a situation), so we have

$$A(q)\,y(t) = B(q)\,u(t) + k(q)\,e(t)$$

Filtering through $k^{-1}(q)$ gives

$$A(q)\,y_F(t) = B(q)\,u_F(t) + e(t)$$

where

$$y_F(t) = k^{-1}(q)\,y(t), \qquad u_F(t) = k^{-1}(q)\,u(t)$$

Since $e(t)$ is white, the LS method can be applied without problems. Notice that this is equivalent to applying the filter $L(q) = k^{-1}(q)$.

High-order models: suppose that the noise $v$ can be well described by $k(q) = 1/D(q)$, where $D(q)$ is a polynomial of order $r$. So we have

$$A(q)\,y(t) = B(q)\,u(t) + \frac{1}{D(q)}\,e(t)$$

or

$$A(q)D(q)\,y(t) = B(q)D(q)\,u(t) + e(t)$$

Now we can apply the LS method. Note that the polynomial orders grow to $n_A = n_a + r$ and $n_B = n_b + r$.

Consider a state-space model

$$x(t+1) = A\,x(t) + B\,u(t) + w(t), \qquad y(t) = C\,x(t) + D\,u(t) + v(t)$$

To estimate the system, we can take one of two routes:

1. Parameterize $A$, $B$, $C$, $D$ as in Section 4.3.
2. Assume no insight into the particular structure and look for any suitable matrices $A$, $B$, $C$, $D$.

Note: since there are infinitely many such matrices that describe the same system (related by similarity transformations), we will have to fix the coordinate basis of the state-space realization.

Consider again the state-space model

$$x(t+1) = A\,x(t) + B\,u(t) + w(t), \qquad y(t) = C\,x(t) + D\,u(t) + v(t)$$

Let us assume for a moment that not only $y$ and $u$ but also the states $x$ are measured. This would, by the way, fix the coordinate basis of the state-space realization. Now, with known $y$, $u$, $x$, the model becomes a linear regression:

$$Y(t) = \Theta\,\Phi(t) + E(t)$$

where

$$Y(t) = \begin{bmatrix} x(t+1) \\ y(t) \end{bmatrix}, \quad \Theta = \begin{bmatrix} A & B \\ C & D \end{bmatrix}, \quad \Phi(t) = \begin{bmatrix} x(t) \\ u(t) \end{bmatrix}, \quad E(t) = \begin{bmatrix} w(t) \\ v(t) \end{bmatrix}$$

Then all elements of $\Theta$ can be estimated by the least-squares method. But there is a problem: the states are not available for measurement!
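
To illustrate the idealized case in which the states are measured, the sketch below stacks $Y(t) = [x(t+1);\ y(t)]$ and $\Phi(t) = [x(t);\ u(t)]$ and solves for $\Theta = [A\ B;\ C\ D]$ by least squares. The numerical matrices are hypothetical and the simulation is noise-free, so the recovery is exact up to rounding; with noise present the estimate would only be approximate.

```python
import numpy as np

# Assumed "true" system matrices (illustrative values only)
A = np.array([[0.8, 0.2], [0.0, 0.5]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

rng = np.random.default_rng(1)
N = 200
u = rng.standard_normal((N, 1))
x = np.zeros((N + 1, 2))
y = np.zeros((N, 1))
for t in range(N):
    x[t + 1] = A @ x[t] + B @ u[t]   # state update (no process noise here)
    y[t] = C @ x[t] + D @ u[t]       # output (no measurement noise here)

Y = np.hstack([x[1:], y])      # rows are Y(t)^T = [x(t+1)^T, y(t)^T]
Phi = np.hstack([x[:-1], u])   # rows are Phi(t)^T = [x(t)^T, u(t)^T]

# Row-wise the regression reads Y(t)^T = Phi(t)^T Theta^T, so solve for Theta^T
Theta_T, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
Theta = Theta_T.T
A_hat, B_hat = Theta[:2, :2], Theta[:2, 2:]
C_hat, D_hat = Theta[2:, :2], Theta[2:, 2:]
print(np.round(Theta, 6))
```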

### Estimating State Space Models Using Least Squares Techniques (Subspace Methods)

Subspace algorithms derive the state sequence $x(t+1)$ from the observations, after which least-squares techniques can be applied; see Chapter 10.

## A Statistical Framework for Parameter Estimation and the Maximum Likelihood Method

### Estimation and the Principle of Maximum Likelihood

The area of statistical inference deals with the problem of extracting information from observations that themselves could be unreliable.

Suppose that the observation $y^N = (y(1), y(2), \ldots, y(N))$ has the probability density function (PDF)

$$f(\theta; x_1, x_2, \ldots, x_N) = f_y(\theta; x^N)$$

That is,

$$P(y^N \in A) = \int_{x^N \in A} f_y(\theta; x^N)\,dx^N$$

$\theta$ is a $d$-dimensional parameter vector. The purpose of the observation is in fact to estimate the vector $\theta$ using $y^N$, by means of an estimator $\hat{\theta}(y^N)$. Suppose the observed value of $y^N$ is $y^N_*$; then the resulting estimate is $\hat{\theta}_* = \hat{\theta}(y^N_*)$.

Many such estimator functions are possible. A particular one is the maximum likelihood estimator (MLE).

The probability that the realization (= observation) indeed should take the value $y^N_*$ is proportional to

$$f_y(\theta; y^N_*)$$

This is a deterministic function of $\theta$ once the numerical value $y^N_*$ is inserted, and it is called the likelihood function. A reasonable estimator of $\theta$ could then be

$$\hat{\theta}_{ML}(y^N_*) = \arg\max_{\theta} f_y(\theta; y^N_*)$$

where the maximization is performed for fixed $y^N_*$. This function is known as the maximum likelihood estimator (MLE).

Example: Let $y(i)$, $i = 1, \ldots, N$, be independent random variables with normal distribution, with unknown common mean $\theta_0$ and known variances $\lambda_i$:

$$y(i) \in N(\theta_0, \lambda_i)$$

A common estimator is the sample mean:

$$\hat{\theta}_{SM}(y^N) = \frac{1}{N}\sum_{i=1}^{N} y(i)$$

To calculate the MLE, we start by determining the joint PDF for the observations. The PDF for $y(i)$ is

$$\frac{1}{\sqrt{2\pi\lambda_i}}\exp\left[-\frac{(x_i - \theta)^2}{2\lambda_i}\right]$$

Since the $y(i)$ are independent, the joint PDF for the observations is

$$f_y(\theta; x^N) = \prod_{i=1}^{N}\frac{1}{\sqrt{2\pi\lambda_i}}\exp\left[-\frac{(x_i - \theta)^2}{2\lambda_i}\right]$$

So the likelihood function is $f_y(\theta; y^N)$. Maximizing the likelihood function is the same as maximizing its logarithm, so

$$\hat{\theta}_{ML}(y^N) = \arg\max_{\theta}\left\{-\frac{N}{2}\log 2\pi - \sum_{i=1}^{N}\frac{1}{2}\log\lambda_i - \frac{1}{2}\sum_{i=1}^{N}\frac{(y(i) - \theta)^2}{\lambda_i}\right\} = \frac{1}{\sum_{i=1}^{N} 1/\lambda_i}\sum_{i=1}^{N}\frac{y(i)}{\lambda_i}$$

Example (continued): Suppose $N = 15$ and $y(i)$ is drawn from a normal random number generator such that the mean is 10 but the variances are:

10, 2, 3, 4, 61, 11, 0.1, 121, 10, 1, 6, 9, 11, 13, 15

[Figure: estimated means for 10 different experiments.]

Exercise: Repeat the same procedure for another set of experiments and draw the corresponding figure.

Exercise: Repeat the same procedure for another set of experiments and draw the corresponding figure, supposing all variances equal 10.
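
The experiment can be reproduced roughly as follows. This is a sketch, not the code behind the lecture's figure: the random seed and function names are assumptions, and the ML estimate used is the inverse-variance weighted mean $\sum_i (y(i)/\lambda_i) \big/ \sum_i (1/\lambda_i)$ for independent Gaussian observations with known variances.

```python
import numpy as np

# Variances and true mean taken from the example
variances = np.array([10, 2, 3, 4, 61, 11, 0.1, 121, 10, 1, 6, 9, 11, 13, 15])
theta0 = 10.0

def sample_mean(y):
    """theta_SM = (1/N) * sum of y(i)."""
    return y.mean()

def ml_estimate(y, lam):
    """theta_ML = sum(y(i)/lambda_i) / sum(1/lambda_i)."""
    w = 1.0 / lam
    return np.sum(w * y) / np.sum(w)

rng = np.random.default_rng(0)   # seed is an arbitrary choice
for k in range(10):              # 10 experiments, as on the slide
    y = theta0 + np.sqrt(variances) * rng.standard_normal(variances.size)
    print(f"experiment {k + 1:2d}: sample mean = {sample_mean(y):6.3f}, "
          f"MLE = {ml_estimate(y, variances):6.3f}")
```

The MLE leans heavily on the observation with variance 0.1, so its estimates typically scatter much less around 10 than the sample mean does. Setting all variances to 10 makes the two estimators coincide.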

### Relationship to the Maximum A Posteriori (MAP) Estimate

The maximum likelihood estimator (MLE) is

$$\hat{\theta}_{ML}(y^N) = \arg\max_{\theta} f_y(\theta; y^N)$$

The Bayesian approach leads to another parameter estimation problem. In the Bayesian approach the parameter itself is thought of as a random variable. Let the prior PDF for $\theta$ be $g(\theta)$. After some manipulation, the maximum a posteriori (MAP) estimate is

$$\hat{\theta}_{MAP}(y^N) = \arg\max_{\theta}\big\{f_y(\theta; y^N)\,g(\theta)\big\}$$

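### Cramér-Rao Inequality

The quality of an estimator can be assessed by its mean-square error matrix:

$$P = E\big[\hat{\theta}(y^N) - \theta_0\big]\big[\hat{\theta}(y^N) - \theta_0\big]^T$$

where $\theta_0$ is the true value of $\theta$. We may be interested in selecting estimators that make $P$ small. The Cramér-Rao inequality gives a lower bound for $P$ that holds for any unbiased estimator:

$$P \ge M^{-1}$$

where $M$ is the Fisher information matrix:

$$M = E\left[\frac{d}{d\theta}\log f_y(\theta; y^N)\right]\left[\frac{d}{d\theta}\log f_y(\theta; y^N)\right]^T\Bigg|_{\theta=\theta_0} = -E\,\frac{d^2}{d\theta^2}\log f_y(\theta; y^N)\Bigg|_{\theta=\theta_0}$$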
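
As a worked illustration, consider the Gaussian example from earlier in this lecture (independent $y(i)$ with common mean $\theta_0$ and known variances $\lambda_i$). The derivation below is a sketch added for this case only; it shows that the Fisher information is scalar and that the MLE attains the bound.

```latex
\begin{align*}
\log f_y(\theta; y^N) &= -\frac{N}{2}\log 2\pi - \frac{1}{2}\sum_{i=1}^{N}\log\lambda_i
                         - \frac{1}{2}\sum_{i=1}^{N}\frac{(y(i)-\theta)^2}{\lambda_i} \\
\frac{d}{d\theta}\log f_y(\theta; y^N) &= \sum_{i=1}^{N}\frac{y(i)-\theta}{\lambda_i} \\
M &= E\left[\sum_{i=1}^{N}\frac{y(i)-\theta_0}{\lambda_i}\right]^2
   = \sum_{i=1}^{N}\frac{E\,(y(i)-\theta_0)^2}{\lambda_i^2}
   = \sum_{i=1}^{N}\frac{1}{\lambda_i} \\
\operatorname{Var}\hat{\theta}_{ML} &= \frac{\sum_{i=1}^{N}\lambda_i/\lambda_i^2}{\left(\sum_{i=1}^{N}1/\lambda_i\right)^2}
   = \frac{1}{\sum_{i=1}^{N}1/\lambda_i} = M^{-1}
\end{align*}
```

The cross terms in $M$ vanish because the $y(i)$ are independent. For comparison, the sample mean has variance $\frac{1}{N^2}\sum_i \lambda_i$, which is never smaller than $M^{-1}$ and equals it only when all $\lambda_i$ are equal.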

### Asymptotic Properties of the MLE

Calculating $P$ for a given estimator is not an easy task. Therefore, limiting properties as the sample size tends to infinity are calculated instead. For the MLE in the case of independent observations, Wald and Cramér obtained the following result.

Suppose that the random variables $\{y(i)\}$ are independent and identically distributed, so that

$$f_y(\theta; x_1, \ldots, x_N) = \prod_{i=1}^{N} f_{y(i)}(\theta, x_i)$$

Suppose also that the distribution of $y^N$ is given by $f_y(\theta_0; x^N)$ for some value $\theta_0$. Then $\hat{\theta}_{ML}(y^N)$ tends to $\theta_0$ with probability 1 as $N$ tends to infinity, and

$$\sqrt{N}\,\big[\hat{\theta}_{ML}(y^N) - \theta_0\big]$$

converges in distribution to the normal distribution with zero mean and covariance matrix given by the Cramér-Rao lower bound $M^{-1}$ (evaluated for $N = 1$).

### Likelihood function for Probabilistic Models of Dynamical Systems

Suppose

$$\mathcal{M}(\theta):\quad \hat{y}(t|\theta) = g(t, Z^{t-1}; \theta), \qquad \varepsilon(t,\theta) = y(t) - \hat{y}(t|\theta)\ \text{are independent with PDF}\ f_e(x, t; \theta)$$

Recall that we call this kind of model a complete probabilistic model. We note that the output is

$$y(t) = g(t, Z^{t-1}; \theta) + \varepsilon(t,\theta)$$

Now we must determine the likelihood function.

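Lemma: Suppose $u^t$ is given as a deterministic sequence, and assume that the generation of $y^t$ is described by the model

$$y(t) = g(t, Z^{t-1}) + \varepsilon(t)$$

where the conditional PDF of $\varepsilon(t)$, given $Z^{t-1}$, is $f_e(x, t)$. Then the joint probability density function for $y^t$, given $u^t$, is

$$f(t; y^t \mid u^t) = \prod_{k=1}^{t} f_e\big(y(k) - g(k, Z^{k-1}),\, k\big) \qquad \text{(I)}$$

Proof: The conditional PDF (CPDF) of $y(t)$, given $Z^{t-1}$, is

$$p(x_t \mid Z^{t-1}) = f_e\big(x_t - g(t, Z^{t-1}),\, t\big)$$

Using Bayes's rule, the joint CPDF of $y(t)$ and $y(t-1)$, given $Z^{t-2}$, can be expressed as

$$p(x_t, x_{t-1} \mid Z^{t-2}) = p\big(x_t \mid y(t-1) = x_{t-1}, Z^{t-2}\big)\,p(x_{t-1} \mid Z^{t-2}) = f_e\big(x_t - g(t, Z^{t-1}),\, t\big)\,f_e\big(x_{t-1} - g(t-1, Z^{t-2}),\, t-1\big)$$

Continuing in the same way, we arrive at (I).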

By the previous lemma, the likelihood function for the complete probabilistic model is

$$f_y(\theta; y^N) = \prod_{t=1}^{N} f_e\big(y(t) - g(t, Z^{t-1}; \theta),\, t; \theta\big) = \prod_{t=1}^{N} f_e\big(\varepsilon(t,\theta),\, t; \theta\big)$$

Maximizing this function is the same as maximizing

$$\frac{1}{N}\log f_y(\theta; y^N) = \frac{1}{N}\sum_{t=1}^{N}\log f_e\big(\varepsilon(t,\theta),\, t; \theta\big)$$

If we define

$$l(\varepsilon, \theta, t) = -\log f_e(\varepsilon, t; \theta)$$

we may write

$$\hat{\theta}_{ML}(y^N) = \arg\min_{\theta}\frac{1}{N}\sum_{t=1}^{N} l\big(\varepsilon(t,\theta), \theta, t\big)$$

The ML method can thus be seen as a special case of the PEM.

Exercise: Find the Fisher information matrix for this system.

Exercise: Derive a lower bound for the covariance matrix of the estimate $\hat{\theta}_N$.

## Correlating Prediction Errors with Past Data

Ideally, the prediction error $\varepsilon(t,\theta)$ for a good model should be independent of the past data $Z^{t-1}$. If $\varepsilon(t,\theta)$ is correlated with $Z^{t-1}$, then there was more information available in $Z^{t-1}$ about $y(t)$ than was picked up by $\hat{y}(t|\theta)$.

To test whether $\varepsilon(t,\theta)$ is independent of the data set $Z^{t-1}$, we would have to check that all transformations of $\varepsilon(t,\theta)$ are uncorrelated with all possible functions of $Z^{t-1}$. This is of course not feasible in practice.

Instead, we may select a certain finite-dimensional vector sequence $\{\zeta(t)\}$ derived from $Z^{t-1}$ and require a certain transformation of $\{\varepsilon(t,\theta)\}$ to be uncorrelated with this sequence. This would give

$$\frac{1}{N}\sum_{t=1}^{N}\zeta(t)\,\varepsilon(t,\theta) = 0$$

The $\theta$ derived from this equation would be the best estimate based on the observed data.

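The procedure is as follows. Choose a linear filter $L(q)$ and let

$$\varepsilon_F(t,\theta) = L(q)\,\varepsilon(t,\theta)$$

Choose a sequence of correlation vectors

$$\zeta(t,\theta) = \zeta(t, Z^{t-1}, \theta)$$

Choose a function $\alpha(\varepsilon)$ and define

$$f_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\zeta(t,\theta)\,\alpha\big(\varepsilon_F(t,\theta)\big)$$

Then calculate

$$\hat{\theta}_N = \underset{\theta \in D_{\mathcal{M}}}{\mathrm{sol}}\big[f_N(\theta, Z^N) = 0\big]$$

The instrumental variable method (next section) is the best-known representative of this family.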

Normally, the dimension of $\zeta$ would be chosen so that $f_N$ is a $d$-dimensional vector. Then there are as many equations as unknowns. Sometimes one uses $\zeta$ with a higher dimension than $d$, so that there is an overdetermined set of equations, typically without a solution. In that case we take

$$\hat{\theta}_N = \arg\min_{\theta \in D_{\mathcal{M}}}\big|f_N(\theta, Z^N)\big|$$

Exercise: Show that the prediction-error estimate obtained from

$$\hat{\theta}_N = \arg\min_{\theta \in D_{\mathcal{M}}} V_N(\theta, Z^N)$$

can also be seen as a correlation estimate for a particular choice of $L$, $\zeta$ and $\alpha$.

### Pseudolinear Regressions

We saw in Chapter 4 that a number of common prediction models could be written as

$$\hat{y}(t|\theta) = \varphi^T(t,\theta)\,\theta$$

Since the pseudo-regression vector $\varphi(t,\theta)$ contains relevant past data, it is reasonable to require that the resulting prediction errors be uncorrelated with $\varphi(t,\theta)$, so

$$\hat{\theta}_N^{PLR} = \mathrm{sol}\left\{\frac{1}{N}\sum_{t=1}^{N}\varphi(t,\theta)\big[y(t) - \varphi^T(t,\theta)\,\theta\big] = 0\right\}$$

which we term the PLR estimate.

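## Instrumental Variable Methods

Consider the linear regression

$$\hat{y}(t|\theta) = \varphi^T(t)\,\theta$$

The least-squares estimate of $\theta$ is given by

$$\hat{\theta}_N^{LS} = \mathrm{sol}\left\{\frac{1}{N}\sum_{t=1}^{N}\varphi(t)\big[y(t) - \varphi^T(t)\,\theta\big] = 0\right\}$$

So it is a correlation-type estimate with $L(q) = 1$ and $\zeta(t,\theta) = \varphi(t)$.

Now suppose that the data are actually described by

$$y(t) = \varphi^T(t)\,\theta_0 + v_0(t)$$

We found in Section 7.3 that the LSE will not tend to $\theta_0$ in typical cases, namely when $v_0(t)$ is correlated with $\varphi(t)$.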

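Since the LSE will not tend to $\theta_0$ in typical cases, we replace $\varphi(t)$ in the correlation equation by a general correlation vector $\zeta(t)$. Such an application to a linear regression is called the instrumental-variable method. The elements of $\zeta$ are then called instruments or instrumental variables. The estimate of $\theta$ is

$$\hat{\theta}_N^{IV} = \mathrm{sol}\left\{\frac{1}{N}\sum_{t=1}^{N}\zeta(t)\big[y(t) - \varphi^T(t)\,\theta\big] = 0\right\} = \left[\frac{1}{N}\sum_{t=1}^{N}\zeta(t)\,\varphi^T(t)\right]^{-1}\frac{1}{N}\sum_{t=1}^{N}\zeta(t)\,y(t)$$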
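
The difference between the LS and IV estimates can be seen in a small simulation. The sketch below assumes a first-order ARX system with moving-average equation noise and uses delayed inputs $\zeta(t) = [u(t-2)\ \ u(t-1)]^T$ as instruments; the system coefficients, the noise model and this choice of instruments are illustrative assumptions, not the lecture's.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20000
a0, b0 = -0.7, 1.0   # assumed true parameters of y(t) + a0*y(t-1) = b0*u(t-1) + v(t)

u = rng.standard_normal(N)   # open-loop white input, independent of the noise
e = rng.standard_normal(N)
v = e.copy()
v[1:] += 0.9 * e[:-1]        # colored equation error v(t) = e(t) + 0.9 e(t-1)

y = np.zeros(N)
for t in range(1, N):
    y[t] = -a0 * y[t - 1] + b0 * u[t - 1] + v[t]

# Regressors phi(t) = [-y(t-1), u(t-1)] and instruments zeta(t) = [u(t-2), u(t-1)], t = 2..N-1
Phi = np.column_stack([-y[1:-1], u[1:-1]])
Z = np.column_stack([u[:-2], u[1:-1]])
Y = y[2:]

theta_ls = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)   # [sum phi phi^T]^{-1} sum phi y
theta_iv = np.linalg.solve(Z.T @ Phi, Z.T @ Y)       # [sum zeta phi^T]^{-1} sum zeta y
print("true:", [a0, b0])
print("LS  :", theta_ls)
print("IV  :", theta_iv)
```

The LS estimate of $a$ is biased because $y(t-1)$ is correlated with $v(t)$ through $e(t-1)$, whereas the instruments depend only on the input and are therefore uncorrelated with the noise.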

Exercise: Show that $\hat{\theta}_N^{IV}$ will exist and tend to $\theta_0$ if the following conditions hold:

$$\bar{E}\,\zeta(t)\,\varphi^T(t)\ \text{is nonsingular}$$

$$\bar{E}\,\zeta(t)\,v_0(t) = 0$$

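### Choices of instruments

Suppose an ARX model:

$$y(t) + a_1 y(t-1) + \cdots + a_{n_a} y(t-n_a) = b_1 u(t-1) + \cdots + b_{n_b} u(t-n_b) + v(t)$$

A natural idea is to generate the instruments similarly to the above model, but at the same time not let them be influenced by $\{v(t)\}$. This leads to

$$\zeta(t) = K(q)\,[-x(t-1)\ \ -x(t-2)\ \ \ldots\ \ -x(t-n_a)\ \ \ u(t-1)\ \ \ldots\ \ u(t-n_b)]^T$$

where $K$ is a linear filter and $x(t)$ is generated from the input through a linear system

$$N(q)\,x(t) = M(q)\,u(t)$$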

Here

$$N(q) = 1 + n_1 q^{-1} + \cdots + n_{n_n} q^{-n_n}, \qquad M(q) = m_0 + m_1 q^{-1} + \cdots + m_{n_m} q^{-n_m}$$

Most instruments used in practice are generated in this way. Obviously, $\zeta(t)$ is obtained from past inputs by linear filtering and can consequently be written as

$$\zeta(t) = \zeta(t, u^{t-1})$$

If the input is generated in open loop, so that it does not depend on the noise $v_0(t)$ in the system, then clearly the following property holds:

$$\bar{E}\,\zeta(t)\,v_0(t) = 0$$

Since both the $\varphi$-vector and the $\zeta$-vector are generated from the same input sequence, it might be expected that the following property should hold in general:

$$\bar{E}\,\zeta(t)\,\varphi^T(t)\ \text{is nonsingular}$$

### Model-dependent Instruments

It may be desirable to choose the filters $N$ and $M$ equal to those of the true system:

$$N(q) = A_0(q), \qquad M(q) = B_0(q)$$

They are clearly not known, but we may let the instruments depend on the parameters in the obvious way:

$$\zeta(t,\theta) = K(q)\,[-x(t-1,\theta)\ \ \ldots\ \ -x(t-n_a,\theta)\ \ \ u(t-1)\ \ \ldots\ \ u(t-n_b)]^T$$

$$A(q)\,x(t,\theta) = B(q)\,u(t)$$

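The IV method could be summarized as follows:

$$\varepsilon_F(t,\theta) = L(q)\big[y(t) - \varphi^T(t)\,\theta\big]$$

$$\hat{\theta}_N^{IV} = \mathrm{sol}\big[f_N(\theta, Z^N) = 0\big]$$

where

$$f_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\zeta(t,\theta)\,\varepsilon_F(t,\theta)$$

In general, we could write the generation of $\zeta(t,\theta)$ as

$$\zeta(t,\theta) = K_u(q,\theta)\,u(t)$$

where $K_u(q,\theta)$ is a $d$-dimensional column vector of linear filters.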
