
1 ECE-7000: Nonlinear Dynamical Systems 12.5.1 Overfitting and model costs
Overfitting
 The more free parameters a model has, the better it can be adapted to the data.
 However, too many adjustable parameters may cause the global features of the system to be reproduced erroneously.
 Overfitting becomes evident if the in-sample prediction error is significantly smaller than the out-of-sample error.
 Remedy: add a term for the model costs to the minimisation problem, i.e. add an appropriate function of the number of adjustable parameters to the likelihood function.
Cost function
 Suppose we have a general function F for the dynamics, depending on k adjustable parameters.
 We want to find the particular set of parameters which maximizes the probability of the observed data.
 That is, maximize the log-likelihood function (sketched below).
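The likelihood referred to on this slide is reconstructed here in its standard form; treat it as a hedged sketch, assuming scalar data s_1, ..., s_N and a model F with parameters a_1, ..., a_k as described above.

```latex
% Hedged reconstruction: model s_{n+1} ~ F(s_n; a_1,...,a_k) with k adjustable parameters.
\mathcal{L}(a_1,\dots,a_k) = p(s_1,\dots,s_N \mid a_1,\dots,a_k)
  \;\propto\; \prod_{n=1}^{N-1} p\bigl(s_{n+1} \mid s_n;\, a_1,\dots,a_k\bigr),
\qquad
\ln \mathcal{L} = \sum_{n=1}^{N-1} \ln p\bigl(s_{n+1} \mid s_n;\, a_1,\dots,a_k\bigr) + \text{const}.
```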

2 ECE-7000: Nonlinear Dynamical Systems 12.5.1 Overfitting and model costs
Cost function (continued)
 If the errors are Gaussian distributed with variance sigma^2, the probability of the data is a product of Gaussian error terms.
 Estimating the variance by the mean squared prediction error, we obtain a log-likelihood which is maximal when the mean squared error is minimal (see the sketch below).
 The complexity of a model is now taken into account by subtracting a penalty, proportional to the number of adjustable parameters k, from the log-likelihood function.
 The choice of the penalty weight and of the corresponding modified log-likelihood function depends on the aim:
 weight 1: redundant parameters may lead to better predictions;
 weight 1/2: if the main interest is the model itself.
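A hedged reconstruction of the Gaussian case referred to above: with prediction errors s_{n+1} - F(s_n; a), error variance sigma^2, its in-sample estimate, and a penalty weight gamma per parameter, the standard expressions are:

```latex
% Gaussian errors: probability of the data (up to constants) and variance estimate
p(\{s_n\}) \propto \prod_{n=1}^{N-1} \exp\!\left(-\frac{\bigl(s_{n+1}-F(s_n;a)\bigr)^2}{2\sigma^2}\right),
\qquad
\hat\sigma^2 = \frac{1}{N}\sum_{n}\bigl(s_{n+1}-F(s_n;a)\bigr)^2,

% resulting log-likelihood and its penalised (modified) version
\ln \mathcal{L} = -\frac{N}{2}\,\ln\hat\sigma^2 + \text{const},
\qquad
\ln \tilde{\mathcal{L}} = \ln \mathcal{L} - \gamma\, k .
```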

3 ECE-7000: Nonlinear Dynamical Systems 12.5.1 Overfitting and model costs
Cost function (continued)
 If we have N data points, the number of relevant bits in each parameter typically scales like (1/2) log N, and the total description length grows accordingly with the number of parameters k.
 The corresponding modified log-likelihood functions, and the equivalent mean squares cost functions for Gaussian errors, carry the names of Akaike and of Rissanen (see the sketch below).
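As a hedged numerical sketch of the two penalties at work: the data set, the polynomial model class and all names below are illustrative assumptions, and gamma = 1 and gamma = (1/2) ln N are the penalty weights usually associated with Akaike and with Rissanen, respectively.

```python
# Hedged sketch: penalised-likelihood model selection with Gaussian errors.
import numpy as np

rng = np.random.default_rng(0)

# Toy data (an assumption for illustration): a noisy quadratic, so the
# "true" model has 3 parameters.
N = 200
x = np.linspace(-1.0, 1.0, N)
y = 1.0 - 2.0 * x + 0.5 * x**2 + 0.1 * rng.standard_normal(N)

def penalised_log_likelihood(k, mse, n, gamma):
    """ln L~ = -(n/2) ln(mse) - gamma * k, constants dropped."""
    return -0.5 * n * np.log(mse) - gamma * k

for k in range(1, 9):                        # polynomial with k coefficients
    coeffs = np.polyfit(x, y, deg=k - 1)     # ordinary least squares fit
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    akaike = penalised_log_likelihood(k, mse, N, gamma=1.0)
    rissanen = penalised_log_likelihood(k, mse, N, gamma=0.5 * np.log(N))
    print(f"k={k}  mse={mse:.5f}  Akaike={akaike:.1f}  Rissanen={rissanen:.1f}")

# The in-sample mse keeps shrinking as k grows, but both penalised criteria
# peak near the true number of parameters, which is the point of the slide.
```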

4 ECE-7000: Nonlinear Dynamical Systems 12.5.2 The errors-in-variables problem
Errors-in-variables problem
 Occurs when an ordinary least squares procedure is applied although both the independent and the dependent variables are noisy.
 Above a noise level of about 10%, the consequences become obvious.
Solution to the errors-in-variables problem
 Treat all variables in a symmetric way.
 Minimise the sum of the squared orthogonal distances, i.e. fit the surface to the collection of points while ignoring the fact that the points are not mutually independent (see the sketch below).
 This still leads to problems when the noise level exceeds roughly 15-20%.
Other possible cost function: auto-synchronisation
 The cost is evaluated dynamically while the model is coupled to the data, which makes this a very attractive concept for real applications.
 Note: the best predictor is always found by the minimisation of the prediction error.
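A minimal sketch of the symmetric treatment for a straight-line model, contrasting ordinary least squares (vertical distances) with an orthogonal-distance fit via the SVD; the data and the noise level are illustrative assumptions, not taken from the slide.

```python
# Hedged sketch: ordinary vs. orthogonal-distance least squares for a line
# y = a*x + b when BOTH coordinates carry noise of the same size.
import numpy as np

rng = np.random.default_rng(1)
n = 500
x_true = np.linspace(0.0, 1.0, n)
y_true = 2.0 * x_true + 1.0
x = x_true + 0.15 * rng.standard_normal(n)   # noisy "independent" variable
y = y_true + 0.15 * rng.standard_normal(n)   # noisy "dependent" variable

# Ordinary least squares: treats x as exact and minimises vertical distances,
# which biases the slope towards zero when x is noisy.
a_ols, b_ols = np.polyfit(x, y, deg=1)

# Symmetric treatment: minimise orthogonal distances.  The best-fit direction
# is the leading right singular vector of the centred data matrix.
X = np.column_stack([x - x.mean(), y - y.mean()])
_, _, vt = np.linalg.svd(X, full_matrices=False)
dx, dy = vt[0]
a_tls = dy / dx
b_tls = y.mean() - a_tls * x.mean()

print(f"true slope 2.000 | OLS slope {a_ols:.3f} | orthogonal fit {a_tls:.3f}")
```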

5 ECE-7000: Nonlinear Dynamical Systems 12.6 Model verification
Other model verification methods
 Up to now, the actual value of the cost function at its minimum is only a rough indicator of model quality.
 The simplest step is to sub-divide the data set into a training set and a test set.
 The fit of the parameters is performed on the training set; the resulting F is then inserted into the cost function and its value is computed on the test set.
 If the error is significantly larger on the test set, something has gone wrong (see the sketch below).
Even a small forecast error can be misleading
 If the difference between the embedding dimension and the attractor dimension is large, there is much freedom in constructing the dynamical equations.
 Iterated points may then escape from the observed attractor.
Iterating the equations
 This is the most severe test.
 Select one point as initial condition and create a trajectory by iterating the fitted model.
 In the ideal case, the resulting attractor should look like a skeleton inside the body of the noisy observations.
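A hedged end-to-end sketch of both verification steps for a toy system; the noisy logistic-map data, the quadratic model class and every name below are illustrative assumptions.

```python
# Hedged sketch: train/test verification of a fitted map, followed by the
# "most severe" test of iterating the fitted model from one initial condition.
import numpy as np

rng = np.random.default_rng(2)

# Toy observations (an assumption): noisy logistic map s_{n+1} = 4 s_n (1 - s_n).
s = np.empty(2000)
s[0] = 0.3
for n in range(1999):
    s[n + 1] = np.clip(4.0 * s[n] * (1.0 - s[n]) + 0.01 * rng.standard_normal(), 0.0, 1.0)

# Fit the parameters on the training set only (here: a quadratic map).
train, test = s[:1000], s[1000:]
coeffs = np.polyfit(train[:-1], train[1:], deg=2)
F = lambda z: np.polyval(coeffs, z)

rmse_in = np.sqrt(np.mean((F(train[:-1]) - train[1:]) ** 2))   # in-sample
rmse_out = np.sqrt(np.mean((F(test[:-1]) - test[1:]) ** 2))    # out-of-sample
print(f"in-sample RMSE {rmse_in:.4f}, out-of-sample RMSE {rmse_out:.4f}")
# A clearly larger out-of-sample error would mean something has gone wrong.

# Iterate the fitted equations: the trajectory should stay on the attractor
# (a "skeleton" inside the noisy observations) rather than escape from it.
traj = [test[0]]
for _ in range(999):
    nxt = F(traj[-1])
    if not (0.0 <= nxt <= 1.0):          # iterates escaping the observed range
        print("iterated trajectory escaped the observed attractor")
        break
    traj.append(nxt)
print(f"{len(traj)} iterated points stayed inside the observed range")
```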

6 ECE-7000: Nonlinear Dynamical Systems 12.7.1 Fokker-Planck equations from data
 The stochastic counterpart of the equation of motion of a deterministic dynamical system in continuous time is the Langevin equation (1), where x is the phase space vector, f a deterministic vector field, and the noise term is composed of an x-dependent tensor G acting on a white noise process.
 Since individual solutions of equation (1) are highly unstable, the time evolution of phase space densities is studied instead.
 The equation of motion for the phase space density is called the Fokker-Planck equation (2).
 Equations (1) and (2) are sketched below.
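A hedged reconstruction of equations (1) and (2) in their standard form, with the symbols as described on the slide and Gamma(t) denoting the white noise process:

```latex
% (1) Langevin equation: deterministic vector field f plus state-dependent
%     noise, with Gamma(t) a white noise process
\dot{\vec{x}}(t) = \vec{f}(\vec{x}) + \mathbf{G}(\vec{x})\,\vec{\Gamma}(t),
\qquad
\langle \Gamma_i(t)\,\Gamma_j(t') \rangle = \delta_{ij}\,\delta(t - t') .

% (2) Fokker-Planck equation for the phase space density p(x, t), written in
%     terms of the drift vector D^(1) and the diffusion tensor D^(2)
\frac{\partial p(\vec{x},t)}{\partial t}
  = -\sum_i \frac{\partial}{\partial x_i}\Bigl[ D^{(1)}_i(\vec{x})\, p(\vec{x},t) \Bigr]
    + \sum_{i,j} \frac{\partial^2}{\partial x_i \partial x_j}\Bigl[ D^{(2)}_{ij}(\vec{x})\, p(\vec{x},t) \Bigr] .
```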

7 ECE-7000: Nonlinear Dynamical Systems 12.7.1 Fokker-Planck equations from data
 The transition from the Langevin equation to the Fokker-Planck equation and back is uniquely given by the way the drift term and the diffusion tensor correspond to the deterministic and the stochastic part of the Langevin equation.
 If equations (1) and (2) of the previous slide are suitable for a given data set, one can:
 fit the Langevin equation to the observed data, or
 fit the Fokker-Planck equation to the observed data, or
 instead of these two methods, directly exploit the time series to estimate the drift and the diffusion terms of the Fokker-Planck equation.
 Under the assumption that the parameters are time independent, drift and diffusion can be determined by conditional averages over the data (see the sketch below).
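The conditional averages themselves are reconstructed here in their standard form, as a hedged sketch with a small time interval tau:

```latex
% Drift and diffusion as conditional averages of increments over a small
% time interval tau, evaluated at the point x
D^{(1)}_i(\vec{x}) = \lim_{\tau \to 0} \frac{1}{\tau}
  \bigl\langle\, x_i(t+\tau) - x_i(t) \;\bigm|\; \vec{x}(t) = \vec{x} \,\bigr\rangle ,

D^{(2)}_{ij}(\vec{x}) = \lim_{\tau \to 0} \frac{1}{2\tau}
  \bigl\langle\, \bigl(x_i(t+\tau) - x_i(t)\bigr)\bigl(x_j(t+\tau) - x_j(t)\bigr) \;\bigm|\; \vec{x}(t) = \vec{x} \,\bigr\rangle .
```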

8 ECE-7000: Nonlinear Dynamical Systems 12.7.1 Fokker-Planck equations from data
 In practice, since x(t) equals a given point x only with zero probability, one exploits the average over a neighborhood of x with a suitable neighborhood diameter.
 The time interval tau is limited from below by the sampling interval of the data.
 A useful first-order correction in tau is therefore applied to the estimate of the diffusion term (one common form is sketched below).
 In these expressions, knowledge of tau in physical units is not required for consistent estimates; both the drift and the diffusion can be measured in arbitrary temporal units.
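A hedged sketch of the conditional-average estimators with a neighborhood of radius eps, using one common first-order-in-tau correction in which the estimated drift is removed from the increment before squaring; the Ornstein-Uhlenbeck test data and all names are illustrative assumptions.

```python
# Hedged sketch: drift and diffusion of a scalar process estimated from data
# by conditional averages over a neighborhood of radius eps.
import numpy as np

rng = np.random.default_rng(3)

# Toy data: Ornstein-Uhlenbeck process dx = -x dt + 0.5 dW (Euler-Maruyama),
# so the true drift is -x and the true diffusion is 0.5**2 / 2 = 0.125.
dt, n_steps = 0.01, 200_000
x = np.empty(n_steps)
x[0] = 0.0
kicks = 0.5 * np.sqrt(dt) * rng.standard_normal(n_steps - 1)
for n in range(n_steps - 1):
    x[n + 1] = x[n] - x[n] * dt + kicks[n]

def drift_diffusion(series, tau_steps, target, eps, dt):
    """Conditional averages of increments, restricted to |x(t) - target| < eps."""
    x0 = series[:-tau_steps]
    x1 = series[tau_steps:]
    mask = np.abs(x0 - target) < eps
    tau = tau_steps * dt
    incr = x1[mask] - x0[mask]
    d1 = incr.mean() / tau                               # drift estimate
    d2 = ((incr - tau * d1) ** 2).mean() / (2.0 * tau)   # corrected diffusion
    return d1, d2

for target in (-0.5, 0.0, 0.5):
    d1, d2 = drift_diffusion(x, tau_steps=1, target=target, eps=0.05, dt=dt)
    print(f"x = {target:+.1f}: drift ~ {d1:+.2f}, diffusion ~ {d2:.3f}")
```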

9 ECE-7000: Nonlinear Dynamical Systems 12.7.2 Markov chains in embedding space
 Both autoregressive models and deterministic dynamical systems can be regarded as special classes of Markov models: rather than providing a unique future state, a probability density for the occurrence of the future states is obtained.
 In continuous time, the Langevin equation generates a Markov model.
 In discrete time and discrete space, a simple transition matrix describes the Markov model.
 A univariate Markov model in continuous space but discrete time is often called a Markov chain; the order of the Markov chain denotes how many past time steps are needed in order to define the current state vector.
Symbolic dynamics:
 Time-discrete Markov models with a discrete state space occur in symbolic dynamics.
 The big issue is to find a suitable partitioning of the state space.
 The probabilistic nature of the dynamics is introduced through the coarse graining.
 A scalar, real-valued Markov chain produces a sequence of random variables; its dynamics is fully characterised by all transition probability densities from the last m random variables onto the next one, where m is the order of the chain (see the sketch below).
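A hedged sketch of the discrete-state case mentioned above: a scalar series is coarse-grained into a few symbols and the transition matrix of a first-order Markov chain is estimated; the equal-probability partition, the AR(1) test data and the choice of order 1 are assumptions made for illustration.

```python
# Hedged sketch: coarse-grain a scalar series into symbols and estimate the
# transition matrix of a first-order Markov chain.
import numpy as np

rng = np.random.default_rng(4)

# Toy series: an AR(1) process (itself a first-order Markov model).
n = 50_000
s = np.empty(n)
s[0] = 0.0
shocks = rng.standard_normal(n - 1)
for t in range(n - 1):
    s[t + 1] = 0.8 * s[t] + shocks[t]

# Coarse graining: 4 symbols defined by the quartiles of the observed values.
edges = np.quantile(s, [0.25, 0.5, 0.75])
symbols = np.digitize(s, edges)                  # symbol alphabet {0, 1, 2, 3}

# Transition counts and row-normalised transition probabilities.
counts = np.zeros((4, 4))
np.add.at(counts, (symbols[:-1], symbols[1:]), 1.0)
transition = counts / counts.sum(axis=1, keepdims=True)

np.set_printoptions(precision=2, suppress=True)
print(transition)       # row i: P(next symbol = j | current symbol = i)
```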

10 ECE-7000: Nonlinear Dynamical Systems 12.7.3 No embedding for Markov chains
Assume the following:
 A reasonable description of a noisy dynamical system is a vector-valued Langevin equation.
 The measurement apparatus provides us with a time series of a single observable only.
Then the Langevin dynamics generates a Markov process in continuous time in the original state space.
 One might hope that the scalar, real-valued time series represents a Markov chain of some finite order, and that, like in an embedding theorem, the order of the Markov chain is related to the dimensionality of the original space.
 This is wrong: there is no such embedding theorem for Markov chains.
 However, the memory decays fast, so that a finite-order Markov chain may be a reasonable approximation, where the errors made by ignoring some temporal correlations with the past are smaller than other modelling errors resulting from the finiteness of the data base.

