Least-Mean-Square Training of Cluster-Weighted Modeling
National Taiwan University, Department of Computer Science and Information Engineering


1 Least-Mean-Square Training of Cluster-Weighted Modeling
National Taiwan University, Department of Computer Science and Information Engineering

2 Outline
- Introduction of CWM
- Least-Mean-Square Training of CWM
- Experiments
- Summary
- Future work
- Q&A

3 Cluster-Weighted Modeling (CWM)
- CWM is a supervised learning model based on joint probability density estimation over a set of input and output (target) data.
- The joint probability is expanded into clusters, each describing a local subspace well. Each local Gaussian expert can have its own local function (a constant, linear, or quadratic function).
- The global (nonlinear) model is constructed by combining all the local models.
- The resulting model has transparent local structures and meaningful parameters.
The expansion is written out below.
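The slide's equation images are not preserved in the transcript; the following is the standard CWM joint-density expansion consistent with the description above:

$$ p(\mathbf{y}, \mathbf{x}) = \sum_{m=1}^{M} p(\mathbf{y}, \mathbf{x} \mid c_m)\, p(c_m) = \sum_{m=1}^{M} p(\mathbf{y} \mid \mathbf{x}, c_m)\, p(\mathbf{x} \mid c_m)\, p(c_m) $$

Each cluster \(c_m\) carries a Gaussian input domain \(p(\mathbf{x} \mid c_m)\), an output model \(p(\mathbf{y} \mid \mathbf{x}, c_m)\) whose mean is the local function \(f_m(\mathbf{x})\), and a prior weight \(p(c_m)\).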

4 Architecture (figure not reproduced in the transcript)

5 Prediction Calculation
- Conditional forecast: the expected output given the input.
- Conditional error (output uncertainty): the expected output covariance given the input.
Both formulas are reconstructed below.
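The original slide showed these as equation images; the standard CWM forms, consistent with the joint density above, are

$$ \langle y \mid \mathbf{x} \rangle = \frac{\sum_{m} f_m(\mathbf{x})\, p(\mathbf{x} \mid c_m)\, p(c_m)}{\sum_{m} p(\mathbf{x} \mid c_m)\, p(c_m)} $$

$$ \langle \sigma_y^2 \mid \mathbf{x} \rangle = \frac{\sum_{m} \left[ \sigma_m^2 + \left( f_m(\mathbf{x}) - \langle y \mid \mathbf{x} \rangle \right)^2 \right] p(\mathbf{x} \mid c_m)\, p(c_m)}{\sum_{m} p(\mathbf{x} \mid c_m)\, p(c_m)} $$

where \(\sigma_m^2\) is the output variance of cluster \(m\).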

6 Training (EM Algorithm)
- Objective function: the log-likelihood function.
- Initialization: cluster means by k-means; variances set to the maximal range of each dimension; priors p(c_m) = 1/M, where M is the predetermined number of clusters.
- E-step: evaluate the posterior probability of each cluster (reconstructed below).
- M-step: update the cluster means and the prior probabilities.
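The E-step and M-step equations were images in the original; a reconstruction of the standard CWM updates follows. E-step, the posterior responsibility of cluster \(m\) for sample \(n\):

$$ p(c_m \mid \mathbf{y}_n, \mathbf{x}_n) = \frac{p(\mathbf{y}_n, \mathbf{x}_n \mid c_m)\, p(c_m)}{\sum_{k=1}^{M} p(\mathbf{y}_n, \mathbf{x}_n \mid c_k)\, p(c_k)} $$

M-step, the prior and mean updates:

$$ p(c_m) \leftarrow \frac{1}{N} \sum_{n=1}^{N} p(c_m \mid \mathbf{y}_n, \mathbf{x}_n), \qquad \boldsymbol{\mu}_m \leftarrow \frac{\sum_{n} \mathbf{x}_n\, p(c_m \mid \mathbf{y}_n, \mathbf{x}_n)}{\sum_{n} p(c_m \mid \mathbf{y}_n, \mathbf{x}_n)} $$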

7 M-step (cont.)
- Define the cluster-weighted expectation.
- Update the cluster-weighted covariance matrices.
- Update the cluster parameters that maximize the data likelihood.
- Update the output covariance matrices.
The corresponding standard forms are reconstructed below.
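Reconstructing the standard forms: the cluster-weighted expectation of a quantity \(\theta\) is

$$ \langle \theta \rangle_m = \frac{\sum_{n} \theta(\mathbf{x}_n, y_n)\, p(c_m \mid \mathbf{y}_n, \mathbf{x}_n)}{\sum_{n} p(c_m \mid \mathbf{y}_n, \mathbf{x}_n)} $$

with which the input and output covariances update as

$$ \mathbf{P}_m \leftarrow \left\langle (\mathbf{x} - \boldsymbol{\mu}_m)(\mathbf{x} - \boldsymbol{\mu}_m)^{\mathsf{T}} \right\rangle_m, \qquad \sigma_m^2 \leftarrow \left\langle \left( y - f(\mathbf{x}, \boldsymbol{\beta}_m) \right)^2 \right\rangle_m $$

and the local-model parameters \(\boldsymbol{\beta}_m\) maximize the data likelihood, i.e. they satisfy the cluster-weighted least-squares condition \(\langle (y - f(\mathbf{x}, \boldsymbol{\beta}_m))\, \partial f / \partial \boldsymbol{\beta}_m \rangle_m = 0\).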

8 Least-Mean-Square Training of CWM
- To train CWM's model parameters from a least-squares perspective.
- To minimize the squared error function, starting from CWM's training result, and find another solution with better accuracy.
- To find another solution when CWM is trapped in a local minimum.
- To apply supervised selection of cluster centers instead of an unsupervised method.

9 LMS Learning Algorithm
- The instantaneous error produced by sample n, and the prediction formula (both reconstructed below).
- A softmax function is used to constrain the prior probabilities to values between 0 and 1 whose summation equals 1.
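The original equations were images; a reconstruction, with \(\pi_m\) denoting the prior of cluster \(m\) and \(\gamma_m\) an assumed unconstrained softmax parameter:

$$ e_n = \tfrac{1}{2} \left( y_n - \hat{y}(\mathbf{x}_n) \right)^2, \qquad \hat{y}(\mathbf{x}) = \frac{\sum_{m} f_m(\mathbf{x})\, p(\mathbf{x} \mid c_m)\, \pi_m}{\sum_{k} p(\mathbf{x} \mid c_k)\, \pi_k}, \qquad \pi_m = \frac{e^{\gamma_m}}{\sum_{k} e^{\gamma_k}} $$

The softmax keeps every \(\pi_m\) in (0, 1) and makes the priors sum to 1, as required.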

10 LMS Learning Algorithm (cont.)
The derivation of the gradients (a sketch follows below).
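The slide's derivation images are not preserved, so the following is only one consistent derivation under the prediction formula above. Writing \(w_m(\mathbf{x}) = \pi_m p(\mathbf{x} \mid c_m) / \sum_k \pi_k p(\mathbf{x} \mid c_k)\) for the normalized cluster weight, the chain rule gives \(\partial e_n / \partial \theta = -(y_n - \hat{y}(\mathbf{x}_n))\, \partial \hat{y}(\mathbf{x}_n) / \partial \theta\), with

$$ \frac{\partial \hat{y}}{\partial \boldsymbol{\beta}_m} = w_m(\mathbf{x}) \begin{bmatrix} \mathbf{x} \\ 1 \end{bmatrix}, \qquad \frac{\partial \hat{y}}{\partial \boldsymbol{\mu}_m} = w_m(\mathbf{x}) \left( f_m(\mathbf{x}) - \hat{y}(\mathbf{x}) \right) \mathbf{P}_m^{-1} (\mathbf{x} - \boldsymbol{\mu}_m), \qquad \frac{\partial \hat{y}}{\partial \gamma_m} = w_m(\mathbf{x}) \left( f_m(\mathbf{x}) - \hat{y}(\mathbf{x}) \right) $$

for a local linear model \(f_m(\mathbf{x}) = \boldsymbol{\beta}_m^{\mathsf{T}} [\mathbf{x}; 1]\).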

11 LMS-CWM Learning Algorithm
- Initialization: initialize the parameters using CWM's training result.
- Iterate until convergence: for n = 1:N, estimate the error, estimate the gradients, and update the parameters.
A minimal sketch of this loop follows below.
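A minimal NumPy sketch of the loop, assuming isotropic input Gaussians, local linear models, and the softmax parameterization from slide 9. All names and the toy data are illustrative; per the slide, the parameters would really be initialized from a converged CWM/EM run:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data standing in for the training set (the real inputs would come
# from the application, e.g. the Mackey-Glass embedding on slide 14).
M, D, N = 4, 1, 200
X = rng.uniform(-3, 3, size=(N, D))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(N)

# Parameters; the slide initializes these from CWM's EM training result.
mu = rng.uniform(-3, 3, size=(M, D))   # cluster means
var = np.ones(M)                       # isotropic input variances (simplification)
gamma = np.zeros(M)                    # softmax logits for the cluster priors
beta = np.zeros((M, D + 1))            # local linear models f_m(x) = beta_m . [x, 1]

def forward(x):
    """Return prediction yhat(x), normalized weights w_m(x), local outputs f_m(x)."""
    pi = np.exp(gamma - gamma.max())
    pi /= pi.sum()                                               # softmax priors
    d2 = ((x - mu) ** 2).sum(axis=1)                             # squared distances
    px = np.exp(-0.5 * d2 / var) / (2 * np.pi * var) ** (D / 2)  # p(x | c_m)
    w = pi * px
    w /= w.sum()
    f = beta @ np.append(x, 1.0)
    return w @ f, w, f

eta = 0.05                                  # learning rate (assumed value)
for epoch in range(100):
    for n in rng.permutation(N):            # sample-by-sample, LMS-style pass
        x, t = X[n], y[n]
        yhat, w, f = forward(x)
        err = t - yhat                      # instantaneous error
        xb = np.append(x, 1.0)
        # Gradient-descent updates using the gradients sketched after slide 10.
        beta += eta * err * w[:, None] * xb[None, :]
        mu += eta * err * ((w * (f - yhat)) / var)[:, None] * (x - mu)
        gamma += eta * err * w * (f - yhat)
```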

12 Simple Demo: cwm1d, cwmprdemo, cwm2d, lms1d

13 Experiments
- A simple sine function.
- LMS-CWM has a better interpolation result.

14 Mackey-Glass Chaotic Time Series Prediction
- 1000 data points: the first 500 points are taken as the training set and the last 500 as the test set.
- Single-step prediction.
- Input: [s(t), s(t-6), s(t-12), s(t-18)]; output: s(t+85).
- Local linear model.
- Number of clusters: 30.
A sketch of the embedding construction follows below.
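A short sketch of how this embedding can be built, assuming the series is available as a 1-D array `s` (the random stand-in below exists only so the snippet runs):

```python
import numpy as np

# Stand-in for the Mackey-Glass series; the real s would be loaded or generated.
s = np.random.default_rng(1).standard_normal(1200)

lags, horizon = [0, 6, 12, 18], 85
t0 = max(lags)                          # earliest time index with all lags available
T = len(s) - horizon                    # latest index with the target available

# Rows are [s(t), s(t-6), s(t-12), s(t-18)]; targets are s(t+85).
X = np.stack([s[t0 - lag : T - lag] for lag in lags], axis=1)
y = s[t0 + horizon : T + horizon]

# First 500 points for training, the next 500 for testing, per the slide.
X_train, y_train = X[:500], y[:500]
X_test, y_test = X[500:1000], y[500:1000]
```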

15 Results (1)
Prediction plots for CWM and LMS-CWM (figures not reproduced in the transcript).

16 Results (2)
Learning curves for CWM and LMS-CWM (figures not reproduced). Mean-squared errors:

MSE           CWM        LMS-CWM
Test set      0.0008027  0.0004480
Training set  0.0006568  0.0004293

17 Local Minima
- The initial locations of the four clusters.
- The resulting centers' locations after each training session of CWM and LMS-CWM.

18 Summary
- An LMS learning method for CWM is presented.
- It may lose the benefits of data density estimation and of characterizing the data, but it provides an alternative training option.
- Parameters can be trained by EM and LMS alternately, combining the advantages of both kinds of learning.
- LMS-CWM learning can be viewed as a refinement of CWM if prediction accuracy is the main concern.

19 Future Work
- Regularization.
- Comparison between different models, from both theoretical and performance points of view.

20 Q&A
Thank You!

