
1 COMP 328: Final Review, Spring 2010
Nevin L. Zhang
Department of Computer Science & Engineering
The Hong Kong University of Science & Technology
http://www.cse.ust.hk/~lzhang/
Can be used as a cheat sheet.

2 Pre-Midterm
- Algorithms for supervised learning
  - Decision trees
  - Instance-based learning
  - Naïve Bayes classifiers
  - Neural networks
  - Support vector machines
- General issues regarding supervised learning
  - Classification error and confidence interval
  - Bias-variance tradeoff
  - PAC learning theory

3 Post-Midterm
- Clustering
  - Distance-Based Clustering
  - Model-Based Clustering
- Dimension Reduction
  - Principal Component Analysis
- Reinforcement Learning
- Ensemble Learning

4 Clustering

5 Distance/Similarity Measures

6 Distance-Based Clustering
- Partitional and hierarchical clustering

7 K-Means: Partitional Clustering

8
- Different initial points might lead to different partitions
- Solution:
  - Multiple runs
  - Use an evaluation criterion such as SSE (sum of squared errors) to pick the best one (see the sketch below)
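A minimal NumPy sketch of this multiple-restart strategy; the toy data, K = 3, and ten restarts are illustrative choices, not values from the course:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """One run of K-Means (Lloyd's algorithm); returns (centers, labels, sse)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial points
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute each center as the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    sse = ((X - centers[labels]) ** 2).sum()  # sum of squared errors
    return centers, labels, sse

# Multiple runs with different initial points; keep the run with the lowest SSE.
X = np.random.default_rng(1).normal(size=(300, 2))
centers, labels, sse = min((kmeans(X, k=3, seed=s) for s in range(10)),
                           key=lambda run: run[2])
```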

9 Hierarchical Clustering
- Agglomerative and divisive

10 Cluster Similarity

11 Cluster Validation
- External indices (a small computational sketch follows)
  - Entropy: average purity of the clusters obtained
  - Mutual information between class label and cluster label
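A small sketch of these two external indices computed from class and cluster labels; the label arrays and function names are illustrative, not data or code from the course:

```python
import numpy as np

def purity(classes, clusters):
    """Average purity: each cluster contributes the count of its majority class."""
    correct = 0
    for c in np.unique(clusters):
        members = classes[clusters == c]
        correct += np.bincount(members).max()  # size of the majority class in this cluster
    return correct / len(classes)

def mutual_information(classes, clusters):
    """Mutual information I(class; cluster) in nats, from empirical frequencies."""
    mi = 0.0
    for c in np.unique(classes):
        for k in np.unique(clusters):
            p_joint = np.mean((classes == c) & (clusters == k))
            p_c, p_k = np.mean(classes == c), np.mean(clusters == k)
            if p_joint > 0:
                mi += p_joint * np.log(p_joint / (p_c * p_k))
    return mi

# Illustrative labels, not data from the course
classes  = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
clusters = np.array([0, 0, 1, 1, 1, 1, 2, 2, 0])
print(purity(classes, clusters), mutual_information(classes, clusters))
```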

12 Cluster Validation
- External measures: Jaccard index and Rand index
- Both measure the agreement between two pairwise relations on the objects: in-same-class and in-same-cluster

                          # pairs in same cluster   # pairs in diff cluster
  # pairs w/ same label             a                          b
  # pairs w/ diff label             c                          d

- Jaccard index = a / (a + b + c); Rand index = (a + d) / (a + b + c + d) (see the sketch below)
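A sketch of how a, b, c, d and the two indices could be computed by enumerating pairs of objects; the function name and the example labels are my own, not from the slides:

```python
import numpy as np
from itertools import combinations

def pair_counts(classes, clusters):
    """a = same label & same cluster, b = same label & diff cluster,
    c = diff label & same cluster, d = diff label & diff cluster."""
    a = b = c = d = 0
    for i, j in combinations(range(len(classes)), 2):
        same_label = classes[i] == classes[j]
        same_cluster = clusters[i] == clusters[j]
        if same_label and same_cluster:
            a += 1
        elif same_label:
            b += 1
        elif same_cluster:
            c += 1
        else:
            d += 1
    return a, b, c, d

classes  = np.array([0, 0, 0, 1, 1, 2])
clusters = np.array([0, 0, 1, 1, 1, 2])
a, b, c, d = pair_counts(classes, clusters)
jaccard = a / (a + b + c)             # ignores pairs that agree only on "different"
rand    = (a + d) / (a + b + c + d)   # fraction of pairs on which the two relations agree
```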

13 Cluster Validation
- Internal measure
  - Dunn’s index

14 Cluster Validation
- Internal measure

15 Post-Midterm
- Clustering
  - Distance-Based Clustering
  - Model-Based Clustering
- Dimension Reduction
  - Principal Component Analysis
- Reinforcement Learning
- Ensemble Learning

16 Model-Based Clustering
- Assume the data are generated from a mixture model with K components
- Estimate the parameters of the model from the data
- Assign objects to clusters based on posterior probabilities: soft assignment

17 Gaussian Mixtures

18 Learning Gaussian Mixture Models

19 EM

20

21
- l(t): log likelihood of the model after the t-th iteration
- l(t) increases monotonically with t
- But it might go to infinity in case of singularity
  - Solution: place a bound on the eigenvalues of the covariance matrix
- EM only finds a local maximum
  - Solution: multiple restarts
  - Use the likelihood to pick the best model (a minimal sketch follows)
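A minimal EM sketch for a Gaussian mixture, restricted to spherical covariances to keep it short (a simplification of the general case on the slides); the variance floor plays the role of the eigenvalue bound, and all names, hyperparameters, and toy data are illustrative assumptions:

```python
import numpy as np

def em_gmm(X, k, n_iter=50, seed=0):
    """EM for a spherical-covariance Gaussian mixture; returns (params, log_likelihood)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, size=k, replace=False)]   # initial means
    var = np.full(k, X.var())                      # initial (spherical) variances
    pi = np.full(k, 1.0 / k)                       # mixing proportions
    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) of each component for each point
        log_p = (-0.5 * ((X[:, None, :] - mu[None]) ** 2).sum(axis=2) / var
                 - 0.5 * d * np.log(2 * np.pi * var) + np.log(pi))
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - log_norm)            # soft assignments
        # M-step: re-estimate parameters from the soft assignments
        nk = resp.sum(axis=0)
        pi = nk / n
        mu = (resp.T @ X) / nk[:, None]
        var = np.array([(resp[:, j] * ((X - mu[j]) ** 2).sum(axis=1)).sum()
                        for j in range(k)]) / (nk * d)
        var = np.maximum(var, 1e-6)                # floor the variance to avoid singularities
    return (pi, mu, var), log_norm.sum()           # log likelihood at the last E-step

# Multiple restarts; keep the model with the highest log likelihood.
X = np.random.default_rng(1).normal(size=(400, 2))
best_params, best_ll = max((em_gmm(X, k=2, seed=s) for s in range(5)),
                           key=lambda run: run[1])
```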

22 EM and K-Means
- K-Means is hard-assignment EM

23 Mixture Variable for Discrete Data

24 Latent Class Model

25 Learning Latent Class Models
- Always converges (the log likelihood is bounded above, so the singularity problem of Gaussian mixtures does not arise)

26 Post-Midterm
- Clustering
  - Distance-Based Clustering
  - Model-Based Clustering
- Dimension Reduction
  - Principal Component Analysis
- Reinforcement Learning
- Ensemble Learning

27 Dimension Reduction
- Necessary because data sets with large numbers of attributes are difficult for learning algorithms to handle

28 Principal Component Analysis

29

30

31

32 PCA Solution

33 PCA Illustration

34 Eigenvalues and Projection Error
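A minimal PCA sketch via eigendecomposition of the sample covariance matrix; the data and the choice of two components are illustrative, and the sum of the discarded eigenvalues is used as the usual measure of projection error:

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components; also return all eigenvalues."""
    Xc = X - X.mean(axis=0)                    # center the data
    cov = np.cov(Xc, rowvar=False)             # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]          # sort components by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    Z = Xc @ eigvecs[:, :n_components]         # low-dimensional representation
    return Z, eigvals

X = np.random.default_rng(0).normal(size=(200, 5))
Z, eigvals = pca(X, n_components=2)
projection_error = eigvals[2:].sum()           # variance lost by discarding components
```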

35 Post-Midterm
- Clustering
  - Distance-Based Clustering
  - Model-Based Clustering
- Dimension Reduction
  - Principal Component Analysis
- Reinforcement Learning
- Ensemble Learning

36 Reinforcement Learning

37 Markov Decision Process
- A model of how an agent interacts with its environment

38 Markov Decision Process

39 Value Iteration
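A minimal value-iteration sketch on a made-up two-state, two-action MDP; the transition probabilities, rewards, and discount factor are invented for illustration:

```python
import numpy as np

# Toy MDP: P[s, a, s'] = transition probability, R[s, a] = expected immediate reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

V = np.zeros(2)
for _ in range(1000):
    # Bellman optimality backup: V(s) = max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    Q = R + gamma * (P @ V)            # Q[s, a]
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)              # greedy policy w.r.t. the converged values
```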

40 Reinforcement Learning

41 Q-Learning

42
- Derived from Q-function based value iteration
- Ideas (see the sketch below):
  - In-place/asynchronous value iteration
  - Approximate the expectation using samples
  - ε-greedy policy (for the exploration/exploitation tradeoff)
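A tabular Q-learning sketch with an ε-greedy policy. The environment interface (env.reset() returning a state index, env.step(a) returning (next_state, reward, done)) and all hyperparameters are assumptions for illustration, not anything specified in the slides:

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    """Tabular Q-learning: sample transitions instead of computing exact expectations."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # ε-greedy action selection: explore with probability ε, otherwise exploit
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(Q[s].argmax())
            s_next, r, done = env.step(a)
            # Sample-based, in-place update toward the one-step bootstrapped target
            target = r + gamma * (0.0 if done else Q[s_next].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```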

43 Temporal-Difference (TD) Learning

44 Sarsa is also a temporal-difference learning method
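For contrast, the two one-step TD updates side by side; this is a sketch in the usual textbook notation (Q is assumed to be a NumPy array of shape (n_states, n_actions)), not code from the slides:

```python
def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    # Off-policy TD: bootstrap from the best next action
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    # On-policy TD: bootstrap from the action actually taken next
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
```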

45 Post-Midterm
- Clustering
  - Distance-Based Clustering
  - Model-Based Clustering
- Dimension Reduction
  - Principal Component Analysis
- Reinforcement Learning
- Ensemble Learning

46 Ensemble Learning

47 Bagging: Reduce Variance

48 Boosting: Reduce Classification Error

49 AdaBoost: Exponential Error
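A compact AdaBoost sketch with decision stumps as weak learners, which greedily reduces the exponential error named on the slide. The stump form, number of rounds, and function names are illustrative assumptions:

```python
import numpy as np

def fit_stump(X, y, w):
    """Best weighted decision stump: threshold one feature, predict +/-1."""
    best = (0, 0.0, 1, np.inf)                     # (feature, threshold, sign, weighted error)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] <= t, sign, -sign)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, t, sign, err)
    return best

def adaboost(X, y, rounds=20):
    """AdaBoost for labels y in {-1, +1}; returns a list of (stump, alpha)."""
    n = len(y)
    w = np.full(n, 1.0 / n)                        # uniform initial weights
    ensemble = []
    for _ in range(rounds):
        j, t, sign, err = fit_stump(X, y, w)
        err = max(err, 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)      # weak learner's vote weight
        pred = np.where(X[:, j] <= t, sign, -sign)
        w *= np.exp(-alpha * y * pred)             # up-weight misclassified points
        w /= w.sum()
        ensemble.append(((j, t, sign), alpha))
    return ensemble

def predict(ensemble, X):
    score = sum(alpha * np.where(X[:, j] <= t, sign, -sign)
                for (j, t, sign), alpha in ensemble)
    return np.sign(score)
```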

