Published by Anthony Hogan. Modified over 2 years ago.

1
Data analysis, Lecture 10
Tijl De Bie

2
Let's do some real data analysis
http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
A biologist comes to you and says: "I have some data on breast cancer here. If you analyse it, I will win the Nobel prize."
How to start?

3
Let's do some real data analysis
Real data is messy:
– Missing values…
– Infer them as the mean of the corresponding feature (a basic imputation technique)
[MATLAB intermezzo]
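The mean imputation described on this slide can be sketched as follows. The slides use MATLAB for the demo; this is a minimal Python version using only the standard library, and the function names (`impute_column_means`, `is_missing`) and the toy data are illustrative assumptions, not part of the lecture.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing (an assumed encoding of missing values)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_column_means(rows):
    """Replace each missing entry by the mean of the observed values of its feature."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not is_missing(r[j])]
        # A feature with no observed values has no mean; 0.0 is an arbitrary fallback.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if is_missing(v) else v for j, v in enumerate(r)] for r in rows]


# Toy data: rows are samples, columns are features, None marks a missing value.
data = [[5.0, 1.0], [3.0, None], [None, 4.0], [1.0, 1.0]]
filled = impute_column_means(data)
print(filled)
```

Mean imputation is simple, but it shrinks the variance of the imputed feature, so it is best treated as a baseline.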

4
Let's do some real data analysis
What now? Let's visualize the data!
How? It is 9-dimensional!
Principal Component Analysis (PCA)
[MATLAB intermezzo]

5
Mathematical intermezzo: PCA
Two views:
– Variance maximization
– Error minimization
Solved as an eigenvalue problem (on the covariance matrix of the data)
Do not forget to centre the data (subtract from each feature its mean over the dataset)
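To make the eigenvalue view concrete, here is a hedged Python sketch that centres the data, builds the covariance matrix, and finds the first principal component. It uses power iteration instead of a full eigendecomposition so that it runs with the standard library alone; in practice one would call a numerical library's eigenvalue routine. All function names and the toy data are assumptions for illustration.

```python
import math


def center(rows):
    """Subtract from each feature its mean over the dataset."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centred data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def leading_component(cov, iterations=200):
    """Power iteration: approximates the eigenvector with the largest eigenvalue."""
    d = len(cov)
    # Arbitrary non-symmetric start; it must not be orthogonal to the answer.
    w = [1.0 / (j + 1) for j in range(d)]
    norm = math.sqrt(sum(x * x for x in w))
    w = [x / norm for x in w]
    for _ in range(iterations):
        v = mat_vec(cov, w)
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break  # zero covariance: every direction is equally good
        w = [x / norm for x in v]
    # The variance captured along w is the Rayleigh quotient w^T C w.
    variance = sum(a * b for a, b in zip(w, mat_vec(cov, w)))
    return w, variance


# Toy data lying exactly on the line y = 2x.
points = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
centered = center(points)
w, variance = leading_component(covariance(centered))
scores = [sum(a * b for a, b in zip(row, w)) for row in centered]  # 1-D projection
print(w, variance, scores)
```

Projecting onto the first two components obtained this way (the second can be found by repeating the procedure after removing the first direction from the data) gives a 2-D picture of the dataset.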

6
Looks interesting…
Could we perhaps predict the label from the data?
I.e., find a rule that says when a cancer is benign and when it is malignant (important for therapy and more!)
Classification!
[MATLAB intermezzo]

7
Mathematical intermezzo: LSR/FDA
Least Squares Regression (LSR)
– Solved by means of a system of linear equations
– Xw = y (approximately)
– Misfit: ||Xw - y||^2, the squared error (proportional to the mean squared error)
Fisher Discriminant Analysis (FDA):
– The same thing, if the labels y are -1/1
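As an illustration of "solved by means of a system of linear equations", the sketch below minimizes ||Xw - y||^2 by solving the normal equations (X^T X) w = X^T y with Gaussian elimination, then classifies by the sign of the prediction when the labels are -1/1. This is a standard-library Python sketch, not the lecture's MATLAB code; the function names, the constant bias column in X, and the toy data are assumptions.

```python
def solve_linear_system(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; the solution is not unique")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w


def least_squares(x, y):
    """Minimize ||Xw - y||^2 via the normal equations (X^T X) w = X^T y."""
    d = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(x, y)) for i in range(d)]
    return solve_linear_system(xtx, xty)


def predict_label(w, features):
    """Classify by the sign of the linear prediction (labels are -1/1)."""
    return 1 if sum(wi * fi for wi, fi in zip(w, features)) >= 0 else -1


# Toy data: one feature plus a constant 1 (assumed bias term); labels are -1/1.
x = [[1.0, 1.0], [2.0, 1.0], [4.0, 1.0], [5.0, 1.0]]
y = [-1.0, -1.0, 1.0, 1.0]
w = least_squares(x, y)
print(w, [predict_label(w, row) for row in x])
```

The normal equations are the simplest route but can be ill-conditioned; numerical libraries usually solve least squares problems through a QR or singular value decomposition instead.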

8
Could there be more?
Perhaps there are more than 2 clusters? Cancers requiring different treatments?
Let's cluster the data!
2 clusters? (Benign vs malignant?)
More clusters? (Other cancer types?)
[MATLAB intermezzo]
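The slide does not say which clustering algorithm the demo uses. As one common choice, the sketch below assumes k-means (Lloyd's algorithm) in standard-library Python; the function names, the random initialization, and the toy data are illustrative assumptions.

```python
import random


def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate between assigning points and updating centroids."""
    rng = random.Random(seed)
    # Initialize the centroids with k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster: converged
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Toy data with two well-separated groups.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
assignment, centroids = k_means(points, k=2)
print(assignment, centroids)
```

Running this with k = 2 and then with larger k, and comparing the clusters against the known benign/malignant labels, is one way to explore the questions on this slide. The result of k-means depends on the initialization, so several restarts are advisable.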
