1
Data analysis Lecture 10 Tijl De Bie

2
**Let’s do some real data analysis**

A biologist comes to you and says: “I have some data on breast cancer here. If you analyse it, I will win the Nobel prize.” How do we start?

3
**Let’s do some real data analysis**

Real data is messy: there are missing values. Infer each missing value as the mean of the corresponding feature (a basic technique for ‘imputation’). [MATLAB intermezzo]
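A minimal NumPy sketch of mean imputation (not the lecture's MATLAB code). It assumes missing entries are coded as NaN and that every feature has at least one observed value; the function name and the toy matrix are made up for illustration.

```python
import numpy as np

def impute_with_feature_means(X):
    """Replace each NaN by the mean of the observed values in its column."""
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)      # per-feature mean, ignoring NaNs
    missing = np.isnan(X)                  # boolean mask of missing entries
    X_filled = X.copy()
    # For every missing entry, look up the mean of the column it sits in.
    X_filled[missing] = np.take(col_means, np.nonzero(missing)[1])
    return X_filled

# Hypothetical toy data: 3 samples, 2 features, 2 missing values.
X_toy = np.array([[1.0, 2.0],
                  [np.nan, 4.0],
                  [3.0, np.nan]])
X_imputed = impute_with_feature_means(X_toy)
```

Mean imputation keeps every sample but shrinks the variance of the imputed feature, so it is best seen as a simple baseline.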

4
**Let’s do some real data analysis**

What now? Let’s visualize the data! But how? It is 9-dimensional! Use Principal Component Analysis (PCA). [MATLAB intermezzo]

5
**Mathematical intermezzo: PCA**

Two views: variance maximization and error minimization. Both are solved using an eigenvalue problem. Do not forget to centre the data (subtract from each feature its mean in the dataset).
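A minimal NumPy sketch of PCA via the eigenvalue problem of the covariance matrix (not the lecture's MATLAB code). The function name, the random toy data and the choice of 2 components are assumptions for illustration.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project X onto its top principal components."""
    X = np.asarray(X, dtype=float)
    X_centred = X - X.mean(axis=0)                    # centre each feature
    cov = X_centred.T @ X_centred / (X.shape[0] - 1)  # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)            # symmetric eigenproblem
    order = np.argsort(eigvals)[::-1][:n_components]  # largest variance first
    components = eigvecs[:, order]
    return X_centred @ components, components, eigvals[order]

# Hypothetical toy data: 100 samples, 9 features, two of them with large spread.
rng = np.random.default_rng(0)
scales = np.array([5.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
X_toy = rng.normal(size=(100, 9)) * scales
Z, W, variances = pca_project(X_toy, n_components=2)
```

Here `Z` holds the 2-dimensional coordinates that can be plotted, the columns of `W` are the principal directions, and `variances` are the corresponding eigenvalues, i.e. the variance of the data along each direction.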

6
**Looks interesting… Could we perhaps predict the label from the data?**

I.e., find a rule that says when a cancer is benign and when it is malignant (important for therapy and more!). This is classification! [MATLAB intermezzo]

7
**Mathematical intermezzo: LSR/FDA**

Least Squares Regression (LSR): solved by means of a system of linear equations, $Xw \approx y$. Misfit: $\|Xw - y\|^2$, the squared error (for $n$ samples, $n$ times the mean squared error). Fisher Discriminant Analysis (FDA): the same thing, if the labels $y$ are $-1/+1$.
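A minimal NumPy sketch of least squares regression used as a classifier with -1/+1 labels (not the lecture's MATLAB code). The centring step, the extra constant column for a bias term, the threshold at 0 and the synthetic two-class data are all assumptions for illustration.

```python
import numpy as np

def fit_least_squares(X, y):
    """Minimise ||Xw - y||^2 on centred data with an added bias column."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mean = X.mean(axis=0)
    # Assumption: append a column of ones so the rule can have an offset.
    X_aug = np.hstack([X - mean, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return w, mean

def predict_labels(X, w, mean):
    """Classify by the sign of the regression output."""
    X = np.asarray(X, dtype=float)
    X_aug = np.hstack([X - mean, np.ones((X.shape[0], 1))])
    return np.where(X_aug @ w >= 0, 1, -1)

# Hypothetical toy data: two Gaussian classes labelled -1 and +1.
rng = np.random.default_rng(0)
X_neg = rng.normal(loc=-2.0, size=(50, 2))
X_pos = rng.normal(loc=+2.0, size=(50, 2))
X_toy = np.vstack([X_neg, X_pos])
y_toy = np.concatenate([-np.ones(50), np.ones(50)])

w, mean = fit_least_squares(X_toy, y_toy)
accuracy = np.mean(predict_labels(X_toy, w, mean) == y_toy)
```

The accuracy here is measured on the training data itself, so it is optimistic; a held-out test set would give a fairer estimate.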

8
**Could there be more? Perhaps there are more than 2 clusters?**

Cancers requiring different treatments? Let’s cluster the data! 2 clusters (benign vs. malignant)? More clusters (other cancer types)? [MATLAB intermezzo]
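The slide does not name a clustering algorithm. As one possible choice, here is a minimal NumPy sketch of k-means (Lloyd's algorithm); the function name, the random initialisation and the synthetic two-blob data are assumptions for illustration.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: alternate nearest-centre assignment and mean updates."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centres as k distinct, randomly chosen data points.
    centres = X[rng.choice(X.shape[0], size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centre.
        dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centre to the mean of its points (keep it if it is empty).
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres

# Hypothetical toy data: two well-separated blobs of 50 points each.
rng = np.random.default_rng(1)
X_toy = np.vstack([rng.normal(loc=-5.0, size=(50, 2)),
                   rng.normal(loc=+5.0, size=(50, 2))])
labels, centres = kmeans(X_toy, k=2)
```

Rerunning with `k=3` or more is one way to explore whether further groups exist; k-means always returns k clusters, so the result still needs to be judged against the data.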
