1
Data analysis, Lecture 10. Tijl De Bie

2
**Let’s do some real data analysis**

A biologist comes to you and says: “I have some data on breast cancer here. If you analyse it, I will win the Nobel Prize.” How do we start?

3
**Let’s do some real data analysis**

Real data is messy: it contains missing values. Infer each missing value as the mean of the corresponding feature (this is a basic technique for ‘imputation’). [MATLAB intermezzo]
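The MATLAB code from the intermezzo is not part of this transcript. As a stand-in, here is a minimal sketch of mean imputation in Python with NumPy. It assumes that missing entries are encoded as NaN; the function name `impute_with_feature_means` and the toy matrix are made up for illustration and are not the breast cancer data.

```python
import numpy as np

def impute_with_feature_means(X):
    """Replace every NaN by the mean of the observed values of its feature (column)."""
    X = np.asarray(X, dtype=float)
    feature_means = np.nanmean(X, axis=0)   # per-column mean, ignoring NaNs
    missing = np.isnan(X)                   # boolean mask of the missing entries
    # feature_means has one entry per column and broadcasts across the rows.
    return np.where(missing, feature_means, X)

# Hypothetical toy data: 4 samples, 2 features, NaN marks a missing value.
X_toy = np.array([[1.0, 2.0],
                  [np.nan, 4.0],
                  [3.0, np.nan],
                  [5.0, 6.0]])
X_imputed = impute_with_feature_means(X_toy)
```

Here the missing entry of the first feature is filled with 3 (the mean of 1, 3 and 5) and the missing entry of the second feature with 4 (the mean of 2, 4 and 6).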

4
**Let’s do some real data analysis**

What now? Let’s visualize the data! But how? It is 9-dimensional! Use Principal Component Analysis (PCA). [MATLAB intermezzo]

5
**Mathematical intermezzo: PCA**

Two views: variance maximization and error minimization. Both are solved using an eigenvalue problem. Do not forget to centre the data (subtract from each feature its mean in the dataset).
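A minimal sketch of these steps in Python with NumPy (used here in place of the MATLAB of the lecture): centre the data, solve the eigenvalue problem of the covariance matrix, and project onto the leading eigenvectors. The function name `pca` and the synthetic 9-dimensional data are assumptions for illustration only. The last lines check numerically that the two views agree: the variance kept by the projection plus the reconstruction error add up to the total variance.

```python
import numpy as np

def pca(X, n_components=2):
    """Return projected data, principal directions and the variance along each."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each feature
    cov = Xc.T @ Xc / (X.shape[0] - 1)           # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # symmetric eigenproblem, ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]               # columns = directions of largest variance
    scores = Xc @ components                     # coordinates in the new basis
    return scores, components, eigvals[order]

# Hypothetical synthetic data: 200 samples, 9 features with different spreads.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 9)) * np.array([5.0, 3.0, 1, 1, 1, 1, 1, 1, 1])
scores, components, variances = pca(X_demo, n_components=2)

# Error-minimization view: what the 2-D reconstruction fails to capture.
Xc = X_demo - X_demo.mean(axis=0)
residual = Xc - scores @ components.T
discarded_variance = np.sum(residual ** 2) / (len(X_demo) - 1)
```

The two columns of `scores` are what one would put on the axes of a scatter plot to visualize the data.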

6
**Looks interesting… Could we perhaps predict the label from the data?**

I.e., find a rule that says when a cancer is benign and when it’s malignant (important for therapy and more!). This is classification! [MATLAB intermezzo]

7
**Mathematical intermezzo: LSR/FDA**

Least Squares Regression (LSR): solved by means of a system of linear equations, Xw = y (approximately). Misfit: ||Xw - y||², the squared error (the mean squared error multiplied by the number of samples). Fisher Discriminant Analysis (FDA): the same thing, if the labels y are -1/+1.
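A minimal sketch of least squares classification in Python with NumPy (in place of the MATLAB of the lecture), on made-up two-class data with labels -1/+1. Two things are assumptions of this sketch rather than part of the slide: a constant column is appended to X so that the rule has an offset, and a sample is classified by the sign of its prediction. The final lines compare the least squares weights with the Fisher direction, computed from the within-class scatter and the difference of the class means.

```python
import numpy as np

def add_offset_column(X):
    """Append a column of ones so that w includes an offset (an assumption here)."""
    return np.column_stack([X, np.ones(len(X))])

def fit_least_squares(X, y):
    """Solve Xw = y approximately by minimizing ||Xw - y||^2."""
    w, *_ = np.linalg.lstsq(add_offset_column(X), y, rcond=None)
    return w

def predict_labels(X, w):
    """Classify by the sign of the least squares prediction."""
    return np.where(add_offset_column(X) @ w >= 0, 1, -1)

# Hypothetical toy data: two Gaussian clouds with labels -1 and +1.
rng = np.random.default_rng(0)
X_neg = rng.normal(loc=-2.0, size=(50, 2))
X_pos = rng.normal(loc=+2.0, size=(50, 2))
X_demo = np.vstack([X_neg, X_pos])
y_demo = np.concatenate([-np.ones(50), np.ones(50)])

w = fit_least_squares(X_demo, y_demo)
misfit = np.sum((add_offset_column(X_demo) @ w - y_demo) ** 2)
accuracy = np.mean(predict_labels(X_demo, w) == y_demo)

# Fisher direction: (within-class scatter)^-1 (mean_pos - mean_neg).
m_neg, m_pos = X_neg.mean(axis=0), X_pos.mean(axis=0)
S_w = (X_neg - m_neg).T @ (X_neg - m_neg) + (X_pos - m_pos).T @ (X_pos - m_pos)
fisher_dir = np.linalg.solve(S_w, m_pos - m_neg)
cosine = w[:2] @ fisher_dir / (np.linalg.norm(w[:2]) * np.linalg.norm(fisher_dir))
```

A `cosine` close to 1 means that the least squares weights and the Fisher direction point the same way, which is the sense in which the two methods are "the same thing" for -1/+1 labels.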

8
**Could there be more? Perhaps there are more than 2 clusters?**

Cancers requiring different treatments? Let’s cluster the data! 2 clusters? (Benign vs malignant?) More clusters? (Other cancer types?) [MATLAB intermezzo]
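The transcript does not say which clustering algorithm the intermezzo used. As an illustration only, here is a minimal sketch of k-means, one common choice, in Python with NumPy. The farthest-point initialisation is an assumption made to keep the example deterministic, and the two-blob data is synthetic. Changing `k` is how one would explore "more clusters".

```python
import numpy as np

def farthest_point_init(X, k):
    """Start from the first sample, then repeatedly add the sample farthest from all centres."""
    centres = [X[0]]
    for _ in range(1, k):
        dist_to_nearest = np.min(
            [np.linalg.norm(X - c, axis=1) for c in centres], axis=0)
        centres.append(X[np.argmax(dist_to_nearest)])
    return np.array(centres)

def assign(X, centres):
    """Index of the nearest centre for every sample."""
    dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=100):
    """Alternate between assigning samples and recomputing the cluster means."""
    X = np.asarray(X, dtype=float)
    centres = farthest_point_init(X, k)
    for _ in range(n_iter):
        labels = assign(X, centres)
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):    # converged
            break
        centres = new_centres
    return assign(X, centres), centres

# Hypothetical toy data: two well-separated blobs of 50 samples each.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(loc=-5.0, size=(50, 2)),
                    rng.normal(loc=+5.0, size=(50, 2))])
labels, centres = kmeans(X_demo, k=2)
```

On real data the clusters are rarely this clean, and whether the 2-cluster solution matches benign vs malignant has to be checked against the known labels.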
