Classification and risk prediction


Classification and risk prediction Usman Roshan

Disease risk prediction
- What is the best method to predict disease risk? (Focus on this today)
- Which SNPs and non-genetic variables best predict risk?

Review
- Chi-square statistic for ranking SNPs
- Logistic regression model for ranking SNPs and classifying disease risk
- Supervised learning
  - Empirical risk minimization
  - Maximum likelihood
  - Least squares

Evaluating probabilities
Under the logistic regression model the probability of disease given genotype x is

P(disease | x) = 1 / (1 + e^{-(w·x + w0)})

Receiver Operating Characteristic (ROC) curve: plot the true positive rate against the false positive rate as the decision threshold varies. The area under the ROC curve is the probability that the classifier will rank a randomly chosen positive example higher than a randomly chosen negative one.

True and false positives

                 Actual value
                 P                     N
Predicted  P'    True positive (TP)    False positive (FP)
Predicted  N'    False negative (FN)   True negative (TN)

True positive rate (TPR) = TP / P
False positive rate (FPR) = FP / N
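
To make the rates and the AUC concrete, here is a minimal Python sketch (the scores and labels are hypothetical, not from the lecture): it computes TPR and FPR at one threshold, and the AUC in its rank-comparison form, i.e. the fraction of (positive, negative) pairs in which the positive is scored higher.

```python
import numpy as np

# Hypothetical predicted disease probabilities and true labels (1 = case).
scores = np.array([0.9, 0.8, 0.3, 0.7, 0.2, 0.4])
labels = np.array([1, 1, 0, 1, 0, 0])

# TPR and FPR at a single threshold of 0.5.
pred = scores > 0.5
TP = np.sum(pred & (labels == 1))
FP = np.sum(pred & (labels == 0))
TPR = TP / np.sum(labels == 1)   # TP / P
FPR = FP / np.sum(labels == 0)   # FP / N

# AUC as the probability that a random positive outranks a random negative.
pos, neg = scores[labels == 1], scores[labels == 0]
auc = np.mean(pos[:, None] > neg[None, :])
print(TPR, FPR, auc)  # 1.0 0.0 1.0 for this perfectly separated toy data
```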

Classification: Bayesian learning
Bayes rule:

P(M | x) = P(x | M) P(M) / P(x)

To classify a given datapoint x we select the model (class) Mi with the highest P(Mi | x). The denominator is a normalizing term and does not affect the classification, so only the numerator matters. P(x | M) is called the likelihood and P(M) is the prior probability. To classify a given datapoint x we therefore need to know the likelihood and the prior. If the priors P(M) are uniform (all the same) then finding the model that maximizes the posterior P(M | x) is the same as finding the M that maximizes the likelihood P(x | M).
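
As a toy illustration of the rule (the numbers below are made up, not from the lecture), the predicted class is simply the one with the largest likelihood × prior, since the denominator P(x) is shared across classes:

```python
# Hypothetical likelihoods P(x | M_i) and priors P(M_i) for one datapoint x.
likelihoods = {"C1": 0.30, "C2": 0.05}
priors = {"C1": 0.40, "C2": 0.60}

# The posterior P(M_i | x) is proportional to likelihood * prior;
# P(x) cancels when we compare classes, so we can skip it.
scores = {c: likelihoods[c] * priors[c] for c in likelihoods}
print(max(scores, key=scores.get))  # C1, since 0.30*0.40 > 0.05*0.60
```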

Gaussian models
Assume that the class likelihood is represented by a Gaussian distribution with parameters μ (mean) and σ (standard deviation):

p(x | C) = (1 / (sqrt(2π) σ)) e^{-(x - μ)² / (2σ²)}

We find the model (in other words the mean and variance) that maximizes the likelihood (or, equivalently, the log likelihood). Suppose we are given training points x1, x2, …, xn1 from class C1. Assuming that each datapoint is drawn independently from C1, the sample log likelihood is

ln P(x1, …, xn1 | C1) = ∑_{i=1}^{n1} ln p(xi | C1)

Gaussian models
The log likelihood is given by

L(μ1, σ1) = -n1 ln(sqrt(2π) σ1) - ∑_{i=1}^{n1} (xi - μ1)² / (2σ1²)

Setting the first derivatives ∂L/∂μ1 and ∂L/∂σ1 to 0 gives us the maximum likelihood estimates of μ1 and σ1 (denoted as m1 and s1 respectively):

m1 = (1/n1) ∑_i xi    and    s1² = (1/n1) ∑_i (xi - m1)²

Similarly we determine m2 and s2 for class C2.
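
These closed-form estimates are easy to verify numerically. A small sketch, using the case genotypes from the one-SNP example two slides below:

```python
import numpy as np

def gaussian_mle(x):
    """ML estimates of the mean and standard deviation.
    Note the 1/n normalization (not 1/(n-1)): this is what setting the
    derivatives of the log likelihood to zero gives."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    s = np.sqrt(np.mean((x - m) ** 2))
    return m, s

print(gaussian_mle([1, 1, 2, 1, 0, 2]))  # (1.1667, 0.6872) = (7/6, sqrt(17/36))
```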

Gaussian models
Having determined the class parameters for C1 and C2, we can classify a given datapoint by evaluating P(x | C1) and P(x | C2) and assigning it to the class with the higher likelihood (or log likelihood). The (negative log) likelihood can also be used as a loss function, which gives an equivalent formulation under empirical risk minimization.

Gaussian classification example
Consider one SNP genotype for case and control subjects.
Case (class C1): 1, 1, 2, 1, 0, 2
Control (class C2): 0, 1, 0, 0, 1, 1
Under the Gaussian assumption the case and control classes are represented by Gaussian distributions with parameters (μ1, σ1) and (μ2, σ2) respectively. The maximum likelihood estimates of the means are

m1 = (1 + 1 + 2 + 1 + 0 + 2)/6 = 7/6 ≈ 1.17    and    m2 = (0 + 1 + 0 + 0 + 1 + 1)/6 = 0.5

Gaussian classification example
The estimates of the class variances are

s1² = (1/6) ∑_i (xi - m1)² = 17/36 ≈ 0.47

and similarly s2² = 0.25 (so s1 ≈ 0.69 and s2 = 0.5). Which class does x = 1 belong to? What about x = 0 and x = 2? What happens if the class variances are equal?
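
A short sketch that answers the questions above by comparing the class log likelihoods (additive constants dropped, as in the slides):

```python
import numpy as np

def log_lik(x, m, s):
    # Gaussian log likelihood up to an additive constant
    return -np.log(s) - (x - m) ** 2 / (2 * s ** 2)

case = np.array([1, 1, 2, 1, 0, 2])
ctrl = np.array([0, 1, 0, 0, 1, 1])
m1, s1 = case.mean(), case.std()  # np.std defaults to 1/n, i.e. the ML estimate
m2, s2 = ctrl.mean(), ctrl.std()

for x in (0, 1, 2):
    c = "C1" if log_lik(x, m1, s1) > log_lik(x, m2, s2) else "C2"
    print(x, "->", c)
# x=0 -> C2, while x=1 and x=2 -> C1. If the variances were equal, the
# decision boundary would sit midway between the means (nearest mean wins).
```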

Multivariate Gaussian classification
Suppose each datapoint is an m-dimensional vector. In the previous example we would have m SNP genotypes instead of one. The class likelihood is given by

p(x | C1) = (1 / ((2π)^{m/2} |Σ1|^{1/2})) e^{-(1/2)(x - μ1)^T Σ1^{-1} (x - μ1)}

where Σ1 is the class covariance matrix. Σ1 has dimensions m × m, and its (i, j)th entry is the covariance of the ith and jth variables.

Multivariate Gaussian classification
The maximum likelihood estimates of μ and Σ are

m = (1/n) ∑_i xi    and    S = (1/n) ∑_i (xi - m)(xi - m)^T

The class log likelihoods with estimated parameters (ignoring constant terms) are

ln p(x | Ci) = -(1/2) ln |Si| - (1/2)(x - mi)^T Si^{-1} (x - mi)
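
A minimal numpy sketch of these estimates, using the case vectors from the three-SNP example below:

```python
import numpy as np

X = np.array([[1, 2, 0], [2, 2, 0], [2, 2, 0],
              [2, 1, 1], [0, 2, 1], [2, 1, 0]], dtype=float)

m = X.mean(axis=0)      # ML mean: (1/n) sum_i x_i
D = X - m
S = D.T @ D / len(X)    # ML covariance: note 1/n, not 1/(n-1)
# equivalently: S = np.cov(X, rowvar=False, bias=True)

def log_lik(x, m, S):
    # class log likelihood up to an additive constant
    d = x - m
    return -0.5 * np.log(np.linalg.det(S)) - 0.5 * d @ np.linalg.solve(S, d)

print(log_lik(np.array([1.0, 2.0, 1.0]), m, S))
```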

Multivariate Gaussian classification
If S1 = S2 = S then the class log likelihoods with estimated parameters (ignoring constant terms) become

ln p(x | Ci) = -(1/2)(x - mi)^T S^{-1} (x - mi)

so the classification depends only on the (Mahalanobis) distance of x to the class means.
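
A small helper illustrating this reduction (the function name is mine, not from the slides):

```python
import numpy as np

def mahalanobis_sq(x, m, S):
    """Squared Mahalanobis distance (x - m)^T S^{-1} (x - m)."""
    d = np.asarray(x, dtype=float) - np.asarray(m, dtype=float)
    return float(d @ np.linalg.solve(S, d))

# With a shared covariance S, the class with the higher log likelihood is
# exactly the class with the smaller Mahalanobis distance to its mean:
# predict C1 iff mahalanobis_sq(x, m1, S) < mahalanobis_sq(x, m2, S).
```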

Naïve Bayes algorithm
If we assume that the variables are independent (no interaction between SNPs) then the off-diagonal terms of S are zero and the log likelihood becomes (ignoring constant terms)

ln p(x | Ci) = -∑_{j=1}^{m} [ ln sij + (xj - mij)² / (2 sij²) ]

where mij and sij are the mean and standard deviation of the jth variable in class Ci.
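
A sketch of the resulting diagonal-covariance (naive Bayes) log likelihood; the per-variable means and standard deviations are estimated column-wise for each class:

```python
import numpy as np

def nb_log_lik(x, means, stds):
    """Diagonal-covariance Gaussian log likelihood, constants dropped:
    sum_j [ -log s_j - (x_j - m_j)^2 / (2 s_j^2) ]."""
    x, means, stds = (np.asarray(a, dtype=float) for a in (x, means, stds))
    return float(np.sum(-np.log(stds) - (x - means) ** 2 / (2 * stds ** 2)))

X1 = np.array([[1, 2, 0], [2, 2, 0], [2, 2, 0], [2, 1, 1], [0, 2, 1], [2, 1, 0]])
m1, s1 = X1.mean(axis=0), X1.std(axis=0)  # per-variable ML estimates for class C1
print(nb_log_lik([1, 2, 1], m1, s1))
```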

Nearest means classifier
If we further assume that all the variances sj are equal then (ignoring constant terms) we get

ln p(x | Ci) = -(1/(2s²)) ∑_{j=1}^{m} (xj - mij)² = -||x - mi||² / (2s²)

so we assign x to the class whose mean mi is closest in Euclidean distance.

Gaussian classification example
Consider three SNP genotypes for case and control subjects.
Case (class C1): (1,2,0), (2,2,0), (2,2,0), (2,1,1), (0,2,1), (2,1,0)
Control (class C2): (0,1,2), (1,1,1), (1,0,2), (1,0,0), (0,0,2), (0,1,0)
Classify (1,2,1) and (0,0,1) with the nearest means classifier (a worked sketch follows).
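
A sketch that works this example with the nearest means classifier:

```python
import numpy as np

case = np.array([[1, 2, 0], [2, 2, 0], [2, 2, 0], [2, 1, 1], [0, 2, 1], [2, 1, 0]])
ctrl = np.array([[0, 1, 2], [1, 1, 1], [1, 0, 2], [1, 0, 0], [0, 0, 2], [0, 1, 0]])
m1 = case.mean(axis=0)  # (1.5, 1.67, 0.33)
m2 = ctrl.mean(axis=0)  # (0.5, 0.5, 1.17)

for x in ([1, 2, 1], [0, 0, 1]):
    x = np.array(x)
    d1 = np.sum((x - m1) ** 2)  # squared Euclidean distance to each class mean
    d2 = np.sum((x - m2) ** 2)
    print(x, "->", "C1" if d1 < d2 else "C2")
# (1,2,1) is closer to the case mean (C1); (0,0,1) to the control mean (C2).
```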

Discriminative learning
- No assumptions about the underlying model
- Support vector machine: the optimally separating hyperplane between two sets of points

Support vector machine: optimally separating hyperplane (figure)

SVMs
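
The lecture names the method but not an implementation, so as an assumption here is a minimal scikit-learn sketch that fits a linear (maximum-margin) SVM to the three-SNP example data and classifies the two query points:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2, 0], [2, 2, 0], [2, 2, 0], [2, 1, 1], [0, 2, 1], [2, 1, 0],   # cases
              [0, 1, 2], [1, 1, 1], [1, 0, 2], [1, 0, 0], [0, 0, 2], [0, 1, 0]])  # controls
y = np.array([1] * 6 + [0] * 6)

clf = SVC(kernel="linear", C=1.0).fit(X, y)  # optimally separating hyperplane
print(clf.predict([[1, 2, 1], [0, 0, 1]]))   # predicted labels for the query points
```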