
1 Regression. Usman Roshan, CS 675 Machine Learning

2 Regression. Same problem as classification, except that the target variable y_i is continuous. Popular solutions:
– Linear regression (perceptron)
– Support vector regression
– Logistic regression (adapted for regression)

3 Linear regression. Suppose target values are generated by a function, y_i = f(x_i) + e_i. We will estimate f(x_i) by g(x_i, θ). Suppose each e_i is generated by a Gaussian distribution with mean 0 and variance σ² (the same variance for all e_i). This implies that the probability of y_i given the input x_i and parameters θ, denoted p(y_i | x_i, θ), is normally distributed with mean g(x_i, θ) and variance σ².
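Written out, the noise assumption gives the standard Gaussian conditional density, stated here because the likelihood derivation on the next slide uses it:

p(y_i | x_i, θ) = 1/√(2πσ²) · exp(−(y_i − g(x_i, θ))² / (2σ²))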

4 Linear regression. Apply maximum likelihood to estimate g(x, θ). Assume the pairs (x_i, y_i) are i.i.d. Then the probability of the data given the model (the likelihood) is P(X | θ) = p(x_1, y_1) p(x_2, y_2) … p(x_n, y_n). Each p(x_i, y_i) = p(y_i | x_i) p(x_i), and p(y_i | x_i) is normally distributed with mean g(x_i, θ) and variance σ². Maximizing the log likelihood (as for classification) gives us least squares (linear regression); the derivation is sketched below.
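To make that step explicit, take logs and drop everything that does not depend on θ (the p(x_i) factors and the normalization constants):

log P(X | θ) = Σ_i log p(y_i | x_i, θ) + const
             = −(n/2) log(2πσ²) − (1/(2σ²)) Σ_i (y_i − g(x_i, θ))² + const

Since σ² is fixed, maximizing over θ is the same as minimizing Σ_i (y_i − g(x_i, θ))², i.e., least squares.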

5 Logistic regression. Similar to the linear regression derivation: minimize the sum of squares between the predicted and actual values. However:
– the prediction is given by the sigmoid function, and
– y_i is constrained to the range [0, 1].
A minimal sketch of this objective follows.
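A minimal numpy sketch of minimizing the squared error of a sigmoid output by gradient descent, assuming targets already scaled to [0, 1]; the function name, learning rate, and epoch count are illustrative, not from the slides:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_sigmoid_regression(X, y, lr=0.1, epochs=1000):
    """Minimize sum_i (sigmoid(w.x_i + b) - y_i)^2 by gradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)                 # predictions in (0, 1)
        grad = 2 * (p - y) * p * (1 - p)       # chain rule through the sigmoid
        w -= lr * (X.T @ grad) / n             # gradient step on the weights
        b -= lr * grad.mean()                  # gradient step on the bias
    return w, b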

6 Support vector regression. Makes no assumptions about the probability distribution of the data and output (like the support vector machine). Change the loss function in the support vector machine problem to the ε-insensitive loss to obtain support vector regression.
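For reference, the ε-insensitive loss ignores errors smaller than ε and grows linearly beyond that:

L_ε(y, g(x)) = max(0, |y − g(x)| − ε)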

7 Support vector regression. Solved by applying Lagrange multipliers, as in the SVM. The solution w is given by a linear combination of support vectors (as in the SVM), and w can also be used for ranking features. From regularized risk minimization with the ε-insensitive loss, the objective is

minimize (1/2)||w||² + C Σ_i max(0, |y_i − (w·x_i + w_0)| − ε)
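A minimal sketch of linear SVR with feature ranking by |w|, using scikit-learn; the data here is random placeholder input, and the hyperparameter values are arbitrary:

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(100, 20)).astype(float)   # toy 0/1/2 genotype-like matrix
y = X[:, 3] - 0.5 * X[:, 7] + rng.normal(scale=0.1, size=100)  # synthetic phenotype

model = SVR(kernel="linear", C=1.0, epsilon=0.1)
model.fit(X, y)

w = model.coef_.ravel()             # w is exposed for linear kernels only
ranking = np.argsort(-np.abs(w))    # features ordered by |w|, largest first
print(ranking[:5])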

8 Application. Prediction of continuous phenotypes in mice from genotype (reference: "Predicting unobserved phen…"). The data are vectors x_i where each feature takes the value 0, 1, or 2, denoting the number of alleles of a particular single nucleotide polymorphism (SNP). The data have about 1500 samples and 12,000 SNPs. The output y_i is a phenotype value, for example coat color (represented by integers) or chemical levels in blood.

9 Mouse phenotype prediction from genotype. Methods compared (see the code sketch after this list):
– Rank SNPs by the Wald test:
  – First perform linear regression y = wx + w_0 on each SNP.
  – Calculate a p-value on w using a t-test: t = (w − w_null)/stderr(w); with w_null = 0 this is t = w/stderr(w), where stderr(w) = sqrt( Σ_i (y_i − wx_i − w_0)² / ((n − 2) Σ_i (x_i − mean(x))²) ).
  – Rank SNPs by their p-values, or alternatively by the residual sum of squares Σ_i (y_i − wx_i − w_0)².
– Rank SNPs by the Pearson correlation coefficient.
– Rank SNPs by support vector regression (the w vector in SVR).
– Rank SNPs by ridge regression (the w vector).
– Run SVR and ridge regression on the top k ranked SNPs under cross-validation.
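A minimal numpy sketch of the per-SNP Wald/t ranking described above; names are illustrative, and X is assumed to hold one SNP per column:

import numpy as np

def wald_t_per_snp(X, y):
    """For each SNP column x, fit y = w*x + w0 and return t = w / stderr(w)."""
    n = len(y)
    t_stats = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        x = X[:, j]
        xc, yc = x - x.mean(), y - y.mean()
        w = (xc @ yc) / (xc @ xc)        # least-squares slope
        w0 = y.mean() - w * x.mean()     # intercept
        resid = y - (w * x + w0)
        se = np.sqrt((resid @ resid) / ((n - 2) * (xc @ xc)))  # stderr of the slope
        t_stats[j] = w / se              # t-test with w_null = 0
    return t_stats

# Rank SNPs with the largest |t| (smallest p-value) first:
# ranking = np.argsort(-np.abs(wald_t_per_snp(X, y)))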

10 MCH phenotype in mice

11 CD8 phenotype in mice

12 Rice phenotype prediction from genotype. Same experimental setup as the previous study ("Improving the Accuracy of Whole Genome Prediction for Complex Traits Using the Results of Genome Wide Association Studies"). The data have 413 samples and 37,000 SNPs (features). The best linear unbiased prediction (BLUP) method is improved by prior SNP knowledge (given by genome-wide association studies).
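As a point of reference, SNP-based BLUP (RR-BLUP) is equivalent to ridge regression on the genotype matrix with the penalty α = σ_e²/σ_g² set by the variance components. A minimal ridge sketch, assuming scikit-learn; the data is a random placeholder and α is an arbitrary illustrative value:

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(413, 1000)).astype(float)  # toy genotype matrix (0/1/2)
y = X @ rng.normal(scale=0.05, size=1000) + rng.normal(size=413)  # synthetic phenotype

# Ridge with alpha = sigma_e^2 / sigma_g^2 corresponds to RR-BLUP;
# alpha here is just a placeholder value.
model = Ridge(alpha=100.0).fit(X, y)
print(model.coef_[:5])   # per-SNP effect estimates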

13 Days to flower

14 Flag leaf length

15 Panicle length

