Introduction to Biostatistics and Bioinformatics: Regression and Correlation

1 Introduction to Biostatistics and Bioinformatics: Regression and Correlation

2 Learning Objectives: Regression (estimation of the relationship between variables): linear regression; assessing the assumptions; non-linear regression.

3 Learning Objectives: Regression (estimation of the relationship between variables): linear regression; assessing the assumptions; non-linear regression. Correlation: the correlation coefficient quantifies the association strength; sensitivity to the distribution.

4 Relationships (figure panels: Relationship; No Relationship)

5 Relationships (figure panels: Linear Relationship; Non-Linear Relationship)

6 Relationships (figure panels: Linear, Strong; Linear, Weak)

7 Linear Regression (figure panels: Linear, Strong; Linear, Weak; Non-Linear)

8 Linear Regression - Residuals (figure panels: Linear, Strong; Linear, Weak; Non-Linear; each shown with its residuals)

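9 Linear Regression Model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, where $Y_i$ is the dependent variable, $X_i$ the independent variable, $\beta_0$ the intercept, $\beta_1$ the slope, and $\varepsilon_i$ the random error. $\beta_0 + \beta_1 X_i$ is the linear component and $\varepsilon_i$ is the random error component.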

10 Linear Regression Assumptions The relationship between the variables is linear.

11 Linear Regression Assumptions The relationship between the variables is linear. Errors are independent, normally distributed with mean zero and constant variance.
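The assumptions on the errors can be checked on the residuals of a fit. The following is a minimal sketch on synthetic data, not part of the original slides: the seed, the sample size, the use of the Shapiro-Wilk test for normality, and the split-half spread ratio as a rough constant-variance check are all illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
x = np.linspace(-1, 1, 200)
y = 2.0 * x + 1.0 + 0.2 * rng.normal(size=x.size)  # synthetic linear data

fit = stats.linregress(x, y)
residuals = y - (fit.slope * x + fit.intercept)

# With an intercept in the model, least-squares residuals average to zero.
mean_residual = residuals.mean()

# Shapiro-Wilk test: a small p-value suggests non-normal residuals.
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Crude constant-variance check: compare the residual spread in the lower
# and upper halves of x. A ratio far from 1 hints at variable variance.
half = x.size // 2
spread_ratio = residuals[:half].std(ddof=1) / residuals[half:].std(ddof=1)

print(f"mean residual  = {mean_residual:.2e}")
print(f"Shapiro-Wilk p = {shapiro_p:.3f}")
print(f"spread ratio   = {spread_ratio:.2f}")
```

Plotting the residuals against x, as on the slides, remains the more informative check; the numbers above only summarize what such a plot shows.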

12 Linear Regression Assumptions (figure panels: Linear; Non-Linear; each shown with its residuals)

13 Linear Regression Assumptions (figure panels: Constant Variance; Variable Variance; each shown with its residuals)

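14 Linear Regression Model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, where $Y_i$ is the dependent variable, $X_i$ the independent variable, $\beta_0$ the intercept, $\beta_1$ the slope, and $\varepsilon_i$ the random error. $\beta_0 + \beta_1 X_i$ is the linear component and $\varepsilon_i$ is the random error component.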

15 Linear Regression – Estimating the Line: $\hat{Y}_i = b_0 + b_1 X_i$, where $\hat{Y}_i$ is the estimated value, $b_0$ the estimated intercept, $b_1$ the estimated slope, and $X_i$ the independent variable.

16 Least Squares Method: find the slope and intercept, given measurements $(X_i, Y_i)$, $i = 1, \dots, N$, that minimize the sum of the squares of the residuals.
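For a straight line, this minimization has a well-known closed-form solution: the slope is the sum of products of the deviations of X and Y from their means divided by the sum of squared deviations of X, and the intercept follows from the means. The sketch below implements it and compares the result with `stats.linregress`; the function name `least_squares_line` and the five data points are hypothetical, chosen only for illustration.

```python
import numpy as np
from scipy import stats


def least_squares_line(x, y):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope = S_xy / S_xx
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # the fitted line passes through the point of means
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Made-up measurements for illustration only
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

slope, intercept = least_squares_line(x, y)
reference = stats.linregress(x, y)  # should agree with the closed form

print(f"closed form: slope={slope:.3f}, intercept={intercept:.3f}")
print(f"linregress:  slope={reference.slope:.3f}, intercept={reference.intercept:.3f}")
```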

20 Linear Regression in Python

    import scipy.stats as stats

    # x, y: equal-length arrays of measurements
    slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

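21 Linear Regression Example (figure panels: Linear, Strong; Residuals)

    import numpy as np
    import matplotlib.pyplot as plt
    import scipy.stats as stats

    points = 100  # assumed sample size; the slide does not define it
    x = np.linspace(-1, 1, points)
    y = x + 0.1 * np.random.normal(size=points)  # low noise: strong relationship
    slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
    y_line = slope * x + intercept

    # Scatter plot of the data with the fitted line
    fig, ax1 = plt.subplots(1, figsize=(4, 4))
    ax1.scatter(x, y, color='#4D0132', lw=0, s=60)
    ax1.set_xlim([-1.5, 1.5])
    ax1.set_ylim([-1.5, 1.5])
    ax1.plot(x, y_line, color='red', lw=2)
    fig.savefig('linear.png')

    # Residuals: observed values minus fitted values
    fig, ax1 = plt.subplots(1, figsize=(4, 4))
    ax1.scatter(x, y - y_line, color='#963725', lw=0, s=60)
    ax1.set_xlim([-1.5, 1.5])
    ax1.set_ylim([-1.5, 1.5])
    fig.savefig('linear-residuals.png')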

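22 Linear Regression Example (figure panels: Linear, Weak; Residuals)

    import numpy as np
    import matplotlib.pyplot as plt
    import scipy.stats as stats

    points = 100  # assumed sample size; the slide does not define it
    x = np.linspace(-1, 1, points)
    y = x + 0.4 * np.random.normal(size=points)  # more noise: weak relationship
    slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
    y_line = slope * x + intercept

    # Scatter plot of the data with the fitted line
    fig, ax1 = plt.subplots(1, figsize=(4, 4))
    ax1.scatter(x, y, color='#4D0132', lw=0, s=60)
    ax1.set_xlim([-1.5, 1.5])
    ax1.set_ylim([-1.5, 1.5])
    ax1.plot(x, y_line, color='red', lw=2)
    fig.savefig('linear-weak.png')

    # Residuals: observed values minus fitted values
    fig, ax1 = plt.subplots(1, figsize=(4, 4))
    ax1.scatter(x, y - y_line, color='#963725', lw=0, s=60)
    ax1.set_xlim([-1.5, 1.5])
    ax1.set_ylim([-1.5, 1.5])
    fig.savefig('linear-weak-residuals.png')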

23 Linear Regression Example (figure label: Outlier)

24 Regression – Non-linear data. Solution 1: Transformation. Solution 2: Non-linear Regression.
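Both solutions can be sketched on the same synthetic data. The example below assumes an exponential relationship with multiplicative noise, which is an illustrative choice and not taken from the slides. Solution 1 log-transforms y and fits a straight line; Solution 2 fits the curve directly with `scipy.optimize.curve_fit`. The model function `exponential` and the starting guess `p0` are hypothetical.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)  # fixed seed for reproducibility
x = np.linspace(0.0, 2.0, 50)
# Synthetic data: y = 3 * exp(1.5 * x) with multiplicative noise
y = 3.0 * np.exp(1.5 * x) * np.exp(0.05 * rng.normal(size=x.size))

# Solution 1: transformation. log(y) = log(a) + b * x is linear in x.
log_fit = stats.linregress(x, np.log(y))
a_transformed = np.exp(log_fit.intercept)
b_transformed = log_fit.slope


# Solution 2: non-linear regression on the untransformed data.
def exponential(x, a, b):
    return a * np.exp(b * x)


(a_direct, b_direct), _ = curve_fit(exponential, x, y, p0=(1.0, 1.0))

print(f"transformation: a={a_transformed:.2f}, b={b_transformed:.2f}")
print(f"non-linear fit: a={a_direct:.2f}, b={b_direct:.2f}")
```

The two approaches weight the data differently: the transformation minimizes squared errors on the log scale, while the direct fit minimizes them on the original scale, so the estimates agree only approximately.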

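25 Correlation Coefficient: a measure of the correlation between two variables that quantifies the association strength. Pearson correlation coefficient: $r = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}}$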
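The Pearson coefficient is the sum of products of the deviations of the two variables from their means, divided by the square root of the product of their sums of squared deviations. A minimal sketch that computes it by hand and compares it with `stats.pearsonr` follows; the function name `pearson_r` and the data are hypothetical.

```python
import numpy as np
from scipy import stats


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))


# Made-up measurements for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

r_manual = pearson_r(x, y)
r_scipy, p_value = stats.pearsonr(x, y)

print(f"manual r = {r_manual:.4f}, scipy r = {r_scipy:.4f}")
```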

26 Correlation Coefficient

31 Source: Wikipedia

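32 Coefficient of Variation: with variance $s^2$ and sample mean $\bar{X}$, the coefficient of variation is $\mathrm{CV} = s / \bar{X}$, the standard deviation relative to the mean.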
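As a small worked example, the coefficient of variation can be computed as the standard deviation divided by the sample mean. The sketch assumes the sample standard deviation (`ddof=1`); the slide does not say which estimator it uses, and the replicate values are made up.

```python
import numpy as np


def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    values = np.asarray(values, dtype=float)
    # ddof=1 gives the sample (N - 1) standard deviation; this is an assumption
    return values.std(ddof=1) / values.mean()


# Hypothetical replicate measurements
replicates = np.array([9.8, 10.1, 10.0, 10.3, 9.8])
cv = coefficient_of_variation(replicates)

print(f"CV = {100 * cv:.2f}%")
```

Because the CV is dimensionless, it allows the spread of measurements on different scales to be compared; it is only meaningful when the mean is well away from zero.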

33 Correlation Coefficient and CV (figure panel: Uniform distribution)

34 Correlation Coefficient and CV (figure panels: Uniform distribution; Normal distribution; Lognormal distribution)

35 Correlation Coefficient - Outliers (figure label: Outlier)
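The sensitivity to outliers can be illustrated with a constructed example. The data below are artificial: twenty points arranged so that their Pearson correlation is exactly zero, plus one hypothetical outlier at (10, 10).

```python
import numpy as np
from scipy import stats

# Four corner points repeated five times: x and y are uncorrelated by design.
x = np.array([-1.0, -1.0, 1.0, 1.0] * 5)
y = np.array([-1.0, 1.0, -1.0, 1.0] * 5)
r_without, _ = stats.pearsonr(x, y)

# Add a single extreme point far from the rest of the data.
x_out = np.append(x, 10.0)
y_out = np.append(y, 10.0)
r_with, _ = stats.pearsonr(x_out, y_out)

print(f"r without outlier = {r_without:.3f}")
print(f"r with outlier    = {r_with:.3f}")
```

A single point is enough to turn no correlation into an apparently strong one, which is why a scatter plot should accompany any reported correlation coefficient.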

36 Correlation Coefficient – Non-linear. Solutions: transformation; rank correlation (Spearman, r=0.93).
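Both solutions can be tried on a monotonic but strongly non-linear relationship. The sketch uses noise-free synthetic exponential data, an illustrative assumption, so the values differ from the r=0.93 on the slide, which belongs to the slide's own figure.

```python
import numpy as np
from scipy import stats

x = np.linspace(0.0, 5.0, 30)
y = np.exp(2.0 * x)  # monotonic, strongly non-linear, no noise

# Pearson measures linear association and is pulled down by the curvature.
r_pearson, _ = stats.pearsonr(x, y)

# Solution: rank correlation. Spearman uses ranks, so any strictly
# increasing relationship gives a coefficient of 1.
rho_spearman, _ = stats.spearmanr(x, y)

# Solution: transformation. log(y) is linear in x, so Pearson works again.
r_log, _ = stats.pearsonr(x, np.log(y))

print(f"Pearson r            = {r_pearson:.3f}")
print(f"Spearman rho         = {rho_spearman:.3f}")
print(f"Pearson r on log(y)  = {r_log:.3f}")
```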

37 Correlation Coefficient and p-value. Hypothesis: Is there a correlation? (figure panels annotated with the correlation coefficient r and the p-value p)
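In SciPy, `stats.pearsonr` returns both the coefficient and a p-value for the null hypothesis that there is no linear correlation. The sketch below contrasts a related and an unrelated pair of variables; the data are synthetic, and the seed, sample size, and noise level are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)  # fixed seed for reproducibility
x = rng.normal(size=50)
y_related = x + 0.5 * rng.normal(size=50)  # depends on x
y_unrelated = rng.normal(size=50)          # generated independently of x

r_related, p_related = stats.pearsonr(x, y_related)
r_unrelated, p_unrelated = stats.pearsonr(x, y_unrelated)

print(f"related:   r = {r_related:.2f}, p = {p_related:.2g}")
print(f"unrelated: r = {r_unrelated:.2f}, p = {p_unrelated:.2g}")
```

A small p-value says only that a correlation this large would be unlikely if the variables were uncorrelated; it does not measure the strength of the association, and with large samples even a weak correlation can be significant.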

38 Application: Analytical Measurements (figure axes: Theoretical Concentration, Measured Concentration)

39 A Few Characteristics of Analytical Measurements
Accuracy: Closeness of agreement between a test result and an accepted reference value.
Precision: Closeness of agreement between independent test results.
Robustness: Test precision given small, deliberate changes in test conditions (preanalytic delays, variations in storage temperature).
Lower limit of detection: The lowest amount of analyte that is statistically distinguishable from background or a negative control.
Limit of quantification: Lowest and highest concentrations of analyte that can be quantitatively determined with suitable precision and accuracy.
Linearity: The ability of the test to return values that are directly proportional to the concentration of the analyte in the sample.
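Three of these characteristics can be estimated from replicate measurements at known concentrations. The sketch below uses hypothetical triplicate data and common summary choices that the slides do not prescribe: percent bias of the mean for accuracy, percent coefficient of variation across replicates for precision, and the r-squared of a straight-line fit for linearity.

```python
import numpy as np
from scipy import stats

# Hypothetical calibration data: theoretical concentration per level
theoretical = np.array([1.0, 2.0, 5.0, 10.0, 20.0])
# Hypothetical triplicate measurements at each level (rows match levels)
measured = np.array([
    [1.1, 0.9, 1.0],
    [2.1, 2.0, 1.9],
    [5.2, 4.9, 5.1],
    [10.3, 9.8, 10.1],
    [19.6, 20.5, 20.2],
])

level_means = measured.mean(axis=1)

# Accuracy: percent bias of the mean measurement from the reference value
bias_percent = 100.0 * (level_means - theoretical) / theoretical

# Precision: percent CV of the replicates at each level
cv_percent = 100.0 * measured.std(axis=1, ddof=1) / level_means

# Linearity: fit measured against theoretical concentration
fit = stats.linregress(theoretical, level_means)
r_squared = fit.rvalue ** 2

for conc, bias, cv in zip(theoretical, bias_percent, cv_percent):
    print(f"conc {conc:5.1f}: bias {bias:+.1f}%, CV {cv:.1f}%")
print(f"slope = {fit.slope:.3f}, r^2 = {r_squared:.4f}")
```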

40 Limit of Detection and Linearity (figure axes: Theoretical Concentration, Measured Concentration)

41 Precision and Accuracy (figure axes: Theoretical Concentration, Measured Concentration)

42 Summary - Regression Source: http://xkcdsw.com/content/img/2274.png

43 Summary - Correlation

44 Next Lecture: Experimental Design & Analysis (image: "Experimental Design" by Christine Ambrosino, www.hawaii.edu/fishlab/Nearside.htm)

