BIOSTATISTICS Explorative data analysis. Box plot QQ plot Classification analysis Copyright ©2012, Joanna Szyda INTRODUCTION.

1 BIOSTATISTICS Explorative data analysis

2 INTRODUCTION
- Box plot
- QQ plot
- Classification analysis

3 Explorative data analysis / Confirmatory data analysis

IND   P.0      P.132    P.265    P.397     P.530
346   0.2999   1.3938   4.047    8.9365    14.4663
347   0.4265   1.9578   6.6809   15.9458   27.3269
348   0.4991   2.0284   6.0664   13.7166   22.7103
349   0.1739   1.2515   4.4695   11.0793   18.7735
350   0.3712   1.8365   5.9575   14.4277   23.8408
351   0.2727   1.3336   3.9884   8.7238    14.138
352   1.1542   3.7294   9.8721   20.2459   32.292
353   0.3175   1.7614   5.678    13.824    22.7556
354   0.1726   1.2156   4.464    11.2814   19.679
355   0.6935   2.8703   8.4873   19.1791   30.8544
356   0.5498   2.3433   7.2887   17.2022   28.4123
357   0.7276   2.5778   7.4177   16.2656   25.7423
358   0.5879   2.3876   7.0633   17.2328   28.7312
359   0.4806   2.339    7.7452   18.9444   31.8284
360   0.481    2.2166   7.087    17.0398   27.9577
361   0.2769   1.66     5.6707   14.9897   25.8092
362   0.7281   2.6245   7.3139   16.0735   26.359
363   0.3418   1.6791   5.6198   13.568    22.6985
364   0.3764   1.7024   5.2701   12.5866   21.5353
365   0.5849   2.1908   6.2308   13.3812   21.5758

4 CONFIRMATORY DATA ANALYSIS
- formulate a hypothesis
- determine the maximum type I error
- select and calculate a statistical test
- calculate the type I error
- make a decision on the hypothesis
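
The five steps of confirmatory analysis can be sketched in code. The example below is only an illustration under stated assumptions: the paired differences are hypothetical (they do not come from the slides), the sign test is one possible choice of test, and "calculate the type I error" is read here as computing the P value of the test.

```python
from math import comb

# Hypothetical paired differences (e.g. treatment minus control); not from the slides.
differences = [0.8, 1.1, 0.3, -0.2, 0.9, 1.4, 0.6, 0.7, 1.0, 0.5]

# Step 1: hypothesis. H0: positive and negative differences are equally likely.
# Step 2: maximum type I error, fixed before looking at the data.
alpha = 0.05

# Step 3: test statistic of the sign test = number of positive differences.
n = len(differences)
positives = sum(d > 0 for d in differences)


def sign_test_p_value(positives: int, n: int) -> float:
    """Two-sided exact binomial P value under H0 (success probability 0.5)."""
    extreme = max(positives, n - positives)
    tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Step 4: type I error observed for this data set (assumed to mean the P value).
p_value = sign_test_p_value(positives, n)

# Step 5: decision on the hypothesis.
reject_h0 = p_value <= alpha
print(f"positives={positives}/{n}, P={p_value:.4f}, reject H0: {reject_h0}")
```

The maximum type I error (alpha) is chosen before the test is calculated; the decision in the last step only compares the P value with it.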

5 EXPLORATORY DATA ANALYSIS
John Tukey
- no preassumed hypothesis
- use of various analytical tools:
  o statistical
  o graphical
- exploration of data structure
- identification of the important variables
- identification of outliers

6 EXAMPLES OF EXPLORATORY DATA ANALYSIS

7 BOX PLOT - 5 number data summary

8 BOX PLOT - 5 number data summary
- minimum
- 1st quartile: 25% of the data lie below it
- median: 50% of the data lie below it
- 3rd quartile: 75% of the data lie below it
- maximum
- outlier
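
A minimal sketch of the 5 number summary, using the P.530 column of the table on slide 3 as input. The slide does not say how outliers are defined; the 1.5 x IQR rule used below is a common box plot convention and is an assumption here, as is the "inclusive" quantile method.

```python
from statistics import quantiles

# P.530 column of the table on slide 3 (individuals 346 to 365).
p530 = [
    14.4663, 27.3269, 22.7103, 18.7735, 23.8408, 14.138, 32.292,
    22.7556, 19.679, 30.8544, 28.4123, 25.7423, 28.7312, 31.8284,
    27.9577, 25.8092, 26.359, 22.6985, 21.5353, 21.5758,
]


def five_number_summary(values):
    """Return (minimum, 1st quartile, median, 3rd quartile, maximum)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def outliers(values, whisker=1.5):
    """Values further than whisker * IQR from the quartiles (assumed rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]


summary = five_number_summary(p530)
print("min, Q1, median, Q3, max:", summary)
print("outliers:", outliers(p530))
```

Other quantile definitions give slightly different quartiles for small samples, so a box drawn by a statistics package may not match these numbers exactly.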

9 EXAMPLES - box plot

10 Quantile:Quantile plot - comparing distributions
[Figure: QQ plot; axes: distribution 1 quantiles, distribution 2 quantiles]

11 Q:Q plot - comparing distributions
QQ plot of SNP effects
- comparing:
  o a theoretical Normal distribution N
  o the observed distribution
- interpretation:
  o points on the y=x line → distributions are equal
  o steep line → the Normal distribution has lower variance than the observed distribution
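
How the points of such a plot can be obtained is sketched below, under assumptions: the "effects" are simulated values with standard deviation 2 (not the SNP effects of the slide), the reference distribution is the standard Normal N(0, 1), the plotting positions (i - 0.5) / n are one common convention, and the function names are illustrative.

```python
import random
from statistics import NormalDist, fmean


def qq_points(sample, reference=NormalDist()):
    """Pair each sorted observation with the matching theoretical quantile."""
    n = len(sample)
    # Plotting positions (i - 0.5) / n: an assumed, commonly used convention.
    theoretical = [reference.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(sample)))


def qq_slope(points):
    """Least-squares slope of observed quantiles on theoretical quantiles."""
    xs, ys = zip(*points)
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Simulated effects with standard deviation 2; hypothetical, not slide data.
rng = random.Random(1)
effects = [rng.gauss(0.0, 2.0) for _ in range(500)]

points = qq_points(effects)  # (theoretical quantile, observed quantile) pairs
slope = qq_slope(points)
print(f"slope of the QQ line: {slope:.2f}")
```

With the observed quantiles on the vertical axis, a slope above 1 corresponds to the "steep line" case: the observed values are more spread out than the reference Normal distribution.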

12 Q:Q plot - comparing distributions
QQ plot of SNP effects
Comparison of 2 distributions
Interpretation?

13 CLASSIFICATION ANALYSIS

14 CLASSIFICATION METHODS - k nearest neighbors
1. Classification of observations = allocation of observations to a group
2. Classification based on some variables
   Training data set = known classification
   Test data set = unknown classification
3. E.g. taxonomy of organisms on the basis of measurements
   Classification of irises based on flower shape: Iris setosa, Iris versicolor

15 CLASSIFICATION METHODS - k nearest neighbors
Training data set (Iris setosa, Iris versicolor)

sepal length   sepal width   species
5.1            3.5           Iris-setosa
4.9            3             Iris-setosa
4.7            3.2           Iris-setosa
4.6            3.1           Iris-setosa
5              3.6           Iris-setosa
5.4            3.9           Iris-setosa
4.6            3.4           Iris-setosa
5              3.4           Iris-setosa
4.4            2.9           Iris-setosa
4.9            3.1           Iris-setosa
7              3.2           Iris-versicolor
6.4            3.2           Iris-versicolor
6.9            3.1           Iris-versicolor
5.5            2.3           Iris-versicolor
6.5            2.8           Iris-versicolor
5.7            2.8           Iris-versicolor
6.3            3.3           Iris-versicolor
4.9            2.4           Iris-versicolor
6.6            2.9           Iris-versicolor
5.2            2.7           Iris-versicolor
5              2             Iris-versicolor
5.9            3             Iris-versicolor
6              2.2           Iris-versicolor
6.1            2.9           Iris-versicolor

16 CLASSIFICATION METHODS - k nearest neighbors
Training data set: the same 24 observations as on slide 15 (Iris setosa, Iris versicolor)

Test data set
sepal length   sepal width   species
5              2.4           ???
4.9            2.6           ???

17 CLASSIFICATION METHODS - k nearest neighbors
Training data set, k=8
(distance = squared Euclidean distance to the test observation 5, 2.4)

sepal length   sepal width   species           distance   nearest neighbors
5.1            3.5           Iris-setosa       1.22
4.9            3             Iris-setosa       0.37       Iris-setosa
4.7            3.2           Iris-setosa       0.73
4.6            3.1           Iris-setosa       0.65
5              3.6           Iris-setosa       1.44
5.4            3.9           Iris-setosa       2.41
4.6            3.4           Iris-setosa       1.16
5              3.4           Iris-setosa       1
4.4            2.9           Iris-setosa       0.61       Iris-setosa
4.9            3.1           Iris-setosa       0.5        Iris-setosa
7              3.2           Iris-versicolor   4.64
6.4            3.2           Iris-versicolor   2.6
6.9            3.1           Iris-versicolor   4.1
5.5            2.3           Iris-versicolor   0.26       Iris-versicolor
6.5            2.8           Iris-versicolor   2.41
5.7            2.8           Iris-versicolor   0.65       Iris-versicolor
6.3            3.3           Iris-versicolor   2.5
4.9            2.4           Iris-versicolor   0.01       Iris-versicolor
6.6            2.9           Iris-versicolor   2.81
5.2            2.7           Iris-versicolor   0.13       Iris-versicolor
5              2             Iris-versicolor   0.16       Iris-versicolor
5.9            3             Iris-versicolor   1.17
6              2.2           Iris-versicolor   1.04
6.1            2.9           Iris-versicolor   1.46

Test data set
sepal length   sepal width   species
5              2.4           ??? = Iris-versicolor
4.9            2.6           ???

18 CLASSIFICATION METHODS - k nearest neighbors
Training data set, k=8
(distance = squared Euclidean distance to the test observation 4.9, 2.6)

sepal length   sepal width   species           distance   nearest neighbors
5.1            3.5           Iris-setosa       0.85
4.9            3             Iris-setosa       0.16       Iris-setosa
4.7            3.2           Iris-setosa       0.4        Iris-setosa
4.6            3.1           Iris-setosa       0.34       Iris-setosa
5              3.6           Iris-setosa       1.01
5.4            3.9           Iris-setosa       1.94
4.6            3.4           Iris-setosa       0.73
5              3.4           Iris-setosa       0.65
4.4            2.9           Iris-setosa       0.34       Iris-setosa
4.9            3.1           Iris-setosa       0.25       Iris-setosa
7              3.2           Iris-versicolor   4.77
6.4            3.2           Iris-versicolor   2.61
6.9            3.1           Iris-versicolor   4.25
5.5            2.3           Iris-versicolor   0.45
6.5            2.8           Iris-versicolor   2.6
5.7            2.8           Iris-versicolor   0.68
6.3            3.3           Iris-versicolor   2.45
4.9            2.4           Iris-versicolor   0.04       Iris-versicolor
6.6            2.9           Iris-versicolor   2.98
5.2            2.7           Iris-versicolor   0.1        Iris-versicolor
5              2             Iris-versicolor   0.37       Iris-versicolor
5.9            3             Iris-versicolor   1.16
6              2.2           Iris-versicolor   1.37
6.1            2.9           Iris-versicolor   1.53

Test data set
sepal length   sepal width   species
5              2.4           ??? = Iris-versicolor
4.9            2.6           ??? = Iris-setosa
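
The worked example on slides 17 and 18 can be reproduced with a short sketch. It assumes that the distance is the squared Euclidean distance over the two sepal measurements (this matches the distance values on the slides) and that the class is decided by a simple majority vote; the function names are illustrative.

```python
from collections import Counter

# Training data from slide 15: (sepal length, sepal width, species).
TRAINING = [
    (5.1, 3.5, "Iris-setosa"), (4.9, 3.0, "Iris-setosa"),
    (4.7, 3.2, "Iris-setosa"), (4.6, 3.1, "Iris-setosa"),
    (5.0, 3.6, "Iris-setosa"), (5.4, 3.9, "Iris-setosa"),
    (4.6, 3.4, "Iris-setosa"), (5.0, 3.4, "Iris-setosa"),
    (4.4, 2.9, "Iris-setosa"), (4.9, 3.1, "Iris-setosa"),
    (7.0, 3.2, "Iris-versicolor"), (6.4, 3.2, "Iris-versicolor"),
    (6.9, 3.1, "Iris-versicolor"), (5.5, 2.3, "Iris-versicolor"),
    (6.5, 2.8, "Iris-versicolor"), (5.7, 2.8, "Iris-versicolor"),
    (6.3, 3.3, "Iris-versicolor"), (4.9, 2.4, "Iris-versicolor"),
    (6.6, 2.9, "Iris-versicolor"), (5.2, 2.7, "Iris-versicolor"),
    (5.0, 2.0, "Iris-versicolor"), (5.9, 3.0, "Iris-versicolor"),
    (6.0, 2.2, "Iris-versicolor"), (6.1, 2.9, "Iris-versicolor"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two measurement tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_classify(point, training, k):
    """Return (predicted species, votes) from the k nearest training rows."""
    ranked = sorted(training, key=lambda row: squared_distance(point, row[:2]))
    votes = Counter(species for _, _, species in ranked[:k])
    return votes.most_common(1)[0][0], dict(votes)


print(knn_classify((4.9, 2.6), TRAINING, k=8))
print(knn_classify((5.0, 2.4), TRAINING, k=7))
```

For the test observation (5, 2.4) the eighth-nearest neighbour is a tie: one setosa row and one versicolor row both have distance 0.65. Slide 17 counts the versicolor row (5 votes to 3); counting the setosa row instead would give 4 votes each. The sketch defines no tie-breaking rule, which is why the usage example takes k=7 for that observation.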

19 CLASSIFICATION METHODS - k nearest neighbors
IRISES - FULL DATA SET
- categories: I. setosa, I. versicolor, I. virginica
- 150 individuals
- decision areas based on petal width and petal length

20 EDA
- Box plot
- QQ plot
- Classification methods

