BIOSTATISTICS Explorative data analysis. Box plot QQ plot Classification analysis Copyright ©2012, Joanna Szyda INTRODUCTION.

Presentation transcript:

BIOSTATISTICS Explorative data analysis

INTRODUCTION
Box plot
QQ plot
Classification analysis

Explorative data analysis
Confirmatory data analysis
[Data table; column headers as extracted: IND, P.0, P.132, P.265, P.397, P…]

CONFIRMATORY DATA ANALYSIS
1. Formulate a hypothesis
2. Determine the maximum type I error
3. Select and calculate a statistical test
4. Calculate the type I error
5. Decide on the hypothesis
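The five steps above can be sketched in code. This is a minimal illustration, not the test used in the lecture: it assumes a two-sided permutation test for a difference in means, reads "calculate the type I error" as computing the P value, and uses hypothetical `treated` and `control` measurements. The function name `permutation_test` and the threshold `alpha = 0.05` are likewise assumptions.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference and an estimated P value: the share of
    random relabelings whose difference is at least as extreme as observed.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (extreme + 1) / (n_permutations + 1)


# Step 1: hypothesis H0 = both groups have the same mean (hypothetical data).
treated = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5]
control = [4.2, 4.4, 4.0, 4.6, 4.3, 4.1]
# Step 2: maximum type I error, fixed before looking at the result.
alpha = 0.05
# Steps 3 and 4: calculate the test and its P value.
difference, p_value = permutation_test(treated, control)
# Step 5: decision on the hypothesis.
reject_h0 = p_value <= alpha
print(f"difference={difference:.2f}, P={p_value:.4f}, reject H0: {reject_h0}")
```

The point of the ordering is that `alpha` is chosen before the P value is known; only the final comparison depends on the data.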

EXPLORATORY DATA ANALYSIS
John Tukey
no preassumed hypothesis
use of various analytical tools:
  o statistical
  o graphical
exploration of data structure
identification of the important variables
identification of outliers

EXAMPLES OF EXPLORATORY DATA ANALYSIS

BOX PLOT - 5 number data summary
minimum
1st quartile: 25% of the data lie below it
median: 50% of the data lie below it
3rd quartile: 75% of the data lie below it
maximum
outlier
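A box plot is drawn from these five numbers, so computing them directly may help. The sketch below makes two assumptions that the slide does not state: quartiles are computed with the "inclusive" interpolation rule of Python's `statistics.quantiles` (other quartile conventions give slightly different values), and outliers are flagged with Tukey's common 1.5 × IQR rule. The `sample` values are hypothetical.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, 1st quartile, median, 3rd quartile, maximum)."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


def tukey_outliers(values, whisker=1.5):
    """Values lying more than whisker * IQR outside the quartile box."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample with one unusually large value.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 10, 30]
print(five_number_summary(sample))  # (2, 4.25, 6.5, 8.75, 30)
print(tukey_outliers(sample))       # [30]
```

Here the interquartile range is 8.75 - 4.25 = 4.5, so the upper fence is 8.75 + 1.5 × 4.5 = 15.5 and the value 30 is flagged.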

EXAMPLES - box plot

Quantile:Quantile plot – comparing distributions
[Plot axes: distribution 1 quantiles, distribution 2 quantiles]

Q:Q plot – comparing distributions
QQ plot of SNP effects
comparing
  − a theoretical distribution N
  − observed distribution
interpretation
  − points on the y=x line → distributions are equal
  − steep line → Normal distribution has lower variance
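The interpretation rules above can be checked numerically. The sketch below is an illustration under stated assumptions: theoretical N(0, 1) quantiles go on the x axis and observed quantiles on the y axis, plotting positions are (i - 0.5)/n, and the "observed" effects are hypothetical values simulated from a Normal distribution with standard deviation 2 rather than real SNP effects. The helper names `qq_points` and `qq_slope` are made up for this example.

```python
from statistics import NormalDist, mean


def qq_points(observed, reference=NormalDist(0.0, 1.0)):
    """Pair each sorted observation with the matching reference quantile."""
    data = sorted(observed)
    n = len(data)
    # Positions (i - 0.5) / n avoid the infinite 0% and 100% quantiles.
    theoretical = [reference.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, data))


def qq_slope(points):
    """Least-squares slope of observed quantiles on theoretical quantiles."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical "observed" effects: Normal with mean 0 and standard deviation 2.
effects = NormalDist(0.0, 2.0).samples(500, seed=1)
points = qq_points(effects)
slope = qq_slope(points)
print(f"slope of the QQ line: {slope:.2f}")
```

A slope near 1 would put the points on the y=x line. A slope near 2, as expected here, is the "steep line" case: the theoretical N(0, 1) distribution has lower variance than the observed values.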

Q:Q plot – comparing distributions
QQ plot of SNP effects
Comparison of 2 distributions
Interpretation?

CLASSIFICATION ANALYSIS

CLASSIFICATION METHODS - k nearest neighbors
1. Classification of observations = allocation of observations to a group
2. Classification based on some variables
   Training data set = known classification
   Test data set = unknown classification
3. E.g.
   Taxonomy of organisms on the basis of measurements
   Classification of irises based on flower shape
[Images: Iris setosa, Iris versicolor]

CLASSIFICATION METHODS - k nearest neighbors
[Images: Iris setosa, Iris versicolor]
Training data set (Iris-setosa and Iris-versicolor rows; values are legible only for the rows below)
sepal length   sepal width   species
4.9            3             Iris-setosa
5              3.6           Iris-setosa
5              3.4           Iris-setosa
7              3.2           Iris-versicolor
6              2.2           Iris-versicolor

CLASSIFICATION METHODS - k nearest neighbors
[Images: Iris setosa, Iris versicolor]
Training data set: sepal length, sepal width, species (Iris-setosa and Iris-versicolor rows, as on the previous slide)
Test data set
sepal length    sepal width     species
5               2.4             ???
(not legible)   (not legible)   ???

CLASSIFICATION METHODS - k nearest neighbors
Training data set, k=8
columns: sepal length, sepal width, species, distance, nearest neighbors
legible distances to the first test observation:
  Iris-setosa rows: 0.37, 0.61, 0.5
  Iris-versicolor rows: 0.26, 0.65, 0.01, 0.13, 1.46
Test data set
sepal length    sepal width     species
5               2.4             ??? = Iris-versicolor
(not legible)   (not legible)   ???

CLASSIFICATION METHODS - k nearest neighbors
Training data set, k=8
columns: sepal length, sepal width, species, distance, nearest neighbors
legible distances to the second test observation:
  Iris-setosa rows: 0.16, 0.4, 0.34, 0.34, 0.25
  Iris-versicolor rows: 0.04, 0.1, 1.53
Test data set
sepal length    sepal width     species
5               2.4             ??? = Iris-versicolor
(not legible)   (not legible)   ??? = Iris-setosa
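The procedure shown on these slides (compute the distance from a test observation to every training observation, keep the k nearest, assign the majority species) can be sketched as follows. The training values below are hypothetical and are not the slide's table. Squared Euclidean distance on sepal length and sepal width is an assumption; it ranks neighbors exactly as plain Euclidean distance does. The function name `knn_classify` is made up for this example.

```python
from collections import Counter


def knn_classify(training, query, k):
    """Assign `query` the majority species among its k nearest neighbors.

    `training` is a list of ((sepal_length, sepal_width), species) pairs.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    ranked = sorted(training, key=lambda row: squared_distance(row[0], query))
    votes = Counter(species for _, species in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical measurements for illustration only.
training = [
    ((5.0, 3.5), "Iris-setosa"),
    ((4.8, 3.1), "Iris-setosa"),
    ((5.2, 3.7), "Iris-setosa"),
    ((4.6, 3.3), "Iris-setosa"),
    ((6.5, 2.9), "Iris-versicolor"),
    ((5.6, 2.5), "Iris-versicolor"),
    ((6.1, 2.8), "Iris-versicolor"),
    ((5.4, 2.3), "Iris-versicolor"),
]

print(knn_classify(training, (5.0, 2.4), k=3))  # Iris-versicolor
print(knn_classify(training, (4.9, 3.4), k=3))  # Iris-setosa
```

With two classes, an odd k such as 3 rules out tied votes. With an even k such as the slides' k=8 a tie is possible, and this sketch would then return whichever tied species appears first among the nearest neighbors.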

CLASSIFICATION METHODS - k nearest neighbors
IRISES – FULL DATA SET
categories: I. setosa, I. versicolor, I. virginica
150 individuals
decision areas based on petal width and petal length

EDA
Box plot
QQ plot
Classification methods