Chapter 16 Exploratory data analysis: numerical summaries CIS 2033 Based on Textbook: A Modern Introduction to Probability and Statistics. 2007 Instructor:

1 Chapter 16 Exploratory data analysis: numerical summaries
CIS 2033. Based on the textbook A Modern Introduction to Probability and Statistics (2007).
Instructor: Dr. Longin Jan Latecki. Slides: Quincy R. Walker.

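2 16.1 The Center of the Data Set
The center of a dataset $x_1, x_2, \ldots, x_n$ is summarized by the sample mean and the sample median, where $n$ is the sample size.
Sample mean:
$$\bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Sample median $\mathrm{Med}_n$: the middle value of the sorted data when $n$ is odd, and the average of the two middle values when $n$ is even.
Example: the following $n = 11$ observations have sample mean $492/11 \approx 44.7$ and sample median 42:
43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41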
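
The example can be checked with Python's standard-library `statistics` module. This is a minimal sketch; the variable names are illustrative. For these 11 values the median is the 6th ordered value.

```python
from statistics import mean, median

# The 11 observations from the slide
data = [43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41]

# Sample mean: sum of the observations divided by the sample size n
x_bar = mean(data)

# Sample median: middle value of the sorted data (n = 11 is odd)
med = median(data)

print(f"n = {len(data)}, mean = {x_bar:.1f}, median = {med}")
# prints: n = 11, mean = 44.7, median = 42
```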

3 Outliers
An outlier is an observation that is numerically distant from the rest of the data.
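
One practical consequence of an outlier is that the sample mean is sensitive to it while the sample median is not. The sketch below is a hypothetical what-if: it replaces one of the slide's 58s with 580 (a value invented purely for illustration) and compares both summaries.

```python
from statistics import mean, median

data = [43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41]

# Hypothetical: turn one observation into an extreme outlier
with_outlier = data.copy()
with_outlier[with_outlier.index(58)] = 580

for label, xs in [("original", data), ("with outlier", with_outlier)]:
    # The mean follows the extreme value; the median stays at the middle of the sorted data
    print(f"{label:>12}: mean = {mean(xs):6.2f}, median = {median(xs)}")
```

The mean roughly doubles, while the median remains 42 because the 6th ordered value does not change.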

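4 Variability in a Data Set
Sample variance, where $n$ is the number of observations and $\bar{x}_n$ is the sample mean:
$$s_n^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x}_n)^2$$
Sample standard deviation:
$$s_n = \sqrt{s_n^2}$$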
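
The formulas can be checked numerically. This sketch computes $s_n^2$ directly from the definition for the slide's data and compares it with `statistics.variance` and `statistics.stdev`, which also divide by n − 1 (whereas `statistics.pvariance` divides by n).

```python
import math
from statistics import mean, stdev, variance

data = [43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41]
n = len(data)
x_bar = mean(data)

# Sample variance from the definition: squared deviations summed, divided by n - 1
s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)
# Sample standard deviation: square root of the sample variance
s = math.sqrt(s2)

# The standard library uses the same n - 1 convention
assert math.isclose(s2, variance(data))
assert math.isclose(s, stdev(data))

print(f"variance = {s2:.2f}, standard deviation = {s:.2f}")
```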

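5 Variability cont.
Median of absolute deviations (MAD): the median of the absolute deviations of the observations from the sample median $\mathrm{Med}_n$. The absolute deviation of a point $x_i$ is its distance from the median, $|x_i - \mathrm{Med}_n|$, so
$$\mathrm{MAD}(x_1, \ldots, x_n) = \mathrm{Med}\left(|x_1 - \mathrm{Med}_n|, \ldots, |x_n - \mathrm{Med}_n|\right)$$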
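
A minimal sketch of the MAD that follows the definition above, using `statistics.median` for both medians. The function name `mad` is our own choice, not a standard-library call. For the slide's data the median is 42 and nine of the eleven observations lie within 1 of it, so the MAD is 1; the two values of 58 do not pull it up.

```python
from statistics import median

def mad(xs):
    """Median of the absolute deviations of xs from their median."""
    med = median(xs)
    # Absolute deviation of each point: its distance from the median
    deviations = [abs(x - med) for x in xs]
    return median(deviations)

data = [43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41]
print(mad(data))  # prints 1
```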

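6 Empirical quantiles
The order statistics $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ consist of the same elements as the original dataset $x_1, x_2, \ldots, x_n$, but in ascending order; $x_{(k)}$ denotes the $k$th element of the ordered list. For $1/(n+1) \le p \le n/(n+1)$, let $k = \lfloor p(n+1) \rfloor$ be the integer part of $p(n+1)$ and $\alpha = p(n+1) - k$ its fractional part. Then the $p$th empirical quantile is
$$q_n(p) = x_{(k)} + \alpha\,\bigl(x_{(k+1)} - x_{(k)}\bigr)$$
The empirical quantile is the data counterpart of the $p$th quantile of a distribution with cumulative distribution function $F$, which (for continuous, strictly increasing $F$) is $q_p = F^{-1}(p)$.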
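
A direct Python translation of the empirical-quantile formula, as a sketch; `empirical_quantile` is a hypothetical helper name. As a cross-check, the standard library's `statistics.quantiles` with its default `method="exclusive"` also places quantiles at position p(n + 1) with linear interpolation, so its quartiles should agree.

```python
import math
from statistics import quantiles

def empirical_quantile(xs, p):
    """pth empirical quantile q_n(p) = x_(k) + alpha * (x_(k+1) - x_(k))."""
    ordered = sorted(xs)          # order statistics x_(1) <= ... <= x_(n)
    n = len(ordered)
    if not 1 / (n + 1) <= p <= n / (n + 1):
        raise ValueError("p must lie in [1/(n+1), n/(n+1)]")
    pos = p * (n + 1)
    # Integer part k (clamped to 1..n against floating-point round-off at the ends)
    k = min(max(math.floor(pos), 1), n)
    alpha = min(max(pos - k, 0.0), 1.0)   # fractional part
    x_k = ordered[k - 1]                  # x_(k); Python lists are 0-based
    if k == n:
        return float(x_k)
    return x_k + alpha * (ordered[k] - x_k)   # ordered[k] is x_(k+1)

data = [43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41]
quartiles = [empirical_quantile(data, p) for p in (0.25, 0.5, 0.75)]
print(quartiles)              # [41.0, 42.0, 43.0]
print(quantiles(data, n=4))   # same values from the standard library
```

Here n = 11, so p(n + 1) is a whole number for every quartile and no interpolation is needed; with n = 8, for example, p = 0.25 gives position 2.25 and the quantile lies a quarter of the way from $x_{(2)}$ to $x_{(3)}$.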

7 Quartiles
Lower quartile: $q_n(0.25)$
Median (middle quartile): $q_n(0.5)$
Upper quartile: $q_n(0.75)$
Interquartile range: $\mathrm{IQR} = q_n(0.75) - q_n(0.25)$
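
Once the quartiles are available, the IQR is a single subtraction. This sketch uses `statistics.quantiles`, whose default `method="exclusive"` uses the same p(n + 1) positions as $q_n(p)$; the function name `iqr` is illustrative.

```python
from statistics import quantiles

def iqr(xs):
    """Interquartile range q_n(0.75) - q_n(0.25)."""
    lower, _median, upper = quantiles(xs, n=4)  # three cut points: Q1, median, Q3
    return upper - lower

data = [43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41]
print(quantiles(data, n=4))  # [41.0, 42.0, 43.0]
print(iqr(data))             # 2.0
```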

8 The box-and-whisker plot
Advantages: a compact summary of the data that shows the median, the quartiles, and outliers in one display.
Disadvantages: it reduces the data to a few summary values, so it is a poor picture of the full dataset and can hide features such as multiple peaks; for a single dataset, a histogram or a kernel density estimate is more informative.
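
To make the summary concrete, this sketch computes what a boxplot draws. It assumes the common convention (Tukey's) in which the whiskers reach the most extreme observations within 1.5 × IQR of the box and anything beyond is drawn as an outlier; the slides do not state the rule, and plotting libraries differ in details. `boxplot_stats` is a hypothetical helper name.

```python
from statistics import quantiles

def boxplot_stats(xs, whis=1.5):
    """Box edges, median, whisker ends, and outliers under the whis * IQR rule."""
    q1, med, q3 = quantiles(xs, n=4)
    spread = q3 - q1
    lo_fence, hi_fence = q1 - whis * spread, q3 + whis * spread
    inside = [x for x in xs if lo_fence <= x <= hi_fence]
    return {
        "box": (q1, q3),
        "median": med,
        # Whiskers end at the most extreme observations inside the fences
        "whiskers": (min(inside), max(inside)),
        "outliers": sorted(x for x in xs if x < lo_fence or x > hi_fence),
    }

data = [43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41]
print(boxplot_stats(data))
```

For the slide's data the upper fence is 43 + 1.5 × 2 = 46, so both observations of 58 would be drawn as outliers.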

9 Using boxplots to compare several datasets
Boxplots become most useful when we want to compare several datasets in one simple graphical display: placed side by side, they make differences in center, spread, and outliers easy to see.
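
To stay self-contained without a plotting library, this sketch prints the five numbers a simple box-and-whisker display encodes (minimum, lower quartile, median, upper quartile, maximum), one row per dataset. The second dataset is invented illustrative data, not from the slides, and this simple summary uses the minimum and maximum rather than the 1.5 × IQR whisker rule.

```python
from statistics import quantiles

def five_numbers(xs):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    q1, med, q3 = quantiles(xs, n=4)
    return min(xs), q1, med, q3, max(xs)

datasets = {
    "slide data": [43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41],
    # Hypothetical second dataset, invented for illustration only
    "hypothetical B": [38, 45, 40, 47, 44, 39, 46, 42, 48, 41, 43],
}

# One row per dataset: the numbers a side-by-side boxplot would draw
print(f"{'dataset':<16}{'min':>6}{'Q1':>6}{'med':>6}{'Q3':>6}{'max':>6}")
for name, xs in datasets.items():
    row = "".join(f"{v:>6g}" for v in five_numbers(xs))
    print(f"{name:<16}{row}")
```

With matplotlib installed, passing a list of datasets to `Axes.boxplot` draws one box per dataset side by side, which is the graphical version of this table.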

