MAT 135 Introductory Statistics and Data Analysis Adjunct Instructor

Presentation transcript:

MAT 135 Introductory Statistics and Data Analysis Adjunct Instructor Kenneth R. Martin Lecture 7 October 12, 2016 Confidential - Kenneth R. Martin

Agenda: Housekeeping; Readings; Exam #1 review; Chapters 1, 14, 10, 2, and 3.

Housekeeping: Read Chapter 1.1 – 1.4; Chapter 14.1 – 14.2; Chapter 10.1; Chapter 2; Chapter 3.

Housekeeping: Exam #1 Review

Statistics – Application to Research

Statistics: Why collect samples? It is often impractical to collect all the data from the entire population (e.g. the U.S. census). Some test methods are destructive: we wouldn't have any products or services left to ship to a customer! It is too expensive to sample the entire population. We don't have to collect 100% of the population! We can use inferential statistics to make sound conclusions about the population. Population and Sample: a sampling scheme draws a SAMPLE from the POPULATION; measure data, then use data from the SAMPLE to make conclusions about the POPULATION.

Statistics: Describing the Data. Two methods to summarize the data: Graphical (Histogram) and Analytical (Central Tendency).

Statistics: Central Tendency. A statistical measure that describes how the data is distributed around its central value; it includes the Mean, Median, and Mode. However, Central Tendency does not tell us about data variation (spread).

Statistics: Relationship of Central Tendency. *** Normal distribution: Mean = Median = Mode

Statistics: Frequency Distributions

Statistics: Various curves (different data spreads, common means)

Statistics: Various curves (different means, common data spreads)

Statistics: Various Normal Curves

Statistics: Measures of Variability - how the data is spread from its central value. The central tendency does not indicate any level of variability (dispersion) from the mean.
A = {100, 200, 300, 400, 500}
B = {50, 150, 300, 450, 550}
C = {250, 300, 300, 300, 350}
The mean and median are the same (300) for all three data sets, but the variability is different in each.
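As a quick check of the claim on this slide, the following Python sketch computes the mean and median of the three data sets with the standard-library statistics module. The names `data_sets` and `summary` are illustrative only and do not come from the lecture.

```python
import statistics

# Data sets A, B, and C as listed on the slide.
data_sets = {
    "A": [100, 200, 300, 400, 500],
    "B": [50, 150, 300, 450, 550],
    "C": [250, 300, 300, 300, 350],
}

# Store (mean, median) for each data set so they can be compared side by side.
summary = {
    name: (statistics.mean(values), statistics.median(values))
    for name, values in data_sets.items()
}

for name, (mean, median) in summary.items():
    print(f"{name}: mean = {mean}, median = {median}")
```

All three data sets report the same centre, which is why a separate measure of spread is needed to tell them apart.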

Statistics: Measures of Variability. Values can range from 0 to ∞ (infinity). 0 means no variability in the data. A large value indicates a lot of variability in the data. Values can never be negative. As soon as one value in a data set differs from another, variability exists.

Statistics: Measures of Variability (Dispersion) - Range. Range (R) = Max. value – Min. value = X_H – X_L. As data set size increases, the accuracy of using range decreases. Limit the usage of Range to ~10 readings.

Statistics: Measures of Variability – Range Example.
A = {100, 200, 300, 400, 500}
B = {50, 150, 300, 450, 550}
C = {250, 300, 300, 300, 350}
R_A = ? R_B = ? R_C = ?
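One way to work the range example is sketched below in Python. The helper name `value_range` is an assumption made for illustration; it simply applies R = maximum value minus minimum value.

```python
def value_range(values):
    """Return the range R = maximum value - minimum value."""
    return max(values) - min(values)

# Data sets from the range example slide.
a = [100, 200, 300, 400, 500]
b = [50, 150, 300, 450, 550]
c = [250, 300, 300, 300, 350]

range_a = value_range(a)  # 500 - 100 = 400
range_b = value_range(b)  # 550 - 50  = 500
range_c = value_range(c)  # 350 - 250 = 100

print(range_a, range_b, range_c)
```

Each result depends on only two observations, the largest and the smallest; the values in between have no influence on R.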

Statistics: Measures of Variability. So what is the limitation of all three of these Range calculations?

Statistics: Measures of Variability (Dispersion) - Variance. Variance: a measure of variability equal to the average squared distance that data points deviate from their mean. Variance calculations include all data points.

Statistics: Measures of Variability (Dispersion) - Variance. Sum of Squares (SS): the sum (addition) of the squared deviations of values from their mean. The SS is the numerator of the variance formula. Variance for the population: σ² = SS / N = Σ(X – μ)² / N, where μ is the population average. Variance for a sample: S² = SS / (n – 1) = Σ(X – M)² / (n – 1), where M is the sample average.

Statistics: Variance - Example. A = {100, 200, 300, 400, 500}. In this case, notice that the SS of both the population and the sample will be the same. Remember: PREMDAS. What is σ²? What is S²?
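The variance example can be worked step by step as in the Python sketch below. It assumes the usual definitions, with the sum of squares divided by N for a population and by n – 1 for a sample; the variable names are illustrative.

```python
# Data set A from the variance example slide.
a = [100, 200, 300, 400, 500]
n = len(a)

# Step 1: the mean (the same number whether A is treated as a population or a sample).
mean_a = sum(a) / n  # 300.0

# Step 2: sum of squares (SS), the squared deviations from the mean added together.
ss = sum((x - mean_a) ** 2 for x in a)  # 40000 + 10000 + 0 + 10000 + 40000 = 100000.0

# Step 3: divide SS by N for a population, or by n - 1 for a sample.
population_variance = ss / n        # 20000.0
sample_variance = ss / (n - 1)      # 25000.0

print(ss, population_variance, sample_variance)
```

The SS is identical in both cases; only the denominator changes, so the sample variance comes out larger than the population variance.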

Statistics: Measures of Variability (Dispersion) - Variance. What is a big limitation with Variance? What do you notice about the units of the mean and the units of Variance?

Statistics: Measures of Dispersion – Standard Deviation. Also called the Root Mean Square deviation, it is a measure of the spread of the data: roughly the average distance that data deviate from their mean. It is calculated by taking the square root of the Variance.

Statistics: Measures of Dispersion – Standard Deviation. When the data comes from the “population”, we shall use “σ” (sigma) to denote the Standard Deviation. The mean value will be represented by the Greek symbol μ (mu). The denominator does not have “uncertainty”, thus N. When the data comes from a “sample”, we shall use “SD” to denote the Standard Deviation. The mean value will be represented by M or X̄ (X-bar). The denominator shows “uncertainty”, thus n – 1.

Statistics: Measures of Dispersion – Standard Deviation. We typically want the standard deviation (variance) value to be as small as possible; we typically want to minimize variability! Standard deviation is a better measure than range for precisely describing the data distribution. Other formulas exist for Standard Deviation, but will not be covered.

Statistics: Standard Deviation - Example. A = {100, 200, 300, 400, 500}. What do we notice about the units of Standard Deviation and the units of the mean? The Mean and Standard Deviation are typically reported together.

Statistics: Standard Deviation - Example. B = {50, 150, 300, 450, 550}
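The two standard deviation examples can be sketched in Python as follows. The helper `standard_deviations` is a hypothetical name; it returns both the population form (denominator N) and the sample form (denominator n – 1) so the two can be compared.

```python
import math

def standard_deviations(values):
    """Return (population SD, sample SD) for a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    population_sd = math.sqrt(ss / n)          # denominator N
    sample_sd = math.sqrt(ss / (n - 1))        # denominator n - 1
    return population_sd, sample_sd

# Data sets A and B from the example slides.
a = [100, 200, 300, 400, 500]
b = [50, 150, 300, 450, 550]

sigma_a, sd_a = standard_deviations(a)  # about 141.42 and 158.11
sigma_b, sd_b = standard_deviations(b)  # about 184.39 and 206.16

print(f"A: sigma = {sigma_a:.2f}, SD = {sd_a:.2f}")
print(f"B: sigma = {sigma_b:.2f}, SD = {sd_b:.2f}")
```

Because the square root undoes the squaring, the results are in the same units as the data and the mean, unlike the variance.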

Statistics: Measures of Dispersion – Coefficient of Variation. CVar allows a comparison of standard deviations when the units of measure are not the same.

Statistics: Coefficient of Variation - Example
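The slide's own example did not survive in this transcript, so the Python sketch below uses made-up data. It assumes the common definition of the coefficient of variation, the sample standard deviation divided by the mean and expressed as a percentage; the heights and weights are hypothetical values chosen only to show two different units.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean (assumed definition)."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical measurements in different units (not from the lecture).
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)  # about 4.65 %
cv_weights = coefficient_of_variation(weights_kg)  # about 22.59 %

print(f"Heights: CV = {cv_heights:.2f}%")
print(f"Weights: CV = {cv_weights:.2f}%")
```

The raw standard deviations (in cm and kg) cannot be compared directly, but the unit-free percentages suggest the weights vary more relative to their mean than the heights do.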

Statistics: Box and Whisker Plot – Boxplot. A simple graphical tool to summarize data. Five values (the five-number summary) must be determined from the data to generate a boxplot:
Median (2nd Quartile)
Maximum data value
Minimum data value
1st Quartile (1/4 of observations fall below it) [whisker end]
3rd Quartile (3/4 of observations fall below it) [whisker end]

Statistics: Box and Whisker Plot – Boxplot Example. Process aim = 9.0 minutes; Spec = +/- 1.5 minutes; n = 125; R = 1.7.

Statistics: Box and Whisker Plot - Boxplot Example. Inside the box are the median value and approximately 50% of the observations. Whiskers extend from the box to the extreme values. Example (n = 125):
Median = 63rd value = 9.8
Max = 10.7
Min = 9.0
1st Quartile: position 125 × 0.25, taken as the average of the 31st and 32nd values = 9.6
3rd Quartile: position 125 × 0.75, taken as the average of the 94th and 95th values = 10.0
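The 125 observations behind this example are not included in the transcript, so the Python sketch below computes a five-number summary on a small hypothetical data set instead. It assumes the "median of each half" convention for the quartiles, which is one of several in use and may give slightly different values from the positional rule on the slide. The function name `five_number_summary` and the `times` list are illustrative.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least two values.

    Quartiles are taken as the medians of the lower and upper halves,
    leaving the middle value out of both halves when the count is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return (
        data[0],
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
        data[-1],
    )

# Hypothetical process times in minutes (not the lecture's data).
times = [9.0, 9.4, 9.6, 9.7, 9.8, 9.9, 10.0, 10.2, 10.7]

minimum, q1, median, q3, maximum = five_number_summary(times)
print(f"min={minimum}, Q1={q1:.2f}, median={median}, Q3={q3:.2f}, max={maximum}")
```

The box of the plot would span Q1 to Q3 with a line at the median, and in the basic form described here the whiskers would run out to the minimum and maximum.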

Statistics: Box and Whisker Plot - Boxplot Example. Long whiskers denote the existence of values much larger than other values. For this example, mean  median. Other variants exist, e.g. whisker ends at +/- 1.5*IQR, with all other points treated as “outliers” and depicted as asterisks. IQR = Interquartile Range. Figure labels: Q1, Q2, Q3.

Statistics: Box and Whisker Plot - Boxplot Example

Statistics: Measures of Variability (Dispersion) - IQR. IQR – Interquartile Range. IQR = Q3 – Q1

Statistics: Box and Whisker Plot - Boxplot Example. For this example, IQR = ? Figure labels: Q1, Q2, Q3.
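Using the quartiles quoted earlier for the boxplot example (Q1 = 9.6, Q3 = 10.0, Min = 9.0, Max = 10.7), the Python sketch below computes the IQR and, for the 1.5 × IQR boxplot variant, the cut-offs beyond which a point would be drawn as an outlier. Decimal is used only to keep the one-decimal arithmetic exact; the names `lower_fence` and `upper_fence` are assumptions, not terms from the lecture.

```python
from decimal import Decimal

# Values quoted in the boxplot example.
q1 = Decimal("9.6")
q3 = Decimal("10.0")
minimum = Decimal("9.0")
maximum = Decimal("10.7")

# Interquartile range: IQR = Q3 - Q1.
iqr = q3 - q1  # 0.4

# 1.5 * IQR variant: points beyond these cut-offs are plotted as outliers.
lower_fence = q1 - Decimal("1.5") * iqr  # 9.6 - 0.6 = 9.0
upper_fence = q3 + Decimal("1.5") * iqr  # 10.0 + 0.6 = 10.6

min_is_outlier = minimum < lower_fence   # False: 9.0 sits exactly on the cut-off
max_is_outlier = maximum > upper_fence   # True: 10.7 lies above 10.6

print(f"IQR = {iqr}, fences = [{lower_fence}, {upper_fence}]")
print(f"min outlier: {min_is_outlier}, max outlier: {max_is_outlier}")
```

Under these assumptions the IQR covers the middle half of the data, and only the maximum would be flagged in the 1.5 × IQR variant.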