MAT 135 Introductory Statistics and Data Analysis Adjunct Instructor


1 MAT 135 Introductory Statistics and Data Analysis
Adjunct Instructor Kenneth R. Martin. Lecture 7, October 12, 2016. Confidential - Kenneth R. Martin

2 Agenda
Housekeeping
Readings
Exam #1 review
Chapter 1, 14, 10, 2, & 3

3 Housekeeping
Read Chapter 1.1 – 1.4
Read Chapter 14.1 – 14.2
Read Chapter 10.1
Read Chapter 2
Read Chapter 3

4 Housekeeping
Exam #1 Review

5 Statistics – Application to Research

6 Statistics
Why collect samples?
It is often impractical to collect all the data from the entire population (e.g., the U.S. census).
Some test methods are destructive: we wouldn’t have any products or services left to ship to a customer!
It is too expensive to sample the entire population.
We don’t have to collect 100% of the population! We can use inferential statistics to make sound conclusions about the population.
Population and Sample (sampling scheme): POPULATION → SAMPLE → measure data. Use data from the SAMPLE to make conclusions about the POPULATION.
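The idea of estimating a population value from a sample can be sketched in Python. This is only an illustration under stated assumptions: the population below is simulated, and its size (10,000), center (9.8), and spread (0.3) are hypothetical values that do not come from the lecture.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated measurements (not lecture data)
population = [random.gauss(9.8, 0.3) for _ in range(10_000)]

# Measure only a sample of 125 items instead of the whole population
sample = random.sample(population, 125)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean = {population_mean:.2f}")
print(f"sample mean     = {sample_mean:.2f}")
```

The sample mean will usually land close to the population mean even though only a small fraction of the population was measured.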

7 Statistics
Describing the Data
Two methods to summarize the data: Graphical (Histogram) and Analytical (Central Tendency).

8 Statistics
Central Tendency
A statistical measure that identifies the central value around which the data are distributed; it includes the Mean, Median, and Mode. However, Central Tendency says nothing about data variation (spread).

9 Statistics
Relationship of Central Tendency
Normal distribution: Mean = Median = Mode

10 Statistics
Frequency Distributions

11 Statistics
Various curves (different data spreads, common means)

12 Statistics
Various curves (different means, common data spreads)

13 Statistics
Various Normal Curves

14 Statistics
Measures of Variability: how the data are spread from their central value
The central tendency does not indicate the level of variability (dispersion) about the mean.
A = {100, 200, 300, 400, 500}
B = {50, 150, 300, 450, 550}
C = {250, 300, 300, 300, 350}
The mean and median are the same for all three data sets, but the variability is different in each.
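As a quick check of the claim about sets A, B, and C, here is a minimal Python sketch using the standard-library `statistics` module. The variable names are illustrative and not part of the lecture.

```python
import statistics

# Data sets from the slide
data_sets = {
    "A": [100, 200, 300, 400, 500],
    "B": [50, 150, 300, 450, 550],
    "C": [250, 300, 300, 300, 350],
}

# (mean, median) for each data set
summary = {
    name: (statistics.mean(values), statistics.median(values))
    for name, values in data_sets.items()
}

for name, (mean, median) in summary.items():
    print(f"{name}: mean = {mean}, median = {median}")
```

All three sets share a mean and median of 300, so central tendency alone cannot tell them apart.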

15 Statistics
Measures of Variability:
Can take values from 0 to ∞ (infinity).
0 means no variability in the data.
A large value indicates a lot of variability in the data.
Values can never be negative.
As soon as one value in a data set differs from another, variability exists.

16 Statistics
Measures of Variability (Dispersion) - Range
Range (R) = Max. value – Min. value = X_H – X_L
As data set size increases, the accuracy of using range decreases. Limit the use of Range to ~10 readings.

17 Statistics
Measures of Variability – Range Example
A = {100, 200, 300, 400, 500}
B = {50, 150, 300, 450, 550}
C = {250, 300, 300, 300, 350}
R_A = ? R_B = ? R_C = ?
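The range exercise can be worked through with a short Python sketch. The helper name `data_range` is a hypothetical choice for illustration.

```python
def data_range(values):
    """Range R = maximum value minus minimum value."""
    return max(values) - min(values)

# Data sets from the slide
data_sets = {
    "A": [100, 200, 300, 400, 500],
    "B": [50, 150, 300, 450, 550],
    "C": [250, 300, 300, 300, 350],
}

ranges = {name: data_range(values) for name, values in data_sets.items()}

for name, r in ranges.items():
    print(f"R_{name} = {r}")
```

Note that each result is computed from only two values in the set, the largest and the smallest.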

18 Statistics
Measures of Variability
So what is the limitation of all three of these Range calculations?

19 Statistics
Measures of Variability (Dispersion) - Variance
Variance: a measure of variability equal to the average squared distance that data points deviate from their mean. Variance calculations include all data points.

20 Statistics
Measures of Variability (Dispersion) - Variance
Sum of Squares (SS): the sum of the squared deviations of values from their mean. The SS is the numerator of the variance formula.
Variance, σ², for the Population: σ² = SS / N = Σ(X – μ)² / N, where μ is the population average.
Variance, S², for a Sample: S² = SS / (n – 1) = Σ(X – M)² / (n – 1), where M is the sample average.

21 Statistics
Variance - Example
A = {100, 200, 300, 400, 500}
In this case, notice that the SS of both the population and the sample will be the same.
Remember: PREMDAS
What is σ²? What is S²?
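The variance example for set A can be checked with the sketch below. It assumes the usual definitions, with SS divided by N for a population and by n – 1 for a sample, and compares the hand calculation with the standard-library functions `statistics.pvariance` and `statistics.variance`.

```python
import statistics

A = [100, 200, 300, 400, 500]

mean = sum(A) / len(A)

# Sum of Squares: squared deviations from the mean, added together
ss = sum((x - mean) ** 2 for x in A)

population_variance = ss / len(A)        # divide by N
sample_variance = ss / (len(A) - 1)      # divide by n - 1

print(f"SS = {ss}")
print(f"population variance = {population_variance}")
print(f"sample variance     = {sample_variance}")

# Cross-check against the standard library
assert population_variance == statistics.pvariance(A)
assert sample_variance == statistics.variance(A)
```

The SS is the same in both cases; only the denominator changes, which is why the sample variance comes out larger.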

22 Statistics
Measures of Variability (Dispersion) - Variance
What is a big limitation of Variance? What do you notice about the units of the mean and the units of Variance?

23 Statistics
Measures of Dispersion – Standard Deviation
Also called the Root Mean Square deviation, it is a measure of the spread of the data: roughly the average distance that data points deviate from their mean. It is calculated by taking the square root of the Variance.

24 Statistics
Measures of Dispersion – Standard Deviation
When the data come from the “population”, we use “σ” (sigma) to denote the Standard Deviation. The mean is represented by the Greek symbol μ (mu). The denominator does not have “uncertainty”, thus N.
When the data come from a “sample”, we use “SD” to denote the Standard Deviation. The mean is represented by M or X̄ (X-bar). The denominator reflects “uncertainty”, thus n – 1.

25 Statistics
Measures of Dispersion – Standard Deviation
We typically want the standard deviation (and variance) to be as small as possible. We typically want to minimize variability!
Standard deviation generally describes the data distribution more precisely than range, because it uses every data point.
Other formulas exist for Standard Deviation, but they will not be covered.

26 Statistics
Standard Deviation - Example
A = {100, 200, 300, 400, 500}
What do we notice about the units of Standard Deviation and the units of the mean?
The Mean and Standard Deviation are typically reported together.

27 Statistics
Standard Deviation - Example
B = {50, 150, 300, 450, 550}
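Both standard deviation examples can be reproduced with the standard-library `statistics` module. In this sketch `pstdev` treats the data as a population (denominator N) and `stdev` treats it as a sample (denominator n – 1); the dictionary layout is just an illustrative choice.

```python
import statistics

A = [100, 200, 300, 400, 500]
B = [50, 150, 300, 450, 550]

results = {}
for name, values in (("A", A), ("B", B)):
    results[name] = {
        "mean": statistics.mean(values),
        "sigma": statistics.pstdev(values),  # population SD, divides by N
        "sd": statistics.stdev(values),      # sample SD, divides by n - 1
    }

for name, r in results.items():
    print(f"{name}: mean = {r['mean']}, "
          f"sigma = {r['sigma']:.2f}, SD = {r['sd']:.2f}")
```

Because the standard deviation is the square root of the variance, it is expressed in the same units as the mean, whereas the variance is in squared units.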

28 Statistics
Measures of Dispersion – Coefficient of Variation
CVar: allows a comparison of standard deviations when the units of measure are not the same.
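The slide's formula did not survive in this transcript, so the sketch below assumes the common definition, CV = (standard deviation / mean) × 100%, using the sample standard deviation. The function name and the rescaled data set are hypothetical and serve only to show that the result does not depend on the unit of measure.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean (assumed definition)."""
    return statistics.stdev(values) / statistics.mean(values) * 100

A = [100, 200, 300, 400, 500]

# Hypothetical: the same readings expressed in a unit 100 times larger
A_rescaled = [x / 100 for x in A]

cv_a = coefficient_of_variation(A)
cv_rescaled = coefficient_of_variation(A_rescaled)

print(f"SD of A          = {statistics.stdev(A):.2f}")
print(f"SD of rescaled A = {statistics.stdev(A_rescaled):.4f}")
print(f"CV of A          = {cv_a:.1f}%")
print(f"CV of rescaled A = {cv_rescaled:.1f}%")
```

The two standard deviations differ by a factor of 100, but the coefficient of variation is the same for both, which is what makes it usable across different units.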

29 Statistics
Coefficient of Variation - Example

30 Statistics
Box and Whisker Plot – Boxplot
A simple graphical tool to summarize data. Five values (the five-number summary) must be determined from the data to generate a boxplot:
Median (2nd Quartile)
Maximum data value [whisker end]
Minimum data value [whisker end]
1st Quartile (value below which 1/4 of the observations fall) [box edge]
3rd Quartile (value below which 3/4 of the observations fall) [box edge]

31 Statistics
Box and Whisker Plot – Boxplot Example
Process aim = 9.0 minutes; Spec = + / minutes; n = 125; R = 1.7

32 Statistics
Box and Whisker Plot - Boxplot Example
Inside the box are the median value and approximately 50% of the observations. Whiskers extend from the box to the extreme values.
Example (n = 125):
Median = 63rd value = 9.8
Max = 10.7
Min = 9.0
1st Quartile: position 125 × 0.25 ≈ average of the 31st and 32nd values = 9.6
3rd Quartile: position 125 × 0.75 ≈ average of the 94th and 95th values = 10.0
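The 125 raw observations are not included in this transcript, so the following sketch uses hypothetical stand-in data in which each value equals its rank (1 to 125). That makes the quartile positions visible directly. It relies on `statistics.quantiles` (Python 3.8+) with its default "exclusive" method, which happens to pick the same positions as the slide; other quartile conventions can give slightly different positions.

```python
import statistics

n_obs = 125

# Hypothetical stand-in data: the value at each position equals its rank
ranks = list(range(1, n_obs + 1))

# Quartile cut points; the default method is "exclusive"
q1, q2, q3 = statistics.quantiles(ranks, n=4)

five_number_summary = (min(ranks), q1, q2, q3, max(ranks))
print(five_number_summary)
```

With rank data, Q1 falls halfway between the 31st and 32nd values, the median is the 63rd value, and Q3 falls halfway between the 94th and 95th values, matching the positions used in the example.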

33 Statistics
Box and Whisker Plot - Boxplot Example
Long whiskers denote the existence of values much larger than other values. For this example, mean  median.
Other variants exist, e.g., whiskers that end at + / - 1.5 × IQR beyond the box, with all other points treated as “outliers” and depicted as asterisks. IQR = Interquartile Range.
(Figure: boxplot with Q1, Q2, and Q3 labeled.)

34 Statistics
Box and Whisker Plot - Boxplot Example

35 Statistics
Measures of Variability (Dispersion) - IQR
IQR – Interquartile Range
IQR = Q3 – Q1

36 Statistics
Box and Whisker Plot - Boxplot Example
(Figure: boxplot with Q1, Q2, and Q3 labeled.)
For this example, IQR = ?
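Using the quartiles quoted in the boxplot example (Q1 = 9.6, Q3 = 10.0), the IQR can be computed as below. The sketch also works out the whisker limits for the 1.5 × IQR variant mentioned earlier; the names `lower_fence` and `upper_fence` are illustrative, not terms from the lecture.

```python
# Values quoted in the boxplot example
q1, q3 = 9.6, 10.0
data_max = 10.7

iqr = q3 - q1

# Whisker limits for the 1.5 x IQR boxplot variant
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"IQR = {iqr:.1f}")
print(f"lower fence = {lower_fence:.1f}, upper fence = {upper_fence:.1f}")
print(f"max beyond upper fence: {data_max > upper_fence}")
```

Under that variant, the maximum of 10.7 lies just beyond the upper limit and would be drawn as an outlier rather than as a whisker end.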

