Lecture 4 Dustin Lueker

• The population distribution for a continuous variable is usually represented by a smooth curve
  ◦ Like a histogram that gets finer and finer
    • Similar to the idea of using smaller and smaller rectangles to calculate the area under a curve when learning how to integrate
• Symmetric distributions
  ◦ Bell-shaped
  ◦ U-shaped
  ◦ Uniform
• Not symmetric distributions
  ◦ Left-skewed
  ◦ Right-skewed
  ◦ Skewed

STA 291 Summer 2010 Lecture 4

• Center of the data
  ◦ Mean
  ◦ Median
  ◦ Mode
• Dispersion of the data (sometimes referred to as spread)
  ◦ Variance, standard deviation
  ◦ Interquartile range
  ◦ Range

• Mean
  ◦ Arithmetic average
• Median
  ◦ Midpoint of the observations when they are arranged in order, smallest to largest
• Mode
  ◦ Most frequently occurring value

• Sample size: n
• Observations: x_1, x_2, …, x_n
• Sample mean “x-bar”: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

• Population size: N
• Observations: x_1, x_2, …, x_N
• Population mean “mu”: $\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$
• Note: this is for a finite population of size N

• Requires numerical values
  ◦ Only appropriate for quantitative data
  ◦ Does not make sense to compute the mean for nominal variables
  ◦ Can be calculated for ordinal variables, but this does not always make sense
    • Be careful when using the mean on ordinal variables
    • Example: “Weather” on an ordinal scale, with Sun = 1, Partly Cloudy = 2, Cloudy = 3, Rain = 4, Thunderstorm = 5, gives mean (average) weather = 2.8
    • Another example: “GPA = 3.8” is also a mean of observations measured on an ordinal scale

• Center of gravity for the data set
• Sum of the differences from values above the mean is equal to the sum of the differences from values below the mean
  ◦ $\sum_{x_i > \bar{x}} (x_i - \bar{x}) = \sum_{x_i < \bar{x}} (\bar{x} - x_i)$
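
The balance property can be checked numerically. The following is a minimal sketch, assuming the small data set {7, 12, 11, 18} from the lecture's mean example; the variable names are illustrative only.

```python
# Data set borrowed from the lecture's mean example.
data = [7, 12, 11, 18]
mean = sum(data) / len(data)

# Total distance of the values that lie above the mean ...
above = sum(x - mean for x in data if x > mean)
# ... and total distance of the values that lie below the mean.
below = sum(mean - x for x in data if x < mean)

print("mean:", mean)
print("sum of differences above the mean:", above)
print("sum of differences below the mean:", below)
```

Values exactly equal to the mean contribute nothing to either side, which is why they can be left out of both sums.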

• Mean
  ◦ Sum of observations divided by the number of observations
• Example
  ◦ {7, 12, 11, 18}
  ◦ Mean = (7 + 12 + 11 + 18) / 4 = 48 / 4 = 12

• Highly influenced by outliers
  ◦ Data points that are far from the rest of the data
  ◦ Example
    • Monthly income for five people: 1,000  2,000  3,000  4,000  100,000
    • Average monthly income = 110,000 / 5 = 22,000
    • What is the problem with using the average to describe this data set?

• Measurement that falls in the middle of the ordered sample
• When the sample size n is odd, there is a middle value
  ◦ It has the ordered index (n+1)/2
    • Ordered index is where that value falls when the sample is listed from smallest to largest
    • An index of 2 means the second smallest value
  ◦ Example
    • 1.7, 4.6, 5.7, 6.1, 8.3
    • n = 5, (n+1)/2 = 6/2 = 3, index = 3
    • Median = 3rd smallest observation = 5.7

• When the sample size n is even, average the two middle values
  ◦ Example
    • 3, 5, 6, 9
    • n = 4, (n+1)/2 = 5/2 = 2.5, index = 2.5
    • Median = midpoint between 2nd and 3rd smallest observations = (5+6)/2 = 5.5
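
The ordered-index rule for odd and even n can be written as one short function. This is a sketch under the lecture's definition, not a replacement for a library routine; the function name `median` and the variable names are illustrative.

```python
def median(values):
    """Median via the ordered index (n + 1) / 2 described in the lecture."""
    ordered = sorted(values)  # smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    if n % 2 == 1:
        # Odd n: (n + 1) / 2 is a whole number; subtract 1 for 0-based lists.
        return ordered[(n + 1) // 2 - 1]
    # Even n: (n + 1) / 2 ends in .5, so average the two neighbouring values.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


odd_example = median([1.7, 4.6, 5.7, 6.1, 8.3])
even_example = median([3, 5, 6, 9])
print(odd_example, even_example)
```

Sorting first means the input does not have to be supplied in order.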

• For skewed distributions, the median is often a more appropriate measure of central tendency than the mean
• The median usually better describes a “typical value” when the sample distribution is highly skewed
• Example
  ◦ Monthly income for five people: 1,000  2,000  3,000  4,000  100,000
  ◦ Median monthly income: 3,000
    • Why is the median better to use with this data than the mean?
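
One way to explore the income question is to compute both measures and count how many people fall below each. The sketch below uses Python's standard `statistics` module on the five incomes from the example; the variable names are illustrative.

```python
import statistics

# Monthly income for the five people in the example.
incomes = [1000, 2000, 3000, 4000, 100000]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

# How many of the five people earn less than each "typical" value?
below_mean = sum(1 for x in incomes if x < mean_income)
below_median = sum(1 for x in incomes if x < median_income)

print("mean:", mean_income, "people below it:", below_mean)
print("median:", median_income, "people below it:", below_median)
```

Four of the five incomes lie below the mean, while the median splits the group evenly, which is the sense in which the median is the more typical value here.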

• Mode: most frequent value
• Notation: subscripted variables
  ◦ n = # of units in the sample
  ◦ N = # of units in the population
  ◦ x = variable to be measured
  ◦ x_i = measurement of the i-th unit
• Mean: arithmetic average
• Median: midpoint of the observations when they are arranged in increasing order

• Example: Highest Degree Completed

  Highest Degree                                          Frequency   Percentage
  Not a high school graduate                              38,
  High school only                                        65,
  Some college, no degree                                 33,
  Associate, Bachelor, Master, Doctorate, Professional    41,
  Total                                                   177,

• n = 177,618
• (n+1)/2 = 88,809.5
• Median = midpoint between the 88,809th smallest and 88,810th smallest observations
  ◦ Both are in the category “High school only”
• Mean wouldn’t make sense here since the variable is ordinal
• Median
  ◦ Can be used for interval data and for ordinal data
  ◦ Cannot be used for nominal data because the observations cannot be ordered on a scale
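
Finding the median category of an ordinal frequency table amounts to walking the cumulative counts until the middle positions are reached. The sketch below shows that idea. The category names come from the slide, but the counts are HYPOTHETICAL small numbers chosen for illustration and are not the figures from the table; the helper `category_at` is also an assumed name.

```python
from itertools import accumulate

# Categories in their natural order, taken from the slide.
categories = [
    "Not a high school graduate",
    "High school only",
    "Some college, no degree",
    "Associate, Bachelor, Master, Doctorate, Professional",
]
# HYPOTHETICAL counts for illustration only; not the slide's frequencies.
counts = [40, 65, 35, 40]


def category_at(position, categories, counts):
    """Category containing the observation at a 1-based ordered index."""
    for category, cumulative in zip(categories, accumulate(counts)):
        if position <= cumulative:
            return category
    raise ValueError("position exceeds the number of observations")


n = sum(counts)
# The two middle positions around (n + 1) / 2; they coincide when n is odd.
lower_pos = (n + 1) // 2
upper_pos = n // 2 + 1
lower_cat = category_at(lower_pos, categories, counts)
upper_cat = category_at(upper_pos, categories, counts)
print(n, lower_pos, upper_pos, lower_cat, upper_cat)
```

When both middle positions land in the same category, that category is the median and no averaging is needed, which matters because ordinal categories cannot be averaged.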

• Mean
  ◦ Interval data with an approximately symmetric distribution
• Median
  ◦ Interval data
  ◦ Ordinal data
• Mean is sensitive to outliers, median is not

• Symmetric distribution
  ◦ Mean = Median
• Skewed distribution
  ◦ Mean lies farther than the median toward the direction in which the distribution is skewed

• While the median is better than the mean for skewed distributions, there is one large disadvantage to using the median
  ◦ Insensitive to changes within the lower or upper half of the data
  ◦ Example: both data sets have median 3
    • 1, 2, 3, 4, 5
    • 1, 2, 3, 100, 100
  ◦ Sometimes, the mean is more informative even when the distribution is skewed
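
The two example data sets can be compared directly. This is a small sketch using the standard `statistics` module; the names `first` and `second` are illustrative.

```python
import statistics

first = [1, 2, 3, 4, 5]
second = [1, 2, 3, 100, 100]

# The median only looks at the middle position, so it cannot tell these apart.
median_first = statistics.median(first)
median_second = statistics.median(second)

# The mean uses every value, so it reacts to the change in the upper half.
mean_first = statistics.mean(first)
mean_second = statistics.mean(second)

print("medians:", median_first, median_second)
print("means:", mean_first, mean_second)
```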

• Keeneland Sales [figure]

• The deviation of the i-th observation x_i from the sample mean $\bar{x}$ is the difference between them, $x_i - \bar{x}$
  ◦ Sum of all deviations is zero
  ◦ Therefore, we use either the sum of the absolute deviations or the sum of the squared deviations as a measure of variation
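
A quick numerical check shows why the plain sum of deviations is useless as a measure of variation while the absolute and squared versions are not. The sketch assumes the data set {7, 12, 11, 18} from the lecture's mean example; the variable names are illustrative.

```python
# Data set borrowed from the lecture's mean example.
data = [7, 12, 11, 18]
mean = sum(data) / len(data)

deviations = [x - mean for x in data]

# Positive and negative deviations cancel exactly.
sum_dev = sum(deviations)
# Taking absolute values or squaring removes the signs, so nothing cancels.
sum_abs = sum(abs(d) for d in deviations)
sum_sq = sum(d ** 2 for d in deviations)

print("deviations:", deviations)
print("sum:", sum_dev, "sum of absolute:", sum_abs, "sum of squared:", sum_sq)
```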

• Variance of n observations is the sum of the squared deviations, divided by n-1: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$

  Observation   Mean   Deviation   Squared Deviation

  Sum of the Squared Deviations
  n-1
  Sum of the Squared Deviations / (n-1)
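
The worksheet layout above can be filled in programmatically. The sketch below does so for the data set {7, 12, 11, 18} from the lecture's mean example, which is an assumed choice since the slide's own numbers are not in the transcript, and cross-checks the result against `statistics.variance`.

```python
import math
import statistics

# Assumed example data (from the lecture's mean example).
data = [7, 12, 11, 18]
n = len(data)
mean = sum(data) / n

# One row per observation: observation, mean, deviation, squared deviation.
rows = []
for x in data:
    deviation = x - mean
    rows.append((x, mean, deviation, deviation ** 2))

sum_sq = sum(row[3] for row in rows)
variance = sum_sq / (n - 1)

print(f"{'Observation':>11} {'Mean':>6} {'Deviation':>9} {'Squared':>8}")
for x, m, d, sq in rows:
    print(f"{x:>11} {m:>6} {d:>9} {sq:>8}")
print("Sum of the squared deviations:", sum_sq)
print("n-1:", n - 1)
print("Sum of the squared deviations / (n-1):", variance)

# Cross-check against the standard library's sample variance.
library_variance = statistics.variance(data)
print("statistics.variance agrees:", math.isclose(variance, library_variance))
```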

• About the average of the squared deviations
  ◦ “average squared distance from the mean”
• Unit
  ◦ Square of the unit for the original data
• Difficult to interpret
  ◦ Solution
    • Take the square root of the variance, and the unit is the same as for the original data
    • Standard deviation

• s ≥ 0
  ◦ s = 0 only when all observations are the same
• If data is collected for the whole population instead of a sample, then n-1 is replaced by N
• s is sensitive to outliers

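• Sample
  ◦ Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
  ◦ Standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
• Population
  ◦ Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
  ◦ Standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$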
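
Python's standard `statistics` module provides both versions: `variance` and `stdev` divide by n-1, while `pvariance` and `pstdev` divide by N. The sketch below applies all four to the same four numbers, once treated as a sample and once as a whole population; the data set is an assumed example.

```python
import statistics

# Assumed example data (from the lecture's mean example).
data = [7, 12, 11, 18]

# Treating the data as a sample: divide by n - 1.
sample_var = statistics.variance(data)
sample_sd = statistics.stdev(data)

# Treating the data as the whole population: divide by N.
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

print("sample:     variance", sample_var, "sd", sample_sd)
print("population: variance", pop_var, "sd", pop_sd)
```

For the same numbers the sample versions are always at least as large as the population versions, because the same sum of squared deviations is divided by a smaller number.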

• Population mean and population standard deviation are denoted by the Greek letters μ (mu) and σ (sigma)
  ◦ They are unknown constants that we would like to estimate
• Sample mean and sample standard deviation are denoted by $\bar{x}$ and s
  ◦ They are random variables, because their values vary according to the random sample that has been selected

• If the data is approximately symmetric and bell-shaped then
  ◦ About 68% of the observations are within one standard deviation from the mean
  ◦ About 95% of the observations are within two standard deviations from the mean
  ◦ About 99.7% of the observations are within three standard deviations from the mean

• Scores on a standardized test are scaled so they have a bell-shaped distribution with a mean of 1000 and standard deviation of 150
  ◦ About 68% of the scores are between 850 and 1150
  ◦ About 95% of the scores are between 700 and 1300
  ◦ If you have a score above 1300, you are in the top 2.5%
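
The intervals in the test-score example follow from adding and subtracting multiples of the standard deviation. The sketch below computes them and, as an extra check that goes beyond the lecture, compares the rule-of-thumb share above 1300 with the value from an exact normal curve using `statistics.NormalDist` (Python 3.8+). The helper name `within` is illustrative, and treating the scores as exactly normal is an assumption.

```python
from statistics import NormalDist

mean, sd = 1000, 150


def within(k):
    """Interval that lies within k standard deviations of the mean."""
    return (mean - k * sd, mean + k * sd)


one_sd = within(1)    # about 68% of scores
two_sd = within(2)    # about 95% of scores
three_sd = within(3)  # about 99.7% of scores

# Rule of thumb: the 5% outside two standard deviations splits evenly
# between the two tails of a symmetric distribution.
rule_top_share = (1 - 0.95) / 2

# ASSUMPTION: scores follow an exact normal curve, not just "bell-shaped".
exact_top_share = 1 - NormalDist(mu=mean, sigma=sd).cdf(1300)

print(one_sd, two_sd, three_sd)
print("share above 1300 (rule of thumb):", rule_top_share)
print("share above 1300 (exact normal):", exact_top_share)
```

The exact normal figure comes out slightly below the rule-of-thumb value because 95% is itself a rounded figure for two standard deviations.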