Descriptive Statistics Healey Chapters 3 and 4 (1e) or Ch. 3 (2/3e)


Descriptive Statistics Healey Chapters 3 and 4 (1e) or Ch. 3 (2/3e) Measures of Central Tendency And Dispersion

Measures of Central Tendency 1. Mode = can be used with any kind of data, and is the only measure of central tendency available for nominal (qualitative) data. Formula: the value that occurs most often, or the category or interval with the highest frequency. Note: Omit Formula 3.1 (Variation Ratio) in Healey and Prus 2nd Cdn.

Example for Nominal Variables:

Religion     frequency    cf    proportion     %     Cum%
Catholic        17        17       .41         41     41
Protestant       4        21       .10         10     51
Jewish           2        23       .05          5     56
Muslim           1        24       .02          2     58
Other            9        33       .22         22     80
None             8        41       .20         20    100
Total           41                1.00        100%

Central Tendency: MODE = largest category = Catholic
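As a quick check on the table, the mode and the percentage column can be recomputed from the frequencies. This is a minimal Python sketch; the variable names are illustrative and not part of the course material.

```python
from collections import Counter

# Frequencies copied from the religion table above.
frequencies = Counter({
    "Catholic": 17, "Protestant": 4, "Jewish": 2,
    "Muslim": 1, "Other": 9, "None": 8,
})

n = sum(frequencies.values())

# The mode is the category with the highest frequency.
mode_category, mode_frequency = frequencies.most_common(1)[0]

# Percentage for each category, rounded to whole numbers as in the table.
percentages = {category: round(100 * f / n) for category, f in frequencies.items()}

print(mode_category, mode_frequency, n)
print(percentages)
```

Note that the mode is reported as a category (Catholic), not as the frequency of that category.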

Central Tendency (cont.) 2. Median = exact centre or middle of the ordered data; the 50th percentile. Formula: Array the data (put the scores in order). When the sample size is even, the median falls halfway between the two middle scores: find the scores in positions (n/2) and (n/2)+1, add them together, and divide the total by 2. When the sample size is odd, the median is the score in the exact middle position, (n+1)/2.

Example for Raw Data: Suppose you have the following set of test scores: 66, 89, 41, 98, 76, 77, 69, 60, 60, 66, 69, 66, 98, 52, 74, 66, 89, 95, 66, 69 1. Array data: 98 98 95 89 89 77 76 74 69 69 69 66 66 66 66 66 60 60 52 41 N = 20 (N is even)

To calculate:
- find the positions of the two middle numbers: (n/2) and (n/2)+1
- add together the two middle numbers
- divide the total by 2
First middle number: (20/2) = the 10th number
2nd middle number: (20/2)+1 = the 11th number
Look at the data: the middle numbers are 69 and 69
The median is (69+69)/2 = 69
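The same steps can be sketched in Python. The code sorts in ascending order (the slide arrays the scores in descending order, which gives the same middle cases) and handles both the even and the odd case; the standard-library `statistics.median` is used only as a cross-check.

```python
import statistics

scores = [66, 89, 41, 98, 76, 77, 69, 60, 60, 66,
          69, 66, 98, 52, 74, 66, 89, 95, 66, 69]

ordered = sorted(scores)  # array the data
n = len(ordered)

if n % 2 == 0:
    # Even n: average the cases in positions n/2 and n/2 + 1.
    # Python lists are 0-indexed, so subtract 1 from each position.
    first_middle = ordered[n // 2 - 1]
    second_middle = ordered[n // 2]
    median = (first_middle + second_middle) / 2
else:
    # Odd n: the case in position (n + 1) / 2.
    median = ordered[(n + 1) // 2 - 1]

print(median)                     # median from the manual steps
print(statistics.median(scores))  # cross-check with the standard library
```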

Median for Aggregate (grouped) Data This formula is shown in Healey 1st Cdn Edition but NOT in 2/3 Cdn We will NOT COVER this one!

Properties of median:
- for numerical data at the interval or ordinal level
- "balance point" of the cases: half of the scores fall above it and half below
- not affected by outliers
- appropriate when the distribution is highly skewed

3. Mean for Raw Data
The mean is the sum of the measurements divided by the number of subjects.
Formula: $\bar{X} = \sum X_i / N$
Data (from above): 66, 89, 41, 98, 76, 77, 69, 60, 60, 66, 69, 66, 98, 52, 74, 66, 89, 95, 66, 69

Example for Mean
Formula: $\bar{X} = \sum X_i / N = 1446 / 20 = 72.3$
The mean for these test scores is 72.3
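A minimal Python sketch of the same calculation, assuming the 20 test scores listed above:

```python
scores = [66, 89, 41, 98, 76, 77, 69, 60, 60, 66,
          69, 66, 98, 52, 74, 66, 89, 95, 66, 69]

total = sum(scores)  # sum of the measurements
n = len(scores)      # number of subjects
mean = total / n

print(total, n, mean)
```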

Mean for Aggregate (Grouped) Data (Note: not in text but covered in class)
To calculate the mean for grouped data, you need a frequency table that includes a column for the midpoints (m) and a column for the product of the frequencies times the midpoints (fm).
Formula: $\bar{X} = \sum (fm) / N$

Frequency table:

Score      f     m*      fm
41-50      1    45.5     45.5
51-60      3    55.5    166.5
61-70      8    65.5    524
71-80      3    75.5    226.5
81-90      2    85.5    171
91-100     3    95.5    286.5

N = 20    Σ (fm) = 1420
* Find midpoints first

Calculating Mean for Grouped Data:
Formula: $\bar{X} = \sum (fm) / N = 1420 / 20 = 71$
The mean for the grouped data is 71.
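The grouped-data calculation can be sketched in Python as follows. Each interval is stored as a hypothetical `(lower limit, upper limit, frequency)` tuple, and the midpoint is taken as the average of the two limits, which reproduces the m column of the frequency table.

```python
# (lower limit, upper limit, frequency) for each score interval
intervals = [
    (41, 50, 1),
    (51, 60, 3),
    (61, 70, 8),
    (71, 80, 3),
    (81, 90, 2),
    (91, 100, 3),
]

n = 0
sum_fm = 0.0
for lower, upper, f in intervals:
    midpoint = (lower + upper) / 2  # find midpoints first
    sum_fm += f * midpoint          # frequency times midpoint
    n += f

grouped_mean = sum_fm / n
print(n, sum_fm, grouped_mean)
```

The grouped mean (71) differs slightly from the raw-data mean (72.3) because every score in an interval is treated as if it sat at the midpoint.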

Properties of the Mean:
- only for numerical data at the interval/ratio level
- "balance point" of the distribution
- can be affected by outliers: in a skewed distribution the tail becomes elongated and the mean is pulled in the direction of the outlier
Example with no outlier: $30000, 30000, 35000, 25000, 30000, so mean = $30000. If an outlier is present: $130000, 30000, 35000, 25000, 30000, then mean = $50000 (the mean is pulled up or down in the direction of the outlier).
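To see the effect of the outlier, the sketch below compares the mean and the median for the two income lists in the example, using Python's standard `statistics` module. The variable names are illustrative.

```python
import statistics

incomes = [30000, 30000, 35000, 25000, 30000]
incomes_with_outlier = [130000, 30000, 35000, 25000, 30000]

mean_before = statistics.mean(incomes)
mean_after = statistics.mean(incomes_with_outlier)

median_before = statistics.median(incomes)
median_after = statistics.median(incomes_with_outlier)

print(mean_before, mean_after)      # the mean is pulled toward the outlier
print(median_before, median_after)  # the median does not move
```

Here the mean exceeds the median once the outlier is added, which is the pattern expected for a positively skewed distribution.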

NOTE: When the distribution is symmetric (with a single peak), mean = median = mode. For a skewed distribution, the mean lies in the direction of the skew:
- skewed to the right (positive skew): mean > median
- skewed to the left (negative skew): median > mean

Measures of Dispersion
Describe how variable the data are, i.e. how spread out they are around the mean. Also called measures of variation or variability.

Variability for Non-numerical Data (Nominal or Ordinal Level Data)
Measures of variability for non-numerical (nominal or ordinal) data are rarely used. We will not be covering these in class.
Omit Formula 4.1 (IQV) in Healey and Prus 1st Canadian Edition.
Omit Formula 3.1 (Variation Ratio) in Healey and Prus 2/3 Canadian Edition.

2. Range (for numerical data)
Range = difference between the largest and smallest observations, e.g. if the data are $130000, 35000, 30000, 30000, 30000, 30000, 25000, 25000 then range = 130000 - 25000 = $105000

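Interquartile Range (Q): This is the difference between the 75th and the 25th percentiles (the spread of the middle 50% of cases). It gives a better idea than the range of what the middle of the distribution looks like.
Formula: Q = Q3 - Q1, where Q3 is the score of the case in position N x .75 and Q1 is the score of the case in position N x .25, counting up from the lowest score.
Using the above data (N = 8): Q3 is the 6th case (8 x .75) = $30000 and Q1 is the 2nd case (8 x .25) = $25000, so Q = Q3 - Q1 = $30000 - $25000 = $5000.
The interquartile range (Q) is $5000.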
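The range and the interquartile range for the income data can be sketched in Python as below. The sketch follows the position rule given on the slide (case number N x .25 and N x .75, counted from the lowest score) and assumes N is divisible by 4 so that both positions are whole numbers. Other quartile conventions, such as the one used by `statistics.quantiles`, can give different values.

```python
incomes = [130000, 35000, 30000, 30000, 30000, 30000, 25000, 25000]

ordered = sorted(incomes)  # ascending, so position 1 is the lowest score
n = len(ordered)

# Range: largest minus smallest observation.
data_range = ordered[-1] - ordered[0]

# Positions of the quartile cases (1-based), assuming n is divisible by 4.
q1_position = n // 4      # N x .25
q3_position = 3 * n // 4  # N x .75

# Convert 1-based positions to 0-based list indexes.
q1 = ordered[q1_position - 1]
q3 = ordered[q3_position - 1]
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```

The outlier of $130000 inflates the range but has no effect on Q.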

3. Variance and Standard Deviation: For raw data at the interval/ratio level. Most common measure of variation. The numerator in the formula is known as the sum of squares, and the denominator is either the population size N or, for sample data, n-1 (the sample size minus one). The variance is denoted by S² and the standard deviation, which is the square root of the variance, by S.

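Definitional Formula for Variance and Standard Deviation:
Variance: $s^2 = \frac{\sum (X_i - \bar{X})^2}{N}$
S.D.: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N}}$
A working formula (the one you use) for the s.d. is:
$s = \sqrt{\frac{1}{N}\left[\sum X_i^2 - \frac{\left(\sum X_i\right)^2}{N}\right]}$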

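Example for S and S²:
Data: 66, 89, 41, 98, 76, 77, 69, 60, 60, 66, 69, 66, 98, 52, 74, 66, 89, 95, 66, 69
Find ∑Xi²: square each Xi and find the total (108904).
Find (∑Xi)²: find the total of all Xi (1446) and square it (2090916).
Substitute these and N = 20 into the formula for S. For S², square S before rounding it (equivalently, stop before taking the square root).
S² = (108904 - 2090916/20) / 20 = 4358.2 / 20 = 217.91
S = √217.91 = 14.76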
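The worked example can be checked with a short Python sketch. It uses N in the denominator, as the example does, and compares the working formula (sum of squared scores minus the squared total over N) with the definitional formula (squared deviations from the mean). Variable names are illustrative.

```python
import math

scores = [66, 89, 41, 98, 76, 77, 69, 60, 60, 66,
          69, 66, 98, 52, 74, 66, 89, 95, 66, 69]
n = len(scores)

# Working formula: needs only the sum of X and the sum of X squared.
sum_x = sum(scores)
sum_x_squared = sum(x * x for x in scores)
sum_of_squares = sum_x_squared - sum_x ** 2 / n
variance = sum_of_squares / n
sd = math.sqrt(variance)

# Definitional formula: squared deviations from the mean.
mean = sum_x / n
variance_definitional = sum((x - mean) ** 2 for x in scores) / n

print(sum_x, sum_x_squared)
print(round(variance, 2), round(sd, 2))
print(round(variance_definitional, 2))
```

Both formulas agree up to floating-point error; rounding happens only at the final step.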

Another working formula for the standard deviation: Note that the definitional formula for s.d. is not practical for use with data when N>10. The working formulae should be used instead. All three formulae give exactly the same result.

Properties of S:
- always greater than or equal to 0
- the greater the variation about the mean, the greater S is
- n-1 corrects for bias when using sample data: with N in the denominator, S tends to underestimate the population s.d., so to correct for this we use n-1
- the larger the sample size, the smaller the difference this correction makes
- when calculating the s.d. for the whole population, use N in the denominator
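The difference between the N and n-1 denominators can be illustrated with Python's standard `statistics` module, where `pstdev` divides by N and `stdev` divides by n-1. The sketch below treats the 20 test scores first as a whole population and then as a sample; this is an illustration only, not a statement about which denominator the course expects in a given exercise.

```python
import statistics

scores = [66, 89, 41, 98, 76, 77, 69, 60, 60, 66,
          69, 66, 98, 52, 74, 66, 89, 95, 66, 69]

population_sd = statistics.pstdev(scores)  # denominator N
sample_sd = statistics.stdev(scores)       # denominator n - 1

print(round(population_sd, 2))
print(round(sample_sd, 2))
print(sample_sd - population_sd)  # the gap shrinks as n grows
```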

NOTE: σ, N and mu (µ) denote population parameters; s, n and x-bar ($\bar{X}$) denote sample statistics.

Remember the Rounding Rules!
- While calculating, always use as many decimal places as your calculator can handle.
- Round your final answer to 2 decimal places, rounding to the nearest number.
- Engineer's Rule: When the last digit is exactly 5 (followed only by 0's), round the digit before it to the nearest EVEN number.
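The round-to-even rule can be sketched with Python's `decimal` module, which provides a `ROUND_HALF_EVEN` mode. The helper name `round_half_even` is hypothetical. Values are passed as strings so that they are stored exactly; binary floats such as 2.345 usually cannot be represented exactly, which would hide the "exactly 5" case.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_half_even(value: str, places: int = 2) -> Decimal:
    """Round a decimal string to `places` decimals, sending exact ties to the even digit."""
    quantum = Decimal(10) ** -places  # e.g. 0.01 for two decimal places
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)

print(round_half_even("2.345"))   # tie: 4 is already even, so it stays
print(round_half_even("2.355"))   # tie: 5 is odd, so it rounds up to 6
print(round_half_even("2.3451"))  # not a tie: ordinary rounding applies
```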

Homework Questions Healey and Prus 1e: #3.1, #3.5, #3.11 (compute s for 8 nations also), #3.15 SPSS: Read the SPSS sections for Ch. 3 and 4 in 1st Cdn. Edition and for Ch. 4 in 2/3 Cdn. Edition Try some of the SPSS exercises for practice