CHAPTER 2: 2.1 - Basic Definitions and Properties. Population Characteristics = “Parameters”; Sample Characteristics = “Statistics”; Random Variables.

Presentation transcript:

CHAPTER 2
2.1 - Basic Definitions and Properties
  o Population Characteristics = “Parameters”
  o Sample Characteristics = “Statistics”
  o Random Variables (Numerical vs. Categorical)
2.2 - Exploratory Data Analysis
  o Graphical Displays
  o Descriptive Statistics
      Measures of Center (mode, median, mean)
      Measures of Spread (range, variance, standard deviation)

“Classical Scientific Method”
  Hypothesis: Define the study population... What’s the question?
  Experiment: Designed to test the hypothesis
  Observations: Collect sample measurements
  Analysis: Do the data formally tend to support or refute the hypothesis, and with what strength? (Lots of juicy formulas...)
  Conclusion: Reject or retain the hypothesis; is the result statistically significant?
  Interpretation: Translate findings in context!
Statistics is implemented in each step of the classical scientific method!

Analysis: Do the data formally tend to support or refute the hypothesis, and with what strength? (Lots of juicy formulas...)
To help answer this question, we should first try to obtain an informal “feel” for the sample data we have collected, and see if it suggests anything about the population distribution.
~ Exploratory Data Analysis ~
  1. Visual Displays (charts, tables, graphs, etc.): “What do the data look like?”
  2. “Descriptive Statistics” (measures of center, measures of spread, proportions, etc.): “How can the data be summarized?”

Example: Suppose the random variable is X = Age (years) in a certain population of individuals, and we select the following random sample of n = 20 ages.
  {18, 19, 19, 19, 20, 21, 21, 23, 24, 24, 26, 27, 31, 35, 35, 37, 38, 42, 46, 59}
From these values, we can construct a table of the frequencies of each age interval in the dataset, i.e., a frequency table.
  Class Interval   Frequency
  [10, 20)         4
  [20, 30)         8
  [30, 40)         5
  [40, 50)         2
  [50, 60)         1
  Total            n = 20
“Endpoint convention”: Here, the left endpoint is included, but not the right. Note! Stay away from “10-20,” “20-30,” “30-40,” etc., which leave it unclear which class contains a boundary value such as 20.
In published journal articles, the original data are almost never shown; they are displayed in tabular form as above. This summary is called “grouped data.”
Frequency Histogram: Suggests the population may be skewed to the right (i.e., positively skewed).
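The frequency table above can be rebuilt from the raw ages. This is a minimal sketch, assuming equal-width classes that include the left endpoint and exclude the right; the function name `frequency_table` and its parameters are illustrative, not from the slides.

```python
from collections import Counter

ages = [18, 19, 19, 19, 20, 21, 21, 23, 24, 24,
        26, 27, 31, 35, 35, 37, 38, 42, 46, 59]

def frequency_table(values, start, width, n_classes):
    """Count values in classes [start + k*width, start + (k+1)*width)."""
    # Integer division assigns a boundary value (e.g. 20) to the class it opens.
    counts = Counter((v - start) // width for v in values)
    table = []
    for k in range(n_classes):
        left = start + k * width
        table.append(((left, left + width), counts.get(k, 0)))
    return table

age_table = frequency_table(ages, start=10, width=10, n_classes=5)
for (left, right), freq in age_table:
    print(f"[{left}, {right})  {freq}")
```

Values outside the tabulated range would simply not appear in the output, so the class limits should be chosen to cover the whole sample.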

Example (continued): Often, though, it is preferable to work with proportions, i.e., relative frequencies. Divide the frequencies by n = 20.
  Class Interval   Frequency   Relative Frequency
  [10, 20)         4           4/20 = 0.20
  [20, 30)         8           8/20 = 0.40
  [30, 40)         5           5/20 = 0.25
  [40, 50)         2           2/20 = 0.10
  [50, 60)         1           1/20 = 0.05
  Total            n = 20      20/20 = 1.00
Relative frequencies are always between 0 and 1, and sum to 1.
Figure: Relative Frequency Histogram

Example (continued): Accumulating the relative frequencies from left to right gives the cumulative relative frequencies.
  Class Interval   Frequency   Relative Frequency   Cumulative
                                                    (0.00)
  [10, 20)         4           4/20 = 0.20          0.20
  [20, 30)         8           8/20 = 0.40          0.60
  [30, 40)         5           5/20 = 0.25          0.85
  [40, 50)         2           2/20 = 0.10          0.95
  [50, 60)         1           1/20 = 0.05          1.00
  Total            n = 20      20/20 = 1.00
  “0.00 of the sample is under 10 yrs old”
  “0.20 of the sample is under 20 yrs old”
  “0.60 of the sample is under 30 yrs old”
  “0.85 of the sample is under 40 yrs old”
  “0.95 of the sample is under 50 yrs old”
  “1.00 of the sample is under 60 yrs old”
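The relative and cumulative relative frequencies can be computed directly from the class frequencies. The sketch below uses exact fractions to avoid floating-point drift; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

freqs = [4, 8, 5, 2, 1]          # class frequencies for [10,20), ..., [50,60)
n = sum(freqs)                   # sample size, 20

# Relative frequency = frequency / n, kept exact as a fraction.
rel = [Fraction(f, n) for f in freqs]
# Cumulative relative frequency = running total of the relative frequencies.
cum = list(accumulate(rel))

for r, c in zip(rel, cum):
    print(f"{float(r):.2f}  {float(c):.2f}")
```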

Example: Using only the grouped data, approximately what proportion of the sample is under 34 years old?
Solution: [0, 30) contains 0.6. Assuming the values in [30, 40) are spread evenly across the interval, [30, 34) contains 4/10 of 0.25 = 0.1. The sum is 0.6 + 0.1 = 0.7.
Cumulative relative frequencies always increase from 0 to 1.
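The interpolation in the solution can be written out as a short sketch and compared against the exact proportion computed from the raw ages. It assumes values are spread evenly within each class; the function name `approx_proportion_below` is illustrative.

```python
ages = [18, 19, 19, 19, 20, 21, 21, 23, 24, 24,
        26, 27, 31, 35, 35, 37, 38, 42, 46, 59]
classes = [((10, 20), 4), ((20, 30), 8), ((30, 40), 5),
           ((40, 50), 2), ((50, 60), 1)]

def approx_proportion_below(classes, cutoff):
    """Estimate the proportion below cutoff from grouped data only."""
    n = sum(freq for _, freq in classes)
    total = 0.0
    for (left, right), freq in classes:
        if cutoff >= right:
            total += freq                       # whole class lies below cutoff
        elif cutoff > left:
            # Partial class: take the fraction of its width below the cutoff.
            total += freq * (cutoff - left) / (right - left)
    return total / n

approx = approx_proportion_below(classes, 34)       # grouped-data estimate
exact = sum(a < 34 for a in ages) / len(ages)       # uses the raw data
print(approx, exact)
```

The two answers differ because the five ages in [30, 40) are not in fact evenly spread across that interval.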

Example (continued): The grouped-data answer of 0.7 is only approximate. Exactly 13 of the 20 original values are under 34, a proportion of 13/20 = 0.65, but finding this requires the original data.
But alas, there is a major problem….

Relative Frequency Histogram
Suppose that, for the purpose of the study, we are not primarily concerned with those 30 or older, and wish to “lump” them into a single class interval. What effect will this have on the histogram?
  {18, 19, 19, 19, 20, 21, 21, 23, 24, 24, 26, 27, 31, 35, 35, 37, 38, 42, 46, 59}
  Original classes:
  Class Interval   Frequency   Relative Frequency
  [10, 20)         4           4/20 = 0.20
  [20, 30)         8           8/20 = 0.40
  [30, 40)         5           5/20 = 0.25
  [40, 50)         2           2/20 = 0.10
  [50, 60)         1           1/20 = 0.05
  Total            n = 20      20/20 = 1.00
  Lumped classes:
  Class Interval   Frequency   Relative Frequency
  [10, 20)         4           4/20 = 0.20
  [20, 30)         8           8/20 = 0.40
  [30, 60)         8           8/20 = 0.40
  Total            n = 20      20/20 = 1.00
The skew no longer appears. The histogram is distorted because of the presence of an outlier (59) in the data, creating the need for unequal class widths.

Outliers (A Pain in the Tuches)
What are they? Informally, an outlier is a sample data value that is either “much” smaller or “much” larger than the other values.
How do they arise?
  o experimental error
  o measurement error
  o recording error
  o not an error; genuine
What can we do about them?
  o double-check them if possible
  o delete them?
  o include them… somehow
  o perform analysis both ways

IDEA: Instead of having height of each class rectangle = relative frequency, make area of each class rectangle = relative frequency. Since area = height × width, the height, called the “density,” is relative frequency / width.
  Class Interval   Width   Relative Frequency   Density (= height)
  [10, 20)         10      0.20                 0.20/10 = 0.02
  [20, 30)         10      0.40                 0.40/10 = 0.04
  [30, 60)         30      0.40                 0.40/30 ≈ 0.0133
  Total                    20/20 = 1.00
Density Histogram: The outlier is included, and the overall skewed appearance is restored. Total Area = 1!
Exercise: What if the outlier was 99 instead of 59?
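The density heights can be checked numerically. This is a minimal sketch of the area = relative frequency idea for the lumped classes; the variable names are illustrative.

```python
classes = [((10, 20), 4), ((20, 30), 8), ((30, 60), 8)]   # unequal widths
n = sum(freq for _, freq in classes)

densities = []
areas = []
for (left, right), freq in classes:
    width = right - left
    rel_freq = freq / n
    density = rel_freq / width        # rectangle height
    densities.append(density)
    areas.append(density * width)     # rectangle area = relative frequency

for ((left, right), _), d in zip(classes, densities):
    print(f"[{left}, {right})  density = {d:.4f}")
print("total area =", round(sum(areas), 10))
```

Because the wide class [30, 60) spreads its 0.40 over a width of 30, its rectangle is lower than the others, which is what restores the right-skewed shape.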

“Measures of Center”
Example: Sample exam scores = {70, 80, 80, 80, 80, 90, 90, 100, 100, 100}
          Data values x_i   Frequencies f_i
  i = 1   70                1
  i = 2   80                4
  i = 3   90                2
  i = 4   100               3
  Total                     n = 10
  sample mode = most frequent value = 80
  sample median = “middle” value = (80 + 90) / 2 = 85
  sample mean = average value: $\bar{x} = \frac{1}{n} \sum x_i f_i = \frac{1}{10}[(70)(1) + (80)(4) + (90)(2) + (100)(3)] = 87$
(Quartiles are found similarly: $Q_1 = 80$, $Q_2 = 85$, $Q_3 = 100$.)

“Measures of Center”
Example: Sample exam scores = {70, 80, 80, 80, 80, 90, 90, 100, 100, 100}
  Data values x_i   Frequencies f_i   Relative Frequencies f(x_i) = f_i / n
  70                1                 1/10 = 0.1
  80                4                 4/10 = 0.4
  90                2                 2/10 = 0.2
  100               3                 3/10 = 0.3
  Total             n = 10            10/10 = 1.0
  sample mean: $\bar{x} = \frac{1}{n} \sum x_i f_i = \frac{1}{10}[(70)(1) + (80)(4) + (90)(2) + (100)(3)] = 87$
  Equivalently, distributing the 1/10: $\bar{x} = \sum x_i f(x_i) = (70)(0.1) + (80)(0.4) + (90)(0.2) + (100)(0.3) = 87$
“Notation, notation, notation.”
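The three measures of center, and the two equivalent ways of writing the mean, can be verified with Python's standard `statistics` module. This is a sketch for checking the arithmetic; the variable names are illustrative.

```python
import statistics
from collections import Counter
from fractions import Fraction

scores = [70, 80, 80, 80, 80, 90, 90, 100, 100, 100]
n = len(scores)

mode = statistics.mode(scores)        # most frequent value
median = statistics.median(scores)    # average of the two middle values
mean = statistics.mean(scores)        # ordinary average

# Frequency form: (1/n) * sum of x_i * f_i
freq = Counter(scores)
mean_from_freq = Fraction(sum(x * f for x, f in freq.items()), n)

# Relative-frequency form: sum of x_i * f(x_i), with f(x_i) = f_i / n
mean_from_rel = sum(x * Fraction(f, n) for x, f in freq.items())

print(mode, median, mean, mean_from_freq, mean_from_rel)
```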

“Measures of Spread”
Example: Sample exam scores = {70, 80, 80, 80, 80, 90, 90, 100, 100, 100}, with sample mean $\bar{x} = 87$
… but how do we measure the “spread” of a set of values?
First attempt: sample range = $x_n - x_1$ = 100 - 70 = 30 (largest value minus smallest value).
Simple, but it ignores all of the data except the extreme points, and is thus far too sensitive to outliers to be of any practical value. Example: Company employee salaries, including the CEO.
Can modify with the sample interquartile range (IQR) = $Q_3 - Q_1$ = 100 - 80 = 20.
We would still prefer a measure that uses all of the data.
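The range and IQR can be reproduced as below. Quartile conventions differ between textbooks and software; this sketch assumes the median-of-halves convention, chosen only because it reproduces the quartiles quoted in these slides. The function name `quartiles_by_halves` is illustrative.

```python
import statistics

scores = sorted([70, 80, 80, 80, 80, 90, 90, 100, 100, 100])

def quartiles_by_halves(sorted_values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention (an assumption)."""
    n = len(sorted_values)
    half = n // 2
    lower = sorted_values[:half]
    # For odd n, the middle value is left out of both halves.
    upper = sorted_values[half:] if n % 2 == 0 else sorted_values[half + 1:]
    return (statistics.median(lower),
            statistics.median(sorted_values),
            statistics.median(upper))

q1, q2, q3 = quartiles_by_halves(scores)
sample_range = scores[-1] - scores[0]   # largest minus smallest
iqr = q3 - q1                           # spread of the middle half

print(sample_range, q1, q2, q3, iqr)
```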

“Measures of Spread”
Example: Sample exam scores = {70, 80, 80, 80, 80, 90, 90, 100, 100, 100}, with sample mean $\bar{x} = 87$
… but how do we measure the “spread” of a set of values?
Better attempt: Calculate the average of the “deviations from the mean.”
  Data values x_i   Frequencies f_i   Deviations from mean x_i - \bar{x}
  70                1                 70 - 87 = -17
  80                4                 80 - 87 = -7
  90                2                 90 - 87 = +3
  100               3                 100 - 87 = +13
  Total             n = 10
  $\frac{1}{n} \sum (x_i - \bar{x}) f_i = \frac{1}{10}[(-17)(1) + (-7)(4) + (3)(2) + (13)(3)] = 0$ ????????
This is not a coincidence: the deviations always sum to 0*, so their average is not a good measure of variability.
* Physically, the sample mean is a “balance point” for the data.
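Why the deviations always sum to zero follows in one line from the definition of the sample mean. The derivation below is a supporting sketch using the slide's notation, with $\sum f_i = n$ and $\bar{x} = \frac{1}{n}\sum x_i f_i$.

```latex
\[
\sum (x_i - \bar{x})\, f_i
  = \sum x_i f_i - \bar{x} \sum f_i
  = n\bar{x} - \bar{x}\, n
  = 0
\]
```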

“Measures of Spread”
Example: Sample exam scores = {70, 80, 80, 80, 80, 90, 90, 100, 100, 100}, with sample mean $\bar{x} = 87$
Calculate a modified average of the “squared deviations from the mean” (dividing by n - 1 = 9 rather than n = 10).
  Data values x_i   Frequencies f_i   Deviations from mean x_i - \bar{x}
  70                1                 70 - 87 = -17
  80                4                 80 - 87 = -7
  90                2                 90 - 87 = +3
  100               3                 100 - 87 = +13
  Total             n = 10
  sample variance: $s^2 = \frac{1}{n-1} \sum (x_i - \bar{x})^2 f_i = \frac{1}{9}[(-17)^2(1) + (-7)^2(4) + (3)^2(2) + (13)^2(3)] = \frac{1010}{9} \approx 112.22$
  sample standard deviation: $s = \sqrt{s^2} \approx 10.59$
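The variance and standard deviation can be checked by computing the sum of squares by hand and comparing with the standard library, whose `statistics.variance` and `statistics.stdev` also divide by n - 1. The variable names are illustrative.

```python
import math
import statistics

scores = [70, 80, 80, 80, 80, 90, 90, 100, 100, 100]
n = len(scores)
mean = sum(scores) / n                              # 87.0

# Sum of squared deviations from the mean (the numerator of s^2).
sum_of_squares = sum((x - mean) ** 2 for x in scores)

s2 = sum_of_squares / (n - 1)    # sample variance, divisor n - 1
s = math.sqrt(s2)                # sample standard deviation

print(sum_of_squares, round(s2, 2), round(s, 2))
print(statistics.variance(scores), statistics.stdev(scores))
```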

Comments
  o $\bar{x}$ is an unbiased estimator of the population mean $\mu$, and $s^2$ is an unbiased estimator of the population variance $\sigma^2$. (Their “expected values” are $\mu$ and $\sigma^2$, respectively.)
  o Beware of roundoff error!!! There is an alternate, more computationally stable formula for sample variance $s^2$.
  o The numerator of $s^2$ is called a sum of squares (SS); the denominator “n - 1” is the number of degrees of freedom (df) of the n deviations $x_i - \bar{x}$, because they must satisfy a constraint (sum = 0), hence 1 degree of freedom is “lost.”
  o A natural setting for these formulas and concepts is geometric, specifically, the Pythagorean Theorem: $a^2 + b^2 = c^2$. See lecture notes appendix…
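The slide does not show which alternate formula it has in mind. One commonly taught candidate is the “shortcut” identity below, given here only as an assumption; applied to the exam scores it reproduces the same sum of squares, 1010.

```latex
\[
s^2 = \frac{1}{n-1}\left[\sum x_i^2 f_i - n\bar{x}^2\right]
    = \frac{1}{9}\left[76700 - 10(87)^2\right]
    = \frac{1}{9}\left[76700 - 75690\right]
    = \frac{1010}{9} \approx 112.22
\]
```

Here $\sum x_i^2 f_i = (70)^2(1) + (80)^2(4) + (90)^2(2) + (100)^2(3) = 76700$.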