Chapter 9 Statistics.

Frequency Distributions; Measures of Central Tendency

Three types of frequency distributions:
- Categorical: primarily for nominal- and ordinal-level data (FYI)
- Grouped: used when the range of the data is large
- Ungrouped: used when the range of the data is small; each class is a single data value (FYI)

Grouped Frequency Distributions

Step 1: Order the data from smallest to largest.
Step 2: Determine the number of classes (i.e., class intervals) using Sturges' Rule, $k = 1 + 3.322\log_{10} n$, where $n$ is the number of observations (data values). Always round $k$ up.
Class intervals are contiguous, nonoverlapping intervals chosen so that they are mutually exclusive and exhaustive; that is, each and every value in the data set can be placed in one, and only one, of the intervals.

Step 3: Determine the width of the class intervals, $W = R / k$, where the range $R$ = largest value - smallest value and $k$ is the number of classes from Sturges' Rule.
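Steps 2 and 3 can be sketched in Python as follows. This is a minimal illustration rather than part of the original material; the function names `sturges_classes` and `class_width` are hypothetical, and the width is returned unrounded, since Step 3 does not say how to round it.

```python
import math


def sturges_classes(n: int) -> int:
    """Number of classes by Sturges' Rule, k = 1 + 3.322*log10(n), always rounded up."""
    if n < 1:
        raise ValueError("need at least one observation")
    return math.ceil(1 + 3.322 * math.log10(n))


def class_width(data: list[float], k: int) -> float:
    """Class width W = R / k, where R = largest value - smallest value."""
    return (max(data) - min(data)) / k


# With n = 108 observations (the size of the grouped example later on):
k = sturges_classes(108)  # 1 + 3.322 * 2.033... = 7.755..., rounded up to 8
print(k)  # 8
```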

Step 4: Assign the observations to class intervals. The count in each class interval is the frequency for that interval. The smallest observation serves as the first lower class limit (LCL); add the width minus one to the LCL to get the upper class limit (UCL).
NOTE: Technically, class limits (e.g., 0-5, 6-11, 12-17, and so on) are not adjacent. Class boundaries account for the space between the class limits (e.g., -0.5 to 5.5, 5.5 to 11.5, 11.5 to 17.5, and so on). Boundaries are written for convenience but are understood to mean all values up to, but not including, the upper boundary.

Step 5: Calculate cumulative and relative frequencies.
- Cumulative frequency: add the numbers of observations from the first interval through the current interval, inclusive.
- Relative frequency: divide the number of observations in each class interval by the total number of observations.
- Cumulative relative frequency: the same calculation as cumulative frequency, but using the relative frequencies.

A frequency distribution table has one row per class interval:
Class Int. (LCL - UCL) | Freq. | Cum. Freq. | Rel. Freq. | Cum. Rel. Freq.
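Steps 4 and 5 together can be sketched as one Python function. It assumes integer data and an integer class width, so that UCL = LCL + width - 1 as in Step 4; the name `frequency_table` and the sample values are hypothetical, chosen to reproduce the 0-5, 6-11, 12-17 classes from the note on class limits.

```python
def frequency_table(data: list[int], lcl: int, width: int, k: int) -> list[tuple]:
    """Rows of (LCL, UCL, freq, cum_freq, rel_freq, cum_rel_freq).

    Assumes integer data and an integer class width, so UCL = LCL + width - 1.
    """
    n = len(data)
    rows = []
    cum_freq = 0
    for i in range(k):
        lower = lcl + i * width              # lower class limit of class i
        upper = lower + width - 1            # upper class limit of class i
        freq = sum(1 for x in data if lower <= x <= upper)
        cum_freq += freq                     # first class through this one, inclusive
        rows.append((lower, upper, freq, cum_freq, freq / n, cum_freq / n))
    if cum_freq != n:
        raise ValueError("classes are not exhaustive: some values were not placed")
    return rows


# Hypothetical data; LCL = smallest value (0), width 6, k = 3 gives 0-5, 6-11, 12-17.
sample = sorted([13, 0, 8, 1, 6, 10, 3, 9])
table = frequency_table(sample, lcl=sample[0], width=6, k=3)
for row in table:
    print(row)
# (0, 5, 3, 3, 0.375, 0.375)
# (6, 11, 4, 7, 0.5, 0.875)
# (12, 17, 1, 8, 0.125, 1.0)
```

Each printed row reads (LCL, UCL, Freq., Cum. Freq., Rel. Freq., Cum. Rel. Freq.); the last cumulative relative frequency is 1.0 whenever the classes are exhaustive.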

Measures of central tendency: the value(s) the data tend to center around.
- Arithmetic mean (average)
- Mode
- Median

Arithmetic mean (sample mean or sample average), "x-bar"
Ungrouped data (individual data values such as 5, 6, 10, 14, etc.):
$\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{x_1 + x_2 + x_3 + \dots + x_n}{n}$
where $x_i$ is each data value (observation) in the data set and $n$ is the number of observations in the data set.

To calculate the sample mean for ungrouped data:
Step 1: Add all the values in the data set.
Step 2: Divide the total by the number of values summed.

Example (ungrouped data): 7.0 6.2 7.7 8.0 6.4 6.2 7.2 5.4 6.4 6.5 7.2 5.4, with $n = 12$
$\bar{x} = \dfrac{7.0+6.2+7.7+8.0+6.4+6.2+7.2+5.4+6.4+6.5+7.2+5.4}{12} = \dfrac{79.6}{12} \approx 6.63$
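A quick Python check of this arithmetic, using the same 12 values; `statistics.mean` from the standard library serves only as a cross-check of the two-step recipe.

```python
import statistics

data = [7.0, 6.2, 7.7, 8.0, 6.4, 6.2, 7.2, 5.4, 6.4, 6.5, 7.2, 5.4]

# Step 1: add all values; Step 2: divide by the number of values summed.
x_bar = sum(data) / len(data)

print(round(x_bar, 2))                   # 6.63
print(round(statistics.mean(data), 2))   # 6.63
```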

Grouped data (assumes each value (observation) falling within a given class interval equals the midpoint of that interval):
$\bar{x} = \dfrac{\sum f_i x_i}{n}$
where $x_i$ is the midpoint of each class interval (the class mark), $f_i$ is the frequency of that class, and $n = \sum f_i$. An easy way to find the class mark is to add the upper class limit (boundary) to the lower class limit (boundary), then divide by 2.

To calculate the sample mean for grouped data:
Step 1: Multiply each class mark by its corresponding frequency.
Step 2: Add the resulting products.
Step 3: Divide the total by the number of observations.

Example
Class Limits | Frequency ($f_i$) | Class Mark ($x_i$) | $x_i f_i$
90-98 | 6 | 94 | 564
99-107 | 22 | 103 | 2266
108-116 | 43 | 112 | 4816
117-125 | 28 | 121 | 3388
126-134 | 9 | 130 | 1170
Total | 108 | | 12204

$\bar{x} = \dfrac{12204}{108} = 113$

Note: Where did the number 6 come from? There are 6 data values (observations) in the data set that fall in the range 90-98 (inclusive).
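The grouped-data recipe can be sketched in Python as below, with each class stored as a hypothetical `(lower, upper, frequency)` tuple copied from the table above; the class mark is computed as (lower + upper) / 2.

```python
# Each class as (lower limit, upper limit, frequency), copied from the table above.
classes = [(90, 98, 6), (99, 107, 22), (108, 116, 43), (117, 125, 28), (126, 134, 9)]


def grouped_mean(classes: list[tuple[int, int, int]]) -> float:
    """Sample mean of grouped data, treating every value as its class mark."""
    n = sum(f for _, _, f in classes)
    # Steps 1-2: multiply each class mark by its frequency and add the products.
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    # Step 3: divide by the number of observations.
    return total / n


print(grouped_mean(classes))  # 113.0
```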

Mode: the value that occurs most frequently.
Ungrouped data
Step 1: Identify the data value that occurs most frequently.
- Bimodal: two values occur with the same, highest frequency.
- No mode: all values are different (this is not the same as mode = 0).
Grouped data
Step 1: Specify the modal class (i.e., the class interval containing the largest number of observations).

Mode for ungrouped data: 7.0 6.2 7.7 8.0 6.4 6.2 7.2 5.4 6.4 6.5 7.2 5.4
Four numbers appear two times each: 5.4, 6.2, 6.4, and 7.2. Therefore there are four modes; the data set is quad-modal.
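A small Python sketch of the mode rules above. The standard library's `statistics.multimode` returns every value when all values are distinct, so this hypothetical `modes` helper uses `collections.Counter` instead and reports "no mode" as an empty list.

```python
from collections import Counter


def modes(values: list[float]) -> list[float]:
    """All values tied for the highest frequency; [] if every value is distinct."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no mode (which is not the same as mode = 0)
    return sorted(v for v, c in counts.items() if c == top)


data = [7.0, 6.2, 7.7, 8.0, 6.4, 6.2, 7.2, 5.4, 6.4, 6.5, 7.2, 5.4]
print(modes(data))       # [5.4, 6.2, 6.4, 7.2]
print(modes([1, 2, 3]))  # []
```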

Modal class for grouped data: the modal class is 108-116, the 3rd class (the class with the largest number of data values, 43).

Median: the value above which half the values in a data set lie and below which the other half lie (the middle value).
Ungrouped data
Step 1: Arrange the values in order of magnitude (smallest to largest).
Step 2: Locate the middle value.

Median for ungrouped data: 5.4 5.4 6.2 6.2 6.4 6.4 6.5 7.0 7.2 7.2 7.7 8.0
There is an even number of values, so the median is the average of the middle two values:
$\dfrac{6.4 + 6.5}{2} = 6.45$
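The two median steps, sketched in Python with a hypothetical `median` helper (the standard library's `statistics.median` applies the same odd/even rule).

```python
def median(values: list[float]) -> float:
    ordered = sorted(values)                    # Step 1: order smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                              # odd count: the single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the middle two


data = [7.0, 6.2, 7.7, 8.0, 6.4, 6.2, 7.2, 5.4, 6.4, 6.5, 7.2, 5.4]
print(round(median(data), 2))  # 6.45
```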

Measures of Variation (Dispersion)

Range (R), for ungrouped data only
Step 1: Take the difference between the largest and smallest values in a data set. For example, the data set 5, 6, 10, 14 has a range of 9, because 14 (the largest value) minus 5 (the smallest value) is 9.

Deviations from the mean: differences found by subtracting the mean from each number in a sample. Given 3, 5, 2, 6, the mean ($\bar{x}$) is 4, so the deviations from the mean are -1, 1, -2, 2.

Variance ($s^2$): an average of the squares of the deviations of the individual values from their mean; for a sample, the sum of squared deviations is divided by $n - 1$.
Ungrouped data: $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$

Standard deviation ($s$): calculate the sample standard deviation for grouped or ungrouped data by taking the square root of the variance, $s = \sqrt{s^2}$.

Example (ungrouped data): 8 6 3 0 0 5 9 2 1 3 7 10 0 3 6, with $n = 15$ and $\bar{x} = 4.2$
(a) Range: $R = 10 - 0 = 10$
(b) Variance: $s^2 = \dfrac{(8-4.2)^2 + (6-4.2)^2 + (3-4.2)^2 + (0-4.2)^2 + (0-4.2)^2 + (5-4.2)^2 + (9-4.2)^2 + (2-4.2)^2 + (1-4.2)^2 + (3-4.2)^2 + (7-4.2)^2 + (10-4.2)^2 + (0-4.2)^2 + (3-4.2)^2 + (6-4.2)^2}{15 - 1} = \dfrac{158.40}{14} \approx 11.31$
(c) Standard deviation: $s = \sqrt{11.31} \approx 3.36$
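The same range, variance, and standard deviation can be reproduced in Python. The `sample_variance` helper is a hypothetical name; it divides by n - 1 as in the formula, which is also what the standard library's `statistics.variance` and `statistics.stdev` do.

```python
import math

data = [8, 6, 3, 0, 0, 5, 9, 2, 1, 3, 7, 10, 0, 3, 6]


def sample_variance(values: list[float]) -> float:
    """s^2 = sum((x_i - x_bar)^2) / (n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


r = max(data) - min(data)      # (a) range
s2 = sample_variance(data)     # (b) sample variance
s = math.sqrt(s2)              # (c) sample standard deviation

print(r, round(s2, 2), round(s, 2))  # 10 11.31 3.36
```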

Grouped data:
$s^2 = \dfrac{n \sum x_i^2 f_i - \left(\sum x_i f_i\right)^2}{n(n - 1)}$
where $x_i$ is the midpoint of each class (the class mark) and $f_i$ is each class frequency. As before, the class mark is the upper class limit (boundary) plus the lower class limit (boundary), divided by 2.

To calculate the sample variance for grouped data:
Step 1: Multiply each squared class mark by its corresponding frequency.
Step 2: Add the resulting products.
Step 3: Multiply the sum by $n$ [A].
Step 4: Multiply each class mark by its corresponding frequency.
Step 5: Add the resulting products.
Step 6: Square the sum [B].
Step 7: Subtract: [C] = [A] - [B].
Step 8: Divide [C] by $n(n - 1)$.

Example
Class Limits | Freq. ($f_i$) | $x_i$ | $x_i f_i$ | $x_i^2 f_i$
90-98 | 6 | 94 | 564 ($94 \times 6$) | 53,016 ($94^2 \times 6$)
99-107 | 22 | 103 | 2266 | 233,398
108-116 | 43 | 112 | 4816 | 539,392
117-125 | 28 | 121 | 3388 | 409,948
126-134 | 9 | 130 | 1170 | 152,100
Total | 108 | | 12204 | 1,387,854

Refer to the formula for the variance of grouped data and see if you can fill it in using values from the table above:
$s^2 = \dfrac{n \sum x_i^2 f_i - \left(\sum x_i f_i\right)^2}{n(n - 1)}$

$s^2 = \dfrac{108(1{,}387{,}854) - (12{,}204)^2}{108(107)} = \dfrac{149{,}888{,}232 - 148{,}937{,}616}{11{,}556} = \dfrac{950{,}616}{11{,}556} \approx 82.26$
Therefore $s = \sqrt{82.26} \approx 9.07$.
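The eight steps can be sketched in Python as below; the helper name `grouped_variance` is hypothetical, and the class data are copied from the table above. The comments map each line to the numbered steps.

```python
import math

# (lower limit, upper limit, frequency) from the grouped example
classes = [(90, 98, 6), (99, 107, 22), (108, 116, 43), (117, 125, 28), (126, 134, 9)]


def grouped_variance(classes: list[tuple[int, int, int]]) -> float:
    """Computational formula s^2 = [n*sum(x^2 f) - (sum(x f))^2] / (n(n - 1))."""
    n = sum(f for _, _, f in classes)
    marks = [((lower + upper) / 2, f) for lower, upper, f in classes]
    sum_x2f = sum(x * x * f for x, f in marks)   # Steps 1-2
    a = n * sum_x2f                              # Step 3: [A]
    sum_xf = sum(x * f for x, f in marks)        # Steps 4-5
    b = sum_xf ** 2                              # Step 6: [B]
    c = a - b                                    # Step 7: [C] = [A] - [B]
    return c / (n * (n - 1))                     # Step 8


s2 = grouped_variance(classes)
print(round(s2, 2), round(math.sqrt(s2), 2))  # 82.26 9.07
```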

The Normal Distribution

Also known as the "bell-shaped" curve. Some statisticians say it is the most important distribution in statistics, and it is the most popular one.

The normal density function is given by
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / (2\sigma^2)}$
where $\pi \approx 3.142$ and $e \approx 2.718$.

Properties of the normal distribution:
- symmetrical about the mean;
- mean = median = mode;
- total area under the curve = 1;
- each different pair of values of $\mu$ and $\sigma$ specifies a different normal distribution, so the normal distribution is really a family of distributions;
- a very important member of the family is the standard normal distribution.

The standard normal distribution has mean $\mu = 0$ and standard deviation $\sigma = 1$, so the normal density function reduces to
$f(z) = \dfrac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$
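As a sketch only, the density formula translates directly into Python; `normal_pdf` is a hypothetical name. Python's `statistics.NormalDist(mu, sigma).pdf(x)` computes the same quantity.

```python
import math


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Normal density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


# With the defaults mu = 0 and sigma = 1, this is the standard normal density.
print(round(normal_pdf(0.0), 4))  # 0.3989
```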

The probability that $z$ lies between any two points $a$ and $b$ on the z-axis, $P(a < z < b)$, is the area bounded by perpendiculars erected at each of the points, the curve, and the horizontal axis.

Generally we find the area under the curve of a continuous distribution with calculus, by integrating the density function between $a$ and $b$:
$P(a < z < b) = \displaystyle\int_a^b \dfrac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz$

However, we don't have to integrate, because these areas have already been calculated and tabulated; see Table 1 of Appendix A-2.
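As an aside not in the original material, the table areas can also be computed in code: for the standard normal curve, the area from 0 out to z equals $\tfrac{1}{2}\operatorname{erf}(|z|/\sqrt{2})$, and `math.erf` is in Python's standard library. The helper `area_0_to_z` is hypothetical and mimics a table that lists the area from 0 to z, as the exercises use.

```python
import math


def area_0_to_z(z: float) -> float:
    """Area under the standard normal curve between 0 and z (same for -z, by symmetry)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2.0))


print(round(area_0_to_z(0.56), 4))  # 0.2123
```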

Exercises 6-3 #7 (p. 282): Find the area under the normal distribution curve between z = 0 and z = 0.56.
We want $P(0 < z < 0.56)$, where a = 0 and b = 0.56. From the standard normal table, $P(0 < z < 0.56) = 0.2123$.

Exercises 6-3 #16 (p. 283): Find the area under the normal distribution curve between z = -0.87 and z = -0.21.
We want $P(-0.87 < z < -0.21)$, where a = -0.87 and b = -0.21 (both to the left of 0).

The table gives an area of 0.3078 at z = 0.87 (the area is the same for negative or positive z, since the distribution is symmetrical). This area covers values of z from 0 out to -0.87. Since we don't want that entire area, we subtract the area from 0 out to -0.21, which is 0.0832 (the table area at z = 0.21). So 0.3078 - 0.0832 = 0.2246.

Exercises 6-3 #25 (p. 283): Find the area under the normal distribution curve to the right of z = 1.92 and to the left of z = -0.44.
We want $P(z < -0.44) + P(z > 1.92) = 0.3574$, where a = -0.44 and b = 1.92.

The table area at z = 0.44 is 0.1700, which is the area under the curve from 0 out to 0.44, so the area to the left of z = -0.44 is 0.5 - 0.1700 = 0.3300. Likewise, the table area at z = 1.92 is 0.4726, the area from 0 out to 1.92, so the area to the right of z = 1.92 is 0.5 - 0.4726 = 0.0274. The combined area of interest is 0.3300 + 0.0274 = 0.3574.
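Exercises #16 and #25 can be cross-checked with `statistics.NormalDist`, whose `cdf` method gives the area to the left of z (not the area from 0 to z that the table lists). This is a sketch; exact areas may differ from the four-digit table answers in the last digit, because each table entry is itself rounded.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

# 6-3 #16: area between z = -0.87 and z = -0.21
p16 = Z.cdf(-0.21) - Z.cdf(-0.87)

# 6-3 #25: area to the left of z = -0.44 plus area to the right of z = 1.92
p25 = Z.cdf(-0.44) + (1.0 - Z.cdf(1.92))

print(round(p16, 4), round(p25, 4))
```

The results agree with the table answers 0.2246 and 0.3574 to within about 0.0001.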

Exercises 6-3 #45: Given that the shaded area (to the right of z) is 0.8962, what is the value of z?
z must equal -1.26. Recall that one-half of the area under the curve is 0.5, so the area between z and 0 is 0.8962 - 0.5000 = 0.3962. Looking in the body of the standard normal table for an area of 0.3962, we find it at the intersection of the 13th row and 7th column, which corresponds to a z value of 1.26. Since z is located to the left of 0, it must be negative, hence z = -1.26.
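The reverse lookup can be sketched with `statistics.NormalDist().inv_cdf`, which takes an area to the left of z. This assumes, as the worked answer implies, that the shaded 0.8962 lies to the right of z, so the area to the left is 1 - 0.8962.

```python
from statistics import NormalDist

shaded_right_of_z = 0.8962  # area to the right of z, from the exercise

# The area to the LEFT of z is 1 - 0.8962 = 0.1038; invert the cumulative area.
z = NormalDist().inv_cdf(1.0 - shaded_right_of_z)
print(round(z, 2))  # -1.26
```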

Section 6-4 Applications of the Normal Distribution

To solve problems for a normally distributed variable with $\mu \ne 0$ or $\sigma \ne 1$, we must transform the variable to a standard normal variable, so that $P(x_1 < X < x_2)$ becomes $P(z_1 < Z < z_2)$, which allows us to use the standard normal table. Use
$z = \dfrac{\text{value} - \text{mean}}{\text{standard deviation}} = \dfrac{x - \mu}{\sigma}$

Example: A survey found that people keep their television sets an average of 4.8 years, with a standard deviation of 0.89 year. If a person decides to buy a new TV set, find the probability that he or she has owned the set for the following amounts of time. Assume the variable is normally distributed.
(a) Less than 2.5 years
(b) Between 3 and 4 years
(c) More than 4.2 years
Here $\mu = 4.8$ and $\sigma = 0.89$.
(a) $P(X < 2.5)$ becomes $P(z < -2.58)$, because $z = (2.5 - 4.8)/0.89 = -2.58$. The table area at z = 2.58 is 0.4951, therefore $P(z < -2.58) = 0.5 - 0.4951 = 0.0049$.

(b) $P(3 < X < 4)$ becomes $P(-2.02 < z < -0.90)$, because $z = (3 - 4.8)/0.89 = -2.02$ and $z = (4 - 4.8)/0.89 = -0.90$. From the standard normal table, the area at z = 2.02 is 0.4783 and the area at z = 0.90 is 0.3159, so $P(-2.02 < z < -0.90) = 0.4783 - 0.3159 = 0.1624$.

(c) $P(X > 4.2)$ becomes $P(z > -0.67)$, because $z = (4.2 - 4.8)/0.89 = -0.67$. From the standard normal table, the area at z = 0.67 is 0.2486, so $P(z > -0.67) = 0.2486 + 0.5 = 0.7486$.
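The three parts can be reproduced in Python with `statistics.NormalDist`. To match the table method, this sketch rounds each z to two decimals before looking up the area; the helper name `z_score` is hypothetical.

```python
from statistics import NormalDist

MU, SIGMA = 4.8, 0.89
Z = NormalDist()  # standard normal


def z_score(x: float) -> float:
    """Standardize x, rounded to 2 decimals as when reading the printed table."""
    return round((x - MU) / SIGMA, 2)


p_a = Z.cdf(z_score(2.5))                        # (a) P(X < 2.5)
p_b = Z.cdf(z_score(4.0)) - Z.cdf(z_score(3.0))  # (b) P(3 < X < 4)
p_c = 1.0 - Z.cdf(z_score(4.2))                  # (c) P(X > 4.2)
print(round(p_a, 4), round(p_b, 4), round(p_c, 4))  # 0.0049 0.1624 0.7486

# Without rounding z, the exact normal cdf gives a slightly different (c).
exact_c = 1.0 - NormalDist(MU, SIGMA).cdf(4.2)
```

Skipping the rounding of z gives slightly different answers, such as about 0.7499 instead of 0.7486 for part (c), because the unrounded z is about -0.6742 rather than -0.67.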

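Review Exercises #9: The area of interest is 0.5 (the middle 50% of scores), with $\mu = 100$ and $\sigma = 15$. The z values that bound this middle area are -0.67 and 0.67, since the table area at z = 0.67 (0.2486) is the closest entry to 0.25. We can find the X values that correspond to these z values by using the same transformation equation:
$-0.67 = \dfrac{x - 100}{15} \Rightarrow x = 100 + 15(-0.67) = 89.95$
$0.67 = \dfrac{x - 100}{15} \Rightarrow x = 100 + 15(0.67) = 110.05$
Therefore the lowest and highest scores of the middle 50% are 89.95 and 110.05, that is, $89.95 < x < 110.05$.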
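The back-transformation $x = \mu + z\sigma$ can be sketched in Python; `x_from_z` is a hypothetical helper. For comparison, `statistics.NormalDist.inv_cdf` gives the exact 25th and 75th percentiles, which use z of about ±0.6745 rather than the table's ±0.67 and so fall slightly farther apart.

```python
from statistics import NormalDist

MU, SIGMA = 100, 15


def x_from_z(z: float) -> float:
    """Invert z = (x - mu) / sigma to get x = mu + z * sigma."""
    return MU + z * SIGMA


# Table-based answer, using z = -0.67 and z = 0.67
low_table, high_table = x_from_z(-0.67), x_from_z(0.67)

# Exact cut-offs of the middle 50%: the 25th and 75th percentiles
dist = NormalDist(MU, SIGMA)
low_exact, high_exact = dist.inv_cdf(0.25), dist.inv_cdf(0.75)

print(round(low_table, 2), round(high_table, 2))  # 89.95 110.05
print(round(low_exact, 2), round(high_exact, 2))  # 89.88 110.12
```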