EXPECTATION, VARIANCE ETC. - APPLICATION 1


1 EXPECTATION, VARIANCE ETC. - APPLICATION 1

2 Measures of Central Location. Usually, we focus our attention on two types of measures when describing population characteristics:
– Central location
– Variability or spread

3 Measures of Central Location. The measure of central location reflects the locations of all the data points. How? With one data point, the central location is clearly at the point itself. With two data points, the central location should fall in the middle between them (in order to reflect the location of both of them). But if a third data point appears to the left of the midrange, it should “pull” the central location to the left.

4 The Arithmetic Mean. This is the most popular measure of central location. Mean = (Sum of the observations) / (Number of observations)

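5 The Arithmetic Mean. Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where n is the sample size. Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where N is the population size.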

6 The Arithmetic Mean. Example: The reported times on the Internet of 10 adults are 0, 7, 12, 5, 33, 14, 8, 0, 9, 22 hours. Find the mean time on the Internet. $\bar{x} = \frac{0 + 7 + \dots + 22}{10} = \frac{110}{10} = 11.0$ hours.
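The calculation on this slide can be checked with a few lines of Python. This is a minimal sketch; the names `times` and `arithmetic_mean` are illustrative and do not come from the presentation.

```python
# Internet times (hours) for the 10 adults in the example.
times = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]


def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


mean_time = arithmetic_mean(times)
print(mean_time)  # 11.0
```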

7 Drawback of the mean: It can be influenced by unusual observations, because it uses all the information in the data set.

8 The Median. The median of a set of observations is the value that falls in the middle when the observations are arranged in order of magnitude. It divides the data in half.
Example: Find the median of the time on the Internet for the 10 adults of the previous example.
Even number of observations: 0, 0, 5, 7, 8, 9, 12, 14, 22, 33. The median is the average of the two middle values, (8 + 9)/2 = 8.5.
Comment: Suppose only 9 adults were sampled (exclude, say, the longest time, 33).
Odd number of observations: 0, 0, 5, 7, 8, 9, 12, 14, 22. The median is the middle value, 8.

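9 The Median. Depth of median = (n+1)/2, that is, the position of the median in the ordered data. When the depth is not a whole number, the median is the average of the two observations on either side of it.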
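The depth rule can be turned into a small function. The sketch below assumes the depth is read as a 1-based position in the sorted data; the function name `median_by_depth` is illustrative only.

```python
import math

times = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]


def median_by_depth(values):
    """Median located via depth = (n + 1) / 2 in the ordered data."""
    ordered = sorted(values)
    depth = (len(ordered) + 1) / 2           # 1-based position, may end in .5
    lower = ordered[math.floor(depth) - 1]   # observation just below the depth
    upper = ordered[math.ceil(depth) - 1]    # observation just above the depth
    return (lower + upper) / 2               # equal when depth is a whole number


ten_adults = median_by_depth(times)                # depth 5.5
nine_adults = median_by_depth(sorted(times)[:-1])  # longest time (33) excluded, depth 5
print(ten_adults, nine_adults)  # 8.5 8.0
```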

10 The Mode. The mode of a set of observations is the value that occurs most frequently. A set of data may have one mode (or modal class), or two or more modes.

11 The Mode. Example: Find the mode for the data in the previous example. Here are the data again: 0, 7, 12, 5, 33, 14, 8, 0, 9, 22. Solution: All observations except “0” occur once. There are two “0”s. Thus, the mode is zero. Is this a good measure of central location? The value “0” does not reside at the center of this set (compare with the mean = 11.0 and the median = 8.5).
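A frequency count makes the solution explicit. This is a small sketch using the standard-library `collections.Counter`; it returns a list so that data with two or more modes are also handled.

```python
from collections import Counter

times = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

counts = Counter(times)                  # value -> number of occurrences
highest = max(counts.values())           # the largest frequency
modes = [value for value, count in counts.items() if count == highest]

print(modes, highest)  # [0] 2
```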

12 Relationship among Mean, Median, and Mode. If a distribution is symmetrical and bell shaped, the mean, median and mode coincide: Mean = Median = Mode. If a distribution is asymmetrical, and skewed to the left or to the right, the three measures differ. In a positively skewed distribution (“skewed to the right”): Mode < Median < Mean.

13 Relationship among Mean, Median, and Mode. If a distribution is not symmetrical, and skewed to the left or to the right, the three measures differ. In a positively skewed distribution (“skewed to the right”): Mode < Median < Mean. In a negatively skewed distribution (“skewed to the left”): Mean < Median < Mode.
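As an illustration, the Internet-time data from the earlier example have a long right tail (one adult reported 33 hours), and their three measures happen to follow the ordering given for positive skew. The sketch below uses Python's standard `statistics` module; the ordering is typical of skewed data rather than guaranteed for every data set.

```python
import statistics

times = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

mode_value = statistics.mode(times)      # most frequent value
median_value = statistics.median(times)  # middle of the ordered data
mean_value = statistics.mean(times)      # arithmetic mean

# For these right-skewed data: mode < median < mean.
print(mode_value, median_value, mean_value)
```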

14 Measures of variability. Measures of central location fail to tell the whole story about the distribution. A question of interest still remains unanswered: How much are the observations spread out around the mean value?

15 Measures of variability. Observe two hypothetical data sets. Small variability: the average value provides a good representation of the observations in the data set. This data set is now changing to...

16 Measures of Variability. Observe two hypothetical data sets. Small variability: the average value provides a good representation of the observations in the data set. Larger variability: the same average value does not provide as good a representation of the observations in the data set as before.

17 The Range.
– The range of a set of observations is the difference between the largest and smallest observations.
– Its major advantage is the ease with which it can be computed.
– Its major shortcoming is its failure to provide information on the dispersion of the observations between the two end points.
But how do all the observations spread out between the smallest observation and the largest observation? The range cannot assist in answering this question.
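The shortcoming can be seen with two made-up data sets that share the same end points. Both lists below are hypothetical and chosen only for illustration; they are not from the presentation.

```python
def value_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


# Hypothetical data: same end points, very different interior.
clustered = [0, 5, 5, 5, 5, 5, 10]
spread_out = [0, 1, 3, 5, 7, 9, 10]

# The range is identical even though the observations in between differ.
print(value_range(clustered), value_range(spread_out))  # 10 10
```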

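18 The Variance.
– This measure reflects the dispersion of all the observations.
– The variance of a population of size N, $x_1, x_2, \dots, x_N$, whose mean is $\mu$, is defined as $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$.
– The variance of a sample of n observations $x_1, x_2, \dots, x_n$, whose mean is $\bar{x}$, is defined as $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$.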

19 Why not use the sum of deviations? Consider two small populations:
A: 8, 9, 10, 11, 12
B: 4, 7, 10, 13, 16
The mean of both populations is 10, but the measurements in B are more dispersed than those in A. A measure of dispersion should agree with this observation. Can the sum of deviations be a good measure of dispersion?
A: (8-10) + (9-10) + (10-10) + (11-10) + (12-10) = -2 - 1 + 0 + 1 + 2 = 0
B: (4-10) + (7-10) + (10-10) + (13-10) + (16-10) = -6 - 3 + 0 + 3 + 6 = 0
The sum of deviations is zero for both populations and is therefore not a good measure of dispersion.
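A short check of this argument, as a sketch: the deviations from the mean cancel for both populations, while the squared deviations do not. The helper names are illustrative.

```python
population_a = [8, 9, 10, 11, 12]
population_b = [4, 7, 10, 13, 16]


def sum_of_deviations(values):
    """Sum of (x - mean); positive and negative deviations cancel."""
    mean = sum(values) / len(values)
    return sum(x - mean for x in values)


def sum_of_squared_deviations(values):
    """Sum of (x - mean)^2; squaring prevents the cancellation."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


print(sum_of_deviations(population_a), sum_of_deviations(population_b))                  # 0.0 0.0
print(sum_of_squared_deviations(population_a), sum_of_squared_deviations(population_b))  # 10.0 90.0
```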

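20 The Variance. Let us calculate the variance of the two populations: $\sigma_A^2 = \frac{(8-10)^2 + (9-10)^2 + (10-10)^2 + (11-10)^2 + (12-10)^2}{5} = 2$ and $\sigma_B^2 = \frac{(4-10)^2 + (7-10)^2 + (10-10)^2 + (13-10)^2 + (16-10)^2}{5} = 18$. Why is the variance defined as the average squared deviation? Why not use the sum of squared deviations as a measure of variation instead? After all, the sum of squared deviations increases in magnitude when the variation of a data set increases!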

21 The Variance. Which data set has a larger dispersion? Data set A takes only the values 1 and 3 (mean 2); data set B consists of the values 1 and 5 (mean 3). Data set B is more dispersed around the mean. Let us calculate the sum of squared deviations for both data sets.

22 The Variance. $\mathrm{Sum}_A = (1-2)^2 + \dots + (1-2)^2 + (3-2)^2 + \dots + (3-2)^2 = 10$. $\mathrm{Sum}_B = (1-3)^2 + (5-3)^2 = 8$. $\mathrm{Sum}_A > \mathrm{Sum}_B$. This is inconsistent with the observation that set B is more dispersed.

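23 The Variance. However, when calculated on a “per observation” basis (variance), the data set dispersions are properly ranked. Each squared deviation in set A equals 1 and they sum to 10, so set A has N = 10 observations: $\sigma_A^2 = \mathrm{Sum}_A / N = 10/10 = 1$. Set B has N = 2 observations: $\sigma_B^2 = \mathrm{Sum}_B / N = 8/2 = 4$.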
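The comparison can be reproduced as follows. This sketch assumes set A consists of five 1s and five 3s, which is what a sum of squared deviations of 10 around a mean of 2 implies; under that assumption set A has N = 10 observations. Set B is taken as the two values 1 and 5.

```python
set_a = [1] * 5 + [3] * 5   # assumption: five 1s and five 3s (mean 2)
set_b = [1, 5]              # mean 3


def sum_of_squared_deviations(values):
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


def population_variance(values):
    """Sum of squared deviations on a per-observation basis."""
    return sum_of_squared_deviations(values) / len(values)


# The raw sums rank the sets the wrong way round ...
print(sum_of_squared_deviations(set_a), sum_of_squared_deviations(set_b))  # 10.0 8.0
# ... while the per-observation figures rank B as more dispersed.
print(population_variance(set_a), population_variance(set_b))              # 1.0 4.0
```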

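24 The Variance. Example: The following sample consists of the number of jobs six students applied for: 17, 15, 23, 7, 9, 13. Find its mean and variance. Solution: $\bar{x} = \frac{17 + 15 + 23 + 7 + 9 + 13}{6} = \frac{84}{6} = 14$ jobs. $s^2 = \frac{(17-14)^2 + (15-14)^2 + (23-14)^2 + (7-14)^2 + (9-14)^2 + (13-14)^2}{6 - 1} = \frac{166}{5} = 33.2$ jobs².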
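The arithmetic for this example can be checked directly from the sample-variance definition (divisor n - 1). The variable names are illustrative.

```python
jobs = [17, 15, 23, 7, 9, 13]   # number of jobs each of six students applied for

n = len(jobs)
sample_mean = sum(jobs) / n
squared_deviations = [(x - sample_mean) ** 2 for x in jobs]
sample_variance = sum(squared_deviations) / (n - 1)   # divide by n - 1 for a sample

print(sample_mean)       # 14.0
print(sample_variance)   # 33.2
```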

25 The Variance – Shortcut method
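The formula on this slide did not survive in the transcript. A common shortcut, assumed here to be the one intended, is $s^2 = \frac{1}{n-1}\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right]$, which avoids computing each deviation from the mean. The sketch below applies it to the job-application sample; the function name is illustrative.

```python
jobs = [17, 15, 23, 7, 9, 13]


def sample_variance_shortcut(values):
    """s^2 = [sum(x^2) - (sum x)^2 / n] / (n - 1)."""
    n = len(values)
    total = sum(values)                        # sum of the observations
    total_of_squares = sum(x * x for x in values)
    return (total_of_squares - total ** 2 / n) / (n - 1)


shortcut_result = sample_variance_shortcut(jobs)
print(round(shortcut_result, 4))  # 33.2
```

Both routes give the same value; the shortcut only needs the two running totals, the sum of x and the sum of x².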

26 Standard Deviation. The standard deviation of a set of observations is the square root of the variance.
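For instance, taking the job-application sample from the variance example (17, 15, 23, 7, 9, 13), the sample standard deviation is the square root of the sample variance. This sketch uses the standard `statistics` module.

```python
import math
import statistics

jobs = [17, 15, 23, 7, 9, 13]

sample_variance = statistics.variance(jobs)   # divisor n - 1
sample_std = math.sqrt(sample_variance)       # square root of the variance

print(round(sample_std, 2))  # about 5.76, in the same units as the data (jobs)
```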

27 Standard Deviation. Example:
– To examine the consistency of shots for a new innovative golf club, a golfer was asked to hit 150 shots, 75 with a currently used (7-iron) club, and 75 with the new club.
– The distances were recorded.
– Which club is better?

28 Standard Deviation. Example – solution: [Excel printout from the “Descriptive Statistics” sub-menu.] The innovation club is more consistent and, because the means are close, is considered the better club.

29 The Coefficient of Variation. The coefficient of variation of a set of measurements is the standard deviation divided by the mean value. This coefficient provides a proportionate measure of variation. A standard deviation of 10 may be perceived as large when the mean value is 100, but only moderately large when the mean value is 500.
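The two cases on this slide work out as follows. A minimal sketch; the function name is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a proportion of the mean."""
    return std_dev / mean


cv_small_mean = coefficient_of_variation(10, 100)   # 10% of the mean
cv_large_mean = coefficient_of_variation(10, 500)   # 2% of the mean

print(cv_small_mean, cv_large_mean)  # 0.1 0.02
```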

30 Percentiles (example from http://www.ehow.com/how_2310404_calculate-percentiles.html). Your test score, e.g. 70%, tells you how many questions you answered correctly. However, it doesn’t tell you how well you did compared to the other people who took the same test. If the percentile of your score is 75, then you scored higher than 75% of the other people who took the test.

31 Sample Percentiles and Box Plots. Percentile:
– The pth percentile of a set of measurements is the value for which p percent of the observations are less than that value and (100 - p) percent of all the observations are greater than that value.

32 Sample Percentiles. Find the 10th percentile of 6 8 3 6 2 8 1. Order the data: 1 2 3 6 6 8 8. There are 7 observations, so the position is 7 × 0.10 = 0.70; round up to 1. The first observation, 1, is the 10th percentile.
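The round-up rule used on this slide can be sketched as a function. The name `percentile_round_up` is illustrative, and other percentile conventions (for example, interpolating between neighbours) can give different answers.

```python
import math


def percentile_round_up(values, p):
    """p-th percentile: sort, compute n * p / 100, round the position up."""
    ordered = sorted(values)
    position = math.ceil(len(ordered) * p / 100)   # 1-based position
    position = max(position, 1)                    # guard for p = 0
    return ordered[position - 1]


data = [6, 8, 3, 6, 2, 8, 1]
tenth = percentile_round_up(data, 10)   # 7 * 0.10 = 0.70, rounded up to position 1
print(tenth)  # 1
```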

33 Commonly used percentiles
– First (lower) quartile, Q1 = 25th percentile
– Second (middle) quartile, Q2 = 50th percentile
– Third quartile, Q3 = 75th percentile
– Fourth quartile, Q4 = 100th percentile
– First (lower) decile = 10th percentile
– Ninth (upper) decile = 90th percentile

34 Quartiles and Variability. Quartiles can provide an idea about the shape of a histogram. [Figures: a positively skewed histogram and a negatively skewed histogram, each with Q1, Q2 and Q3 marked.]

35 Interquartile Range. Interquartile range = Q3 – Q1. A large value indicates a large spread of the observations.
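As an illustration, the interquartile range of the ten Internet times from the earlier example can be computed with the round-up percentile rule from the percentile slide. This is a sketch under that convention; software packages often use interpolation instead and may report slightly different quartiles.

```python
import math

times = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]


def percentile_round_up(values, p):
    """p-th percentile using the round-up position rule."""
    ordered = sorted(values)
    position = max(math.ceil(len(ordered) * p / 100), 1)
    return ordered[position - 1]


q1 = percentile_round_up(times, 25)   # position ceil(2.5) = 3 in the ordered data
q3 = percentile_round_up(times, 75)   # position ceil(7.5) = 8 in the ordered data
iqr = q3 - q1

print(q1, q3, iqr)  # 5 14 9
```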

36 Paired Data Sets and the Sample Correlation Coefficient. The covariance and the coefficient of correlation are used to measure the direction and strength of the linear relationship between two variables.
– Covariance: is there any pattern to the way two variables move together?
– Coefficient of correlation: how strong is the linear relationship between two variables?

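37 Covariance. Population covariance: $\mathrm{COV}(X,Y) = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$, where $\mu_x$ ($\mu_y$) is the population mean of the variable X (Y) and N is the population size. Sample covariance: $\mathrm{cov}(x,y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$, where $\bar{x}$ ($\bar{y}$) is the sample mean of the variable X (Y) and n is the sample size.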

38 Covariance. If the two variables move in the same direction (both increase or both decrease), the covariance is a large positive number. If the two variables move in opposite directions (one increases when the other one decreases), the covariance is a large negative number. If the two variables are unrelated, the covariance will be close to zero.

39 Covariance. Compare the following three sets (x̄ = 5 and ȳ = 20 in each; "product" is (x_i - x̄)(y_i - ȳ)).
Set 1:
x_i   y_i   x_i - x̄   y_i - ȳ   product
2     13    -3        -7        21
6     20     1         0         0
7     27     2         7        14
Cov(x,y) = 35/2 = 17.5
Set 2:
x_i   y_i   x_i - x̄   y_i - ȳ   product
2     27    -3         7        -21
6     20     1         0          0
7     13     2        -7        -14
Cov(x,y) = -35/2 = -17.5
Set 3:
x_i   y_i
2     20
6     27
7     13
Cov(x,y) = -3.5
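The three covariances can be reproduced from the sample definition (divisor n - 1). A minimal sketch; the function name is illustrative.

```python
def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


x = [2, 6, 7]
cov_set1 = sample_covariance(x, [13, 20, 27])   # y rises with x
cov_set2 = sample_covariance(x, [27, 20, 13])   # y falls as x rises
cov_set3 = sample_covariance(x, [20, 27, 13])   # no clear pattern

print(cov_set1, cov_set2, cov_set3)  # 17.5 -17.5 -3.5
```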

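40 The coefficient of correlation. This coefficient answers the question: How strong is the association between X and Y? It is the covariance divided by the product of the two standard deviations: $\rho = \frac{\mathrm{COV}(X,Y)}{\sigma_x \sigma_y}$ for a population and $r = \frac{\mathrm{cov}(x,y)}{s_x s_y}$ for a sample.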

41 The coefficient of correlation.
– $\rho$ or r = +1: strong positive linear relationship, COV(X,Y) > 0
– $\rho$ or r = 0: no linear relationship, COV(X,Y) = 0
– $\rho$ or r = -1: strong negative linear relationship, COV(X,Y) < 0

42 The Coefficient of Correlation. If the two variables are very strongly positively related, the coefficient value is close to +1 (strong positive linear relationship). If the two variables are very strongly negatively related, the coefficient value is close to -1 (strong negative linear relationship). No straight line relationship is indicated by a coefficient close to zero.
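To see these three cases numerically, the sketch below computes the sample coefficient of correlation, taken here as the sample covariance divided by the product of the sample standard deviations, for the three small data sets from the covariance slide. The function name is illustrative.

```python
import math


def sample_correlation(xs, ys):
    """r = cov(x, y) / (s_x * s_y), all computed with divisor n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (s_x * s_y)


x = [2, 6, 7]
r_set1 = sample_correlation(x, [13, 20, 27])   # close to +1
r_set2 = sample_correlation(x, [27, 20, 13])   # close to -1
r_set3 = sample_correlation(x, [20, 27, 13])   # much closer to 0

print(round(r_set1, 2), round(r_set2, 2), round(r_set3, 2))
```

Unlike the covariance, the result always lies between -1 and +1, so the strength of the relationship can be judged without knowing the units of X and Y.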

44 Correlation and causation. Recognize the difference between correlation and causation: just because two things occur together does not necessarily mean that one causes the other. For random processes, causation means that the occurrence of A changes the probability that B occurs.

45 Correlation and causation. The existence of a statistical relationship, no matter how strong it is, does not imply a cause-and-effect relationship between X and Y. For example, let X be the size of vocabulary and Y be the writing speed for a group of children. There will most probably be a positive relationship, but this does not imply that an increase in vocabulary causes an increase in the speed of writing. Other variables, such as age and education, will affect both X and Y. Even if there is a causal relationship between X and Y, it might be in the opposite direction, i.e. from Y to X. For example, let X be a thermometer reading and let Y be the actual temperature. Here Y will affect X.

46 Example. Dr. Leonard Eron, professor at the University of Illinois at Chicago, has conducted a longitudinal study of the long-term effects of violent television programming. In 1960, he asked 870 third grade children to name their favorite television shows. He found that children judged most violent by their peers also watched the most violent television. Dr. Eron noted, however, that it was not clear which came first: the child’s behavior or the influence of television. In follow-up interviews at ten-year intervals, Eron found that youngsters who at age eight were nonaggressive but were watching violent television were more aggressive than children who at age eight were aggressive and watched non-violent television. Eron claims that this establishes a cause-and-effect relationship between watching violent television and aggressive behavior. Can you think of any other possible causes?

47 Example - solution. It could be that the difference in aggressive behavior is due to other familial influences. Perhaps children who are permitted to watch violent programming are more likely to come from violent or abusive families, which could also lead to more aggressive behavior.

