Central tendency and dispersion of a distribution


1 Central tendency and dispersion of a distribution

2 Review Topics
– Measures of Central Tendency: Mean, Median, Mode, Quartile
– Measures of Variation: Range, Variance and Standard Deviation, Coefficient of Variation
– Shape: Symmetric, Skewed

3 Important Summary Measures (one sample)
– Central Tendency: Mean, Median, Mode, Quartile
– Variation: Range, Variance, Standard Deviation, Coefficient of Variation

4 Measures of Central Tendency: Mean, Median, Mode. Data: practice sample data on HMO premiums accompanies the original slides (the link is not preserved in this transcript).

5 Measures of Central Location (Tendency)
Usually, we focus our attention on two aspects of measures of central location:
– Measure of the central data point (the average).
– Measure of dispersion of the data about the average.
With one data point, the central location is clearly at the point itself. With two data points, the central location should fall in the middle between them (in order to reflect the location of both of them). If a third data point appears exactly in the middle of the current range, the central location should not change (because it already resides in the middle). But if the third data point appears to the left of the midrange, it should "pull" the central location to the left.

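6 § Arithmetic mean
– This is the most popular and useful measure of central location.
Mean = (sum of the measurements) / (number of measurements)
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $n$ is the sample size.
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where $N$ is the population size.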

7 Example 4.1 The mean of the sample of six measurements 7, 3, 9, -2, 4, 6 is given by $\bar{x} = \frac{7 + 3 + 9 - 2 + 4 + 6}{6} = 4.5$.
Example 4.2 Suppose the telephone bills of Example 2.1 represent a population of measurements. The population mean is $\mu = \frac{42.19 + 15.30 + \dots + 53.21}{N} = 43.59$.
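The calculation in Example 4.1 can be checked with a few lines of Python. This is a minimal sketch; the function name `arithmetic_mean` is chosen here for illustration and does not come from the slides.

```python
def arithmetic_mean(values):
    """Return the sum of the measurements divided by their number."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# The six measurements from Example 4.1.
sample = [7, 3, 9, -2, 4, 6]
sample_mean = arithmetic_mean(sample)
print(sample_mean)  # 4.5
```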

8 § The median
– The median of a set of measurements is the value that falls in the middle when the measurements are arranged in order of magnitude.
Example 4.4 Seven employee salaries were recorded (in 1000s): 28, 60, 26, 32, 30, 26, 29. Find the median salary.
Odd number of observations: first, sort the salaries: 26, 26, 28, 29, 30, 32, 60. Then, locate the value in the middle: the median is 29.
Suppose one employee's salary of $31,000 was added to the group recorded before. Find the median salary.
Even number of observations: first, sort the salaries: 26, 26, 28, 29, 30, 31, 32, 60. Then, locate the values in the middle. There are two middle values, 29 and 30, so the median is their average, 29.5.
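The sort-then-pick-the-middle procedure of Example 4.4 can be sketched as follows. The helper name `median` is an illustrative choice; Python's standard `statistics.median` gives the same results.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Salaries (in 1000s) from Example 4.4.
salaries = [28, 60, 26, 32, 30, 26, 29]
median_odd = median(salaries)            # seven observations
median_even = median(salaries + [31])    # eight observations after adding 31
print(median_odd, median_even)  # 29 29.5
```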

9 § The mode
– The mode of a set of measurements is the value that occurs most frequently.
– A set of data may have one mode (or modal class), or two or more modes.
The modal class: for large data sets, the modal class is much more relevant than a single-value mode.
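A small sketch of finding the mode, including the case of more than one mode. The function name `modes` and the letter-grade list are hypothetical and used only for illustration; the salary data is taken from Example 4.4.

```python
from collections import Counter


def modes(values):
    """Return every value that attains the highest frequency, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


salaries = [28, 60, 26, 32, 30, 26, 29]      # Example 4.4 salaries (in 1000s)
grades = ["B", "A", "C", "B", "A", "D"]      # hypothetical letter grades

salary_modes = modes(salaries)
grade_modes = modes(grades)
print(salary_modes)  # [26]
print(grade_modes)   # ['A', 'B']
```

Because the function only counts occurrences, it works for qualitative data such as letter grades as well as for numbers.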

10 Example 4.6 A professor of statistics wants to report the results of a midterm exam taken by 100 students. The data appear in file XM04-06. Find the mean, median, and mode, and describe the information they provide. [Excel results]
The mean provides information about the overall performance level of the class. The median indicates that half of the class received a grade below 81% and half received a grade above 81%. The mode must be used when data are qualitative. If marks are classified by letter grade, the frequency of each grade can be calculated; the mode then becomes a logical measure to compute.

11 Relationship among Mean, Median, and Mode
If a distribution is symmetrical, the mean, median and mode coincide. If a distribution is not symmetrical, and is skewed to the left or to the right, the three measures differ. In a positively skewed distribution ("skewed to the right"), the usual ordering is mode < median < mean.

12 Relationship among Mean, Median, and Mode (continued)
In a negatively skewed distribution ("skewed to the left"), the usual ordering is reversed: mean < median < mode.
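As an illustration of how skew separates the three measures, the salary data from Example 4.4 contains one large value (60) that stretches the right tail. The sketch below uses Python's standard `statistics` module; treating this small data set as "positively skewed" is an interpretation made here, not a statement from the slides.

```python
import statistics

# Salaries (in 1000s) from Example 4.4; the value 60 stretches the right tail.
salaries = [28, 60, 26, 32, 30, 26, 29]

salary_mean = statistics.mean(salaries)      # pulled toward the large value
salary_median = statistics.median(salaries)  # unaffected by how large 60 is
salary_mode = statistics.mode(salaries)      # most frequent value

print(salary_mode, salary_median, salary_mean)  # 26 29 33
```

The mean lies to the right of the median, which lies to the right of the mode.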

13 Measures of Variation
– Range
– Interquartile Range
– Variance: Population Variance, Sample Variance
– Standard Deviation: Population Standard Deviation, Sample Standard Deviation
– Coefficient of Variation

14 Measures of variability (looking beyond the average)
Measures of central location fail to tell the whole story about the distribution. A question of interest still remains unanswered: How typical is the average value of all the measurements in the data set? Or: how spread out are the measurements about the average value?

15 Observe two hypothetical data sets.
Low variability data set: the average value provides a good representation of the values in the data set.
High variability data set: the same average value does not provide as good a representation of the values in the data set as before.

16 § The range
– The range of a set of measurements is the difference between the largest and smallest measurements.
– Its major advantage is the ease with which it can be computed.
– Its major shortcoming is its failure to provide information on the dispersion of the values between the two end points.
But how do all the measurements between the smallest and the largest spread out? The range cannot assist in answering this question.
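The shortcoming of the range can be seen in a short sketch. The salary data is from Example 4.4; the second list, `clustered`, is a hypothetical data set made up here so that both share the same end points.

```python
def value_range(values):
    """Difference between the largest and the smallest measurement."""
    return max(values) - min(values)


salaries = [28, 60, 26, 32, 30, 26, 29]     # Example 4.4 salaries (in 1000s)
clustered = [26, 26, 26, 26, 26, 26, 60]    # hypothetical: same end points, different spread

salaries_range = value_range(salaries)
clustered_range = value_range(clustered)
print(salaries_range, clustered_range)  # 34 34
```

Both data sets have the same range even though the values between the end points are distributed very differently.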

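17 § The variance
– This measure of dispersion reflects the values of all the measurements.
– The variance of a population of $N$ measurements $x_1, x_2, \dots, x_N$ having a mean $\mu$ is defined as $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$.
– The variance of a sample of $n$ measurements $x_1, x_2, \dots, x_n$ having a mean $\bar{x}$ is defined as $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$.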

18 Consider two small populations:
Population A: 8, 9, 10, 11, 12
Population B: 4, 7, 10, 13, 16
The mean of both populations is 10, but the measurements in B are much more dispersed than those in A. Thus, a measure of dispersion is needed that agrees with this observation. Let us start by calculating the sum of deviations.
A: 8-10 = -2, 9-10 = -1, 10-10 = 0, 11-10 = +1, 12-10 = +2; Sum = 0
B: 4-10 = -6, 7-10 = -3, 10-10 = 0, 13-10 = +3, 16-10 = +6; Sum = 0
The sum of deviations is zero in both cases; therefore, another measure is needed.

19 The sum of squared deviations is used in calculating the variance. See example next.

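20 Let us calculate the variance of the two populations:
$\sigma_A^2 = \frac{(8-10)^2 + (9-10)^2 + (10-10)^2 + (11-10)^2 + (12-10)^2}{5} = 2$
$\sigma_B^2 = \frac{(4-10)^2 + (7-10)^2 + (10-10)^2 + (13-10)^2 + (16-10)^2}{5} = 18$
Why is the variance defined as the average squared deviation? Why not use the sum of squared deviations as a measure of dispersion instead? After all, the sum of squared deviations increases in magnitude when the dispersion of a data set increases.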
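The following sketch works through Populations A and B: the deviations from the mean sum to zero in both cases, while the average squared deviation separates the two. The function names are illustrative choices made here.

```python
def sum_of_deviations(values):
    """Sum of (x - mean); always zero up to rounding."""
    mu = sum(values) / len(values)
    return sum(x - mu for x in values)


def population_variance(values):
    """Average squared deviation from the population mean (divides by N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


population_a = [8, 9, 10, 11, 12]
population_b = [4, 7, 10, 13, 16]

dev_a = sum_of_deviations(population_a)
dev_b = sum_of_deviations(population_b)
var_a = population_variance(population_a)
var_b = population_variance(population_b)

print(dev_a, dev_b)  # 0.0 0.0
print(var_a, var_b)  # 2.0 18.0
```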

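21 – Example 4.8 Find the mean and the variance of the following sample of measurements (in years): 3.4, 2.5, 4.1, 1.2, 2.8, 3.7
– Solution: the mean is $\bar{x} = \frac{17.7}{6} = 2.95$ years. Using a shortcut formula for the variance,
$s^2 = \frac{1}{n-1}\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right] = \frac{1}{5}\left[(3.4^2 + 2.5^2 + \dots + 3.7^2) - \frac{(17.7)^2}{6}\right] = 1.075$ (years)$^2$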
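The shortcut formula of Example 4.8 can be compared with the definition of the sample variance in a few lines. This is a sketch; both function names are chosen here for illustration.

```python
import math


def sample_variance_definition(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """Shortcut: (sum of squares - (sum)^2 / n) / (n - 1)."""
    n = len(values)
    total = sum(values)
    total_of_squares = sum(x * x for x in values)
    return (total_of_squares - total ** 2 / n) / (n - 1)


years = [3.4, 2.5, 4.1, 1.2, 2.8, 3.7]  # sample from Example 4.8
by_definition = sample_variance_definition(years)
by_shortcut = sample_variance_shortcut(years)

# Both routes agree with the value 1.075 (years squared) up to floating-point rounding.
print(round(by_definition, 3), round(by_shortcut, 3))
print(math.isclose(by_definition, by_shortcut))
```

The shortcut avoids computing each deviation, but it subtracts two large, nearly equal numbers, so for data with a large mean and small spread the definition-based version is usually the numerically safer choice.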

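22 Sample Standard Deviation
For the sample, use $n - 1$ in the denominator: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Data: 10 12 14 15 17 18 18 24; n = 8; Mean = 16
$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + \dots + (24-16)^2}{7}} = \sqrt{\frac{130}{7}} \approx 4.3095$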

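23 Interpreting Standard Deviation
The standard deviation can be used to
– compare the variability of several distributions
– make a statement about the general shape of a distribution.
The empirical rule: if a sample of measurements has a mound-shaped distribution, the interval
– $\bar{x} \pm s$ contains approximately 68% of the measurements,
– $\bar{x} \pm 2s$ contains approximately 95% of the measurements,
– $\bar{x} \pm 3s$ contains virtually all of the measurements.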
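The empirical rule for mound-shaped data (roughly 68%, 95% and nearly 100% of measurements within one, two and three standard deviations of the mean) can be checked on simulated data. The sketch below is an illustration only: the normal distribution, its parameters (mean 50, standard deviation 10), the seed and the sample size are all assumptions made here, not values from the slides.

```python
import random
import statistics

# Simulated mound-shaped sample; distribution, parameters and seed are assumptions.
rng = random.Random(0)
data = [rng.gauss(50, 10) for _ in range(10_000)]

mean = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)


def share_within(k):
    """Fraction of measurements inside the interval mean ± k * s."""
    low, high = mean - k * s, mean + k * s
    return sum(low <= x <= high for x in data) / len(data)


shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```

For data that is clearly skewed or has several peaks, these percentages should not be expected to hold.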

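24 Comparing Standard Deviations
Data: 10 12 14 15 17 18 18 24; N = 8; Mean = 16
Treated as a sample: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095$
Treated as a population: $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = \sqrt{\frac{130}{8}} \approx 4.0311$
The value of the standard deviation is larger when the data are treated as a sample.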
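The comparison between the two denominators can be reproduced with Python's standard `statistics` module, where `stdev` divides by n - 1 and `pstdev` divides by N. This is a sketch that recomputes both figures directly from the eight data values listed on the slide.

```python
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]

data_mean = statistics.mean(data)
sample_sd = statistics.stdev(data)       # denominator n - 1
population_sd = statistics.pstdev(data)  # denominator N

print(data_mean)
print(sample_sd, population_sd)
print(sample_sd > population_sd)  # the sample figure is the larger of the two
```

Dividing the same sum of squared deviations by the smaller number n - 1 always gives the larger result, which is why the sample standard deviation exceeds the population one for any data set with at least two distinct values.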

25 Comparing Standard Deviations (three dot plots on a common axis from 11 to 21)
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57

26 Measures of Association
Two numerical measures are presented for describing the linear relationship between two variables depicted in a scatter diagram.
– Covariance: is there any pattern to the way two variables move together?
– Correlation coefficient: how strong is the linear relationship between two variables?

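27 § The covariance
Population covariance: $\mathrm{COV}(X,Y) = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$
Sample covariance: $\mathrm{cov}(X,Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
$\mu_x$ ($\mu_y$) is the population mean of the variable X (Y). $N$ is the population size. $n$ is the sample size.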

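28 § The coefficient of correlation
Population: $\rho = \frac{\mathrm{COV}(X,Y)}{\sigma_x \sigma_y}$; sample: $r = \frac{\mathrm{cov}(X,Y)}{s_x s_y}$
– This coefficient answers the question: how strong is the association between X and Y?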
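A minimal sketch of the two measures of association, assuming the usual sample definitions: the covariance is the sum of products of deviations divided by n - 1, and the correlation coefficient is the covariance divided by the product of the two sample standard deviations. The function names and the paired data are hypothetical and serve only as an illustration.

```python
import math
import statistics


def sample_covariance(xs, ys):
    """Sum of (x - mean_x) * (y - mean_y), divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    products = ((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


def correlation(xs, ys):
    """Sample covariance scaled by both sample standard deviations."""
    return sample_covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))


# Hypothetical paired data: y_up rises exactly with x, y_down falls exactly with x.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]
y_down = [10, 8, 6, 4, 2]

cov_up = sample_covariance(x, y_up)
r_up = correlation(x, y_up)
r_down = correlation(x, y_down)

print(cov_up)                              # 5
print(math.isclose(r_up, 1.0))             # True: perfect positive linear relationship
print(math.isclose(r_down, -1.0))          # True: perfect negative linear relationship
```

The covariance depends on the units of X and Y, while the correlation coefficient is unit-free and always lies between -1 and +1, which is what makes it usable as a measure of strength.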

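29 Interpreting the covariance and the coefficient of correlation
– $\rho$ or $r$ = +1, COV(X,Y) > 0: strong positive linear relationship
– $\rho$ or $r$ = 0, COV(X,Y) = 0: no linear relationship
– $\rho$ or $r$ = -1, COV(X,Y) < 0: strong negative linear relationship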

