“Teach A Level Maths” Statistics 1


1 The Sample Variance © Christine Crisp

2 Statistics 1 MEI/OCR "Certain images and/or photos on this presentation are the copyrighted property of JupiterImages and are being used with permission under license. These images and/or photos may not be copied or downloaded without permission from JupiterImages"

3 Can you find the medians and means for the following 3 data sets?
For Set A (1, 2, 3, 4, 5, 6, 7, 8, 9), and likewise for Sets B and C, the mean is 5 and the median is 5.
Although the medians and means are the same, the data sets are not really alike: the spread, or variability, of the numbers is quite different. How can we measure the spread within the data sets?
ANS: The range and the inter-quartile range both measure spread, but neither uses all the data items.

4 If you had to invent a method of measuring spread that used all the data items, what could you do? One thing we could do is find out how far each item is from the mean and add up these differences, e.g. for Set A, where $\bar{x} = 5$:
x: 1 2 3 4 5 6 7 8 9
x − x̄: −4 −3 −2 −1 0 1 2 3 4
$\sum (x - \bar{x}) = 0$
Data sets B and C give the same result: the negative and positive values have cancelled each other out.
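The cancellation is not special to these three data sets. Because $\bar{x} = \frac{\sum x}{n}$, the deviations from the mean always add up to zero:

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = \sum_{i=1}^{n} x_i - n \cdot \frac{1}{n}\sum_{i=1}^{n} x_i
  = 0
```

So the plain sum of the deviations cannot measure spread for any data set.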

5 To avoid the effect of the negative values, we can either ignore the negative signs or square each difference (since the squares will all be positive). Squaring is more convenient for developing theory, so, e.g. for Set A:
x: 1 2 3 4 5 6 7 8 9
x − x̄: −4 −3 −2 −1 0 1 2 3 4
(x − x̄)²: 16 9 4 1 0 1 4 9 16
$\sum (x - \bar{x})^2 = 60$
Let's do this calculation for all 3 data sets.

6 Working out $\sum (x - \bar{x})^2$ in the same way for Sets B and C (the mean of each set is 5) lets us compare the three sets. The larger value for Set B shows greater variability; Set C has the least variability.
Can you see a snag with this measurement? ANS: The value increases as we include more data, so we could not compare data sets with different numbers of items. To allow for this, we need to take n, the number of items, into account.

7 There are 2 formulae that can be used:
the mean square deviation, $\text{msd} = \frac{\sum (x - \bar{x})^2}{n}$, or
the sample variance, $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$.
In many books you will find the word variance used for the first of these formulae, and you may have used it at GCSE. However, our data is nearly always a sample from a large, unknown set of data (the population), and we take samples to find out about the population. On average, the first formula underestimates the variance of the population, so it does not give the best estimate and is not used.

8 So, there are 2 quantities and their square roots that we need to be clear about:
the mean square deviation, msd, and the root mean square deviation, $\text{rmsd} = \sqrt{\text{msd}}$;
also the sample variance, $s^2$, and the sample standard deviation, $s = \sqrt{s^2}$.

9 This all seems very complicated, but help is at hand: both quantities, rmsd and s, are given by your calculator.
e.g. Find the root mean square deviation, rmsd, and the sample standard deviation, s, for the following data:
x: 7 9 14
Use the Statistics function on your calculator and enter the data, then select the list of calculations. You will be able to find both rmsd and s (ignore the calculator notation). The rmsd is smaller than s because, for the rmsd, we divide by a larger number (n rather than n − 1). Correct to 3 s.f., we have rmsd = 2·94 and s = 3·61.

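10 So, for the data x: 7 9 14, we have rmsd = 2·943… and s = 3·605…. Squaring these gives
msd = 8·67 (3 s.f.) (mean square deviation) and s² = 13 (sample variance).
The part of the formula $\sum (x - \bar{x})^2$ is in your formulae booklet (see correlation and regression), labelled $S_{xx}$. An expanded form of the expression is also given: $S_{xx} = \sum x^2 - n\bar{x}^2 = \sum x^2 - \frac{(\sum x)^2}{n}$. All you have to do is divide by the correct quantity, n or n − 1.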

11 e.g.1 For the following sample data, find
(a) the root mean square deviation, rmsd,
(b) the mean square deviation, msd,
(c) the sample standard deviation, s, and
(d) the sample variance, s².
x: 12 15 14 9
Answer: Using the calculator functions,
(a) rmsd = 2·29 (3 s.f.)
(b) msd = 5·25
(c) s = 2·65 (3 s.f.)
(d) s² = 7

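12 e.g.2 The following summary data are given for a sample of size 5: $\sum x = \ldots$, $\sum x^2 = \ldots$. Find
(a) the mean square deviation, msd,
(b) the root mean square deviation, rmsd,
(c) the sample variance, s², and
(d) the sample standard deviation, s.
Solution: Using the formulae book, $S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$ with n = 5, so
(a) $\text{msd} = \frac{S_{xx}}{5}$
(b) $\text{rmsd} = \sqrt{\text{msd}}$
(c) $s^2 = \frac{S_{xx}}{4}$
(d) $s = \sqrt{s^2}$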

13 SUMMARY The mean square deviation, msd, and the sample variance, s², both measure the spread or variability in the data.
If we have raw data, we use the statistical functions on the calculator to find the rmsd or the sample standard deviation, s. The sample standard deviation is the larger of these quantities.
To find the msd or the sample variance, we square the relevant quantity given by the calculator: msd = (rmsd)², sample variance = s².
For summary data, we use the formulae book, choosing the appropriate form: $S_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - n\bar{x}^2 = \sum x^2 - \frac{(\sum x)^2}{n}$.
Then we divide $S_{xx}$ by n for the msd or by (n − 1) for s².

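14 Frequency Data The formula for the variance can easily be adapted to find the variance of frequency data:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ becomes $s^2 = \frac{\sum f(x - \bar{x})^2}{n - 1} = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}$, where $n = \sum f$.
We only use the formulae if we are given summary data. With raw data, we enter the data into the calculator and use the statistical functions to get the answers directly.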

15 e.g.1 Find the mean and sample standard deviation of the following data:
x: 1 2 5 10
Frequency, f: 3 5 8 4
Solution: Using the calculator functions, the mean, x̄ = 4·65, and the sample standard deviation, s = 3·17 (3 s.f.).
Although we don't need the formula for this question, let's check we have the correct value by using the formula.

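16 e.g.1 (continued) From the table, n = Σf = 20, Σfx = 93 and Σfx² = 623.
Solution: $S_{xx} = \sum fx^2 - \frac{(\sum fx)^2}{n}$ = 623 − 93²/20 = 190·55.
So, s² = 190·55 ÷ 19 = 10·03 (4 s.f.) and s = 3·17 (3 s.f.), which agrees with the calculator.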

17 e.g.2 Find the sample standard deviation of the following lengths:
Length (cm): 1-9 10-14 15-19 20-29
Frequency, f: 2 7 12 9
Solution: We need the class mid-values.

18 e.g.2 (continued) The class mid-values, x, are:
Length (cm): 1-9 10-14 15-19 20-29
x: 5 12 17 24·5
Frequency, f: 2 7 12 9
We can now enter the values of x and f on our calculators. Standard deviation, s = 5·77 (3 s.f.).

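19 e.g.3 Find the mean and sample variance of 20 values of x given the following: $\sum x = \ldots$, $\sum x^2 = \ldots$
Solution: Since we only have summary data, we must use the formulae:
sample mean, $\bar{x} = \frac{\sum x}{n}$; sample variance, $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$, with n = 20.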

20 SUMMARY To find the root mean square deviation, rmsd, or the sample standard deviation, s, using the calculator functions:
the values of x (and f) are entered and checked;
the table of calculations gives both values;
the larger value is the sample standard deviation, s, and this is the value most often used by statisticians;
the variance is the square of the standard deviation.

21 Exercise Find the mean, sample standard deviation and sample variance for each of the following samples, using calculator functions where appropriate.
1. x: 1 2 3 4 5
   f: 7 9 14 12 8
2. Time (mins): 1-5 6-10 11-15 16-20 21-25
   f: 7 9 14 12 8
3. 10 observations where $\sum x = \ldots$ and $\sum x^2 = \ldots$

22 1. x: 1 2 3 4 5
   f: 7 9 14 12 8
Answer: mean, x̄ = 3·1; standard deviation, s = 1·28 (3 s.f.); variance, s² = 1·64 (3 s.f.).
N.B. To find s², we need to use the full calculator value for s, not the answer to 3 s.f.
2. Time (mins): 1-5 6-10 11-15 16-20 21-25
   x (mid-value): 3 8 13 18 23
   f: 7 9 14 12 8
Answer: mean, x̄ = 13·5; standard deviation, s = 6·41 (3 s.f.); variance, s² = 41·1 (3 s.f.).

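23 3. 10 observations where $\sum x = \ldots$ and $\sum x^2 = \ldots$
Solution: mean, $\bar{x} = \frac{\sum x}{10}$; variance, $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{10}}{9}$; standard deviation, $s = \sqrt{s^2}$.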

24 Outliers We've already seen that an outlier is a data item that lies well away from the other data. It may be a genuine observation or an error in the data.
e.g.1 Consider the following data: 10 12 14 17 19 21 81
With this data set, we would immediately suspect an error: the value 81 was probably meant to be 18, with its digits reversed. If so, the error would have a large effect on the mean and standard deviation, although the median would not be affected and there would be little effect on the IQR. The presence of possible outliers is an argument in favour of using the median and IQR as measures of location and spread.

25 In an earlier section, we met a method of identifying outliers as values more than 1·5 × IQR above the upper quartile or below the lower quartile. A second method is to identify points that are more than 2 standard deviations from the mean.
e.g.2 Consider the following sample: 10 12 14 17 18 19 21 22 24 33
The sample mean and sample standard deviation are: mean, x̄ = 19; standard deviation, s = 6·62 (3 s.f.).
So x̄ + 2s = 32·2 and x̄ − 2s = 5·77 (3 s.f.).
The point 33 is more than 2 standard deviations above the mean so, using this measure, it is an outlier.



Download ppt "“Teach A Level Maths” Statistics 1"

Similar presentations


Ads by Google