Published by Dina Cooper. Modified over 10 years ago.
1
Torture numbers, and they'll confess to anything. ~Gregg Easterbrook
98% of all statistics are made up. ~Author Unknown
Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital. ~Aaron Levenstein
Statistics can be made to prove anything - even the truth. ~Author Unknown
Lottery: A tax on people who are bad at math. ~Author Unknown
He uses statistics as a drunken man uses lampposts - for support rather than for illumination. ~Andrew Lang
The theory of probabilities is at bottom nothing but common sense reduced to calculus. ~Laplace, Théorie analytique des probabilités, 1820
I could prove God statistically. Take the human body alone - the chances that all the functions of an individual would just happen is a statistical monstrosity. ~George Gallup
Statistics are just a way for the mathematician to evangelize his faith. ~Hunter Brinkmeier
"There are three kinds of lies: lies, damned lies, and statistics." ~Benjamin Disraeli
Statistics – What is it?
2
Statistics is the science of using mathematical tools to interpret data.
3
Lesson Objective
Understand the different ways of describing data
Understand the importance of different sampling techniques when collecting data
4
The Different Ways of Describing Data Discrete data Continuous data Categorical data Numerical data Qualitative data Quantitative data
5
The Different Ways of Describing Data
Match each description to a type of data:
Data that falls into different labelled groups. If the labels are numerical then they have no numerical worth, so calculating a mean is meaningless.
Data that is digital and has specific values with gaps in between. A slight improvement in the accuracy of the measuring device does not alter the data.
Data that is analogue and takes a range of values. A slight improvement in the accuracy of the measuring device alters the data collected.
Data that is based on the size of numbers, where the size of the numbers has some meaning.
Data that has been collected based on some quality or categorization that in some cases may be 'informal' or may use relatively ill-defined characteristics such as warmth and flavour; data that can be observed but not measured.
Data that has been collected by using a measuring scale; data measured or identified on a numerical scale.
Types: Discrete data, Continuous data, Categorical data, Numerical data, Qualitative data, Quantitative data
6
Give 3 examples of each type of data: Discrete data Continuous data Categorical data Numerical data Qualitative data Quantitative data
7
The Different Ways of Describing Data
Categorical data: Eg Types of Pet, House Number, Colour
Discrete data: Eg Shoe Size, Dice score, Type of Pet
Continuous data: Eg Time to run a mile, length of a hair
Numerical data: Eg Score on a dice, Weight of a lemon
Qualitative data: Eg I feel happy, The weather is good today
Quantitative data: Eg The score obtained in a test, the height of a tree
8
Decide whether each of the following sets of data is categorical or numerical and, if numerical, whether it is discrete or continuous.
1) Cards drawn from a set of playing cards: {2 of diamonds, ace of spades, 3 of hearts, etc…}
2) Number of aces in a hand of 13 cards: {1, 2, 3, 4}
3) Time in seconds for 100 metre sprint: {10.05, 12.31, 11.20, 10.67, 11.56, …etc}
4) Fraction of coin tosses which were Heads after 1, 2, 3, … tosses for the following sequence: H T H T T T H H … {1, ½, 2/3, ½, 2/5, 1/3, 3/7, ½, …}
5) Number of spectators at a football match: {23 456, 40 132, 28 320, 18 214, …etc}
6) Day of week when people were born: {Wednesday, Monday, Sunday, Sunday, Saturday, etc…}
7) Times in seconds between 'blips' of a Geiger counter in a physics experiment: {0.23, 1.23, 3.03, 0.21, 4.51, …etc}
8) Percentages gained by students for a test out of 60: {20, 78.33, 80, 75, 53.33, …etc}
9) Number of weeds in a 1 m by 1 m square in a biology experiment: {2, 8, 12, 3, 5, 8, …}
9
Solution
1 and 6 are categorical data; all the others are numerical.
2 - discrete
3 - continuous
4 - discrete, as the possible fractions can be listed
5 - discrete
7 - continuous
8 - discrete, as there are only 61 possible percentage scores (one for each whole mark from 0 to 60)
9 - discrete, as there must be a whole number of weeds
10
Different Sampling Techniques
There are many different ways to generate a sample for data collection. Four of the most common are:
Random Sampling
Systematic Sampling
Stratified Sampling
Convenience Sampling
Look at the cards on the next slide and decide which sampling technique is being described. Think of an advantage and a disadvantage for the technique described.
11
A bag contains 100 names. It is shaken and 30 names are drawn from the bag without looking.
A pollster stands in Huntingdon market square and asks the first 30 people that will listen to her their opinions on a market revamp.
In a survey to assess opinions about Year 10 uniform, a school list is printed and every 10th pupil on the list is selected.
At a local club it is known that ¾ of the membership is female. A sample of 21 females and 7 males is drawn by randomly picking names from a hat.
To find out opinions about a web site, you ask the first 30 people to visit the site to complete a questionnaire using their browser.
To select a sample of 6 people from a class of 30 to do a maths test, the class are lined up in height order and every 5th pupil is selected.
In a class of 20 pupils, each pupil is assigned a number and 4 members are selected for a competition by using the random number generator on a calculator.
A secondary school has 3 Key Stages with pupils split between them in the ratio 3:2:3. To survey opinions about the school canteen, they interview 30 students from KS3, 20 from KS4 and 30 from KS5.
To investigate the health of whales, a marine biology charity decides to estimate the length of whales in the South Atlantic by measuring the first 10 whales they find.
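Three of the four techniques can be sketched in a few lines of Python (convenience sampling is simply "take whoever turns up first", so it is left out). This is only an illustration: the function names, the pupil labels and the club size of 100 members are assumptions made for the example, not part of the slides.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Random sampling: every member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, step, start=0):
    """Systematic sampling: take every `step`-th member of an ordered list."""
    return population[start::step]

def stratified_sample(strata, k, seed=None):
    """Stratified sampling: sample each group in proportion to its size.

    `strata` maps a group name to a list of members. Rounding means the
    total may differ slightly from k when the proportions are awkward.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    return {
        name: rng.sample(members, round(k * len(members) / total))
        for name, members in strata.items()
    }

# Class of 30 lined up in order, every 5th pupil selected (hypothetical labels).
pupils = [f"pupil{i}" for i in range(1, 31)]
every_fifth = systematic_sample(pupils, step=5, start=4)

# Club where 3/4 of members are female; a size of 100 is assumed here.
club = {
    "female": [f"F{i}" for i in range(75)],
    "male": [f"M{i}" for i in range(25)],
}
club_sample = stratified_sample(club, k=28, seed=1)
```

With these assumed inputs the systematic sample contains 6 pupils and the stratified sample contains 21 females and 7 males, matching the proportions on the cards.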
12
Lesson Objective Understand the three key things required to analyse data
13
In an experiment, pupils were selected randomly from their maths lessons and asked to estimate the area of a triangle and a rectangle. The area of both shapes was 15 cm². The results are shown below:
age, gender, Rec:15, Tr:15
11f1211 f1050 11m1510 11f1516 11f1864 11f305 11m1625 11f1815 11f16 11m34.5 11f1520 11m812 11m89 f1413 11f1511
age, gender, Rec:15, Tr:15
17f138 17m1416 17f1518 17m1220 17f1312 17f1612 17m1614 17m1410 17f1512 17f1213 17f1513 17f1020 18m1330 18f1415 18m 12 19f15 19m1815
Analyse this data.
14
In an experiment, pupils were selected randomly from their maths lessons and asked to estimate the area of a triangle and a rectangle. The area of both shapes was 15 cm². (Results table as on the previous slide.)
What things could we investigate?
15
Some nuggets of wisdom:
1) "This shows that the boys had a greater spread of data, meaning that the girls were more accurate": so spread implies accuracy?
2) "I predict that the girls will be more accurate than the boys at estimating the area as there are more of them and so a greater chance that more will correctly estimate the area": so the more people you have guessing, the more accurate they will be?
3) "I predict that the boys will be better at estimating as there are fewer, meaning that there is less chance for anomalous results": so you get the best results by having a small sample size?
Mode: generally useless for this exercise.
Calculating how many got it exactly right is generally useless as the data is continuous. The fact that some people guessed it correctly has more to do with Psychology than good estimating skills.
Averaging averages to get an all-embracing average is NEVER a good idea:
Data set 1 Data set 2 1 and 8 6
16
Things to consider:
1) Is what they have tried to analyse clearly stated? Is there a hypothesis or some alternate statement explaining what they are trying to achieve? (1 mark)
2) Have they attempted to find an average? Is it the most appropriate average for the task? Is the average calculated properly? (1 mark relevant average, 1 mark accuracy)
3) Have they attempted to look at the consistency of the data? Have they used an appropriate method to measure consistency? Is their measure of consistency (range, IQR) calculated properly? (1 mark relevant measure, 1 mark accuracy)
4) Have they drawn a graph or chart to help show the distribution of the data? (1 mark relevant graph, 1 mark accuracy)
5) Have they written a final comment that refers to their initial statement/hypothesis and that attempts to provide a conclusion? Does the final comment agree with their actual maths? Have they referred to/tied their maths to the conclusion? (Eg the mean of …. for boys was greater than the mean for girls ….. therefore …) Does the conclusion comment on both consistency and averages? Is there anything in the conclusion to suggest deeper analysis? Is there anything that makes you go "that's clever, I like that!"? (3 marks – you judge!)
17
When we are analysing numerical data we are interested in 3 things:
1) The Location (Size) of the data
2) The Variation (Spread) of the data
3) The Shape (Distribution) of the data
18
1) The Location (Size) of the data
We use averages for this purpose: Mean, Mode, Median, Mid Range
19
2) The Variation (Spread) of the data
Range, Inter-quartile Range, Standard Deviation/Root Mean Squared Deviation
20
3) The Shape (Distribution) of the data
We use graphs for this purpose: Stem and Leaf Diagrams, Box and Whisker Plots, Bar Charts, Histograms
21
Lesson Objective Revise basic graph types and their uses Focus on drawing and interpreting histograms
22
This data set is the heights of a group of 38 'A' level students.
1) How tall is the shortest person in the sample?
2) How many girls are in the sample?
3) What is the range of the boys' heights?
4) What is the median height of the girls?
5) What is the inter-quartile range of the boys' heights?
[Figure: heights data for GIRLS and BOYS]
23
The Pie Charts show how Year 10 and 11 students travel to school. From the Pie Chart:
a) Can you tell if more Boys or Girls walk to school?
b) If the angle for walking in the girls' section is 18 degrees and represents 10 pupils, how many girls were surveyed?
24
This histogram illustrates the time students in a form group take to get to school in the morning. a)Find the number of students in the class. b)Estimate the probability that a randomly chosen pupil takes between 10 and 20 minutes to get to school.
25
Question 1
The table below shows the heights, to the nearest centimetre, of a group of students.
height (cm)   frequency
110-119       2
120-129       4
130-134       3
135-139       5
140-149       6
150-159       5
160-179       5
180-189       1
a) Draw a histogram for this data.
b) Use your histogram to estimate the number of students taller than 153cm.
c) Estimate the number of students between 127 and 143 cm tall.
26
The class width of the first bar would appear to be 9, but it is not. Because the heights are measured to the nearest centimetre, the first class embraces all heights between 109.5cm and 119.5cm. This is a class width of 10, and also involves labelling 109.5, 119.5 etc. on the horizontal axis of the histogram.
Adding the frequency density row to the table...
height (cm)   frequency   frequency density
110-119       2           0.2
120-129       4           0.4
130-134       3           0.6
135-139       5           1
140-149       6           0.6
150-159       5           0.5
160-179       5           0.25
180-189       1           0.1
b) To find how many students are above 153cm in height, we would add the frequencies of the last two bars to the correct proportion of the previous bar. So there are approximately 9 students above 153 cm.
c) The number of students between 127 and 143 cm tall is given by …
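The "correct proportion of a bar" idea can be sketched in Python as follows. The class boundaries (109.5, 119.5, ...) follow the nearest-centimetre reasoning on the slide, and the frequencies are read from the table as 2, 4, 3, 5, 6, 5, 5, 1. The function names are made up for this illustration, and the estimate assumes heights are spread evenly within each class.

```python
# Each class is (lower boundary, upper boundary, frequency).
classes = [
    (109.5, 119.5, 2),
    (119.5, 129.5, 4),
    (129.5, 134.5, 3),
    (134.5, 139.5, 5),
    (139.5, 149.5, 6),
    (149.5, 159.5, 5),
    (159.5, 179.5, 5),
    (179.5, 189.5, 1),
]

def frequency_densities(classes):
    """Frequency density = frequency / class width (the bar height)."""
    return [f / (hi - lo) for lo, hi, f in classes]

def estimate_count(classes, a, b):
    """Estimate how many values lie between a and b.

    Adds, for every bar, the fraction of its area that falls inside [a, b].
    """
    total = 0.0
    for lo, hi, f in classes:
        overlap = max(0.0, min(hi, b) - max(lo, a))
        total += f * overlap / (hi - lo)
    return total

densities = frequency_densities(classes)
above_153 = estimate_count(classes, 153, 189.5)    # part (b)
between_127_143 = estimate_count(classes, 127, 143)  # part (c)
```

The value computed for part (b) rounds to the "approximately 9 students" quoted on the slide; part (c) uses the same area argument with two partial bars.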
27
2) Complete the table and histogram below.
time (minutes)   frequency
0-15             90
15-20            40
20-25
25-35
28
time (minutes)   frequency   frequency density
0-15             90          6
15-20            40          8
20-25            80          16
25-35            100         10
29
Chart type        Most suitable data type(s) (discrete or continuous, numerical or categorical)   Advantages   Disadvantages
Bar Chart
Pie Chart
Stem and Leaf
Box and Whisker
Histogram
30
Bar Chart
Most suitable data type: Categorical, Discrete
Advantages: Easy to see how many are in each category. Shows shape well.
Disadvantages: Can't see proportions so easily.
Pie Chart
Most suitable data type: Categorical, Discrete
Advantages: Shows proportions clearly.
Disadvantages: Can't see how many are in each category. Not good if there are too many categories.
Stem and Leaf
Most suitable data type: Numerical, small data sets, continuous or discrete
Advantages: Keeps the raw data. Shape of data clear. Ordered data helps with medians etc.
Disadvantages: Not good for large data sets.
Box and Whisker
Most suitable data type: Numerical, continuous data
Advantages: Good for showing/comparing the spread of data.
Disadvantages: Loses raw data.
Histogram
Most suitable data type: Numerical, continuous data
Advantages: Good for showing the shape of the data and the proportions.
Disadvantages: Can't read actual frequencies for the groups easily.
31
Lesson Objective Be able to calculate measures of Location/Averages Understand summation notation for the mean What is an average and why do we have more than one way of calculating them?
32
“Say you were standing with one foot in the oven and one foot in an ice bucket. According to the percentage people, you should be perfectly comfortable. ” ~Bobby Bragan, 1963 “The average human has one breast and one testicle.” ~Des McHale “I abhor averages. I like the individual case. A man may have six meals one day and none the next, making an average of three meals per day, but that is not a good way to live.” ~Louis D. Brandeis These quotes might help you consider the answer to this question:
33
Averages for raw/untabulated data
The data shows the number of chocolates gratefully provided to a particular maths teacher from his sixth form classes over a 3 week period.
Find the mean, mode, median and mid-range of the number of gifts received:
1, 2, 0, 3, 5, 1, 2, 0, 0, 4, 1, 1, 2, 1, 3
34
The data shows the number of chocolates gratefully provided to a particular maths teacher from his sixth form classes over a 3 week period.
Find the mean, mode, median and mid-range of the number of gifts received:
1, 2, 0, 3, 5, 1, 2, 0, 0, 4, 1, 1, 2, 1, 3
In order: 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5
Median: 1
Mode: 1
Mean: $\bar{x} = \frac{\sum x}{n} = \frac{26}{15} \approx 1.73$
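The four averages asked for can be checked with Python's standard `statistics` module. This is a small sketch rather than part of the original lesson; the mid-range is computed here as the midpoint of the smallest and largest values.

```python
from statistics import mean, median, multimode

gifts = [1, 2, 0, 3, 5, 1, 2, 0, 0, 4, 1, 1, 2, 1, 3]

summary = {
    "mean": mean(gifts),                          # total divided by how many
    "median": median(gifts),                      # middle value of the ordered list
    "mode": multimode(gifts),                     # most frequent value(s), as a list
    "mid_range": (min(gifts) + max(gifts)) / 2,   # halfway between the extremes
}
```

`multimode` returns a list because a data set can have more than one mode; here it contains a single value.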
35
Averages for tabulated data
Find the mean, mode, median and mid-range for this data, showing shoe size:
shoe size (x)   frequency (f)
5               3
6               14
7               13
8               21
9               16
10              8
Total           75
36
Find the mean, mode, median and mid-range for this data, showing shoe size:
shoe size (x)   frequency (f)   frequency × shoe size (fx)
5               3               15
6               14              84
7               13              91
8               21              168
9               16              144
10              8               80
Total           75              582
Median: 75 items of data, so the median is at the (75 + 1)/2 = 38th position. Counting through the list, the median shoe size is 8.
Mode: 8
Mean: $\bar{x} = \frac{\sum fx}{\sum f} = \frac{582}{75} = 7.76$
37
Averages for tabulated data
Find the mean, mode, median and mid-range for this data, showing speeds of vehicles along a road:
speed, s (mph)   number of vehicles (f)
20 ≤ s < 25      7
25 ≤ s < 30      11
30 ≤ s < 35      31
35 ≤ s < 40      20
40 ≤ s < 45      14
45 ≤ s < 50      9
Total            92
38
Find the mean, mode, median and mid-range for this data, showing speeds of vehicles along a road:
speed, s (mph)   number of vehicles (f)   mid-point (x)   frequency × mid-point (fx)
20 ≤ s < 25      7                        22.5            157.5
25 ≤ s < 30      11                       27.5            302.5
30 ≤ s < 35      31                       32.5            1007.5
35 ≤ s < 40      20                       37.5            750
40 ≤ s < 45      14                       42.5            595
45 ≤ s < 50      9                        47.5            427.5
Total            92                                       3240
Median (estimate): 92 items of data, so the median is at the (92 + 1)/2 = 46.5th position. Counting through the list, this will be in the 30 to 35 interval. Use a Cumulative Frequency Curve instead for better accuracy!
Modal interval: 30 ≤ s < 35
Mean: can only be estimated, as the raw data is not available. $\bar{x} \approx \frac{\sum fx}{\sum f} = \frac{3240}{92} \approx 35.2$
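The frequency-table calculations for the shoe-size and speed data can be sketched in Python as below. The helper names `grouped_mean` and `median_class` are made up for this example; for the speed data the mid-point of each interval stands in for the unknown raw values, so the mean is only an estimate.

```python
def grouped_mean(values, freqs):
    """Mean from a frequency table: sum of f*x divided by sum of f."""
    return sum(x * f for x, f in zip(values, freqs)) / sum(freqs)

def median_class(labels, freqs):
    """Return the label of the row containing the (n + 1)/2-th item."""
    position = (sum(freqs) + 1) / 2
    running = 0
    for label, f in zip(labels, freqs):
        running += f              # cumulative frequency so far
        if running >= position:
            return label
    return None

# Shoe sizes: exact values, so the mean is exact.
shoe_sizes = [5, 6, 7, 8, 9, 10]
shoe_freqs = [3, 14, 13, 21, 16, 8]
shoe_mean = grouped_mean(shoe_sizes, shoe_freqs)
shoe_median = median_class(shoe_sizes, shoe_freqs)

# Speeds: intervals, so mid-points are used and the mean is an estimate.
speed_intervals = [(20, 25), (25, 30), (30, 35), (35, 40), (40, 45), (45, 50)]
speed_freqs = [7, 11, 31, 20, 14, 9]
midpoints = [(lo + hi) / 2 for lo, hi in speed_intervals]
speed_mean = grouped_mean(midpoints, speed_freqs)
speed_median_interval = median_class(speed_intervals, speed_freqs)
```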
39
Lesson Objective Be able to calculate Interquartile Range for a list of data Drawing and Interpreting Box and Whisker Plots Understanding Skewness and identifying outliers
40
Two classes did a test (out of 100) Here are the results Class A: 50 82 40 51 45 50 48 49 47 10 43 58 56 52 39 16 Class B: 20 34 50 48 62 70 39 47 12 38 40 a)Find the median and interquartile range of the set of marks for each class. b)Draw a box and whisker plot to compare the results for each class.
41
Two classes did a test (out of 100). Here are the results, in order:
Class A: 10 16 39 40 43 45 47 48 49 50 50 51 52 56 58 82
Class B: 12 20 34 38 39 40 47 48 50 62 70
a) Find the median and interquartile range of the set of marks for each class.
b) Draw a box and whisker plot to compare the results for each class.
[Figure: box and whisker plots for CLASS A and CLASS B on a scale from 0 to 100]
Class A: Median 48.5, IQ Range = 10, Negatively Skewed
Class B: Median 40, IQ Range = 16, Positively Skewed
42
A piece of data is generally considered an outlier if it is:
more than 1.5 × IQR below the lower quartile
OR
more than 1.5 × IQR above the upper quartile
Class A: 10 16 39 40 43 45 47 48 49 50 50 51 52 56 58 82
Class B: 12 20 34 38 39 40 47 48 50 62 70
Are there any outliers in each class?
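The quartile and outlier rules can be sketched in Python. Several conventions for quartiles exist; the version below takes the median of each half of the ordered data (leaving out the middle value when there is an odd number of items), which is an assumption chosen because it reproduces the medians and IQ ranges quoted for the two classes. The function names are illustrative only.

```python
from statistics import median

def quartiles(data):
    """Return (lower quartile, median, upper quartile) using medians of halves."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    # For an odd number of items, skip the middle value itself.
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return median(lower), median(s), median(upper)

def outliers(data):
    """Values more than 1.5 x IQR beyond either quartile."""
    lq, _, uq = quartiles(data)
    iqr = uq - lq
    low, high = lq - 1.5 * iqr, uq + 1.5 * iqr
    return [x for x in data if x < low or x > high]

class_a = [10, 16, 39, 40, 43, 45, 47, 48, 49, 50, 50, 51, 52, 56, 58, 82]
class_b = [12, 20, 34, 38, 39, 40, 47, 48, 50, 62, 70]

outliers_a = outliers(class_a)
outliers_b = outliers(class_b)
```

Under this convention Class A has boundaries at 26.5 and 66.5, so its three extreme marks are flagged, while Class B has none.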
43
Design a data set for one of the box and whisker charts on the next page.
Swap with a partner. They must design a data set to recreate your graph as best as possible.
Compare at the end.
46
Lesson Objective Be able to calculate the Standard Deviation for a set of data Use calculator to find the Standard Deviation for a set of data Write down some statements to compare these two sets of data. Which features are the same and which are different?
47
Here is the actual data. How does this clash with your previous assumptions?
48
Consider the following set of numbers:
4, 5, 9, 6, 6, 10, 10, 10, 11, 19
Find the Range, the Interquartile Range and the Mean.
What are the limitations of the Range and the Interquartile Range in measuring consistency in a data set?
49
The Root Mean Squared Deviation (commonly called the Standard Deviation of a Sample)
R.M.S.D. $= \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
The value of this expression before you take the square root is referred to as the VARIANCE.
The Standard Deviation for a Population
It can be shown that the Root Mean Squared Deviation formula, when calculated on a sample taken from a population, generally produces a result that is lower than the actual Standard Deviation of the Population (this is S3 + S4). The formula can therefore be adjusted as follows to take this into account:
S.D. of Population $= \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
The value of this expression before you take the square root is still referred to as the VARIANCE.
NOTE: FOR OUR SYLLABUS IT IS EXPECTED THAT YOU WILL ALWAYS USE THE BOTTOM FORMULA (DIVIDING BY n − 1) WHEN ASKED TO CALCULATE STANDARD DEVIATION!!
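The difference between the two divisors can be seen with a short Python sketch. It reuses the ten numbers from the earlier range/IQR exercise as sample data (an arbitrary choice for illustration) and compares a hand-written calculation with the standard library's `pstdev` (divides by n) and `stdev` (divides by n − 1).

```python
from math import sqrt
from statistics import pstdev, stdev

data = [4, 5, 9, 6, 6, 10, 10, 10, 11, 19]

n = len(data)
mean = sum(data) / n
sum_sq_dev = sum((x - mean) ** 2 for x in data)  # sum of squared deviations

rmsd = sqrt(sum_sq_dev / n)        # root mean squared deviation (divide by n)
sd = sqrt(sum_sq_dev / (n - 1))    # standard deviation (divide by n - 1)

# The library functions should agree with the hand calculations.
library_rmsd = pstdev(data)
library_sd = stdev(data)
```

Dividing by n − 1 always gives the slightly larger of the two values, which is the adjustment the slide describes.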
50
Find the standard deviation for this set of data
52
Lesson Objective Understand the concept of ‘Coding’ Be able to find the mean and standard deviation of ‘coded’ data and related data sets
53
Here is some data. We will call this data the ‘x’ data: Find the mean and the standard deviation of this data? Check your results on your calculator.
54
Investigation
Suppose you multiply each of the data you just used by 2 and add 3. Write down the new set of data. Call it the y-data.
Now calculate the mean and the standard deviation of the y-data. What do you notice? How is it related to the original x-data?
What if you multiply it by 2 and add 5?
What if you multiply by 3 and add 5?
Can you predict what will happen if you multiply by 'a' and add 'b'?
Can you justify your results?
55
Suppose you have a set of values (x-data): x1, x2, x3, x4, x5, …
Let the mean of the set of data be 'm' and the standard deviation 's'.
Let another set of values (y-data) be related to the x-data by a linear formula of the form $y_i = a \times x_i + b$ ('a' and 'b' are constants).
Then:
The mean of the y values = a × mean of 'x-data' + b
The standard deviation of the y values = a × standard deviation of 'x-data' (for positive a; if a is negative, use its size |a|, since a standard deviation cannot be negative)
We can use this to find the mean of related sets of data. This process is called 'Coding'.
Eg Consider the values 1002, 1004, 1006, 1008, 1010
This data set is merely the data set 1, 2, 3, 4, 5 multiplied by 2 and with 1000 added.
The mean of 1, 2, 3, 4, 5 is 3 and the sd of 1, 2, 3, 4, 5 is 1.58
so the mean of the original data is 2 × 3 + 1000 = 1006
and the sd of the original data is 2 × 1.58 = 3.16
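The coding rule can be checked numerically with a short Python sketch using the worked example (multiply by 2, add 1000). `stdev` from the standard library uses the n − 1 divisor, which matches the 1.58 quoted for 1, 2, 3, 4, 5.

```python
from statistics import mean, stdev

x = [1, 2, 3, 4, 5]
a, b = 2, 1000

# Code the data: y = a*x + b
y = [a * xi + b for xi in x]

mean_x, sd_x = mean(x), stdev(x)
mean_y, sd_y = mean(y), stdev(y)

# Predictions from the coding rule
predicted_mean_y = a * mean_x + b
predicted_sd_y = abs(a) * sd_x   # adding b shifts the data but does not change the spread
```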
56
Ex 50 Book S1 Third Edition
57
Lesson Objective
Recognise and be able to use the alternative formula for standard deviation.
75 adults were asked their shoe size. The results are recorded in the table below.
shoe size (x)   frequency (f)
5               3
6               14
7               13
8               21
9               16
10              8
Total           75
Calculate the standard deviation in the shoe sizes using the formula:
$sd = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}}$, where $n = \sum f$
Check your result using your calculator.
58
Lesson Objective
Recognise and be able to use the alternative formula for standard deviation.
75 adults were asked their shoe size. The results are recorded in the table below.
shoe size (x)   frequency (f)   x × f   f × (x − mean)²
5               3               15      22.8528
6               14              84      43.3664
7               13              91      7.5088
8               21              168     1.2096
9               16              144     24.6016
10              8               80      40.1408
Total           75              582     139.68
Calculate the standard deviation in the shoe sizes using the formula:
$sd = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}}$, where $n = \sum f$
Check your result using your calculator:
Mean = 582 ÷ 75 = 7.76
sd = √(139.68 ÷ 74) = 1.37
59
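An alternative (rearrangement) of the formula
$sd = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}}$
is:
$sd = \sqrt{\frac{\sum f x^2 - n\bar{x}^2}{n - 1}}$
This gives the same answer but is slightly easier to use when the data is in a frequency table:
shoe size (x)   frequency (f)   x × f   x² × f
5               3               15      75
6               14              84      504
7               13              91      637
8               21              168     1344
9               16              144     1296
10              8               80      800
Total           75              582     4656
Mean = 582 ÷ 75 = 7.76
sd = √((4656 − 75 × 7.76²) ÷ 74) = 1.37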
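That the two forms agree can be confirmed with a short Python sketch on the shoe-size table. The variable names are chosen for this illustration; both versions divide by n − 1, as the syllabus note requires.

```python
from math import sqrt

sizes = [5, 6, 7, 8, 9, 10]
freqs = [3, 14, 13, 21, 16, 8]

n = sum(freqs)
mean = sum(f * x for x, f in zip(sizes, freqs)) / n

# Definition form: sum of f * (x - mean)^2
sxx_definition = sum(f * (x - mean) ** 2 for x, f in zip(sizes, freqs))

# Alternative form: sum of f * x^2, minus n * mean^2
sum_fx2 = sum(f * x * x for x, f in zip(sizes, freqs))
sxx_alternative = sum_fx2 - n * mean ** 2

sd_definition = sqrt(sxx_definition / (n - 1))
sd_alternative = sqrt(sxx_alternative / (n - 1))
```

The alternative form needs only the column totals, which is why it is quicker by hand, although it is more sensitive to rounding the mean too early.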
60
50 female students had their heights measured. The results were put into the table below.
Height, h (cm) mid-points   frequency (f)
158.5                       4
160.5                       11
162.5                       19
164.5                       8
166.5                       5
168.5                       3
Total                       50
Find the mean height and the standard deviation in the heights.
Check your result using your calculator.
61
50 female students had their heights measured. The results were put into the table below.
Height, h (cm) mid-points   frequency (f)
158.5                       4
160.5                       11
162.5                       19
164.5                       8
166.5                       5
168.5                       3
Total                       50
Find the mean height and the standard deviation in the heights.
Check your result using your calculator.
Mean = 8141 ÷ 50 = 162.82 cm
sd = √(322.88 ÷ 49) = 2.57 cm
62
Different style of exam question Standard deviation formulae Given the following information relating to data placed in a frequency distribution. Find the mean and the standard deviation of the data
63
Different style of exam question Standard deviation formulae Given the following information relating to data placed in a frequency distribution. Find the mean and the standard deviation of the data Mean = 6.1 sd = 2.25 (3 sig fig)
64
Lesson Objective Understand what cumulative frequency curves represent Be able to draw a cumulative frequency curve Use a cumulative frequency curve to find medians, quartiles and percentiles
65
An egg farmer wants to grade his eggs in terms of size. Grade A will be the biggest size of egg, Grade B the next biggest, etc., with Grade D the smallest. Each grading should contain the same proportion of eggs. The table shows the weight of his first batch of eggs.
Weight of the Egg, w (grams)   Frequency
30 ≤ w < 40                    15
40 ≤ w < 50                    25
50 ≤ w < 60                    50
60 ≤ w < 70                    40
70 ≤ w < 80                    10
What 'boundaries' should he choose for each egg Grade?
66
Weight of the Egg, w (grams)   Frequency
30 ≤ w < 40                    15
40 ≤ w < 50                    25
50 ≤ w < 60                    50
60 ≤ w < 70                    40
70 ≤ w < 80                    10
Weight of the Egg, w (grams)   Cum. Freq.
0 ≤ w < 40                     15
0 ≤ w < 50                     40
0 ≤ w < 60                     90
0 ≤ w < 70                     130
0 ≤ w < 80                     140
With 140 eggs, the quartiles will be roughly at the following positions: 35th egg (LQ), 70th egg (MEDIAN), 105th egg (UQ).
LQ could be found by saying 40 + 20/25 of 10 = 48
MEDIAN: 50 + 30/50 of 10 = 56
UQ: 60 + 15/40 of 10 = 63.75
But this approach assumes a linear growth in the frequency across each interval.
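The "so many twenty-fifths of 10" reasoning is linear interpolation within a class, and can be sketched in Python as below. The function name `interpolate` and the list layout are assumptions for this illustration; like the hand method, it assumes eggs are spread evenly across each interval.

```python
def interpolate(boundaries, cum_freqs, position):
    """Estimate the value at a given cumulative position.

    boundaries: class boundaries, starting with the lowest (e.g. 30, 40, ...).
    cum_freqs:  cumulative frequency at each boundary, starting with 0.
    """
    for i in range(1, len(boundaries)):
        if cum_freqs[i] >= position:
            lo, hi = boundaries[i - 1], boundaries[i]
            below = cum_freqs[i - 1]          # items before this class
            in_class = cum_freqs[i] - below   # items inside this class
            return lo + (position - below) / in_class * (hi - lo)
    raise ValueError("position is beyond the total frequency")

boundaries = [30, 40, 50, 60, 70, 80]
cum_freqs = [0, 15, 40, 90, 130, 140]
total = cum_freqs[-1]

lower_quartile = interpolate(boundaries, cum_freqs, total / 4)
median_weight = interpolate(boundaries, cum_freqs, total / 2)
upper_quartile = interpolate(boundaries, cum_freqs, 3 * total / 4)
```

These three values are the boundaries between the four egg grades under the equal-proportion rule.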
67
Weight of the Egg, w (grams)   Frequency
30 ≤ w < 40                    15
40 ≤ w < 50                    25
50 ≤ w < 60                    50
60 ≤ w < 70                    40
70 ≤ w < 80                    10
Weight of the Egg, w (grams)   Cum. Freq.
0 ≤ w < 40
0 ≤ w < 50
0 ≤ w < 60
0 ≤ w < 70
0 ≤ w < 80
[Figure: blank axes with Weight (30 to 80) on the horizontal axis and Cumulative frequency (0 to 140) on the vertical axis]
a) How many eggs did the farmer harvest on this particular day?
b) Estimate the Median weight of the eggs collected.
c) Estimate the Inter-quartile range of the eggs collected.
68
Graph shows how long people waited to be seen at an eye clinic.
[Figure: cumulative frequency graph with Time in mins (30 to 60) on the horizontal axis and Cumulative frequency (0 to 100) on the vertical axis]
Waiting Time   Cum. Freq.
0 ≤ w < 35     10
0 ≤ w < 40     21
0 ≤ w < 45     46
0 ≤ w < 50     73
……etc.         …etc
Key Points:
Cumulative Frequency goes up the side.
Horizontal axis has a continuous scale.
You plot Cumulative Frequency at the end of the interval: (35,10), (40,21) etc.
A Cumulative Frequency graph tells you how many items are below each value. Here 80 people waited for less than 53 mins.
It is mainly used to estimate medians and percentiles for grouped data.
There were 100 people. The median waiting time is that obtained by the 50th person (half of 100) = 46 mins.
To find the Upper quartile, read the time at 75. For the lower quartile, read the time at 25.
69
Can you find data sets to match these cumulative frequency curves
70
Summary of what we have learned:
71
When comparing data we are interested in the location of the data (averages), the consistency of the data (measures of spread) and the shape of the data (graphs).
Averages: A single item of data that represents the whole data set. Mean, Mode, Median, Mid Range
Spread: Range, Interquartile Range, Root Mean Squared Deviation, Standard Deviation
Shape: Bar Charts, Frequency Charts, Histograms, Frequency Polygons
Can also draw:
Box and Whisker Plots (good for showing skewness and spread)
Pie Charts (good for showing proportions)
Cumulative Frequency Curves (good for finding Interquartile Range for grouped data)
The formula for the Variance is that for standard deviation without the square root.
Outliers are defined as being either:
more than 1.5 × IQR above the UQ or below the LQ
or
more than 2 standard deviations above or below the mean
Standard deviation formulae: $sd = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum x^2 - n\bar{x}^2}{n - 1}}$
Root Mean Squared Deviation formulae: $rmsd = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$