Statistics Notes.


1 Statistics Notes

2 What is Statistics?
Statistics is the collection, organisation, and interpretation of data. The work runs in a cycle: pose a question, generate and collect data, analyse the data, interpret the results.

3 Producing Data
Primary Data: you collect this data yourself.
Observational surveys: you collect information but do not influence what is happening.
Experimental studies: you deliberately influence events and investigate the effects, e.g. drug trials, laboratory studies.
Secondary Data: you use information collected by somebody else. This information could come from books, the internet, newspapers, journals etc.

4 How Reliable is Secondary Data?
Who carried out the survey? What was the population? How was the sample selected? How large was the sample? What was the response rate? How were the subjects contacted? When was the survey carried out? What were the exact questions?

5 Types of Data
Categorical: Nominal, Ordinal
Numerical: Discrete, Continuous

6 Categorical
Nominal: categories that cannot be arranged in a particular order. These are non-numeric. Examples: eye colour, place of birth, favourite program...
Ordinal: categories of data that can be ordered in a particular way. Examples: exam grades (A, B, C etc.), month of birth, stress levels...
What would be a good way to represent this data? Pie charts and bar charts are used for categorical data.

7 Numerical
Discrete: numeric answers that can only take on certain values. Examples: shoe size, age (in years), number of children in a family...
Continuous: numeric answers that can take an infinite number of values. Examples: height, age, foot length, rainfall...
What would be a good way to represent this data? Discrete: bar chart, pie chart, line plot, stem and leaf. Continuous: histograms.

8 What is numerical data?
Numerical data is data which involves the use of numbers, e.g. 1, 2, 3.

9 What is categorical data?
Data which involves different categories.

10 Give an example of categorical data.
Gender, country of birth, favourite sport.

11 What is discrete data?
Data that can only take certain values.

12 Give an example of discrete data.
Number of goals scored in a match. Number of desks in a classroom.

13 What is continuous data?
Data that is measured on some scale and can take any value on that scale.

14 Give an example of continuous data.
Height, speed of cars passing a certain point, time taken to complete a sprint.

15 What is a sample survey? This is when information is gathered from a small part of the population as opposed to a census.

16 Explain what is meant by bias in sampling?
If you were doing a survey on fitness and interviewed people leaving a gym, this would be biased as these people obviously train.

17 What is simple random sampling?
This is when every member of the population has an equal chance of being selected. Eg. Put names in a hat.

18 What is stratified sampling?
This is where the population is split into separate groups (strata) that are different from each other, and a random sample is then taken from each group.

19 What is systematic sampling?
A sample which is obtained by choosing items at regular intervals from an unordered list is called a systematic sample, e.g. if you wish to take 20 students from 200 you could choose every 10th student.
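The two sampling methods above can be compared with a short sketch. The student list, the function names and the random starting point for the systematic sample are assumptions made for illustration; they are not part of the original notes.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Every member has an equal chance of being selected (names in a hat)."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Choose every k-th item, starting from a random position (assumption)."""
    k = len(population) // n       # sampling interval, e.g. 200 // 20 = 10
    rng = random.Random(seed)
    start = rng.randrange(k)       # random start between 0 and k - 1
    return population[start::k][:n]

students = list(range(1, 201))     # hypothetical student numbers 1 to 200
random_pick = simple_random_sample(students, 20, seed=1)
systematic_pick = systematic_sample(students, 20, seed=1)
print(random_pick)
print(systematic_pick)
```

In the systematic sample every pair of neighbouring picks is exactly 10 apart, which is the "every 10th student" rule from the slide.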

20 What is quota sampling? Used in market research and opinion polls. The population is divided into groups in terms of age, social class etc. Then the interviewer is told how many people to take from each group.

21 What is Cluster Sampling?
The population being sampled is split into groups or clusters. The clusters are then randomly chosen and every item in the cluster is looked at. Cluster sampling is very popular with scientists.

22 What is convenience sampling?
This involves selecting a group of people because it is easy for you to contact them and they are willing to answer your questions.

23 Explain the difference between a census and a sample.
Census is the full population, Sample is a smaller part of the population.

24 What is the mode?
The number which occurs most often.

25 How do you find the mean of a set of numbers?
Add the numbers and divide by the number of numbers.

26 HOW DO YOU FIND THE MEAN OF A FREQUENCY DISTRIBUTION?
$\mu = \frac{\sum fx}{\sum f}$
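As a minimal sketch of the frequency-distribution mean (sum of f times x, divided by the sum of f), the table of goals below is hypothetical data chosen for illustration, and the function name is an assumption.

```python
def frequency_mean(values, frequencies):
    """Mean of a frequency distribution: sum of f*x divided by sum of f."""
    total_fx = sum(f * x for x, f in zip(values, frequencies))
    total_f = sum(frequencies)
    return total_fx / total_f

goals = [0, 1, 2, 3]          # hypothetical values x
matches = [2, 5, 2, 1]        # hypothetical frequencies f
mean_goals = frequency_mean(goals, matches)
print(mean_goals)             # 12 / 10 = 1.2
```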

27 How do I find a mid-interval value in a grouped frequency distribution?
Add the two numbers in the interval and divide by two.
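Mid-interval values are mainly used to estimate the mean of grouped data. The sketch below assumes a hypothetical grouped table (the intervals and frequencies are made up for illustration) and reuses the sum of fx over sum of f idea with x replaced by each mid-interval value.

```python
def grouped_mean(intervals, frequencies):
    """Estimate the mean of grouped data using mid-interval values."""
    # Mid-interval value: add the two ends of the interval and divide by two.
    mids = [(low + high) / 2 for low, high in intervals]
    total_fx = sum(f * x for x, f in zip(mids, frequencies))
    return total_fx / sum(frequencies)

intervals = [(0, 10), (10, 20), (20, 30), (30, 40)]   # hypothetical intervals
frequencies = [2, 6, 8, 4]                            # hypothetical frequencies
estimate = grouped_mean(intervals, frequencies)
print(estimate)                                       # 440 / 20 = 22.0
```

The result is only an estimate, because the actual values inside each interval are unknown.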

28 What is the median? Middle number

29 What is an outlier? An outlier is a very high or low value that is not typical of the other values in a data set. If the data set is small then the outlier can have a significant effect on the mean.
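A quick way to see the effect of an outlier on a small data set is to compute the mean and median with and without it. The numbers below are hypothetical and chosen only for illustration.

```python
from statistics import mean, median

typical = [12, 14, 15, 16, 18]        # hypothetical small data set
with_outlier = typical + [95]         # same data plus one very high value

print(mean(typical), median(typical))             # 15 15
print(mean(with_outlier), median(with_outlier))   # mean jumps above 28, median is 15.5
```

The single extreme value almost doubles the mean but barely moves the median.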

30 How do you find the range of a set of data?
Highest value minus the lowest value.

31 Mean, Median and Mode
Mean: the average.
Median: the number or average of the numbers in the middle.
Mode: the number that occurs most.

32 Find your answer before clicking!
An electronics store sells CD players at the following prices: $350, $275, $500, $325, $100, $375, and $300. What is the mean price?

33 The mean or average price of a CD player is $317.86.
$350 + $275 + $500 + $325 + $100 + $375 + $300 = $2225
$2225 / 7 = $317.86 (to the nearest cent)

34 Median is the middle number in a set of data when the data is arranged in numerical order.

35 12, 15, 11, 11, 7, 13
First, arrange the data in numerical order: 7, 11, 11, 12, 13, 15
Then find the number in the middle or the average of the two numbers in the middle: (11 + 12) / 2 = 11.5
The median is 11.5

36 Find your answer before clicking!
An electronics store sells CD players at the following prices: $350, $275, $500, $325, $100, $375, and $300. What is the median price?

37 The median price is $325.
First place the prices in numerical order: $100, $275, $300, $325, $350, $375, $500
The price in the middle is the median price.
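The worked answers for the CD player prices and for the six-number median example can be checked with Python's standard `statistics` module. This is only a checking sketch; the variable names are assumptions.

```python
from statistics import mean, median, mode

prices = [350, 275, 500, 325, 100, 375, 300]   # CD player prices in dollars
mean_price = round(mean(prices), 2)            # 2225 / 7, rounded to cents
median_price = median(prices)                  # middle of the 7 ordered prices

values = [12, 15, 11, 11, 7, 13]               # six values, so two middle numbers
median_value = median(values)                  # average of 11 and 12
mode_value = mode(values)                      # 11 occurs most often

print(mean_price, median_price)                # 317.86 325
print(median_value, mode_value)                # 11.5 11
```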

38 The range of a set of data is the difference between the largest and the smallest number in the set.
For example, consider the following set: 40, 30, 43, 48, 26, 50, 55, 40, 34, 42, 47, and 50. To find the range you would take the largest number, 55, and subtract the smallest number, 26. 55 – 26 = 29. The range is 29!
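The range calculation for this set can be sketched in a couple of lines, using only the built-in `max` and `min` functions.

```python
data = [40, 30, 43, 48, 26, 50, 55, 40, 34, 42, 47, 50]

largest = max(data)
smallest = min(data)
data_range = largest - smallest       # largest value minus smallest value

print(largest, smallest, data_range)  # 55 26 29
```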

39 http://www.bbc.co.uk/bitesize/ks3/maths/handling_data/collecting_recording/quiz/q99218337

40 RANGE AND VARIABILITY
The range of a set of data is the largest value minus the smallest value.
The range is NOT an average. It shows the SPREAD of the data.
It is useful when comparing two sets of data.
It is a crude measure as it uses only the largest value and the smallest value of the data.
The more spread out the data, the greater its variability.
The range is often used as a measure of variability as it is easy to calculate and easy to understand.

41 Quartiles and Interquartile Range
Median: half way into the data. The data can also be divided into four quarters. The lower quartile is the value one quarter of the way into the data. The upper quartile is the value three quarters of the way into the data.

42 Quartiles Quartiles divide a set of data into four components.
There are three quartiles: the lower quartile (1/4), the median (1/2) and the upper quartile (3/4). The interquartile range is the difference between the lower quartile (Q1) and the upper quartile (Q3).

43 Interquartile Range
The upper quartile minus the lower quartile is called the interquartile range.

44 What are outliers? Outliers are extreme values that are not typical of the other values in the set. AGAIN…… INTERQUARTILE RANGE: Q3 – Q1

45 Let's try an example together
Fifteen people attended a quiz in the local hall last weekend. Their ages are as follows: 14, 15, 18, 22, 24, 25, 29, 30, 33, 37, 39, 41, 41, 48, 50

46 Finding the interquartile range
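The median is the 8th of the 15 ages: 30.
The lower quartile (Q1) is the middle of the seven ages below the median: 22.
The upper quartile (Q3) is the middle of the seven ages above the median: 41.
INTERQUARTILE RANGE: Q3 – Q1 = 41 – 22 = 19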
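The quartiles for the quiz ages can be reproduced with a short sketch. It assumes the common school convention of taking the median of the lower half and of the upper half, leaving out the overall median when the number of values is odd. Other conventions exist and can give slightly different quartiles, so treat the function below as one possible method rather than the only one.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd count, skip the middle value itself (assumed convention).
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

ages = [14, 15, 18, 22, 24, 25, 29, 30, 33, 37, 39, 41, 41, 48, 50]
q1, q2, q3 = quartiles(ages)
iqr = q3 - q1
print(q1, q2, q3, iqr)   # 22 30 41 19
```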

47 Let’s try it now ourselves…

48 Using Appropriate Averages:
The MEAN is useful when a ‘typical’ value is wanted. It should not be used when there are extreme values, for example very small and very large values. The MEDIAN is a useful average to use if there are extreme values. The MODE is useful when the most common value is needed.

49 ADVANTAGES/ DISADVANTAGES OF USING THE MEAN, MEDIAN AND MODE

50 WHEN TO USE?
AVERAGE: WHEN TO USE
MODE: If the data is categorical, then the mode is the only sensible measure of centre to use. Therefore, for data on hair colour, eye colour, gender, etc. use only the mode. The mode can also be used with numerical data.
MEDIAN: Used only with numerical data. If there are extreme values in the data set, then use the median.
MEAN: If there are not extreme values in the data set, use the mean.

51 AVERAGE: ADVANTAGES AND DISADVANTAGES
MEAN: Uses all the data. Easy to calculate. Mean is not always a given data value. Affected by extreme values (outliers).
MODE: Not influenced by extreme values. Easy to find. Extreme values (outliers) do not affect the mode. Not very useful for further analysis. May not exist. When no values repeat in the data set, the mode is every value and is useless.
MEDIAN: Useful for further analysis. Easy to calculate if data are ordered.

52 For categorical data, such as the colours of dresses, the mode is the only average that can be used.
Why is this?

53 How do you find the interquartile range?
Upper quartile minus lower quartile.

54 Need to know how to find the standard deviation and correlation coefficient using a calculator

55 What is a symmetrical (normal) distribution?
The distribution has an axis of symmetry down the middle.

56 What is meant by positive skew?
When a distribution has most of the data at lower values.

57 What is meant by negative skew?
This is when the distribution has most of the data at higher values.

58 What is a uniform distribution?
This is where the data is evenly spread out.

59 Central Tendency
This is where the data can be compared using the mean, mode, median.

60 What are the errors with this pictogram?
What do you have on your toast?
Item: Freq
Butter: 56
Marmalade: 72
Jam: 60
Marmite: 66
Key: one symbol = 8 people
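One check when looking for errors in a pictogram is to work out how many symbols each row should contain according to the key. The sketch below does this for the toast frequencies with the key of 8 people per symbol; the variable names are assumptions, and the original picture is not available here, so this only shows what a correct pictogram would need.

```python
KEY = 8   # one symbol stands for 8 people

toast = {"Butter": 56, "Marmalade": 72, "Jam": 60, "Marmite": 66}

# Number of symbols each row should show, including any part symbol.
symbols_needed = {item: freq / KEY for item, freq in toast.items()}

for item, symbols in symbols_needed.items():
    print(f"{item}: {symbols} symbols")
```

Jam and Marmite do not divide exactly by 8, so a correct pictogram would need part symbols for those rows.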

61 Line Plot
When the items or categories being tallied are numbers, a line plot can be used to visually display the data. A line plot uses X marks above a number line to show the frequencies.
[Line plot: X marks stacked above a number line from 1 to 7 labelled "Number of Books Read". The number line shows the number of books read and the X marks above it show the frequencies.]

62 A LINE PLOT is a diagram that shows data displayed on a number line.
Line plots clearly show each piece of data. Line plots clearly show the mode and range. Line plots can be used to easily find the median since the values are in numerical order. Line plots are not used to easily determine the mean.

63 Let’s use another example to show the process of plotting with INTERVALS.
A line plot does not have to start at 0. A line plot does not have to have every digit listed at the bottom.

64 Your line plot could look like this one:
Notice that the heights are marked every TWO inches instead of every inch.

65 What is the RANGE? RANGE: from 50in to 60in.

66 Are there any GAPS? No one is 51, 53, 57, 58, or 59 inches tall.

67 Are there any CLUSTERS? Most of the heights are clustered between 52 and 56 inches.

68 Are there any OUTLIERS? Two students are several inches taller than the rest.

69 Bar Charts
Bar charts must have:
- a title
- frequency on the y-axis
- gaps between the bars
- bars with equal widths
- labelled axes

70 Bar chart to show the number of types of tree in a park
Type of tree: Freq
Oak: 2
Birch: 5
Evergreen: 8
Pine: 10
Cedar: 7
[Bar chart: Frequency (1 to 10) on the y-axis, Type of tree on the x-axis]

71 Bar line graph to show the number of types of tree in a park
Type of tree: Freq
Oak: 2
Birch: 5
Evergreen: 8
Pine: 10
Cedar: 7
[Bar line graph: Frequency (1 to 10) on the y-axis, Type of tree on the x-axis]

72 1. Copy these tables into your books – one per page
Table: Eye Colour, Tally, Frequency. Rows: Blue, Brown, Green.
Table: Favourite Season, Tally, Frequency. Rows: Winter, Spring, Summer, Autumn.
Table: Hours watching TV a week, Tally, Frequency. Rows: 0 – 4, 5 – 9, 10 – 14, 15 – 19, Over 20.
2. Draw a bar chart for one of the other tables and a bar line graph for the remaining table.

73 Comparative Bar Charts
We can compare two sets of data using bar charts Place the bars for the two sets of data next to each other for each category

74 Bar chart to compare the eye colour of boys and girls in Year 9
Data table: Blue 9 7, Green 4, Brown 2
[Comparative bar chart: Frequency (1 to 10) on the y-axis, Eye Colour (Blue, Green, Brown) on the x-axis, with separate bars for BOYS and GIRLS]

75 Stem and Leaf Diagrams
How to Draw One:
1. Put the first digits of each piece of data in numerical order down the left-hand side
2. Go through each piece of data in turn and put the remaining digits in the proper row
3. Re-draw the diagram putting the pieces of data in the right order
4. Add a key
September 18

76 Stem and Leaf Diagrams

77 Stem-and-leaf diagrams
Sometimes data is arranged in a stem-and-leaf diagram. The stem-and-leaf diagram below shows the marks scored by 21 pupils in a maths test.
0 | 6 7 9
1 | 4 5 5 8
2 | 0 1 3 5 6 6
3 | 0 2 2 2 5 8
4 | 0 0
stem = tens, leaves = units
Find the median, mode and range for the data.
There are 21 data values so the median will be the 11th value, that is 25.
The mode is 32.
The range is 40 – 6, which is 34.
Notes: Start by explaining how to read the stem-and-leaf diagram. The scores shown in the diagram are 6, 7, 9, 14, 15, 15, 18, 20, 21, 23, 25, 26, 26, 30, 32, 32, 32, 35, 38, 40 and 40. Ask pupils what they think the test might have been out of. The values used for the stem and leaves depend on the data. For example, if we were showing times in minutes and seconds, we could use the stem to show the minutes and the leaves to show the seconds. If we were showing lengths between 3.0 and 8.0 written to one decimal place, we could use the stem to show the whole number of centimetres and the leaves to show the tenths (or millimetres). Discuss how to find the median, mode and range.

78 Stem and Leaf
What is the modal group? What is the mode? What is the range? How do I work it out? What is the median? How do I work it out?

79 Stem and Leaf Diagrams
Here are the marks gained by 30 students in an examination: [list of marks not captured in the transcript]
Write the tens figures in the left hand column of a diagram. These are the ‘STEMS’: 4, 5, 6, 7, 8

80 Stem and Leaf Diagrams
Go through the marks in turn and put in the units figures of each mark in the proper row. These are the ‘LEAVES’.

81 Stem and Leaf Diagrams
When all the marks are entered the diagram will look like this: [diagram not captured in the transcript]

82 Stem and Leaf Diagrams
Rewrite the diagram so that the units figures in each row are in order.

83 Stem and Leaf Diagrams
Add a KEY: 5|2 = 52

84 Stem and Leaf Diagrams
Remember:
- Always put in a Key
- Always put your data in Order
Median:
- to work out the median, you must find the middle value
- if there are two middle values, you need the average
Range:
- to work out the Range, subtract the smallest number from the biggest

85 RE-CAP dling_data/collecting_recording/revision/8/

86 Stem and Leaf Diagrams
The stem and leaf diagram below shows the masses in kg of some people in a lift.
[Stem and leaf diagram: stems 3 to 8 are tens, leaves are units]
(a) How many people were weighed?
(b) What is the range of the masses?
(c) Find the median mass.
Answers: (a) 16 people. (b) 86 – 31 = 55 kg. (c) The median is the mean of the 8th and 9th data values: 56 kg.

87 Stem and Leaf Put these 25 ages in a Stem and Leaf Diagram:
18, 25, 32, 19, 20, 32, 56, 41, 29, 30, 31, 38, 24, 21, 19, 43, 39, 34, 27, 20, 43, 50, 23, 42, 44 Ext: Try and find the median, mode and modal group.

88 Solution
1 | 8, 9, 9
2 | 0, 0, 1, 3, 4, 5, 7, 9
3 | 0, 1, 2, 2, 4, 8, 9
4 | 1, 2, 3, 3, 4
5 | 0, 6
KEY 3|1 = 31
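The steps for drawing a stem and leaf diagram (tens as stems, units as leaves, leaves in order, add a key) can be sketched in code for the 25 ages. The function name is an assumption, and the sketch only handles whole numbers where the stem is the tens digit.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group sorted whole numbers by tens digit (stem) with units as leaves."""
    rows = defaultdict(list)
    for value in sorted(values):          # sorting first keeps leaves in order
        rows[value // 10].append(value % 10)
    return dict(rows)

ages = [18, 25, 32, 19, 20, 32, 56, 41, 29, 30, 31, 38, 24,
        21, 19, 43, 39, 34, 27, 20, 43, 50, 23, 42, 44]

diagram = stem_and_leaf(ages)
for stem in sorted(diagram):
    leaves = ", ".join(str(leaf) for leaf in diagram[stem])
    print(f"{stem} | {leaves}")
print("KEY 3|1 = 31")
```

The longest row gives the modal group, which here is the row with stem 2.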

89

90 “Half of schools are below average”
OFSTED Presumably they would rather all schools were in the top 25% ;-)

91 Get data on fires in San Francisco for the last ten years.
Correlate the number of fire engines at each fire and the damages in dollars at each fire. There is a strong positive correlation. Conclusion: fire engines cause the damage.

92 Twitter These are the number of positive and negative tweets per hour for people connected to Twitter. Look carefully at the scales….

93 80% of all statistics quoted to prove a point are made up on the spot.
Did you know that % of all statistics claim a precision of results that is not justified by the method employed? According to a recent survey, 33 of the people say they participate in surveys.

94 Survival rates in an air crash

95 Kills 99% of all known germs..
I’m worried about the unknown germs

96 Odds of getting a hole in one: 5,000 to 1
Odds of being an astronaut: 13,200,000 to 1
Odds of winning an Olympic medal: 662,000 to 1
Odds of injury from fireworks: 19,556 to 1
Odds of injury from using a chain saw: 4,464 to 1
Odds of injury from mowing the lawn: 3,623 to 1
Odds of drowning in a bathtub: 685,000 to 1
Odds of being killed on a 5-mile bus trip: 500,000,000 to 1
Odds of being struck by lightning: 576,000 to 1
Odds of being killed by lightning: 2,320,000 to 1
Odds of being the victim of serious crime in your lifetime: 20 to 1
Odds of dating a supermodel: 88,000 to 1
Odds of being on plane with a drunken pilot: 117 to 1
Odds of dating a millionaire: 215 to 1
Odds of becoming a pro athlete: 22,000 to 1
Odds of finding a four-leaf clover on first try: 10,000 to 1
Odds of striking it rich on Antiques Roadshow: 60,000 to 1

97

98 Dangerous Killer Animals Estimated Deaths per year
Bear 10
Shark 100
Jellyfish 100
Hippo 150
Elephant 400
Croc 700
Big Cats 800
Scorpion
Snakes
Mosquito

99

100

101 Which airline would you choose?

102

103 Pictograms: Poor or Perfect?

104 Standard Deviation

105 WARM UP…. Ten students submitted their LCVP portfolios, which were marked out of 40. The marks they obtained were 37, 34, 34, 34, 29, 27, 27, 10, 4, 28.
(a) For these marks find (i) the mode (ii) the median (iii) the mean.
(b) Comment on your results.
(c) An external moderator reduced all the marks by 3. Find the mode, median and mean of the moderated results.

106 Variation or Spread of Distributions
Standard Deviation
It tells us what is happening between the minimum and maximum scores.
It tells us how much the scores in the data set vary around the mean.
It is useful when we need to compare groups using the same scale.
Educational Assessment Unit, Department for Curriculum Management

107 Standard Deviation Formula
The standard deviation formula can be represented using Sigma Notation:
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}$
where $\mu$ is the mean and $n$ is the number of values. Notice the standard deviation formula is the square root of the variance.

108 The Standard Deviation measures how far away each number in a set of data is from their mean. For example, start with the lowest score, 72. How far away is 72 from the mean of 81.5? 72 - 81.5 = -9.5

109 Or, start with a higher score, 89. How far away is 89 from the mean of 81.5? 89 - 81.5 = 7.5

110 LET’S CHECK THIS OUT…. Two machines A and B are used to measure the diameter of a washer. 50 measurements of a washer are taken by each machine. If the standard deviations of measurements taken by machine A and B are 0.4 mm and 0.15 mm respectively, which machine gives the more consistent result?

111 Solution Standard deviation of A = 0.4 mm Standard deviation of B = 0.15 mm The smaller the standard deviation, the less widely dispersed the data is. This means that more measurements are closer to the mean. Therefore, the measurements taken by instrument B are more consistent.

112 Hint:
1. Find the mean of the data.
2. Subtract the mean from each value – called the deviation from the mean.
3. Square each deviation from the mean.
4. Find the sum of the squares.
5. Divide the total by the number of items – result is the variance.
6. Take the square root of the variance – result is the standard deviation.
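The steps in the hint translate almost line for line into code. The sketch below follows them, dividing by the number of items as the hint says; the data set is hypothetical and the function name is an assumption.

```python
from math import sqrt

def standard_deviation(data):
    """Standard deviation following the hint, step by step."""
    n = len(data)
    mean = sum(data) / n                        # find the mean
    deviations = [x - mean for x in data]       # deviation from the mean
    squares = [d ** 2 for d in deviations]      # square each deviation
    variance = sum(squares) / n                 # sum of squares / number of items
    return sqrt(variance)                       # square root of the variance

scores = [2, 4, 4, 4, 5, 5, 7, 9]               # hypothetical data, mean 5
sd = standard_deviation(scores)
print(sd)                                       # 2.0
```

For this data the squared deviations add up to 32, so the variance is 32 / 8 = 4 and the standard deviation is 2.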

113 Try this yourself… WORKSHEET…DO ALL QUESTIONS PAIR UP!

114 Question 1 Three siblings are setting off for the One Direction concert tonight in the O2 arena. They are aged 12, 17 and 31. Find their mean age. Find the standard deviation from the mean correct to the nearest whole number. Write down the mean age and the standard deviation from the mean of these three people, five years later.
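The last part of Question 1 rests on a general fact: adding the same number to every value shifts the mean by that number but leaves the standard deviation unchanged. A small check using the standard `statistics` module (where `pstdev` divides by the number of items) could look like this; the variable names are assumptions.

```python
from statistics import mean, pstdev

ages_now = [12, 17, 31]
ages_later = [age + 5 for age in ages_now]   # the same people five years later

print(mean(ages_now), mean(ages_later))      # 20 25
print(pstdev(ages_now), pstdev(ages_later))  # same spread both times
```

Every age moves by the same amount, so each deviation from the mean stays exactly as it was.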

115 QUESTION 2 The set S is the number of goals scored by both teams over five games = {2, 4, 5, 7, 7}. Find: the mode of S, the median of S, the mean of S, the standard deviation from the mean.

116 QUESTION 3 The table shows the number of Easter eggs twenty students received on Easter Sunday last year. Find the mean and the standard deviation from the mean, correct to one decimal place.
No. of Easter Eggs: 1 2 3 4
f(x): 8

117 Result… How do we find the mean from a frequency distribution?
What does the term frequency mean? Let’s draw the table…this will certainly help.

118 QUESTION 4 Twenty students are asked how many minutes they spend on Facebook or Twitter each day. This table shows their replies:
Time (minutes): 0 - 40
Frequency: 2 6 5 3 4
Using the mid-interval values, estimate the mean viewing time. Find the standard deviation to the nearest minute.

