Statistics Statistics deal with collecting, organizing, and interpreting data. A Survey is a method of collecting information. → Surveys use a small.


1

2 Statistics

3 Statistics deal with collecting, organizing, and interpreting data. A Survey is a method of collecting information. → Surveys use a small sample to represent a large population. Populations: the whole group; the group being studied. Sample: part of the population; the group being surveyed.

4 For each survey topic; determine which represents the population and which represents a sample of the population.

5 Making Predictions and Drawing Inferences

6 You can use survey results to predict the actions of a larger group or draw inferences on the entire population. Predictions: A hypothesis made based on survey results or past actions. Inference: A prediction that is made using observations, prior knowledge, and experience. Use proportions to help calculate your predictions and inferences.

7 A survey found that 6 out of 10 students at IMS have an iPod. Predict how many students have iPods if there are 650 students at IMS. iPods/total: 6/10 = x/650, so 10x = 3,900 and x = 390. About 390 students have iPods.
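The proportion in the iPod prediction above (6 out of 10, scaled up to 650 students) can be checked with a short Python sketch. The helper name predict_count is an illustrative assumption and not part of the presentation; it only solves hits/sample = x/population for x.

```python
from fractions import Fraction


def predict_count(sample_hits, sample_size, population_size):
    """Solve sample_hits/sample_size = x/population_size for x.

    predict_count is a hypothetical helper name used only for illustration.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    # Fraction keeps the ratio exact, so no floating-point rounding creeps in.
    return Fraction(sample_hits, sample_size) * population_size


# Slide 7: 6 out of 10 surveyed students have an iPod; the school has 650 students.
ipod_owners = predict_count(6, 10, 650)
print(ipod_owners)  # 390
```

The same call works for any "k out of n in the sample" question, as long as the sample is representative of the population.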

8 A researcher catches 60 fish from different locations in a lake. He then tags the fish and puts them back in the lake. Two weeks later, the researcher catches 40 fish from the same locations. 8 of these 40 fish are tagged. Predict the number of fish in the lake. tagged/total: 8/40 = 60/x, so 8x = 2,400 and x = 300. About 300 fish.

9 A middle school has 1,800 students. A random sample of 80 shows that 24 have cell phones. Predict the number of students in the middle school who have cell phones. phones/total: 24/80 = x/1,800, so 80x = 43,200 and x = 540. About 540 students have cell phones.

10 A tilapia fish hatchery selectively releases fish when the populations have increased beyond a certain target level. In order to estimate the current fish population, workers at the hatchery catch 110 fish and mark them with special paint. Then a little while later, they catch 530 fish, among which 11 are marked. To the nearest whole number, what is the best estimate for the fish population? marked/total: 11/530 = 110/x, so 11x = 58,300 and x = 5,300. About 5,300 fish.

11 In a random sample, 3 of 400 computer chips are found to be defective. Based on the sample, about how many chips out of 100,000 would you expect to be defective? defective/total: 3/400 = x/100,000, so 400x = 300,000 and x = 750. About 750 chips will be defective.

12 Mali is starting her own beehive so that she can have fresh honey straight from the hive. Mali decides to check the current population of bees in the hive by marking 52 bees with special bee-marking paint. Later, Mali collects 190 bees and observes that 26 of them are marked. To the nearest whole number, what is the best estimate for the bee population? marked/total: 26/190 = 52/x, so 26x = 9,880 and x = 380. About 380 bees.

13 For a research project on rodents, 21 chipmunks were tagged and released. Later, researchers counted 100 chipmunks in the area. Of the chipmunks they counted, 14 had tags. To the nearest whole number, what is the best estimate for the chipmunk population? tagged/total: 14/100 = 21/x, so 14x = 2,100 and x = 150. About 150 chipmunks.

14 While studying a gecko population, a group of university scientists marked and released 38 geckos. Later, the group counted a total of 240 geckos, of which 24 were marked. To the nearest whole number, what is the best estimate for the gecko population? marked/total: 24/240 = 38/x, so 24x = 9,120 and x = 380. About 380 geckos.

15 To determine the jackrabbit population in a wildlife preserve, researchers tagged 110 jackrabbits. Later, they counted 200 jackrabbits. Out of the jackrabbits they counted, 22 had tags. To the nearest whole number, what is the best estimate for the jackrabbit population? tagged/total: 22/200 = 110/x, so 22x = 22,000 and x = 1,000. About 1,000 jackrabbits.
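The tag-and-recapture problems above all use the same proportion: the tagged share of the second catch is assumed to match the tagged share of the whole population. A minimal Python sketch of that calculation follows; the function name estimate_population is an illustrative assumption, and the method is the one often called the Lincoln-Petersen estimate in ecology.

```python
def estimate_population(tagged_first, caught_second, tagged_in_second):
    """Solve tagged_in_second/caught_second = tagged_first/x for x.

    estimate_population is a hypothetical helper name used only for illustration.
    """
    if tagged_in_second <= 0:
        raise ValueError("need at least one tagged animal in the second catch")
    # Cross-multiply, then round to the nearest whole animal.
    return round(tagged_first * caught_second / tagged_in_second)


# Slide 15: 110 jackrabbits tagged; later 200 counted, 22 of them tagged.
jackrabbits = estimate_population(110, 200, 22)
print(jackrabbits)  # 1000
```

The estimate is only as good as the assumption behind it: tagged animals must mix back into the population and be just as likely to be caught as untagged ones.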

16 Sampling

17 Biased Sample: A sample that doesn’t truly represent the population. Example: Surveying 6th graders about the height of IMS students. Random Sample: A sample where every member of the population has an equal chance of being picked. Example: Surveying students whose locker numbers end in 2 about the height of IMS students.

18 Practice Problems Tell if each sample is biased or random. Explain your answer.

19 An airline surveys passengers from a flight that is on time to determine if passengers on all flights are satisfied. Biased If they are on-time, they are likely satisfied with their experience.

20 A newspaper randomly chooses 100 names from its subscriber database and then surveys those subscribers to find if they read the restaurant reviews. Random The names were randomly chosen in such a way that everyone in the population has an equal chance of being picked

21 The manager of a bookstore sends a survey to 150 customers who were randomly selected from a customer list. Random The customers were randomly chosen so everyone in the population has an equal chance of being picked.

22 A team of researchers surveys 200 people at a multiplex movie theater to find out how much money state residents spend on entertainment. Biased People who go to the movies likely spend more money on entertainment than randomly selected people.

23 Types of Random Sampling

24 Simple Random Sample: An unbiased sample where each item or person in the population is as likely to be chosen as any other. Example: Each student’s name is on a piece of paper in a bowl; names are picked without looking. Systematic Random Sample: A sample where the items or people are selected according to a specific time or item interval. Example: Every 20th person is chosen from an alphabetical list of all students attending IMS.

25 Stratified Random Sample: A sample where the population is divided into groups, and then a certain number of members is chosen at random from each group. Example: An alphabetical list of all students at IMS is divided into boys and girls; then every 20th person is sampled from each list.
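The three random sampling methods can be sketched in Python with the standard random module. The roster of 200 student names, the group labels, and the function names below are all hypothetical; the stratified version draws at random within each group, matching the definition rather than the every-20th-person example.

```python
import random


def simple_random_sample(population, k, rng):
    """Every member is equally likely to be chosen (names drawn from a bowl)."""
    return rng.sample(population, k)


def systematic_sample(population, step, rng):
    """Pick a random starting point, then take every step-th member."""
    start = rng.randrange(step)
    return population[start::step]


def stratified_sample(groups, k_per_group, rng):
    """Split into groups first, then sample at random inside each group."""
    return {name: rng.sample(members, k_per_group) for name, members in groups.items()}


rng = random.Random(0)  # fixed seed so the sketch is repeatable
students = [f"student_{i:03d}" for i in range(200)]  # hypothetical roster
groups = {"group_a": students[:100], "group_b": students[100:]}  # hypothetical strata

simple = simple_random_sample(students, 10, rng)
systematic = systematic_sample(students, 20, rng)
stratified = stratified_sample(groups, 5, rng)
print(len(simple), len(systematic), {g: len(s) for g, s in stratified.items()})
```

Passing the random generator in as a parameter keeps the three functions easy to test and makes the source of randomness explicit.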

26 Types of Biased Sampling

27 Convenience Sample: A biased sample which consists of members of a population that are easily accessed. Example: Only surveying one math class about IMS students’ favorite letter day. Voluntary Sample: A biased sample which involves only those who want to participate in the sampling. Example: Students at IMS who wish to participate in the survey can fill it out online.

28 Try the Following Use your knowledge of types of Random and Biased Sampling methods to solve the following problems.

29 To find how much money the average American family spends to cool their home, 100 Alaskan families are surveyed at random. Of the families, 85 said that they spend less than $75 per month on cooling. The researcher concluded that the average American spends less than $75 on cooling per month. Is this conclusion valid? Explain. The conclusion is not valid. This is a biased convenience sample, since families elsewhere in the United States would likely spend much more on cooling than those in Alaska.

30 Zach is trying to decide which golf course is the best of three golf courses. He randomly surveyed people at a sports store and recorded the results in the table. Which type of sampling method did Zach use? Suppose Zach surveyed 150 more people. How many people would be expected to vote for Rolling green? A simple Random Sample 42 more people

31 Adults in every 100 th household in the phonebook are surveyed about which candidate they plan to vote for. Which type of sampling method is being described? Systematic Random Sample

32 Use the organizer to determine whether the conclusion is valid.

33 A computer program selects telephone numbers at random for a survey on which candidate people plan to vote for. Which type of sampling method is being described? Simple Random Sample

34 The researchers send a mail survey to apple farmers asking them to please record the number of their trees that are infected and send the survey back. Which type of sampling method is being described? Biased – Voluntary Response Sample

35 To determine what people in California think about a proposed law, 5,000 people from the state are randomly surveyed. Of the people surveyed, 5.8% are against the law. The legislature concludes that the law should not be passed. Which type of sampling method is being described? Is this a valid conclusion? Yes it is valid. A Simple Random Sample was used.

36 Types of Graphs First, what is a graph?

37 Types of Graphs ► Pictographs ► Histograms ► Bar Graphs ► Double Bar Graphs ► Line Graphs ► Double Line Graphs ► Circle (Pie) Graphs ► Line Plots ► Stem-and-Leaf Plots ► Box-and-Whisker Plots

38 Pictographs ► Use pictures. ► What does this graph represent? ► How many students play hockey? ◦ 20 ► How many more students played soccer than hockey? ◦ 40

39 Histograms ► Show how often something occurs in equal intervals. ► This histogram shows: ◦ The distance of long jumps at a track meet ► What range occurred the most? Least? ◦ 5’7” – 6’, 6’7” – 7’ ► How many long jumps were from 5’1” to 6’? ◦ 25 long jumps ► How many more students jumped 5’7” – 6’ than 5’ – 5’6”? ◦ 5 students

40 Bar Graphs ► Use bars of different lengths to display and compare data in specific categories. ► This bar graph shows: ◦ The amount of money raised in a charity walk by each of the grades. ► Which grade raised twice as much money as 8th grade? ◦ 10th grade ► How much more money did 7th grade raise than 8th grade? ◦ $30

41 Double Bar Graphs ► Use pairs of bars to compare two sets of categorical data ► This graph compares: ◦ Number of sports and history books in 3 different school libraries ► Which school has the greatest difference between sports and history books? ◦ Oak ► Does the Maple School have more sports or history books? How many? ◦ History books, 11

42 Line Graphs ► Show a change in data over time. ► What data does this line graph present? ◦ Number of rainy days from May to December ► Between which 2 months was there the greatest increase in the number of rainy days? ◦ August & September

43 Double Line Graphs ► Use two lines to compare two sets of data over time ► What is this double line graph comparing? ◦ Temperatures for the first ten days of winter for two different years ► On what day were the temperatures the closest? ◦ Day 6 ► On what day were the temperatures the farthest apart? ◦ Day 10

44 Circle Graphs ► Compare parts of a whole. Each sector, or slice, is one part of the entire data set. ► This graph compares: ◦ The results of Leo’s survey on pet ownership ► How many people do not own pets? ◦ 15 (50% of 30) ► How many people have cats? ◦ 6 people (20% of 30)

45 Line Plots ► A graph that uses x’s and a number line to show the frequency of data. Plot title: Number of miles The Lorax ran per day during training ► How many days did The Lorax train? ◦ 18 days ► Which number of miles did he run most often? Least often? ◦ Most often: 5 miles; least often: 2, 8, and 16 miles ► How often did The Lorax run 6 miles? ◦ 3 days ► What is the range of the miles? ◦ 14 ► What is the median number of miles? ◦ 5 miles

46 Stem-and-Leaf Plots ► A graph that uses the digits of each number to organize and display data ► A stem represents the left-hand digit of the data value ► A leaf represents the remaining right-hand digits ► What’s the greatest amount of time spent doing homework? ◦ 64 minutes ► How many students were surveyed? ◦ 18 students ► How many students studied for 32 min? ◦ 2 students ► How many studied for more than 43 min? ◦ 7 students

47 Box-and-Whisker Plots ► Use a number line to show the distribution of a data set and its measures of variation. Also useful for large sets of data. ► These plots are divided into four parts called quartiles ► The median of the entire data set is the middle value ► The lower quartile is the median of the lower half of the data set ► The upper quartile is the median of the upper half of the data set ► The range is the difference between the highest and lowest data points ► The interquartile range is the difference between the upper and lower quartiles

48 Measures of Central Tendency

49 Measures of central tendency show what the middle of a data set looks like. The measures of central tendency are the mean, median, and mode. The Range is NOT a measure of central tendency

50 Find the mean, median, mode, and range of the following data set: The ages of Mrs. Long’s grandchildren: 8, 3, 5, 4, 2, 3, 1, and 4.

51 Mean is the average: (8 + 3 + 5 + 4 + 2 + 3 + 1 + 4)/8 = 30/8 = 3.75. The mean is 3.75.

52 Range max minus min Or largest minus smallest. List in order: 1, 2, 3, 3, 4, 4, 5, 8 8 – 1 = 7 The range is 7

53 Mode the number that occurs most often. There can be several modes or no mode List in order: 1, 2, 3, 3, 4, 4, 5, 8 The mode here is 3 and 4

54 Median is the middle data value when in order. The middle two numbers are 3 and 4 List in order: 1, 2, 3, 3, 4, 4, 5, 8 The median is 3.5
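The four results for the grandchildren's ages can be reproduced with Python's standard statistics module. This is only a checking sketch; the variable names are illustrative, and multimode is used because the data set has two modes.

```python
from statistics import mean, median, multimode

# Ages of Mrs. Long's grandchildren, as listed on slide 50.
ages = [8, 3, 5, 4, 2, 3, 1, 4]

age_mean = mean(ages)              # 30 / 8
age_median = median(ages)          # average of the two middle values, 3 and 4
age_modes = multimode(ages)        # every value tied for most frequent
age_range = max(ages) - min(ages)  # largest minus smallest

print(age_mean)    # 3.75
print(age_median)  # 3.5
print(age_modes)   # [3, 4]
print(age_range)   # 7
```

Note that median sorts the data internally, so the list does not have to be put in order first.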

55 Often one measure of Central Tendency is more appropriate for describing a data set. Think about what each measure tells you about the data.

56 Find the median, mode, mean and range of each data set. Determine the measure of Central Tendency that best describes the data set.

57 6, 5, 3, 6, 8 List in order: 3, 5, 6, 6, 8 Median: 6 Mean: 28/5 = 5.6 Mode: 6 Range: 5 Best measure of center: 6 (median & mode)

58 7, 6, 13, 16, 15, 9 List in order: 6, 7, 9, 13, 15, 16 Median: (9 + 13)/2 = 22/2 = 11 Mean: 66/6 = 11 Mode: none Range: 10 Best measure of center: 11 (median & mean)

59 12, 15, 17, 9, 17 List in order: 9, 12, 15, 17, 17 Median: 15 Mean: 70/5 = 14 Best measure of center: 15 (possibly 14) (median and possibly mean) Mode: 17 Range: 8

60 51, 62, 68, 55, 68, 62 List in order: 51, 55, 62, 62, 68, 68 Median: 62 Mean: 366/6 = 61 Best measure of center: 62 (median, mean & mode) Mode: 62 & 68 Range: 17

61 List in order: 36, 41, 42, 44, 47 Median: 42 Mean: 210/5 = 42 Mode: none Range: 11 Best measure of center: 42 (median or mean)

62 An outlier is an extreme value – either much less than the lowest value or much greater than the highest value.

63 Use the data set to answer the questions below: 4, 6, 3, 6, 25, 3, 2. List in order: 2, 3, 3, 4, 6, 6, 25. Is there an outlier? If so, what is it? Yes, 25. How does the outlier affect the mean and median? With outlier: Median = 4; Mean = 49/7 = 7. Without outlier: Median = 3.5; Mean = 24/6 = 4. Which measure of central tendency is most affected by an outlier in a data set? The mean! Which measure of central tendency best describes the data? Explain. The median, because it is not dramatically affected by outliers.
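The effect of the outlier on each measure can be verified directly. The sketch below, with illustrative variable names, computes the mean and median of the slide's data set with and without the value 25.

```python
from statistics import mean, median

data = [4, 6, 3, 6, 25, 3, 2]
without_outlier = [x for x in data if x != 25]  # drop the extreme value

# With the outlier: mean is 49/7 = 7, median is 4.
mean_with, median_with = mean(data), median(data)

# Without the outlier: mean is 24/6 = 4, median is 3.5.
mean_without, median_without = mean(without_outlier), median(without_outlier)

# The mean moves by 3, the median by only 0.5.
print(mean_with - mean_without, median_with - median_without)
```

Removing one value shifts the mean six times as far as the median here, which is why the median is the better description of this data set.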

64

65 What does misleading mean? To lead in the wrong direction. To manipulate statistics without lying. Misleading = Dishonesty To intentionally deceive someone.

66 Mrs. Long’s Salaries

67

68 What is the difference between the two graphs? Do these two graphs appear to show the same information? Why do you think someone would want to present the same information in different ways?

69

70

71

72 Key: each picture = 5 pets. Which pet is most popular?

73 What is misleading about this bar graph?

74

75 What eye color is the most frequent?

76

77

78 Why would someone want to mislead you? To make it appear that they are correct. Change the way the data is interpreted To persuade someone To influence an opinion

79 Ways to Manipulate Statistics Change the values on the x- or y-axis. Do not start the graph at zero. Use different bar widths on a bar graph. Change the way you conduct your survey. ◦ Example: Survey only 6th graders when you are collecting data on the height of middle school students at IMS. ◦ Surveys should be random.
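To see why a graph that does not start at zero can mislead, compare how tall two bars are drawn relative to each other. The sketch below uses made-up values (40 and 44) and a hypothetical helper name, apparent_ratio; none of these come from the slides.

```python
def apparent_ratio(smaller, larger, axis_start=0.0):
    """Ratio of drawn bar heights when the y-axis begins at axis_start.

    apparent_ratio is a hypothetical helper; the inputs below are invented.
    """
    if axis_start >= smaller:
        raise ValueError("axis_start must be below both values")
    # A bar's drawn height is its value minus where the axis begins.
    return (larger - axis_start) / (smaller - axis_start)


honest = apparent_ratio(40, 44)                    # axis starts at 0
truncated = apparent_ratio(40, 44, axis_start=36)  # axis starts at 36

print(round(honest, 2))     # 1.1
print(round(truncated, 2))  # 2.0
```

With the axis starting at 36, a 10% difference is drawn as if one bar were twice the height of the other.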

80 Try the examples in your notes.

81 Graphs let readers analyze data easily, but are sometimes made to influence conclusions by misrepresenting the data.

82 Explain how the graphs differ. ◦ Which graph appears to show a sharper increase in price? Graph B ◦ Which graph might the Student Council use to show that while ticket prices have risen, the increase is not significant? Why? They might use Graph A. The y-axis scale makes the increase appear less significant.

83  The line graphs show monthly profits of a company from October to March. Which graph suggests that the business is extremely profitable? Is this a valid conclusion? Explain. Although both graphs show a profit, Graph A’s profit increase is exaggerated due to the y-axis scale beginning with $500 intervals and changing to $100 intervals

84 Statistics can also be used to influence conclusions. An amusement park boasts that the average height of their roller coasters is 170 feet. Explain how this might be misleading. ◦ Mean: 850/5 = 170 ◦ Median: ◦ Mode: No mode The mean has been affected by the outlier of 365; therefore, using the average to describe this data set is misleading.

85 How is this graph misleading? The y-axis scale does not have equal spacing. How could you redraw the graph so it would not be misleading? Draw the y-axis scale starting at 0 with equal spacing, so that the distance between 0 and 18,000 equals the distance between 18,000 and 36,000.

86 How is this graph misleading? The y-axis scale has a break, so the differences in jump distances appear greater. How could you redraw the graph so it would not be misleading? Draw the y-axis scale starting at 0 and continuing to 7.5 using equal spacing.

87 [Graph: Taxicab Fares; x-axis: Mon, Tue, Wed, Thu, Fri; y-axis: Fare ($)] How is this graph misleading? The y-axis scale has a break, so the differences in fare appear greater. How could you redraw the graph so it would not be misleading? Draw the y-axis scale starting at 0 and continuing to 7.5 using equal spacing.

88 [Graph: Water Consumed; x-axis: Mark, Frank, Mila, Yvonne; y-axis: Ounces of Water] How is this graph misleading? The y-axis scale does not start at zero, so the differences in water consumed seem greater. How could you redraw the graph so it would not be misleading? Draw the y-axis scale starting at 0 and continuing to 48 using equal spacing.

89 Mean Absolute Deviation

90 Mean Absolute Deviation: The average amount each number is away from the mean of a data set. Step 1. Find the mean. Step 2. Find the absolute value of the difference between each data value and the mean. Step 3. Find the average of those differences.

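91 1. Find the mean: 52 + 48 + 60 + 55 + 59 + 54 + 58 + 62 = 448; 448/8 = 56. 2. Find the absolute differences between the mean and the data points: |56 – 52| = 4, |56 – 48| = 8, |56 – 60| = 4, |56 – 55| = 1, |56 – 59| = 3, |56 – 54| = 2, |56 – 58| = 2, |56 – 62| = 6. 3. Find the mean of the differences: 4 + 8 + 4 + 1 + 3 + 2 + 2 + 6 = 30; 30/8 = 3.75.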
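The three steps can be written as a short Python function and checked against the worked example, whose eight data values (52, 48, 60, 55, 59, 54, 58, 62) appear in the differences above. The function name mean_absolute_deviation is an illustrative choice, not something defined in the presentation.

```python
def mean_absolute_deviation(values):
    """Average distance of each value from the mean of the data set."""
    if not values:
        raise ValueError("values must not be empty")
    # Step 1: find the mean.
    center = sum(values) / len(values)
    # Step 2: absolute value of each difference from the mean.
    distances = [abs(v - center) for v in values]
    # Step 3: average those differences.
    return sum(distances) / len(distances)


data = [52, 48, 60, 55, 59, 54, 58, 62]
mad = mean_absolute_deviation(data)
print(mad)  # 3.75
```

A small result means the values cluster tightly around the mean; a large one means they are spread out.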

92 Try one on your own

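93 1. Find the mean: 58 + 88 + 40 + 60 + 72 + 66 + 80 + 48 = 512; 512/8 = 64. 2. Find the absolute differences between the mean and the data points: |64 – 58| = 6, |64 – 88| = 24, |64 – 40| = 24, |64 – 60| = 4, |64 – 72| = 8, |64 – 66| = 2, |64 – 80| = 16, |64 – 48| = 16. 3. Find the mean of the differences: 6 + 24 + 24 + 4 + 8 + 2 + 16 + 16 = 100; 100/8 = 12.5.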

94 The top five salaries and bottom five salaries for the 2010 New York Yankees are shown in the table below. Salaries are in millions of dollars and are rounded to the nearest hundredth. Mean absolute deviation: top five = $4.19 million; bottom five = $0.1 million.

95 The table shows the running time in minutes for two kinds of movies. Find the mean absolute deviation for each set of data. Round to the nearest hundredth. Then write a few sentences comparing their variation.

96 Find the mean absolute deviation. Round to the nearest hundredth if necessary. Then describe what the mean absolute deviation represents. The MAD is large. On average, each data value is about 18 daily visitors away from the mean.

97 Find the mean absolute deviation. Round to the nearest hundredth if necessary. Then describe what the mean absolute deviation represents. $3.00/6 = $0.50 The MAD is small. On average, each admission price differs from the mean by only $0.50.

98 The table shows the height of waterslides at two different water parks. Find the mean absolute deviation for each set of data. Round to the nearest hundredth. Then write a few sentences comparing their variation.

99 The water slides at Splash Lagoon are closer together in terms of height. There is less variability in the height at Splash Lagoon when compared to Wild Water Bay

100 Box and Whisker Plots

101 A box-and-whisker plot uses a number line to show the distribution of a data set. To make a box-and-whisker plot, first divide the data into four equal parts using quartiles. The median, or 2nd quartile, divides the data into a lower half and an upper half.

102 The median of the lower half is the lower quartile, and the median of the upper half is the upper quartile.

103 Example: Use the data to make a box-and-whisker plot. Order the data from least to greatest. Calculate/determine the following: Median: Lower Quartile: Upper Quartile: Lowest Value: Greatest Value:

104 Draw a box from the lower to the upper quartile. Inside the box, draw a vertical line through the median. Then draw the “whiskers” from the box to the least and greatest values. Be sure to title and label your graph. Title Label

105 22, 24, 27, 31, 35, 38, 42 Median: 31 Lower Quartile: 24 Upper Quartile: 38 Lowest Value: 22 Greatest Value: 42

106 Measures of Variability

107 Measures of Variability describe how spread out a group of data is. Measures of variability include the range and the interquartile range. Interquartile range (IQR): The difference between the upper quartile and the lower quartile.

108 22, 24, 27, 31, 35, 38, 42 What is the range for the above data set? 42 – 22 = 20 What is the interquartile range for the above data set? 38 – 24 = 14 Measures of Variation are range, interquartile range, upper quartile, and lower quartile.
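The quartile rule used in these slides (the lower and upper quartiles are the medians of the lower and upper halves, leaving out the middle value when the count is odd) can be sketched in Python. The function name five_number_summary is an illustrative assumption. Library routines such as statistics.quantiles use a different default method and may give slightly different quartiles.

```python
from statistics import median


def five_number_summary(values):
    """Min, lower quartile, median, upper quartile, max (median-of-halves rule)."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    lower_half = data[:half]
    # Skip the middle value itself when the number of values is odd.
    upper_half = data[half + len(data) % 2:]
    return {
        "min": data[0],
        "q1": median(lower_half),
        "median": median(data),
        "q3": median(upper_half),
        "max": data[-1],
    }


summary = five_number_summary([22, 24, 27, 31, 35, 38, 42])
data_range = summary["max"] - summary["min"]
iqr = summary["q3"] - summary["q1"]
print(summary["q1"], summary["median"], summary["q3"])  # 24 31 38
print(data_range, iqr)  # 20 14
```

The five values returned are exactly the points needed to draw the box and whiskers.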

109 Practice

110 16, 19, 19, 23, 24, 25, 31, 37, 42, 46, Median: 25 Lower Quartile: 19 Upper Quartile: 42 Lowest Value: 16 Greatest Value: Range: IQR: 42 – 19 = 23

111 Halle’s basketball team scored the following points in their past games: 38, 42, 26, 32, 40, 28, 36, 27, 29, 30. In order: 26, 27, 28, 29, 30, 32, 36, 38, 40, 42. [Box plot: Points Scored in a Game; axis: Points] Median: 31 Lower Quartile: 28 Upper Quartile: 38 Lowest Value: 26 Greatest Value: 42 Range: 42 – 26 = 16 IQR: 38 – 28 = 10

112 Kyle is helping your school librarian conduct a survey of how many books students read during the year. He gets the following results: 12, 24, 10, 12, 4, 35, 10, 8, 12, 15, 20, 18, 25, 21, and 9. In order: 4, 8, 9, 10, 10, 12, 12, 12, 15, 18, 20, 21, 24, 25, 35. [Box plot: Books Students Read in a Year; axis: Number of Books] Median: 12 Lower Quartile: 10 Upper Quartile: 21 Lowest Value: 4 Greatest Value: 35 Range: 35 – 4 = 31 IQR: 21 – 10 = 11

113 Describe the center, shape, spread, and outliers of the distribution. The typical student reads about 12 books. There is a slight right skew. The IQR is 11, so there is a lot of variability in the number of books read.

114 Ms. Carpenter asked each of her students to record how much time it takes them to get from school to home this afternoon. The next day, students came back with this data, in minutes: 15, 12, 5, 55, 6, 9, 47, 8, 35, 3, 22, 26, 46, 54, 17, 42, 43, 42, 15, 5. In order: 3, 5, 5, 6, 8, 9, 12, 15, 15, 17, 22, 26, 35, 42, 42, 43, 46, 47, 54, 55. [Box plot: Time to Get Home from School; axis: Minutes] Median: 19.5 Lower Quartile: 8.5 Upper Quartile: 42.5 Lowest Value: 3 Greatest Value: 55 Range: 55 – 3 = 52 IQR: 42.5 – 8.5 = 34

115 Create a box-and-whisker plot for the data: 24, 20, 18, 25, 22, 32, 30, 29, 35, 30, 28, 24, 38. In order: 18, 20, 22, 24, 24, 25, 28, 29, 30, 30, 32, 35, 38. Range: 38 – 18 = 20 3rd Quartile: 31

116 Comparing Populations (Skipper is Kewl :D)

117 A double box plot consists of two box plots graphed on the same number line. → You can draw inferences about the two populations in the double box plot by comparing their centers and variations.

118 Ian surveyed a different group of students in his science and math classes. The double box plot shows the results for both classes. Compare their centers and variations. Write an inference you can draw about the two populations. What does the plot show? The number of times each class posted a blog this month. Is either plot symmetric? No. Which measure of center should you use to compare the data? Median (Math: 10; Science: 20). Which measure of variation should you use to compare the data? IQR (Math: 15; Science: 10). Which class posts more blogs? Science. Which class has a greater spread of data around the median? Math. Use the comparisons to write an inference: The science class posted more blogs than the math class. The median for science is twice the median for math. There is also a greater spread of data around the median for the math class than for the science class.

119 The double dot plot shows the daily high temperatures for two cities for thirteen days. Compare the centers and variations for the two populations. Write an inference you can draw about the two populations. Is either plot symmetric? No Which measure of center should you use? Mean (Springfield: 81; Lake City: 84) Which measure of variation should you use? MAD (Springfield: 1.4; Lake City: 1.4) Use the comparisons to write an inference: Both cities have the same variation or spread around their means. Lake City has a greater mean temperature than Springfield.

120 Reading Box-and-Whisker Plots

121 The students at Dolan Middle School are competing in after-school activities in which they earn points for helping out around the school. Each team consists of the 30 students in a homeroom. Halfway through the competition, here are the scores from the students in two of the teams. The champion team is the one with the most points when the scores of the 30 students on the team are added. Which team would you rather be on? Explain. Is either plot symmetric? No. Which measure of center should you use? Median (Team 1: 100; Team 2: ≈115). Which measure of variation should you use? IQR (Team 1: ≈55; Team 2: ≈105). Use the comparisons to write an inference: Team 1 is more consistent and has fewer low scores. Although Team 2 has a slightly higher median, Team 1 is the better choice.

122 Which group has a larger interquartile range? Basketball Which group of players has more predictability in their height? Baseball: the range and IQR are smaller, and the plot is also symmetric.

123 Which shoe store has a greater median? Sage’s Which shoe store has a greater interquartile range? Maroon’s Which shoe store appears to be more predictable in the number of shoes sold per week? Sage’s – the range and IQR are smaller

124 1. The table below shows the golf scores for two people. Make two box-and-whisker plots of the data on the same number line. Which golfer has the lower median score? Henry Which golfer has the lesser interquartile range of scores? Trish Which golfer appears to be more consistent? Trish – her range and IQR are smaller

125 Now that was easy!

