2 Statistics deals with collecting, organizing, and interpreting data.
A survey is a method of collecting information. Surveys use a small sample to represent a large population.
Population: the whole group; the group being studied.
Sample: part of the population; the group being surveyed.
3 For each survey topic, determine which represents the population and which represents a sample of the population.
5 You can use survey results to predict the actions of a larger group or draw inferences about the entire population.
Prediction: a hypothesis based on survey results or past actions.
Inference: a prediction made using observations, prior knowledge, and experience.
Use proportions to help calculate your predictions and inferences.
6 A survey found that 6 out of 10 students at IMS have an iPod. Predict how many students have iPods if there are 650 students at IMS.
$\frac{\text{iPods}}{\text{total}}$: $\frac{6}{10} = \frac{x}{650}$, so $10x = 3900$ and $x = 390$.
About 390 students have iPods.
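The proportion set up on this slide can be checked with a few lines of Python. This is only a sketch; the function name `predict_from_sample` is made up for illustration and does not come from the slides.

```python
def predict_from_sample(sample_hits, sample_size, population_size):
    """Solve sample_hits / sample_size = x / population_size for x."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return sample_hits * population_size / sample_size

# 6 out of 10 surveyed students have an iPod; the school has 650 students.
ipods = predict_from_sample(6, 10, 650)
print(f"About {ipods:.0f} students have iPods")
```

The same helper applies to any slide where the sample ratio is scaled up to a known population size.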
7 A researcher catches 60 fish from different locations in a lake. He then tags the fish and puts them back in the lake. Two weeks later, the researcher catches 40 fish from the same locations; 8 of these 40 fish are tagged. Predict the number of fish in the lake.
$\frac{\text{tagged}}{\text{total}}$: $\frac{60}{x} = \frac{8}{40}$, so $x = 300$. About 300 fish.
8 A middle school has 1,800 students. A random sample of 80 shows that 24 have cell phones. Predict the number of students in the middle school who have cell phones.
$\frac{\text{phones}}{\text{total}}$: $\frac{x}{1800} = \frac{24}{80}$, so $x = 540$. About 540 students have cell phones.
9 A tilapia fish hatchery selectively releases fish when the populations have increased beyond a certain target level. To estimate the current fish population, workers at the hatchery catch 110 fish and mark them with special paint. A little while later, they catch 530 fish, among which 11 are marked. To the nearest whole number, what is the best estimate for the fish population?
$\frac{\text{marked}}{\text{total}}$: $\frac{110}{x} = \frac{11}{530}$, so $x = 5300$. About 5,300 fish.
10 In a random sample, 3 of 400 computer chips are found to be defective. Based on the sample, about how many chips out of 100,000 would you expect to be defective?
$\frac{\text{defective}}{\text{total}}$: $\frac{3}{400} = \frac{x}{100{,}000}$, so $x = 750$. About 750 chips will be defective.
11 Mali is starting her own beehive so that she can have fresh honey straight from the hive. Mali decides to check the current population of bees in the hive by marking 52 bees with special bee-marking paint. Later, Mali collects 190 bees and observes that 26 of them are marked. To the nearest whole number, what is the best estimate for the bee population?
$\frac{\text{marked}}{\text{total}}$: $\frac{26}{190} = \frac{52}{x}$, so $x = 380$. About 380 bees.
12 For a research project on rodents, 21 chipmunks were tagged and released. Later, researchers counted 100 chipmunks in the area. Of the chipmunks they counted, 14 had tags. To the nearest whole number, what is the best estimate for the chipmunk population?
$\frac{\text{tagged}}{\text{total}}$: $\frac{21}{x} = \frac{14}{100}$, so $x = 150$. About 150 chipmunks.
13 While studying a gecko population, a group of university scientists marked and released 38 geckos. Later, the group counted a total of 240 geckos, of which 24 were marked. To the nearest whole number, what is the best estimate for the gecko population?
$\frac{\text{marked}}{\text{total}}$: $\frac{38}{x} = \frac{24}{240}$, so $x = 380$. About 380 geckos.
14 To determine the jackrabbit population in a wildlife preserve, researchers tagged 110 jackrabbits. Later, they counted 200 jackrabbits. Out of the jackrabbits they counted, 22 had tags. To the nearest whole number, what is the best estimate for the jackrabbit population?
$\frac{\text{tagged}}{\text{total}}$: $\frac{110}{x} = \frac{22}{200}$, so $x = 1000$. About 1,000 jackrabbits.
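The tag-and-recount problems above all solve the same proportion, marked / total = recaptured marked / second catch. A minimal sketch of that calculation follows; the function and parameter names are illustrative, not from the slides.

```python
def estimate_population(first_marked, second_caught, second_marked):
    """Solve first_marked / x = second_marked / second_caught for x."""
    if second_marked <= 0:
        raise ValueError("need at least one marked animal in the second catch")
    return round(first_marked * second_caught / second_marked)

# (marked and released, counted later, marked among those counted)
problems = {
    "lake fish": (60, 40, 8),
    "tilapia": (110, 530, 11),
    "bees": (52, 190, 26),
    "chipmunks": (21, 100, 14),
    "geckos": (38, 240, 24),
    "jackrabbits": (110, 200, 22),
}
for name, counts in problems.items():
    print(name, estimate_population(*counts))
```

The estimate assumes the marked animals mix evenly back into the population before the second count.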
16 Biased Sample: A sample that doesn’t truly represent the population.
Example: Surveying 6th graders about the height of IMS students.
Random Sample: A sample where every member of the population has an equal chance of being picked.
Example: Surveying students whose locker numbers end in 2 about the height of IMS students.
17 Practice Problems
Tell if each sample is biased or random. Explain your answer.
18 An airline surveys passengers from a flight that is on time to determine if passengers on all flights are satisfied.
Biased. Passengers on an on-time flight are more likely to be satisfied with their experience.
19 A newspaper randomly chooses 100 names from its subscriber database and then surveys those subscribers to find out if they read the restaurant reviews.
Random. The names were randomly chosen in such a way that everyone in the population has an equal chance of being picked.
20 The manager of a bookstore sends a survey to 150 customers who were randomly selected from a customer list.
Random. The customers were randomly chosen, so everyone in the population has an equal chance of being picked.
21 A team of researchers surveys 200 people at a multiplex movie theater to find out how much money state residents spend on entertainment.
Biased. People who go to the movies likely spend more money on entertainment than randomly selected people.
23 Simple Random Sample: An unbiased sample where each item or person in the population is as likely to be chosen as any other.
Example: Each student’s name is on a piece of paper in a bowl; names are picked without looking.
Systematic Random Sample: A sample where the items or people are selected according to a specific time or item interval.
Example: Every 20th person is chosen from an alphabetical list of all students attending IMS.
24 Stratified Random Sample: A sample where the population is divided into groups, and then a certain number is chosen at random from each group.
Example: An alphabetical list of all students at IMS is divided into boys and girls. Then students are chosen at random from each of the two lists.
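The three random sampling methods can be sketched with Python's standard `random` module. The roster, group names, and sample sizes below are hypothetical stand-ins, not data from the slides.

```python
import random

def simple_random_sample(population, k, rng):
    """Every member is equally likely to be picked (names drawn from a bowl)."""
    return rng.sample(population, k)

def systematic_sample(population, step, rng):
    """Pick a random starting point, then take every `step`-th member."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(groups, k_per_group, rng):
    """Draw the same number of members at random from each group."""
    return {name: rng.sample(members, k_per_group)
            for name, members in groups.items()}

rng = random.Random(2024)  # fixed seed so the sketch is repeatable
roster = [f"student_{i:03d}" for i in range(200)]  # hypothetical roster
groups = {"group_a": roster[:100], "group_b": roster[100:]}  # hypothetical strata

simple = simple_random_sample(roster, 10, rng)
systematic = systematic_sample(roster, 20, rng)
stratified = stratified_sample(groups, 5, rng)
print(simple, systematic, stratified, sep="\n")
```

Passing the random generator in explicitly keeps the three functions easy to compare and rerun with the same seed.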
26 Convenience Sample: A biased sample that consists of members of a population that are easily accessed.
Example: Surveying only one math class about IMS students’ favorite letter day.
Voluntary Sample: A biased sample that involves only those who want to participate in the sampling.
Example: Students at IMS who wish to participate in the survey can fill it out online.
27 Try the Following
Use your knowledge of the types of random and biased sampling methods to solve the following problems.
28 To find how much money the average American family spends to cool their home, 100 Alaskan families are surveyed at random. Of the families, 85 said that they spend less than $75 per month on cooling. The researcher concluded that the average American spends less than $75 on cooling per month. Is this conclusion valid? Explain.
The conclusion is not valid. This is a biased convenience sample, since families in the rest of the United States would likely spend much more on cooling than those in Alaska.
29 Zach is trying to decide which golf course is the best of three golf courses. He randomly surveyed people at a sports store and recorded the results in the table.
Which type of sampling method did Zach use? A simple random sample.
Suppose Zach surveyed 150 more people. How many people would be expected to vote for Rolling Green? 42 more people.
30 Adults in every 100th household in the phonebook are surveyed about which candidate they plan to vote for. Which type of sampling method is being described?
Systematic random sample.
31 Use the organizer to determine whether the conclusion is valid.
32 A computer program selects telephone numbers at random for a survey on which candidate people plan to vote for. Which type of sampling method is being described?
Simple random sample.
33 The researchers send a mail survey to apple farmers asking them to please record the number of their trees that are infected and send the survey back. Which type of sampling method is being described?
Biased – voluntary response sample.
34 To determine what people in California think about a proposed law, 5,000 people from the state are randomly surveyed. Of the people surveyed, 5.8% are against the law. The legislature concludes that the law should not be passed. Which type of sampling method is being described? Is this a valid conclusion?
Yes, it is valid. A simple random sample was used.
36 Types of Graphs: Circle (Pie) Graphs, Line Plots, Stem-and-Leaf Plots, Box-and-Whisker Plots, Pictographs, Histograms, Bar Graphs, Double Bar Graphs, Line Graphs, Double Line Graphs
37 Pictographs
Use pictures to show data.
What does this graph represent?
How many students play hockey? 20
How many more students played soccer than hockey? 40
38 Histograms
Show how often something occurs in equal intervals.
This histogram shows: the distances of long jumps at a track meet.
What range occurred the most? Least? Most: 5’7” – 6’; least: 6’7” – 7’
How many long jumps were from 5’1” to 6’? 25 long jumps
How many more students jumped 5’7” – 6’ than 5’ – 5’6”? 5 students
39 Bar Graphs
Use bars of different lengths to display and compare data in specific categories.
This bar graph shows: the amount of money raised in a charity walk by each of the grades.
Which grade raised twice as much money as 8th? 10th grade
How much more money did 7th grade raise than 8th? $30
40 Double Bar Graphs
Use pairs of bars to compare two sets of categorical data.
This graph compares: the number of sports and history books in 3 different school libraries.
Which school has the greatest difference between sports and history? Oak
Does the Maple School have more sports or history books? How many? History books, 11
41 Line Graphs
Show a change in data over time.
What data does this line graph present? Number of rainy days from May to December
Between which 2 months was there the greatest increase in the number of rainy days? August and September
42 Double Line Graphs
Use two lines to compare two sets of data over time.
What is this double line graph comparing? Temperatures for the first ten days of winter for two different years
On what day were the temperatures the closest? Day 6
On what day were the temperatures the farthest apart? Day 10
43 Circle Graphs
Compare parts of a whole. Each sector, or slice, is one part of the entire data set.
This graph compares: the results of Leo’s survey on pet ownership.
How many people do not own pets? 15 (50% of 30)
How many people have cats? 6 people (20% of 30)
44 Line Plots
A graph that uses x’s and a number line to show the frequency of data.
[Line plot: Number of miles The Lorax ran per day during training]
How many days did The Lorax train? 18 days
Which number of miles did he run most often? Least often? Most often: 5 miles; least often: 2, 8, and 16 miles
How often did The Lorax run 6 mi? 3 days
What is the range of the miles? 14
What is the median number of miles? 5 miles
45 Stem-and-Leaf Plots
A graph that uses the digits of each number to organize and display data.
A stem represents the left-hand digit of the data value.
A leaf represents the remaining right-hand digits.
What’s the greatest amount of time spent doing homework? 64 minutes
How many students were surveyed? 18 students
How many students studied for 32 min? 2 students
How many studied for more than 43 min? 7 students
46 Box-and-Whisker Plots
Use a number line to show the distribution of a data set and measures of variation. Also useful for large sets of data.
These plots are divided into four parts called quartiles.
The median of the entire data set is the middle.
The lower quartile is the median of the lower half of the data set.
The upper quartile is the median of the upper half of the data set.
The range is the difference between the highest and lowest data points.
The interquartile range is the difference between the upper and lower quartiles.
48 Measures of central tendency show what the middle of a data set looks like. The measures of central tendency are the mean, median, and mode.
The range is NOT a measure of central tendency.
49 Find the mean, median, mode, and range of the following data set: The ages of Mrs. Long’s grandchildren: 8, 3, 5, 4, 2, 3, 1, and 4.
50 Mean is the average: the sum of the data values divided by the number of values.
$\frac{8 + 3 + 5 + 4 + 2 + 3 + 1 + 4}{8} = \frac{30}{8} = 3.75$
The mean is 3.75.
51 Range is the max minus the min, or largest minus smallest.
List in order: 1, 2, 3, 3, 4, 4, 5, 8
8 – 1 = 7
The range is 7.
52 Mode is the number that occurs most often. There can be several modes or no mode.
List in order: 1, 2, 3, 3, 4, 4, 5, 8
The modes here are 3 and 4.
53 Median is the middle data value when the data are in order.
List in order: 1, 2, 3, 3, 4, 4, 5, 8
The middle two numbers are 3 and 4, so the median is halfway between them.
The median is 3.5.
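The four measures worked out above for the grandchildren's ages can be reproduced with Python's `statistics` module. This is a sketch: the helper name `summarize` is made up, and reporting an empty list when no value repeats is an assumption that mirrors the "no mode" convention on these slides.

```python
from statistics import mean, median, multimode

def summarize(data):
    """Return the mean, median, mode(s), and range of a list of numbers."""
    modes = sorted(multimode(data))
    # Assumed classroom convention: if no value repeats, report "no mode".
    if len(modes) == len(data):
        modes = []
    return {
        "mean": mean(data),
        "median": median(data),
        "mode": modes,
        "range": max(data) - min(data),
    }

ages = [8, 3, 5, 4, 2, 3, 1, 4]
print(summarize(ages))
```

For the ages data this gives a mean of 3.75, a median of 3.5, modes 3 and 4, and a range of 7, matching the slides.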
54 Often one measure of Central Tendency is more appropriate for describing a data set. Think about what each measure tells you about the data.
55 Find the median, mode, mean, and range of each data set. Determine the measure of central tendency that best describes the data set.
56 6, 5, 3, 6, 8
List in order: 3, 5, 6, 6, 8
Median: 6
Mean: 28/5 = 5.6
Mode: 6
Range: 5
Best measure of center: 6 (median and mode)
57 7, 6, 13, 16, 15, 9
List in order: 6, 7, 9, 13, 15, 16
Median: 9 + 13 = 22; 22/2 = 11
Mean: 66/6 = 11
Range: 10
Mode: none
Best measure of center: 11 (median and mean)
58 12, 15, 17, 9, 17
List in order: 9, 12, 15, 17, 17
Median: 15
Mean: 70/5 = 14
Range: 8
Mode: 17
Best measure of center: 15, possibly 14 (median and possibly mean)
59 51, 62, 68, 55, 68, 62
List in order: 51, 55, 62, 62, 68, 68
Median: 62
Mean: 366/6 = 61
Range: 17
Mode: 62 and 68
Best measure of center: 62 (median, mean, and mode)
60 List in order: 36, 41, 42, 44, 47
Median: 42
Mean: 210/5 = 42
Range: 11
Mode: none
Best measure of center: 42 (median and mean)
61 An outlier is an extreme value: either much less than the lowest of the other values or much greater than the highest of the other values.
62 Use the data set to answer the questions below: 4, 6, 3, 6, 25, 3, 2
List in order: 2, 3, 3, 4, 6, 6, 25
Is there an outlier? If so, what is it? Yes, 25.
How does the outlier affect the mean and median? With the outlier: median = 4; mean = 49/7 = 7. Without the outlier: median = 3.5; mean = 24/6 = 4.
Which measure of central tendency is most affected by an outlier in a data set? The mean!
Which measure of central tendency best describes the data? Explain. The median; it is not dramatically affected by outliers.
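The effect of the outlier can be checked directly by computing both measures with and without the value 25. A short sketch using Python's `statistics` module follows; the variable names are illustrative.

```python
from statistics import mean, median

data = [4, 6, 3, 6, 25, 3, 2]
outlier = 25  # the value the slide identifies as the outlier
trimmed = [x for x in data if x != outlier]

# How far each measure moves when the outlier is included.
mean_shift = mean(data) - mean(trimmed)
median_shift = median(data) - median(trimmed)

print("mean:", mean(trimmed), "->", mean(data))
print("median:", median(trimmed), "->", median(data))
```

The mean moves by 3 while the median moves by only 0.5, which is why the median is the better description of this data set.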
67 Mrs. Long’s Salaries
What is the difference between the two graphs?
Do these two graphs appear to show the same information?
Why do you think someone would want to present the same information in different ways?
77 Why would someone want to mislead you?
To make it appear that they are correct.
To change the way the data is interpreted.
To persuade someone.
To influence an opinion.
78 Ways to Manipulate Statistics
Change the values on the x- or y-axis.
Do not start the graph at zero.
Use different bar widths on a bar graph.
Change the way you conduct your survey.
Example: Survey only 6th graders when you are collecting data on the height of middle school students at IMS. Surveys should be random.
80 Graphs let readers analyze data easily, but are sometimes made to influence conclusions by misrepresenting the data.
81 Explain how the graphs differ.
Which graph appears to show a sharper increase in price? Graph B.
Which graph might the Student Council use to show that while ticket prices have risen, the increase is not significant? Why? They might use Graph A. The y-axis scale makes the increase appear less significant.
82 The line graphs show monthly profits of a company from October to March. Which graph suggests that the business is extremely profitable? Is this a valid conclusion? Explain.
Although both graphs show a profit, Graph A’s profit increase is exaggerated because the y-axis scale begins with $500 intervals and changes to $100 intervals.
83 Statistics can also be used to influence conclusions. An amusement park boasts that the average height of their roller coasters is 170 feet. Explain how this might be misleading.
Mean: 850/5 = 170
Median: 126
Mode: no mode
The mean has been affected by the outlier of 365; therefore, using the average to describe this data set is misleading.
84 How is this graph misleading? The y-axis scale does not have equal spacing.
How could you redraw the graph so it would not be misleading? Draw the y-axis scale starting at 0 with equal spacing, so that the distance between 0 and 18,000 equals the distance between 18,000 and 36,000.
85 How is this graph misleading? The y-axis scale has a break, so the differences in jump distances appear greater.
How could you redraw the graph so it would not be misleading? Draw the y-axis scale starting at 0 and continuing to 7.5 using equal spacing.
86 [Graph: Taxicab Fares; y-axis: Fare ($), from 10 to 13; x-axis: Mon through Fri]
How is this graph misleading? The y-axis scale has a break, so the differences in fare appear greater.
How could you redraw the graph so it would not be misleading? Draw the y-axis scale starting at 0 and continuing to 13 using equal spacing.
87 [Graph: Water Consumed; y-axis: Ounces of Water, from 16 to 48; categories: Mark, Frank, Mila, Yvonne]
How is this graph misleading? The y-axis scale does not start at zero, so the differences in water consumed seem greater.
How could you redraw the graph so it would not be misleading? Draw the y-axis scale starting at 0 and continuing to 48 using equal spacing.
89 Mean Absolute Deviation: The average distance between each data value and the mean of a data set.
Step 1: Find the mean.
Step 2: Find the absolute value of the difference between each data value and the mean.
Step 3: Find the average of those differences.
90 Find the mean: 52 + 48 + 60 + 55 + 59 + 54 + 58 + 62 = 448; 448/8 = 56
Find the differences between the mean and the data points:
|56 – 52| = 4, |56 – 48| = 8, |56 – 60| = 4, |56 – 55| = 1, |56 – 59| = 3, |56 – 54| = 2, |56 – 58| = 2, |56 – 62| = 6
Find the mean of the differences: 4 + 8 + 4 + 1 + 3 + 2 + 2 + 6 = 30; 30/8 = 3.75
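The three steps for the mean absolute deviation translate directly into code. The sketch below uses the eight values from this example (52, 48, 60, 55, 59, 54, 58, 62, which sum to 448); the function name is illustrative.

```python
def mean_absolute_deviation(data):
    """Average distance between each data value and the mean."""
    center = sum(data) / len(data)                  # Step 1: find the mean
    distances = [abs(x - center) for x in data]     # Step 2: absolute differences
    return sum(distances) / len(distances)          # Step 3: average them

values = [52, 48, 60, 55, 59, 54, 58, 62]
print(mean_absolute_deviation(values))
```

A larger result means the values are more spread out around the mean; a smaller one means they cluster closely.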
93 The top five salaries and bottom five salaries for the New York Yankees are shown in the table below. Salaries are in millions of dollars and are rounded to the nearest hundredth.
Bottom five: sum = 2.14; mean = 2.14/5 ≈ 0.43; sum of absolute deviations = 0.07; MAD = 0.07/5 = 0.014, or about $0.01 million
Top five: sum = 117.02; mean = 117.02/5 ≈ 23.40; sum of absolute deviations = 20.96; MAD = 20.96/5 = 4.192, or about $4.19 million
94 The table shows the running time in minutes for two kinds of movies. Find the mean absolute deviation for each set of data. Round to the nearest hundredth. Then write a few sentences comparing their variation.
First data set: sum = 471; mean = 471/5 = 94.2; sum of absolute deviations = 20.8; MAD = 20.8/5 = 4.16 minutes
Second data set: sum = 664; mean = 664/5 = 132.8; sum of absolute deviations = 61.2; MAD = 61.2/5 = 12.24 minutes
95 Find the mean absolute deviation. Round to the nearest hundredth if necessary. Then describe what the mean absolute deviation represents.
Sum = 647; mean = 647/5 = 129.4; sum of absolute deviations = 92.4; MAD = 92.4/5 = 18.48 daily visitors
The MAD is large. On average, each data value is about 18 visitors away from the mean.
96 Find the mean absolute deviation. Round to the nearest hundredth if necessary. Then describe what the mean absolute deviation represents.
Sum = $52.50; mean = 52.50/6 = $8.75; sum of absolute deviations = $3.00; MAD = 3.00/6 = $0.50
The MAD is small. On average, each admission price is only $0.50 away from the mean.
97 The table shows the height of waterslides at two different water parks. Find the mean absolute deviation for each set of data. Round to the nearest hundredth. Then write a few sentences comparing their variation.
First data set: sum = 448; mean = 448/5 = 89.6; sum of absolute deviations = 51.6; MAD = 51.6/5 = 10.32 feet
Second data set: sum = 583; mean = 583/5 = 116.6; sum of absolute deviations = 62.4; MAD = 62.4/5 = 12.48 feet
98 The water slides at Splash Lagoon are closer together in terms of height. There is less variability in the heights at Splash Lagoon than at Wild Water Bay.
100 A box-and-whisker plot uses a number line to show the distribution of a data set. To make a box-and-whisker plot, first divide the data into four equal parts using quartiles. The median, or 2nd quartile, divides the data into a lower half and an upper half.
101 The median of the lower half is the lower quartile, and the median of the upper half is the upper quartile.
102 Example: Use the data to make a box-and-whisker plot.
Order the data from least to greatest.
Calculate/determine the following:
Median: 74
Lower Quartile: 68
Upper Quartile: 78
Lowest Value: 67
Greatest Value: 85
103 Draw a box from the lower to the upper quartile. Inside the box, draw a vertical line through the median. Then draw the “whiskers” from the box to the least and greatest values. Be sure to title and label your graph.
106 Measures of Variability: Describe how spread out a group of data is. Measures of variability are the range and the interquartile range.
Interquartile range (IQR): The difference between the upper quartile and the lower quartile.
107 22, 24, 27, 31, 35, 38, 42
What is the range for the above data set? 42 – 22 = 20
What is the interquartile range for the above data set? 38 – 24 = 14
Measures of variation are range, interquartile range, upper quartile, and lower quartile.
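The quartile method described on these slides (the median of the lower half and the median of the upper half, leaving out the middle value when the count is odd) can be sketched as follows. The function name is illustrative, and library quartile functions may follow other conventions and give slightly different values.

```python
from statistics import median

def five_number_summary(data):
    """Quartiles via the median-of-halves method, plus range and IQR."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    lower = ordered[:half]
    # With an odd count, the middle value belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    q1, q3 = median(lower), median(upper)
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median(ordered),
        "q3": q3,
        "max": ordered[-1],
        "range": ordered[-1] - ordered[0],
        "iqr": q3 - q1,
    }

print(five_number_summary([22, 24, 27, 31, 35, 38, 42]))
```

For the data set on this slide, the sketch reports a range of 20 and an interquartile range of 14.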
110 Halle’s basketball team scored the following points in their past games: 38, 42, 26, 32, 40, 28, 36, 27, 29, 30
In order: 26, 27, 28, 29, 30, 32, 36, 38, 40, 42
Median: 31
Lower Quartile: 28
Upper Quartile: 38
Lowest Value: 26
Greatest Value: 42
Range: 42 – 26 = 16
IQR: 38 – 28 = 10
[Box-and-whisker plot: Points Scored in a Game; axis: Points]
111 Kyle is helping your school librarian conduct a survey of how many books students read during the year. He gets the following results: 12, 24, 10, 12, 4, 35, 10, 8, 12, 15, 20, 18, 25, 21, and 9.
In order: 4, 8, 9, 10, 10, 12, 12, 12, 15, 18, 20, 21, 24, 25, 35
Median: 12
Lower Quartile: 10
Upper Quartile: 21
Lowest Value: 4
Greatest Value: 35
Range: 35 – 4 = 31
IQR: 21 – 10 = 11
[Box-and-whisker plot: Books Students Read in a Year; axis: Number of Books]
112 Describe the center, shape, spread, and outliers of the distribution.
The typical student reads about 12 books.
There is a slight right skew.
The IQR is 11, so there is a lot of variability in the number of books read.
113 Ms. Carpenter asked each of her students to record how much time it takes them to get from school to home this afternoon. The next day, students came back with this data, in minutes: 15, 12, 5, 55, 6, 9, 47, 8, 35, 3, 22, 26, 46, 54, 17, 42, 43, 42, 15, 5.
In order: 3, 5, 5, 6, 8, 9, 12, 15, 15, 17, 22, 26, 35, 42, 42, 43, 46, 47, 54, 55
Median: 19.5
Lower Quartile: 8.5
Upper Quartile: 42.5
Lowest Value: 3
Greatest Value: 55
Range: 55 – 3 = 52
IQR: 42.5 – 8.5 = 34
[Box-and-whisker plot: Time to Get Home from School; axis: Minutes]
114 Create a box-and-whisker plot for the data: 24, 20, 18, 25, 22, 32, 30, 29, 35, 30, 28, 24, 38
In order: 18, 20, 22, 24, 24, 25, 28, 29, 30, 30, 32, 35, 38
Range: 38 – 18 = 20
3rd Quartile: 31
115 Comparing Populations
116 A double box plot consists of two box plots graphed on the same number line. You can draw inferences about the two populations in the double box plot by comparing their centers and variations.
117 Ian surveyed a different group of students in his science and math classes. The double box plot shows the results for both classes. Compare their centers and variations. Write an inference you can draw about the two populations.
What does the plot show? The number of times each class posted a blog this month
Is either plot symmetric? No
Which measure of center should you use to compare the data? Median (Math: 10; Science: 20)
Which measure of variation should you use to compare the data? IQR (Math: 15; Science: 10)
Which class posts more blogs? Science
Which class has a greater spread of data around the median? Math
Use the comparisons to write an inference: The science class posted more blogs than the math class. The median for science is twice the median for math. There is also a greater spread of data around the median for the math class than for the science class.
118 The double dot plot shows the daily high temperatures for two cities for thirteen days. Compare the centers and variations for the two populations. Write an inference you can draw about the two populations.
Is either plot symmetric? No
Which measure of center should you use? Mean (Springfield: 81; Lake City: 84)
Which measure of variation should you use? MAD (Springfield: 1.4; Lake City: 1.4)
Use the comparisons to write an inference: Both cities have the same variation, or spread, around their means. Lake City has a greater mean temperature than Springfield.
120 The students at Dolan Middle School are competing in after-school activities in which they earn points for helping out around the school. Each team consists of the 30 students in a homeroom. Halfway through the competition, here are the scores from the students in two of the teams. The champion team is the one with the most points when the scores of the 30 students on the team are added. Which team would you rather be on? Explain.
Is either plot symmetric? No
Which measure of center should you use? Median (Team 1: 100; Team 2: ≈115)
Which measure of variation should you use? IQR (Team 1: ≈55; Team 2: ≈105)
Use the comparisons to write an inference: Team 1 is more consistent and has fewer low scores. Although Team 2 has a slightly higher median, Team 1 is the better choice.
121 Which group has a larger interquartile range? Basketball
Which group of players has more predictability in their height? Baseball: the range and IQR are smaller, and the distribution is also symmetric.
122 Which shoe store has a greater median? Sage’s
Which shoe store has a greater interquartile range? Maroon’s
Which shoe store appears to be more predictable in the number of shoes sold per week? Sage’s: the range and IQR are smaller.
123 The table below shows the golf scores for two people. Make two box-and-whisker plots of the data on the same number line.
Which golfer has the lower median score? Henry
Which golfer has the lesser interquartile range of scores? Trish
Which golfer appears to be more consistent? Trish: her range and IQR are smaller.