Statistics
Statistics deals with collecting, organizing, and interpreting data. A survey is a method of collecting information. → Surveys use a small sample to represent a large population. Population: the whole group; the group being studied. Sample: part of the population; the group being surveyed.
For each survey topic, determine which represents the population and which represents a sample of the population.
Making Predictions and Drawing Inferences
You can use survey results to predict the actions of a larger group or draw inferences about the entire population. Prediction: A hypothesis made based on survey results or past actions. Inference: A prediction that is made using observations, prior knowledge, and experience. Use proportions to help calculate your predictions and inferences.
A survey found that 6 out of 10 students at IMS have an iPod. Predict how many students have iPods if there are 650 students at IMS. iPods/total: 6/10 = x/650, so x = 390. About 390 students have iPods.
A researcher catches 60 fish from different locations in a lake. He then tags the fish and puts them back in the lake. Two weeks later, the researcher catches 40 fish from the same locations. 8 of these 40 fish are tagged. Predict the number of fish in the lake. tagged/total: 8/40 = 60/x, so x = 300. About 300 fish.
A middle school has 1,800 students. A random sample of 80 shows that 24 have cell phones. Predict the number of students in the middle school who have cell phones. phones/total: 24/80 = x/1,800, so x = 540. About 540 students have cell phones.
A tilapia fish hatchery selectively releases fish when the populations have increased beyond a certain target level. In order to estimate the current fish population, workers at the hatchery catch 110 fish and mark them with special paint. Then a little while later, they catch 530 fish, among which 11 are marked. To the nearest whole number, what is the best estimate for the fish population? marked/total: 11/530 = 110/x, so x = 5,300. About 5,300 fish.
In a random sample, 3 of 400 computer chips are found to be defective. Based on the sample, about how many chips out of 100,000 would you expect to be defective? defective/total: 3/400 = x/100,000, so x = 750. About 750 chips will be defective.
Mali is starting her own beehive so that she can have fresh honey straight from the hive. Mali decides to check the current population of bees in the hive by marking 52 bees with special bee-marking paint. Later, Mali collects 190 bees and observes that 26 of them are marked. To the nearest whole number, what is the best estimate for the bee population? marked/total: 26/190 = 52/x, so x = 380. About 380 bees.
For a research project on rodents, 21 chipmunks were tagged and released. Later, researchers counted 100 chipmunks in the area. Of the chipmunks they counted, 14 had tags. To the nearest whole number, what is the best estimate for the chipmunk population? tagged/total: 14/100 = 21/x, so x = 150. About 150 chipmunks.
While studying a gecko population, a group of university scientists marked and released 38 geckos. Later, the group counted a total of 240 geckos, of which 24 were marked. To the nearest whole number, what is the best estimate for the gecko population? marked/total: 24/240 = 38/x, so x = 380. About 380 geckos.
To determine the jackrabbit population in a wildlife preserve, researchers tagged 110 jackrabbits. Later, they counted 200 jackrabbits. Out of the jackrabbits they counted, 22 had tags. To the nearest whole number, what is the best estimate for the jackrabbit population? tagged/total: 22/200 = 110/x, so x = 1,000. About 1,000 jackrabbits.
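The proportions in the worked examples above can be checked with a short Python sketch. The function names are made up for illustration and are not part of the original slides; the first scales a sample proportion up to a population, and the second solves the tag-and-recapture proportion for the unknown total.

```python
def predict_from_sample(sample_hits, sample_size, population_size):
    """Solve sample_hits / sample_size = x / population_size for x."""
    return sample_hits * population_size / sample_size


def estimate_population(marked_first, caught_second, marked_in_second):
    """Solve marked_in_second / caught_second = marked_first / x for x."""
    return marked_first * caught_second / marked_in_second


# iPod survey: 6 out of 10 students, 650 students in the school
print(round(predict_from_sample(6, 10, 650)))
# Lake example: 60 fish tagged, 40 caught later, 8 of those tagged
print(round(estimate_population(60, 40, 8)))
```

Both functions only rearrange the proportion, so they inherit its assumption that the sample looks like the whole population.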
Biased Sample: A sample that doesn’t truly represent the population. Example: Surveying 6th graders about the height of IMS students. Random Sample: A sample where every member of the population has an equal chance of being picked. Example: Surveying students whose locker numbers end in 2 about the height of IMS students.
Practice Problems Tell if each sample is biased or random. Explain your answer.
An airline surveys passengers from a flight that is on time to determine if passengers on all flights are satisfied. Biased. Passengers on an on-time flight are more likely to be satisfied with their experience than passengers on all flights.
A newspaper randomly chooses 100 names from its subscriber database and then surveys those subscribers to find if they read the restaurant reviews. Random. The names were randomly chosen in such a way that everyone in the population has an equal chance of being picked.
The manager of a bookstore sends a survey to 150 customers who were randomly selected from a customer list. Random. The customers were randomly chosen so everyone in the population has an equal chance of being picked.
A team of researchers surveys 200 people at a multiplex movie theater to find out how much money state residents spend on entertainment. Biased. People who go to the movies likely spend more money on entertainment than randomly selected people.
Types of Random Sampling
Simple Random Sample: An unbiased sample where each item or person in the population is as likely to be chosen as any other. Example: Each student’s name is on a piece of paper in a bowl; names are picked without looking. Systematic Random Sample: A sample where the items or people are selected according to a specific time or item interval. Example: Every 20th person is chosen from an alphabetical list of all students attending IMS.
Stratified Random Sample: A sample where the population is divided into groups, and then a certain number is chosen at random from each group. Example: An alphabetical list of all students at IMS is divided into boys and girls. Then every 20th person is sampled from each list.
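The three random sampling methods defined above can be sketched in Python. This is only an illustration: the student list, the two group names, and the function names are hypothetical, and the stratified version draws at random within each group, following the definition rather than the every-20th-person example.

```python
import random


def simple_random_sample(population, n, rng=random):
    """Every member is equally likely to be chosen (names from a bowl)."""
    return rng.sample(population, n)


def systematic_sample(population, step, rng=random):
    """Pick a random starting point, then take every `step`-th member."""
    start = rng.randrange(step)
    return population[start::step]


def stratified_sample(groups, n_per_group, rng=random):
    """Split into groups first, then sample at random inside each group."""
    return {name: rng.sample(members, n_per_group)
            for name, members in groups.items()}


# Hypothetical alphabetical list of 200 students
students = [f"student_{i:03d}" for i in range(200)]
groups = {"group_a": students[:100], "group_b": students[100:]}

print(simple_random_sample(students, 5))
print(systematic_sample(students, 20))
print(stratified_sample(groups, 3))
```

A convenience or voluntary sample has no counterpart here on purpose: neither gives every member of the population an equal chance of being picked.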
Types of Biased Sampling
Convenience Sample: A biased sample which consists of members of a population that are easily accessed. Example: Only surveying one math class about IMS students’ favorite letter day. Voluntary Sample: A biased sample which involves only those who want to participate in the sampling. Example: Students at IMS who wish to participate in the survey can fill it out online.
Try the Following: Use your knowledge of the types of random and biased sampling methods to solve the following problems.
To find how much money the average American family spends to cool their home, 100 Alaskan families are surveyed at random. Of the families, 85 said that they spend less than $75 per month on cooling. The researcher concluded that the average American spends less than $75 on cooling per month. Is this conclusion valid? Explain. The conclusion is not valid. This is a biased convenience sample, since families in most of the United States would likely spend much more on cooling than families in Alaska.
Zach is trying to decide which golf course is the best of three golf courses. He randomly surveyed people at a sports store and recorded the results in the table. Which type of sampling method did Zach use? A simple random sample. Suppose Zach surveyed 150 more people. How many people would be expected to vote for Rolling Green? 42 more people.
Adults in every 100 th household in the phonebook are surveyed about which candidate they plan to vote for. Which type of sampling method is being described? Systematic Random Sample
Use the organizer to determine whether the conclusion is valid.
A computer program selects telephone numbers at random for a survey on which candidate people plan to vote for. Which type of sampling method is being described? Simple Random Sample
The researchers send a mail survey to apple farmers asking them to please record the number of their trees that are infected and send the survey back. Which type of sampling method is being described? Biased – Voluntary Response Sample
To determine what people in California think about a proposed law, 5,000 people from the state are randomly surveyed. Of the people surveyed, 5.8% are against the law. The legislature concludes that the law should not be passed. Which type of sampling method is being described? Is this a valid conclusion? Yes, it is valid. A simple random sample was used.
Types of Graphs First, what is a graph?
Types of Graphs ► Pictographs ► Histograms ► Bar Graphs ► Double Bar Graphs ► Line Graphs ► Double Line Graphs ► Circle (Pie) Graphs ► Line Plots ► Stem-and-Leaf Plots ► Box-and-Whisker Plots
Pictographs ► Use pictures. ► What does this graph represent? ► How many students play hockey? ◦ 20 ► How many more students played soccer than hockey? ◦ 40
Histograms ► Show how often something occurs in equal intervals. ► This histogram shows: ◦ The distance of long jumps at a track meet ► What range occurred the most? Least? ◦ 5’7” – 6’, 6’7” – 7’ ► How many long jumps were from 5’1” to 6’? ◦ 25 long jumps ► How many more students jumped 5’7” – 6’ than 5’ – 5’6”? ◦ 5 students
Bar Graphs ► Use bars of different lengths to display and compare data in specific categories. ► This bar graph shows: ◦ The amount of money raised in a charity walk by each of the grades. ► Which grade raised twice as much money as 8th grade? ◦ 10th grade ► How much more money did 7th grade raise than 8th grade? ◦ $30
Double Bar Graphs ► Use pairs of bars to compare two sets of categorical data ► This graph compares: ◦ Number of sports and history books in 3 different school libraries ► Which school has the greatest difference between sports and history books? ◦ Oak ► Does the Maple School have more sports or history books? How many? ◦ History books, 11
Line Graphs ► Show a change in data over time. ► What data does this line graph present? ◦ Number of rainy days from May to December ► Between which 2 months was there the greatest increase in the number of rainy days? ◦ August & September
Double Line Graphs ► Use two lines to compare two sets of data over time ► What is this double line graph comparing? ◦ Temperatures for the first ten days of winter for two different years ► On what day were the temperatures the closest? ◦ Day 6 ► On what day were the temperatures the farthest apart? ◦ Day 10
Circle Graphs ► Compare parts of a whole. Each sector, or slice, is one part of the entire data set. ► This graph compares: ◦ The results of Leo’s survey on pet ownership ► How many people do not own pets? ◦ 15 (50% of 30) ► How many people have cats? ◦ 6 people (20% of 30)
Line Plots ► A graph that uses x’s and a number line to show frequency of data ► This line plot shows: ◦ Number of miles The Lorax ran per day during training ► How many days did The Lorax train? ◦ 18 days ► Which number of miles did he run most often? Least often? ◦ 5 miles; 2, 8, and 16 miles ► How often did The Lorax run 6 mi? ◦ 3 days ► What is the range of the miles? ◦ 14 ► What is the median number of miles? ◦ 5 miles
Stem-and-Leaf Plots ► A graph that uses the digits of each number to organize and display data ► A Stem: represents the left-hand digit of the data value ► A Leaf: represents the remaining right-hand digits ► What’s the greatest amount of time spent doing homework? ◦ 64 minutes ► How many students were surveyed? ◦ 18 students ► How many students studied for 32 min? ◦ 2 students ► How many studied for more than 43 min? ◦ 7 students
Box-and-Whisker Plots ► Use a number line to show the distribution of a data set and measures of variation. Also useful for large sets of data. ► These plots are divided into four parts called quartiles ► The median is the middle value of the entire data set ► The lower quartile is the median of the lower half of the data set ► The upper quartile is the median of the upper half of the data set ► The range is the difference between the highest and lowest data points ► The interquartile range is the difference between the upper and lower quartiles
Measures of Central Tendency
Measures of central tendency show what the middle of a data set looks like. The measures of central tendency are the mean, median, and mode. The range is NOT a measure of central tendency.
Find the mean, median, mode, and range of the following data set: The ages of Mrs. Long’s grandchildren: 8, 3, 5, 4, 2, 3, 1, and 4.
Mean: the average. (8 + 3 + 5 + 4 + 2 + 3 + 1 + 4)/8 = 30/8 = 3.75. The mean is 3.75.
Range: max minus min, or largest minus smallest. List in order: 1, 2, 3, 3, 4, 4, 5, 8. 8 – 1 = 7. The range is 7.
Mode: the number that occurs most often. There can be several modes or no mode. List in order: 1, 2, 3, 3, 4, 4, 5, 8. The modes here are 3 and 4.
Median: the middle data value when the data are in order. List in order: 1, 2, 3, 3, 4, 4, 5, 8. The middle two numbers are 3 and 4, so the median is (3 + 4)/2 = 3.5.
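As a quick check of the four results above, here is a minimal Python sketch using the standard `statistics` module. The variable names are only illustrative; `multimode` is used because this data set has two modes.

```python
from statistics import mean, median, multimode

# Ages of Mrs. Long's grandchildren, as given on the slide
ages = [8, 3, 5, 4, 2, 3, 1, 4]

summary = {
    "mean": mean(ages),                # sum of values / number of values
    "median": median(ages),            # middle value (average of middle two)
    "modes": multimode(ages),          # every value tied for most frequent
    "range": max(ages) - min(ages),    # largest minus smallest
}
print(summary)
```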
Often one measure of Central Tendency is more appropriate for describing a data set. Think about what each measure tells you about the data.
Find the median, mode, mean and range of each data set. Determine the measure of Central Tendency that best describes the data set.
6, 5, 3, 6, 8 List in order: 3, 5, 6, 6, 8 Median: 6 Mean: 28/5 = 5.6 Mode: 6 Range: 5 Best measure of center: 6 (median & mode)
7, 6, 13, 16, 15, 9 List in order: 6, 7, 9, 13, 15, 16 Median: (9 + 13)/2 = 22/2 = 11 Mean: 66/6 = 11 Mode: none Range: 10 Best measure of center: 11 (median & mean)
12, 15, 17, 9, 17 List in order: 9, 12, 15, 17, 17 Median: 15 Mean: 70/5 = 14 Mode: 17 Range: 8 Best measure of center: 15 (possibly 14) (median and possibly mean)
51, 62, 68, 55, 68, 62 List in order: 51, 55, 62, 62, 68, 68 Median: 62 Mean: 366/6 = 61 Mode: 62 & 68 Range: 17 Best measure of center: 62 (median, mean & mode)
List in order: 36, 41, 42, 44, 47 Median: 42 Mean: 210/5 = 42 Mode: none Range: 11 Best measure of center: 42 (median and mean)
An outlier is an extreme value: one that is either much less than or much greater than the other values in the data set.
Use the data set to answer the questions below: 4, 6, 3, 6, 25, 3, 2. List in order: 2, 3, 3, 4, 6, 6, 25. Is there an outlier? If so, what is it? Yes, 25. How does the outlier affect the mean and median? With outlier: Median = 4; Mean = 49/7 = 7. Without outlier: Median = 3.5; Mean = 24/6 = 4. Which measure of central tendency is most affected by an outlier in a data set? The mean! Which measure of central tendency best describes the data? Explain. The median, because it is not dramatically affected by outliers.
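The effect of the outlier can be reproduced with a short Python sketch. The helper name `center` is made up for this illustration; it simply returns the mean and median of a list so the two versions of the data set can be compared side by side.

```python
from statistics import mean, median


def center(values):
    """Return (mean, median) for a list of numbers."""
    return mean(values), median(values)


data = [4, 6, 3, 6, 25, 3, 2]
without_outlier = [x for x in data if x != 25]  # drop the extreme value

print("with outlier:   ", center(data))
print("without outlier:", center(without_outlier))
```

Removing 25 moves the mean by 3 but the median by only 0.5, which is why the median is the safer description of this data set.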
What does misleading mean? To lead in the wrong direction. To manipulate statistics without lying. Misleading = Dishonesty: to intentionally deceive someone.
Mrs. Long’s Salaries
What is the difference between the two graphs? Do these two graphs appear to show the same information? Why do you think someone would want to present the same information in different ways?
Key: each picture = 5 pets. Which pet is most popular?
What is misleading about this bar graph?
What eye color is the most frequent?
Why would someone want to mislead you? To make it appear that they are correct. To change the way the data is interpreted. To persuade someone. To influence an opinion.
Ways to Manipulate Statistics: Change the values on the x- or y-axis. Do not start the graph at zero. Use different bar widths on a bar graph. Change the way you conduct your survey. ◦ Example: Survey only 6th graders when you are collecting data on the height of middle school students at IMS. ◦ Surveys should be random.
Try the examples in your notes.
Graphs let readers analyze data easily, but are sometimes made to influence conclusions by misrepresenting the data.
Explain how the graphs differ. ◦ Which graph appears to show a sharper increase in price? Graph B. ◦ Which graph might the Student Council use to show that while ticket prices have risen, the increase is not significant? Why? They might use Graph A. The y-axis scale makes the increase appear less significant.
The line graphs show monthly profits of a company from October to March. Which graph suggests that the business is extremely profitable? Is this a valid conclusion? Explain. Although both graphs show a profit, Graph A’s profit increase is exaggerated because its y-axis scale begins with $500 intervals and then changes to $100 intervals.
Statistics can also be used to influence conclusions. An amusement park boasts that the average height of their roller coasters is 170 feet. Explain how this might be misleading. ◦ Mean: 850/5 = 170 ◦ Median: ◦ Mode: No mode. The mean has been affected by the outlier of 365; therefore, using the average to describe this data set is misleading.
How is this graph misleading? The y-axis scale does not have equal spacing. How could you redraw the graph so it would not be misleading? Draw the y-axis scale starting at 0 with equal spacing, so that the distance between 0 and 18,000 equals the distance between 18,000 and 36,000.
How is this graph misleading? The y-axis scale has a break, so the differences in jump distances appear greater. How could you redraw the graph so it would not be misleading? Draw the y-axis scale starting at 0 and continuing to 7.5 using equal spacing.
Taxicab Fares (x-axis: Mon, Tue, Wed, Thu, Fri; y-axis: Fare ($)). How is this graph misleading? The y-axis scale has a break, so the differences in fare appear greater. How could you redraw the graph so it would not be misleading? Draw the y-axis scale starting at 0 and continuing to 7.5 using equal spacing.
Water Consumed (x-axis: Mark, Frank, Mila, Yvonne; y-axis: Ounces of Water). How is this graph misleading? The y-axis scale does not start at zero, so the differences in water consumed seem greater. How could you redraw the graph so it would not be misleading? Draw the y-axis scale starting at 0 and continuing to 48 using equal spacing.
Mean Absolute Deviation
Mean Absolute Deviation: The average distance of each number from the mean of a data set. Step 1. Find the mean. Step 2. Find the absolute value of the difference between each data value and the mean. Step 3. Find the average of those differences.
1. Find the mean: 52 + 48 + 60 + 55 + 59 + 54 + 58 + 62 = 448; 448/8 = 56. 2. Find the absolute differences between the mean and the data points: |56 – 52| = 4, |56 – 48| = 8, |56 – 60| = 4, |56 – 55| = 1, |56 – 59| = 3, |56 – 54| = 2, |56 – 58| = 2, |56 – 62| = 6. 3. Find the mean of the differences: 4 + 8 + 4 + 1 + 3 + 2 + 2 + 6 = 30; 30/8 = 3.75.
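The three steps can be written as a small Python function. This is a minimal sketch: the function name is an illustrative choice, and the data are the eight values that appear in the differences of the worked example above.

```python
def mean_absolute_deviation(values):
    """Average distance of each value from the mean of the data set."""
    mean = sum(values) / len(values)                 # Step 1: find the mean
    distances = [abs(v - mean) for v in values]      # Step 2: absolute differences
    return sum(distances) / len(distances)           # Step 3: average them


data = [52, 48, 60, 55, 59, 54, 58, 62]
print(mean_absolute_deviation(data))
```

A larger result means the values sit farther from the mean on average, which is how the comparison problems that follow are meant to be read.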
The top five salaries and bottom five salaries for the 2010 New York Yankees are shown in the table below. Salaries are in millions of dollars and are rounded to the nearest hundredth. (Table and worked sums not preserved.) Mean absolute deviations: $4.19 million and $0.1 million.
The table shows the running time in minutes for two kinds of movies. Find the mean absolute deviation for each set of data. Round to the nearest hundredth. Then write a few sentences comparing their variation. (Table and worked solution not preserved.)
Find the mean absolute deviation. Round to the nearest hundredth if necessary. Then describe what the mean absolute deviation represents. (Data on daily visitors not preserved.) The MAD is large. On average, each data point is about 18 daily visitors away from the mean.
Find the mean absolute deviation. Round to the nearest hundredth if necessary. Then describe what the mean absolute deviation represents. (Data on admission prices not preserved.) Sum of the absolute differences: $3.00; $3.00/6 = $0.50. The MAD is small. On average, each admission price is only $0.50 away from the mean.
The table shows the height of waterslides at two different water parks. Find the mean absolute deviation for each set of data. Round to the nearest hundredth. Then write a few sentences comparing their variation. (Table and worked sums not preserved; only a mean of 89.6 feet and the value 24.4 survive.)
The water slides at Splash Lagoon are closer together in terms of height. There is less variability in the height at Splash Lagoon when compared to Wild Water Bay.
Box and Whisker Plots
A box-and-whisker plot uses a number line to show the distribution of a data set. To make a box-and-whisker plot, first divide the data into four equal parts using quartiles. The median, or 2nd quartile, divides the data into a lower half and an upper half.
The median of the lower half is the lower quartile, and the median of the upper half is the upper quartile.
Example: Use the data to make a box-and-whisker plot. Order the data from least to greatest. Calculate/determine the following: Median: Lower Quartile: Upper Quartile: Lowest Value: Greatest Value:
Draw a box from the lower to the upper quartile. Inside the box, draw a vertical line through the median. Then draw the “whiskers” from the box to the least and greatest values. Be sure to title and label your graph.
Measures of Variability: Describe how spread out a group of data is. Measures of variability are the range and the interquartile range. Interquartile range (IQR): The difference between the upper quartile and the lower quartile.
Data set: 22, 24, 27, 31, 35, 38, 42. What is the range for the above data set? 42 – 22 = 20. What is the interquartile range for the above data set? 38 – 24 = 14. Measures of variation are range, interquartile range, upper quartile, and lower quartile.
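The values needed for a box-and-whisker plot can be computed with the sketch below. It assumes the median-of-halves rule described on the earlier slides (the overall median is left out of both halves when the number of values is odd); other textbooks and software sometimes use slightly different quartile rules, and the function name is only illustrative.

```python
from statistics import median


def five_number_summary(values):
    """Lowest value, quartiles, median, greatest value, range, and IQR."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # Skip the middle value when the count is odd
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    q1, q3 = median(lower), median(upper)
    return {
        "lowest": data[0],
        "lower_quartile": q1,
        "median": median(data),
        "upper_quartile": q3,
        "greatest": data[-1],
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
    }


print(five_number_summary([22, 24, 27, 31, 35, 38, 42]))
```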
Halle’s basketball team scored the following points in their past games: 38, 42, 26, 32, 40, 28, 36, 27, 29, 30. List in order: 26, 27, 28, 29, 30, 32, 36, 38, 40, 42. Points Scored in a Game (number line: Points). Median: 31 Lower Quartile: 28 Upper Quartile: 38 Lowest Value: 26 Greatest Value: 42 Range: 42 – 26 = 16 IQR: 38 – 28 = 10
Kyle is helping the school librarian conduct a survey of how many books students read during the year. He gets the following results: 12, 24, 10, 12, 4, 35, 10, 8, 12, 15, 20, 18, 25, 21, and 9. List in order: 4, 8, 9, 10, 10, 12, 12, 12, 15, 18, 20, 21, 24, 25, 35. Books Students Read in a Year (number line: Number of Books). Median: 12 Lower Quartile: 10 Upper Quartile: 21 Lowest Value: 4 Greatest Value: 35 Range: 35 – 4 = 31 IQR: 21 – 10 = 11
Describe the center, shape, spread, and outliers of the distribution. The typical student reads about 12 books. There is a slight right skew. The IQR is 11, so there is a lot of variability in the number of books read.
Ms. Carpenter asked each of her students to record how much time it takes them to get from school to home this afternoon. The next day, students came back with this data, in minutes: 15, 12, 5, 55, 6, 9, 47, 8, 35, 3, 22, 26, 46, 54, 17, 42, 43, 42, 15, 5. List in order: 3, 5, 5, 6, 8, 9, 12, 15, 15, 17, 22, 26, 35, 42, 42, 43, 46, 47, 54, 55. Time to Get Home from School (number line: Minutes). Median: 19.5 Lower Quartile: 8.5 Upper Quartile: 42.5 Lowest Value: 3 Greatest Value: 55 Range: 55 – 3 = 52 IQR: 42.5 – 8.5 = 34
A double box plot consists of two box plots graphed on the same number line. → You can draw inferences about the two populations in the double box plot by comparing their centers and variations.
Ian surveyed a different group of students in his science and math classes. The double box plot shows the results for both classes. Compare their centers and variations. Write an inference you can draw about the two populations. What does the plot show? Number of times each class posted a blog this month. Is either plot symmetric? No. Which measure of center should you use to compare the data? Median (Math: 10; Science: 20). Which measure of variation should you use to compare the data? IQR (Math: 15; Science: 10). Which class posts more blogs? Science. Which class has a greater spread of data around the median? Math. Use the comparisons to write an inference: Science students posted more blogs than the math class. The median for science is twice the median for math. There is also a greater spread of data around the median for the math class than for the science class.
The double dot plot shows the daily high temperatures for two cities for thirteen days. Compare the centers and variations for the two populations. Write an inference you can draw about the two populations. Is either plot symmetric? No. Which measure of center should you use? Mean (Springfield: 81; Lake City: 84). Which measure of variation should you use? MAD (Springfield: 1.4; Lake City: 1.4). Use the comparisons to write an inference: Both cities have the same variation or spread around their means. Lake City has a greater mean temperature than Springfield.
Reading Box-and-Whisker Plots
The students at Dolan Middle School are competing in after-school activities in which they earn points for helping out around the school. Each team consists of the 30 students in a homeroom. Halfway through the competition, here are the scores from the students in two of the teams. The champion team is the one with the most points when the scores of the 30 students on the team are added. Which team would you rather be on? Explain. Is either plot symmetric? No. Which measure of center should you use? Median (Team 1: 100; Team 2: ≈115). Which measure of variation should you use? IQR (Team 1: ≈55; Team 2: ≈105). Use the comparisons to write an inference: Team 1 is more consistent and has fewer low scores. Although Team 2 has a slightly higher median, Team 1 is the better choice.
Which group has a larger interquartile range? Basketball. Which group of players has more predictability in their height? Baseball: the range and IQR are smaller, and the plot is also symmetric.
Which shoe store has a greater median? Sage’s. Which shoe store has a greater interquartile range? Maroon’s. Which shoe store appears to be more predictable in the number of shoes sold per week? Sage’s: the range and IQR are smaller.
1. The table below shows the golf scores for two people. Make two box-and-whisker plots of the data on the same number line. Which golfer has the lower median score? Henry. Which golfer has the lesser interquartile range of scores? Trish. Which golfer appears to be more consistent? Trish: her range and IQR are smaller.