# Statistics.

## Presentation on theme: "Statistics."— Presentation transcript:

Statistics

Statistics deals with collecting, organizing, and interpreting data.
A survey is a method of collecting information. → Surveys use a small sample to represent a large population. Population: the whole group; the group being studied. Sample: part of the population; the group being surveyed.

For each survey topic, determine which represents the population and which represents a sample of the population.

Making Predictions and Drawing Inferences

You can use survey results to predict the actions of a larger group or draw inferences about the entire population. Prediction: A hypothesis made based on survey results or past actions. Inference: A prediction that is made using observations, prior knowledge, and experience. Use proportions to help calculate your predictions and inferences.

A survey found that 6 out of 10 students at IMS have an iPod. Predict how many students have iPods if there are 650 students at IMS. Set up a proportion of iPod owners to total students: $\frac{6}{10} = \frac{x}{650}$, so $x = 390$. About 390 students have iPods.
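The proportion in this example can be checked with a few lines of Python. This is only a minimal sketch: the function name `predict_count` and its parameter names are made up for illustration and do not come from the slides.

```python
def predict_count(sample_hits: int, sample_size: int, population: int) -> float:
    """Scale a sample ratio up to the whole population.

    Solves sample_hits / sample_size = x / population for x.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return sample_hits * population / sample_size


# 6 out of 10 surveyed students have an iPod; the school has 650 students.
print(predict_count(6, 10, 650))  # 390.0
```

The same call works for any "part of a sample to total" prediction, such as 24 cell phones in a sample of 80 out of 1,800 students.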

A researcher catches 60 fish from different locations in a lake. He then tags the fish and puts them back in the lake. Two weeks later, the researcher catches 40 fish from the same locations. 8 of these 40 fish are tagged. Predict the number of fish in the lake. Set up a proportion of tagged fish to total fish: $\frac{60}{x} = \frac{8}{40}$, so $x = 300$. About 300 fish.

A middle school has 1,800 students. A random sample of 80 shows that 24 have cell phones. Predict the number of students in the middle school who have cell phones. Set up a proportion of phones to total students: $\frac{x}{1800} = \frac{24}{80}$, so $x = 540$. About 540 students have cell phones.

A tilapia fish hatchery selectively releases fish when the populations have increased beyond a certain target level. In order to estimate the current fish population, workers at the hatchery catch 110 fish and mark them with special paint. Then a little while later, they catch 530 fish, among which 11 are marked. To the nearest whole number, what is the best estimate for the fish population? Set up a proportion of marked fish to total fish: $\frac{110}{x} = \frac{11}{530}$, so $x = 5{,}300$. About 5,300 fish.

In a random sample, 3 of 400 computer chips are found to be defective. Based on the sample, about how many chips out of 100,000 would you expect to be defective? Set up a proportion of defective chips to total chips: $\frac{3}{400} = \frac{x}{100{,}000}$, so $x = 750$. About 750 chips will be defective.

Mali is starting her own beehive so that she can have fresh honey straight from the hive. Mali decides to check the current population of bees in the hive by marking 52 bees with special bee-marking paint. Later, Mali collects 190 bees and observes that 26 of them are marked. To the nearest whole number, what is the best estimate for the bee population? Set up a proportion of marked bees to total bees: $\frac{26}{190} = \frac{52}{x}$, so $x = 380$. About 380 bees.

For a research project on rodents, 21 chipmunks were tagged and released. Later, researchers counted 100 chipmunks in the area. Of the chipmunks they counted, 14 had tags. To the nearest whole number, what is the best estimate for the chipmunk population? Set up a proportion of tagged chipmunks to total chipmunks: $\frac{21}{x} = \frac{14}{100}$, so $x = 150$. About 150 chipmunks.

While studying a gecko population, a group of university scientists marked and released 38 geckos. Later, the group counted a total of 240 geckos, of which 24 were marked. To the nearest whole number, what is the best estimate for the gecko population? Set up a proportion of marked geckos to total geckos: $\frac{38}{x} = \frac{24}{240}$, so $x = 380$. About 380 geckos.

To determine the jackrabbit population in a wildlife preserve, researchers tagged 110 jackrabbits. Later, they counted 200 jackrabbits. Out of the jackrabbits they counted, 22 had tags. To the nearest whole number, what is the best estimate for the jackrabbit population? Set up a proportion of tagged jackrabbits to total jackrabbits: $\frac{110}{x} = \frac{22}{200}$, so $x = 1{,}000$. About 1,000 jackrabbits.
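The tag-and-recount problems above all use the same proportion, so they can be sketched as one small Python function. The name `estimate_population` and its parameters are hypothetical, chosen only for this illustration.

```python
def estimate_population(marked_first: int, caught_second: int, marked_in_second: int) -> float:
    """Estimate a total population from a tag-and-recount study.

    Solves marked_first / x = marked_in_second / caught_second for x.
    """
    if marked_in_second <= 0:
        raise ValueError("at least one marked animal must be found in the second count")
    return marked_first * caught_second / marked_in_second


# 60 fish tagged; later 40 caught, 8 of them tagged.
print(estimate_population(60, 40, 8))     # 300.0
# 110 jackrabbits tagged; later 200 counted, 22 of them tagged.
print(estimate_population(110, 200, 22))  # 1000.0
```

The estimate assumes the tagged animals have mixed back into the population, so the tagged share of the second count matches the tagged share of the whole group.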

Sampling

Biased Sample: A sample that doesn’t truly represent the population. Example: Surveying 6th graders about the height of IMS students.

Random Sample: A sample where every member of the population has an equal chance of being picked. Example: Surveying students whose locker numbers end in 2 about the height of IMS students.

Practice Problems: Tell if each sample is biased or random. Explain your answer.

An airline surveys passengers from a flight that is on time to determine if passengers on all flights are satisfied. Biased. If the flight is on time, the passengers are likely satisfied with their experience.

A newspaper randomly chooses 100 names from its subscriber database and then surveys those subscribers to find if they read the restaurant reviews. Random. The names were randomly chosen in such a way that everyone in the population has an equal chance of being picked.

The manager of a bookstore sends a survey to 150 customers who were randomly selected from a customer list. Random. The customers were randomly chosen, so everyone in the population has an equal chance of being picked.

A team of researchers surveys 200 people at a multiplex movie theater to find out how much money state residents spend on entertainment. Biased. People who go to the movies likely spend more money on entertainment than randomly selected people.

Types of Random Sampling

Simple Random Sample: An unbiased sample where each item or person in the population is as likely to be chosen as any other. Example: Each student’s name is on a piece of paper in a bowl; names are picked without looking.

Systematic Random Sample: A sample where the items or people are selected according to a specific time or item interval. Example: Every 20th person is chosen from an alphabetical list of all students attending IMS.

Stratified Random Sample: A sample where the population is divided into groups; then a certain number is chosen at random from each group. Example: Alphabetical list of all students at IMS divided into boys and girls. Then sampling every 20th person from that list.
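The three random sampling methods can be sketched in Python with the standard `random` module. Everything specific here is an assumption for illustration: the roster of 200 ID numbers, the group names, and the function names are hypothetical, and the stratified version draws at random from each group (following the definition) rather than taking every 20th name.

```python
import random


def simple_random_sample(population, k, rng):
    """Every member is equally likely to be chosen."""
    return rng.sample(population, k)


def systematic_sample(population, step, rng):
    """Pick a random starting point, then take every `step`-th member."""
    start = rng.randrange(step)
    return population[start::step]


def stratified_sample(groups, k_per_group, rng):
    """Draw the same number of members at random from each group."""
    return {name: rng.sample(members, k_per_group)
            for name, members in groups.items()}


rng = random.Random(0)          # fixed seed so the sketch is repeatable
students = list(range(1, 201))  # hypothetical ID numbers 1..200
groups = {"group_a": students[:100], "group_b": students[100:]}

print(simple_random_sample(students, 10, rng))
print(systematic_sample(students, 20, rng))
print(stratified_sample(groups, 5, rng))
```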

Types of Biased Sampling

Convenience Sample: A biased sample which consists of members of a population that are easily accessed. Example: Only surveying one math class about IMS students’ favorite letter day.

Voluntary Sample: A biased sample which involves only those who want to participate in the sampling. Example: Students at IMS who wish to participate in the survey can fill it out online.

Try the Following: Use your knowledge of types of random and biased sampling methods to solve the following problems.

To find how much money the average American family spends to cool their home, 100 Alaskan families are surveyed at random. Of the families, 85 said that they spend less than \$75 per month on cooling. The researcher concluded that the average American spends less than \$75 on cooling per month. Is this conclusion valid? Explain. The conclusion is not valid. This is a biased convenience sample, since families in the rest of the United States would likely spend much more on cooling than those in Alaska.

Zach is trying to decide which golf course is the best of three golf courses. He randomly surveyed people at a sports store and recorded the results in the table. Which type of sampling method did Zach use? A simple random sample. Suppose Zach surveyed 150 more people. How many people would be expected to vote for Rolling Green? 42 more people.

Adults in every 100th household in the phonebook are surveyed about which candidate they plan to vote for. Which type of sampling method is being described? Systematic random sample.

Use the organizer to determine whether the conclusion is valid.

A computer program selects telephone numbers at random for a survey on which candidate people plan to vote for. Which type of sampling method is being described? Simple random sample.

The researchers send a mail survey to apple farmers asking them to please record the number of their trees that are infected and send the survey back. Which type of sampling method is being described? Biased: voluntary response sample.

To determine what people in California think about a proposed law, 5,000 people from the state are randomly surveyed. Of the people surveyed, 5.8% are against the law. The legislature concludes that the law should not be passed. Which type of sampling method is being described? Is this a valid conclusion? Yes, it is valid. A simple random sample was used.

Types of Graphs First, what is a graph?

Types of Graphs: circle (pie) graphs, line plots, stem-and-leaf plots, box-and-whisker plots, pictographs, histograms, bar graphs, double bar graphs, line graphs, and double line graphs.

Pictographs Use pictures. What does this graph represent?
How many students play hockey? 20 How many more students played soccer than hockey? 40

Histograms: Show how often something occurs in equal intervals. This histogram shows: the distances of long jumps at a track meet. What range occurred the most? 5’7” – 6’. Least? 6’7” – 7’. How many long jumps were from 5’1” to 6’? 25 long jumps. How many more students jumped 5’7” – 6’ than 5’ – 5’6”? 5 students.

Bar Graphs Use bars of different lengths to display and compare data in specific categories. This bar graph shows: The amount of money raised in a charity walk by each of the grades. Which grade raised twice as much money as 8th? 10th grade How much more money did 7th grade raise than 8th ? \$30

Double Bar Graphs Use pairs of bars to compare two sets of categorical data This graph compares: Number of Sports & History books in 3 different school libraries Which school has the greatest difference between sports & history? Oak Does the Maple School have more sports or history books? How many? History books, 11

Line Graphs Show a change in data over time.
What data does this line graph present? Number of rainy days from May to December Between which 2 months was there the greatest increase in the number of rainy days? August & September

Double Line Graphs: Use two lines to compare two sets of data over time. What is this double line graph comparing? Temperatures for the first ten days of winter for two different years. On what day were the temperatures the closest? Day 6. On what day were the temperatures the farthest apart? Day 10.

Circle Graphs Compare parts of a whole. Each sector, or slice, is one part of the entire data set. This graph compares: The results of Leo’s survey on pet ownership How many people do not own pets? 15 (50% of 30) How many people have cats? 6 people (20% of 30)

Line Plots: A graph that uses x’s and a number line to show the frequency of data. This line plot shows: the number of miles The Lorax ran per day during training. How many days did The Lorax train? 18 days. Which number of miles did he run most often? 5 miles. Least often? 2, 8, and 16 miles. How often did The Lorax run 6 mi? 3 days. What is the range of the miles? 14. What is the median number of miles? 5 miles.

Stem-and-Leaf Plots: A graph that uses the digits of each number to organize and display data. A stem represents the left-hand digit of the data value. A leaf represents the remaining right-hand digits. What’s the greatest amount of time spent doing homework? 64 minutes. How many students were surveyed? 18 students. How many students studied for 32 min? 2 students. How many studied for more than 43 min? 7 students.

Box-and-Whisker Plots
Use a number line to show the distribution of a data set and measures of variation. Also useful for large sets of data. These plots are divided into four parts called quartiles. The median of the entire data set is the middle value. The lower quartile is the median of the lower half of the data set. The upper quartile is the median of the upper half of the data set. The range is the difference between the highest and lowest data points. The interquartile range is the difference between the upper and lower quartiles.

Measures of Central Tendency

Measures of central tendency show what the middle of a data set looks like.
The measures of central tendency are the mean, median, and mode. The Range is NOT a measure of central tendency

Find the mean, median, mode, and range of the following data set: The ages of Mrs. Long’s grandchildren: 8, 3, 5, 4, 2, 3, 1, and 4.

Mean is the average. $\frac{8 + 3 + 5 + 4 + 2 + 3 + 1 + 4}{8} = \frac{30}{8} = 3.75$. The mean is 3.75.

Range is max minus min, or largest minus smallest. List in order: 1, 2, 3, 3, 4, 4, 5, 8. 8 – 1 = 7. The range is 7.

Mode is the number that occurs most often. There can be several modes or no mode. List in order: 1, 2, 3, 3, 4, 4, 5, 8. The modes here are 3 and 4.

Median is the middle data value when in order. List in order: 1, 2, 3, 3, 4, 4, 5, 8. The middle two numbers are 3 and 4. The median is 3.5.
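The four values for the grandchildren's ages can be double-checked with Python's standard `statistics` module. This is a small sketch for checking the arithmetic, not part of the original slides; the variable name `ages` and the helper `summarize` are made up for illustration.

```python
from statistics import mean, median, multimode


def summarize(data):
    """Return the mean, median, mode(s), and range of a list of numbers."""
    return {
        "mean": mean(data),
        "median": median(data),
        "modes": multimode(data),       # a list, since there can be several modes
        "range": max(data) - min(data),
    }


ages = [8, 3, 5, 4, 2, 3, 1, 4]  # the ages from the example above
print(summarize(ages))
```

For this data the sketch reports a mean of 3.75, a median of 3.5, modes of 3 and 4, and a range of 7, matching the slides.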

Often one measure of Central Tendency is more appropriate for describing a data set. Think about what each measure tells you about the data.

Find the median, mode, mean and range of each data set. Determine the measure of central tendency that best describes the data set.

6, 5, 3, 6, 8. List in order: 3, 5, 6, 6, 8. Median: 6. Mean: 28/5 = 5.6. Mode: 6. Range: 5. Best measure of center: 6 (median & mode).

7, 6, 13, 16, 15, 9. List in order: 6, 7, 9, 13, 15, 16. Median: (9 + 13)/2 = 22/2 = 11. Mean: 66/6 = 11. Range: 10. Mode: none. Best measure of center: 11 (median & mean).

12, 15, 17, 9, 17. List in order: 9, 12, 15, 17, 17. Median: 15. Mean: 70/5 = 14. Range: 8. Mode: 17. Best measure of center: 15, possibly 14 (median and possibly mean).

51, 62, 68, 55, 68, 62. List in order: 51, 55, 62, 62, 68, 68. Median: 62. Mean: 366/6 = 61. Range: 17. Mode: 62 & 68. Best measure of center: 62 (median, mean & mode).

List in order: 36, 41, 42, 44, 47. Median: 42. Mean: 210/5 = 42. Range: 11. Mode: none. Best measure of center: 42 (median or mean).

An outlier is an extreme value, either much less than the lowest value or much greater than the highest value.

Use the data set to answer the questions below: 4, 6, 3, 6, 25, 3, 2
List in order: 2, 3, 3, 4, 6, 6, 25. Is there an outlier? If so, what is it? Yes, 25. How does the outlier affect the mean and median? With outlier: Median = 4; Mean = 49/7 = 7. Without outlier: Median = 3.5; Mean = 24/6 = 4. Which measure of central tendency is most affected by an outlier in a data set? The mean! Which measure of central tendency best describes the data? Explain. The median; it is not dramatically affected by outliers.
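A short Python sketch makes the effect of the outlier easy to see. It uses the data set from this slide; the helper name `center_shift` is hypothetical and only serves the illustration.

```python
from statistics import fmean, median


def center_shift(data, outlier):
    """Compare the mean and median with and without one outlier value."""
    trimmed = list(data)
    trimmed.remove(outlier)  # drop a single occurrence of the outlier
    return {
        "mean_with": fmean(data),
        "mean_without": fmean(trimmed),
        "median_with": median(data),
        "median_without": median(trimmed),
    }


result = center_shift([4, 6, 3, 6, 25, 3, 2], outlier=25)
print(result)
# The mean moves by 3 (7 vs. 4); the median moves by only 0.5 (4 vs. 3.5).
```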

Misleading: To lead in the wrong direction. To manipulate statistics without lying. To intentionally deceive someone. Misleading = Dishonesty.

Mrs. Long’s Salaries (two graphs of the same data)

What is the difference between the two graphs? Do these two graphs appear to show the same information? Why do you think someone would want to present the same information in different ways?

Ways To Manipulate A Graph

Which pet is most popular? (Pictograph "Pets": Dog, Horse, Fish. Key: each symbol = 5 pets.)

What eye color is the most frequent?

Why would someone want to mislead you?
To make it appear that they are correct. To change the way the data is interpreted. To persuade someone. To influence an opinion.

Ways to Manipulate Statistics
Change the values on the x- or y-axis. Do not start the graph at zero. Use different bar widths on a bar graph. Change the way you conduct your survey. Example: Survey only 6th graders when you are collecting data on the height of middle school students at IMS. Surveys should be random.

Try the examples in your notes.

Graphs let readers analyze data easily, but are sometimes made to influence conclusions by misrepresenting the data.

Explain how the graphs differ.
Which graph appears to show a sharper increase in price? Graph B. Which graph might the Student Council use to show that while ticket prices have risen, the increase is not significant? Why? They might use Graph A. The y-axis scale makes the increase appear less significant.

The line graphs show monthly profits of a company from October to March. Which graph suggests that the business is extremely profitable? Is this a valid conclusion? Explain. Although both graphs show a profit, Graph A’s profit increase is exaggerated because its y-axis scale begins with \$500 intervals and changes to \$100 intervals.

Statistics can also be used to influence conclusions.
An amusement park boasts that the average height of their roller coasters is 170 feet. Explain how this might be misleading. Mean: 850/5 = 170. Median: 126. Mode: none. The mean has been affected by the outlier of 365; therefore, using the average to describe this data set is misleading.

How is this graph misleading? The y-axis scale does not have equal spacing. How could you redraw the graph so it would not be misleading? Draw the y-axis scale starting at 0 with equal spacing, so that the distance between 0 and 18,000 equals the distance between 18,000 and 36,000.

How is this graph misleading? The y-axis scale has a break, so the differences in jump distances appear greater. How could you redraw the graph so it would not be misleading? Draw the y-axis scale starting at 0 and continuing to 7.5 using equal spacing.

Taxicab Fares (bar graph: fare in \$ for Mon through Fri, y-axis marked 10 to 13). How is this graph misleading? The y-axis scale has a break, so the differences in fare appear greater. How could you redraw the graph so it would not be misleading? Draw the y-axis scale starting at 0 and continuing to 7.5 using equal spacing.

Water Consumed (bar graph: ounces of water for Mark, Frank, Mila, and Yvonne, y-axis marked 16 to 48). How is this graph misleading? The y-axis scale does not start at zero, so the differences in water consumed seem greater. How could you redraw the graph so it would not be misleading? Draw the y-axis scale starting at 0 and continuing to 48 using equal spacing.

Mean Absolute Deviation

Mean Absolute Deviation: The average amount each number is away from the mean of a data set.
Step 1: Find the mean. Step 2: Find the absolute value of the difference between each data value and the mean. Step 3: Find the average of those differences.

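Find the mean: $52 + 48 + 60 + 55 + 59 + 54 + 58 + 62 = 448$ and $448/8 = 56$.

Find differences of mean and data points: $56 - 52 = 4$, $56 - 48 = 8$, $56 - 60 = -4$, $56 - 55 = 1$, $56 - 59 = -3$, $56 - 54 = 2$, $56 - 58 = -2$, $56 - 62 = -6$.

Find the mean of the absolute differences: $4 + 8 + 4 + 1 + 3 + 2 + 2 + 6 = 30$ and $30/8 = 3.75$.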

Try one on your own

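Find the mean: $58 + 88 + 40 + 60 + 72 + 66 + 80 + 48 = 512$ and $512/8 = 64$.

Find differences of mean and data points: $64 - 58 = 6$, $64 - 88 = -24$, $64 - 40 = 24$, $64 - 60 = 4$, $64 - 72 = -8$, $64 - 66 = -2$, $64 - 80 = -16$, $64 - 48 = 16$.

Find the mean of the absolute differences: $6 + 24 + 24 + 4 + 8 + 2 + 16 + 16 = 100$ and $100/8 = 12.5$.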

The top five salaries and bottom five salaries for the New York Yankees are shown in the table below. Salaries are in millions of dollars and are rounded to the nearest hundredth. Bottom five: sum = 2.14; mean = 2.14/5 ≈ 0.43; sum of absolute deviations = 0.07; MAD = 0.07/5 = 0.014, or about \$0.01 million. Top five: sum = 117.02; mean = 117.02/5 ≈ 23.40; sum of absolute deviations = 20.96; MAD = 20.96/5 = 4.192, or about \$4.19 million.

The table shows the running time in minutes for two kinds of movies. Find the mean absolute deviation for each set of data. Round to the nearest hundredth. Then write a few sentences comparing their variation. First set: sum = 471; mean = 471/5 = 94.2; sum of absolute deviations = 20.8; MAD = 20.8/5 = 4.16 minutes. Second set: sum = 664; mean = 664/5 = 132.8; sum of absolute deviations = 61.2; MAD = 61.2/5 = 12.24 minutes.

Find the mean absolute deviation. Round to the nearest hundredth if necessary. Then describe what the mean absolute deviation represents. Sum = 647; mean = 647/5 = 129.4; sum of absolute deviations = 92.4; MAD = 92.4/5 = 18.48 daily visitors. The MAD is large. The average distance of each data point from the mean is about 18.

Find the mean absolute deviation. Round to the nearest hundredth if necessary. Then describe what the mean absolute deviation represents. Sum = \$52.50; mean = \$52.50/6 = \$8.75; sum of absolute deviations = \$3.00; MAD = \$3.00/6 = \$0.50 difference in admission prices. The MAD is small. On average, each admission price is only \$0.50 away from the mean.

The table shows the height of waterslides at two different water parks. Find the mean absolute deviation for each set of data. Round to the nearest hundredth. Then write a few sentences comparing their variation. Splash Lagoon: sum = 448; mean = 448/5 = 89.6; sum of absolute deviations = 51.6; MAD = 51.6/5 = 10.32 feet. Wild Water Bay: sum = 583; mean = 583/5 = 116.6; sum of absolute deviations = 62.4; MAD = 62.4/5 = 12.48 feet.

The water slides at Splash Lagoon are closer together in terms of height. There is less variability in the height at Splash Lagoon when compared to Wild Water Bay.

Box and Whisker Plots

A box-and-whisker plot uses a number line to show the distribution of a data set.
To make a box-and-whisker plot, first divide the data into four equal parts using quartiles. The median, or 2nd quartile, divides the data into a lower half and an upper half.

The median of the lower half is the lower quartile, and the median of the upper half is the upper quartile.

Example: Use the data to make a box-and-whisker plot. Order the data from least to greatest. Calculate/determine the following: Median: 74. Lower Quartile: 68. Upper Quartile: 78. Lowest Value: 67. Greatest Value: 85.

Draw a box from the lower to the upper quartile. Inside the box, draw a vertical line through the median. Then draw the “whiskers” from the box to the least and greatest values. Be sure to title and label your graph.

22, 24, 27, 31, 35, 38, 42. Median: 31. Lower Quartile: 24. Upper Quartile: 38. Lowest Value: 22. Greatest Value: 42.

Measures of Variability

Measures of Variability: Describe how spread out a group of data is.
Measures of variability are range and interquartile range. Interquartile range (IQR): This is the difference between the upper quartile and the lower quartile.

22, 24, 27, 31, 35, 38, 42. What is the range for the above data set? 42 – 22 = 20. What is the interquartile range for the above data set? 38 – 24 = 14. Measures of variation are range, interquartile range, upper quartile, and lower quartile.
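The quartile method described in these slides (the median of the lower half and the median of the upper half) can be sketched in Python as follows. The function name `five_number_summary` and the dictionary keys are assumptions made for this illustration. Note that library routines such as `statistics.quantiles` use their own interpolation rules and may not always match the "median of each half" method.

```python
from statistics import median


def five_number_summary(data):
    """Quartiles by the 'median of each half' method, plus range and IQR."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # With an odd count, the middle value belongs to neither half.
    upper = values[half + 1:] if n % 2 else values[half:]
    q1, q3 = median(lower), median(upper)
    return {
        "min": values[0],
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "max": values[-1],
        "range": values[-1] - values[0],
        "iqr": q3 - q1,
    }


print(five_number_summary([22, 24, 27, 31, 35, 38, 42]))
```

For the data set above, the sketch gives a median of 31, quartiles of 24 and 38, a range of 20, and an IQR of 14.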

Practice

16, 19, 19, 23, 24, 25, 31, 37, 42, 46, 47. Median: 25. Lower Quartile: 19. Upper Quartile: 42. Lowest Value: 16. Greatest Value: 47. Range: 47 – 16 = 31. IQR: 42 – 19 = 23.

Halle’s basketball team scored the following points in their past games: 38, 42, 26, 32, 40, 28, 36, 27, 29, 30. List in order: 26, 27, 28, 29, 30, 32, 36, 38, 40, 42. Median: 31. Lower Quartile: 28. Upper Quartile: 38. Lowest Value: 26. Greatest Value: 42. Range: 42 – 26 = 16. IQR: 38 – 28 = 10. Plot title: Points Scored in a Game. Axis label: Points.

Kyle is helping your school librarian conduct a survey of how many books students read during the year. He gets the following results: 12, 24, 10, 12, 4, 35, 10, 8, 12, 15, 20, 18, 25, 21, and 9. List in order: 4, 8, 9, 10, 10, 12, 12, 12, 15, 18, 20, 21, 24, 25, 35. Median: 12. Lower Quartile: 10. Upper Quartile: 21. Lowest Value: 4. Greatest Value: 35. Range: 35 – 4 = 31. IQR: 21 – 10 = 11. Plot title: Books Students Read in a Year. Axis label: Number of Books.

Describe the center, shape, spread, and outliers of the distribution.
The typical student reads about 12 books. There is a slight right skew. The IQR is 11, so there is a lot of variability in the number of books read.

Ms. Carpenter asked each of her students to record how much time it takes them to get from school to home this afternoon. The next day, students came back with this data, in minutes: 15, 12, 5, 55, 6, 9, 47, 8, 35, 3, 22, 26, 46, 54, 17, 42, 43, 42, 15, 5. List in order: 3, 5, 5, 6, 8, 9, 12, 15, 15, 17, 22, 26, 35, 42, 42, 43, 46, 47, 54, 55. Median: 19.5. Lower Quartile: 8.5. Upper Quartile: 42.5. Lowest Value: 3. Greatest Value: 55. Range: 55 – 3 = 52. IQR: 42.5 – 8.5 = 34. Plot title: Time to Get Home from School. Axis label: Minutes.

Create a box-and-whisker plot for the data: 24, 20, 18, 25, 22, 32, 30, 29, 35, 30, 28, 24, 38. List in order: 18, 20, 22, 24, 24, 25, 28, 29, 30, 30, 32, 35, 38. Range: 38 – 18 = 20. 3rd Quartile: 31.

Comparing Populations

A double box plot consists of two box plots graphed on the same number line.
→ You can draw inferences about the two populations in the double box plot by comparing their centers and variations.

Ian surveyed a different group of students in his science and math classes. The double box plot shows the results for both classes. Compare their centers and variations. Write an inference you can draw about the two populations. What does the plot show? The number of times each class posted a blog this month. Is either plot symmetric? No. Which measure of center should you use to compare the data? Median (Math: 10; Science: 20). Which measure of variation should you use to compare the data? IQR (Math: 15; Science: 10). Which class posts more blogs? Science. Which class has a greater spread of data around the median? Math. Use the comparisons to write an inference: Science students posted more blogs than the math class. The median for science is twice the median for math. There is also a greater spread of data around the median for the math class than for the science class.

The double dot plot shows the daily high temperatures for two cities for thirteen days. Compare the centers and variations for the two populations. Write an inference you can draw about the two populations. Is either plot symmetric? No. Which measure of center should you use? Mean (Springfield: 81; Lake City: 84). Which measure of variation should you use? MAD (Springfield: 1.4; Lake City: 1.4). Use the comparisons to write an inference: Both cities have the same variation or spread around their means. Lake City has a greater mean temperature than Springfield.


The students at Dolan Middle School are competing in after-school activities in which they earn points for helping out around the school. Each team consists of the 30 students in a homeroom. Halfway through the competition, here are the scores from the students in two of the teams. The champion team is the one with the most points when the scores of the 30 students on the team are added. Which team would you rather be on? Explain. Is either plot symmetric? No. Which measure of center should you use? Median (Team 1: 100; Team 2: ≈115). Which measure of variation should you use? IQR (Team 1: ≈55; Team 2: ≈105). Use the comparisons to write an inference: Team 1 is more consistent and has fewer low scores. Although Team 2 has a slightly higher median, Team 1 is the better choice.

Which group has a larger interquartile range? Basketball.
Which group of players has more predictability in their height? Baseball. Its range and IQR are smaller, and its plot is also symmetric.

Which shoe store has a greater median? Sage’s.
Which shoe store has a greater interquartile range? Maroon’s. Which shoe store appears to be more predictable in the number of shoes sold per week? Sage’s; the range and IQR are smaller.

The table below shows the golf scores for two people. Make two box-and-whisker plots of the data on the same number line. Which golfer has the lower median score? Henry. Which golfer has the lesser interquartile range of scores? Trish. Which golfer appears to be more consistent? Trish; her range and IQR are smaller.

Now that was easy!
