Presentation is loading. Please wait.

Presentation is loading. Please wait.

MAT 110 Workshop Created by Michael Brown, Haden McDonald & Myra Bentley for use by the Center for Academic Support.

Similar presentations


Presentation on theme: "MAT 110 Workshop Created by Michael Brown, Haden McDonald & Myra Bentley for use by the Center for Academic Support."— Presentation transcript:

1 MAT 110 Workshop Created by Michael Brown, Haden McDonald & Myra Bentley for use by the Center for Academic Support

2 Unit 3: Statistics Introduction
Statistics is an area of mathematics in which we are interested in gathering, organizing, analyzing, and making predictions from numerical information called data. The set of all items under consideration is called the population. Often only a sample or subset of the population is considered. We will describe a sample as biased if it does not accurately reflect the population as a whole with regard to the data that we are gathering. Bias could occur because of the way in which we decide how to choose the people to participate in the survey. This is called selection bias. Another issue that can affect the reliability of a survey is the way we ask the questions, which is called leading-question bias.

3 Definitions Mean: The average in a set of data.
Median: The middle number in an ordered list. If there are two middles, the median is the average of those two. Mode: The number(s) that appears the most frequently in a data set. Range: The difference between the largest and smallest values. Standard Deviation: An average measure of how far each data point is from the mean. Normal Distribution: A very common distribution that describes many real life values. The symmetric Bell curve. Mean is typically denoted x-bar or mu Generally, if there are more than two numbers that appear the most frequently there is considered to be no mode in a set of data. Range = (highest) - (lowest) Standard Deviation is usually denoted sigma or s In a normal distribution the mean is in the middle of the bell curve, and is considered to be equal to the mode and median. Z-Score (for a data value, x) = (x - mu) / sigma Confidence Intervals are typically thought of as a “95% confidence interval” where 95% is the likelihood the mean is within the interval, and the extra 5% is the margin of error. Z-Score: The number of standard deviations a value is from the mean. Confidence Interval: A range that is 'likely' to contain the actual mean of a data set. Usually associated with a margin of error. Margin of error: The likelihood that a confidence interval does NOT contain the mean of a data set.

4 Example Calculate the mode, mean, and median of the following data:
12, 11, 6, 24, 11, 9, 15, 11

5 Example 6,9,11,11,11,12,15,24 Mode = 11 because it appears 3 times while all the other numbers only appear once. Mean = 99/8 because =99 and there are 8 numbers Median = 11 because 11 is in the middle of the numbers when placed smallest to largest

6 Frequency Tables A frequency table is a table that shows the total for each category or group of data. •Example: 25 viewers evaluated the latest episode of CSI. The possible evaluations are: (E)xcellent, (A)bove average, a(V)erage, (B)elow average, (P)oor. After the show, the 25 evaluations were as follows: A, V, V, B, P, E, A, E, V, V, A, E, P, B, V, V, A, A, A, E, B, V, A, B, V Construct a frequency table and a relative frequency table for this list of evaluations. A set of data listed with their frequencies is a frequency distribution. We often present a frequency distribution as a frequency table where we list the values in one column and the frequencies of the values in another column. We show the percent of the time that each item occurs in a frequency distribution using a relative frequency distribution. All problems and images came from “Mathematics All Around” What a Data Set Tells Us from Pearson Education, Inc.

7 Frequency Tables After the show, the 25 evaluations were as follows:
E, E, E, E, A, A, A, A, A, A, A, V, V, V, V, V, V, V, V, B, B, B, B, P, P We construct a relative frequency distribution for these data by dividing each frequency in the table by 25. A bar graph is one way to visualize a frequency distribution. In drawing a bar graph, we specify the classes on the horizontal axis and the frequencies on the vertical axis. Bar graphs are used for discrete values like the ratings above, while a histogram is used for continuous values. All problems and images came from “Mathematics All Around” What a Data Set Tells Us from Pearson Education, Inc.

8 Representing Data Visually
Solution: The bar graph for the relative frequency is shown below.

9 Frequency Tables •Example: Suppose 40 health care workers take an AIDS awareness test and earn the following scores: 79, 62, 87, 84, 53, 76, 67, 73, 82, 68, 82, 79, 61, 51, 66, 77, 78, 66, 86, 70, 76, 64, 87, 82, 61, 59, 77, 88, 80, 58, 56, 64, 83, 71, 74, 79, 67, 79, 84, 68 Construct a frequency table and a relative frequency table for these data. Frequency tables can be constructed from discrete ratings as in the previous example, but can also be constructed from ranges of continuous values. All problems and images came from “Mathematics All Around” What a Data Set Tells Us from Pearson Education, Inc.

10 79, 62, 87, 84, 53, 76, 67, 73, 82, 68, 82, 79, 61, 51, 66, 77, 78, 66, 86, 70, 76, 64, 87, 82, 61, 59, 77, 88, 80, 58, 56, 64, 83, 71, 74, 79, 67, 79, 84, 68 Frequency Tables Here an interval of 5 was chosen for each range. It is important that the ranges do not overlap. All problems and images came from “Mathematics All Around” What a Data Set Tells Us from Pearson Education, Inc.

11 Representing Data Visually
A variable quantity that cannot take on arbitrary values is called discrete. Other quantities, called continuous variables, can take on arbitrary values. The number of children in a family is an example of a discrete variable. Weight is an example of a continuous variable. We use a special type of bar graph called a histogram to graph a frequency distribution when we are dealing with a continuous variable quantity or a variable quantity that is discrete, but has a very large number of different possible values.

12 Representing Data Visually
Example: A clinic has the following data regarding the weight lost by its clients over the past 6 months. Draw a histogram for the relative frequency distribution for these data. (continued on next slide)

13 Representing Data Visually
Solution: We first find the relative frequency distribution. (continued on next slide)

14 Representing Data Visually
Draw the histogram exactly like a bar graph except that we do not allow spaces between the bars.

15 Stem and Leaf Display Example: The following are the number of home runs hit by the home run champions in the National League for the years 1975 to 1989 and for 1993 to 2007. 1975–1989: 38, 38, 52, 40, 48, 48, 31, 37, 40, , 37, 37, 49, 39, 47 1993–2007: 46, 43, 40, 47, 49, 70, 65, 50, 73, , 47, 48, 51, 58, 50 Compare these home run records using a stem-and-leaf display. (continued on next slide)

16 Stem and Leaf Display 1975 to 1989 1993 to 2007
Solution: In constructing a stem-and- leaf display, we view each number as having two parts. The left digit is considered the stem and the right digit the leaf. For example, 38 has a stem of 3 and a leaf of 8. 1975 to 1989 1993 to 2007 (continued on next slide)

17 Stem and Leaf Display We can compare these data by placing these two displays side by side as shown below. Some call this display a back-to-back stem-and-leaf display. It is clear that the home run champions hit significantly more home runs from 1993 to 2007 than from 1975 to 1989.

18 The Mean and the Median We use the Greek letter Σ (capital sigma) to indicate a sum. For example, we will write the sum of the data values 7, 2, 9, 4, and 10 by Σx = We represent the mean of a sample of a population by x (read as “x bar”), and we will use the Greek letter μ (lowercase mu) to represent the mean of the whole population.

19 The Mean and the Median Example: A car company has been studying its safety record at a factory and found that the number of accidents over the past 5 years was 25, 23, 27, 22, and 26. Find the mean annual number of accidents for this 5-year period. Solution: We add the number of accidents and divide by 5.

20 The Mean and the Median Example: The water temperature at a point downstream from a plant for the last 30 days is summarized in the table. What is the mean temperature for this distribution? (continued on next slide)

21 The Mean and the Median The mean is
Solution: A third column is added to the table that contains the products of the raw scores and their frequencies. The mean is

22 The Mean and the Median

23 The Mean and the Median Example: Listed are the yearly earnings of some celebrities. a) What is the mean of the earnings of the celebrities on this list? b) Is this mean an accurate measure of the “average” earnings for these celebrities? (continued on next slide)

24 The Mean and the Median Solution (a): Summing the salaries and dividing by 10 gives us Solution (b): Eight of the celebrities have earnings below the mean, whereas only two have earnings above the mean. The mean in this example does not give an accurate sense of what is “average” in this set of data because it was unduly influenced by higher earnings.

25 The Mean and the Median

26 The Mean and the Median Example: The table lists the ages at inauguration of the presidents who assumed office between 1901 and Find the median age for this distribution. Solution: We first arrange the ages in order to get There are 17 ages. The middle age is the ninth, which is 55.

27 The Mean and the Median Example: Fifty 32-ounce quarts of a particular brand of milk were purchased and the actual volume determined. The results of this survey are reported in the table. What is the median for this distribution?

28 The Mean and the Median Solution: Because the 50 scores are in increasing order, the two middle scores are in positions 25 and 26. We see that 29 ounces is in position 25 and 30 ounces is in position 26. The median for this distribution is

29 Five Number Summary All problems and images came from “Mathematics All Around” What a Data Set Tells Us from Pearson Education, Inc.

30 The Five Number Summary
Example: Consider the list of ages of the presidents from a previous example: 42, 43, 46, 51, 51, 51, 52, 54, 55, 55, 56, 56, 60, 61, 61, 64, 69. Find the following for this data set: a) the lower and upper halves b) the first and third quartiles c) the five-number summary (continued on next slide)

31 The Five Number Summary
Solution: (a): Finding the median, we can identify the lower and upper halves. (b): The median of the lower half is The median of the upper half is (continued on next slide)

32 The Five Number Summary
(c): The five number summary is We represent the five-number summary by a graph called a box-and-whisker plot. (continued on next slide)

33 Example Find the five-number summary for the following 10 values:
40, 37, 32, 28, 27, 24, 22, 34, 19, 36 Find the minimum: Find Q1: Find the median: Find Q3: Find the maximum:

34 Example 19,22,24,27,28,32,34,36,37,40 minimum: 19 because that is the smallest number Q1: 24 because it is in the middle of the minimum and median median: 30 because it is in the very middle of the numbers. ((28+32)/2=30) Q3: 36 because it is in the middle of the median and the maximum maximum: 40 because it is the largest number

35 The Five Number Summary

36 The Five Number Summary
Example: Find the mode for each data set. a) 5, 5, 68, 69, 70 b) 3, 3, 3, 2, 1, 4, 4, 9, 9, 9 c) 98, 99, 100, 101, 102 d) 2, 3, 4, 2, 3, 4, 5 Solution: a) The mode is 5. b) There are two modes: 3 and 9. In c) and d) there is no mode.

37 Comparing Measures of Central Tendency
Example: Assume that you are negotiating the contract for your union. You have gathered annual wage data and found that three workers earn $30,000, five workers earn $32,000, three workers earn $44,000, and one worker earns $50,000. In your negotiations, which measure of central tendency should you emphasize?

38 Comparing Measures of Central Tendency
Solution: Mode: $32,000 Median: $32,000 Mean: The mean is $36,000. To make the salaries appear as low as possible, you would want to use the mode and median.

39 The Range of a Data Set

40 Standard Deviation

41 Standard Deviation

42 Standard Deviation

43 Standard Deviation Example: A company has hired six interns. After 4 months, their work records show the following number of work days missed for each worker: 0, 2, 1, 4, 2, 3 Find the standard deviation of this data set. Solution: Mean: (continued on next slide)

44 Standard Deviation We calculate the squares of the deviations of the data values from the mean. Standard Deviation:

45 Standard Deviation

46 Standard Deviation Example: The following are the closing prices for a stock for the past 20 trading sessions: 37, 39, 39, 40, 40, 38, 38, 39, 40, 41, 41, 39, 41, 42, 42, 44, 39, 40, 40, 41 What is the standard deviation for this data set? Solution: Mean: (sum of the closing prices is 800) (continued on next slide)

47 Standard Deviation We create a table with values that will facilitate computing the standard deviation. Standard Deviation:

48 Standard Deviation Comparing Standard Deviations
All three distributions have a mean and median of 5; however, as the spread of the distribution increases, so does the standard deviation.

49 The Normal Distribution
The normal distribution describes many real-life data sets. The histogram shown gives an idea of the shape of a normal distribution.

50 The Normal Distribution

51 The Normal Distribution
We represent the mean by μ and the standard deviation by σ.

52 The Normal Distribution
Example: Suppose that the distribution of scores of 1,000 students who take a standardized intelligence test is a normal distribution. If the distribution’s mean is 450 and its standard deviation is 25, a) how many scores do we expect to fall between 425 and 475? b) how many scores do we expect to fall above 500? (continued on next slide)

53 The Normal Distribution
Solution (a): 425 and 475 are each 1 standard deviation from the mean. Approximately 68% of the scores lie within 1 standard deviation of the mean. We expect about 0.68 × 1,000 = 680 scores are in the range 425 to 475. (continued on next slide)

54 The Normal Distribution
Solution (b): We know 5% of the scores lie more than 2 standard deviations above or below the mean, so we expect to have 0.05 ÷ 2 = of the scores to be above 500. Multiplying by 1,000, we can expect that * 1,000 = 25 scores to be above 500.

55 Quartile Problem The scores of students on an exam are normally distributed with a mean of 516 and a standard deviation of 36. A) What is the first quartile score for this exam? B) What is the third quartile score for this exam?

56 Quartile Problem The Quartiles have 25% of the data on either side so we can use the area to find the Z-Score which is A) x= -0.67* so the first quartile is at B) x = 0.67* so the third quartile is at

57 z-Scores The standard normal distribution has a mean of 0 and a standard deviation of 1. There are tables (see next slide) that give the area under this curve between the mean and a number called a z-score. A z-score represents the number of standard deviations a data value is from the mean. For example, for a normal distribution with mean 450 and standard deviation 25, the value 500 is 2 standard deviations above the mean; that is, the value 500 corresponds to a z-score of 2.

58 z-Scores Below is a portion of a table that gives the area under the standard normal curve between the mean and a z-score.

59 z-Scores a) between z = 0 and z = 1.3 b) between z = 1.5 and z = 2.1
Example: Use a table to find the percentage of the data (area under the curve) that lie in the following regions for a standard normal distribution: a) between z = 0 and z = 1.3 b) between z = 1.5 and z = 2.1 c) between z = 0 and z = –1.83 (continued on next slide)

60 z-Scores Solution (a): The area under the curve between z = 0 and z = 1.3 is shown. Using a table we find this area for the z-score We find that A is 0.403 when z = We expect 40.3%, of the data to fall between 0 and 1.3 standard deviations above the mean. (continued on next slide)

61 z-Scores Solution (b): The area under the curve between z = 1.5 and z = 2.1 is shown. We first find the area from z = 0 to z = 2.1 and then subtract the area from z = 0 to z = 1.5. Using a table we get A = when z = 2.1, and A = when z = 1.5. The area is – = or 4.9% (continued on next slide)

62 z-Scores Solution (c): Due to the symmetry of the normal distribution, the area between z = 0 and z = –1.83 is the same as the area between z = 0 and z = 1.83. Using a table, we see that A = when z = Therefore, 46.6% of the data values lie between 0 and –1.83.

63 Converting Raw Scores to z-Scores

64 Converting Raw Scores to z-Scores
Example: Suppose the mean of a normal distribution is 20 and its standard deviation is 3. a) Find the z-score corresponding to the raw score 25. b) Find the z-score corresponding to the raw score 16. (continued on next slide)

65 Converting Raw Scores to z-Scores
Solution (a): We have We compute (continued on next slide)

66 Converting Raw Scores to z-Scores
Solution (b): We have We compute

67 Applications Example: Suppose you take a standardized test. Assume that the distribution of scores is normal and you received a score of 72 on the test, which had a mean of 65 and a standard deviation of 4. What percentage of those who took this test had a score below yours? Solution: We first find the z-score that corresponds to 72. (continued on next slide)

68 Applications Using a table, we have that A = when z = The normal curve is symmetric, so another 50% of the scores fall below the mean. So, there are 50% + 46% = 96% of the scores below 72. (continued on next slide)

69 Applications 1911: Ty Cobb hit .420. Mean average was .266
Example: Consider the following information: 1911: Ty Cobb hit Mean average was .266 with standard deviation 1941: Ted Williams hit Mean average was .267 with standard deviation 1980: George Brett hit Mean average was .261 with standard deviation Assuming normal distributions, use z-scores to determine which of the three batters was ranked the highest in relationship to his contemporaries. (continued on next slide)

70 Applications Ty Cobb’s average of .420 corresponded to a z-score of
Solution: Ty Cobb’s average of .420 corresponded to a z-score of Ted Williams’s average of .406 corresponded to a z-score of George Brett’s average of .390 corresponded to a z-score of Compared with his contemporaries, Ted Williams ranks as the best hitter.

71 Applications Example: A manufacturer plans to offer a warranty on an electronic device. Quality control engineers found that the device has a mean time to failure of 3,000 hours with a standard deviation of 500 hours. Assume that the typical purchaser will use the device for 4 hours per day. If the manufacturer does not want more than 5% to be returned as defective within the warranty period, how long should the warranty period be to guarantee this? (continued on next slide)

72 Applications Solution: We need to find a z-score such that at least 95% of the area is beyond this point. This score is to the left of the mean and is negative. By symmetry we find the z-score such that 95% of the area is below this score. (continued on next slide)

73 Applications 50% of the entire area lies below the mean, so our problem reduces to finding a z-score greater than 0 such that 45% of the area lies between the mean and that z-score. If A = 0.450, the corresponding z-score is % of the area underneath the standard normal curve falls below z = By symmetry, 95% of the values lie above –1.64. Since , we obtain (continued on next slide)

74 Applications Solving the equation for x, we get
Owners use the device about 4 hours per day, so we divide 2,180 by 4 to get 545 days. This is approximately 18 months if we use 31 days per month. The warranty should be for roughly 18 months.

75 Find the z-score such that:
Right and Left Z-Score Find the z-score such that: A) The area under the standard normal curve to its left is 0.518 B) The area under the standard normal curve to its left is C) The area under the standard normal curve to its right is D) The area under the standard normal curve to its right is

76 Right and Left Z-Score A) = look this up in the table to get 0.04 B) = look this up in the table to get 0.91 C) = look this up in the table to get 0.56 D) = look this up in the table to get 0.36

77 Practice Problem Length of skateboards in a skateshop are normally distributed with a mean of 30.9 in and a standard deviation of 1 in. The figure below shows the distribution of the length of skateboards in a skateshop. Calculate the shaded area under the curve. Express your answer in decimal form with at least two decimal place accuracy.

78 Practice Problem There is an area of .475 on the right side of the curve but we must use the Z-Score formula to find the area on the left side. z = − = -.67 The area at Z-Score of -.67 is So the total area is = .7236

79 Confidence Intervals A level C confidence interval is a range that is C% likely to contain the population mean of a set of data based on a sample mean (a 95% confidence interval based on sample data would be 95% likely to contain the population mean that the sample came from). The formula for the lower and upper bounds of a confidence interval is: Where the term on the left is the sample average, and the term on the right is referred to as the margin of error. All problems and images came from “Mathematics All Around” What a Data Set Tells Us from Pearson Education, Inc.

80 Confidence Intervals •Example: Suppose that the distribution of scores of 100 students who take a standardized intelligence test is a normal distribution. If the distribution’s sample mean is 90 and its standard deviation is 10, what is a 95% confidence interval for the population mean? Here, the z-score is related to an area equal to half the confidence, i.e. z is related to .95/2 = .475. Locating this area in a z-score table will yield that z = 1.96. The left end of the interval is: * 10 / sqrt(100) = 88.04 The right end of the interval is: * 10 / sqrt(100) = 91.96 So the 95% confidence interval is (88.04, 91.96), OR we are 95% confident the population mean is between and All problems and images came from “Mathematics All Around” What a Data Set Tells Us from Pearson Education, Inc.

81 Critical Values Z* To find the critical Z* value you need to get the confidence interval in decimal form. After than you divide the confidence interval by 2 to get the area. Use your chart to find the area closest to your value and that is your critical Z*

82 Critical Z Problem Find the critical z* for a level 51 % confidence interval.

83 Your Critical Z* value is 0.69
Critical Z* Solution 51/100 = .51 .51/2 = .255 Find the closest value to .255 (which is .2549). Your Critical Z* value is 0.69

84 Representing Data Exercises
Create a frequency and relative frequency table for the following set of numbers. Calculate the mean, median, mode, and standard deviation of the data. 78 70 92 81 75 90 74 99 76 79 80 86 96

85 Representing Data Solutions
Create a frequency and relative frequency table for the following set of numbers. Calculate the mean, median, mode, and standard deviation of the data. Mean = 82 Median = 79 Mode = 76, 78 Std. Dev. = 8.61 Range Frequency Relative Frequency 70-74 2 (70, 74) 2/15 = .1333 75-79 6 (75, 76, 76, 78, 78, 79) 6/15 = .4 80-84 2 (80, 81) 85-89 1 (86) 1/15 = .0667 90-94 2 (90, 92) 95-100 2 (96, 99) Construct the table, preferably using range, here ranges of 5 were chosen. The numbers in parenthesis are just to verify that the frequencies are correct and do not need to be there. Since 15 numbers were chosen, for relative frequency divide each frequency by 15. Calculating mean, by the formula, yields Mean = 1230 / 15 = 82 The median must be the 8th number, as there are 15 numbers. The 8th number is 79. 76 and 78 both show up twice while every other number shows up once. The modes are 76 and 78. The std. dev is more involved, but by the formula yields 8.61

86 Normal Distribution Exercises
Suppose 200 students took a test, and their scores were approximately normally distributed. The mean of the test scores was 82 and the standard deviation was 9. How many students got at least a 73? How many students got more than 95? What would a 95% confidence interval for this population be?

87 Normal Distribution Solutions
200 students took a test. The mean was 82 and the std. dev. was 9. (a) How many students got at least a 73? (b) More than 95? (c) What would a 95% confidence interval for this population be? .84 or 84% (168 students) .075 or 7.5% (15 students) (80.75, 83.25) 73 is one standard deviation from the mean, so 34% of the students should have scored between 73 and 82. Since the data is normally distributed, half the students should have scored higher than the mean, 82. So 34% scored between 73 and 82, and 50% scored better than % of students (or 168 students) scored better than 73. Calculate a z-score, where z = ( )/9 = 1.44, look up the associated area for 1.44, where area = Note this area is between the mean and the score, for students who scored more than 95, take .5 - area = or 7.49% (or nearly 15 students). For the confidence interval, work backwards from 95% or, .95. Divide it by 2 = .475, find the z-score corresponding to an area of .475, z = Now construct the interval. Left point = * 9 / sqrt(200) = Right point = * 9 / sqrt(200) = The interval is (80.75, 83.25).


Download ppt "MAT 110 Workshop Created by Michael Brown, Haden McDonald & Myra Bentley for use by the Center for Academic Support."

Similar presentations


Ads by Google