Presentation on theme: "Statistical Reasoning For Communication Majors." Presentation transcript:
Mean ► This is a common statistic, and it’s simple. ► When we refer to the “average,” this is usually what we mean. ► Add the values and divide by the number of values you have.
Mean Example ► A weekly newspaper has seven employees. What’s the mean salary? Here are their salaries: ► Editor -- $37,000 ► Assistant Editor -- $32,000 ► Reporter -- $28,000 ► Ad Sales Manager -- $38,000 ► Ad Sales Agent -- $31,000 ► 2 Circulation People -- $22,000 each
Mean Example ► Calculation: Add 37,000 + 32,000 + 28,000 + 38,000 + 31,000 + 22,000 + 22,000 = 210,000. Then divide by 7: 210,000 / 7 = 30,000, so the mean salary is $30,000. ► NOTE: The mean can be deceptive if there is a wide spread in the numbers. For example, if the editor and ad sales manager made $60,000 each, the sales agent made $40,000, and each of the other four workers made $12,500, the mean would still be $30,000, but the picture of the typical salary at the newspaper would be much different.
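To check the arithmetic, here is a minimal Python sketch of the mean calculation. The function name `mean_salary` and the list names are made up for this illustration; the numbers come straight from the two salary scenarios above.

```python
# Salaries from the newspaper example.
actual = [37_000, 32_000, 28_000, 38_000, 31_000, 22_000, 22_000]

# The "wide spread" scenario from the note: editor and ad sales manager at
# $60,000, sales agent at $40,000, the other four workers at $12,500 each.
wide_spread = [60_000, 60_000, 40_000, 12_500, 12_500, 12_500, 12_500]


def mean_salary(salaries):
    """Add the values and divide by the number of values."""
    return sum(salaries) / len(salaries)


print(mean_salary(actual))       # 30000.0
print(mean_salary(wide_spread))  # 30000.0
```

Both lists give the same mean even though only the first one has salaries anywhere near $30,000.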
Median ► The median means the middle. ► It is the value in the dead center of the list of values when they are lined up from largest to smallest. ► It represents the typical person or group. For example, if we say “the average household” or “the average worker,” then what we are usually looking for is the median, as in “ordinary” or “typical.” We aren’t really talking about the arithmetic “average,” or mean.
Median Example ► Consider the newsroom salaries used in the previous example lined up from largest to smallest: 38,000, 37,000, 32,000, 31,000, 28,000, 22,000, 22,000. ► The salary in the middle, the “median,” is $31,000. ► If there is an even number of values, the halfway point lies between two numbers. Split the difference: add the two middle values and divide by 2.
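A short Python sketch of the same idea, using the standard library's `statistics.median`. The eighth salary of $30,000 is a hypothetical addition, included only to show what happens when the list has an even number of values.

```python
from statistics import median

# Newsroom salaries from the example (order does not matter to median()).
salaries = [38_000, 37_000, 32_000, 31_000, 28_000, 22_000, 22_000]
print(median(salaries))  # 31000

# Hypothetical: add an eighth employee at $30,000. Now the halfway point
# falls between 31,000 and 30,000, so the median splits the difference.
even_case = salaries + [30_000]
print(median(even_case))  # 30500.0
```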
Percent Change ► If the city increased parking fines from $10 to $15, by what percentage did the fines increase? ► This is simple, too. Subtract the old value from the new value (15 - 10 = 5), then divide by the old value (5/10 = 0.5). Multiply the result by 100 (0.5 x 100 = 50 percent) and that’s the percent change. ► 15 - 10 = 5; 5/10 = 0.5; 0.5 x 100 = 50 percent.
Tax Example ► If the average property tax bill increased by $2,000 a year (we’re using the median as the “average” here), what is the percent change? ► New value = $10,000 ► Old value = $8,000 ► 10,000 – 8,000 = 2,000 ► 2,000/8,000 = 0.25 ► 100 x 0.25 = 25 percent ► So the percent change is +25 percent
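The percent-change recipe can be sketched as one small Python function. The name `percent_change` is an assumption made for this illustration; the inputs are the parking fine and property tax figures from the two examples.

```python
def percent_change(old, new):
    """Subtract old from new, divide by old, multiply by 100."""
    return (new - old) / old * 100


print(percent_change(10, 15))          # 50.0  (parking fines)
print(percent_change(8_000, 10_000))   # 25.0  (property tax)
```

Note that the old value always goes in the denominator. Dividing by the new value instead is a common mistake and gives a different answer.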
Per capita, Rates and Comparisons ► Per capita refers to the rate per person. It helps make comparisons among large groups, like cities. ► To get per capita, simply divide the number of incidents by the number of people. ► A Southern city with a population of 450,000 experienced 16 murders during 2009. What is the city’s murder rate per 100,000 population? ► 450,000/100,000 = 4.5; 16/4.5 = 3.56, or about 3.6 per 100,000
Per capita example ► If a city has a population of 600,000 and experiences 12 murders a year, the per capita murder rate would be 12 divided by 600,000. ► To avoid tiny decimals, divide 600,000 by 100,000 and report the rate as a number per 100,000 population. ► 600,000/100,000 = 6; 12/6 = 2, so the murder rate is 2 per 100,000 people. ► You can also find the percent change of the per capita rate over time to discover the trend in the murder rate.
Comparison Example ► Suppose you want to know how dangerous the city is compared to other cities. Our example city has a population of 600,000 with 12 murders. A nearby city has a population of 26,000 and 4 murders. Which is more dangerous? Find the per capita murder rate of each to know. ► Per capita rate for City 1 is 2 per 100,000; per capita rate for City 2 is 4 per 26,000. City 2 is more dangerous because it has 15.4 murders per 100,000 people (26,000/100,000 = 0.26; 4/0.26 = 15.38) compared to City 1’s 2 murders per 100,000.
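Here is a rough Python sketch of the per-100,000 comparison. The function name `rate_per_100k` is made up for this illustration; the populations and murder counts are the ones from the two example cities.

```python
def rate_per_100k(incidents, population):
    """Incidents per person, scaled up to a rate per 100,000 people."""
    return incidents / population * 100_000


city1 = rate_per_100k(12, 600_000)
city2 = rate_per_100k(4, 26_000)

print(round(city1, 1))  # 2.0
print(round(city2, 1))  # 15.4
print("City 2 has the higher rate" if city2 > city1 else "City 1 has the higher rate")
```

Multiplying the per-person rate by 100,000 gives the same answer as dividing the population by 100,000 first, which is the way the slides do it.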
Standard Deviation ► In most situations, most people or values will group toward the middle. ► Those that don’t are different. ► If many group outside the middle, then that tells you something about the situation – it tells you that whatever you’re looking at isn’t expected.
Standard Deviation ► For normal situations, the “curve” will look bell-shaped. [Figure: bell-shaped curve]
Standard Deviation ► Most healthy women will eat between 1,700 and 2,000 calories a day. If you plot how many calories women eat, each woman’s intake will be one value. Plot them on a sheet of paper along a line and most of the values (number of calories) will land in the middle of the spread. That will be what is called a “normal distribution.” [Figure: Normal distribution]
Standard Deviation ► In a normal distribution, about 68% of the women will gather in the middle. They are within “one standard deviation” of the middle on either side. (The blue area on the graph.)
► Values within two standard deviations of the middle will account for about 95%. (The blue areas and the brown areas.) ► So, 95% of the values in most situations will be considered “normal.” However, all but the middle 68% will be somewhat abnormal, but not excessively abnormal.
► Values within three standard deviations of the middle will account for about 99% of the values. (The blue, brown, and green areas.) The values in the green areas are more abnormal, but we expect about 4% of values to fall into these areas, because life is not perfect.
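The percentages quoted above are rounded. As a sketch, the exact shares for an ideal normal distribution can be computed with the standard library's `math.erf`; the function name `share_within` is an assumption made for this illustration.

```python
import math


def share_within(k):
    """Share of a normal distribution within k standard deviations of the middle."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

Real data rarely match these figures exactly; they describe the idealized bell curve.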
Standard Deviation ► If a scientific study finds that about 99% of the values fall within three standard deviations, then the data follow the normal pattern, which is one sign that the conclusions can be trusted. ► A good public opinion survey, for example, that concludes Americans support the President’s policies is more believable if the values (support for the president) fall in a normal bell curve with most of the people saying they support the policies.
► But what about the situations where the values don’t fall in a normal bell curve?
► Then you have untrustworthy results, or at least you know that more values than you would expect don’t fit the normal pattern. In the graph at top, most of the values fell to the left of center. In other words, most of the values are outside the normal range.
Margin of Error ► Margin of Error deserves better than the throwaway line it gets in the bottom of stories about polling data. Writers who don't understand margin of error, and its importance in interpreting scientific research, can easily embarrass themselves and their news organizations.
Margin of Error ► The margin of error describes what statisticians call a confidence interval: the range around a poll result in which the true value probably lies. The math behind it is much like the math behind the standard deviation. So you can think of the margin of error at the 95 percent confidence level as being equal to about two standard deviations in your polling sample. Occasionally you will see surveys with a 99 percent confidence level, which would correspond to nearly 3 standard deviations (about 2.6) and a larger margin of error, because the more certain you want to be that the range captures the true value, the wider the range has to be.
Margin of Error ► Let’s consider a particular week's poll as a repeat of the previous week's. In the first week, Candidate A received support from 57% of those polled. Candidate B received 43%, a 14 point difference. In the second week, Candidate A received 53% support and Candidate B received 47%, a 6 point difference. Both polls had a margin of error of 4 points. So, is Candidate B gaining on Candidate A? ► No. Statistically, there is no change from the previous week's poll. Candidate B has made up no measurable ground on Candidate A because the movement for both candidates is within the 4 point margin of error.
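The reasoning in this example can be sketched as a simple check in Python. This mirrors the slide's rule of thumb only; it is not a formal significance test, and the function name `within_margin` is made up for this illustration.

```python
def within_margin(old_pct, new_pct, margin):
    """True when the week-to-week move is no bigger than the margin of error."""
    return abs(new_pct - old_pct) <= margin


# Candidate A: 57% -> 53%; Candidate B: 43% -> 47%; margin of error: 4 points.
print(within_margin(57, 53, 4))  # True
print(within_margin(43, 47, 4))  # True
```

Both moves are 4 points, which is no larger than the margin of error, so neither change counts as measurable movement.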
Questions Journalists Should Ask ► Where did the data come from? Always ask this one first. You always want to know who did the research that created the data you're going to write about. Just because a report comes from a group with a vested interest in its results doesn't guarantee the report is a sham. But you should always be extra skeptical when looking at research generated by people with a political agenda. At the least, they have plenty of incentive NOT to tell you about data they found that contradict their organization's position.
Questions ► Have the data been peer-reviewed? If they have, you know that the data you'll be looking at are at least minimally reliable because other researchers have given their blessing to the data. If they haven’t, that’s a sign that the data might not be valid.
Questions ► How were the data collected? This one is really important to ask, especially if the data were not peer-reviewed. If the data come from a survey, for example, you want to know that the people who responded to the survey were selected at random.
Questions ► Be skeptical when dealing with comparisons. Researchers like to do something called a "regression," a process that compares one thing to another to see if they are statistically related. They will call such a relationship a "correlation." Always remember that a correlation DOES NOT mean causation.
Questions ► Finally, be aware of numbers taken out of context. Again, data that are "cherry picked" to look interesting might mean something else entirely once they are placed in a different context.
Survey Sample Sizes ► The population of a study is everyone who could have been included. For a national poll, then, the population would include every adult in the U.S., a number that would be impractical to poll. Instead, researchers take a random sample. The larger the sample, the more likely it will be representative of the population. But a sample of 400 is usually good enough for most surveys. Most national polls, though, survey 1,500 to 2,500 people. As a rule of thumb at the 95 percent confidence level, the margin of error in a sample = 1 divided by the square root of the number of people in the sample
Survey Sample Sizes ► Applying the formula for margin of error: ► In a survey of 2,500 people, the square root is 50. So, 1/50 = 0.02, or 2 percentage points. ► In a survey of 400 people, the square root is 20. So, 1/20 = 0.05, or 5 percentage points. ► This shows the margin of error increases significantly as the number surveyed decreases.
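The sample-size formula is easy to try out in Python. This is a minimal sketch of the rule of thumb only (1 divided by the square root of the sample size); the function name `margin_of_error` is an assumption made for this illustration.

```python
import math


def margin_of_error(sample_size):
    """Rule-of-thumb margin of error: 1 / square root of the sample size."""
    return 1 / math.sqrt(sample_size)


for n in (2_500, 400):
    print(n, margin_of_error(n))
# 2500 0.02
# 400 0.05
```

Because of the square root, cutting the margin of error in half takes roughly four times as many respondents.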
Picking the Right Statistical Test ► There are different kinds of stats tests and the correct one will be the one that provides the best answers based on the type of data you have collected. ► It is best to enlist the help of a statistics pro to analyze your data. ► You can also use SPSS, a computer program that conducts the statistical computations for you when you enter the data. So by knowing what type of test to run, you can enter the data into SPSS and run the test.
Use of Statistics ► Statistical tests allow researchers to find out whether their findings are “significant.” That is, what is the probability that what we think is a relationship between two variables is really just a chance occurrence? The lower the probability of chance, the more believable the results. ► Researchers hypothesize. They write a statement that they believe will be true from the data they collect. They base this on previous research and on common sense. Then, they write the “null hypothesis.” The null is the opposite of the hypothesis the researcher has chosen: it says there is no relationship or no difference. The statistical tests are done to test whether the null hypothesis holds up. If the data make the null very unlikely, the researcher rejects it, and that supports the researcher’s hypothesis.
Use of Statistics ► Researchers use statistics to determine how likely it is that their results are due to chance. They usually want a significance level of .05, and it is written: p < .05. That means there is less than a 5 percent probability that a result this strong would show up by chance alone. (In other words, if there were truly no relationship and the study were repeated 100 times, only about 5 of those studies would be expected to show a result this strong.) That means the results are pretty reliable.
ANOVA ► Most common statistical test: Analysis of Variance (ANOVA) is a statistical technique that is used to compare the means of more than two groups. There are One-way ANOVA (one dependent variable and one independent variable) and Two-way ANOVA (one dependent and two independent variables). [Note about variables: the dependent variable (say, choice of candidate) is what will be affected by the question or the experiment; the independent variables are controlled by the researcher (say, choosing gender or income as factors that affect the dependent variable – choice of candidate).] ► Use the ANOVA test only if you are comparing data from at least 3 groups.
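To give a feel for what a one-way ANOVA computes, here is a rough Python sketch of its F statistic: the variation between group means divided by the variation within groups. The function name `one_way_anova_f` and the three groups of scores are hypothetical, made up for this illustration. A program such as SPSS would also report the p-value, which this sketch leaves out.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA on a list of groups of values."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of values

    ss_between = 0.0  # how far each group mean sits from the grand mean
    ss_within = 0.0   # how far each value sits from its own group mean
    for group in groups:
        group_mean = mean(group)
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        ss_within += sum((value - group_mean) ** 2 for value in group)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical scores from three groups of three people each.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(one_way_anova_f(groups))  # 3.0
```

A larger F means the group means differ more than the scatter inside each group would explain.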
T-test ► Another common statistical test: The t-test uses the standard deviation of the sample to help determine interesting stuff about the larger population. ► Use when you have only 2 groups of data, say results from men and women, and you want to know whether their answers are significantly different or differ just by random chance.
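As a sketch of what a t-test measures, the Python below computes a t statistic for two groups: the gap between the two means divided by the standard error of that gap. It uses Welch's version of the test, which is one common variant and an assumption here, since the slides don't say which version they have in mind. The function name `welch_t` and the survey answers are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """t statistic: difference in means divided by its standard error."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical answers on a 1-to-5 scale from five men and five women.
men = [3, 4, 2, 5, 3]
women = [4, 5, 4, 5, 3]

print(round(welch_t(men, women), 2))  # -1.26
```

The farther the t statistic is from zero, the less likely the difference is just random chance. Turning it into a p-value takes a t table or a stats program.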
Other types of tests ► There are many other types of tests for interpreting data that require a rather high level of skill in statistics. If your data are complicated and you want to find out as much about the data as possible, you may want to consult a stats pro for help.