2 Why we “do it”: "What we really want to get at [in health care research] is not how many reports have been done, but how many people's lives are being bettered by what has been accomplished. In other words, is it being used, is it being followed, is it actually being given to patients? ... What effect is it having on people?" Rep. John Porter (R-IL), retired chairman, House Appropriations Subcommittee on Labor, Health and Human Services (HHS), and Education
3 Is Statistics Important? Statistics is important because we can use it to find out whether something we observe can be applied to new and different situations. Knowing this allows us to plan for the future, and to make decisions about how to allocate our scarce resources of money, energy, and ultimately life. We use the term generalizable: can what we know help to predict what will happen in new and different situations?
4 Why Statistics: Scientific knowledge represents the best understanding that has been produced by means of current evidence. Research design, if used properly, strengthens the objectivity of the research. Statistical methods allow us to compare what is actually observed to what is logically expected.
5 Why Statistics (cont’d): Knowledge of statistics is useful in conducting investigations; helpful in preparing and evaluating research proposals; vital in deciding whether the claims of a researcher are valid; needed to keep abreast of current developments; and needed to give effective presentations of findings.
7 Evils of Pickle Eating: Pickles are associated with all the major diseases of the body. Eating them breeds war and Communism. They can be related to most airline tragedies. Auto accidents are caused by pickles. There exists a positive relationship between crime waves and consumption of this fruit of the cucurbit family. For example:
8 Evils of Pickle Eating (cont’d): Nearly all sick people have eaten pickles. 99.9% of all people who die from cancer have eaten pickles. 100% of all soldiers have eaten pickles. 96.8% of all Communist sympathizers have eaten pickles. 99.7% of the people involved in air and auto accidents ate pickles within 14 days preceding the accident. 93.1% of juvenile delinquents come from homes where pickles are served frequently. Evidence points to the long-term effects of pickle eating. Of the people born in 1839 who later dined on pickles, there has been a 100% mortality.
9 Evils of Pickle Eating (cont’d): All pickle eaters born between 1849 and 1859 have wrinkled skin, have lost most of their teeth, have brittle bones and failing eyesight, if the ills of pickle eating have not already caused their death. Even more convincing is the report of a noted team of medical specialists: rats force-fed with 20 pounds of pickles per day for 30 days developed bulging abdomens. Their appetites for WHOLESOME FOOD were destroyed.
10 Evils of Pickle Eating (cont’d): In spite of all the evidence, pickle growers and packers continue to spread their evil. More than 120,000 acres of fertile U.S. soil are devoted to growing pickles. Our per capita consumption is nearly four pounds. Eat orchid petal soup. Practically no one has as many problems from eating orchid petal soup as they do with eating pickles. EVERETT D. EDINGTON
13 Types of Statistics: Descriptive Statistics. These statistics enumerate, organize, summarize, and categorize data, often through graphical representation; they describe the data. Examples: means and frequencies of outcomes; charts and graphs.
14 Types of Statistics: Inferential Statistics. These statistics draw conclusions from incomplete information: they make predictions about a larger population given a smaller sample, and are usually thought of as statistical tests. Examples: t-test, chi-square test, ANOVA, regression.
16 Variables. J.D. Bramble, Ph.D., Creighton University Medical Center. Med Fall 2006
17 Types of Data. Qualitative: data fall into separate classes with no numerical relationship (sex, mortality, correct/incorrect, etc.). Quantitative: numerical data that are continuous (pharmaceutical costs, LOS, etc.).
18 Parameters and Statistics. Parameters: characteristics of the population; calculating the exact population parameter is often impractical or impossible. Statistics: characteristics of the sample; they represent summary measures of observed values.
19 Types of Variables: Variables are symbols to which numerals or values are assigned (e.g., X and Y are variables). Dependent (Y’s): that which is predicted. Independent (X’s): that which predicts. Extraneous (confounding or control): statistical models “adjust” for their influence.
20 Independent variables: Independent variables are the presumed cause of the dependent variable, that is, the variables responsible for the change in the phenomenon being observed. Nothing is for sure, so avoid the word ‘cause’ and think in terms of independent and dependent variables.
21 Dependent variables: Also referred to as the outcome variable. The outcome of the changes due to the independent variables. Example: in y = a + bx, y is the dependent variable and x is the independent variable.
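
The line y = a + bx can be made concrete with a small least-squares fit. The sketch below uses made-up numbers (hypothetical data, not from the lecture) and the textbook formulas for the slope b and intercept a; the helper name `fit_line` is an illustrative choice, not something the slides define.

```python
# Hypothetical data: x is the independent variable, y is the dependent variable.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]  # constructed so that y = 1 + 2x exactly


def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope: change in y per unit change in x
    a = mean_y - b * mean_x  # intercept: predicted y when x is 0
    return a, b


a, b = fit_line(x, y)
print(a, b)
```

With real data the points would not fall exactly on a line, and a and b would describe the best-fitting line rather than an exact relationship.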
22 Confounding variables: Additional variables that may affect the changes in the dependent variable attributed to the independent variables. These variables are controlled by measuring them, and statistical methods adjust for their influence. Sometimes referred to as control variables.
23 Active vs. attribute variables: Active variables are those under the control of the researcher (controlled experimental studies; e.g., amount of drug administered). Attribute variables cannot be manipulated by the researcher (quasi-experimental studies; e.g., sex or age of subject, blood pressure, smoking status).
26 Continuous Variables: Continuous variables are measured and can take on any value along the scale. They are quantitative variables, measured on an interval or ratio level. Examples: age, income, number of medications.
27 Categorical Variables: Categorical variables are measured as dichotomous or polytomous measures. They are qualitative variables, measured on a nominal or ordinal level. Examples: sex; smoking status; ownership; categorized continuous variables.
28 Nominal measurement scale: Used for qualitative data. Two or more levels of measurement. The names of the groups do not matter. Examples: Sex (Male/Female); Smoker (Yes/No); Political Party (Rep, Dem, Ind).
29 Ordinal measurement scale: All the properties of nominal plus . . . The groups are ordered or ranked. Intervals between groups are not necessarily equal. Examples: Income (low, med, high); disease severity; Likert scales.
30 Interval measurement scale: All properties of nominal and ordinal plus . . . A scale is used to measure the response of the study subjects. The interval scale’s units are equal, but its zero point is arbitrary (i.e., it is a relative scale). Example: temperature on the Fahrenheit scale.
31 Ratio measurement scale: All properties of the previous scales plus . . . An absolute zero point. Can perform mathematical operations. Highest level of measurement. Examples: income, age, height, weight.
32 Summarizing Data: Measures of Central Tendency and Variation. The mean is our usual concept of an overall average - add up the items and divide them by the number of sharers (100 candy bars collected for five kids next Halloween will yield 20 for each in a just world). The median, a different measure of central tendency, is the half-way point. If I line up five kids by height, the median child is shorter than two and taller than the other two (who might have trouble getting their mean share of the candy). A politician in power might say with pride, "The mean income of our citizens is $15,000 per year." The leader of the opposition might retort, "But half our citizens make less than $10,000 per year." Both are right, but neither cites a statistic with impassive objectivity. The first invokes a mean, the second a median.
33 Mean: The arithmetic mean is the balance point: sum all observations and divide the sum by the number of observations. Means are higher than medians in such cases because one millionaire may outweigh hundreds of poor people in setting a mean; but he can balance only one mendicant in calculating a median.
34 Median: Divides the distribution into two equal parts. Considered the most “typical” observation. Less sensitive to extreme values.
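
To see how differently the mean and the median react to one extreme value, here is a minimal sketch using Python's standard `statistics` module. The income figures are invented for illustration and are not taken from the lecture.

```python
from statistics import mean, median

# Hypothetical yearly incomes for five people.
incomes = [8_000, 9_000, 10_000, 11_000, 12_000]

# The same group after one millionaire moves in.
with_millionaire = incomes + [1_000_000]

mean_before, median_before = mean(incomes), median(incomes)
mean_after, median_after = mean(with_millionaire), median(with_millionaire)

# The mean is pulled far toward the extreme value; the median barely moves.
print("before:", mean_before, median_before)
print("after: ", mean_after, median_after)
```

In this made-up example the mean jumps from 10,000 to 175,000 while the median only shifts from 10,000 to 10,500, which is the sense in which the median is less sensitive to extreme values.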
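35 Calculating Medians: To find the median value, use the location formula q(n+1) with q = 0.5. Data: 41, 28, 34, 36, 26, 44, 39, 32, 40, 35, 36, 33. Order the data in ascending order: 26, 28, 32, 33, 34, 35, 36, 36, 39, 40, 41, 44. Apply the median location formula: 0.5(12+1) = 6.5. Note: this is ONLY the location of the median. The median itself lies halfway between the 6th and 7th ordered values: (35 + 36)/2 = 35.5.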
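
The location rule q(n+1) can be sketched in a few lines of Python using the twelve values from this slide. The helper `value_at_location` is a hypothetical name introduced here; it assumes linear interpolation when the location falls between two ordered values.

```python
data = [41, 28, 34, 36, 26, 44, 39, 32, 40, 35, 36, 33]


def value_at_location(sorted_values, location):
    """Value at a 1-based, possibly fractional location (linear interpolation)."""
    lower = int(location)        # position of the lower neighbour
    fraction = location - lower  # how far to move toward the next value
    if fraction == 0:
        return sorted_values[lower - 1]
    lo, hi = sorted_values[lower - 1], sorted_values[lower]
    return lo + fraction * (hi - lo)


ordered = sorted(data)
location = 0.5 * (len(ordered) + 1)  # q(n+1) with q = 0.5
median_value = value_at_location(ordered, location)
print(location, median_value)
```

The location (6.5) is a position in the ordered list, not a data value; the median is read off at that position.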
36 Quantiles: Quantiles are the values that divide the distribution into n equal parts, so that a given proportion of the data lies below each quantile. The median is the middle quantile. Quartiles are also very common (25th, 50th, and 75th percentiles). If we divide the distribution into 100 parts, we have percentiles.
37 Mode: The observation that occurs most frequently. Graphically, it is the value at the peak of the distribution. A distribution may be bimodal (two modes). If every value occurs equally often, no mode exists.
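
A short sketch of finding the mode(s) by counting frequencies. The function name `modes` and the convention of returning an empty list when every distinct value is equally frequent are assumptions made for this illustration; the bimodal and no-mode lists are invented.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] when all distinct values tie."""
    counts = Counter(values)
    top = max(counts.values())
    winners = [value for value, count in counts.items() if count == top]
    # Assumed convention: if every distinct value is equally frequent,
    # report that there is no mode.
    if len(winners) == len(counts) and len(counts) > 1:
        return []
    return winners


one_mode = modes([26, 28, 32, 33, 34, 35, 36, 36, 39, 40, 41, 44])
two_modes = modes([1, 2, 2, 3, 4, 4, 5])  # hypothetical bimodal data
no_mode = modes([1, 2, 3, 4])             # hypothetical data, no repeats
print(one_mode, two_modes, no_mode)
```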
40 Symmetrical: The relationship between the Mean, Median, & Mode
41 Positive Skew: The relationship between the Mean, Median, & Mode
42 Negative Skew: The relationship between the Mean, Median, & Mode
43 Summarizing Data: Frequency distributions. Measures of central tendency: the tendency of data to center around certain numerical and ordinal values; the three common measures are the mean, median, and mode. Measures of variation: standard deviation.
44 Five Figure Summary: Median, quartiles (lower and upper), maximum, and minimum. Can be shown in a box and whisker plot.
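
As a rough sketch, the five figures can be computed with Python's standard `statistics.quantiles`, here applied to the twelve values from the median slide. The default `method="exclusive"` places quartiles at positions q(n+1); other software may use a different quartile convention and give slightly different values. The function name `five_number_summary` is an illustrative choice.

```python
from statistics import quantiles

data = [41, 28, 34, 36, 26, 44, 39, 32, 40, 35, 36, 33]


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    # n=4 asks for the three cut points that split the data into quarters.
    q1, med, q3 = quantiles(values, n=4)
    return min(values), q1, med, q3, max(values)


summary = five_number_summary(data)
print(summary)
```

In a box and whisker plot, the box would span the two quartiles, the line inside it would mark the median, and the whiskers would typically reach toward the minimum and maximum.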
45 Which Measure? Mean: numerical data, symmetric distribution. Median: ordinal data, skewed distribution. Mode: bimodal distribution, most popular value.
46 Variation: We must also report measures of variation. Measures of variability reflect the degree to which data differ from one another as well as from the mean. Together, the mean and variability help describe the characteristics of the data and show how distributions vary from one another.
47 Example of Variation: Take the following three sets of data: 1) 10, 8, 5, 5, 2; 2) 5, 6, 6, 7, 6; 3) 6, 6, 6, 6, 6. In all three cases the mean is 6, but the variability differs: there is a lot of variability in set 1, a little in set 2, and none in set 3. We will discuss three measures of variability: 1) the range; 2) the standard deviation; and 3) the variance.
48 Measures of Variation: Range. The difference between the highest and the lowest observations: Range = xmax - xmin. Of limited usefulness, since it only accounts for the extreme values. Can also report the inter-quartile range (q3 - q1).
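
The three data sets above can be compared directly with Python's standard `statistics` module. This is only a sketch: it assumes the sample versions of the variance and standard deviation (dividing by n - 1), which is what `variance` and `stdev` compute.

```python
from statistics import mean, stdev, variance

sets = {
    "set 1": [10, 8, 5, 5, 2],
    "set 2": [5, 6, 6, 7, 6],
    "set 3": [6, 6, 6, 6, 6],
}

results = {}
for name, values in sets.items():
    results[name] = {
        "mean": mean(values),
        "range": max(values) - min(values),  # highest minus lowest
        "variance": variance(values),        # sample variance, n - 1 divisor
        "sd": stdev(values),                 # square root of the variance
    }
    print(name, results[name])
```

All three sets share a mean of 6, but the range, variance, and standard deviation shrink from set 1 to set 3, where they are all zero.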
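49 Standard Deviation: The most widely used and preferred measure of variation. Represented by the symbol s or sd; it is the square root of the variance (s²). Larger values indicate a more heterogeneous distribution. At least 75% of the observations lie between x̄ - 2s and x̄ + 2s, whatever the shape of the distribution. If the distribution is normal (bell shaped): about 68% of observations lie within x̄ ± 1s, 95% within x̄ ± 2s, and 99.7% within x̄ ± 3s.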
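
The slide names s and s² without writing them out. Assuming the usual sample definitions (dividing by n - 1, which the slides do not state explicitly), they can be written as:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2, \qquad
s = \sqrt{s^2}

% Worked example with data set 2 from the variation slide (5, 6, 6, 7, 6):
\bar{x} = \frac{5+6+6+7+6}{5} = 6, \qquad
s^2 = \frac{(-1)^2 + 0^2 + 0^2 + 1^2 + 0^2}{5-1} = \frac{2}{4} = 0.5, \qquad
s = \sqrt{0.5} \approx 0.71
```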
51 Example: Data on the sexual activity of male and female subjects can be found in Chatterjee, Handcock, and Simonoff (1995) A casebook for a first course in statistics. New York: Wiley. They provide data on the reported number of sexual partners for 1682 females and 1850 males. The dependent variable is the number of reported partners.
52 Descriptive Statistics. Columns: Male (n=1850), Female (n=1685). Rows: Mean, Median, Mode.
53 Using Excel When Syntax is Known: Write the formulas right into the spreadsheet. Be sure to start with an equal sign. Use your mouse to highlight the data to analyze.
54 Using Excel When Syntax is Unknown: Use the wizard and follow the instructions. All wizards work about the same way. Select the fx button to select the appropriate test. Select the category and then the desired test.
55 Follow the Wizard: Either highlight the array or just write it in. These icons reduce/enlarge the Wizard box.