Data Analysis and Surveying 101: Basic research methods and biostatistics as they apply to the Theresa Jackson Hughes, MPH American College Health Association December 2006
What we will cover today Research Methods Sampling Frame and Sampling Generalizability Bias Reliability and Validity Levels of measurement Biostatistics Statistical significance Other key terms Appropriate statistical tests Fun examples from the Spring 2005 dataset! Get excited! It’s data time!!!
Research Methods
“To do successful research, you don't need to know everything, you just need to know of one thing that isn't known.” Arthur Schawlow “That's the nature of research - you don't know what in hell you're doing.” Harold "Doc" Edgerton “If we knew what it was we were doing, it would not be called research, would it?” Albert Einstein
What exactly is research? “Scientific research is systematic, controlled, empirical, and critical investigation of natural phenomena guided by theory and hypotheses about the presumed relations among such phenomena.” Kerlinger, 1986 Research is an organized and systematic way of finding answers to questions
Important Components of Empirical Research Problem statement, research questions, purposes, benefits Theory, assumptions, background literature Variables and hypotheses Operational definitions and measurement Research design and methodology Instrumentation, sampling Data analysis Conclusions, interpretations, recommendations
Sampling What is your population of interest? To whom do you want to generalize your results? All students (18 and over) Undergraduates only Greeks Athletes Other Can you sample the entire population?
Sampling A sample is “a smaller (but hopefully representative) collection of units from a population used to determine truths about that population” (Field, 2005) Why sample? Resources (time, money) and workload Gives results with known accuracy that can be calculated mathematically The sampling frame is the list from which the potential respondents are drawn Registrar’s office Class rosters Must assess sampling frame errors
Types of Samples Probability (Random) Samples Simple random sample Systematic random sample Stratified random sample Proportionate Disproportionate Cluster sample Non-Probability Samples Convenience sample Purposive sample Quota
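As a rough illustration of three of the probability samples listed above, the sketch below draws a simple random, a systematic random, and a proportionate stratified sample from a sampling frame. The frame, the class-year labels, and all function names are hypothetical and exist only for this example.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit has the same chance."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def systematic_random_sample(frame, n, seed=None):
    """Pick a random start, then take every k-th unit (k = len(frame) // n)."""
    rng = random.Random(seed)
    k = len(frame) // n
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]

def proportionate_stratified_sample(frame, stratum_of, fraction, seed=None):
    """Sample the same fraction from every stratum."""
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for units in strata.values():
        sample.extend(rng.sample(units, round(len(units) * fraction)))
    return sample

# Hypothetical frame: 1,000 student IDs; every fourth ID is labeled first-year.
frame = [(i, "first-year" if i % 4 == 0 else "other") for i in range(1000)]

srs = simple_random_sample(frame, 100, seed=1)
systematic = systematic_random_sample(frame, 100, seed=1)
stratified = proportionate_stratified_sample(frame, lambda u: u[1], 0.10, seed=1)
```

A disproportionate stratified sample would use a different fraction per stratum, which then calls for weighting at the analysis stage.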
Sample Size Size of Campus / Final Desired N: <600: All students; 600-2, ,000-9, ,000-19, ,000-29, ≥30,000: 1,000. Depends on expected response rate. Average 85% for paper: FINAL SAMPLE DESIRED / 0.85 = SAMPLE. Average 25% for web: FINAL SAMPLE DESIRED / 0.25 = SAMPLE.
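The response-rate adjustment above is a single division. A minimal sketch, using the 1,000 desired for the largest campuses and the two average response rates from the slide; the function name is made up for this example, and rounding up is an assumption so the target is not undershot.

```python
import math

def students_to_invite(final_n_desired, expected_response_rate):
    """Number of students to sample so that roughly final_n_desired respond."""
    if not 0 < expected_response_rate <= 1:
        raise ValueError("response rate must be in (0, 1]")
    # Round up so the expected number of respondents meets the target.
    return math.ceil(final_n_desired / expected_response_rate)

paper_sample = students_to_invite(1000, 0.85)  # 1000 / 0.85 = 1176.47..., rounded up
web_sample = students_to_invite(1000, 0.25)    # 1000 / 0.25 = 4000
```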
Bias and Error
Systematic Error or Bias: unknown or unacknowledged error created during the design, measurement, sampling, procedure, or choice of problem studied. Error tends to go in one direction. Examples: selection, recall, social desirability. Random Error: unrelated to true measures. Example: momentary fatigue.
Reliability and Validity Reliability The extent to which a test is repeatable and yields consistent scores Affected by random error/bias Validity The extent to which a test measures what it is supposed to measure A subjective judgment made on the basis of experience and empirical indicators Asks "Is the test measuring what you think it’s measuring?" Affected by systematic error/bias
Reliability vs. Validity In order to be valid, a test must be reliable; but reliability does not guarantee validity.
Levels of Measurement
Nominal Gender Male, Female Vaccinations Yes, No, Unsure Ordinal Personal health status Excellent, Very good, Good, Fair, Poor Last 30 days Never used, Not in last 30 days, 1-2 days, 3-5 days, 6-9 days, days, days, All 30 days Interval Body Mass Index (BMI) Ratio Number of drinks Number of sexual partners Perception percentages Blood alcohol concentration (BAC)
Biostatistics
“It is commonly believed that anyone who tabulates numbers is a statistician. This is like believing that anyone who owns a scalpel is a surgeon.” R. Hooke “Torture numbers, and they'll confess to anything.” Gregg Easterbrook “98% of all statistics are made up.” Author Unknown
Types of Statistics Descriptive statistics Describe the basic features of data in a study Provide summaries about the sample and measures Inferential statistics Investigate questions, models, and hypotheses Infer population characteristics based on sample Make judgments about what we observe
Descriptive Statistics Central Tendency: Mode, Median, Mean. Variation: Range, Variance, Standard Deviation. Frequency.
Descriptive Statistics Examples Categorical Variables (Nominal/Ordinal)
Descriptive Statistics Examples Continuous Variables (Interval/Ratio)
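A small sketch of the descriptive statistics named above, using only the Python standard library. The responses are invented for illustration and are not taken from the Spring 2005 dataset.

```python
import statistics
from collections import Counter

# Categorical (ordinal) variable: report frequencies, percentages, and the mode.
health = ["Good", "Very good", "Good", "Excellent", "Fair", "Good", "Very good"]
counts = Counter(health)
percents = {level: 100 * c / len(health) for level, c in counts.items()}
modal_health = statistics.mode(health)

# Continuous (ratio) variable: central tendency and variation.
drinks = [0, 2, 4, 4, 5, 6, 7]
mean_drinks = statistics.mean(drinks)
median_drinks = statistics.median(drinks)
range_drinks = max(drinks) - min(drinks)
variance_drinks = statistics.variance(drinks)  # sample variance (n - 1 denominator)
sd_drinks = statistics.stdev(drinks)           # square root of the sample variance
```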
Hypotheses Null hypotheses Presumed true until statistical evidence in the form of a hypothesis test indicates otherwise There is no effect/relationship There is no difference in means Alternative hypotheses Tested using inferential statistics There is an effect/relationship There is a difference in means
Alpha, Beta, Power, Effect Size Alpha: probability of making a Type I error (rejecting the null when the null is true); the level of significance, i.e., the threshold against which the p value is compared. Beta: probability of making a Type II error (failing to reject the null when the null is false). Power: probability of correctly rejecting a false null (1 - Beta). Effect Size: measure of the strength of the relationship between two variables.
                       Null is true                          Null is false
Reject null            Alpha (Type I error)                  1 - Beta (Power, correct rejection)
Fail to reject null    1 - Alpha (correct non-rejection)     Beta (Type II error)
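To see how alpha, beta, power, and effect size interact, the sketch below computes the power of a two-sided one-sample z-test. The z-test with a known standard deviation is a simplifying assumption chosen because it needs only the normal distribution; the effect, standard deviation, and sample sizes are invented.

```python
from statistics import NormalDist

def z_test_power(effect, sd, n, alpha=0.05):
    """Power of a two-sided one-sample z-test.

    effect: true mean minus the null-hypothesis mean
    sd:     population standard deviation (assumed known)
    n:      sample size
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect * n ** 0.5 / sd
    # Probability that the test statistic lands in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

power_small_n = z_test_power(effect=0.5, sd=2.0, n=30)
power_large_n = z_test_power(effect=0.5, sd=2.0, n=200)
beta_small_n = 1 - power_small_n  # probability of a Type II error
```

With a zero effect the function returns alpha itself, which matches the table: when the null is true, the only way to reject is a Type I error.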
Let’s test some hypotheses!!!
Test of the mean of one continuous variable College students report drinking an average of 5 drinks the last time they “partied”/socialized Hypotheses H0: µ = 5 HA: µ ≠ 5 Test: Two-tailed one-sample t-test Result: Reject null
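A minimal sketch of the one-sample t statistic for H0: µ = 5. The ten responses are made up, so the outcome here says nothing about the result reported on the slide. The critical value 2.262 is the standard two-tailed t-table value for alpha = .05 with 9 degrees of freedom.

```python
import math
import statistics

def one_sample_t(values, mu0):
    """Return the t statistic and degrees of freedom for H0: mean == mu0."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    t = (statistics.mean(values) - mu0) / standard_error
    return t, n - 1

# Hypothetical number of drinks reported by ten students.
drinks = [3, 4, 6, 2, 5, 4, 3, 7, 4, 2]
t_stat, df = one_sample_t(drinks, mu0=5)

T_CRITICAL = 2.262  # two-tailed, alpha = .05, df = 9 (from a t table)
reject_null = abs(t_stat) > T_CRITICAL
```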
Test of a single proportion of one categorical variable 20% of college students report their health is excellent Hypotheses H0: p = 0.20 HA: p ≠ 0.20 (two-tailed) Test: Z-test for a single proportion Result: Reject null
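The z-test for a single proportion can be sketched with the standard library alone. The counts (260 of 1,000 reporting excellent health) are invented for illustration, and the p value is two-sided to match the ≠ alternative.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Return (z, two-sided p value) for H0: population proportion == p0."""
    p_hat = successes / n
    # The standard error uses the null proportion, as is usual for this test.
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 260 of 1,000 respondents rate their health as excellent.
z_stat, p_value = one_proportion_z_test(260, 1000, p0=0.20)
reject_null = p_value < 0.05
```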
Test of a relationship between two continuous variables There is a relationship between the number of drinks students report drinking the last time they drank and the number of sex partners they have had within the last school year Hypotheses H0: ρ = 0 HA: ρ ≠ 0 Test: Pearson Product Moment Correlation Result: Reject null
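A short sketch of the Pearson correlation coefficient and the t statistic commonly used to test H0: ρ = 0, namely t = r·sqrt((n - 2) / (1 - r²)) with n - 2 degrees of freedom. The paired values are invented for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_for_r(r, n):
    """t statistic for H0: rho == 0 (undefined when |r| == 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical paired responses from eight students.
drinks = [0, 1, 2, 3, 4, 5, 6, 8]
partners = [0, 1, 1, 1, 2, 1, 3, 3]

r = pearson_r(drinks, partners)
t_stat = t_for_r(r, len(drinks))  # compare with a t table at df = n - 2
```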
Test of the difference between two means Men and women report significantly different numbers of sexual partners over the past 12 months Hypotheses H0: µ1 = µ2 HA: µ1 ≠ µ2 Test: Independent Samples t-test OR One-way ANOVA Result: Reject null
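A minimal sketch of the independent-samples t statistic, assuming equal variances in the two groups (the pooled version). Under that assumption, a one-way ANOVA on the same two groups gives F = t², which is why either test can be used here. The responses are invented.

```python
import math
import statistics

def pooled_t(a, b):
    """Equal-variance independent-samples t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_variance = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / df
    standard_error = math.sqrt(pooled_variance * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / standard_error
    return t, df

# Hypothetical numbers of partners reported by six men and six women.
men = [1, 2, 0, 3, 1, 2]
women = [1, 0, 1, 2, 0, 1]
t_stat, df = pooled_t(men, women)  # compare |t| with a t table at df = 10
```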
Test of the difference between two or more means Mean BAC reported differs across student residences Hypotheses H0: µ1 = µ2 = µ3 = µ4 = µ5 = µ6 HA: µi ≠ µj for at least one pair i, j Test: One-way ANOVA Result: Reject null
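The one-way ANOVA F statistic compares variation between group means with variation within groups. A sketch under the usual assumptions (independent groups, similar variances); the BAC values and the three residence groups are invented and far smaller than a real sample.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical BAC values for three residence types.
residence_hall = [0.01, 0.02, 0.03]
off_campus = [0.02, 0.03, 0.04]
greek_house = [0.05, 0.06, 0.07]

f_stat, df_b, df_w = one_way_anova_f([residence_hall, off_campus, greek_house])
# Compare f_stat with an F table at (df_b, df_w) degrees of freedom.
```

A significant F only says that at least one pair of means differs; identifying which pair requires a follow-up (post hoc) comparison.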
Test for a relationship between two categorical variables Is there an association between being a member of a fraternity/sorority and ever being diagnosed with depression? Hypotheses H o : There is no association between being a member of a fraternity/sorority and ever being diagnosed with depression. H A : There is an association between being a member of a fraternity/sorority and ever being diagnosed with depression. Test: Chi-square test for independence Result: Fail to reject null
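A sketch of the chi-square test for independence on a 2x2 table, without a continuity correction. The counts are invented. The p value helper applies only when df = 1, where the chi-square tail probability can be written with the complementary error function.

```python
import math

def chi_square_independence(table):
    """Return (chi-square statistic, df) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

def p_value_df1(chi2):
    """Upper-tail p value for a chi-square statistic with exactly 1 df."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts. Rows: Greek member, non-member.
# Columns: ever diagnosed with depression, never diagnosed.
observed = [[20, 80],
            [150, 750]]

chi2_stat, df = chi_square_independence(observed)
p_value = p_value_df1(chi2_stat)
reject_null = p_value < 0.05
```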
Important Points to Remember A significant association does not indicate causation Statistical significance is not always the same as practical significance Multiple factors contribute to whether your results are significant It gets easier and easier as you practice!
Questions???