Global PaedSurg Research Training Fellowship


Global PaedSurg Research Training Fellowship 6 Data Cleaning and Analysis Dr. Emily Smith 4/25/2019

Overarching Goal: At the end of the day, you want a clean and precise dataset. The question is how to get there, especially in global health.

Next Steps: Think about the data that you have collected or will collect as part of your research project. What is your research question? What are you trying to get your data to “say”? Which statistical tests will best help you answer your research question? Contact the research coordinator to discuss how to analyze your data!

Data Analysis Process

Step 1: Creating an analysis plan. Formulate your plan according to your research objective and setting. Collect the data. Now that you have your data, how do you assess its accuracy and precision?

Step 2: Managing Data. Create a data dictionary.

Excel is a great resource for a data dictionary. Make sure to add a column describing how you code the missing values.
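A data dictionary can also be drafted in code and exported to a file that Excel opens. The sketch below is only an illustration: the variable names, labels, allowed values and missing-value codes (999, 9) are hypothetical and not taken from the fellowship's dataset.

```python
import csv
import io

# Hypothetical data dictionary: one entry per variable in the dataset.
# Variable names, labels, allowed values, and missing codes are illustrative.
FIELDS = ["variable", "label", "type", "allowed_values", "missing_code"]
data_dictionary = [
    {"variable": "age_months", "label": "Age at presentation (months)",
     "type": "continuous", "allowed_values": "0-192", "missing_code": "999"},
    {"variable": "sex", "label": "Sex of patient",
     "type": "categorical", "allowed_values": "1=male; 2=female", "missing_code": "9"},
    {"variable": "died", "label": "In-hospital mortality",
     "type": "categorical", "allowed_values": "0=no; 1=yes", "missing_code": "9"},
]


def dictionary_as_csv(entries):
    """Serialise the data dictionary as CSV text that Excel can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


csv_text = dictionary_as_csv(data_dictionary)
print(csv_text)
```

The "missing_code" column is the one the slide asks for: it records how a missing value is coded for each variable, so that 999 is never mistaken for a real age.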

Step 3: Cleaning Data. Few datasets are free of errors and missing values. It is important to review the dataset to identify errors before beginning analysis. Document, document, document ANY changes you make. This is an iterative process. KEY POINT: Before you begin, make a copy of the original dataset!

Step 4: Detecting and Correcting Missing, Miscoded or Out-of-Range Values. Few datasets are 100% complete or accurate. Usually there are a few odd or missing values. Sometimes missing data occur randomly and sometimes they occur in patterns.

Types of missing data. MCAR (missing completely at random): missingness is independent of all variables and occurs at random. MAR (missing at random): missingness is related to a particular variable, but is not related to the value of the variable that has missing data (e.g., accidentally omitting an answer on a questionnaire). MNAR (missing not at random): the data are missing for a reason! How do you know?

Identifying Missing Values: The best way to identify missing values is to assess the frequency distribution of each variable.
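As a rough illustration of a frequency distribution that also counts missing entries, here is a minimal sketch using only the Python standard library. The variable name and responses are made up, and None is assumed to be how a missing value was stored.

```python
from collections import Counter

# Hypothetical responses for one variable; None marks a value never recorded.
outcome = ["survived", "died", None, "survived", "survived", None, "died", "survivd"]


def frequency_table(values):
    """Return (value, count, percent) rows, most common first, counting None too."""
    counts = Counter(values)
    total = len(values)
    return [(value, n, round(100 * n / total, 1)) for value, n in counts.most_common()]


table = frequency_table(outcome)
for value, n, pct in table:
    label = "<missing>" if value is None else value
    print(f"{label:10s} {n:3d} {pct:5.1f}%")
```

A table like this shows both the missing values and miscoded ones (the misspelled "survivd" appears as its own category).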

HANDLING MISSING DATA. Complete-case analysis (complete-participant analysis): remove everyone with missing data on one or more variables. This often reduces precision, but is unbiased in a wide range of circumstances. Imputation: default value imputation, mean imputation, regression imputation, multiple imputation. Inverse probability weighting.
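Two of the simpler options above, complete-case analysis and mean imputation, can be sketched as follows. The records and variable names are hypothetical. The sketch does not cover regression imputation, multiple imputation or inverse probability weighting, which normally need a statistics package.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "weight_kg": 3.2, "age_days": 2},
    {"id": 2, "weight_kg": None, "age_days": 5},
    {"id": 3, "weight_kg": 2.8, "age_days": None},
    {"id": 4, "weight_kg": 3.6, "age_days": 1},
]


def complete_cases(rows):
    """Keep only participants with no missing value on any variable."""
    return [row for row in rows if all(v is not None for v in row.values())]


def mean_impute(rows, variable):
    """Return copies of the rows with missing `variable` replaced by the observed mean."""
    observed = [row[variable] for row in rows if row[variable] is not None]
    fill = mean(observed)
    return [
        {**row, variable: fill if row[variable] is None else row[variable]}
        for row in rows
    ]


cc = complete_cases(records)
imputed = mean_impute(records, "weight_kg")
print(len(cc), "complete cases out of", len(records))
print([row["weight_kg"] for row in imputed])
```

Both functions return new lists and leave the original records untouched, in line with the advice to keep the original dataset unchanged. Mean imputation keeps every participant but understates the variability of the imputed variable.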

Identifying Records with Out-of-Range Values: Some variables may contain values that are outliers (or out of range) compared to the responses of the other participants. Often, these are numerical values that may have been incorrectly coded. To identify these, make a scatterplot of the variables.

A scatterplot shows the value of one variable on the X axis and the value of the other on the Y axis.
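A scatterplot needs a plotting library, so the sketch below shows a different, non-graphical check that can be used alongside it: flagging values that fall outside a plausible range. The ranges, variable names and records are assumptions chosen for illustration, not clinical reference limits.

```python
# Hypothetical plausible ranges for each variable (illustrative, not clinical limits).
plausible_range = {"weight_kg": (0.5, 10.0), "gestation_weeks": (22, 44)}

# Hypothetical records; record 3 looks like 3.2 kg entered as 32.
records = [
    {"id": 1, "weight_kg": 3.1, "gestation_weeks": 39},
    {"id": 2, "weight_kg": 2.7, "gestation_weeks": 36},
    {"id": 3, "weight_kg": 32.0, "gestation_weeks": 40},
    {"id": 4, "weight_kg": 3.4, "gestation_weeks": 4},
]


def out_of_range(rows, ranges):
    """Return (record id, variable, value) for every value outside its range."""
    flagged = []
    for row in rows:
        for variable, (low, high) in ranges.items():
            value = row.get(variable)
            if value is not None and not (low <= value <= high):
                flagged.append((row["id"], variable, value))
    return flagged


flagged = out_of_range(records, plausible_range)
for record_id, variable, value in flagged:
    print(f"record {record_id}: {variable} = {value} is out of range")
```

Flagged values should be checked against the source documents before they are corrected, and every correction should be documented.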

Step 5: Data Analysis. Now that you have cleaned the data (missing data, outliers, etc.), you’re ready to analyze! Remember that at this point you should have both the original (pre-cleaning) dataset and the dataset you have cleaned. The analysis should be done on the cleaned dataset!

Types of Statistics/Analyses
Descriptive Statistics (frequencies, basic measurements): describing an association/relationship. How many? How much?
Inferential Statistics (hypothesis testing, correlation, confidence intervals, significance testing, prediction): inferences about an association/relationship. Proving or disproving theories; associations between phenomena; whether the sample relates to the larger population.

Descriptive Statistics: Descriptive statistics can be used to summarize and describe a single variable (univariate). Frequencies (counts) and percentages: categorical (nominal) data. Means and standard deviations: continuous (interval/ratio) data.
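To make the pairing concrete, the following minimal sketch computes counts and percentages for a categorical variable and the mean and sample standard deviation for a continuous one. All values and variable names are invented for illustration.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical data: one categorical and one continuous variable.
sex = ["male", "female", "female", "male", "female"]
birth_weight_kg = [2.9, 3.4, 3.1, 2.5, 3.6]

# Categorical (nominal) data: frequencies and percentages.
sex_counts = Counter(sex)
sex_percent = {level: 100 * n / len(sex) for level, n in sex_counts.items()}

# Continuous (interval/ratio) data: mean and sample standard deviation.
weight_mean = mean(birth_weight_kg)
weight_sd = stdev(birth_weight_kg)  # divides by n - 1

for level, n in sex_counts.items():
    print(f"{level}: n = {n} ({sex_percent[level]:.0f}%)")
print(f"birth weight: mean = {weight_mean:.2f} kg, SD = {weight_sd:.2f} kg")
```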

Frequencies & Percentages. How to display frequencies and percentages: pie chart, table, bar chart.

Distributions: The distribution can be displayed using box-and-whisker plots and histograms.

Continuous → Categorical: You can aggregate continuous data into categories, but not the reverse. Collect continuous values if you can, rather than categories, in your raw data!
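Aggregating a continuous variable into categories can be sketched as below. The cut-points and labels are hypothetical, and each cut-point is assumed to be the lower bound of the next category.

```python
from bisect import bisect_right

# Hypothetical cut-points (years); each one is the lower bound of the next group.
CUT_POINTS = [1, 5, 13]
LABELS = ["<1 year", "1-4 years", "5-12 years", "13+ years"]


def age_group(age_years):
    """Map a continuous age in years to its category label."""
    return LABELS[bisect_right(CUT_POINTS, age_years)]


ages = [0.2, 3, 7.5, 12, 0.9, 13]
groups = [age_group(age) for age in ages]
print(list(zip(ages, groups)))
```

Because the raw ages are kept, the cut-points can be changed later. If only the categories had been collected, that would not be possible.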

INFERENTIAL STATISTICS: Inferential statistics can be used to test theories, determine associations between variables, and determine whether findings are statistically significant. Types include: correlation, t-tests/ANOVA, chi-square, logistic regression.

Type of Data & Analysis. Analysis of categorical/nominal data: chi-square, logistic regression. Analysis of continuous data: correlation, t-tests.

Correlation. When to use it? When you want to know about the association or relationship between two continuous variables. Example: blood pressure and medication. What does it tell you? Whether a linear relationship exists between two variables, measured by Pearson’s r, and how strong that relationship is. Pearson’s r ranges from -1 to +1.

Correlation. Interpreting the strength of correlations (using the absolute value of r): 0 to 0.25 = little or no relationship; 0.25 to 0.50 = fair degree of relationship; 0.50 to 0.75 = moderate degree of relationship; 0.75 to 1.0 = strong relationship; 1.0 = perfect correlation.
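Pearson's r and the interpretation bands above can be sketched in plain Python. The weight and blood pressure values are invented. Assigning a boundary value such as 0.25 to the higher band is an assumption, because the slide's bands overlap at their edges.

```python
from math import sqrt

# Hypothetical paired measurements for five patients.
weight_kg = [60, 72, 85, 90, 100]
systolic_bp = [110, 118, 130, 135, 142]


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def strength(r):
    """Label |r| using the bands from the slide (boundaries go to the higher band)."""
    size = abs(r)
    if size < 0.25:
        return "little or no relationship"
    if size < 0.50:
        return "fair degree of relationship"
    if size < 0.75:
        return "moderate degree of relationship"
    return "strong relationship"


r = pearson_r(weight_kg, systolic_bp)
print(f"r = {r:.3f}: {strength(r)}")
```

The sign of r gives the direction of the relationship and the absolute value gives its strength, so r = -0.8 is as strong as r = +0.8.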

T-tests. What does a t-test tell you? Whether there is a statistically significant difference between the mean score (or value) of two groups. What do the results look like? The test statistic is Student’s t; look at the corresponding p-value. If p < 0.05, the means are significantly different from each other. If p ≥ 0.05, the means are not significantly different from each other.
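The sketch below computes Student's t for two independent groups, assuming equal variances (the pooled-variance form). The blood pressure values are invented. It returns only the t statistic and degrees of freedom. The p-value comes from the t distribution with those degrees of freedom, which a statistics package would normally supply (for example, `scipy.stats.ttest_ind` in SciPy).

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical systolic blood pressure in two independent groups.
normal_weight = [110, 115, 120, 118, 112]
obese = [130, 128, 135, 140, 132]


def student_t(group1, group2):
    """Pooled-variance (Student's) t statistic and degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Weighted average of the two sample variances (assumes equal variances).
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / standard_error
    return t, df


t_stat, dof = student_t(normal_weight, obese)
print(f"t = {t_stat:.3f} on {dof} degrees of freedom")
```

A negative t here only means that the first group's mean is lower than the second's. The p-value, not the sign, determines significance.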

Chi-square. When to use it? When you want to know if there is an association between an exposure and an outcome. Example: mortality (yes/no) and lung cancer (yes/no). How to interpret? The test asks whether the observed frequencies of occurrence in each group are significantly different from the expected frequencies (i.e., a difference of proportions). Usually, the higher the chi-square statistic, the greater the likelihood that the finding is significant, but you must look at the corresponding p-value to determine significance.
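For a 2x2 table, the comparison of observed and expected frequencies can be sketched as follows. The counts are invented. The statistic is Pearson's chi-square without a continuity correction, and the p-value shortcut using `math.erfc` is valid only for 1 degree of freedom, which is what a 2x2 table has.

```python
from math import erfc, sqrt

# Hypothetical 2x2 table: rows = exposure, columns = outcome (stroke, no stroke).
observed = [
    [30, 70],  # obese
    [15, 85],  # not obese
]


def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) and p-value, 1 df."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = chi_square_2x2(observed)
print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}")
```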

Logistic Regression. When to use it? When you want to measure the strength and direction of the association between two variables (exposure and outcome), where the dependent or outcome variable is categorical (e.g., yes/no), and when you want to predict the likelihood of an outcome while controlling for confounders. How do you interpret the results? Significance can be inferred by looking at the confidence interval of the odds ratio (OR): if the confidence interval does not cross 1, the result is significant. If OR > 1, the outcome is that many times MORE likely to occur (risk factor); e.g., 2.0 = twice as likely. If OR < 1, the outcome is that many times LESS likely to occur (protective factor); e.g., 0.50 = 50% less likely to experience the event.
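Fitting a logistic regression with confounders needs a statistics package. The sketch below shows only the unadjusted case: the odds ratio from a 2x2 table with a 95% confidence interval from the standard log-odds (Woolf) approximation. This is the same OR that a logistic regression with a single binary exposure would give. The counts are invented.

```python
from math import exp, log, sqrt

# Hypothetical 2x2 counts.
a, b = 30, 70  # exposed (obese): stroke, no stroke
c, d = 15, 85  # unexposed (not obese): stroke, no stroke


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-based) 95% confidence interval."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


odds_ratio, lower, upper = odds_ratio_ci(a, b, c, d)
significant = not (lower <= 1 <= upper)  # the interval does not cross 1
print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}, significant: {significant}")
```

This sketch does not control for confounders. An adjusted OR requires a fitted multivariable model.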

Summary of Statistical Tests
Correlation. Type of data needed: two continuous variables. Test statistic: Pearson’s r. Example: Are blood pressure and weight correlated?
T-tests/ANOVA. Type of data needed: means from a continuous variable taken from two or more groups. Test statistic: Student’s t (F for ANOVA). Example: Do normal weight patients (group 1) have lower blood pressure than obese patients (group 2)?
Chi-square. Type of data needed: two categorical variables. Test statistic: chi-square (χ²). Example: Are obese individuals (obese vs. not obese) significantly more likely to have a stroke (stroke vs. no stroke)?
Logistic Regression. Type of data needed: a dichotomous variable as the outcome. Test statistic: odds ratios (OR) and 95% confidence intervals (CI). Example: Does obesity predict stroke (stroke vs. no stroke) when controlling for other variables?

Summary: Descriptive statistics can be used with nominal, ordinal, interval and ratio data. Frequencies and percentages describe categorical data, and means and standard deviations describe continuous variables. Inferential statistics can be used to determine associations between variables and predict the likelihood of outcomes or events. Inferential statistics tell us if our findings are significant and if we can infer from our sample to the larger population.


Thank you for listening, any questions? Naomi Wright: globalpaedsurg4@gmail.com @PaedsSurgeon @GlobalPaedSurg #GlobalPaedSurg www.globalpaedsurg.com