# Educational Research 101: How to Manage Your Data and Prepare for the Statistical Consultation Francis S. Nuthalapaty, MD H. Lee Higdon III, PhD 2009 APGO.


Educational Research 101: How to Manage Your Data and Prepare for the Statistical Consultation Francis S. Nuthalapaty, MD H. Lee Higdon III, PhD 2009 APGO Faculty Development Seminar

Case Study: The wrong way

Statistician was consulted after the data had been collected. Study question was not clearly defined. Variables were not defined. Data dictionary was not developed. Data were not cleaned/validated. Result: a statistician who is asked to perform a miracle!

Case Study: Lesson Arrangements to consult with a statistician should be made before you start enrolling and collecting data on patients! In fact, they should be made before protocol development to prevent issues downstream.

Learning Objectives

1. Describe the continuum of data management
2. List data collection instruments / approaches
3. Understand how to create a data dictionary
4. Describe methods to validate data
5. Describe various data analytic tools
6. Describe how to decide on statistical approaches

Question Where does data management fit into the research process?

The Research Process

1. Question
2. Literature search
3. Objective / Hypothesis
4. Study design
5. IRB
6. Study conduct
7. Data analysis
8. Dissemination of results

Data Management Pearl “No study is better than the quality of its data” - Friedman “…get it right the first time” - Crerand

Steps in Data Management: Definition, Acquisition, Data Entry, Validation, Analysis

Data Definitions Identifying your data Identifying your data types Naming your data variables Creating a data dictionary

Data Types

Types of Variables:
- Qualitative: Nominal, Ordinal
- Quantitative: Interval, Ratio

Data Definition Exercise

Data Variable Names

- Make the name descriptive (easier to remember)
- Keep it short (less than 10 characters)
- Use lower case
- Avoid spaces; use an underscore instead
- Use numbers to indicate sequences
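The naming rules above can be turned into a quick checklist. The following is only a sketch: `check_variable_name` is a hypothetical helper, and the requirement that a name start with a letter is an added assumption (most statistical packages require it), not something stated on the slide.

```python
import re

# Assumption: names start with a letter, then lower-case letters, digits, underscores.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")

def check_variable_name(name: str, max_len: int = 10) -> list[str]:
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if len(name) >= max_len:
        problems.append(f"not shorter than {max_len} characters")
    if name != name.lower():
        problems.append("contains upper-case letters")
    if " " in name:
        problems.append("contains spaces (use an underscore)")
    # Check the remaining characters after normalising case and spaces,
    # so the same fault is not reported twice.
    if not NAME_PATTERN.fullmatch(name.lower().replace(" ", "_")):
        problems.append("must start with a letter and use only letters, digits, underscores")
    return problems

print(check_variable_name("sbp_1"))                # passes: short, lower case, numbered
print(check_variable_name("Systolic BP visit 1"))  # too long, upper case, spaces
```

Names such as `sbp_1`, `sbp_2`, `sbp_3` follow the "use numbers to indicate sequences" rule for repeated measurements.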

Data Variable Formats Variable formats: –Numeric –String

Data Variable Values Possible responses for a variable –Numeric format: 0 = no / 1 = yes –String format: a = no / b = yes

Data Variable Values

Note on Missing Values

What about variables with no response?
- Leave it blank
- Assign a period “.”
- Assign a value (usually out of the expected response range)
- Avoid text
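Whichever convention is chosen, the analysis step has to recognise it, or a sentinel value will be averaged in as if it were real data. A minimal sketch, assuming 999 is the out-of-range code for a missing age (the code and the sample values are made up for illustration):

```python
MISSING_TOKENS = {"", "."}   # blank cell or a period
MISSING_CODES = {999}        # hypothetical out-of-range sentinel for "no response"

def parse_numeric(cell: str):
    """Convert a raw cell to float, or None when the cell encodes 'missing'."""
    text = cell.strip()
    if text in MISSING_TOKENS:
        return None
    value = float(text)      # raises ValueError on free text such as "unknown"
    if value in MISSING_CODES:
        return None
    return value

raw_ages = ["34", "", ".", "999", "41"]
ages = [parse_numeric(cell) for cell in raw_ages]
observed = [a for a in ages if a is not None]
mean_age = sum(observed) / len(observed)

print(ages)      # missing entries appear as None
print(mean_age)  # computed from the two observed ages only
```

The `ValueError` on free text is deliberate: it is one way to enforce the "avoid text" advice at entry time rather than at analysis time.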

Data Naming Exercise

Data Dictionaries / Code Books

- Brings together all data elements:
  - Data types / formats
  - Variable names
  - Expected response values (range)
  - Comments
- Self-generated vs. computer generated
- “Rosetta Stone” for the database
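A self-generated data dictionary can be as simple as a table with one row per variable. The sketch below stores one as a Python dictionary; the variable names, value codes, and comments are invented for illustration and are not taken from the seminar.

```python
# One entry per variable: format, expected values, and a free-text comment.
CODEBOOK = {
    "subj_id":   {"format": "string",  "values": None,
                  "comment": "Study ID assigned at enrollment"},
    "pgy":       {"format": "numeric", "values": {1, 2, 3, 4},
                  "comment": "Postgraduate year"},
    "osce_pass": {"format": "numeric", "values": {0: "no", 1: "yes"},
                  "comment": "OSCE result"},
}

def is_expected(variable: str, value) -> bool:
    """True when the value is among the expected responses for the variable."""
    expected = CODEBOOK[variable]["values"]
    # None means the variable is free-form (e.g., an identifier).
    return expected is None or value in expected

print(is_expected("osce_pass", 1))  # a coded response listed in the codebook
print(is_expected("pgy", 5))        # outside the expected range
```

Keeping the expected values in the same structure as the names and formats means the dictionary can later drive range checks during validation.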

Data Dictionary Exercise

Data Acquisition Pick the best method for the environment

Data Acquisition Methods

- Interviews
- Questionnaires
- Assessments
  - MCQ examinations
  - OSCE / OSAT
- Laboratory studies

Data Acquisition Environments

- Observational encounters
- Structured research encounters
- Self-report

Data Acquisition Problems Major types of data issues: –Missing data –Incorrect data –Excess variability

Data Acquisition Problems

Reasons for poor data quality:
- Researcher-dependent data:
  - Insufficient time
  - Inadequate training
  - Lack of focus on study tasks
  - Poor communication
  - Protocol deviation

Data Acquisition Problems

Reasons for poor data quality:
- Subject-dependent data:
  - Inadequate instruction
  - Poor comprehension
  - Sensitive or stigmatized behaviors

Data Acquisition Options

- Paper forms
- Direct entry
- Computer assisted data acquisition

Data Acquisition: Paper Forms

Advantages:
- Controlled distribution and return
- Comments
- Double data entry

Disadvantages:
- Anonymity
- Manual quality checks
- Data entry time / errors
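Double data entry means keying each paper form twice, ideally by different people, and resolving every disagreement against the original form. A minimal sketch of the comparison step, with made-up records and a hypothetical `compare_entries` helper:

```python
def compare_entries(first_pass: dict, second_pass: dict) -> list:
    """Return (record id, field, first value, second value) for each mismatch."""
    discrepancies = []
    for record_id, first in first_pass.items():
        second = second_pass.get(record_id, {})
        for field, value in first.items():
            if second.get(field) != value:
                discrepancies.append((record_id, field, value, second.get(field)))
    return discrepancies

# Illustrative data: the same two forms keyed by two different people.
entry_a = {"001": {"age": "34", "pgy": "2"}, "002": {"age": "29", "pgy": "1"}}
entry_b = {"001": {"age": "34", "pgy": "2"}, "002": {"age": "92", "pgy": "1"}}

for row in compare_entries(entry_a, entry_b):
    print(row)  # each row points back to a paper form that needs checking
```

The comparison only flags disagreements; it cannot say which entry is right, so the paper form remains the source of truth.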

Data Acquisition: Direct Entry

Options:
- MS Excel, MS Access
- Epi Info (free on the web)
- Direct entry into statistical software

Pros / Cons:
- No data transcription
- Errors

Data Acquisition Computer assisted data acquisition: –Automated data collection –OCR forms –Computer-based case report forms / questionnaires –Computer-assisted self-interviews –Mobile computing device diaries

Data Acquisition: CASI

Special Focus: Health Behaviors
- Factors which may affect reporting:
  - Sensitive or stigmatized behaviors
  - Age discrepancy between participant and interviewer
  - Lack of privacy
  - Lack of comprehension of self-administered questionnaires

Data Acquisition: CASI Computer-assisted self-interview (CASI): –Computer-based interview –Can incorporate audio, video, and text –Respondent listens to or reads questions on screen –Submits answers through keypad or touch screen

Data Acquisition: CASI Benefits of CASI: –Interview conducted in privacy –Standardized interview –Computer controlled branching –Automated consistency and range checking –Multilingual administration

Steps in Data Management: Definition, Acquisition, Data Entry, Validation, Analysis

Data Validation 1.Is all of the data present? 2.Are the responses within the expected range? 3.Does the data make sense?

Data Validation Is all of the data present? –Visually examine the data cells –Frequencies

Data Validation

Are the responses within the expected range?
- Frequencies
  - Maximum / minimum values
- Descriptive statistics
  - Means
  - Standard deviations
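The first two validation questions (is everything present, and is it in range) can be sketched with the Python standard library. The records and the plausible age range of 18 to 100 are assumptions chosen for illustration.

```python
from collections import Counter
import statistics

# Illustrative records; None marks a missing response.
records = [
    {"id": 1, "gender": "f", "age": 29},
    {"id": 2, "gender": "m", "age": 31},
    {"id": 3, "gender": "f", "age": 290},   # likely keying error
    {"id": 4, "gender": None, "age": 27},
]

# 1. Is all of the data present? A frequency table also counts the missing values.
gender_freq = Counter(r["gender"] for r in records)

# 2. Are the responses within the expected range?
ages = [r["age"] for r in records if r["age"] is not None]
out_of_range = [r["id"] for r in records
                if r["age"] is not None and not 18 <= r["age"] <= 100]

summary = {
    "min": min(ages),
    "max": max(ages),
    "mean": statistics.mean(ages),
    "sd": statistics.stdev(ages),
}

print(gender_freq)
print(out_of_range)  # record ids to look up in the source documents
print(summary)       # an implausible maximum or inflated mean hints at a bad value
```

A single mis-keyed value is enough to distort the mean and standard deviation, which is why the minimum and maximum are worth reading before any other statistic.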

Data Validation

Once the outlier is found, one can reference the chart for clarification

Descriptive Statistics

Data Distribution Definitions by SPSS 16.0

Data Distribution

Scatterplots

Who is Represented in the Data?

- Sample test of proportions
  - Percent of gender
  - Percent of ethnicity
- Sample test of means
  - Age
  - BMI
- Does our data reflect the population at large or a subset?

Who is not?

Compare data of the included and excluded individuals. Are they similar for:
- Age (continuous: Student t test)
- BMI (continuous: Student t test)
- Ethnicity (discrete/categorical: Chi-square test)
- Gender (discrete/categorical: Chi-square test)
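To make the two comparisons concrete, here is a hand-rolled sketch of both test statistics using only the standard library. The ages and the 2x2 counts are invented; in practice a statistical package would also supply the p-value for the t statistic, which is omitted here.

```python
import math
import statistics

def pooled_t(a, b):
    """Student's two-sample t statistic (equal variances assumed) and its df."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) and p-value for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # exact tail probability for 1 degree of freedom
    return chi2, p

# Illustrative data only.
included_age = [28, 30, 32, 34]
excluded_age = [29, 31, 33, 35]
t_stat, df = pooled_t(included_age, excluded_age)

# Rows: included / excluded; columns: female / male.
gender_table = [[10, 20], [20, 10]]
chi2, p_value = chi_square_2x2(gender_table)

print(t_stat, df)
print(chi2, p_value)
```

A small t statistic suggests the excluded group resembles the included group on age, while a large chi-square suggests the two groups differ in gender mix and the sample may not represent everyone approached.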

Steps in Data Management: Definition, Acquisition, Data Entry, Validation, Analysis

Data Analysis

Choose the right tool for the job.

Commonly used statistical tests:
- If the data are normally distributed (i.e., follow a bell-shaped curve), we use parametric statistical tests.
- If the data (1) are not “bell-shaped”, (2) come from small samples, generally fewer than 30 per group, or (3) contain outliers, we use nonparametric statistical tests.

Data Analysis

Choice of statistical test is based on:
- Distribution of the sample data
- Sample size
- Number of groups
- Independence of the groups

| Comparison measurement | Normal distribution | # of groups | Statistical test |
|---|---|---|---|
| Mean (average) | Yes | 2 | Student’s t-test |
| Mean (average) | Yes | ≥3 | Analysis of variance |
| Median | No | 2 | Wilcoxon rank-sum or Mann-Whitney U test |
| Median | No | ≥3 | Kruskal-Wallis test |
| Proportions | Yes | ≥2 | Chi-square test |
| Proportions | No | ≥2 | Fisher’s exact test |
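The selection table on this slide can be read as a simple lookup. The sketch below encodes only the six rows listed; `suggest_test` is a hypothetical helper, and it deliberately refuses combinations the table does not cover rather than guessing. It is a starting point for the conversation with the statistician, not a substitute for it.

```python
def suggest_test(measure: str, normal: bool, n_groups: int) -> str:
    """Look up a test from the slide's table.

    measure  -- "mean", "median", or "proportions"
    normal   -- the table's "Normal distribution" column (True for Yes)
    n_groups -- number of groups being compared
    """
    if n_groups < 2:
        raise ValueError("the table covers comparisons of two or more groups")
    if measure == "proportions":
        return "Chi-square test" if normal else "Fisher's exact test"
    if measure == "mean" and normal:
        return "Student's t-test" if n_groups == 2 else "Analysis of variance"
    if measure == "median" and not normal:
        if n_groups == 2:
            return "Wilcoxon rank-sum or Mann-Whitney U test"
        return "Kruskal-Wallis test"
    raise ValueError("combination not covered by the table")

print(suggest_test("mean", True, 2))
print(suggest_test("median", False, 3))
print(suggest_test("proportions", False, 2))
```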

Data Analysis

Univariate vs. Multivariate
- Multivariate methods are increasingly required in medical research because the relationships under study involve more than a one-to-one association.

Multivariate methods allow us to:
- Examine many variables simultaneously
- Adjust for baseline differences between groups
- Adjust for potential “confounding” variables
- Obtain “adjusted” measures of effect

Examples of multivariate methods (used to explain or predict the dependent variable):
- Linear regression: predicts the value of a numerical measurement (e.g., viral load)
- Logistic regression: predicts a dichotomous outcome (e.g., pregnant / not pregnant)
- Cox proportional hazards regression: predicts time to an event (e.g., survival time)

Session content, including narrated MS PowerPoint slides, is available at: http://www.obgynknowledgebank.net
