Educational Research 101: How to Manage Your Data and Prepare for the Statistical Consultation Francis S. Nuthalapaty, MD H. Lee Higdon III, PhD 2009 APGO.


1 Educational Research 101: How to Manage Your Data and Prepare for the Statistical Consultation Francis S. Nuthalapaty, MD H. Lee Higdon III, PhD 2009 APGO Faculty Development Seminar

2 Case Study: The wrong way

3 Statistician was consulted after the data had been collected. Study question was not clearly defined. Variables were not defined. Data dictionary was not developed. Data were not cleaned/validated. Result: a statistician who is asked to perform a miracle!

4 Case Study: Lesson Arrangements to consult with a statistician should be made before you start enrolling and collecting data on patients! In fact, they should be made before protocol development to prevent issues downstream.

5 Learning Objectives 1. Describe the continuum of data management 2. List data collection instruments / approaches 3. Understand how to create a data dictionary 4. Describe methods to validate data 5. Describe various data analytic tools 6. Describe how to decide on statistical approaches

6 Question Where does data management fit into the research process?

7 The Research Process 1. Question 2. Literature search 3. Objective / Hypothesis 4. Study design 5. IRB 6. Study conduct 7. Data analysis 8. Dissemination of results

8 Data Management Pearl “No study is better than the quality of its data” - Friedman “…get it right the first time” - Crerand

9 Steps in Data Management: Definition, Acquisition, Data Entry, Validation, Analysis

10 Data Definitions Identifying your data Identifying your data types Naming your data variables Creating a data dictionary

11 Data Types Types of Variables: Qualitative (Nominal, Ordinal); Quantitative (Interval, Ratio)

12 Data Definition Exercise

13 Data Variable Names Make the name descriptive (easier to remember) Keep it short (fewer than 10 characters) Use lower case Avoid spaces – use “underscore” Use numbers to indicate sequences
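The naming rules on this slide can be checked mechanically. The following is a minimal sketch, not part of the original presentation; the function name `check_variable_name` and the example names are hypothetical, and "descriptive" is left to human judgment because it cannot be tested automatically.

```python
def check_variable_name(name, max_len=9):
    """Return a list of naming-rule violations (empty list = name is fine).

    max_len=9 reflects "fewer than 10 characters" from the slide.
    """
    problems = []
    if len(name) > max_len:
        problems.append("too long")
    if name != name.lower():
        problems.append("not lower case")
    if " " in name:
        problems.append("contains spaces")
    return problems


# Hypothetical examples: a sequence of OSCE scores named osce_1, osce_2, ...
print(check_variable_name("osce_1"))        # []
print(check_variable_name("OSCE Score 1"))  # ['too long', 'not lower case', 'contains spaces']
```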

14 Data Variable Formats Variable formats: –Numeric –String

15 Data Variable Values Possible responses for a variable –Numeric format: 0 = no / 1 = yes –String format: a = no / b = yes

16 Data Variable Values

17 Note on Missing Values What about variables with no response? –Leave it blank –Assign a period “.” –Assign a value (usually out of the expected response range) –Avoid text
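One way to apply the numeric coding and the missing-value advice at data entry is sketched below. This is an illustration under assumptions: the helper `code_response` is hypothetical, and it treats a blank or a period as missing, which covers only the first two options on the slide.

```python
# Numeric coding from the slide: 0 = no / 1 = yes.
YES_NO_CODES = {"no": 0, "yes": 1}


def code_response(raw):
    """Convert a raw yes/no response to its numeric code.

    Blank cells and a lone period are treated as missing and returned as None,
    so that no free text such as "unknown" ends up in the data set.
    """
    text = raw.strip().lower()
    if text in ("", "."):
        return None
    if text not in YES_NO_CODES:
        raise ValueError(f"unexpected response: {raw!r}")
    return YES_NO_CODES[text]


print(code_response(" Yes "))  # 1
print(code_response("no"))     # 0
print(code_response("."))      # None
```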

18 Data Naming Exercise

19 Data Dictionaries / Code Books Brings together all data elements: –Data types / formats –Variable names –Expected response values (range) –Comments Self-generated vs. computer generated “Rosetta Stone” for the database
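A self-generated data dictionary can be as simple as one record per variable. The sketch below is illustrative only: the variables (`subj_id`, `gender`, `pgy`, `mcq_1`), their ranges, and the `lookup` helper are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    name: str            # short, lower-case variable name
    data_type: str       # nominal, ordinal, interval, or ratio
    fmt: str             # numeric or string
    expected: tuple      # (minimum, maximum) expected response values
    comment: str = ""


# Hypothetical entries for an educational study.
DATA_DICTIONARY = [
    Variable("subj_id", "nominal", "numeric", (1, 999), "study ID, no names"),
    Variable("gender", "nominal", "numeric", (0, 1), "0 = male / 1 = female"),
    Variable("pgy", "ordinal", "numeric", (1, 4), "postgraduate year"),
    Variable("mcq_1", "ratio", "numeric", (0, 100), "percent correct, exam 1"),
]


def lookup(name):
    """Return the dictionary entry for a variable name."""
    for variable in DATA_DICTIONARY:
        if variable.name == name:
            return variable
    raise KeyError(name)


print(lookup("pgy").comment)   # postgraduate year
print(lookup("mcq_1").expected)  # (0, 100)
```

Keeping the expected range next to each name is what lets later validation steps be driven from the dictionary rather than from memory.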

20

21 Data Dictionary Exercise

22 Data Acquisition Pick the best method for the environment

23 Data Acquisition Methods Interviews Questionnaires Assessments –MCQ examinations –OSCE / OSAT Laboratory studies

24 Data Acquisition Environments Observational encounters Structured research encounters Self-report

25 Data Acquisition Problems Major types of data issues: –Missing data –Incorrect data –Excess variability

26 Data Acquisition Problems Reasons for poor data quality: –Researcher-dependent data: Insufficient time Inadequate training Lack of focus on study tasks Poor communication Protocol deviation

27 Data Acquisition Problems Reasons for poor data quality: –Subject-dependent data: Inadequate instruction Poor comprehension Sensitive or stigmatized behaviors

28 Data Acquisition Options Paper forms Direct entry Computer assisted data acquisition

29 Data Acquisition: Paper Forms Advantages: –Controlled distribution and return –Comments –Double data entry Disadvantages: –Anonymity –Manual quality checks –Data entry time / errors
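Double data entry means keying the same paper forms twice and reconciling the disagreements. A minimal sketch of the comparison step follows; the function `compare_entries` and the sample records are hypothetical, and each pass is assumed to be a dictionary keyed by record ID.

```python
def compare_entries(first_pass, second_pass):
    """List every (record_id, variable, value_1, value_2) where the passes disagree."""
    discrepancies = []
    for record_id in sorted(first_pass.keys() | second_pass.keys()):
        row_1 = first_pass.get(record_id, {})
        row_2 = second_pass.get(record_id, {})
        for variable in sorted(row_1.keys() | row_2.keys()):
            if row_1.get(variable) != row_2.get(variable):
                discrepancies.append(
                    (record_id, variable, row_1.get(variable), row_2.get(variable))
                )
    return discrepancies


# Hypothetical data: the second typist transposed the digits of one BMI value.
first = {1: {"age": 34, "bmi": 27.1}, 2: {"age": 29, "bmi": 22.4}}
second = {1: {"age": 34, "bmi": 21.7}, 2: {"age": 29, "bmi": 22.4}}
print(compare_entries(first, second))  # [(1, 'bmi', 27.1, 21.7)]
```

Each discrepancy is then resolved by going back to the paper form, which is the reference copy.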

30 Data Acquisition: Direct Entry Options: –MS Excel, MS Access –Epi Info – free on the web –Direct entry into statistical software Pros / Cons: –No data transcription –Errors

31 Data Acquisition Computer assisted data acquisition: –Automated data collection –OCR forms –Computer-based case report forms / questionnaires –Computer-assisted self-interviews –Mobile computing device diaries

32 Data Acquisition: CASI Special Focus: Health Behaviors –Factors which may affect reporting: Sensitive or stigmatized behaviors Age discrepancy between participant and interviewer Lack of privacy Lack of comprehension of self-administered questionnaires

33 Data Acquisition: CASI Computer-assisted self-interview (CASI): –Computer-based interview –Can incorporate audio, video, and text –Respondent listens to or reads questions on screen –Submits answers through keypad or touch screen

34 Data Acquisition: CASI Benefits of CASI: –Interview conducted in privacy –Standardized interview –Computer controlled branching –Automated consistency and range checking –Multilingual administration
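Computer-controlled branching and automated range checking can be illustrated with a toy interview. This is a sketch under assumptions, not a description of any particular CASI product: the questions, valid ranges, and the `ask` / `run_interview` helpers are all hypothetical, and answers are supplied by a callable standing in for the keypad or touch screen.

```python
# Hypothetical questionnaire: item -> (prompt text, valid integer responses).
QUESTIONS = {
    "smokes": ("Do you smoke? 0 = no / 1 = yes", range(0, 2)),
    "cigs_day": ("Cigarettes per day?", range(1, 101)),
    "age": ("Age in years?", range(18, 100)),
}


def ask(item, get_answer):
    """Fetch one answer and apply the automated range check."""
    prompt, valid = QUESTIONS[item]
    answer = get_answer(item, prompt)
    if answer not in valid:
        raise ValueError(f"{item}: {answer!r} is outside the expected range")
    return answer


def run_interview(get_answer):
    """Administer the items in order, skipping the follow-up for non-smokers."""
    answers = {"smokes": ask("smokes", get_answer)}
    # Branching: the follow-up is shown only to respondents who answered yes.
    if answers["smokes"] == 1:
        answers["cigs_day"] = ask("cigs_day", get_answer)
    answers["age"] = ask("age", get_answer)
    return answers


scripted = {"smokes": 0, "age": 34}
print(run_interview(lambda item, prompt: scripted[item]))  # {'smokes': 0, 'age': 34}
```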

35 Steps in Data Management: Definition, Acquisition, Data Entry, Validation, Analysis

36 Data Validation 1.Is all of the data present? 2.Are the responses within the expected range? 3.Does the data make sense?

37 Data Validation Is all of the data present? –Visually examine the data cells –Frequencies
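The completeness check can be run as a frequency count of missing values per variable. A minimal sketch, assuming each record is a dictionary and that missing values appear as `None`, a blank, or a period; the function `missing_counts` and the sample records are hypothetical.

```python
MISSING_MARKERS = (None, "", ".")


def missing_counts(records, variables):
    """Count, for each variable, how many records have no usable value."""
    counts = {variable: 0 for variable in variables}
    for record in records:
        for variable in variables:
            if record.get(variable) in MISSING_MARKERS:
                counts[variable] += 1
    return counts


# Hypothetical records; note that a legitimate 0 is not counted as missing.
records = [
    {"age": 34, "gender": 0},
    {"age": None, "gender": 1},
    {"age": 29},
]
print(missing_counts(records, ["age", "gender"]))  # {'age': 1, 'gender': 1}
```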

38

39 Data Validation Are the responses within the expected range? –Frequencies Maximum / minimum values –Descriptive statistics Means Standard deviations
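A range check combines the expected range from the data dictionary with simple descriptive statistics. The sketch below uses only Python's standard `statistics` module; the helper names, the age range of 18 to 60, and the sample values are hypothetical.

```python
from statistics import mean, stdev


def out_of_range(values, low, high):
    """Return (row_index, value) for every value outside [low, high]."""
    return [(i, v) for i, v in enumerate(values) if not low <= v <= high]


def describe(values):
    """Minimum, maximum, mean, and sample standard deviation."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "sd": stdev(values),
    }


# Hypothetical ages with one keying error (290 instead of 29).
ages = [34, 29, 41, 290, 37]
print(out_of_range(ages, 18, 60))  # [(3, 290)]
print(describe(ages)["max"])       # 290
```

The maximum alone exposes the error here; the inflated mean and standard deviation are a second signal when the expected range is not known in advance.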

40 Data Validation

41 Once the outlier is found, one can reference the chart for clarification

42 Descriptive Statistics

43 Data Distribution Definitions by SPSS 16.0

44 Data Distribution

45

46 Scatterplots

47 Who is Represented in the Data? Sample test of proportions –Percent of gender –Percent of ethnicity Sample test of means –Age –BMI Does our data reflect the population at large or a subset?

48 Who is not? Compare data of the included and excluded individuals –Are they similar for: Age (continuous – Student t test) BMI (continuous – Student t test) Ethnicity (discrete/categorical – Chi-square test) Gender (discrete/categorical – Chi-square test)
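The two comparisons named on this slide can be computed by hand to see what the tests measure. This is a sketch under assumptions: it returns only the test statistic and degrees of freedom (a p-value would normally come from statistical software), the t statistic assumes equal variances, and the sample data are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group_a, group_b):
    """Pooled-variance (Student) t statistic and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, n_a + n_b - 2


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical ages of included vs. excluded individuals.
t, df = student_t([30, 32, 34], [31, 33, 35])
print(round(t, 3), df)  # -0.612 4

# Hypothetical gender counts: rows = included / excluded, columns = female / male.
stat, df = chi_square([[10, 20], [20, 10]])
print(round(stat, 3), df)  # 6.667 1
```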

49 Steps in Data Management: Definition, Acquisition, Data Entry, Validation, Analysis

50 Data Analysis Choose the right tool for the job Commonly used statistical tests: –If the data are normally distributed (i.e., follow a bell-shaped curve), then we use parametric statistical tests –If the data (1) are not “bell-shaped”, (2) have small sample sizes, generally fewer than 30 per group, or (3) contain “outliers”, then we use nonparametric statistical tests.

51 Data Analysis Choice of statistical test is based on: –Distribution of the sample data –Sample size –Number of groups –Independence of the groups
Comparison Measurement | Normal Distribution | # of Groups | Statistical Test
Mean (Average) | Yes | 2 | Student’s t-test
Mean (Average) | Yes | ≥3 | Analysis of Variance
Median | No | 2 | Wilcoxon Rank-Sum or Mann-Whitney U-test
Median | No | ≥3 | Kruskal-Wallis test
Proportions | Yes | ≥2 | Chi-square test
Proportions | No | ≥2 | Fisher’s exact test
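The decision table on this slide can be written as a small lookup. The sketch below mirrors the slide's table and nothing more; the function `choose_test` and its argument names are hypothetical. In practice the choice between the chi-square and Fisher's exact tests usually rests on whether expected cell counts are small, and the final choice belongs in the statistical consultation.

```python
def choose_test(measure, normal, groups):
    """Suggest a test from the slide's table.

    measure: "mean", "median", or "proportion"
    normal:  True if the "Normal Distribution" column reads Yes
    groups:  number of groups being compared
    """
    if groups < 2:
        raise ValueError("the table covers comparisons of two or more groups")
    if measure == "mean" and normal:
        return "Student's t-test" if groups == 2 else "Analysis of variance"
    if measure == "median" and not normal:
        if groups == 2:
            return "Wilcoxon rank-sum / Mann-Whitney U test"
        return "Kruskal-Wallis test"
    if measure == "proportion":
        return "Chi-square test" if normal else "Fisher's exact test"
    raise ValueError("combination not covered by the table")


print(choose_test("mean", True, 2))         # Student's t-test
print(choose_test("median", False, 3))      # Kruskal-Wallis test
print(choose_test("proportion", False, 2))  # Fisher's exact test
```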

52 Data Analysis Univariate vs. Multivariate –Multivariate methods are being required more frequently in medical research because we are looking at relationships that involve more than a one-to-one association. Multivariate methods allow us to: –Examine many variables simultaneously –Adjust for baseline differences between groups –Adjust for potential “confounding” variables –Obtain “adjusted” measures of effect Examples of multivariate methods (explain or predict the dependent variable): –Linear regression – to predict the values of a numerical measurement (viral load) –Logistic regression – to predict a dichotomous outcome (pregnant/not pregnant) –Cox proportional hazards regression – to predict time to an event (survival time)

53 Session content, including narrated MS PowerPoint slides, is available at: http://www.obgynknowledgebank.net

