Working With Your Statistician: How we can make each others’ jobs easier Jeannie-Marie Leoutsakos, PhD MHS Assistant Professor, Department of Psychiatry.

Presentation transcript:

Working With Your Statistician: How we can make each others’ jobs easier Jeannie-Marie Leoutsakos, PhD MHS Assistant Professor, Department of Psychiatry and Behavioral Sciences Director, Psychiatry Data Core

Questions
• How many of you have a statistician working as part of your group?
• How many of you work with a statistician outside your group?
• Does the statistician become involved before or after the data are collected?
• How many of you also act as the statistician for your group?
• What questions are you hoping will be answered today?

Outline
• My Background
• Statisticians at Johns Hopkins
• Ideal and Non-Ideal Collaborations, things to keep in mind
• Specific Recommendations
  • Data Coding
  • Data Documentation
  • Data Delivery
• Questions?

How I got here
• Pre-Med/CogSci at Homewood
• Started work at JHH (research assistant, data manager, data analyst, network administrator)
• Biostat master’s at JHSPH
• Mental Health PhD at JHSPH
• Postdoc in Psychiatry
• Data Core/Teaching/Methods Research

(Bio)statisticians at Hopkins
• 53 Statisticians/Biostatisticians
• 53 Research Data Analysts
• 46 Biostatistics Faculty
• 100 Biostatistics Students
• 20 Research Data Managers
• 9 Database Specialists
• 100 Programmer Analysts

Ideal Collaborations
• Collaborator: involvement throughout the project.
  • Hypothesis Development/Grant writing
  • Database setup
  • Data Analysis
  • Manuscript Preparation
• Teacher: should be mutual and integrative
Kirk RE. (1991) Statistical consulting in a university: dealing with people and other challenges. American Statistician 45(1):28-34.

Non-Ideal Collaborations
• Helper: technician; responds to questions. Accountability problems.
• Leader: lack of substantive expertise.
• Data-Blesser: curb-side advice.
• Archaeologist: my other statistician stopped returning my emails…

Timeline for Collaboration
• Throughout the life of the project / end-product focused
• Assist PI with hypothesis development/study design
• Consult on database design with PI & DBM
  • Check that necessary variables are present, etc.
  • Check that unnecessary variables are not included
  • Statistician can be your advocate, stressing the importance of data integrity to the PI
• Perform interim analyses (if necessary)
• Perform final analyses
• Assist in manuscript preparation

What Statisticians Know
• Some portion of statistics(!)
• May know little about databases, particularly your database software
• May have very circumscribed programming ability
• May have little or no subject knowledge: don’t assume that they are familiar with certain variables or instruments/acronyms

Specific Recommendations
• Database Software
• Variable Names/Value Labels
• Data Documentation
• Datafile Version Control
• File Formats/Transmission of Data Files

Database Software
• MS Excel – simple but limited, sorting problem, security
• MS Access, FileMaker Pro – labor intensive for DBMs
• REDCap – web-based, allows tracking, nice features
• CRMS – ?
• Statistician will likely convert what you give them to a statistical package (Stata/R/SAS, etc.)
  • May have memory issues: Stata/IC is limited to 2047 variables
  • Mac/PC issues

Stat/Transfer

Golden Rules
1. Will this be completely unambiguous to an outside person with little or no prior knowledge of the study?
2. Is this as consistent as possible (both internally and externally)?

Variable/Field Names
• Name length limits (you should ask)
  • For SAS and Stata, now 32
  • Others: may be as low as 8
• Names need to start with a letter; avoid CAPS and special characters (esp. * and !)
• Use a consistent convention: e.g., use the first three characters to denote the form (if you have multiple forms).
• For dichotomous variables, consider using a category as the name (e.g., instead of “sex” coded 0/1, use “male” coded 0/1)
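
The naming rules above can be checked mechanically before a database is handed over. The following is a minimal sketch, assuming a 32-character limit (the SAS/Stata figure from the slide) and treating anything other than letters, digits, and underscores as a special character; the function name `check_variable_name` is hypothetical.

```python
import re

MAX_NAME_LENGTH = 32  # SAS/Stata limit quoted on the slide; ask about other packages


def check_variable_name(name, max_length=MAX_NAME_LENGTH):
    """Return a list of problems found with a proposed variable name."""
    problems = []
    if len(name) > max_length:
        problems.append(f"longer than {max_length} characters")
    if not re.match(r"[A-Za-z]", name):
        problems.append("does not start with a letter")
    if name != name.lower():
        problems.append("contains capital letters")
    # Assumption: only letters, digits, and underscores count as "safe".
    if not re.fullmatch(r"[A-Za-z0-9_]*", name):
        problems.append("contains special characters")
    return problems


for candidate in ["ham14", "1st_visit", "Sex!"]:
    print(candidate, check_variable_name(candidate))
```

An empty list means the name passed every check; the exact limits should still be confirmed with whoever will analyze the data.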

Pitfalls with Variable Names
Be careful how you name variables and encode values that might be considered sensitive.
• Sex/gender/orientation
• Race/ethnicity
• Anthropometrics

Variable Formats
• May not matter if transformed to a .txt or .csv file
• Numeric: byte, float, double
• Date: format should be explicit
• String/Text:
• Memo/extended text:
• ALERT: if the database consists of multiple datafiles, ensure that variable names and formats of identifiers are consistent across all data files.

Variable Labels
• Extended Variable Name/Description
  • Variable name: ham14
  • Variable label: “hamilton depression rating scale q. 14”
• Particularly useful with short variable name lengths
• Check to see if the statistician’s software will read them
• Take note of label length limits (Stata: 80)
• Use a consistent convention

Encoding/Value Labels
• Check to see if the statistician’s software will accept them
• Use a convention, avoid CAPS
• Code functional values of dichotomous variables as 0/1
• Missing Data:
  • Can have multiple missing value codes: don’t know, refused, not applicable, etc.
  • Value codes should be universal and sequential, and outside the possible range of non-missing data.
  • No fields should be intentionally left blank (except possibly due to skip patterns)
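
As an illustration of the missing-data guidance above, here is a minimal sketch in which every variable shares the same sequential missing codes, all outside the valid range. The codes -7, -8, -9 and the helper `split_missing` are assumptions made for this example, not values from the talk.

```python
# Hypothetical universal missing-value codes, chosen to fall outside the
# range of any legitimate (non-negative) score in the dataset.
MISSING_CODES = {-7: "not applicable", -8: "refused", -9: "don't know"}


def split_missing(value, missing_codes=MISSING_CODES):
    """Return (analysis_value, missing_reason) for one raw entry."""
    if value is None:
        # A blank field carries no explanation, which is why blanks are discouraged.
        return None, "blank (unexplained)"
    if value in missing_codes:
        return None, missing_codes[value]
    return value, None


for raw in [3, -8, None]:
    print(raw, split_missing(raw))
```

Because the codes are the same for every variable, the statistician can recode them to the package's own missing values in a single pass.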

Data Documentation
• Study Protocol/Data Operations Manual
• Codebook/Data Dictionary (ideally electronic and string searchable)
• Sample CRF (binder with data collection forms)
• Unresolved Queries/Issues
• Invalid Values
• Version Control

Codebooks/Data Dictionaries
• Range from very elaborate to very simple
• Variable Name
• Variable Description
• Variable Format (for dates, be careful and explicit: is 12/10/1975 December 10 or October 12?)
• Encoding (if any)
• Ranges, acceptable values
• Counts, Descriptives
• Value Labels
• Missing Data codes
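
The date ambiguity above is easy to demonstrate. This sketch assumes a small, hypothetical codebook dictionary (`CODEBOOK`, with a made-up variable `dob`) that states each date format explicitly, and uses the standard library's `datetime.strptime` to read the value accordingly.

```python
from datetime import datetime

# Hypothetical codebook entry: the date format is written down, not implied.
CODEBOOK = {
    "dob": {
        "description": "date of birth",
        "format": "%m/%d/%Y",  # month/day/year
    },
}


def parse_date(variable, raw, codebook=CODEBOOK):
    """Parse a raw date string using the format recorded in the codebook."""
    return datetime.strptime(raw, codebook[variable]["format"]).date()


raw = "12/10/1975"
print(parse_date("dob", raw).isoformat())                      # month first
print(datetime.strptime(raw, "%d/%m/%Y").date().isoformat())   # day first
```

The same string yields two different dates depending on the assumed format, so the codebook entry is the only thing that settles it. Exporting dates in ISO form (year-month-day) avoids the question altogether.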

Over 100 PDF files corresponding to each separate datafile

Study also collected data on participants’ spouses and caregivers

Considerations for Longitudinal Datasets
• Wide: 1 line per patient
• Long: 1 line per visit
• The visit indicator needs to be at the end of the variable name stub.
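
Putting the visit indicator at the end of the name stub is what makes a wide-to-long conversion mechanical. A minimal sketch, assuming names of the hypothetical form `stub_visit` (e.g. `ham14_1`, `ham14_2`) and a single identifier column `id`:

```python
import re

# Assumed naming convention: <stub>_<visit number>, e.g. "ham14_2".
SUFFIX = re.compile(r"^(?P<stub>.+)_(?P<visit>\d+)$")


def wide_to_long(wide_row, id_var="id"):
    """Turn one wide record (1 line per patient) into long records (1 per visit)."""
    by_visit = {}
    for name, value in wide_row.items():
        match = SUFFIX.match(name)
        if match is None:
            continue  # identifier or time-invariant variable; not carried over here
        visit = int(match.group("visit"))
        by_visit.setdefault(visit, {})[match.group("stub")] = value
    return [
        {id_var: wide_row[id_var], "visit": visit, **values}
        for visit, values in sorted(by_visit.items())
    ]


wide = {"id": 7, "ham14_1": 2, "ham14_2": 1, "wt_1": 80, "wt_2": 79}
for row in wide_to_long(wide):
    print(row)
```

Statistical packages have their own reshape commands that rely on the same stub-plus-suffix pattern, so a consistent convention pays off whichever tool is used.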

Dataset Cleaning
• Resolution of discrepancies between double data-entered files (if applicable)
• Resolution of missing data or aberrant values
• Valid data indicators (e.g., for lab values that are known to be erroneous, recommend a second variable which indicates whether the target variable value is legitimate/to be included in analyses)
• Statisticians shouldn’t clean data
  • Inefficient
  • We don’t have enough knowledge about the data
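
The valid-data indicator recommended above can be as simple as a companion 0/1 variable. In this sketch the variable names (`sodium`, `sodium_valid`) and the values are invented for illustration:

```python
def analysis_values(records, target, valid_flag):
    """Keep target values only where the companion flag marks them as valid (1)."""
    return [row[target] for row in records if row[valid_flag] == 1]


labs = [
    {"id": 1, "sodium": 140, "sodium_valid": 1},
    {"id": 2, "sodium": 14, "sodium_valid": 0},  # known erroneous: kept, but flagged
    {"id": 3, "sodium": 138, "sodium_valid": 1},
]

print(analysis_values(labs, "sodium", "sodium_valid"))
```

The erroneous value stays in the file, so nothing is silently lost, but the decision about its legitimacy is made by someone who knows the data rather than by the statistician.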

Calculated Variables/Data Programming
• There are likely things like totals, data calculations, etc. that are calculated based on the entered data, rather than being entered.
• Discuss with the statistician: depending on which software you are both using, there may be things that are a lot easier for them to do later, or vice versa (e.g., long/wide).
• Documentation should include exactly how these were calculated.

Dataset Version Control
• It is likely that there will be multiple versions of the dataset (e.g., interim, after cleaning)
• A log of all generated versions should be kept, and dataset names should include the date.
• Try to distribute only finalized versions of datasets
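
One way to follow the naming advice above is to build the date into every dataset name and record each version in a log. This is a minimal sketch; the naming pattern, the study name, and the helper names are assumptions made for illustration.

```python
from datetime import date


def versioned_name(study, stage, created, extension="csv"):
    """Build a dataset filename that carries its stage and creation date."""
    # ISO dates (YYYY-MM-DD) sort chronologically when files are listed by name.
    return f"{study}_{stage}_{created.isoformat()}.{extension}"


def log_entry(filename, note):
    """Format one line for the version log."""
    return f"{filename}\t{note}"


name = versioned_name("mystudy", "interim", date(2011, 4, 15))
print(log_entry(name, "interim export, before cleaning"))
```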

Dataset Distribution
• Be careful about HIPAA!
  • PMI includes dates and ages if >90
  • It may be necessary to create a “days from baseline” variable
• A dataset containing PMI cannot be emailed unless it is encrypted
• Best bet: only distribute de-identified datasets
  • REDCap will create one for you automatically
• If someone emails me an unencrypted dataset with PMI, I am obligated to report them.
• Consider Jshare or SharePoint for file distribution
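
The “days from baseline” idea above replaces a calendar date with an interval. A minimal sketch, with hypothetical variable names (`visit_date`, `days_from_baseline`); it handles only the date field and is not, by itself, a complete de-identification procedure.

```python
from datetime import date


def days_from_baseline(visit_date, baseline_date):
    """Number of days between the baseline visit and this visit."""
    return (visit_date - baseline_date).days


def replace_visit_date(row, baseline_date, date_var="visit_date"):
    """Return a copy of the record with the calendar date swapped for an interval."""
    clean = {key: value for key, value in row.items() if key != date_var}
    clean["days_from_baseline"] = days_from_baseline(row[date_var], baseline_date)
    return clean


visit = {"id": 1, "visit_date": date(2010, 3, 1), "ham14": 2}
print(replace_visit_date(visit, baseline_date=date(2010, 1, 1)))
```

The interval preserves everything a longitudinal analysis needs about timing, while the actual dates stay behind in the source database.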

Main Points
• Encourage your PI to develop a collaboration early.
• You should be involved in that collaboration.
• You and the statistician can save each other time.
• Useful data is well-documented data.

Questions?
• How do you find a statistician?
• Anybody having a problem with a statistician right now?
• Interpersonal aspect of working with a statistician
• Data Scientist career paths
• Statistical software packages