Presentation is loading. Please wait.

Presentation is loading. Please wait.

EPI 218 Database Management for Clinical Research Tables, Relationships, Normalization, Data Types, and Data Dictionaries Michael A. Kohn, MD, MPP 1 August.

Similar presentations


Presentation on theme: "EPI 218 Database Management for Clinical Research Tables, Relationships, Normalization, Data Types, and Data Dictionaries Michael A. Kohn, MD, MPP 1 August."— Presentation transcript:

1 EPI 218 Database Management for Clinical Research Tables, Relationships, Normalization, Data Types, and Data Dictionaries Michael A. Kohn, MD, MPP 1 August 2013

2 Clinical Research* Choose the study design, and define the study population, predictor variables, and outcome variables; measure these variables and anticipate problems with measurement; analyze the results In this course, we discuss the “nitty gritty” of collecting, storing, updating, and monitoring the study measurements. *Private companies that make data management systems for clinical research understand “clinical research” to include only RCTs preparatory to FDA drug or device approval, not observational studies.

3 Outline Housekeeping Data Tables Rows = Records; Columns = Fields Normalization of Data Tables Start Lab 1

4 Housekeeping Epi 218

5 Course website: http://www.epibiostat.ucsf.edu/courses/schedule/data_management.html Lectures and Labs will be in China Basin Landing 6702 with overflow into 6704, 8:30 – 10:30 “Learn MS Access 2000” video http://mkanders.com/learn_access_video.htm Username: ucsfdbclass Password: access2000 (We can also loan you the video on CD.)

6 Platforms Access (Labs 1, 2 and 3) REDCap (Lab 4) QuesGen (Lab 5) OnCore (Lab 6) May use other data management platforms for final project: -- SurveyMonkey -- Filemaker Pro -- Oncore -- OpenClinica -- Other

7 Microsoft Access Integrated desktop database management platform Uses SQL (Structured Query Language) Has an outstanding graphical query design tool Incorporates an excellent report writer Based on the principles of the Relational Model Relationships diagram has integrated referential integrity Very flexible, infinitely customizable NOT browser based, desktop application Using advanced features usually requires hiring a developer

8 Microsoft Access Changed user interface between Access 2003 and Access 2007. If you are running Access 2003 or an earlier version, use the lab instructions for Access 2003. If you are running Access 2007 or 2010, use the lab instructions for Access 2007. DEB Terminal Server 185-RDS1.epi-ucsf.org has Access 2010. The others have Access 2003.

9 DEB Terminal Server Provides a remote Windows desktop with Microsoft Office Professional Remote Desktop client software freely available for the Mac and already part of Windows http://www.microsoft.com/en- us/download/details.aspx?id=18140 http://www.microsoft.com/en- us/download/details.aspx?id=18140 Obtain DEB Terminal Server username and password from HelpDesk@psg.ucsf.eduHelpDesk@psg.ucsf.edu Instructions available on course syllabus page

10 REDCap Web-based research data collection system developed at Vanderbilt Available free through UCSF Academic Research Systems http://tinyurl.com/yh5m6ka You are both the Principal Investigator and User 1. Model= “Do-it-yourself”

11 QuesGen Web-enabled research data collection and management platform developed (with UCSF input) by a private company based in Burlingame More full-featured and customizable than REDCap, but primarily “pay-us-to- do-it” rather than “do-it-yourself” User accounts for Epi 218 students

12 Learning Objectives develop a multi-table, relational database for a research study using Microsoft Access query a database for monitoring and analyzing research data learn about REDCap: basic functions, advantages and limitations understand the advantages and costs of other web-based platforms such as QuesGen hear about data management for large-scale clinical trials in industry

13 Requirements Turn in all 5 labs on time Labs are due by midnight the following Thursday (Lab 1 due 8/8 at midnight) Complete Final Project Due 9/18/2013

14 Final Project: Part A Send in or Demonstrate Your Study Database Due 9/20/2012 Send in a copy of your research study database*. We prefer a database that you are currently using or will use for a research study. However, a demonstration or pilot database is acceptable. *If you are unable to package your database in a file to email, you can send us a link or work out another way to review your database.

15 If you are doing secondary analysis of data collected by someone else, obtain the data collection forms* used in the original data collection, set up a new database that you would use for a follow-up study. *Often easily obtained by doing a Google search or emailing the author of the original study. Final Project: Part A Send in or Demonstrate Your Study Database Due 9/18/2013

16 General description of database Data collection and entry Error checking and data validation Analysis (e.g., export to Stata) Security/confidentiality Back up Final Project: Part B Submit Your Data Management Plan Due 9/18/2013

17 Final Project Due 9/18/2013 Start thinking about this now. Build your own study database as you work through the labs. Use extra time in lab to work on your study database. Set up appointments with course faculty early.

18 TICR Professional Conduct Statement Clarifications for this class I will maintain the highest standards of academic honesty I will neither give nor receive aid in examinations or assignments unless such cooperation is expressly permitted by the instructor I will conduct research in an unbiased manner, reports results truthfully, and credit ideas developed and work done by others I will not use answer keys from prior years I will write answers in my own words, and, when collaboration is permitted, acknowledge collaborators when answers are jointly formulated For Epi 218 – Just don’t turn in somebody else’s work as your own.

19 Rows = Records = Entities Columns = Fields = Attributes Data Tables

20 DCR Chapter 16 Exercise 2 The PHTSE (Pre-Hospital Treatment of Status Epilepticus) Study was a randomized blinded trial of lorazepam, diazepam, or placebo in the treatment of pre-hospital status epilepticus. The primary endpoint was termination of convulsions by hospital arrival. To enroll patients, paramedics contacted base hospital physicians by radio. The following are base-hospital physician data collection forms for 2 enrolled patients: Lowenstein DH, Alldredge BK, Allen F, Neuhaus J, Corry M, Gottwald M, et al. The prehospital treatment of status epilepticus (PHTSE) study: design and methodology. Control Clin Trials 2001;22(3):290-309. Alldredge BK, Gelb AM, Isaacs SM, Corry MD, Allen F, Ulrich S, et al. A comparison of lorazepam, diazepam, and placebo for the treatment of out-of-hospital status epilepticus. N Engl J Med 2001;345(9):631-7.

21

22

23 Display the data from these 2 data collection forms in a 2-row data table. Subjec tID KitNum ber AdminDat e Admin Time SzStopPre Hosp SzStopPreHos pTime HospArrT ime HospArrS zAct HospArrG CSV 189A3223/12/199417:39FALSE 17:48TRUE 410B53612/1/199801:35TRUE01:3901:53FALSE4

24 Create a 9-field data dictionary for the data table Field Name Data TypeDescriptionValidation Rule SubjectIDIntegerUnique Subject Identifier KitNumberText(5)5-character Investigational Pharmacy Code AdminDateDateDate Study Drug Administered AdminTimeTimeTime Study Drug Administered SzStopPreHospYes/NoDid seizure stop during pre- hospital course? SzStopPreHosp Time TimeTime seizures stopped during pre-hosp course (blank if seizure did not stop) HospArrTimeTimeHospital Arrival Time HospArrSzActYes/NoWas there continued Seizure Activity on Hospital Arrival? Check against SzStopPreHosp HospArrGCSVIntegerVerbal GCS on Hospital Arrival (blank if seizure continued) Between 1 and 5

25 Methods: Design-Nested double cohort study. Setting-Kaiser Subjects-Infants with neonatal jaundice and randomly selected non-jaundiced infants Predictor Variable-Presence or absence of jaundice Outcome Variable- Neuropsychological score (ranging from 55 to 145) at age 5 Analysis- ? JIFee Jaundice and Infant Feeding Study Newman, T. B., P. Liljestrand, et al. (2006). "Outcomes among newborns with total serum bilirubin levels of 25 mg per deciliter or more." N Engl J Med 354(18): 1889-900.

26 Infant Jaundice Study Data 1.Approximately 400 children 2.5 examiners (doctors) 3.Approximately 700 neuropsychological examinations, measuring weight, height, and “NPScore” (IQ) 4.Some children to be examined more than once 5.No examiner to see the same child twice 6.If child died before age 5, store age and circumstances of death

27

28 Demonstration: Creating a Data Table Label columns and enter rows of data in datasheet view Where is predictor on data collection form?

29 Demonstration: Data Dictionary Table design view: field (=column) names, data types, definitions, validation rules (More on data types, free-text vs. coded responses, later)

30

31 Acceptable table showing one set of exam results per participant. (BabyExamForFigure3)

32 Demonstration Disallowed values Duplicate primary keys This automatic error checking and data validation IS why you need to enter your data into a computer; it is NOT why you need a relational DBMS. Many single- table products (Filemaker Pro, SAS FSP, even Excel) can do error checking and data validation.

33 Demonstration: Same Table in Excel, Stata Excel Stata Etc Rows = Records = Entities Columns = Fields = Attributes Access and Stata have a special row at the top for column headings (=field names); Excel just uses the first row.

34 Normalization

35 Table of Study Subjects Row = Individual Infant Columns = ID#, Name, DOB, Sex, Jaundice If some infants have more than one exam, what do you do? Table of Study Subjects

36 Undesirable table showing multiple exam results per study participant. (BabyExamForFigure4)

37 Demo Find highest IQ Score Find all exams done in April

38 Common Error If you find yourself creating multiple columns for the same measurement, e.g., Date1, Score1, Date2, Score2, Date3, Score3, … Or if your table is more than about 30 columns wide, It is time to restructure your table.

39 Undesirable table with participant-specific data duplicated for each exam. (Note problem with Helen’s DOB.) (ExamBabyForFigure5)

40 Demo Find highest IQ Score Find all exams in a particular month What is Helen’s birth date? What happened to Alejandro, Ryan, Zachary, and Jackson?

41 If some infants have multiple exams, “normalize” the records into two tables, one for subjects and one for examinations. Normalization

42 Data normalized into two tables: one (“Baby”) with rows comprising subject- specific information; the other (“Exam”) with rows comprising exam-specific information. Note that Helen can only have one birth date. Subjects with no exams, e.g. Alejandro, still appear in the database. “SubjectID” functions as the primary key in the “Baby” table and as the foreign key in the “Exam” table.

43 Figure 7. Relationships diagram showing the one-to-many relationship between the table of subjects (“Baby”) and the table of measurements (“Exam”).

44 Demonstration Inability to create integrity violations with normalized tables. This IS why you need a multi-table relational DBMS.

45 Outline Housekeeping Data Tables Rows = Records; Columns = Fields Normalization of Data Tables Start Lab 1


Download ppt "EPI 218 Database Management for Clinical Research Tables, Relationships, Normalization, Data Types, and Data Dictionaries Michael A. Kohn, MD, MPP 1 August."

Similar presentations


Ads by Google