Presentation is loading. Please wait.

Presentation is loading. Please wait.

Database Design Techniques for Clinical Research Melissa K. Carroll, M.S. October 20, 2003.

Similar presentations

Presentation on theme: "Database Design Techniques for Clinical Research Melissa K. Carroll, M.S. October 20, 2003."— Presentation transcript:

1 Database Design Techniques for Clinical Research Melissa K. Carroll, M.S. October 20, 2003

2 Overview Relational database design Implementing a relational database in Microsoft Access Designing a database for a typical study by our group Resources Questions

3 Relational Database Design

4 What is a Database? Collection of data organized for efficient operations Everyone uses them multiple times a day, often without realizing it Examples –Airline reservations –Online shopping Underlying design principles are largely universal

5 Problems with “Flat” Files Data Redundancy –Leads to more work and inconsistencies –Wreaks havoc on performing basic manipulations such as searching and sorting File Management –Multiple files –Concurrent users –Security –Intermediate results Ad hoc programming (reinventing the wheel)

6 Data Modeling: Entity- Relationship Model Models data as entities, with attributes, and relationships between entities Entity: person, place, or thing Instance: example of an entity Attribute: feature of an entity Relationship: describes association between (usually two) entities

7 E-R Notation

8 Designing an E-R Diagram Issues to Consider –What questions will the data be used to answer? –What are the entities and how do they relate to each other? –What attributes uniquely identify entities? –What attributes need to be sub-divided? Goal: Eliminate Redundancy Process is called “normalizing” data

9 Sample E-R Diagram: MP3 Files

10 Relational Model Lower-level model used for actual database implementation Translating from E-R model –Entities become tables –Attributes become fields –Many-to-many relationships become tables Unique identifiers from involved tables as fields –Unique identifiers from “one” sides are added as fields to corresponding “many” sides

11 Relational Database Management Systems (RDBMSs) Database Management System (DBMS): software with purpose of helping user design and use a database Relational Database Management System (RDBMS): DBMS for databases based on relational model Most major commercial products (e.g. MS Access, Oracle, MySQL, SQL Server)

12 SQL Need language to tell the DBMS –The design of the database –Actual data to be entered –What data to retrieve and in what format SQL = standardized language used by almost all major DBMSs Standard language provides interoperability and portability

13 SQL Examples CREATE TABLE artist (artistID INT AUTO_INCREMENT, artistName VARCHAR(75)) INSERT INTO artist (artistName) VALUES (“The Beatles”) UPDATE album SET label = “EMI” WHERE albumTitle = “Abbey Road”

14 SQL Examples Continued SELECT songTitle, quality FROM song, recording WHERE song.songID = recording.songID SELECT songTitle, quality FROM song INNER JOIN recording ON song.songID = recording.songID SELECT albumTitle, albumAge AS releaseYear - Date() FROM album

15 SQL Examples Continued SELECT Count(artistID) from artist SELECT MAX(recording.quality) FROM artist, recorded, recording WHERE artist.artistName = recorded.artistName and recorded.recordingID = recording.recordingID and artist.artistName = “The Beatles”

16 Relational Database Implementation in Microsoft Access

17 Clinical Research Database Design

18 Typical Simple Study Baseline and fixed number of follow-ups Subject reaches each time point only once Different time points have different scale protocols Considerable overlap in scales between time points Isolated from other studies

19 Four Database Design Approaches Approach One: entire assessment administration as entity, e.g. all of baseline or all of 12 week –One table per time point, items as attributes Approach Two: scale administrations within each assessment as entity, e.g. 12 Week Hamilton –One table per scale per time point, items as attributes Approach Three: scale administration as entity –One table per scale, items as attributes Approach Four: item as entity –One table (theoretically)

20 Evaluation of Approach One May seem appropriate because common format for analysis is one record per subject Problems –Limited number of fields allowed in some DBMSs –Will have many missing values –General redundancy issues (shares with Approach Two, to follow)

21 Pros and Cons of Approach Two Versus Approach Three Pros –“Horizontal” format –Flexibility for handling inter-time point scale disparities Cons (for simple studies) –Data model complexity –Table creation and modification time multiplied –Space consumption –More data locations (entry and retrieval complexity) –Re-assigning to different time points

22 Reassigning Scale Time Points Using Approaches Two and Three

23 Reassigning Scales: Modified Approach Three

24 Approach Two Cons for More Complex Studies Poor at handling an indefinite number of follow-up time points Modified Approach Three is better at handling studies in which subjects are assessed at the same time point multiple times –May happen due to progressing through the study multiple times –May also happen due to e.g. being screened multiple times

25 Evaluation of Approach Four Pros –Could potentially handle changes more elegantly –Perhaps more “normalized” theoretically Cons –Considerably harder to design entry interface –Harder to obtain data in formats usually required Doesn’t fix non-database problems with data collection changes

26 Databases and Datasets Database: Collection of data organized for efficient entry, updating, storage, and retrieval Dataset: Subset of data retrieved from database in a format optimized for a specific reporting or analysis purpose Well-designed databases should facilitate creation of datasets in any desired format Datasets should be formatted for a particular purpose and used only for that purpose

27 Normalizing Data Within Scales: Medication Data

28 Normalizing Data Continued: Comparison of Medication Queries

29 Multi-Study Issues: To Separate or Not To Separate If same data will count for multiple studies –Keeping design and data in sync –E.g. updating all copies when data changed –E.g. ensuring scale changes are reflected in all tables and forms If handling multiple, possibly “isolated” studies –Keeping design in sync –Can still use views so actual storage is transparent to user

30 Summary Careful planning must go into designing a database First step in design is to model the data –E-R  relational model is effective DBMSs, such as Access, offer tools for creating, using, and maintaining databases When designing clinical research databases, as with any databases, priority should be normalization, hence elimination of redundancy Properly designed databases will supply data in any format desired

31 Resources Access Help (Help in top menu, Contents and Index, Contents tab) Access Database Wizard (in main menu upon opening) Oreilly Access Database Design & Programming, 3rd Edition –For database design theory: online chapter at 4.html 4.html Access (97/2000/etc.) Bible –Available here; not 100% accurate Database System Concepts Fourth Edition (Silberschatz, Korth, Sudarshan)

Download ppt "Database Design Techniques for Clinical Research Melissa K. Carroll, M.S. October 20, 2003."

Similar presentations

Ads by Google