Data Quality: Data Cleaning. Beverly Musick, M.S. May 20, 2010


1 Data Quality: Data Cleaning
Beverly Musick, M.S.
May 20, 2010
This module was recorded at the health informatics training course (data management series) offered by the Regional East African Centre for Health Informatics (REACH-Informatics) in Eldoret, Kenya. Funding was made possible by NIH’s Fogarty Center. The training was held at the Academic Model Providing Access to Healthcare (AMPATH), a USAID-funded program, supported by the Regenstrief Institute at Indiana University. The modules were created in collaboration with the School of Informatics at IUPUI.
Creative Commons Attribution-ShareAlike 3.0 Unported License

2 Quality Control
Quality Control is the process of monitoring and maintaining the reliability, accuracy, and completeness of the data during the conduct of the project.
– Requires a multidisciplinary team which includes clinicians, data entry staff, statisticians, systems administrators, and data managers.
– Requires sharing knowledge about disease progression, clinical practice patterns, effects of medical treatments, relationships between variables, and expected timing of events.

3 Ensuring Data Quality
Point of Assessment
– Collection: review form before patient leaves the clinic
– Entry: range restrictions, logical checks
– Post-entry: clean-up queries
– Statistical Analysis: data trends

4 Ensuring Data Quality (cont.)
To ensure data quality, the data manager needs to understand:
– Goals of program
– Standards of operation
– Impact of intervention or program
– Relationships between variables
– Expected timing of events

5 Clean-up Queries: Missing Data
– Generate reports on the percentage of missing data for each item on the data collection forms.
– Highlight differences between programs or specific groups of patients in order to identify methods to minimize missing data.
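A missing-data report of this kind can be sketched in a few lines. This is only an illustration under stated assumptions: the field names (`clinic`, `weight_kg`, `cd4`) and the sample records are hypothetical, and a value of `None` or an empty string is assumed to mean "missing".

```python
from collections import defaultdict

def percent_missing(records, items, group_key=None):
    """Return {group: {item: percent missing}}.

    Assumption: None or '' means the item was not recorded.
    """
    totals = defaultdict(int)
    missing = defaultdict(lambda: defaultdict(int))
    for rec in records:
        group = rec.get(group_key, "all") if group_key else "all"
        totals[group] += 1
        for item in items:
            if rec.get(item) in (None, ""):
                missing[group][item] += 1
    return {
        group: {item: 100.0 * missing[group][item] / n for item in items}
        for group, n in totals.items()
    }

# Hypothetical records from two clinics
records = [
    {"clinic": "A", "weight_kg": 61.0, "cd4": None},
    {"clinic": "A", "weight_kg": None, "cd4": 350},
    {"clinic": "B", "weight_kg": 70.5, "cd4": 410},
    {"clinic": "B", "weight_kg": 58.2, "cd4": ""},
]
report = percent_missing(records, ["weight_kg", "cd4"], group_key="clinic")
for clinic, items in sorted(report.items()):
    print(clinic, items)
```

Grouping by clinic (or by program, or by patient group) is what makes differences visible: a form item that is missing far more often at one site points to a collection problem there rather than to the form itself.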

6 Clean-up Queries: Date Comparison
– Ensure that the date of birth precedes all other dates.
– Calculate age and verify that the date of birth makes sense.
– For patients who have died, ensure that the date of death follows all other dates.
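The date comparisons above might look like the following for a single patient. This is a minimal sketch: the function name, its arguments, and the plausible age range of 0 to 110 years are assumptions made for illustration, not part of the original module.

```python
from datetime import date

def check_patient_dates(birth, death, other_dates, as_of):
    """Return a list of problems found in one patient's dates.

    death may be None for patients who are alive.
    """
    problems = []
    for d in other_dates:
        if d < birth:
            problems.append(f"{d} precedes date of birth {birth}")
        if death is not None and d > death:
            problems.append(f"{d} follows date of death {death}")
    if death is not None and death < birth:
        problems.append("date of death precedes date of birth")
    # Age in completed years on the reference date
    age = as_of.year - birth.year - (
        (as_of.month, as_of.day) < (birth.month, birth.day)
    )
    if not 0 <= age <= 110:  # assumed plausible range
        problems.append(f"implausible age {age}")
    return problems

# Hypothetical patient: one visit is recorded after the date of death
problems = check_patient_dates(
    birth=date(1975, 3, 2),
    death=date(2009, 6, 1),
    other_dates=[date(2008, 1, 15), date(2009, 7, 10)],
    as_of=date(2010, 5, 20),
)
print(problems)
```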

7 Clean-up Queries: Date Comparison (cont.)
– Generate a clean-up list for observation dates that are after today’s date or, preferably, the date of data entry.
– Generate a similar list for observation dates that precede the date of inception of your program.
– Examine the interval between observation/visit dates to ensure that the expected time frame is reflected.
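A clean-up list for observation dates could be generated along these lines. The sketch assumes one data entry date per patient for simplicity, and the program start date and the 120-day maximum gap between visits are hypothetical placeholders to be replaced with the program's own values.

```python
from datetime import date

def flag_observation_dates(visits, entry_date, program_start, max_gap_days):
    """Return (date, reason) pairs for one patient's visit dates."""
    flags = []
    ordered = sorted(visits)
    for d in ordered:
        if d > entry_date:
            flags.append((d, "after date of data entry"))
        if d < program_start:
            flags.append((d, "before program inception"))
    # Compare each visit with the one before it
    for prev, curr in zip(ordered, ordered[1:]):
        gap = (curr - prev).days
        if gap > max_gap_days:
            flags.append((curr, f"{gap} days since previous visit"))
    return flags

# Hypothetical visit history
flags = flag_observation_dates(
    visits=[date(2009, 1, 10), date(2009, 2, 10), date(2011, 1, 1)],
    entry_date=date(2010, 5, 20),
    program_start=date(2002, 1, 1),  # placeholder inception date
    max_gap_days=120,                # placeholder expected interval
)
for d, reason in flags:
    print(d, reason)
```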

8 Clean-up Queries: Checks on Numeric Data
– Confirm all values are within the expected range.
– Investigate possible outliers by verifying against the source document, comparing with other values for the same subject, or cross-referencing with other variables, such as current illnesses in the case of an elevated lab result.
– Confirm that values make sense with respect to patient’s age, gender, disease status, etc.
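A range check can be as simple as a lookup table of limits. In the sketch below the variable names and the limits themselves are hypothetical placeholders; real limits should be agreed with clinicians and may depend on age, gender, and disease status.

```python
# Hypothetical expected ranges (placeholders, not clinical guidance)
EXPECTED_RANGES = {
    "systolic_bp": (60, 250),
    "hemoglobin_g_dl": (3, 20),
    "cd4_count": (0, 2000),
}

def out_of_range(record, ranges=EXPECTED_RANGES):
    """Return {variable: value} for recorded values outside the expected range.

    Missing values (None) are skipped; they belong in the missing-data report.
    """
    flagged = {}
    for name, (low, high) in ranges.items():
        value = record.get(name)
        if value is not None and not (low <= value <= high):
            flagged[name] = value
    return flagged

# A systolic pressure of 1200 is most likely a keying error for 120
print(out_of_range({"systolic_bp": 1200, "hemoglobin_g_dl": 12.5, "cd4_count": None}))
```

A flagged value is a prompt to check the source document, not proof of an error.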

9 Clean-up Queries: Checks on Adult Heights/Weights
– Calculate BMI from height and weight: BMI = weight (kg) / [height (m)]^2
– Most should be between 10 and 40.
– Flag unexpected weight fluctuations.
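The BMI check and a simple fluctuation check can be sketched as follows. The 10 to 40 limits come from the slide; the 10% visit-to-visit threshold for "unexpected" weight change is an assumption chosen only for illustration.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2

def flag_bmi(weight_kg, height_m, low=10, high=40):
    """True if the BMI falls outside the range most adults should be in."""
    return not (low <= bmi(weight_kg, height_m) <= high)

def weight_fluctuations(weights_kg, max_change_fraction=0.10):
    """Indexes of weights that differ from the previous weight by more than
    max_change_fraction (an assumed threshold)."""
    flagged = []
    for i in range(1, len(weights_kg)):
        prev, curr = weights_kg[i - 1], weights_kg[i]
        if abs(curr - prev) / prev > max_change_fraction:
            flagged.append(i)
    return flagged

print(round(bmi(70, 1.75), 1))               # ordinary adult values
print(flag_bmi(70, 175))                     # height keyed in cm instead of m
print(weight_fluctuations([60, 61, 16, 62])) # 16 is probably a transposed 61
```

An out-of-range BMI often signals a unit or keying error in either height or weight, so both values should be checked against the source document.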

10 Clean-up Queries: Checks on Pediatric Heights/Weights
– Calculate weight-for-age Z-scores using Epi Info NutStat software (http://www.cdc.gov/epiinfo/) or SAS software (http://www.cdc.gov/nccdphp/dnpao/growthcharts/resources/sas.htm).
– Review date of birth, visit date, age and weight for Z-scores less than -5 or greater than 5.
– Similar checks can be made with height-for-age and weight-for-height Z-scores.
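For readers who want to see what such tools compute, the sketch below uses the LMS formula that underlies the CDC growth charts. It is not a substitute for Epi Info or the SAS program: the L, M, and S parameters must be looked up in the published reference tables for the child's sex and age, and the values used in the example are made-up numbers chosen only to exercise the code.

```python
import math

def lms_z_score(x, l, m, s):
    """Z-score of measurement x given LMS reference parameters.

    l, m, s must come from a published reference table for the
    child's sex and age; they are not computed here.
    """
    if l == 0:
        return math.log(x / m) / s
    return ((x / m) ** l - 1) / (l * s)

def needs_review(z, limit=5.0):
    """True for Z-scores less than -limit or greater than limit."""
    return z < -limit or z > limit

# Made-up parameters, NOT real reference values
z = lms_z_score(x=12.0, l=1.0, m=10.0, s=0.1)
print(round(z, 2), needs_review(z))
```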

11 Clean-up Queries: Checks on Numeric Data (cont.)
– Review longitudinal data.
– If special missing values are coded, ensure that the codes do not overlap with valid data.
– For lab results, a qualifier such as "<" or ">" should be stored in a separate variable.
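Two of these checks lend themselves to a short sketch. It assumes that lab qualifiers are comparison signs such as "<" or ">" (for example a result reported as "<400"), and the missing-value codes and valid range in the example are hypothetical.

```python
import re

def overlapping_codes(missing_codes, low, high):
    """Special missing-value codes that fall inside the valid data range."""
    return [code for code in missing_codes if low <= code <= high]

# Optional qualifier followed by a number, e.g. "<400" or "1250"
_RESULT = re.compile(r"^\s*(<=|>=|<|>)?\s*(\d+(?:\.\d+)?)\s*$")

def split_qualifier(raw):
    """Split a raw lab result into (qualifier, numeric value).

    Returns ('', value) when there is no qualifier and
    (None, None) when the text cannot be parsed.
    """
    match = _RESULT.match(raw)
    if match is None:
        return (None, None)
    qualifier, number = match.groups()
    return (qualifier or "", float(number))

# 999 is a poor "missing" code for a variable whose valid range reaches 2000
print(overlapping_codes([999, -1], low=0, high=2000))
print(split_qualifier("<400"))
print(split_qualifier("1250"))
```

Keeping the qualifier in its own variable leaves the numeric column purely numeric, so range checks and summaries do not have to parse text.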

12 Clean-up Queries: Cross-Variable Checks
– Confirm that there is consistency between gender and other variables such as pregnancy.
– Look for contraindicated medication combinations.
– Look for data that may have been recorded under the wrong patient ID.
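Two of the cross-variable checks can be sketched as below. The record layout (`patient_id`, `gender`, `pregnant`, `birth_date`) is hypothetical, and treating a change in gender or date of birth across records as a sign of a wrong patient ID is one possible heuristic, not the only one.

```python
def gender_pregnancy_flags(records):
    """Patient IDs where a male patient is recorded as pregnant."""
    return [
        rec["patient_id"]
        for rec in records
        if rec.get("gender") == "M" and rec.get("pregnant")
    ]

def inconsistent_ids(records, stable_fields=("gender", "birth_date")):
    """Patient IDs whose supposedly fixed fields differ between records,
    which may mean a visit was entered under the wrong ID."""
    first_seen = {}
    suspect = set()
    for rec in records:
        key = tuple(rec.get(field) for field in stable_fields)
        if first_seen.setdefault(rec["patient_id"], key) != key:
            suspect.add(rec["patient_id"])
    return sorted(suspect)

# Hypothetical visit records
visits = [
    {"patient_id": 101, "gender": "F", "birth_date": "1982-04-11", "pregnant": True},
    {"patient_id": 102, "gender": "M", "birth_date": "1975-09-30", "pregnant": True},
    {"patient_id": 101, "gender": "M", "birth_date": "1982-04-11", "pregnant": False},
]
print(gender_pregnancy_flags(visits))
print(inconsistent_ids(visits))
```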
