Quality Data – An Improbable Dream?
Elizabeth Vannan
Centre for Education Information
Victoria, BC, Canada
"Information quality is a journey, not a destination." – Larry P. English
Agenda
– Data Definitions and Standards Project
– What Is Quality Data?
– The Cost of Poor-Quality Data
– Improving Data Quality: Our Process
– Questions?
BC Higher Education
– Canada’s westernmost province
– Population: Million
– Land area: 366,795 sq miles
– Publicly funded post-secondary system:
  – 22 colleges
  – 6 universities
CEISS
The Centre for Education Information is an independent organization that provides research and technology services to improve the performance of the BC education system.
CEISS
– Implement and manage administrative systems
– Perform custom surveys, research and analysis
– Facilitate development and implementation of data standards
– Negotiate and manage province-wide software contracts (Oracle, SCT Banner, Datatel)
DDEF Project: Data Definitions and Standards Project, initiated in 1995
The Problem
– Better data about the BC higher education sector needed for decision-making
– No infrastructure in place to facilitate the collection of data electronically
DDEF Project
The Solution
– Create data standards for all higher education information (student, HR, finance)
– Develop a data warehouse, based on the standards, for reporting
– Implement a common technical infrastructure at all higher education institutions
DDEF Project
Project Goals
– Improve the quantity and QUALITY of data available
– Reduce the number of data and reporting requests
– Develop a business information system to support the management and evaluation of the BC post-secondary system
How Are We Doing?
– 16 institutions have implemented or are implementing
– Institutions are using data warehouses for internal reporting
– Data requests have been reduced
– The Ministry is using the data
Why Focus on Data Quality?
Poor data quality in our data warehouse impacts:
– Confidence
– Decision making
– Funding
Quality Data Are…
The Four Attributes of Data Quality
Quality Data Are… Accurate
– Free from errors
– Representative
Quality Data Are… Complete
– All values are present
Quality Data Are… Timely
– Recorded immediately
– Available when required
Quality Data Are… Flexible
– Data definitions are understood
– Can be used for multiple purposes
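Two of the four attributes, completeness and timeliness, lend themselves directly to measurement. The sketch below is illustrative only: the record layout and the field names `pen`, `birth_date`, `event_date` and `entered_date` are hypothetical, and measuring timeliness as the lag between a business event and its data entry is one assumed reading of "recorded immediately".

```python
from datetime import date

# Hypothetical student records; field names are illustrative, not a real schema.
records = [
    {"pen": "123456789", "birth_date": date(1985, 3, 2),
     "event_date": date(2004, 9, 1), "entered_date": date(2004, 9, 1)},
    {"pen": None, "birth_date": date(1983, 7, 15),
     "event_date": date(2004, 9, 1), "entered_date": date(2004, 9, 20)},
    {"pen": "987654321", "birth_date": None,
     "event_date": date(2004, 9, 2), "entered_date": date(2004, 9, 3)},
]

def completeness(rows, field):
    """Complete: share of rows where `field` has a value (not None, not blank)."""
    if not rows:
        return 0.0
    present = sum(1 for r in rows if r.get(field) not in (None, ""))
    return present / len(rows)

def timeliness(rows, max_lag_days):
    """Timely (assumed definition): share of rows entered within `max_lag_days` of the event."""
    if not rows:
        return 0.0
    on_time = sum(1 for r in rows
                  if (r["entered_date"] - r["event_date"]).days <= max_lag_days)
    return on_time / len(rows)

for field in ("pen", "birth_date"):
    print(f"{field}: {completeness(records, field):.0%} complete")
print(f"entered within 7 days: {timeliness(records, 7):.0%}")
```

Expressing each attribute as a pass rate between 0 and 1 makes it easy to compare against a threshold later.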
Quality Data…
– Don’t have to be perfect
– Must be good enough to fill the business need, at a price you’re willing to pay
Our Challenge: Defining Quality Criteria for Higher Education Data
Cost of Poor-Quality Data: Business Process Costs
– Incorrect registrations
– Inaccurate tuition billings
– Payroll errors
Cost of Poor-Quality Data: Rework
– Re-collect data
– Correct errors
– Data verification
Cost of Poor-Quality Data: Missed Opportunities
– Substandard customer service
– Poor decision making
– Loss of reputation
Improving Data Quality
Four steps toward improved data quality:
– Business Process Review
– Data Quality Assessment
– Data Cleansing
– Business Practice Change
Business Process Review
– When, where and how is data collected?
– Where is data stored?
– Who creates data?
– Who uses data?
– What outputs are required?
– What quality checks already exist?
Business Process Review
Involve all stakeholders!
– For student data, we involve:
  – Executive
  – Registrar’s office
  – IT department
  – Institutional research
Business Process Review: Results
– Understanding of business practices
– Identification of data creators, custodians and users
– Preliminary quality metrics
– Problem business practices
Data Quality Assessment
– Establish metrics
– Apply metrics to data
– Review results
Establish Metrics
For each element, determine quality criteria:
– Acceptable range of values
– Acceptable syntax
– Comparison to known values
– Business rules
– Thresholds
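One way to make these criteria concrete is to record each one as data: the element it applies to, the kind of criterion, a pass/fail check and a threshold. This is a minimal Python sketch under stated assumptions. `QualityRule`, the program code set and the 15 to 99 age range are hypothetical. Only the 80% PEN threshold echoes Test 2 of this presentation.

```python
import re
from dataclasses import dataclass
from typing import Callable

@dataclass
class QualityRule:
    """One quality criterion for one data element (hypothetical structure)."""
    element: str
    kind: str                       # range, syntax, known value, business rule
    check: Callable[[dict], bool]   # returns True when a record passes
    threshold: float = 1.0          # minimum share of records that must pass

# Assumed code set, for illustration only.
KNOWN_PROGRAMS = {"ARTS", "BUSN", "NURS"}

RULES = [
    # Acceptable range (assumed 15 to 99).
    QualityRule("age", "range", lambda r: r.get("age") is not None and 15 <= r["age"] <= 99),
    # Acceptable syntax: exactly nine ASCII digits.
    QualityRule("pen", "syntax",
                lambda r: re.fullmatch(r"[0-9]{9}", r.get("pen") or "") is not None,
                threshold=0.80),
    # Comparison to known values.
    QualityRule("program", "known value", lambda r: r.get("program") in KNOWN_PROGRAMS),
    # Business rule (hypothetical): credits earned cannot exceed credits attempted.
    QualityRule("credits", "business rule",
                lambda r: r.get("earned", 0) <= r.get("attempted", 0)),
]

def evaluate(rows, rule):
    """Return (pass rate, whether the rule's threshold is met)."""
    rate = sum(1 for r in rows if rule.check(r)) / len(rows) if rows else 0.0
    return rate, rate >= rule.threshold

students = [
    {"pen": "123456789", "age": 19, "program": "ARTS", "earned": 12, "attempted": 15},
    {"pen": "12345", "age": 210, "program": "XXXX", "earned": 18, "attempted": 15},
]
for rule in RULES:
    rate, ok = evaluate(students, rule)
    print(f"{rule.element:<8} {rule.kind:<14} {rate:.0%} {'PASS' if ok else 'FAIL'}")
```

Keeping the rules as data rather than scattering them through reports means the same list can be rerun at every review.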
Quality Metrics (slide content not reproduced)
Applying Metrics
– Collect known information for comparison
– Develop queries to test each of your validation criteria
  – We use Oracle Discoverer, but other tools exist (e.g., MS Access, SQL)
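The presenters built their validation queries in Oracle Discoverer. As a tool-neutral illustration of a "comparison to known values" query, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `program_code` and `enrolment` tables are hypothetical. A `LEFT JOIN` against the reference table, filtered to rows with no match, lists every record whose code is missing or unknown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE program_code (code TEXT PRIMARY KEY);          -- known values
CREATE TABLE enrolment (student_id INTEGER, program TEXT);  -- data under test
INSERT INTO program_code VALUES ('ARTS'), ('BUSN'), ('NURS');
INSERT INTO enrolment VALUES (1, 'ARTS'), (2, 'BUSN'), (3, 'NRSG'), (4, NULL);
""")

# Enrolments whose program code is missing or absent from the reference code set.
UNKNOWN_CODES_SQL = """
SELECT e.student_id, e.program
FROM enrolment AS e
LEFT JOIN program_code AS p ON p.code = e.program
WHERE p.code IS NULL
ORDER BY e.student_id
"""
print(conn.execute(UNKNOWN_CODES_SQL).fetchall())  # [(3, 'NRSG'), (4, None)]
```

The same query shape works in most SQL tools; only the connection setup is SQLite-specific.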
Applying Metrics: Test 1
– The PEN (Personal Education Number) must be exactly 9 digits long: no non-digit characters and no shorter values are acceptable
Test 1 Results
– Two student records contain invalid PENs
Test 1 Results
– Invalid PENs: data entry error?
– Specific students can be identified for data cleansing
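Test 1 is a syntax check. A minimal Python version, assuming PENs are stored as strings, accepts exactly nine ASCII digits and returns the offending records keyed by student ID, which is what allows specific students to be targeted for cleansing. The student IDs and values are hypothetical.

```python
import re

# Exactly nine ASCII digits; [0-9] avoids matching non-ASCII Unicode digits.
PEN_PATTERN = re.compile(r"[0-9]{9}")

def is_valid_pen(value):
    """Test 1: a PEN must be exactly 9 digits; letters and shorter values fail."""
    return isinstance(value, str) and PEN_PATTERN.fullmatch(value) is not None

students = {"S001": "123456789", "S002": "12345678", "S003": "1234S6789", "S004": "987654321"}
invalid = {sid: pen for sid, pen in students.items() if not is_valid_pen(pen)}
print(invalid)  # {'S002': '12345678', 'S003': '1234S6789'}
```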
Applying Metrics: Test 2
– At least 80% of student records must have a valid PEN
Test 2 Results
– This institution meets the quality threshold
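Test 2 turns the per-record check into a threshold on the whole file. The sketch below uses hypothetical data and repeats the Test 1 syntax check so it runs on its own; it computes the share of records with a valid PEN and compares it with the 80% threshold.

```python
import re

def is_valid_pen(value):
    """Test 1 syntax check: exactly nine ASCII digits."""
    return isinstance(value, str) and re.fullmatch(r"[0-9]{9}", value) is not None

def share_valid(pens, is_valid):
    """Fraction of records whose PEN passes the syntax check."""
    return sum(1 for p in pens if is_valid(p)) / len(pens) if pens else 0.0

def meets_threshold(pens, is_valid, threshold=0.80):
    """Test 2: at least `threshold` of student records must have a valid PEN."""
    return share_valid(pens, is_valid) >= threshold

pens = ["123456789"] * 17 + ["12345678", None, "ABC"]  # 17 of 20 valid
print(f"{share_valid(pens, is_valid_pen):.0%}", meets_threshold(pens, is_valid_pen))  # 85% True
```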
Applying Metrics: Test 3
– No duplicate PENs
Test 3 Results
– This institution has a BIG problem!
– Can we see more details?
Test 3 Results
– Additional information reveals data loading problems
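Test 3 can be expressed by grouping records on PEN. Returning the student record IDs behind each duplicate, rather than a bare count, gives the kind of detail needed to trace a problem such as a faulty data load. Record IDs and values are hypothetical. Records without a PEN are skipped because they fail completeness, not uniqueness.

```python
from collections import defaultdict

def duplicate_pens(records):
    """Test 3: map each PEN used by more than one student record to those record IDs."""
    by_pen = defaultdict(list)
    for student_id, pen in records:
        if pen:  # missing PENs are a completeness issue, not a duplicate
            by_pen[pen].append(student_id)
    return {pen: ids for pen, ids in by_pen.items() if len(ids) > 1}

records = [
    (101, "123456789"),
    (102, "987654321"),
    (103, "123456789"),  # same PEN as 101
    (104, None),
    (105, "123456789"),
]
print(duplicate_pens(records))  # {'123456789': [101, 103, 105]}
```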
Reviewing Results
– A systematic approach is needed
– Develop a strategy for data cleansing
– Identify the source of data problems
– Deal with Disparate Data Shock!
Reviewing Results
– Quality review checklist (not reproduced)
Data Cleansing
– Location: where should data be cleansed?
  – Administrative system?
  – Staging area?
– Who: who is responsible for cleansing?
– Scope: which data will be cleansed?
Typical Data Cleansing
– Correcting data entry errors
– Removing or correcting nonsensical dates
– Deleting “garbage” records
– Combining or deleting duplicates
– Updating and applying code sets
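Two of these cleansing tasks, handling nonsensical dates and applying an updated code set, are sketched below in Python. The plausibility window for birth dates and the old-to-new code mapping are assumptions for illustration. The sketch blanks an implausible date for follow-up instead of guessing a correction, and returns a change log so the cleansing can be audited.

```python
from datetime import date

# Assumed plausibility window for birth dates; adjust to local business rules.
EARLIEST_BIRTH = date(1900, 1, 1)

# Hypothetical mapping from retired local codes to the current standard code set.
CODE_MAP = {"NRSG": "NURS", "BUS": "BUSN"}

def clean_record(rec, as_of):
    """Return a cleansed copy of `rec` plus a list of the changes made (for audit)."""
    out, changes = dict(rec), []
    bd = out.get("birth_date")
    if bd is not None and not (EARLIEST_BIRTH <= bd <= as_of):
        out["birth_date"] = None  # nonsensical date: blank it for follow-up rather than guess
        changes.append(f"birth_date {bd} removed")
    code = out.get("program")
    if code in CODE_MAP:
        out["program"] = CODE_MAP[code]  # apply the updated code set
        changes.append(f"program {code} -> {CODE_MAP[code]}")
    return out, changes

rec = {"id": 7, "birth_date": date(2085, 4, 1), "program": "NRSG"}
cleaned, log = clean_record(rec, as_of=date(2004, 6, 1))
print(cleaned)  # {'id': 7, 'birth_date': None, 'program': 'NURS'}
print(log)      # ['birth_date 2085-04-01 removed', 'program NRSG -> NURS']
```

Whether to cleanse in the administrative system or in a staging area is a separate decision; the function itself works the same in either place.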
Business Practice Change
Two components:
– Implementing changes to improve data quality
– Adopting an ongoing data quality review process
Changing business practices is a challenge: get stakeholder support.
Business Practice Change
– Education
– Centralizing responsibility for codes
– Consolidating data collection
– Implementing validation routines
– Changing business processes
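Validation routines are most effective at the point of entry, where a clerk can correct a value immediately. The function below is a hypothetical sketch, not part of any named administrative system. It checks a registration form against a few of the criteria discussed earlier (PEN syntax, a required birth date that is not in the future, a required program code) and returns messages instead of raising, so every problem can be shown at once.

```python
import re
from datetime import date

def validate_registration(form, today):
    """Entry-time validation: return a list of messages; an empty list means the form may be saved."""
    errors = []
    pen = (form.get("pen") or "").strip()  # tolerate stray spaces typed by the clerk
    if not re.fullmatch(r"[0-9]{9}", pen):
        errors.append("PEN must be exactly 9 digits.")
    bd = form.get("birth_date")
    if bd is None:
        errors.append("Birth date is required.")
    elif bd > today:
        errors.append("Birth date cannot be in the future.")
    if not form.get("program"):
        errors.append("Program code is required.")
    return errors

messages = validate_registration({"pen": "12345678", "birth_date": date(2030, 1, 1)},
                                 today=date(2004, 6, 1))
print(messages)
```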
Quality Review Process
– Review data regularly
– Make someone responsible
– Establish procedures for correcting data problems
– Communicate quality improvements
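Regular reviews are easier to communicate when each scheduled run is compared with the previous one. Here is a minimal sketch, assuming metric results are stored as pass rates keyed by metric name. The metric names and quarterly figures are invented for illustration and are not BC results.

```python
def compare_runs(previous, current):
    """Report the change in each metric between two scheduled review runs."""
    lines = []
    for metric in sorted(current):
        before = previous.get(metric)
        after = current[metric]
        if before is None:
            lines.append(f"{metric}: {after:.0%} (new metric)")
        else:
            delta = (after - before) * 100  # change in percentage points
            lines.append(f"{metric}: {before:.0%} -> {after:.0%} ({delta:+.0f} pts)")
    return lines

# Hypothetical results from two quarterly runs.
q1 = {"pen_valid": 0.72, "birth_date_complete": 0.90}
q2 = {"pen_valid": 0.84, "birth_date_complete": 0.91, "program_known": 0.97}
for line in compare_runs(q1, q2):
    print(line)
```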
Some Changes in BC
– Creation of a Data Manager position, responsible for code sets and data quality
– Regular education for registration clerks and other data creators
– Established relationships between data creators and users
– Re-engineered administrative systems
Improvements to BC Data
Improved data quality and quantity:
– Nonsensical dates almost eliminated
– Completeness of key elements improved (from 50% to 80–90%)
– Data now being collected for CE in a standard format
Final Thoughts…
Quality Data Are Probable if you are willing to…
– Take a critical look at your existing data
– Implement changes to how you collect and manage data
– Invest the time to educate and communicate with data users and creators
– Make data quality improvement an ongoing process
Recommended Reading
– Brackett, Michael H. Data Resource Quality: Turning Bad Habits into Good Practices. New York: Addison-Wesley, 2000.
– English, Larry P. Improving Data Warehouse and Business Information Quality. New York: John Wiley and Sons, 1999.
– Redman, Thomas C. Data Quality for the Information Age. Boston: Artech House, 1996.
Thank You!
Presentation Available At or