1er Simposio Latinoamericano Data Quality Fundamentals Miguel Angel Granados Troncoso
Agenda
– Scenarios
– Definitions, Processes and Standards
– Data Quality Services (DQS)
– DQS Solutions
– Organizational Compliance
– Optimized Productivity
– Extend Any Data, Anywhere
– Fast Time to Solution
– Scalable Analytics & DW
– Credible, Consistent Data
– Peace of Mind
– Managed Self-Service BI
– Rapid Data Exploration
– Blazing-Fast Performance
– Required 9s & Protection
– Scale on Demand
Credible, Consistent Data
Companies with accurate data perform better¹
– Top 20% performers: 91% of master data complete & accurate; 1.2 hrs spent per employee each week searching for info
– Middle 50% performers: 68% of master data complete & accurate; 2.8 hrs spent per employee each week searching for info
– Bottom 30% performers: under 50% of master data complete & accurate; 6 hrs spent per employee each week searching for info
Single BI Semantic Model. Data Quality Services. Delivered with Master Data Services.
¹Source: “Turning Pain into Productivity with Master Data Management,” Aberdeen Group, Feb 2011
Why is Data Quality Important?
Common Data Quality Issues
Data Governance
Levels, from Strategic to Tactical:
– IT Governance
– Data Governance
– Data Management
– Data Quality
– Data Correctness
Data Management Data Standardization Data Management Master Data Management
Data Quality
Data quality consists of verifying whether data is suitable for its intended use in operations, decision making, and planning.
Domain Management Knowledge Discovery Discovery Value Management
Quality Control Efforts
– Know the context of the data
– Profile the required data
– Create and maintain quality standards
– Track data quality
Requirements for a Data Quality Solution
– Monitoring: tracking and monitoring the state of data quality activities and the quality of data.
– Profiling: analysis of the data source, providing insight into the quality of the data in order to identify data quality issues.
– Cleansing: amend, remove or enrich data that is incorrect or incomplete. This includes correction, standardization and enrichment.
– Matching: identifying, linking and removing duplications within or across sets of data.
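As a rough illustration of the profiling requirement, the sketch below computes per-field completeness and distinct-value counts over a small in-memory sample. It is plain Python, not the DQS profiler; the function name `profile`, the field names and the sample records are all assumptions made for this example.

```python
from collections import Counter


def profile(records, fields):
    """Return completeness, distinct count and most common value per field."""
    report = {}
    total = len(records)
    for field_name in fields:
        values = [r.get(field_name) for r in records]
        # Treat None and the empty string as missing (an assumption of this sketch).
        filled = [v for v in values if v not in (None, "")]
        report[field_name] = {
            "completeness": len(filled) / total if total else 0.0,
            "distinct": len(set(filled)),
            "top_value": Counter(filled).most_common(1)[0][0] if filled else None,
        }
    return report


# Hypothetical sample data, used only to exercise the function.
sample = [
    {"city": "Mexico City", "country": "MX"},
    {"city": "", "country": "MX"},
    {"city": "Bogota", "country": None},
    {"city": "Mexico City", "country": "MX"},
]
report = profile(sample, ["city", "country"])
```

A report like this is the kind of input a data steward could use to decide which fields need cleansing first.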
How to Manage Data Quality?
Data quality management entails the establishment and deployment of:
– Roles
– Responsibilities
– Policies
– Procedures
– Technology
Data Quality Standards
ISO 8000 Data Quality
– Principles
– Characteristics that define data quality
– Processes that ensure data quality
ISO
– Defines open technical dictionaries
– Applying dictionaries to master data
International Association for Information and Data Quality
Data Quality Services (DQS) is a Knowledge-Driven data quality solution, enabling IT Pros and data stewards to easily improve the quality of their data
DQS Solution Concepts
– Knowledge-Driven: based on a Data Quality Knowledge Base (DQKB) that is reusable for a variety of data quality improvements
– Knowledge Discovery: acquire additional knowledge through data samples and user feedback
– Open and Extendible: support use of user-generated knowledge and IP by 3rd-party reference data providers
– Easy to Use: compelling user experience designed for increased productivity
Data Quality Knowledge Base (DQKB)
Repository of knowledge about data:
– Domains define values and rules for each field
– Matching policies define rules for identifying duplicate records
Components: Domains, Composite Domains, Matching Policy
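One way to picture the knowledge base is as a container that holds domains, composite domains and a matching policy. The sketch below is a simplified in-memory model in Python, not the DQS storage format; the class names, attribute names and weights are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MatchingRule:
    field_name: str
    weight: float        # assumed: share of the total matching score
    exact: bool = False  # assumed: whether the field must match exactly


@dataclass
class KnowledgeBase:
    name: str
    domains: dict = field(default_factory=dict)            # field name -> set of known values
    composite_domains: dict = field(default_factory=dict)  # composite name -> member field names
    matching_policy: list = field(default_factory=list)    # list of MatchingRule

    def uncovered_fields(self, record):
        """Fields of a record for which the knowledge base has no domain yet."""
        return sorted(f for f in record if f not in self.domains)


# Hypothetical knowledge base for customer data.
kb = KnowledgeBase(name="Customers")
kb.domains["country"] = {"Mexico", "Colombia"}
kb.domains["city"] = {"Mexico City", "Bogota"}
kb.composite_domains["location"] = ["city", "country"]
kb.matching_policy.append(MatchingRule("city", weight=0.6))
kb.matching_policy.append(MatchingRule("country", weight=0.4, exact=True))

missing = kb.uncovered_fields(
    {"city": "Bogota", "country": "Colombia", "email": "a@b.co"}
)
```

Keeping domains and matching rules in one reusable object mirrors the idea that the same knowledge can serve several cleansing and matching projects.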
DQS Knowledge Sources
– Windows Azure Marketplace™ Data Market: cleanse and enrich data with Reference Data Services from DataMarket
– DQS Data Store: website that contains DQS knowledge available for downloading
– 3rd Party Reference Data Providers: open integration with external 3rd-party reference data providers
– Organization Data: create domains from your own data sources
– Out of the Box Knowledge: a set of data domains that come out of the box with DQS
What is a Domain?
– Domains are specific to a data field
– Domains contain the rules for the data
– Domains can be individual or composite
A domain holds: Domain Values, Reference Data, Rules and Relationships
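To make the idea of a domain concrete, the sketch below models a domain as a set of known values plus a list of rules, and checks a record against several domains at once. This is an illustrative Python model under assumed names (`Domain`, `check`, `check_record`), not how DQS implements domains.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Domain:
    name: str
    values: set = field(default_factory=set)   # known valid values (optional)
    rules: list = field(default_factory=list)  # (description, predicate) pairs

    def check(self, value):
        """Return the list of problems found; empty if the value is acceptable."""
        problems = []
        if self.values and value not in self.values:
            problems.append("not a known domain value")
        for description, predicate in self.rules:
            if not predicate(value):
                problems.append(description)
        return problems


def check_record(record, domains):
    """Check each field against its domain; return only fields with problems."""
    result = {}
    for domain in domains:
        problems = domain.check(record.get(domain.name, ""))
        if problems:
            result[domain.name] = problems
    return result


# Hypothetical domains: one driven by a rule, one by a value list.
postal = Domain(
    "postal_code",
    rules=[("must be 5 digits", lambda v: re.fullmatch(r"\d{5}", v) is not None)],
)
country = Domain("country", values={"Mexico", "Colombia"})

issues = check_record({"postal_code": "0610", "country": "Mexico"}, [postal, country])
```

Checking several single domains together over one record is a loose analogue of a composite domain.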
What is a Reference Data Service?
The Azure Marketplace hosts specialist data cleansing providers
– Set up an account
– Subscribe to a reference service
– Map your domain to the reference service
KB domains shown: Name (Family Name, First Name), Address
DQS Architecture Overview
DQS Clients
– DQS Client
– SSIS DQS Cleansing Component
– Other DQS Clients
– Future Clients: Excel, SharePoint, MDS…
DQS Server
– DQS Engine: Knowledge Discovery, Data Profiling, Exploration, Matching, Cleansing, Reference Data
– DQ Projects Store
– Common Knowledge Store
DQS Cloud Services
– Reference Data Services
– DataMarket - Categorized Reference Data
– DQS Store - KB, Domains
– 3rd Party Reference Data
Reference Data API (Browse, Set, Validate…)
Reference Data API (Browse, Get, Update…)
© 2010 Microsoft Corporation. Microsoft Materials - Confidential. All rights reserved.
DQS process
– Build: Knowledge Management
– Use: DQ Projects
– Knowledge Base
– Enterprise Data
– Reference Data
– Cloud Services
– Integrated Profiling
– Progress Notifications
– Status
Interactive Cleansing – DQS Project
– Analyzes the quality of source data
– Automatically corrects and enriches the data
– Manual approval/rejection of suggestions provided by the cleansing algorithm or reference data services
Batch Cleansing - Using SSIS
– Knowledge on the DQS server: Values/Rules, Reference Data Definition, Matching Policy
– Reference Data Services
– Output record statuses: Correct, Corrected, Suggested, Invalid, New
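The statuses named on this slide (Correct, Corrected, Suggested, Invalid, New) can be illustrated with a small classifier. The sketch below is plain Python using the standard library's `difflib`, not the DQS cleansing algorithm; the thresholds, the `cleanse` function and the sample values are assumptions chosen for the example.

```python
from difflib import SequenceMatcher

AUTO_CORRECT = 0.8  # assumed threshold: at or above this, correct automatically
SUGGEST = 0.6       # assumed threshold: at or above this, only suggest a value


def cleanse(value, domain_values, invalid_values=()):
    """Return (status, output value) for one input value."""
    if value in domain_values:
        return ("Correct", value)
    if value in invalid_values:
        return ("Invalid", value)
    best, best_score = None, 0.0
    for candidate in sorted(domain_values):
        score = SequenceMatcher(None, value.lower(), candidate.lower()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best_score >= AUTO_CORRECT:
        return ("Corrected", best)
    if best_score >= SUGGEST:
        return ("Suggested", best)
    # Nothing similar enough is known: report the value as new.
    return ("New", value)


# Hypothetical domain knowledge and inputs.
countries = {"Mexico", "Colombia", "Argentina"}
known_bad = {"N/A"}
results = [
    cleanse(v, countries, known_bad)
    for v in ["Mexico", "Mexco", "Kolumbia", "N/A", "Qatar"]
]
```

In an interactive project the Suggested and New rows are the ones a data steward would review; in a batch run they could be routed to a separate output for later approval.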
Matching – DQS Project
Why Match?
– Identify duplicates within the data source
– Create a consolidated view of data
DQS Matching
– Build a matching policy
– Matching training
– Create a matching project
– Choose survivors
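The matching steps can be sketched end to end: score record pairs with a weighted policy, group records whose score passes a threshold, and pick a survivor per group. This is a minimal Python illustration, not the DQS matching engine; the weights, the threshold, the survivorship rule (most filled-in fields wins) and the sample records are all assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations

POLICY = {"name": 0.7, "city": 0.3}  # assumed field weights, summing to 1
MIN_SCORE = 0.8                      # assumed minimum matching score


def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(r1, r2, policy=POLICY):
    """Weighted similarity of two records over the fields in the policy."""
    return sum(w * similarity(r1[f], r2[f]) for f, w in policy.items())


def find_clusters(records, min_score=MIN_SCORE):
    """Group record indexes that match, directly or transitively (union-find)."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if match_score(records[i], records[j]) >= min_score:
            parent[find(j)] = find(i)
    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return [c for c in clusters.values() if len(c) > 1]


def choose_survivor(records, cluster):
    """Assumed rule: keep the record with the most non-empty fields."""
    return max(cluster, key=lambda i: (sum(1 for v in records[i].values() if v), -i))


# Hypothetical customer records.
customers = [
    {"name": "Juan Perez", "city": "Mexico City", "phone": ""},
    {"name": "Juan Pérez", "city": "Mexico City", "phone": "555-0101"},
    {"name": "Ana Gomez", "city": "Bogota", "phone": "555-0199"},
]
clusters = find_clusters(customers)
survivors = [choose_survivor(customers, c) for c in clusters]
```

Comparing every pair is quadratic in the number of records, so a sketch like this only suits small samples; the point is the order of the steps, not the scale.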
Q&A
– Personal Blog
– PASS Mexico City Chapter
– SolidQ Journal
– Microsoft business-intelligence.aspx