1
Implementing Enterprise Information Programs with MDM-CDI & SOA
DAMA-NY 2008 Implementing Enterprise Information Programs with MDM-CDI & SOA Larry Dubov, Sr. Director, Sales Consulting & Architecture New York, NY May 15, 2008
2
Agenda
Definitions
Why MDM and EDM Now and Key Challenges
MDM and Data Hub Capabilities
Data Stewardship Framework and Information Quality
SOA and Data Services: Strengths and Weaknesses
Information Management Methodology
Lessons Learned and Accelerators
What are the next disruptive things in EDM and MDM?
3
What is Master Data Management (MDM)?
Master Data Management (MDM) is a framework of processes and technologies aimed at creating and maintaining an authoritative, reliable, sustainable, accurate, and secure data environment that represents a “single version of the truth,” an accepted system of record used both intra- and inter-enterprise across a diverse set of application systems, lines of business, and user communities.
Master data are those data that are foundational to business processes, are usually widely distributed, contribute directly to the success of an organization when well managed, and pose the most risk when not well managed.
There’s no standard definition of MDM or Master Data, and Ecosystems (aka Information Exchanges or Fusion Centers) are a new term to most, although we’ve been involved in several of these across three continents. Almost all MDM definitions include people, business processes that need to be “fixed” and managed, and technologies to implement data quality business rules and workflow/task management.
4
Customer Data Integration (CDI) is the “Entry Point” to MDM
CDI focus is on individual and organizational entities; MDM expands the problem to include new entities.
Entities shown: Customers, Prospects, Patients, People, Citizens, Employees, Vendors, Suppliers, Trading Partners, Products, Equipment, Financial Assets, Vessels, Containers, Weapons, Locations, Drugs, Vehicles.
Diagram labels: Party (CDI), Product (PIM).
The CDI space was first defined in early 2005, but the MDM market is already sized at more than $5 billion today, with predicted growth surpassing $10B in the coming years. By comparison, Forrester sized the BPMS market at $1.2B in 2005 with expected growth to $2.7B in 2009. Enterprise content management (ECM) was sized at $2.2B in 2005 with expected growth to $3.9B in 2008.
5
Key Challenges
Dimensions shown in the diagram: New Customer & Relationship Centric Business Processes; CDI Consuming Applications; Customer Identification, Correlation & Grouping; Service Oriented Architecture; Business & Operational Reporting; Exception Capture & Processing; Data Acquisition, Distribution & Synchronization (Batch & Real Time); Visibility, Security, Confidentiality, Compliance; Metadata Management; Data Governance & Standards; Information Quality; External Data Providers.
Very complex, multidimensional, and multi-disciplined, and can be risky. Difficult to sell data initiatives to the business and executive management. No single vendor provides a comprehensive solution. These factors mandate development of and reliance on sound models, open integration standards, and methodologies in building holistic solutions from multiple best-of-breed components.
-Problem complexity comes from its multidimensional nature. Some of the dimensions and components are shown in the picture: customer data integration to move to customer-centric processes, information quality, legacy systems, data visibility and security concerns, etc. Taking everything into account upfront is impossible; if you make the scope very narrow, it will never be an enterprise solution but rather another flavor of a silo.
-Difficult to sell to the business. EDM and MDM are oftentimes perceived as purely infrastructure initiatives (a cost center) with limited or unclear business benefit. Some examples, including spending on the roadmaps.
-No single vendor provides a solution for all enterprise data problems, due to the great variety of problems, industries, corporate cultures, platforms, and standards, along with continuously changing market dynamics and business requirements.
-It is not just about vendors. It is about a vision of a solution, the ability to assemble this solution from the best vendor components and products, and enabling the enterprise for flexibility.
6
Enterprise Customer Data Managed by Lines of Business
Enterprises are organized by Lines of Business and manage customer information in a product-centric model with overlapping customer domains.
Diagram: Business Lines 1 through 5, each with its own Business Line Products and Business Line Customers.
Key points here: MDM is defined through a single version of the truth. What is it? There are other types of master data, to be defined by each business.
7
So, What’s the MDM-CDI Focus?
The Need to Transition from Product/Account to Party…it’s a Big Deal Current State: Future State: Household Client Grouping Account 123: Product 1 E-Statement KYC Acct & Client Docs Approval Account 456: Product 2 E-Statement KYC Acct. & Client Docs Approval Account 789: Product 3 Paper Statement KYC Acct & Client Docs Approval Spouse Joe Mary E-Statement KYC Client Doc Approval Paper Statement KYC Client Doc Approval Owner Owner (Joint) Owner Beneficiary Owner Joe Organizations are missing the Party-level data, both from the structure perspective and from the business process perspective Mary Joe Mary Mary Account 123: Product 1 Acct. Attr. Only Account 456: Product 2 Acct. Attr. Only Account 789: Product 3 Acct. Attr. Only Derived Account Grouping Joe Mary 1 Mary 2
8
Drivers Business Area: Drivers:
Business Development, Sales & Marketing Cross sell/up sell to existing customers Effectiveness of marketing campaigns Recurring revenue from existing customers Retain “good” customers by reducing attrition rates Recognize “bad” customers Customer Service Account setup time Customer service time Customer intelligence and level of service Consolidated statements Risk, Privacy, Compliance & Control Risk Management Accurate Books & Records Compliance with AML & KYC Regulations Compliance with corporate standards and policies Regulatory fines and penalties Operational Efficiency Account setup costs Customer acquisition costs Administrative overhead of redundant data entry Operation costs: duplication, redundancies, transaction errors, data processing errors & exceptions Failed tactical initiatives Reduces costs of planned initiatives due to CDI This change to customer centricity occurs in all industry verticals. Horizontal trend Four groups of areas where MDM-CDI helps: Business Development and Marketing Campaigns: Credit Cards/student example – not only customer recognition but also relationships, marketing campaigns efficiency example, “spinners” – example, example of redundant data entry for FA, inability to implement privacy laws – how do you collect and store opt in / opt out information, risk adjusted return on investment, operational inefficiencies at account opening/contract creation and maintenance and how this information is propagated from contract creation to billing systems, consolidated billing statements, government program in the UK. Need for Risk Adjusted Returns.
9
MDM Is Adding Value When You…
… Purchase Software … Pick Up Your Prescription … Apply for a Loan … Check In & Earn Points … Identify Risks … File Insurance Claims … Register a Patient Some examples from our customers. Most of these started with CDI, and are only now beginning to discuss moving into Item-centric MDM, but the value for them isn’t apparent yet, so a “product-centric MDM” strategy is slower, but we’re seeing several of our customers begin down a Location MDM initiative as well as other non-customer data management. Real time Customer Data Integration for B2C and B2B marketing, sales enablement in high tech. Real time tracking of customers and prescriptions across 3500 stores for WalMart Saving millions in cleansing and integrating data – even after it had been cleansed by an external data provider (like an Experian, Harte Hanks, or the like) Joining together customers across nearly a dozen different hotel brands under the portfolio of one hospitality company Catching “bad guys” and sharing data across scores of agencies across jurisdictional, political, agency, and cultural boundaries Improving patient data across insurance and provider ecosystems
10
MDM and Data Hub Capabilities
11
Single Version of The Truth
Merge and Persist Composite View or INFORMATION J. Jones (Name) 35 West 15th Street (Address) Toledo, OH (Address 2) Sales INFORMATION James (First) 35 West 15th Street (Address) Toledo (City) Jones (Last) OH (State) M (Gender) 30 (Age) INFORMATION James Jones (Name) 35 W. 15th Street (Address) Toledo, OH (Address 2) You can see that James or Jim Jones appears to this company as three different people because there is different demographic data for him. People move, get married…their data changes Go through four bullets 1. non intrusive…no new unique identifiers, do not force standardization, no code at the source systems…why our implementations are 3 – 6 months. 2. different applications have different data needs…we provide the right data to the right application…again in real time. By linking all these data and creating the Golden record we are “cleaning” and deduping the data Reduce duplication rates to well under 1% in a typical installation Discovered that 23% of a credit card issuer’s customers shared an address with another cardholder, enabling company to dramatically decrease marketing and mailing costs 3. we then maintain these linkages so that every time Bill Conroy knocks on the door…we make sure that another duplicate never appears. Customer Support INFORMATION Jim Jones (Name) 35 West 15th St. (Address) Toledo, OH (Address 2) E.R.
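The composite ("golden") view described on this slide can be illustrated with a small sketch. This is not the product's linking logic: it assumes the three records have already been linked, and it applies one simple, hypothetical survivorship rule (first non-empty value in source-precedence order). The source names, the precedence order, and the assignment of the slide's sample records to systems are all assumptions made for illustration.

```python
# Hypothetical source names and precedence order; the slide does not say
# which system wins for which attribute.
SOURCE_PRECEDENCE = ["customer_support", "sales", "emergency_room"]


def composite_view(linked_records, precedence):
    """Build a composite view from already-linked source records.

    For each attribute, the first non-empty value found in precedence
    order is kept. The source records themselves are not modified.
    """
    view = {}
    for source in precedence:
        for attribute, value in linked_records.get(source, {}).items():
            if value and attribute not in view:
                view[attribute] = value
    return view


# Sample values taken from the slide; the mapping to systems is assumed.
linked = {
    "sales": {"name": "J. Jones", "address": "35 West 15th Street",
              "city": "Toledo", "state": "OH"},
    "customer_support": {"name": "Jim Jones", "address": "35 West 15th St.",
                         "city": "Toledo", "state": "OH"},
    "emergency_room": {"name": "James Jones", "address": "35 W. 15th Street",
                       "gender": "M", "age": 30},
}

golden = composite_view(linked, SOURCE_PRECEDENCE)
```

Because the view is derived at read time and the sources stay intact, a different application could be given a different precedence list, which matches the point that different applications have different data needs.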
12
Single Version of Truth: Commercial Customers
Provides accurate, real-time access to complete customer or entity data across disparate sources, systems and networks D&B Name Addr Cont. Phone ABC Incorporated 9146 E VIA DEL SOL NETOWN, CA Joe Smith Trusted System of Record Back Office ABC Inc. Will Jones Name Cont. Phone INFORMATION Name Addr Cont. Phone ABC Inc. 9146 VIA DEL SOL NETOWN, CA 45883 Joseph Smith Will Jones ABCC Incorp. Joseph Smithe Name Cont. Phone AIU AB&C 9146 VIA DEL SOL NETOWN, CA Name Addr Phn
13
Why Do We Need MDM? (DW is Only as Good as Its Dimensions)
Diagram: data warehouse dimensions (Product, Time, Customer).
Can we really ‘slice and dice?’ Traditional Deterministic ETL may not be sufficient… This is where Probabilistic MDM enabled by Data Hubs comes in.
-Let’s move on to another topic of this presentation: Why do we need MDM? Is it really something new or just a new buzzword?
-A friend of mine, a person with strong ETL experience, brought up an argument in our conversation. He said: MDM is defined through an ability to create and maintain a single version of truth for enterprise entities such as Customer, Supplier, Product, etc. We have been doing this for years for data warehouses, data marts, and CRM solutions. Then why is MDM new? What is the single most important and disruptive new thing about MDM?
-My answer was: “Modern probabilistic MDM technologies significantly enhance traditional deterministic ETL.” Traditional deterministic ETL is oftentimes not capable of providing high-quality dimensions for data warehouses and data marts.
-The value of data warehousing, the ability to slice and dice, is greatly impacted by the insufficiencies of deterministic ETL in building complex data warehousing dimensions, and relationships and hierarchies within dimensions. Indeed, it is not easy to slice and dice if your data warehousing dimensions are crooked.
-This is where probabilistic MDM comes in. Potentially every complex data warehousing dimension may need a data hub extension based on probabilistic technology, as shown in the next slide.
14
Star Schema Hubs
If your MDM solution is BI driven, align your MDM solution with complex DW dimensions.
Diagram labels: Customer Hub, Time, Branch Hub, Customer, Facts, Branch, Account, Product, Account Hub, Product Hub, Status.
For each complex dimension you may want to consider a data hub. This is a logical view and doesn’t mean that different products are required for different hubs. This picture emphasizes the notion of Analytical MDM. Initiatives frequently begin with this and then move deeper into more operational and collaborative MDM, where probabilistic technology helps real-time synchronization in the OLTP world, e.g. to support account opening and client on-boarding.
15
The Initiate MDS Solution in an Enterprise Architecture
Orders CRM Call Center Sales Profiles Web Self-Service Marketing Initiate Master Data Service™ Services Web Events & Alerts Transaction Support Security & Audit Trail Matching Accuracy Master Data Views APIs Messaging Performance Scalability Relationships Hierarchies Data Stewardship Batch Support Loads & Extracts Orchestrated Services
16
Implementation Styles
Batch Near Real-Time Real-Time Link Integration Style Consolidation Style Co-Exist Combine Registry/Slave Hybrid Transaction/Master Ownership Style
17
More on Matching and Linking
Step 1: Optimizes data for statistical comparisons. Normalizes & compacts data, creates derived data layer, source data remains intact. Phonetic equivalences, tokenization, nicknames, etc.
Step 2: Finds all the potential matches. Casts a wide net – all matches on current or historical attributes, prevents misses. Partial matches, reversals, anonymous values, etc.
Step 3: Scores accurately via probabilistic statistics. Compares attributes one-by-one and produces a weighted score (likelihood ratio) for each pair of records. Frequency weights specific to your business. Edit distance, proximity of match. Allows custom deterministic rules, e.g. false positive filters.
Step 4: Custom threshold settings. Single or dual threshold models. Link, don’t link, don’t know – “learns” from manual input. Manage cost/quality trade-offs. Manage the linkages, workflow review.
Score scale (diagram): Lowest possible score, Don’t link (should not be linked), Lower threshold, Manual review, Upper threshold, Link (should be linked), Highest possible score.
This slide provides a more detailed view of matching and linking. A lot of science has been involved here, including multiple PhDs, to build this. The goal is to have the algorithm think like a person when records are compared. Cases to handle: nicknames (Bill/William, Larry/Lawrence); misspelled values (missed letters, transposed letters, extra records); transposed first name and last name; anonymous values; validation for TIN, credit card number; name change / address change; uniqueness of values.
The process includes the following stages:
Optimization for statistical comparison (the engine creates comparison strings internally): case conversion, removing extra spaces, standardizing phone numbers, removing anonymous values. Standardization functions are available: generic, address functions (country specific and address component specific), date functions, name functions.
Finds all potential matches (bucketing: score only what makes sense for comparison). This provides scalability (linear vs. quadratic). One record can be in multiple buckets.
Compares and produces a weighted score.
Links.
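The four stages above can be sketched in a few lines. This is only an illustration of the general technique (normalize, bucket, score with per-attribute weights, classify against two thresholds) and not the vendor's algorithm. The nickname table, bucket keys, weights, and threshold values are hypothetical, and the scoring here uses exact comparison only (no edit distance or frequency weights).

```python
from dataclasses import dataclass
from itertools import combinations

# Hypothetical nickname table; a real engine would use a far larger one.
NICKNAMES = {"bill": "william", "larry": "lawrence", "jim": "james"}


@dataclass(frozen=True)
class Record:
    rec_id: str
    first: str
    last: str
    zip_code: str


def normalize(value):
    """Stage 1: case conversion, extra-space removal, nickname equivalence."""
    token = " ".join(value.lower().split())
    return NICKNAMES.get(token, token)


def bucket_keys(rec):
    """Stage 2: one record can land in several buckets (wide net)."""
    first, last = normalize(rec.first), normalize(rec.last)
    return {
        "last:" + last,
        "zip:" + normalize(rec.zip_code),
        # Order-independent key so transposed first/last names still meet.
        "names:" + "|".join(sorted((first, last))),
    }


def candidate_pairs(records):
    """Only records sharing a bucket are compared, not all n*(n-1)/2 pairs."""
    buckets = {}
    for rec in records:
        for key in bucket_keys(rec):
            buckets.setdefault(key, []).append(rec)
    pairs = set()
    for members in buckets.values():
        for a, b in combinations(members, 2):
            pairs.add(tuple(sorted((a.rec_id, b.rec_id))))
    return pairs


# Hypothetical (agreement, disagreement) weights per attribute.
WEIGHTS = {"first": (3.0, -2.0), "last": (4.0, -3.0), "zip_code": (2.0, -1.0)}


def score(a, b):
    """Stage 3: compare attributes one by one and sum the weights."""
    total = 0.0
    for attribute, (agree, disagree) in WEIGHTS.items():
        same = normalize(getattr(a, attribute)) == normalize(getattr(b, attribute))
        total += agree if same else disagree
    return total


LOWER_THRESHOLD, UPPER_THRESHOLD = 2.0, 7.0  # hypothetical dual thresholds


def classify(pair_score):
    """Stage 4: link, don't link, or send to manual review."""
    if pair_score >= UPPER_THRESHOLD:
        return "link"
    if pair_score < LOWER_THRESHOLD:
        return "don't link"
    return "manual review"


r1 = Record("1", "Bill", "Conroy", "43604")
r2 = Record("2", "William", "Conroy ", "43604")
r3 = Record("3", "Larry", "Dubov", "10001")
r4 = Record("4", "Bill", "Jones", "43604")
```

With these sample records, `r1` and `r2` score high enough to link, `r1` and `r4` fall between the thresholds and go to manual review, and `r3` shares no bucket with the others, so it is never scored at all.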
18
Hierarchy Management
The term hierarchy is used only for a simple hierarchy with one and only one root and only one parent for each node within one hierarchy.
Typically one hierarchy is selected as the Master and used as a foundation (e.g. D&B or custom).
There is a notion of source precedence / tree of truth.
High-performance match to build the hierarchy: 40MM records in 15 minutes.
One original system’s record (member) or single version of truth record (entity) can belong to multiple hierarchies (e.g. corporate for D&B and geography with territories, regions, etc.).
A data steward can edit the hierarchy manually (e.g. if there is knowledge of a merger). When the merger update later comes from the Master source, the data steward can reconcile the source merge with the node previously created manually.
Hierarchy query and navigation is done using various types of methods that allow navigation to the node, the node’s immediate children and the whole sub-tree below, or from the node up and across (to be checked).
The product can export a hierarchy (e.g. to build a DW dimension).
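A minimal sketch of the hierarchy rules above (exactly one root, exactly one parent per node within a hierarchy, navigation down to children and sub-tree and up to ancestors, and one entity belonging to several hierarchies). The class and method names are hypothetical, and the parent links and territory names in the sample data are assumed for illustration only.

```python
class Hierarchy:
    """Simple hierarchy: one root, one parent per node."""

    def __init__(self, root):
        self.root = root
        self.parent = {root: None}
        self.children = {root: []}

    def add(self, node, parent):
        if node in self.parent:
            raise ValueError(f"{node!r} already has a parent in this hierarchy")
        if parent not in self.parent:
            raise KeyError(f"unknown parent {parent!r}")
        self.parent[node] = parent
        self.children[node] = []
        self.children[parent].append(node)

    def immediate_children(self, node):
        return list(self.children[node])

    def subtree(self, node):
        """All descendants of node, in depth-first (pre-order) order."""
        result = []
        stack = list(reversed(self.children[node]))
        while stack:
            current = stack.pop()
            result.append(current)
            stack.extend(reversed(self.children[current]))
        return result

    def ancestors(self, node):
        """Path from the node's parent up to the root."""
        path = []
        current = self.parent[node]
        while current is not None:
            path.append(current)
            current = self.parent[current]
        return path


# Corporate hierarchy (parent links assumed for illustration).
corporate = Hierarchy("Harley-Davidson Inc")
corporate.add("Harley-Davidson Financial Services", "Harley-Davidson Inc")
corporate.add("Buell Motorcycle Company", "Harley-Davidson Inc")
corporate.add("Harley-Davidson Credit Corp", "Harley-Davidson Financial Services")

# The same entity can also sit in a separate geography hierarchy.
geography = Hierarchy("US")
geography.add("Midwest", "US")
geography.add("Buell Motorcycle Company", "Midwest")
```

Keeping each hierarchy as its own object is one way to let an entity have one parent per hierarchy while still belonging to several hierarchies.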
19
Hierarchy: Management & Services
Understand & Visualize Customer Relationships Resolve logical master from multiple internal or third party source hierarchies Rule based hierarchy & relationship creation and management Maintain individual & organizational hierarchies through web application to support active data stewardship Establish business & consumer hierarchies Hierarchy Source Data 1. Hierarchy is certain kind of relationship. Customer Data
20
Initial Hierarchy Harmonization: Original State of Customer Records
Customer’s Organization Records are Fragmented Utility of hierarchies housed in SAP & other apps is inconsistent Disclaimer: (Example Data Only) 80 90 30 20 40 50 60 10 70 Existing (SAP) Relationships Shipping Location Pricing Legend: ID: Source: Bill To: Name: Address: Phone: 10 SAP Harley-Davidson Inc 3700 W Juneau Ave Milwaukee, WI 20 Harley-Davidson Motor Co 40 Andy’s Harley-Davidson Business Highway 81 N Grand Forks, ND 58203 60 Shumate Harley Davidson 6815 E TRENT AVE Spokane Valley, WA 30 .COM Buell Motorcycle Co 2815 BUELL DR East Troy, WI 80 Buell Motorcycles 8272 Gateway Blvd E El Paso, TX
21
Initial Hierarchy Harmonization: DNB Reference Source
1 2 3 4 5 6 ID: Source: Parent: Name: Address: Phone: 1 DNB Harley-Davidson Inc 3700 W Juneau Ave Milwaukee, WI 2 Harley-Davidson Financial Services, Inc 150 S Wacker Dr Chicago, IL 6 Harley-Davidson Credit Corp 4150 Technology Way Carson City, NV 3 Buell Motorcycle Company 2815 Buell Dr East Troy, WI 4 Harley-Davidson Europe LTD 6000 Garsington Rd Oxford, Oxfordshire OX4 2DQ 5 Harley-Davidson Motor Company Inc ID: Source: Parent: Name: Address: Phone: 1 DNB Harley-Davidson Inc 3700 W Juneau Ave Milwaukee, WI 5 Harley-Davidson Motor Company Inc
22
Initial Hierarchy Harmonization: Target State
1 2 3 4 5 6 20 40 50 60 80 90 30 10 70 ID: Source: Parent: Name: Address: Phone: 3 DNB 1 Buell Motorcycle Company 2815 Buell Dr. East Troy, WI 30 SAP Buell Motorcycle Co. 2815 BUELL DR East Troy, WI ID: Source: Parent: Name: Address: Phone: 1 DNB Harley-Davidson Inc. 3700 W. Juneau Ave. Milwaukee, WI 10 .COM 3700 W. Juneau Ave. Milwaukee, WI ID: Source: Parent: Name: Address: Phone: 5 DNB 1 Harley-Davidson Motor Company Inc. 3700 W. Juneau Ave. Milwaukee, WI 20 SAP Harley-Davidson Motor Co. 3700 W. Juneau Ave. Milwaukee, WI
23
Resolve & Rationalize Hierarchies for Immediate Impact
Harley Davidson: Global Account Info: Customer Locations / Bus Units Current Yearly Purchasing $900K Added Potential $300K 1 10 2 4 6 2 3 4 5 20 70 30 20 70 ID – Pricing Code: 20 – KFDRR 70 – [missing] 6 80 90 40 50 60 Accurate Relationships Will Drive Revenue: DNB members that are not yet Customers Will be properly included and targeted in marketing campaigns and in sales activity Misaligned Pricing or Territories: Unassigned or incorrect track codes and rep assignments Improved Customer Satisfaction, Improved Sales Coverage, Improved Sales Operations Incomplete Customer Profiles: Matched & organized records required for accurate analytics Deliver complete customer relationships to Data Warehouse and Marketing Analytics apps
24
New Account Creation Scenarios
B C ID Source Parent Name Address Phone A New Andya Harley Davidson HWY 81 N Grand Forks, ND 58206 B Buell Motor Cycles 8272 Gateway Boulevard El Paso, TX 77907 C Schumate Harley Davidson 7001 E Trent Ave Spokane, WA 99212
25
Improve Account Creation Process & Data Quality
Duplicate prevention Pricing Code: KFFDE KFFDE Pricing alignment Territory assignment 1 A B C 10 2 3 4 5 30 20 Sales Territory: 6 840123 Duplicate: DO NOT ADD! 80 90 40 50 60 70 Assign Correct Track Code to New Account Assign Correct Territory to New Account ID: Source: Parent: Name: Address: Territory: 90 .COM 30 Shumate Harley Davidson 6815 E. TRENT AVE. Spokane Valley, WA 840123 C [new] Schumate Harley Davidson 7001 E. Trent Ave. Spokane, WA ID: Source: Parent: Name: Address: Phone: 40 SAP 20 Andy’s Harley-Davidson Business Highway 81 N. Grand Forks, ND A [new] Andya Harley Davidson HWY 81 N. Grand Forks, ND ID: Source: Parent: Name: Address: Track Code: 6 SAP 2 Buell Motorcycles 8272 Gateway Blvd. E. El Paso, TX N / A B [new] Buell Motor Cycles 8272 Gateway Boulevard El Paso, TX 840123 KFFDE
26
Relationship Management
A relationship is a much looser construct than a hierarchy. Relationships can be used to associate people with groups, or products with categories, etc. Relationships support one-to-many and many-to-many associations. Relationships can be symmetric or asymmetric. For each relationship type, its cardinality (one-to-many or many-to-many) is defined along with its symmetry (symmetric or asymmetric). When a relationship type is defined, the types of records that can be related are also defined.
Diagram labels: One-to-Many, Many-to-Many, Asymmetric, Symmetric.
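The relationship-type definition described above (record types, cardinality, symmetry) can be sketched as follows. The class, its fields, and the sample relationship names are hypothetical; records are represented as `(record_type, identifier)` tuples purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RelationshipType:
    name: str
    source_type: str      # record type allowed on the "from" side
    target_type: str      # record type allowed on the "to" side
    many_to_many: bool    # False means one-to-many (one source per target)
    symmetric: bool
    links: set = field(default_factory=set)

    def __post_init__(self):
        if self.symmetric and self.source_type != self.target_type:
            raise ValueError("a symmetric relationship needs a single record type")

    def relate(self, source, target):
        if (source[0], target[0]) != (self.source_type, self.target_type):
            raise TypeError(f"{self.name} cannot relate {source[0]} to {target[0]}")
        if not self.many_to_many:
            # One-to-many: a target may be attached to only one source.
            if any(t == target and s != source for s, t in self.links):
                raise ValueError(f"{target} already has a source in {self.name}")
        self.links.add((source, target))

    def related(self, record):
        found = {t for s, t in self.links if s == record}
        if self.symmetric:
            # Symmetric links can be read in either direction.
            found |= {s for s, t in self.links if t == record}
        return found


joe, mary = ("person", "Joe"), ("person", "Mary")
household = RelationshipType("shares_household_with", "person", "person",
                             many_to_many=True, symmetric=True)
household.relate(joe, mary)

categorizes = RelationshipType("categorizes", "category", "product",
                               many_to_many=False, symmetric=False)
categorizes.relate(("category", "Motorcycles"), ("product", "Product 1"))
```

Here `household.related(mary)` returns Joe even though the link was recorded from Joe to Mary, while the asymmetric `categorizes` relationship can only be read from category to product.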
27
Information Quality and Data Stewardship Framework
28
Approaches to Information Quality
“Upstream” at the point of entry of customer information Better validation Change in business process is likely required Change in applications, workflows and data flows “Downstream” Focus on ETL Includes data stewardship Less invasive – does not require changes in business processes Less effective – always on the flow of dirty data Combination of both “Upstream” and “Downstream” approaches required to accomplish best results
29
Customer Information File Data Enrichment Vendor
Moving Resolution of IQ Issues Closer to Point of Entry: Account Opening & Client On-boarding Account-centric: Customer-centric: Data Entry Data Entry Product 1: Product 1: Account 1: Account Attributes Customer Attributes Account 2: Account Attributes Customer Attributes Account 1: Account Attributes Customer Attributes Account 2: Account Attributes Customer Attributes Customer Information File Product 2: Product 2: Account 3: Account Attributes Customer Attributes Account 3: Account Attributes Customer Attributes Data Hub Householding System Product 3: Product 3: Account 4: Account Attributes Customer Attributes Account 5: Account Attributes Customer Attributes Data Enrichment Vendor Account 4: Account Attributes Customer Attributes Account 5: Account Attributes Customer Attributes
30
Enterprise Data Stewardship Framework
Performs ongoing data quality task resolution Data Stewards Data Governance Policies, standards, processes, roles, responsibilities, metrics, & controls Data Quality Improvement Loop Systems Technology Supports Customizable Data Quality Task Resolution Queues: Validity of Identifiers Updates overlaying identity Link Merge Conflicts between match & hierarchy / relationships associations Summary reports on data quality metrics Improves data entry validation Configures data quality task generation & reports MDM can’t be separated from Data Quality A relationship between DQ and MDM is two-fold On the one hand MDM can’t be implemented without certain level of data quality for matching attributes On the other hand matching and deduplication by their nature are concerns in the realm of data quality There have been a lot of discussions about data governance lately. Indeed data governance is very important. Data Governance defines data policies, standards and key data processing and access rules. Once a data governance organization is created, it should be able to execute these policies. There should be mechanisms and controls developed enabling the organization to enforce and control the policies and standards. Data Governance and Data Stewardship need a framework, procedures and supporting technologies for continual data quality improvement If information quality is not continually monitored and maintained, it will deteriorate. The deterioration rate depends on specifics of the industry and a given company’s business processes and technology maturity, but a generally estimated deterioration rate of 2 percent per month is quite typical. Initiate Inspector supports a number of data stewardship functions including: validity of key identifiers, and conditions like potential linkage, merge, etc. 
The current strategy is to provide a comprehensive data governance framework with its abilities to review and resolve data quality issues and monitor the data quality progress overall Information Technology
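To make the 2 percent per month deterioration estimate concrete, here is a small worked calculation. It assumes the rate compounds monthly on whatever share of records is still accurate, which is an assumption of this sketch and not something the slide states; the function name is hypothetical.

```python
def remaining_quality(monthly_rate, months, initial=1.0):
    """Share of records still accurate after `months`, assuming the
    deterioration rate compounds monthly (an assumption of this sketch)."""
    return initial * (1.0 - monthly_rate) ** months


after_one_year = remaining_quality(0.02, 12)
after_three_years = remaining_quality(0.02, 36)
```

Under this assumption, roughly 78 percent of records would still be accurate after a year without stewardship and a little under half after three years, which is why the improvement loop has to run continually.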
31
Initiate™ Inspector: High Level Capabilities
Only to illustrate the idea, not a presentation. Even more I don’t believe in one tool. Baseline tool and reusable components/services to be able to change the functionality as needed Search and inspect – reactive mode. Let’s edit records for Cynthia Johnson We can add new records if needed Resolve – proactive activity Potential Duplicates Potential Linkage Relationships
32
Initiate™ Inspector: Primary Tasks
33
Initiate™ Inspector: “Potential Linkage”
34
Initiate™ Inspector: “Potential Linkage” – Select and Action (Page 1 of 2)
35
Service Oriented Architecture and Data Services: Strengths and Weaknesses
36
Service Oriented Architecture
Software design and implementation architecture characterized by the following:
Logical View – abstraction of loosely coupled, reusable business needs and functions
Message Orientation – exchange between provider and requester
Description Orientation – machine-processable metadata to support interoperability
Granularity – combination of coarse-grained and fine-grained
Network Orientation – typically used over network
Platform Neutral
Software architecture in which software components are exposed as services on the network, which can be reused as necessary for different applications. When implemented with web services, offers a standard foundation for functionality reuse and data access within a federated enterprise and services provided externally.
What is in common between SOA initiatives and EDM or MDM? Data Services.
Service Oriented Architecture is a software design and implementation architecture of loosely coupled, coarse-grained, and reusable services. Each of these three qualifiers has a specific meaning:
Loosely coupled – if we defined services for use in a certain combination and sequence, we should be able to use the same services in other call sequences or call combinations. Loosely coupled is of course the opposite of monolithic.
Coarse-grained – services should be defined at a high enough level, typically higher than a single database table. Fine-grained services are important too, since they may provide flexibility.
Reusable – this term is pretty straightforward. Service design should be optimized for reuse.
Services are exposed on the network and can be reused by multiple applications. Web Services is one of the most popular standards. It offers a standard foundation for functionality, data access and data management reuse within a federated enterprise and with services provided externally.
It is quite common for modern enterprises to have SOA initiatives, committees and councils. It is much less common to see a SOA initiative work closely with Information Management initiatives. This constitutes a significant problem. SOA initiatives and information management initiatives must cooperate closely. Indeed, 60% of all services are data services.
37
Data Services: Format Agnostic & Source Agnostic Data Management
1. Executive, Manager, Analyst or End User interfaces with the system by requesting a data source agnostic Business/Data Service. Examples: Report Premium Revenue, Create New Account, List Reports, Find Client, Create New Report.
2. Data Services Metadata translates the coarse-grained business service call into an orchestrated set of data location aware service calls. The Metadata processes parameters of the request, and the security access eligibility and data visibility parameters of the requestor.
Data Services (diagram): Metadata, Capture Exception, Calculate, Transform Data, CRUD Record.
Steps 3 and 4 (diagram): Get Best Source Data and Process It; Execute and Return to Requestor.
Data Sources.
Note: Metadata insulates user applications and business services from data sources. Thus the data sources can be changed or replaced seamlessly with no changes in user interfaces and user experience.
-At a high level data services do two primary things. First, Data Services make interaction with data format agnostic: no matter how the data is stored, whether it is a relational database like Oracle, Teradata, SQL Server, Sybase or Informix, MS Access, Excel, XML, a flat file or a packaged application, the web services standard makes data access data format agnostic.
-The second point is more profound but it is also more difficult to execute. Data services make interaction with data systems and application agnostic.
-With this architecture, data services metadata is responsible for locating the right records. Even if an underlying system is replaced with a new system, IT’s responsibility and its SLA with the business community demand that the existing services and their characteristics be preserved.
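The insulation role of data services metadata can be sketched as a small routing registry. This is an illustration under stated assumptions, not a description of any product: the class name, method names, role names, and the dictionary-backed "sources" are all hypothetical stand-ins for real systems.

```python
from typing import Callable, Dict, List, Optional, Set

# A source adapter hides how and where a record is stored.
SourceAdapter = Callable[[str], Optional[dict]]


class DataServiceMetadata:
    """Maps a business service name to the sources that can answer it."""

    def __init__(self) -> None:
        self._routes: Dict[str, List[SourceAdapter]] = {}
        self._roles: Dict[str, Set[str]] = {}

    def register(self, service: str, sources: List[SourceAdapter],
                 roles: Set[str]) -> None:
        self._routes[service] = list(sources)
        self._roles[service] = set(roles)

    def call(self, service: str, role: str, key: str) -> Optional[dict]:
        # Check the requestor's eligibility before touching any source.
        if role not in self._roles[service]:
            raise PermissionError(f"{role} may not call {service}")
        # Try sources in registered order and return the first hit.
        for source in self._routes[service]:
            record = source(key)
            if record is not None:
                return record
        return None


def dict_source(table: dict) -> SourceAdapter:
    """Hypothetical adapter over an in-memory table."""
    return lambda key: table.get(key)


legacy_system = {"C-1": {"name": "ABC Inc."}}
replacement_system = {"C-1": {"name": "ABC Inc."}, "C-2": {"name": "AB&C"}}

metadata = DataServiceMetadata()
metadata.register("find_client", [dict_source(legacy_system)], {"analyst"})
before = metadata.call("find_client", "analyst", "C-1")

# The underlying system is replaced; the service call itself does not change.
metadata.register("find_client", [dict_source(replacement_system)], {"analyst"})
after = metadata.call("find_client", "analyst", "C-1")
```

The caller only ever names the service (`find_client`), so swapping the source behind it leaves the calling code and its result unchanged.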
38
Data Hub: SOA Architecture Viewpoint
Service Consumers – Business Applications External (Exposed) Services Layer Individual Identification & Recognition Privacy Preference Capture & Notification Identification of Associations, Roles & Relationships Customer Grouping Management Compliance Notification & Reporting Customer Information Maintenance Customer Insight Reporting Internal Services Layer Key Services Third-Party Data Interface Service Data Synchronization & Queue Management Data Archival & Versioning Coordination & Orchestration Visibility, Entitlements, Privacy & Security Rules / Workflow Administration Metadata Management Transaction Logging & Auditing Content Management & Caching Event / Notification Management Error Management Data Hub acts as a services platform that supports two major groups of services: internal, infrastructure-type services that maintain Data Hub data integrity and enable necessary functionality; and external, published services. The latter category of services maps well to the business functions that can leverage CDI Data Hub. These services are often considered as business services, and the Data Hub exposes these external business services for consumptions by the users and applications. Data Provider Service Interface Data Provider Service Interface Data Provider Service Interface Legacy Data Store CDI Customer Hub Other Data Store
39
Reference Architecture
ILLUSTRATIVE Business Processes Layer Contact Mgmt. Campaign Mgmt. Process Mgmt. Orchestration Relationship Mgmt. Document Mgmt. Hub Data Management Layer Client/Suspect Identification Profile Mgmt. Party Mgmt. Grouping Mgmt. Enrichment & Sustaining Hub Data Rules Layer Identity Matching Aggregation & Split Rules Synchronization Rules Visibility Rules Transformation Rules Rules Capture & Mgmt. Hub Data Quality Layer GUID Mgmt. Address Standardization Data Quality Mgmt. Transformation & Lineage Reporting Hub Systems Services Layer Security Visibility Services Orchestration Transaction & State Mgmt. Persistence Synchronization Legacy Connectivity Data Sources
40
Pros & Cons of SOA: When to Use or Not to Use SOA
Pros: Reduced data redundancy. Business, data governance and data stewards can define SLAs via services. Data services provide a level of abstraction: no need to work at the data and data model levels. Standardized interaction within the enterprise and with external and vendor-provided services. Increased development productivity and agility to support evolving requirements.
Cons: Performance problems if multiple pieces of information are to be joined. If implemented with web services, solutions do not support transactional integrity for synchronous processes; compensating transactions are required. SOA initiatives don’t meet expectations if not supported by a data strategy. Use of data services requires strong governance and a new culture for the data governors, data stewards, and testers.
Having service-oriented capabilities and a data services framework is almost a must for a modern enterprise. These capabilities allow the enterprise to reduce data redundancy: we don’t have to copy data to a new location but can instead use services pointed at the right data sources. Business, data governance and data stewards can define SLAs via services. Data services provide the required level of abstraction, so data governance and data stewards do not need to work at the data and data model levels. Services provide standardized interaction within the enterprise and with external and vendor-provided services. One of the strongest arguments for data services is increased development productivity and agility to support evolving requirements.
On the left we listed considerations restricting the use of services. Performance problems can be significant, especially if multiple pieces of information are to be joined. If implemented with web services, solutions do not support transactional integrity for synchronous processes, which means that compensating transactions are required. Use of data services requires strong governance and educated users; otherwise, as soon as you provide services, they can be misused. This requires a new SOA culture in the enterprise. SOA initiatives typically do not meet expectations if not supported by a sound data strategy. When implemented properly, SOA and data services provide significant benefits for MDM and EDM.
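The point about compensating transactions can be illustrated with a minimal Python sketch. It is an illustration only, not part of the original presentation: the two "services" are hypothetical in-memory dictionaries, and the helper name run_with_compensation is an assumption. Each step carries an action and the action that undoes it; when a later step fails, the steps already completed are undone in reverse order.

```python
from typing import Callable, List, Tuple

# Each step pairs an action with the compensating action that undoes it.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, undo completed steps in reverse order.

    Returns a log of what happened so the caller can inspect the outcome.
    """
    log: List[str] = []
    completed: List[Step] = []
    for step in steps:
        name, action, _compensate = step
        try:
            action()
            log.append(f"done:{name}")
            completed.append(step)
        except Exception as exc:
            log.append(f"failed:{name}:{exc}")
            # No shared transaction exists, so undo earlier work explicitly.
            for done_name, _action, compensate in reversed(completed):
                compensate()
                log.append(f"compensated:{done_name}")
            break
    return log


# Hypothetical in-memory stand-ins for two independent services.
hub_records = {}
crm_records = {}


def create_in_hub() -> None:
    hub_records["C-1"] = "Jane Doe"


def delete_from_hub() -> None:
    hub_records.pop("C-1", None)


def create_in_crm() -> None:
    raise RuntimeError("CRM service unavailable")  # simulated failure


def delete_from_crm() -> None:
    crm_records.pop("C-1", None)


result = run_with_compensation([
    ("hub_create", create_in_hub, delete_from_hub),
    ("crm_create", create_in_crm, delete_from_crm),
])
```

After the run, the hub record created by the first step has been removed again, which is the behavior a database rollback would otherwise have provided.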
41
Testing Data Hub (Not Just Data But Also Services)
Testing SOAP messages Testing WSDL files and using them for test plan generation Web service consumer and producer emulation Testing the publish, find, and bind capabilities of a SOA Testing the asynchronous capabilities of Web services Testing dynamic run-time capabilities of Web Services Web services orchestration testing Web service versioning testing Data Hub
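As a rough sketch of what "testing SOAP messages" and "consumer and producer emulation" can look like, the Python example below builds a SOAP 1.1 request, passes it to an emulated producer, and checks the response. The service namespace urn:example:datahub, the GetCustomer operation, and all function names are hypothetical; the fault structure is simplified compared with a real service.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
HUB_NS = "urn:example:datahub"  # hypothetical Data Hub service namespace


def build_request(customer_id: str) -> bytes:
    """Emulate a service consumer: build a GetCustomer request envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{HUB_NS}}}GetCustomer")
    ET.SubElement(request, f"{{{HUB_NS}}}CustomerId").text = customer_id
    return ET.tostring(envelope)


def emulate_producer(request_xml: bytes) -> bytes:
    """Emulate a service producer: echo the id back, or return a fault."""
    root = ET.fromstring(request_xml)
    path = f"{{{SOAP_NS}}}Body/{{{HUB_NS}}}GetCustomer/{{{HUB_NS}}}CustomerId"
    customer_id = root.findtext(path)

    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    if customer_id is None:
        # Simplified fault: a real service would use a qualified fault code.
        fault = ET.SubElement(body, f"{{{SOAP_NS}}}Fault")
        ET.SubElement(fault, "faultcode").text = "Client"
        ET.SubElement(fault, "faultstring").text = "CustomerId missing"
    else:
        response = ET.SubElement(body, f"{{{HUB_NS}}}GetCustomerResponse")
        ET.SubElement(response, f"{{{HUB_NS}}}CustomerId").text = customer_id
    return ET.tostring(envelope)


def is_fault(response_xml: bytes) -> bool:
    """Return True when the response body carries a SOAP fault."""
    root = ET.fromstring(response_xml)
    return root.find(f"{{{SOAP_NS}}}Body/{{{SOAP_NS}}}Fault") is not None


good_response = emulate_producer(build_request("C-1"))
empty_request = f'<Envelope xmlns="{SOAP_NS}"><Body/></Envelope>'.encode()
bad_response = emulate_producer(empty_request)
```

The same pattern extends to versioning tests: keep one request builder per service version and run each against the producer under test.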
42
Information Management Methodology
43
Importance of Information Management Methodology
Implementation of enterprise information management projects (DW, ODS, MDM, CDI, etc.) requires a well-structured methodology. Methodologies used for data-intensive projects are different from traditional application development methodologies.
44
Methodology – Need and Overview
Mike2.0 (Method for an Integrated Knowledge Environment) is an open source methodology for Enterprise Information Management. Developed by BearingPoint. Available to the Open Source community since December 2006. Transition to the Open Source “Creative Commons” license completed in May 2007. Over 2000 online pages and growing. Contains Phases, Activities and Tasks. Open Source Mike2.0 allows the global community to use this methodology right now for their Information Development initiatives. Organizations and individuals can sign up to become contributing members of Mike2.0. A brief overview of Mike2.0 includes concepts such as: Phases, Activities, Tasks, Usage Model. The complexity and global nature of EDM is adequately addressed by Open Source Mike2.0. If Internet connectivity is available, we will give a brief overview of Mike2.0 online.
45
Mike2.0 – Usage Model Details
The MIKE2.0 Usage Model provides a guide in determining which phases, activities and tasks from the overall methodology process are used for different categories of Enterprise Information Management engagements. The IM engagements in Mike2.0 are categorized as follows: Business Intelligence Information Asset Management Access, Search and Content Delivery Enterprise Content Management Enterprise Data Management Information Strategy, Architecture and Governance Composite Core Solution Offerings MDM-CDI is under Enterprise Data Management
46
Mike2.0 SAFE Architecture
For each Mike2.0 Activity and task there is a description, deliverables, resources and other useful information SAFE (Strategic Architecture for the Federated Enterprise) provides the technology solution framework for MIKE2.0.
47
Lessons Learned and Accelerators
48
The Three Dimensional Socialization Roadmap
Ownership: Demonstrated commitment to the change and accountability. Buy-in: Agreement with the concepts and ideas & expressed support. Understanding: Internalizing the concepts and ideas and grasping the implications of the change. Awareness: Becoming cognizant and developing a sense of appreciation for the change.
[Diagram: three axes. Level of Involvement: Awareness, Understanding, Buy-in, Ownership. Stakeholders: Security, Front Office, Back Office, Senior Management, Technology & Infrastructure, Legacy Systems. Lifecycle Phases/Releases: Planning, Development, Testing, Training.]
80% of failed projects fail due to socialization issues and not due to technology. Socialization is shown here as a three-dimensional concept: level of ownership, stakeholders, and lifecycle phases and releases. Helps program managers build a communications plan.
49
Helps Program Managers Build Project Plan
Typical Implementation Work Streams: Organizing for Success & Breaking the Problem Down. Master Entity Identification; Entity groups & relationships; Data governance, standards, quality, & compliance; Data architecture; Metadata management & administrative applications; Initial data load; Inbound data processing (batch & real-time); Outbound data processing (batch & real-time); Changes to legacy systems & applications; Visibility & security; Exception processing; Infrastructure; Data Hub applications; Reporting requirements of a stratified user community; Testing; Release management; Deployment; Training. It is much easier to discuss, define and plan MDM-CDI when the problem is broken down into more manageable areas and specialty domains.
50
Complexity vs. Manageability
[Chart labels: Plan a Release Here; Critical Point.] Don’t try to see everything through: it is too complex and unmanageable. Don’t put too much emphasis on the Current State; this will save time. The initiative should be transparent to your stakeholders, who want to know how the initiative will help them achieve their goals. Helps Program Managers Define Phases & Releases.
51
Relevant Relationships in the Data Data, Systems, Processes & People
[Chart: Business Value & ROI (Low to High Potential End State) versus Time in Months (CDI Start, 6, 12, 18, 24). Two curves: Start w/ Resolve (Fastest ROI, Initial Phase ROI) and Start w/ Master. Stages: Resolve Relevant Relationships in the Data; Synchronize Data, Systems, Processes & People; Master Your Data.]
The slide compares two scenarios: blue, starting with the registry hub, and red, starting with the master hub. With the registry-first approach it is typical to have first business results within 4 to 6 months. If business wants hub master as the target state, in months. If you try to go straight to the master hub solution (start with master), business value is obtained much more slowly and risk is much higher. Developing markets may be an exception.
52
Implementation Continuum
Pure Registry Mastered Customer Process Management Customer Transaction Management Manage business process associated with customer data/ transaction management Customer Data Synchronization Customer Data Access Transactional applications built on top of customer definition across sources Transference of record ownership – Hub owned and maintained Real-time/ Operational Batch/ Analytical Provide bi-directional update between sources of customer information through messaging, APIs and other integration methods Cross Reference Management Establish and maintain a trusted source for analysis Provide people with on demand search Create linkages amongst all records Prepare data for new systems
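The registry end of this continuum ("Cross Reference Management", "Create linkages amongst all records") can be pictured with a minimal Python sketch. It is an assumption-laden illustration rather than any vendor's design: the class name, method names, and the sample system names and identifiers are all hypothetical. The registry stores only links between source-system keys and a hub identifier, while the source systems keep ownership of the records.

```python
from typing import Dict, List, Optional, Tuple

SourceKey = Tuple[str, str]  # (source system name, local record id)


class CrossReferenceRegistry:
    """Keeps linkages between source-system records and hub identifiers."""

    def __init__(self) -> None:
        self._hub_id_by_source_key: Dict[SourceKey, str] = {}
        self._source_keys_by_hub_id: Dict[str, List[SourceKey]] = {}

    def link(self, hub_id: str, system: str, local_id: str) -> None:
        """Link a source record to a hub id, replacing any earlier link."""
        key = (system, local_id)
        previous = self._hub_id_by_source_key.get(key)
        if previous == hub_id:
            return  # already linked, nothing to do
        if previous is not None:
            self._source_keys_by_hub_id[previous].remove(key)
        self._hub_id_by_source_key[key] = hub_id
        self._source_keys_by_hub_id.setdefault(hub_id, []).append(key)

    def hub_id_for(self, system: str, local_id: str) -> Optional[str]:
        """Return the hub id for a source record, or None if unlinked."""
        return self._hub_id_by_source_key.get((system, local_id))

    def sources_for(self, hub_id: str) -> List[SourceKey]:
        """Return every source record linked to the given hub id."""
        return list(self._source_keys_by_hub_id.get(hub_id, []))


registry = CrossReferenceRegistry()
registry.link("H1", "CRM", "17")
registry.link("H1", "BILLING", "A-9")
```

Moving toward the mastered end of the continuum would mean the hub also stores and owns the attribute values, not just these links.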
53
Develop Repeatable Initiative On-boarding Processes & Templates
[Diagram: Enterprise Information Program (Global Geography) with work streams Data Modeling, DW Development, CDI-MDM, Data Profiling & Data Quality, Data Services Framework, Data Governance & Stewardship. Program Initiation: Business Case & Value Proposition, Business Requirements, Target State Solution, Detailed Roadmap, Data Governance, Standards, Data Quality, Architectural Principles. Program Planning & Definition, followed by On-boarding Initiative 1, On-boarding Initiative 2, On-boarding Initiative 3.]
In the first year and first implementation phase the number of legacy systems integrated in scope of MDM-CDI is limited (typically 1-3). How to accelerate on-boarding of new systems in the subsequent phases given that it is not unusual that systems can be in scope of MDM-CDI integration? A well-defined set of system on-boarding standards and procedures determines common rules that each legacy system should comply with to be integrated into the evolving MDM-CDI solution. Enables a repeatable on-boarding process and sustainable, accelerated solution growth in terms of the number of systems and LOBs. Preserves integrity and consistency of the MDM-CDI solution. Improved data governance. Enables highly sustainable pace of
54
Develop Repeatable MDM Systems on-boarding Processes and Templates
In the first year and first implementation phase the number of legacy systems integrated in scope of MDM-CDI is limited (typically 2-3). How to accelerate on-boarding of new systems in the subsequent phases given that it is not unusual that systems can be in scope of MDM-CDI integration? A well-defined set of system on-boarding standards and procedures determines common rules that each legacy system should comply with to be integrated into the evolving MDM-CDI solution. Enables a repeatable on-boarding process and sustainable, accelerated solution growth in terms of the number of systems and LOBs. Preserves integrity and consistency of the MDM-CDI solution. Improved data governance. Enables highly sustainable pace of
55
Two Schools of Thought on Hub Data Model
Hub with Out-of-the-box Data Model
Pros: Seems attractive to have the “right” data model out-of-the-box. The product has some pre-built coarse-grain business transactions.
Cons: How flexible is it to support on-going changes? Overhead of having multiple entities and attributes that are never used by your specific solution.
Data Model Agnostic Hub Product
Pros: Flexible to accommodate any data model and its changes. Can generate fine-grain services on top of any data model.
Cons: Development work required to build coarse-grain services to support composite transactions. Possible performance and maintenance impact due to additional metadata lookups.
56
Hub Implementation: Buy vs. Build vs. Data Enrichment Partner
The traditional “buy or build” question is typically resolved in favor of “buy”. An additional consideration is the use of an External Data Enrichment vendor: can we outsource the primary function of the CDI hub and do customer matching externally? Use of an External Data Enrichment partner has its own pros and cons.
Pros: Higher match accuracy that is based on the Knowledge Base (US NCOA and other libraries). Ability to recognize new customers and prospects. Additional data from the Knowledge Base (“data enrichment”).
Cons: Need to share customer data with an external vendor. Capabilities and Knowledge Base quality depend on the country: domestic is better than international. Additional cost.
57
Focus on Data Mapping
Data Mapping is an activity that “maps” the legacy system attributes to the new customer-centric model and vice versa. This activity is performed by business analysts. The produced data maps are used by ETL and EAI developers. The mapping process is time-consuming, can cause numerous errors and can be on the critical path of development. A data mapping vendor product can help accelerate delivery: drag-and-drop interface; open source mapping metadata; ability to integrate the mapping metadata with ETL, EAI and EII tools and share the metadata rules; ability to reverse the transformation rules when possible.
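To make the idea of mapping metadata with reversible transformation rules concrete, here is a small Python sketch. Everything in it is hypothetical (the legacy attribute names CUST_NM and SEX_CD, the target attribute names, and the code table); it only shows that a rule can carry an optional reverse transformation, and that some rules, such as trimming whitespace, cannot be reversed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class MappingRule:
    source_field: str                               # attribute in the legacy system
    target_field: str                               # attribute in the customer-centric model
    forward: Callable[[str], str]                   # legacy -> hub transformation
    reverse: Optional[Callable[[str], str]] = None  # hub -> legacy, when one exists


def apply_forward(rules: List[MappingRule], record: Dict[str, str]) -> Dict[str, str]:
    """Map a legacy record to the customer-centric model."""
    return {
        rule.target_field: rule.forward(record[rule.source_field])
        for rule in rules
        if rule.source_field in record
    }


def apply_reverse(rules: List[MappingRule], record: Dict[str, str]) -> Dict[str, str]:
    """Map a hub record back to legacy attributes, for reversible rules only."""
    legacy: Dict[str, str] = {}
    for rule in rules:
        if rule.reverse is not None and rule.target_field in record:
            legacy[rule.source_field] = rule.reverse(record[rule.target_field])
    return legacy


# Hypothetical code table shared by the forward and reverse rule.
GENDER_CODES = {"1": "M", "2": "F"}
GENDER_CODES_REVERSED = {value: key for key, value in GENDER_CODES.items()}

RULES = [
    # Trimming discards whitespace, so this rule has no reverse.
    MappingRule("CUST_NM", "full_name", str.strip),
    MappingRule("SEX_CD", "gender",
                GENDER_CODES.__getitem__, GENDER_CODES_REVERSED.__getitem__),
]

hub_record = apply_forward(RULES, {"CUST_NM": "  Jane Doe ", "SEX_CD": "2"})
legacy_record = apply_reverse(RULES, hub_record)
```

Because the rules are plain data, the same list could in principle be shared between ETL and EAI code paths, which is the benefit the slide attributes to shared mapping metadata.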
58
Creation and Protection of Test Data
Sensitive customer data must be anonymized (obfuscated, cloaked) to disguise it from unauthorized personnel in test and development environments. Some anonymization techniques are as follows: Masking Data; Substitution; Shuffling Records; Number Variance; Gibberish Generation; Encryption / Decryption. Key challenges in using data anonymization techniques include: the ability to preserve logic potentially embedded in the data, to ensure that application logic continues to function; and the need to provide a consistent transformation outcome for the same data.
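The two challenges named above (preserving format so application logic keeps working, and producing the same output for the same input) can be sketched in Python with a keyed hash. This is one possible substitution technique under stated assumptions, not a recommendation from the presentation: the key value and the function name anonymize_digits are hypothetical, and different inputs can occasionally map to the same output.

```python
import hashlib
import hmac

SECRET_KEY = b"test-environment-only-key"  # hypothetical key; keep real keys out of code


def anonymize_digits(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace every digit with a key-derived digit, keeping all other characters.

    The same input and key always give the same output (consistency), and
    separators and length are preserved (format-dependent logic still works).
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    result = []
    digit_index = 0
    for char in value:
        if char.isdigit():
            # Derive the replacement digit from the next digest byte.
            result.append(str(digest[digit_index % len(digest)] % 10))
            digit_index += 1
        else:
            result.append(char)
    return "".join(result)


masked_once = anonymize_digits("123-45-6789")
masked_again = anonymize_digits("123-45-6789")
```

Because the output depends only on the input and the key, the same identifier anonymized in two different tables still joins correctly in the test environment.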
59
Some Key Reasons for Project Failure to Avoid at Project Kick-off
Lack of executive support and budgetary commitment Lack of cooperation and/or coordination between business and technology Lack of consuming applications – “if we build, they will come…” Lack of end-user adoption Underestimation of legacy impact Insufficient socialization throughout the enterprise to include all stakeholders at the right level Underestimation of the need for layered architecture provided by SOA Gaps in data governance, stewardship, and information quality strategy Miscalculated staffing needs
60
What are the Next Disruptive Things in MDM and EDM?
Match and link evolution: from entities to relationships. Integration of MDM with Business Rules Engines and Work Flows. Data Stewardship Framework. Metadata Integration. Externalization of data visibility and security.
So far, probabilistic Match and Link technologies have been primarily used to establish that two or more records represent the same entity (the same individual, supplier, product, company, etc.). In some scenarios it is important to dynamically identify relationships between records based on common attribute values. These relationships can be established by probabilistic technologies even if we deal with fuzzy matches. Fuzzy matches can’t be established by applying SQL or other deterministic methods. Examples of systemically inferred relationships include dynamic definition of households and identification of individuals in positions of power at a business your organization may deal with.
Integration of MDM tools with Business Rules Engines will open new capabilities in applications like client on-boarding, account opening and maintenance, policy issuance, etc. Comprehensive Data Stewardship solutions also require Business Rules Engine components that allow the organization to configure and customize the data conditions that trigger sending records to data stewardship queues for resolution.
As we discussed today, probabilistic MDM technologies should be better integrated with ETL technologies and major messaging vendor products to enable higher quality BI solutions.
Metadata integration is a hot topic. With a variety of products, each of them metadata driven, the question is how to reuse the metadata and develop sound standards and practices for code translation semantics to support interoperability.
Finally, from the enterprise perspective it is not a satisfactory practice that each vendor product has its own independent data security and visibility structures. Data security and visibility should typically be externalized from the tools and supported in a centralized fashion, for instance by LDAP. Vendor solutions should be able to consume the rules of data visibility and security owned externally.
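The household example mentioned in these notes can be sketched in Python. Note the substitution: the sketch uses the standard library's difflib string similarity as a simple stand-in for a probabilistic matching engine, which is an assumption made for illustration. The record layout, the 0.8 threshold, and the function names are hypothetical as well.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two normalized strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def infer_households(records: List[Dict[str, str]],
                     threshold: float = 0.8) -> List[Tuple[str, str]]:
    """Link records that share a surname and have closely matching addresses."""
    links = []
    for left, right in combinations(records, 2):
        same_surname = left["surname"].lower() == right["surname"].lower()
        if same_surname and similarity(left["address"], right["address"]) >= threshold:
            links.append((left["id"], right["id"]))
    return links


records = [
    {"id": "R1", "surname": "Smith", "address": "12 Main Street, Apt 4"},
    {"id": "R2", "surname": "Smith", "address": "12 Main St., Apt 4"},
    {"id": "R3", "surname": "Smith", "address": "98 Ocean Avenue"},
]

household_links = infer_households(records)
```

An exact equality test on the address would miss the R1/R2 pair because of the "Street" versus "St." variation, which is the kind of gap fuzzy matching is meant to close.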
61
Q&A