2 Interoperable Information Backbone
Enterprise-wide data abstraction layer for applications
Integrated views of data from multiple sources
–Relational databases, applications, files
Reusable Data Services for data consistency
Metadata-driven data management and integration
Complements other data integration tools (ETL, EAI, quality, etc.)
[Diagram labels: Applications; MetaMatrix Enterprise Data Service Layer; Data Sources]
3 Data Services
A type of Web Service
Does all of the work to transform any data in any format to a W3C-compliant service
–Implements all of the logic to effect the transformation
–Provides access to data sources, regardless of source API or technology
Does not implement application logic
Decouples the data from the application while making the data discoverable and accessible
4 Model-Based Approach Maximizes Re-use
Data Abstraction Without Coding
[Diagram labels]
Information Consumers: Custom Apps; Web Services, Business Processes; Packaged Apps; Reporting, Analytics; EAI, Data warehouses
Exposed Information Services (contract): ODBC, JDBC, SOAP
Reusable Integrated Business Objects
Enterprise Information Sources (EIS): xml, databases, warehouses, spreadsheets, services, geo-spatial, rich media, …
5 Data Model, Metamodel, Meta Object Facility (MOF)
6 MetaMatrix MetaBase Modeler
Model disparate information sources
–Relational DBs
–Content Management Systems
–Files
–Services
–Applications
Uses and retains domain-specific modeling terminology
–Relational models have Tables, Foreign Keys, Columns, etc.
–UML models have Packages, Classes, Attributes, etc.
7 MetaMatrix MetaBase Modeler
Define reusable data services/business objects
Transformations defined with:
–Selects
–Joins
–Criteria
–Unions
–Functions
–User-defined functions
Perform schema and semantic matching, data type conversion
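To make the idea of a transformation-defined data service concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. It is not MetaMatrix syntax. The table names and the unified `location` view are hypothetical; the column names are borrowed from the deck's semantic mediation slide. The view combines a select, a union, and a data type conversion.

```python
import sqlite3

# Two hypothetical "sources" that describe the same kind of thing
# with different column names and different data types.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE depots (SITENUM TEXT, Depot_Number TEXT);
    CREATE TABLE facilities (Facility_ID INTEGER, Location_Type TEXT);
    INSERT INTO depots VALUES ('101', 'D-7');
    INSERT INTO facilities VALUES (202, 'warehouse');

    -- The "data service": one virtual table defined purely by a transformation.
    CREATE VIEW location AS
        SELECT CAST(SITENUM AS INTEGER) AS location_id,  -- data type conversion
               'depot' AS location_type
        FROM depots
        UNION ALL
        SELECT Facility_ID, Location_Type
        FROM facilities;
""")

# Consumers query the view and never see the underlying tables.
rows = conn.execute(
    "SELECT location_id, location_type FROM location ORDER BY location_id"
).fetchall()
print(rows)
```

The consumer sees a single `location` table; which source each row came from, and how its identifier was originally typed, stays inside the view definition.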
8 Semantic Mediation: The Problem
[Diagram labels]
Multiple Internal/External Information Sources; Data Sources: Authoritative, Redundant, Overlapping
Transformations (T); Virtual XML Document
Aggregate Data Services: Relational or XML
Application-specific Access via ODBC, JDBC, or SOAP APIs: Web Services, Portal Applications, Business Intelligence Applications
Enterprise-wide or COI-driven Data Model: Data Catalog/Dictionary, Logical Data Model
Rationalization, Harmonization, and Semantic Mediation Layer
Column names shown: bldg_id, SITENUM, Facility_ID, Location_ID; bldg_type, Depot_Number, Location_Type
9 Building Enterprise Semantic Model(s)
[Diagram labels]
J-1 Manpower / Personnel; J-2 Intelligence; J-3 Operations; J-4 Logistics (GCSS); J-5 Plans & Policy; J-6 C4CS; J-7 Operational Plans; J-8 Force Structure
Multiple Internal/External Information Sources; Data Sources: Authoritative, Redundant, Overlapping
Transformations (T); access via ODBC/JDBC, JDBC, SOAP
Web Services, Portal Applications, Business Intelligence Applications
Enterprise-wide or COI-driven Data Models; Data Catalogs
Rationalization, Harmonization
10 Biggest Challenge in Creating Data Services? Semantics!!!
Structural differences are straightforward
Differing definitions among data sources
Differing vocabularies among COIs
Established, emerging, and evolving data standards
–C2IEDM, JC3IEDM, GJXDM, NIEM, GFM, many more
Not addressed by ETL, EAI, SOA
11 A Previously Intractable Problem
TWPDES has core entities
NIEM has 100,000+!
Even a limited program with a dozen data sources could yield tens of thousands of potential mappings
Humans cannot address this without help
Indeed, it has stopped many data integration/reconciliation programs in their tracks.
Automated Semantic Matching
13 DISCLAIMER Semantic matching can't really be done automatically yet! Requires intelligence to understand the context and semantics. So use computers to do most of the work but then have the user confirm or check the result.
14 The Matching Problem
Given two symbols, calculate a measure of the relationship between them:
amount, quantity
Doesn't seem so hard…
15 The Matching Problem
Given two symbols, calculate a measure of the relationship between them:
ftuqky, aqfkyeyr
This is what a computer sees.
16 The Matching Problem
Even after extracting likely symbols, matching is a difficult problem.
Symbols alone are not enough to generate good matches:
–ID -> SocialSecurityNumber or NY
The solution relies on context:
–NJ, MA, CA, ID
–Ego, SuperEgo, ID
MatchIt provides that context
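The role of context can be sketched in a few lines. The example below is illustrative only and does not show how MatchIt works internally: the `SENSES` table, its cue words, and the `pick_sense` helper are all hypothetical. Each candidate sense of a symbol lists words that tend to appear near it, and the sense sharing the most words with the symbol's neighbours wins.

```python
# Hypothetical sense inventory: each sense of "ID" lists cue words
# that tend to appear alongside it.
SENSES = {
    "ID": {
        "identifier": {"identifier", "number", "key", "ssn", "person"},
        "Idaho (US state)": {"nj", "ma", "ca", "ny", "state"},
        "id (psychology)": {"ego", "superego", "freud"},
    }
}


def pick_sense(symbol, neighbours):
    """Return the sense whose cue words overlap most with the neighbours."""
    context = {n.lower() for n in neighbours}
    senses = SENSES[symbol]
    # On a tie (including no overlap at all) the first listed sense wins.
    return max(senses, key=lambda sense: len(senses[sense] & context))


print(pick_sense("ID", ["NJ", "MA", "CA"]))
print(pick_sense("ID", ["Ego", "SuperEgo"]))
```

With state abbreviations as neighbours the symbol resolves to the state sense; next to Ego and SuperEgo it resolves to the psychological sense.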
17 MatchIT 1.0
Integrated component of the MetaMatrix Semantic Data Services product
Based on ontology-driven semantic knowledge base
–Word relationships, dictionaries, lexicons, thesauri
Plug-in architecture
Standards-compliant:
–OWL
–RDF
–Inference engines
–OSGi
–Eclipse
–JDBC
18 (Semi-)Automated Semantic Mediation
[Diagram labels: FBI, CBP, NYC, NY, NJ; Data Source Services; Ontology]
Gender ID and Person Sex Code: Matched (Confidence of 90%)
Sex semantically related to Gender
*An extensible semantic knowledge base provides dictionary- and thesaurus-like information on words, their meanings, and their relationships to other words.
*A sophisticated set of matching algorithms provides string similarity matches and semantic matches with confidence ratings and explanations.
19 Matching Techniques
MatchIT uses two types of matching techniques:
–String Matching: attempts to determine string similarity based on the lexical distance between the strings.
–Semantic Matching: attempts to determine string similarity based on the ontological distance between the strings within a semantic ontology.
Generate Match Sets
Can be run individually or in combinations
Pluggable architecture allows for algorithmic extensibility
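One way to picture matchers that can be run individually or in combination is a small registry of scoring functions. The sketch below is an assumption about the general shape, not MatchIT's plug-in API: the `MATCHERS` registry, the `SYNONYMS` table, and the fixed 0.9 score are hypothetical. The string matcher uses `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for a semantic knowledge base: pairs of related words.
SYNONYMS = {frozenset({"sex", "gender"}), frozenset({"amount", "quantity"})}


def string_matcher(a, b):
    """Lexical similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def synonym_matcher(a, b):
    """Fixed (arbitrary) score when the pair is listed as related."""
    return 0.9 if frozenset({a.lower(), b.lower()}) in SYNONYMS else 0.0


# New matchers plug in by adding an entry to this registry.
MATCHERS = {"string": string_matcher, "semantic": synonym_matcher}


def match(a, b, use=("string", "semantic")):
    """Run the selected matchers; return the best score and which one produced it."""
    scores = {name: MATCHERS[name](a, b) for name in use}
    best = max(scores, key=scores.get)
    return scores[best], best


print(match("Sex", "Gender"))                   # both matchers combined
print(match("Sex", "Gender", use=("string",)))  # string matcher on its own
```

Returning the name of the winning matcher alongside the score gives a crude form of explanation for each match; taking the maximum is only one of several plausible ways to combine scores.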
20 String Matching
What is the lexical distance between two symbols?
–PUZZLE, PUZZ
–ID, IDENTIFIER
–STRONG, SONG
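One common measure of lexical distance is Levenshtein edit distance: the number of single-character insertions, deletions, and substitutions needed to turn one string into the other. The slides do not say which measure MatchIT uses, so the sketch below is an illustration of the idea rather than the product's algorithm; the `similarity` normalization is likewise an assumption.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Scale the distance to [0, 1], where 1 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


for pair in [("PUZZLE", "PUZZ"), ("ID", "IDENTIFIER"), ("STRONG", "SONG")]:
    print(pair, levenshtein(*pair), round(similarity(*pair), 2))
```

Under this measure ID and IDENTIFIER are 8 edits apart, while STRONG and SONG are only 2 apart, so an obvious abbreviation scores worse than two unrelated words. That is one reason string matching alone is not enough.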
21 Semantic Matching How semantically similar are two concepts?
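Ontological distance can be illustrated as the length of the shortest path between two concepts in a graph of related terms. The toy graph below is entirely hypothetical (a real knowledge base would have typed, weighted relationships), and breadth-first search is just one simple way to measure the distance.

```python
from collections import deque

# Hypothetical, hand-built fragment of an ontology: undirected "related to" edges.
EDGES = [
    ("sex", "gender"),
    ("gender", "personal attribute"),
    ("personal attribute", "attribute"),
    ("amount", "quantity"),
    ("quantity", "measure"),
    ("measure", "attribute"),
]


def build_graph(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def ontological_distance(graph, start, goal):
    """Number of edges on the shortest path, or None if no path exists."""
    if start not in graph or goal not in graph:
        return None
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph[node] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None


GRAPH = build_graph(EDGES)
print(ontological_distance(GRAPH, "sex", "gender"))
print(ontological_distance(GRAPH, "sex", "amount"))
```

Closely related concepts such as sex and gender sit one edge apart, while sex and amount are connected only through the very general concept "attribute", six edges away.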
22 Semantic Matching Objectives
Find and rank the potential matches, but let the user review and decide for sure.
I.e., eliminate 99+% of the things that don't match, and let the user review the <1%.
Many times, a user can visually scan a small list of the top 1% and very quickly agree or disagree with the results.
Favor false positives over false negatives.
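These objectives amount to a rank-and-shortlist step. The following sketch assumes candidate matches arrive as (source, target, score) tuples; the `shortlist` function and its default values are hypothetical. A deliberately low threshold keeps doubtful candidates in play, favoring false positives, and only the top fraction goes to the user for review.

```python
def shortlist(candidates, threshold=0.3, top_fraction=0.01):
    """Rank (source, target, score) tuples and keep the best few for review.

    The low threshold errs toward false positives; the user makes the final call.
    """
    kept = [c for c in candidates if c[2] >= threshold]
    kept.sort(key=lambda c: c[2], reverse=True)
    limit = max(1, round(len(candidates) * top_fraction))  # always show at least one
    return kept[:limit]


# 200 synthetic candidates with scores 0.000, 0.005, ..., 0.995.
candidates = [(f"src{i}", "target", i / 200) for i in range(200)]
for source, target, score in shortlist(candidates):
    print(source, target, score)
```

Out of 200 candidates the user reviews two, which is the "top 1%" the slide describes.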
23 Semantic Matching in MetaMatrix
[Architecture diagram labels]
Enterprise Information Sources: Custom, Any Source, XML, File System, JDBC, RDBMS
Representations: Ontologies [OWL/RDF], Relational, XML, Domain [UML/ER]; Conceptual/Logical/Physical Data Models
MetaMatrix Connector Framework; MetaMatrix Importer Framework
MetaBase Modeler: Import, Export, Find Matches, Analyze, Visualize, Collaborate, Transform
MatchIt: Instance-level Match, Schema-level Match
Ontology; Semantic Knowledge Base: Lexicons, Fact Repository, Onomasticons
MetaBase Repository: Models & Files [versioned], Search Index, Web Reporting
Data Harmonization; Complete Metadata Access; Data/Content Access; Ontological Semantics Access
25 Overall process
Import two nontrivial vocabularies
–ERwin model of large data warehouse
–TWPDES XML schema
Extract symbols
–Schema-specific tokenization algorithms
Assign semantics to each
–Symbols are keys into dictionaries
Perform semantic matching between them
Analyze results
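The symbol extraction and dictionary lookup steps can be sketched as follows. This is a generic tokenizer for underscore and camelCase names, not one of the schema-specific algorithms the slide refers to, and the three-entry `DICTIONARY` is hypothetical.

```python
import re

# Splits on underscores and case changes: acronym runs, capitalized words,
# lower-case words, and digit runs each become one token.
_TOKEN = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")

# Hypothetical symbol dictionary: extracted tokens are keys into it.
DICTIONARY = {"bldg": "building", "id": "identifier", "num": "number"}


def tokenize(symbol):
    """Break a column or element name into lower-case tokens."""
    return [t.lower() for t in _TOKEN.findall(symbol)]


def expand(symbol):
    """Look each token up in the dictionary, keeping unknown tokens as-is."""
    return [DICTIONARY.get(t, t) for t in tokenize(symbol)]


for name in ["bldg_id", "Facility_ID", "SocialSecurityNumber", "SITENUM"]:
    print(name, tokenize(name), expand(name))
```

A name such as SITENUM has no delimiter or case change, so this tokenizer leaves it whole; splitting it into "site" and "num" would need a dictionary-aware step, which suggests why tokenization is schema-specific.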
26 ERwin Data Warehouse Model
27 TWPDES XML Schema
Mapping Classes for each XML fragment in hierarchy
28 Generated Symbol Dictionary (TWPDES)
29 Generated Symbol Dictionary (ERwin model)
30 Editing the Dictionary Modify Definition
31 Editing the Semantics Control Senses
32 Target Model Match Results
33 Examine Details
34 Match Details
35 Matches Used to Build Mappings
36 The Integrating Function of the Common Semantic Model – via Domain-level Mapping
From Pat Cassidy & COSMO
[Diagram labels: Obligation, Duty, GenericObligation, SameAs]
37 MatchIt Semantic Matching Tool
A way to use ontologies in a world where nearly 100% of what already exists is not in an ontology.
Map connections between ontologies that are being built and artifacts currently in use:
–RDBMS schemas
–XML and XSD files
–Spreadsheet data
–More coming, including ontologies!
Map an imported model to a Vocabulary, and a Vocabulary to an Ontological structure