4 North Park Suite 106 Hunt Valley, MD 21030 410-584-0009 www.revelytix.com Ontology Based Information Management MatchIT 1.1: Data Integration with Semantic.

1 4 North Park Suite 106, Hunt Valley, MD 21030, 410-584-0009, www.revelytix.com
Ontology Based Information Management
MatchIT 1.1: Data Integration with Semantic Mapping Technologies
Michael Schidlowsky, Sr. Software Architect

2 Data Integration
Motivated by:
- Organizational Changes
  - Mergers and Acquisitions
  - Internal reorganizations (e.g., DHS)
- Data Mining
- Standards Conformance
- Migration Efforts
  - Legacy Systems
- Decouple data sources from application code

3 Data Integration
Challenges for the integration specialist include:
- Domain-specific terms
- Unfamiliarity with source schemas
- Large size of schema set
- Semantics often not captured
- Captured semantics
  - Stored in ad-hoc formats
  - Cannot be reused to facilitate future data integration efforts

4 Data Integration: Example
Background: Acme Inc. merges with CompuGlobalHyperMeganet.
Technical Challenge: Need a “Virtual Database” of all sales for all stores in real time.
- Which fields represent customers? CUSTOMERID, CUST_ID, SSN
- Which fields represent ‘Price’? Sale_Amt, Total_Sale
- What if your database has 10,000 columns?

5 Data Integration: Example
Background: HR needs to use employee information for a new company portal.
Technical Challenge: Data must be in XML and conform to a standard HR schema.
- Which fields relate to Address? RESIDENCE, PREV_RESIDENCE
- What if your database has 10,000 columns?

6 Ideal Matching Solution
- Finds lexical relationships
- Captures semantic information
- Finds semantic relationships
- Provides programmatic access to results (API)
- Fast
- Scalable
- Human Involvement

7 MatchIT Philosophy Best Matching tool already exists! What is meant by “ID”?

8 MatchIT Philosophy Best Matching tool already exists! What is meant by “ID”? -“PLEASE PRESENT ID”

9 MatchIT Philosophy Best Matching tool already exists! What is meant by “ID”? -“PLEASE PRESENT ID” -NY, NJ, ID

10 MatchIT Philosophy Best Matching tool already exists! What is meant by “ID”? -“PLEASE PRESENT ID” -NY, NJ, ID -SUPEREGO, EGO, ID

11 MatchIT 1.1
- MatchIT is a semantic and lexical matching tool.
- Session Outline:
  - Import and process schemas
  - Perform lexical matching
  - Create and manage a semantic vocabulary
  - Perform semantic matching
  - Demonstrate 3rd-party integration with a data integration tool (MetaMatrix)

12 Import & Process Schemas
- Revelytix Models are RDF/OWL
  - Flexible model architecture
  - Extensible
  - Interoperable
- Current Importers:
  - JDBC
  - XML Schema
  - MetaMatrix XMI Models
- Importer Demo

13 Lexical Matching
Uses lexical distance measures to determine lexical similarity.
- Fastest matching technique
- Requires no work other than importing schemas
- Often yields interesting results
- Lexical Matching Demo
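
The slides do not say which lexical distance measures MatchIT uses. As a minimal sketch of the general idea, the following assumes Levenshtein edit distance, normalized to a 0..1 similarity score; the function names are hypothetical and are not part of the MatchIT API.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of inserts, deletes, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def lexical_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; 1.0 means identical names."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def rank_candidates(target: str, columns: list) -> list:
    """Return (column, score) pairs, most similar first."""
    scored = [(c, lexical_similarity(target, c)) for c in columns]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Column names taken from the example slides.
ranked = rank_candidates("CUSTOMERID", ["CUST_ID", "SSN", "Sale_Amt"])
```

With these inputs CUST_ID ranks first, while SSN scores low even though it may also identify a customer. That gap is the kind of case a purely lexical measure cannot close.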

14 Create Vocabulary from Schemas
A Vocabulary is:
- A set of symbols
- Occurrences of those symbols in your schemas
- Binding of each symbol to one or more semantic concepts
Created by MatchIT from schemas using tokenization algorithms.
Reusable
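
The three parts of a vocabulary listed above can be pictured as a small data structure. This is only an illustrative sketch: the `Vocabulary` class, its methods, and the schema element names are hypothetical, and MatchIT itself stores its models as RDF/OWL rather than Python objects.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Vocabulary:
    # symbol -> schema elements in which the symbol occurs
    occurrences: dict = field(default_factory=lambda: defaultdict(set))
    # symbol -> semantic concepts the symbol is bound to
    concepts: dict = field(default_factory=lambda: defaultdict(set))

    def add_occurrence(self, symbol: str, schema_element: str) -> None:
        self.occurrences[symbol.lower()].add(schema_element)

    def bind(self, symbol: str, concept: str) -> None:
        self.concepts[symbol.lower()].add(concept)

    def symbols(self) -> list:
        """All known symbols, sorted for stable output."""
        return sorted(set(self.occurrences) | set(self.concepts))


vocab = Vocabulary()
# Hypothetical table names; the column names come from the example slides.
vocab.add_occurrence("ID", "SALES.CUST_ID")
vocab.add_occurrence("CUST", "SALES.CUST_ID")
# One symbol may be bound to more than one concept, as with "ID".
vocab.bind("ID", "identifier")
vocab.bind("ID", "Idaho")
```

Because the bindings live in the vocabulary rather than in any one schema, the same structure could be reused when a new schema is imported.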

15 Tokenization Algorithms
Different schemas require different tokenization techniques.
Tokenization algorithms determine how symbols are extracted from schemas:
- Capitalization
- Delimiters
- English Language
Vocabulary Demo
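
As a rough illustration of the first two techniques, the sketch below splits a schema name on delimiters and then on capitalization boundaries. The `tokenize` function and its regular expressions are assumptions made for this example, not MatchIT's actual algorithms.

```python
import re


def tokenize(name: str) -> list:
    """Extract lowercase symbols from a schema element name."""
    tokens = []
    # Delimiters: any run of non-alphanumeric characters separates chunks.
    for chunk in re.split(r"[^A-Za-z0-9]+", name):
        # Capitalization: acronyms (XML), capitalized or lowercase words, digits.
        tokens.extend(
            re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", chunk)
        )
    return [t.lower() for t in tokens]


print(tokenize("CUST_ID"))
print(tokenize("GenderCode"))
print(tokenize("PREV_RESIDENCE"))
```

A name such as CUSTOMERID has neither a delimiter nor a case boundary, so this sketch returns it as a single token. Splitting it would need a dictionary-based approach, which may be what the English Language technique on the slide refers to.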

16 Matching Techniques
MatchIT currently uses two types of matching techniques:
- Lexical Matching: attempts to determine the similarity of two symbols based on the lexical distance between them.
- Semantic Matching: attempts to determine the similarity of two symbols based on the ontological distance between their concepts within a semantic knowledge base.

17 Parts Supplier Schema (as seen by a person)

18 Parts Supplier Schema (as seen by a computer)

19 Semantic Matching How semantically similar are two concepts?

20 Semantic Matching
Uses knowledge base distance measures to determine semantic similarity.
- Presents ranked candidate matches
- Based on semantics captured in Vocabularies
- The only way to effectively find relationships between lexically dissimilar symbols:
  - GenderCode / SexCode
  - Provider / Supplier
  - Amount / Quantity
- Semantic Matching Demo
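
The slides do not describe the knowledge base or the distance measure MatchIT uses. One simple way to picture a knowledge base distance is the shortest path between two concepts in a graph of related concepts. The sketch below assumes a tiny hand-built graph and a breadth-first search; the edges, function names, and the 1 / (1 + distance) scoring are all hypothetical.

```python
from collections import defaultdict, deque

# Toy knowledge base: each pair is an undirected "is related to" link.
EDGES = [
    ("gender", "sex"),
    ("provider", "supplier"),
    ("supplier", "organization"),
    ("customer", "organization"),
    ("amount", "quantity"),
]


def build_graph(edges):
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def ontological_distance(graph, start, goal):
    """Number of links on the shortest path, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def semantic_similarity(graph, a, b):
    """Map distance to a score in [0, 1]; unconnected concepts score 0."""
    dist = ontological_distance(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1 + dist)


graph = build_graph(EDGES)
```

In this toy graph "gender" and "sex" are one link apart and score 0.5, while "gender" and "amount" are unconnected and score 0.0, even though none of these pairs look alike lexically.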

21 3rd Party Integration
MatchIT Integration
- MatchIT Java API
- Stand-alone application
- Embeddable application (as Eclipse plug-ins)
- Hides unapproved matches
Useful for various 3rd Party applications:
- Data Integration
- Data Discovery
- Ontology Mediation
- Search
- Metadata Management
- Data Cleansing
MetaMatrix Demo

22 Questions?
MatchIT 30-day trial available at http://www.revelytix.com
Michael Schidlowsky, michaels@revelytix.com

