Extracting Information from Heterogeneous Information Sources Using Ontologically Specified Target Views, by Joachim Biskup (Universität Dortmund) and David W. Embley (Brigham Young University)


Extracting Information from Heterogeneous Information Sources Using Ontologically Specified Target Views
Joachim Biskup, Universität Dortmund, and David W. Embley, Brigham Young University
Funded by NSF

Information Exchange
[Figure: information exchange from Source to Target, involving Information Extraction and Schema Matching: "Leverage this … … to do this"]

Presentation Outline
– Overview
– Matching (Direct)
– Matching (Derived)
– Matching Algorithm
– Summary

Requirements
1. f is an injective function.
2. f maps object sets to object sets and relationship sets to relationship sets.
3. f respects relationship-set arities.
4. f respects referential integrity.
5. f respects types.
6. f respects real-world identity.
7. f's coercions are generalization/specialization (G/S) compatible.
8. f respects subset constraints.
9. f respects mutual-exclusion constraints.
10. f respects union constraints.
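Requirements 1–3 are structural, so they can be checked mechanically. The following is a minimal sketch built on two assumptions: a hypothetical `Element` record stands for object sets and relationship sets, and the mapping f is stored as a Python dict from target elements to source elements. It covers only the first three requirements. The others need constraint and population information that this sketch does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical schema element: an object set or a relationship set."""
    name: str
    kind: str       # "object" or "relationship"
    arity: int = 1  # only meaningful for relationship sets


def check_basic_requirements(f: dict) -> list:
    """Check requirements 1-3 for a mapping f (target element -> source element).

    Returns a list of violation messages; an empty list means none was found.
    """
    problems = []
    # Requirement 1: f is injective (no two target elements share a source element).
    images = list(f.values())
    if len(images) != len(set(images)):
        problems.append("f is not injective")
    for tgt, src in f.items():
        # Requirement 2: object sets map to object sets, relationship sets to relationship sets.
        if tgt.kind != src.kind:
            problems.append(f"{tgt.name}: {tgt.kind} set mapped to {src.kind} set")
        # Requirement 3: relationship-set arities must agree.
        elif tgt.kind == "relationship" and tgt.arity != src.arity:
            problems.append(f"{tgt.name}: arity {tgt.arity} mapped to arity {src.arity}")
    return problems


if __name__ == "__main__":
    # Hypothetical target and source elements.
    t_city = Element("City", "object")
    t_in = Element("City in Country", "relationship", arity=2)
    s_town = Element("Town", "object")
    s_loc = Element("Town located in Nation", "relationship", arity=2)
    print(check_basic_requirements({t_city: s_town, t_in: s_loc}))  # []
```

Because the dataclass is frozen, elements are hashable. That lets requirement 1 be tested by comparing the number of images with the size of their set.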

User Interaction (IDS Statements)
Issue
– Explains the issue
– Example: units; may need transformation
Default
– Explains the default option
– Example: if no transformation, no conversion
Suggestion
– Gives a suggestion about how to resolve the issue
– Example: if needed, specify the conversion
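An IDS statement is a fixed triple of Issue, Default, and Suggestion. A small record type captures it. This is a hypothetical sketch of how a matching tool might store and display one statement, filled in with the units example from the slide. It does not reflect the authors' actual implementation.

```python
from dataclasses import dataclass


@dataclass
class IDSStatement:
    """Hypothetical Issue/Default/Suggestion triple shown to the user during matching."""
    issue: str       # explains the issue
    default: str     # explains what happens if the user does nothing
    suggestion: str  # suggests how to resolve the issue

    def render(self) -> str:
        return (f"Issue: {self.issue}\n"
                f"Default: {self.default}\n"
                f"Suggestion: {self.suggestion}")


if __name__ == "__main__":
    stmt = IDSStatement(
        issue="Size and Nr Sq Km may use different units; a transformation may be needed.",
        default="No transformation is specified, so no conversion is applied.",
        suggestion="If needed, specify the conversion.",
    )
    print(stmt.render())
```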

Theorem
Let f be the generated mapping from target t to source s, where s is populated so that it has a valid interpretation. Let t′ be the submodel of t populated from s by f. Then t′ has a valid interpretation.
Proof: the paper is the proof …

Target (Graphical View)

Target (Textual View)

Source Example (Assumed to be Populated)

Matching (Direct)
– Object Sets
– Relationship Sets

Object-Set Type Compatibility (a in the target, b in the source)
1. type(a) = type(b)
2. type(a) ⊃ type(b)
3. type(a) ⊂ type(b)
4. type(a) ≠ type(b) (neither type contains the other)
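The four cases amount to a comparison between two types under a containment order. The sketch below is hypothetical. The `NARROWER_THAN` table holds only the two pairs that the following slides use as examples: Integer within Real, and Image within Video, where an image counts as a one-frame video. Case 4 is read as "no known containment either way", which is an interpretation of the garbled original symbol.

```python
# Hypothetical containment pairs (narrower, broader), taken from the slide examples.
NARROWER_THAN = {
    ("Integer", "Real"),  # every integer is a real
    ("Image", "Video"),   # an image can be treated as a one-frame video
}


def compare_types(target: str, source: str) -> str:
    """Classify type(a) (target) against type(b) (source) into the four cases."""
    if target == source:
        return "="  # case 1: same type
    if (source, target) in NARROWER_THAN:
        return "⊃"  # case 2: target has greater discriminating power
    if (target, source) in NARROWER_THAN:
        return "⊂"  # case 3: source has greater discriminating power
    return "≠"      # case 4: no known containment; likely a mismatch


if __name__ == "__main__":
    for pair in [("Real", "Integer"), ("Integer", "Real"), ("Image", "String")]:
        print(pair, compare_types(*pair))
```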

type(a) = type(b)
Same type
– string = string, but Airport ↔ Head Of State
– Need better matching techniques
Same type, different units
– Size ↔ Nr Sq Km
– Need unit conversion
Same type, different format
– Date ↔ Date, but 01/02/2002 vs. Jan 2, 2002
– Need format conversion
Same type, same units and format, different assumptions
– Altitude ↔ Altitude, but the altitudes of aircraft and spacecraft differ
– Need the same assumptions
Same type, same units and format, same assumptions: OIDs
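Unit and format conversion are the two mechanical fixes named above. A minimal sketch follows, under two hypothetical assumptions: the source Size is in square miles, and 01/02/2002 is a US month-first date. The function names are illustrative only. The conversion factor is exact, since 1 mile = 1.609344 km.

```python
from datetime import datetime

SQ_KM_PER_SQ_MILE = 1.609344 ** 2  # = 2.589988110336, exact by definition of the mile


def sq_miles_to_sq_km(area_sq_mi: float) -> float:
    """Unit conversion, assuming (hypothetically) the source Size is in square miles."""
    return area_sq_mi * SQ_KM_PER_SQ_MILE


def us_date_to_long(text: str) -> str:
    """Format conversion from MM/DD/YYYY to 'Mon D, YYYY' (assumes month-first input)."""
    d = datetime.strptime(text, "%m/%d/%Y")
    # %b gives the abbreviated month name in the current locale ("Jan" in the C locale).
    return f"{d:%b} {d.day}, {d.year}"


if __name__ == "__main__":
    print(us_date_to_long("01/02/2002"))      # Jan 2, 2002
    print(round(sq_miles_to_sq_km(1.0), 3))   # 2.59
```

Day-first input (as in much of Europe) would read 01/02/2002 as 1 February. That is the kind of "format" ambiguity the slide warns about.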

type(a) ⊃ type(b) and type(a) ⊂ type(b)
Real ↔ Integer or Video ↔ Image
– Target has greater discriminating power
– Can add .0 or make a video of a single image (?)
Integer ↔ Real or Image ↔ Video
– Source has greater discriminating power
– Can round off or select one of the frames (?)
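The coercions suggested above can be written as small functions. In this sketch a video is represented, hypothetically, as a Python list of frames. Widening (Integer to Real, Image to Video) loses nothing. Narrowing (Real to Integer, Video to Image) discards information, and the results depend on choices the slide leaves open, such as the rounding rule or which frame to keep.

```python
def integer_to_real(x: int) -> float:
    """Target Real, source Integer: widen by 'adding .0'."""
    return float(x)


def real_to_integer(x: float) -> int:
    """Target Integer, source Real: round off (information is lost).

    Python's round() uses round-half-to-even, so 2.5 -> 2; substitute another
    rule if the application expects half-up rounding.
    """
    return round(x)


def image_to_video(image) -> list:
    """Target Video, source Image: a one-frame 'video' (hypothetical list-of-frames form)."""
    return [image]


def video_to_image(frames: list, index: int = 0):
    """Target Image, source Video: select one frame; which one is a modeling choice."""
    return frames[index]


if __name__ == "__main__":
    print(integer_to_real(2), real_to_integer(2.7))
    print(video_to_image(image_to_video("city.png")))
```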

type(a) ≠ type(b)
Image ↔ String
– Mismatch, even if the same attribute (e.g., both City)
– Types can help discard potential matches
String(5) ↔ Integer
– But suppose the integer is 2
– Might work, but is "2.000" OK?
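Whether an Integer "might work" as a String(5) comes down to whether its textual form fits within five characters. This sketch checks only that length condition, using Python's standard `format()`. The helper name is hypothetical. Whether a rendering such as "2.000" is semantically acceptable remains a judgment call that the code cannot make.

```python
def to_bounded_string(value, max_len: int, fmt: str = "") -> str:
    """Coerce value into a String(max_len) target; raise ValueError if it does not fit."""
    text = format(value, fmt)
    if len(text) > max_len:
        raise ValueError(f"{text!r} does not fit in String({max_len})")
    return text


if __name__ == "__main__":
    print(to_bounded_string(2, 5))          # 2
    print(to_bounded_string(2, 5, ".3f"))   # 2.000
    try:
        to_bounded_string(123456, 5)
    except ValueError as err:
        print(err)
```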

Relationship Match Requirements
– Referential integrity
– Constraints
  – Cardinality
  – Mandatory/Optional

Referential Integrity
[Figure: object sets a and b in the target; a′, b′, …, a″ in the source]
The types of a, a′, and a″ can all be different, but not arbitrary. Example: a (String), a′ (Integer), a″ (Real).
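The types along the path can differ only if coercions link them. Referential integrity must then still hold after the values are coerced. The sketch below is hypothetical and makes three assumptions: the target a holds Strings, the source values arrive as Reals (a″), and those values pass through an Integer (a′) step. It chains the coercions and checks that every coerced relationship value is a member of the target object set.

```python
from functools import reduce


def compose(*coercions):
    """Chain coercions left to right: compose(f, g)(x) == g(f(x))."""
    return lambda value: reduce(lambda acc, fn: fn(acc), coercions, value)


def referential_integrity_holds(target_objects, source_pairs, coerce) -> bool:
    """True if every first component of source_pairs, once coerced to the target
    type, is a member of the target object set target_objects."""
    members = set(target_objects)
    return all(coerce(left) in members for left, _right in source_pairs)


if __name__ == "__main__":
    real_to_string = compose(round, str)  # Real -> Integer -> String, e.g. 2.0 -> 2 -> "2"
    target_a = {"1", "2"}
    pairs = [(1.0, "b1"), (2.0, "b2")]
    print(referential_integrity_holds(target_a, pairs, real_to_string))                   # True
    print(referential_integrity_holds(target_a, pairs + [(3.0, "b3")], real_to_string))  # False
```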

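Relationship-Set Constraint Compatibility (constr(a) for the target, constr(b) for the source)
1. The constraints agree, but which relationship set matches may still be ambiguous
2. The target expects many, but the source can supply at most one
3. The target expects one, but the source can supply many
4. The target requires at least one, but the source may have none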

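Constraint compatibility, case 1
[Figure: Person and Car related by owns and drives in one diagram, and by an unnamed relationship set "?" in the other]
More information is needed to resolve the match: perhaps "?" is "purchased."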

Constraint compatibility, case 2
[Figure: City related to City Map in target (a) and source (b)]
The target (a) expects many maps, but the source (b) can't supply them.

Constraint compatibility, case 3
[Figure: City related to City Map in target (a) and source (b)]
The target (a) expects one map, but the source (b) can supply many.

Constraint compatibility, case 4
[Figure: City related to City Map in target (a) and source (b)]
The target (a) expects at least one and potentially many maps, but the source (b) may have none or at most one.
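Cases 2–4 can be recognized by comparing participation ranges. The sketch below uses a hypothetical (low, high) pair for how many maps one city may have, with `math.inf` standing for "many". It assumes the goal of the theorem: a source population must yield a valid target population. Under that assumption, a source upper bound above the target's (case 3) or a source lower bound below the target's (case 4) can violate target constraints. Case 2 is valid but may leave the target sparsely populated.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Participation:
    """How many related objects one object may have: at least `low`, at most `high`."""
    low: int
    high: float  # use math.inf for "many"


def compare_participation(target: Participation, source: Participation) -> list:
    """Compare a target (a) participation constraint with a source (b) one."""
    findings = []
    if source.high > target.high:
        # Like case 3: target expects one, source can supply many.
        findings.append("source may supply more than the target allows")
    if source.low < target.low:
        # Like case 4: target requires at least one, source may have none.
        findings.append("source may supply fewer than the target requires")
    if not findings and source.high < target.high:
        # Like case 2: target expects many, source can't supply them (valid but sparse).
        findings.append("compatible, but the target may be under-populated")
    return findings


if __name__ == "__main__":
    # City -> City Map: target expects one map, source may supply many.
    print(compare_participation(Participation(0, 1), Participation(0, math.inf)))
```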

Matching (Derived)
– Generalization/Specialization
– Composite Values
– Derived Relationship Sets
– Displayable/Nondisplayable Object Sets

Generalization/Specialization
For a target object set, a source object set may:
– have no overlap (just ignore)
– be a proper subset (accept, or find the missing generalization)
– have the same values (direct match)
– be a proper superset (hard, except for roles)
– overlap (like proper subset and proper superset combined)
Consider roles and missing generalizations.
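The five possibilities map directly onto set comparisons. This is a sketch that assumes both object sets are available as finite sets of sample values. In practice the target's full extent is usually not known, so the result is only as reliable as the samples. The city names in the usage line are made up.

```python
def classify_overlap(source_values: set, target_values: set) -> str:
    """Classify a source object set's values against a target object set's values."""
    if not source_values & target_values:
        return "no overlap"       # just ignore
    if source_values == target_values:
        return "same values"      # direct match
    if source_values < target_values:
        return "proper subset"    # accept, or look for a missing generalization
    if source_values > target_values:
        return "proper superset"  # hard, except for roles
    return "overlap"              # like proper subset and proper superset combined


if __name__ == "__main__":
    print(classify_overlap({"Provo", "Orem"}, {"Provo", "Orem", "Dortmund"}))  # proper subset
```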

Roles
[Figure: target and source diagrams with City, Travel Video, the role CityClip: Video, and Video With City Scene]

Missing Generalization
[Figure: target and source diagrams with City Map, Country Map, City Map: Image, Country Map: Image, and Map: Image]

Composite Values
– Composite in Source (split)
– Composite in Target (merge)
– Examples of Derived Relationships

Composite in Source
[Figure: target and source diagrams with Video, Video Time, Nr Hours, and Nr Minutes]
Note also that we generated a source path.

Composite in Source
[Figure: target and source diagrams with Video, Nr Hours, and Nr Minutes]

Composite in Target
[Figure: target and source diagrams with Video, Nr Hours, Nr Minutes, Video Time, and Time]

Composite in Target
[Figure: target and source diagrams with Video, Video Time, and Time]
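Splitting and merging a composite value are inverse conversions. The sketch assumes, hypothetically, that a composite video time is written as text in the form "H:MM". A real source could store Video Time quite differently, in which case only the parsing and formatting change.

```python
def split_time(text: str) -> tuple:
    """Split a composite 'H:MM' time (hypothetical format) into (Nr Hours, Nr Minutes)."""
    hours, minutes = text.split(":")
    return int(hours), int(minutes)


def merge_time(nr_hours: int, nr_minutes: int) -> str:
    """Merge Nr Hours and Nr Minutes into a composite 'H:MM' time value."""
    return f"{nr_hours}:{nr_minutes:02d}"


if __name__ == "__main__":
    print(split_time("1:45"))  # (1, 45)
    print(merge_time(1, 5))    # 1:05
```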

Displayable/Nondisplayable Object-Set Matches
– Nondisplayable in Source: find a key
– Nondisplayable in Target: create a key

Nondisplayable in Source
No key: discard the match.
[Figure: target and source diagrams with Airport, City, Airline, and the relationship sets "flys to" and "serves"]

Nondisplayable in Source
One key: choose it.
[Figure: target and source diagrams with Airport, City, Airline, the relationship sets "flys to" and "serves", and the key Airport Name]

Nondisplayable in Source
Two or more keys: choose one.
[Figure: target and source diagrams with Airport, City, Airline, the relationship sets "flys to" and "serves", and the keys Airport Name and Airport Code]
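The three source-side cases and the target-side rule ("create a key") fit in a few lines. This is a hypothetical sketch. With several candidate keys it simply takes the first, although a real tool might ask the user instead. The target-side key is a generated surrogate identifier with an illustrative "Name#n" format.

```python
import itertools
from typing import List, Optional


def choose_source_key(candidate_keys: List[str]) -> Optional[str]:
    """Nondisplayable in source: find a key.

    No key -> None (discard the match); one key -> use it;
    two or more -> choose one (here simply the first).
    """
    return candidate_keys[0] if candidate_keys else None


_next_oid = itertools.count(1)


def create_target_key(object_set: str) -> str:
    """Nondisplayable in target: create a key as a fresh surrogate identifier."""
    return f"{object_set}#{next(_next_oid)}"


if __name__ == "__main__":
    print(choose_source_key([]))                                # None
    print(choose_source_key(["Airport Name", "Airport Code"]))  # Airport Name
    print(create_target_key("Airport"))
```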

Matching Algorithm

Sample Match Table

Pictorial View of Match Table
[Figure: target and source]

Summary

Pictorial View of Match Table
[Figure: t = target, s = source, f = the mapping, t′ = submodel; t′ has a valid interpretation]

Concluding Remarks
– QED (the theorem holds)
– Merge (several sources)
  – All sources extracted to the same view
  – Union merge
    – Object identity problems
    – Constraint problems
– Source Modeling (convert to OSM)
  – Framework defined, but not implemented