Dr. Sudha Ram Huimin Zhao Department of MIS University of Arizona

A Knowledge Discovery Approach for Semantic Interoperability among Multiple Heterogeneous Data Sources

Need for Integration

A Framework for Integration
1. Schema Translation: describing individual data sources in a common data model, e.g., USM.
2. Interschema Relationship Identification: identifying related data objects from multiple data sources. This is the most critical and time-consuming step, and it requires intensive human interaction.
3. Integrated Schema Generation: generating an integrated schema of the data sources.
4. Schema Mapping Generation: mapping the integrated schema to local schemas.
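The framework is described only at the level of steps. As an illustration, here is a minimal sketch of steps 3 and 4 under assumptions of our own. The output of interschema relationship identification is taken to be groups of corresponding local attributes. Each group becomes one integrated attribute, named after its alphabetically first member (an assumed naming rule). The schema mapping is then a simple lookup from integrated names back to local attributes. `LocalAttr`, `generate_mapping`, and `to_local` are hypothetical names, and a real mapping would also cover entities, relationships, and value conversions.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class LocalAttr:
    """An attribute in one local schema (hypothetical representation)."""
    db: str
    table: str
    attr: str

    def __str__(self):
        return f"{self.db}.{self.table}.{self.attr}"


def generate_mapping(correspondence_groups):
    """Simplified integrated schema generation plus schema mapping generation.

    Each group of corresponding local attributes becomes one integrated
    attribute named after its first member in (db, table, attr) order
    (an assumed naming rule). Returns {integrated_name: [LocalAttr, ...]}."""
    mapping = {}
    for group in correspondence_groups:
        members = sorted(group)
        name = f"{members[0].table}.{members[0].attr}"
        if name in mapping:
            raise ValueError(f"integrated name clash: {name}")
        mapping[name] = members
    return mapping


def to_local(mapping, integrated_name):
    """Translate a reference to an integrated attribute into local references."""
    return [str(a) for a in mapping[integrated_name]]


groups = [
    {LocalAttr("DB1", "courses", "name"), LocalAttr("DB2", "course", "name")},
    {LocalAttr("DB1", "professors", "name")},
    {LocalAttr("DB2", "faculty", "name")},
]
mapping = generate_mapping(groups)
print(to_local(mapping, "courses.name"))  # ['DB1.courses.name', 'DB2.course.name']
```

The example grouping treats DB1.courses.name and DB2.course.name as corresponding, which matches how very similar the semantic map later in the deck shows them to be. The professors and faculty attributes are left as separate integrated attributes because the slides do not state that they correspond.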

Understanding Correspondences
Integrator to local DBA, by letter, phone or fax: “Does their mission start time mean the same as your mission take off time?”
The next day, the local DBA: “I maintain the database. But how to interpret the data is up to the domain experts.”
Two weeks later, the domain experts: “That depends, you know.”
Volume: hundreds of tables, thousands of attributes.
MITRE has spent several years, largely on human interaction, to integrate several database systems of the U.S. Air Force.

A Knowledge Discovery Approach
Pre-processing → Data Mining → Post-processing
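The slides name the phases but do not say what pre-processing involves. The following sketch is an illustration only. It assumes each attribute is numeric and summarizes it by three data-content statistics the slides list among the available information (mean, standard deviation, % nulls). It then min-max scales every feature to [0, 1] so that features with large ranges do not dominate later distance computations. The function names and the choice of min-max scaling are assumptions, not the authors' procedure.

```python
import math


def column_profile(values):
    """Summarize one numeric attribute as [mean, std dev, fraction of nulls].

    None marks a null value; the standard deviation is the population form."""
    if not values:
        return [0.0, 0.0, 0.0]
    present = [v for v in values if v is not None]
    null_frac = 1 - len(present) / len(values)
    if not present:
        return [0.0, 0.0, null_frac]
    mean = sum(present) / len(present)
    std = math.sqrt(sum((v - mean) ** 2 for v in present) / len(present))
    return [mean, std, null_frac]


def min_max_scale(vectors):
    """Rescale every feature (column) to [0, 1]; constant features map to 0."""
    lo = [min(col) for col in zip(*vectors)]
    hi = [max(col) for col in zip(*vectors)]
    return [[(x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(vec, lo, hi)]
            for vec in vectors]


print(column_profile([1, 3, None, None]))  # [2.0, 1.0, 0.5]
```

Non-numeric attributes and the other information categories (names, schematic information, documentation, usage data) would need their own encodings, which this sketch does not attempt.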

Data Mining Technique
Self-Organizing Map:
- Clustering data objects by "semantic" similarity.
- Visualizing the semantic clusters on a "map".
- Suggesting similar or related objects.
Combining all available information:
- Entity names, attribute names, relationship names
- Schematic information (keys, data types, etc.)
- Data contents (domains, mean, standard deviation, % nulls, etc.)
- Documentation (information retrieval techniques)
- Usage data (frequency of usage, number of users, etc.)
- Other available information (e.g., business rules)
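Here is a minimal sketch of a standard Kohonen self-organizing map trained on feature vectors that describe data objects. Each training step finds the best-matching node for an input and pulls that node and its grid neighbors toward the input, using a Gaussian neighborhood. The grid size, the linear learning-rate and radius schedules, and the function names are assumptions chosen for illustration, not the settings used in this work.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to x."""
    return min(weights, key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))


def train_som(data, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Train a rows x cols self-organizing map on equal-length feature vectors.

    Returns {(row, col): weight_vector}. The learning rate and neighborhood
    radius both shrink linearly over training (an assumed schedule)."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total
            lr = lr0 * (1 - frac)
            sigma = max(sigma0 * (1 - frac), 0.5)
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Gaussian neighborhood on the grid around the winning node
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


features = [[0.0, 0.0], [0.0, 0.1], [1.0, 1.0], [1.0, 0.9]]
som = train_som(features, 3, 3)
print([best_matching_unit(som, f) for f in features])
```

After training, objects whose feature vectors map to the same or adjacent nodes are candidates for being related. In this approach such candidates would be suggestions for a human integrator to confirm, not final correspondences.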

A Motivating Example

A Semantic Map of Attributes
Response nodes are labeled with attribute numbers. Gray levels indicate similarities between clusters: the darker a node, the less similar it is to its neighbors. There is almost no boundary between attribute 4 (DB1.courses.name) and attribute 38 (DB2.course.name), indicating that they are very similar. Both, however, are quite different from attribute 17 (DB1.professors.name) and attribute 32 (DB2.faculty.name).
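The slide does not say how the gray levels are computed. One common way to shade a self-organizing map is a U-matrix, in which each node gets the mean distance between its weight vector and those of its grid neighbors. This sketch assumes that variant, computed per node over 4-connected neighbors, and maps the result to 0 to 255 gray levels so that a darker node is less similar to its neighbors, as the slide describes. The function names are hypothetical.

```python
import math


def u_matrix(weights, rows, cols):
    """Mean distance from each node's weight vector to its 4-connected
    grid neighbors. Larger values mean less similar to the neighbors."""
    u = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = [(r + dr, c + dc) for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < rows and 0 <= c + dc < cols]
            dists = [math.dist(weights[(r, c)], weights[n]) for n in nbrs]
            u[(r, c)] = sum(dists) / len(dists) if dists else 0.0
    return u


def gray_levels(u):
    """Scale U-values to gray levels: 0 (black) for the node least similar
    to its neighbors, 255 (white) for the most similar."""
    lo, hi = min(u.values()), max(u.values())
    span = hi - lo
    return {node: 255 if span == 0 else round(255 * (1 - (v - lo) / span))
            for node, v in u.items()}


grid = {(0, 0): [0.0], (0, 1): [0.0], (0, 2): [1.0]}
print(gray_levels(u_matrix(grid, 1, 3)))  # {(0, 0): 255, (0, 1): 128, (0, 2): 0}
```

Under this shading, a light band between two labeled nodes (no visible boundary) means their weight vectors are close. That is how attributes such as 4 and 38 would appear to belong to the same cluster.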

Future Work
- Automate the extraction of semantic information from different types of data sources.
- Integrate the Interschema Relationship Identification procedure into a complete integration system that provides the user with an interactive interface and guides the user through a semi-automatic integration process.
- Validate the utility of this approach in real-world data warehousing and web information discovery applications.