1 Distributed Database Concepts 8:30-10:00AM Thursday, July 21st 2005 CSIG05 Chaitan Baru


2 What is the issue?
Ability to access data stored in multiple, different databases using a single request, e.g.
– Get geologic information from multiple geologic databases
– Get employee information from all branches
Ability to update data stored in multiple databases, e.g.
– Transfer salary amount from University to my bank account
– Transfer funds from Visa account to vendor’s account

3 Distributed data access
[Diagram labels: Client; Database 1; Database 2; Database 3]
Homogeneous: mySQL, mySQL, mySQL
Heterogeneous: mySQL, Oracle, DB2
How about creating a “cached” local copy?
mySQL, Excel, ASCII flat file

4 Data Warehousing
[Diagram labels: Client; Data Source 1; Data Source 2; Data Source 3; Data Warehouse (common schema); ETL]
ETL: Extract, Transform, Load
1. Load data from sources to warehouse
2. Query processing interaction only between client and warehouse
But warehouse data could be “stale”, i.e. out of sync with source data…
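
The two warehousing steps can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the source rows, their field names, and the `warehouse` table layout are all made up for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source data; the two sources use different field names
# (an assumption made only to give the transform step something to do).
source_1 = [{"sample_id": "S1-001", "rock_type": "Granite"}]
source_2 = [{"ID": "S2-042", "RockType": "basalt"}]


def extract():
    """Extract: pull raw rows from each source, tagged with the source name."""
    yield from (("source_1", row) for row in source_1)
    yield from (("source_2", row) for row in source_2)


def transform(source, row):
    """Transform: rename fields and normalize values to the common schema."""
    if source == "source_1":
        return (row["sample_id"], row["rock_type"].lower(), source)
    return (row["ID"], row["RockType"].lower(), source)


def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.executemany("INSERT INTO warehouse VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE warehouse (sample_id TEXT, rock_type TEXT, source TEXT)"
)
load(warehouse, [transform(src, row) for src, row in extract()])

# Step 2: queries touch only the warehouse, never the sources.
rock_types = [
    r[0]
    for r in warehouse.execute("SELECT rock_type FROM warehouse ORDER BY rock_type")
]
```

Any row added to `source_1` after `load` has run is invisible to warehouse queries until the next ETL pass, which is the staleness the slide warns about.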

5 Data integration via middleware
[Diagram labels: Client; Data integration middleware (aka mediator); Database 1; Database 2; Database 3]
1. Each client request goes to sources, via middleware
2. Result collected by middleware and returned to client
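
A rough sketch of this request flow is below. The `ListSource` and `Mediator` classes are hypothetical stand-ins invented for illustration; a real mediator would translate queries per source rather than pass a Python predicate.

```python
class ListSource:
    """Hypothetical stand-in for a database: filters an in-memory list of rows."""

    def __init__(self, rows):
        self.rows = rows

    def query(self, predicate):
        return [row for row in self.rows if predicate(row)]


class Mediator:
    """Forwards each client request to every source and merges the results."""

    def __init__(self, sources):
        self.sources = sources

    def query(self, predicate):
        results = []
        for source in self.sources:
            # Step 1: the client request goes to each source via the middleware.
            results.extend(source.query(predicate))
        # Step 2: the collected result is returned to the client.
        return results


mediator = Mediator([
    ListSource([{"id": 1, "rock_type": "granite"}]),
    ListSource([{"id": 2, "rock_type": "basalt"},
                {"id": 3, "rock_type": "granite"}]),
])
granites = mediator.query(lambda row: row["rock_type"] == "granite")
```

Unlike the warehouse approach, every request reads the sources directly, so results are current at the cost of contacting each source per query.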

6 Warehousing vs Mediation
Warehousing: Use ETL to “massage” local data to fit into a common, global warehouse schema
Mediation: Modify user query to match schemas exported by each source
– But, which schema does the user query?
– The Integrated View Schema
– Sources “export” a view (the export schema)
Federated databases
– Local sources belong to different “administrative domains”, i.e. different owners
– Local autonomy

7 The Canonical Mediator / Wrapper Architecture
[Diagram labels: Client Application; Mediator (integrated view in mediator data model, e.g. relational, XML); Wrapper; Export view in mediator data model; Local view in local data model; queries Q1, Q11, Q12, Q13, Q14, q14; Cached data; Data source 1 to Data source 4, each with a local schema]
Wrapper processes could execute at sources, at mediator, or elsewhere

8 Example: A Relational Mediator
[Diagram labels: Client Application; Mediator (relational data model); Wrapper; Relational DBMS, e.g. PostGIS; Shape file]

9 Example: A Shape-file Based Mediator
[Diagram labels: Client Application; Mediator (shape file-based data model); Wrapper; Relational DBMS, e.g. PostGIS; Shape file]

10 Example: An XML Mediator
[Diagram labels: User / Applications; Mediator (XML-based data model, e.g. GML); Wrapper; Relational DBMS, e.g. PostGIS; Shape file; Wrapper; XML file, e.g. ArcXML]

11 User Authentication and Access Control
[Diagram labels: Client Application; Mediator; Wrapper; Data source 1; Data source 2]
1. User authenticates to system
2. User connects to mediator (passes credentials to mediator)
3. Mediator connects to sources
a) Using original user credentials
b) Or, mapped credentials (role-based access)
4. Need to define users or roles in sources
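
Step 3 can be sketched as a lookup the mediator performs before connecting to a source. Everything named here (`USER_ROLES`, `ROLE_ACCOUNTS`, `source_login`, and the user, role, and account names) is hypothetical and only illustrates the two options.

```python
# Hypothetical users, roles, and per-source accounts; none come from the slides.
USER_ROLES = {"alice": "geologist", "bob": "guest"}
ROLE_ACCOUNTS = {
    "source_1": {"geologist": "geo_reader", "guest": "public_reader"},
    "source_2": {"geologist": "geo_reader"},
}


def source_login(user, source, pass_through=False):
    """Return the account name the mediator uses when connecting to a source."""
    if pass_through:
        # Option (a): reuse the original user identity.
        # This only works if the same user is defined at the source (step 4).
        return user
    # Option (b): map the user to a role, then to the account
    # that the source defines for that role.
    role = USER_ROLES.get(user)
    account = ROLE_ACCOUNTS.get(source, {}).get(role)
    if account is None:
        raise PermissionError(f"{user!r} has no role-based access to {source!r}")
    return account
```

With role mapping, each source only needs one account per role rather than one per user, which keeps step 4 manageable when sources have different owners.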

12 Different types of heterogeneity in data integration
Platform heterogeneity: different OS platforms
DBMS heterogeneity: different database systems, e.g. SQL Server, mySQL, DB2
Data type heterogeneity
Schema heterogeneity
Heterogeneity in units, accuracy, resolution
Semantic heterogeneity

13 A long-standing Computer Science problem
Schema Integration
Simple case
– Mediator View: (SampleID varchar, Rock_Type varchar, Age int)
– In Source 2 Table, map Age to int
Wrapper: convert between int and varchar for Age
Source 1 Table: (Sample ID varchar, Rock type varchar, Age int, …)
Source 2 Table: (Sample ID varchar, Rock type varchar, Age varchar, …)
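
A wrapper for this simple case might look like the sketch below. The function names, the dictionary row format, and the choice to return `None` for non-numeric ages are assumptions made for illustration.

```python
def wrap_source2_row(row):
    """Convert a Source 2 row (Age stored as varchar) to the mediator view (Age as int)."""
    age_text = row["Age"].strip()
    return {
        "SampleID": row["SampleID"],
        "Rock_Type": row["Rock_Type"],
        # Assumption: non-numeric ages become None instead of raising an error.
        "Age": int(age_text) if age_text.isdigit() else None,
    }


def age_literal_for_source2(age):
    """Convert an int Age from a mediator query into the varchar form Source 2 stores."""
    return str(age)


converted = wrap_source2_row(
    {"SampleID": "S2-1", "Rock_Type": "granite", "Age": " 150 "}
)
```

The conversion runs in both directions: results coming back from Source 2 are cast to int, and constants in a mediator query are cast to varchar before being sent to the source.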

14 Another integration scenario
– Mediator View: (SampleID varchar, Rock_Type varchar, Age varchar, Era varchar, Period varchar)
– In Source 2 Table, parse Age to obtain sub-components of the field
Source 1 Table: (Sample ID varchar, Rock type varchar, Eon varchar, Era varchar, Period varchar), e.g. Eon = Phanerozoic, Era = Mesozoic, Period = Jurassic
Source 2 Table: (Sample ID varchar, Rock type varchar, Age varchar), e.g. Age = “Phanerozoic/mesozoic;jur”
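
Parsing the Source 2 Age field could be sketched as below. The slide gives only one sample value, so the delimiter rule (split on "/" and ";"), the three-part layout, and the abbreviation table are all assumptions inferred from that single example.

```python
import re

# Hypothetical abbreviation table; only "jur" appears in the slide's example.
PERIOD_ABBREVIATIONS = {"jur": "Jurassic", "cret": "Cretaceous", "tri": "Triassic"}


def parse_age(age):
    """Split a Source 2 Age string into Eon, Era, and Period sub-components."""
    parts = [part.strip() for part in re.split(r"[/;]", age)]
    if len(parts) != 3:
        raise ValueError(f"expected eon/era;period, got {age!r}")
    eon, era, period = parts
    return {
        "Eon": eon.capitalize(),
        "Era": era.capitalize(),
        # Expand known abbreviations; otherwise just normalize the capitalization.
        "Period": PERIOD_ABBREVIATIONS.get(period.lower(), period.capitalize()),
    }


parsed = parse_age("Phanerozoic/mesozoic;jur")
```

Real data would likely contain more formats than this one, so a wrapper built this way should report unparseable values rather than guess.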

15 A more advanced integration scenario
Mediator View: (SampleID varchar, Rock_Type varchar, Eon varchar, Era varchar, Period varchar)
– Same as Source 1 table schema
Query: Get rock types for all rocks from the Jurassic period
Source 1 Table: (Sample ID varchar, Rock type varchar, Eon varchar, Era varchar, Period varchar), e.g. Eon = Phanerozoic, Era = Mesozoic, Period = Jurassic
Source 2 Table: (Sample ID varchar, Rock type varchar, Age int), e.g. Age = 150

16 Doing the integration
Query sent to mediator:
SELECT DISTINCT(Rock_Type) FROM Mediator_View WHERE Period='Jurassic'
Query to Source 1:
SELECT DISTINCT(Rock_Type) FROM Source1_Table WHERE Period='Jurassic'
For Source 2, need to map Period='Jurassic' to Age values
Source 2 Table: (Sample ID varchar, Rock type varchar, Age int)
Geologic_Time Table: (Eon varchar, Era varchar, Period varchar, Min int, Max int)

17 Query “fragment” sent to Source 2
SELECT DISTINCT (S2.Rock_Type)
FROM Source2_Table S2, Geologic_Time_Table GT
WHERE GT.Period = 'Jurassic' AND (S2.Age >= GT.Min) AND (S2.Age <= GT.Max)
Where is the Geologic_Time table stored?
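
The query fragment can be tried out with Python's built-in sqlite3 module, as in the sketch below. It assumes both tables live in the same database, which sidesteps the slide's closing question. The sample rows are invented, and the Min/Max values are rounded, illustrative ages in millions of years rather than authoritative boundaries. The Min and Max column names are quoted to avoid any clash with the SQL functions of the same name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Source2_Table (SampleID TEXT, Rock_Type TEXT, Age INTEGER);
CREATE TABLE Geologic_Time_Table (
    Eon TEXT, Era TEXT, Period TEXT, "Min" INTEGER, "Max" INTEGER
);
-- Invented sample rows (Age in millions of years).
INSERT INTO Source2_Table VALUES
    ('S2-1', 'granite', 150),
    ('S2-2', 'basalt', 100),
    ('S2-3', 'granite', 180),
    ('S2-4', 'shale', 230);
-- Rounded, illustrative period boundaries.
INSERT INTO Geologic_Time_Table VALUES
    ('Phanerozoic', 'Mesozoic', 'Triassic', 201, 252),
    ('Phanerozoic', 'Mesozoic', 'Jurassic', 145, 201),
    ('Phanerozoic', 'Mesozoic', 'Cretaceous', 66, 145);
""")

# The slide's query fragment, with the period passed as a parameter.
jurassic_rock_types = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT S2.Rock_Type
        FROM Source2_Table S2, Geologic_Time_Table GT
        WHERE GT.Period = ?
          AND S2.Age >= GT."Min"
          AND S2.Age <= GT."Max"
        ORDER BY S2.Rock_Type
        """,
        ("Jurassic",),
    )
]
```

Because both bounds are inclusive, an Age that sits exactly on a shared boundary (145 or 201 in this sample data) would match two periods; a production table would need a rule for which period owns the boundary.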

18 Another complex query
Query: Get rock types for all rocks from the Mesozoic era
– Easy to do for Source 1: Era = “Mesozoic”
– For Source 2: Need to find numeric age range for Mesozoic
– Find age range across all subclasses of Mesozoic (Cretaceous, Jurassic, Triassic)
Select all Source 2 Table records whose Age falls within the Mesozoic age range
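
One way to carry out these steps for Source 2 is sketched below with sqlite3: first aggregate the age range over all periods of the era, then filter Source 2 by that range. The table contents are invented, the boundaries are rounded and illustrative, and the two-step approach is just one possible plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Source2_Table (SampleID TEXT, Rock_Type TEXT, Age INTEGER);
CREATE TABLE Geologic_Time_Table (
    Eon TEXT, Era TEXT, Period TEXT, "Min" INTEGER, "Max" INTEGER
);
-- Invented sample rows (Age in millions of years).
INSERT INTO Source2_Table VALUES
    ('S2-1', 'granite', 150),
    ('S2-2', 'basalt', 100),
    ('S2-3', 'limestone', 300),
    ('S2-4', 'shale', 230);
-- Rounded, illustrative period boundaries.
INSERT INTO Geologic_Time_Table VALUES
    ('Phanerozoic', 'Mesozoic', 'Triassic', 201, 252),
    ('Phanerozoic', 'Mesozoic', 'Jurassic', 145, 201),
    ('Phanerozoic', 'Mesozoic', 'Cretaceous', 66, 145);
""")

# Step 1: age range across all subclasses (periods) of the era.
era_min, era_max = conn.execute(
    'SELECT MIN("Min"), MAX("Max") FROM Geologic_Time_Table WHERE Era = ?',
    ("Mesozoic",),
).fetchone()

# Step 2: Source 2 records whose Age falls within that range.
mesozoic_rock_types = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT Rock_Type FROM Source2_Table "
        "WHERE Age BETWEEN ? AND ? ORDER BY Rock_Type",
        (era_min, era_max),
    )
]
```

Here the era-to-period hierarchy is encoded directly in the Era column; with a deeper classification the first step would need to walk the hierarchy instead of using a single filter.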

19 Data Integration Carts ©
Integrating data sets without explicitly creating views
An example request: Plot all gravity data points that fall within the spatial extent of rocks of a given type, in the Rocky Mountain testbed region
– Use GEONsearch to find all gravity and geologic data using bounding box for “Rocky Mountain testbed region”
Need gazetteer / spatial ontology to determine Rocky Mountain region
Need to know classification of datasets (as gravity and geology)
Intersect extent of gravity and geologic datasets (from metadata) with extent of Rocky Mountain region
– Plot gravity point data that fall within polygons of rocks of given type
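
The dataset-discovery step (intersecting dataset extents from metadata with the region's extent) can be sketched as a bounding-box test. This is not GEONsearch code: the `BBox` type, the catalog entries, and all coordinates are placeholders invented for illustration, including the box used for the region.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Axis-aligned bounding box in degrees of longitude and latitude."""
    west: float
    south: float
    east: float
    north: float


def intersects(a, b):
    """True if the two boxes overlap or touch."""
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


# Hypothetical metadata catalog: each dataset has a classification and an extent.
catalog = [
    {"name": "gravity_A", "kind": "gravity", "extent": BBox(-110, 38, -104, 42)},
    {"name": "geology_B", "kind": "geology", "extent": BBox(-90, 30, -85, 35)},
    {"name": "geology_C", "kind": "geology", "extent": BBox(-108, 36, -102, 40)},
]

# Placeholder box standing in for what a gazetteer lookup would return.
region = BBox(-112, 35, -103, 45)

matches = [
    dataset["name"]
    for dataset in catalog
    if dataset["kind"] in {"gravity", "geology"}
    and intersects(dataset["extent"], region)
]
```

This only narrows the candidate datasets; the final plotting step still needs a point-in-polygon test of each gravity point against the rock polygons, which is not shown here.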

20 Ad hoc integration
[Diagram labels: GEONsearch; Search; Metadata Catalog; Query: “Geologic and gravity data in Rocky Mountains”; Data Integration Cart ©; Plot map; Map]

21 Data Registration
[Diagram labels: Rock Classification Ontology (Igneous; Granite; Quartz monzonite); Spatial Ontology (Location; Latitude; Longitude; Point; Polygon); Gravity dataset (X, Y) with Metadata; Geologic dataset (Lat, Long, RockType) with Metadata; Item Registration (schema registration); Item Detail Registration]

22 Data Registration is Important!