Reference Book: Principles of Distributed Database Systems. Chapters: 4. Distributed DBMS Architecture; 5. Distributed Database Design; 7.5 Layers of Query Processing.


Preethi Vishwanath. Week 2: 5th September 2006 to 12th September 2006.

ANSI/SPARC Architecture
- External view: that of the user, who might be a programmer; concerned with how users view the data.
- Conceptual view: that of the enterprise.
- Internal view: that of the system or machine; deals with the physical definition and organization of data.
Layers, top to bottom: Users, External View, Conceptual View, Internal View.

Possible ways to put together multiple databases
Autonomy of local systems
- Refers to the distribution of control.
- Indicates the degree of independence of the individual databases.
Alternatives along the autonomy dimension
- Tight integration: a single image of the entire database is available to any user who wants to share information, which may reside in multiple databases.
- Semiautonomous systems: DBMSs that can operate independently but have decided to participate in a federation.
- Total isolation: stand-alone DBMSs.

Distribution
- Deals with the physical distribution of data over multiple sites.
- Three alternative architectures:
  - Client-server: communication duties are shared between the client machines and the servers.
  - Peer-to-peer: no distinction between client machines and servers.
  - Non-distributed systems.

Heterogeneity
- Occurs in various forms.
- Data models: data is represented with different modeling tools.
- Query languages: covers not only completely different data access paradigms in different data models, but also differences in languages even when the individual systems use the same data model.

Client-Server Architecture
Distinguishes the functionality to be provided and divides it into two classes: server functions and client functions.
Server does most of the data management work:
- Query processing
- Data management
- Optimization
- Transaction management, etc.
Client performs:
- Application
- User interface
- DBMS client module
Multiple client, single server
- A single server is accessed by multiple clients.
Multiple client, multiple server
- Multiple servers are accessed by multiple clients.
- Two alternative management strategies:
  1. Heavy client systems
     - Each client manages its own connection to the appropriate server.
     - Simplifies server code.
     - Loads client machines with additional responsibilities.
  2. Light client systems
     - Each client knows only its "home server", which then communicates with other servers as required.
     - Concentrates the data management functionality at the servers.
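The difference between the two management strategies can be sketched in a few lines. This is only an illustration under stated assumptions: the `CATALOG` mapping, the class names, and the sample rows are hypothetical and do not come from the reference book.

```python
# Hypothetical catalog mapping each relation to the server that stores it.
CATALOG = {"EMP": "server_a", "PAY": "server_b"}


class Server:
    """A toy server holding some relations and knowing its peer servers."""

    def __init__(self, name, relations):
        self.name = name
        self.relations = relations  # relation name -> list of rows
        self.peers = {}             # server name -> Server

    def query(self, relation):
        # Answer locally if possible, otherwise forward to the owning peer.
        if relation in self.relations:
            return self.relations[relation]
        return self.peers[CATALOG[relation]].query(relation)


class HeavyClient:
    """Manages its own connection to the appropriate server."""

    def __init__(self, servers):
        self.servers = servers  # server name -> Server

    def query(self, relation):
        return self.servers[CATALOG[relation]].query(relation)


class LightClient:
    """Knows only its home server, which contacts other servers as needed."""

    def __init__(self, home_server):
        self.home_server = home_server

    def query(self, relation):
        return self.home_server.query(relation)


server_a = Server("server_a", {"EMP": [(100, "A"), (200, "B")]})
server_b = Server("server_b", {"PAY": [("D1", "10K"), ("D2", "20K")]})
server_a.peers["server_b"] = server_b
server_b.peers["server_a"] = server_a

heavy = HeavyClient({"server_a": server_a, "server_b": server_b})
light = LightClient(home_server=server_a)

# Both strategies return the same rows; they differ in who does the routing.
print(heavy.query("PAY") == light.query("PAY"))
```

In the heavy client the routing knowledge (the catalog lookup) lives on the client side; in the light client it lives in `Server.query`, which is why the server code is the more complex one there.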

Peer-to-Peer Distributed Systems
Schemas present
- Local internal schema: the individual internal schema definition at each site.
- Global conceptual schema (GCS): describes the enterprise view of the data.
- Local conceptual schema: describes the local organization of data at each site.
- External schemas: support user applications and user access to the database.
Local conceptual schemas are mappings of the global schema onto each site. Databases are typically designed in a top-down fashion, and therefore all external view definitions are made globally.
Major components of a peer-to-peer system
- User processor
- Data processor

Peer-to-Peer Distributed Systems: Components
User processor
- User interface handler: interprets user commands and formats the result data.
- Semantic data controller: checks whether the user query can be processed.
- Global query optimizer and decomposer: determines an execution strategy and translates global queries into local ones.
- Distributed execution monitor: coordinates the distributed execution of the user request.
Data processor
- Local query optimizer: acts as the access path selector, responsible for choosing the best access path.
- Local recovery manager: makes sure the local database remains consistent.
- Run-time support processor: the interface to the operating system; contains the database buffer manager, which maintains the main memory buffers and manages data access.

MDBS Architecture
Models using a global conceptual schema (GCS)
- The GCS is defined by integrating either the external schemas of the local autonomous databases or parts of their local conceptual schemas.
- Users of a local DBMS define their own views on the local database.
- If heterogeneity exists in the system, two implementation alternatives exist: unilingual and multilingual.
  - A unilingual multi-DBMS requires users to utilize possibly different data models and languages when both a local database and the global database are accessed.
  - The basic philosophy of the multilingual architecture is to permit each user to access the global database by means of an external schema defined in the language of the user's local DBMS.
- GCS in a multi-DBMS:
  - Mapping is from the local conceptual schemas to the global schema.
  - Bottom-up design.
- GCS in a logically integrated distributed DBMS:
  - Mapping is from the global schema to the local conceptual schemas.
  - Top-down procedure.
Models without a global conceptual schema
- Consist of two layers: the local system layer and the multidatabase layer.
- The local system layer presents to the multidatabase layer the part of each local database that it is willing to share with users of other databases.
- System views are constructed above this layer.
- Responsibility for providing access to multiple databases is delegated to the mapping between the external schemas and the local conceptual schemas.
- Full-fledged DBMSs exist, each of which manages a different database.

Global Directory Issues
- Relevant for a distributed DBMS or a multi-DBMS that uses a global conceptual schema.
- The global directory is an extension of the normal directory: it includes information about the location of the fragments as well as the makeup of the fragments.
- The directory is itself a database that contains metadata about the actual data stored in the database.
- Three issues:
  - Type: a directory may be either global to the entire database or local to each site.
  - Location: the directory may be maintained centrally at one site or distributed over a number of sites.
    - If the system is distributed, the directory is always distributed.
  - Replication: there may be a single copy or multiple copies.
    - Multiple copies provide more reliability.
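As a rough illustration of what such a directory records, the sketch below stores the makeup (defining predicate) and location (sites) of each fragment. The fragment names, site names, and helper functions are hypothetical, chosen only for the example.

```python
# Hypothetical global directory: fragment name -> makeup and location(s).
GLOBAL_DIRECTORY = {
    "EMP_1": {
        "relation": "EMP",
        "predicate": "Sal <= 20K",          # makeup of the fragment
        "sites": ["site_1", "site_3"],      # location: stored at two sites
    },
    "EMP_2": {
        "relation": "EMP",
        "predicate": "Sal > 20K",
        "sites": ["site_2"],
    },
}


def sites_for_relation(directory, relation):
    """Return every site holding at least one fragment of the relation."""
    sites = set()
    for entry in directory.values():
        if entry["relation"] == relation:
            sites.update(entry["sites"])
    return sorted(sites)


def is_replicated(directory, fragment):
    """A fragment is replicated when it is stored at more than one site."""
    return len(directory[fragment]["sites"]) > 1


print(sites_for_relation(GLOBAL_DIRECTORY, "EMP"))
print(is_replicated(GLOBAL_DIRECTORY, "EMP_1"))
```

Because the directory is itself data, the same questions of type, location, and replication apply to where this dictionary would live in a real system.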

Organization of Distributed Systems
Three orthogonal dimensions
- Level of sharing
  - No sharing: each application and its data execute at one site.
  - Data sharing: all the programs are replicated at the other sites, but the data is not.
  - Data-plus-program sharing: both data and programs can be shared.
- Behavior of access patterns
  - Static: does not change over time; very easy to manage.
  - Dynamic: most real-life applications are dynamic.
- Level of knowledge on access pattern behavior
  - No information.
  - Complete information: access patterns can be reasonably predicted, with no deviations from the predictions.
  - Partial information: there are deviations from the predictions.

Top-Down Design
- Suitable for applications where the database needs to be built from scratch.
- Activity begins with requirements analysis.
- The requirements document is input to two parallel activities:
  - View design: deals with defining the interfaces for end users.
  - Conceptual design: the process by which the enterprise is examined. It can be further divided into two related activity groups:
    - Entity analysis: concerned with determining the entities, their attributes, and the relationships between them.
    - Functional analysis: concerned with determining the fun…
- The distributed design activity consists of two steps:
  - Fragmentation
  - Allocation
Bottom-Up Approach
- Suitable for applications where the databases already exist.
- Starting point is the individual conceptual schemas.
- Exists primarily in the context of heterogeneous databases.

Fragmentation
Advantages
1. Permits a number of transactions to execute concurrently.
2. Results in parallel execution of a single query.
3. Increases the level of concurrency, also referred to as intraquery concurrency.
4. Increases system throughput.
Disadvantages
1. Applications whose views are defined on more than one fragment may suffer performance degradation if the applications have conflicting requirements.
2. Simple tasks such as checking for dependencies would result in chasing after data in a number of sites.

Horizontal Fragmentation: rows split (on Sal > 20K)
Vertical Fragmentation: columns split (primary key retained)

Original relation
Id   Name  Sal  Dept
100  A     10K  D1
200  B     20K  D2
300  C     30K  D3

Horizontal fragments
Id   Name  Sal  Dept
100  A     10K  D1
200  B     20K  D2

Id   Name  Sal  Dept
300  C     30K  D3

Vertical fragments
Id   Name
100  A
200  B
300  C

Id   Sal  Dept
100  10K  D1
200  20K  D2
300  30K  D3
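The two splits can be reproduced on the example relation with a selection (rows) and a projection (columns). This is a minimal sketch: the helper names are assumptions made for illustration, and salaries are written as plain integers.

```python
# The example relation, with salaries as integers (10K -> 10_000).
EMP = [
    {"Id": 100, "Name": "A", "Sal": 10_000, "Dept": "D1"},
    {"Id": 200, "Name": "B", "Sal": 20_000, "Dept": "D2"},
    {"Id": 300, "Name": "C", "Sal": 30_000, "Dept": "D3"},
]


def horizontal_fragment(relation, predicate):
    """Keep whole rows that satisfy the predicate (a selection)."""
    return [row for row in relation if predicate(row)]


def vertical_fragment(relation, attributes):
    """Keep only the listed columns of every row (a projection)."""
    return [{a: row[a] for a in attributes} for row in relation]


# Horizontal split on Sal > 20K.
emp_low = horizontal_fragment(EMP, lambda r: r["Sal"] <= 20_000)
emp_high = horizontal_fragment(EMP, lambda r: r["Sal"] > 20_000)

# Vertical split; the primary key Id is retained in both fragments.
emp_names = vertical_fragment(EMP, ["Id", "Name"])
emp_pay = vertical_fragment(EMP, ["Id", "Sal", "Dept"])

print([r["Id"] for r in emp_low], [r["Id"] for r in emp_high])
print(emp_names[0], emp_pay[0])
```

Keeping `Id` in both vertical fragments is what later allows the original rows to be rebuilt by joining on the key.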

Correctness Rules of Fragmentation
Completeness
- If a relation instance R is decomposed into fragments R_1, R_2, ..., R_n, each data item that can be found in R can also be found in one or more of the R_i.
Reconstruction
- If a relation R is decomposed into fragments F_R = {R_1, R_2, ..., R_n}, it should be possible to define a relational operator ∇ such that
    R = ∇ R_i, ∀ R_i ∈ F_R
- The operator ∇ is different for the different forms of fragmentation.
Disjointness
- If a relation R is horizontally decomposed into fragments R_1, R_2, ..., R_n, and data item d_i is in R_j, then it is not in any other fragment R_k (k ≠ j).
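The three rules can be checked mechanically on a small relation. The sketch below assumes rows are Python dicts, takes union as the reconstruction operator for horizontal fragments and a join on the key for vertical fragments, and treats a "data item" as a whole row for the completeness and disjointness checks. All function names are hypothetical.

```python
EMP = [
    {"Id": 100, "Name": "A", "Sal": 10_000, "Dept": "D1"},
    {"Id": 200, "Name": "B", "Sal": 20_000, "Dept": "D2"},
    {"Id": 300, "Name": "C", "Sal": 30_000, "Dept": "D3"},
]


def as_set(relation):
    """Turn an iterable of row dicts into a set of hashable rows."""
    return {tuple(sorted(row.items())) for row in relation}


def is_complete(relation, fragments):
    """Completeness: every row of R appears in at least one fragment."""
    found = set().union(*(as_set(f) for f in fragments))
    return as_set(relation) <= found


def is_disjoint(fragments):
    """Disjointness: no row appears in more than one fragment."""
    total = sum(len(as_set(f)) for f in fragments)
    return total == len(set().union(*(as_set(f) for f in fragments)))


def reconstruct_horizontal(fragments):
    """Reconstruction of horizontal fragments: the operator is union."""
    return set().union(*(as_set(f) for f in fragments))


def reconstruct_vertical(left, right, key):
    """Reconstruction of vertical fragments: the operator is a join on the key."""
    right_by_key = {row[key]: row for row in right}
    return as_set(
        {**l, **right_by_key[l[key]]} for l in left if l[key] in right_by_key
    )


horizontal = [
    [r for r in EMP if r["Sal"] <= 20_000],
    [r for r in EMP if r["Sal"] > 20_000],
]
names = [{"Id": r["Id"], "Name": r["Name"]} for r in EMP]
pay = [{"Id": r["Id"], "Sal": r["Sal"], "Dept": r["Dept"]} for r in EMP]

print(is_complete(EMP, horizontal), is_disjoint(horizontal))
print(reconstruct_horizontal(horizontal) == as_set(EMP))
print(reconstruct_vertical(names, pay, "Id") == as_set(EMP))
```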

Comparison of Replication Alternatives

                      Full replication      Partial replication   Partitioning
Query processing      Easy                  Same difficulty       Same difficulty
Directory management  Easy or nonexistent   Same difficulty       Same difficulty
Concurrency control   Moderate              Difficult             Easy
Reliability           Very high             High                  Low
Reality               Possible application  Realistic             Possible application

Derived Horizontal Fragmentation
- Defined on a member relation of a link according to a selection operation specified on its owner.
- The link between the owner and the member relations is defined as an equi-join.
- An equi-join can be implemented by means of semijoins.
- Given a link L where owner(L) = S and member(L) = R, the derived horizontal fragments of R are defined as
    R_i = R ⋉ S_i, 1 ≤ i ≤ w
  where
    S_i = σ_{F_i}(S)
  w is the maximum number of fragments that will be defined on R, and F_i is the formula by which the primary horizontal fragment S_i is defined.
Example
Consider two tables, EMP (member) and PAY (owner), linked on Dept.

EMP
Id   Name  Dept
100  A     D1
200  B     D2
300  C     D3

PAY
Dept  Sal
D1    10K
D2    20K
D3    30K

Only PAY carries Sal, so the selection is applied to PAY and the fragments of EMP are derived from it:
    PAY_1 = σ_{Sal ≤ 20K}(PAY)
    PAY_2 = σ_{Sal > 20K}(PAY)
    EMP_1 = EMP ⋉ PAY_1
    EMP_2 = EMP ⋉ PAY_2

EMP_1
Id   Name  Dept
100  A     D1
200  B     D2

EMP_2
Id   Name  Dept
300  C     D3
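The semijoin definition R_i = R ⋉ S_i can be tried out on the EMP (Id, Name, Dept) and PAY (Dept, Sal) tables of the example. The sketch assumes PAY is the owner, since it is the table that carries Sal, and EMP is the member; the helper names `select` and `semijoin` are hypothetical.

```python
EMP = [
    {"Id": 100, "Name": "A", "Dept": "D1"},
    {"Id": 200, "Name": "B", "Dept": "D2"},
    {"Id": 300, "Name": "C", "Dept": "D3"},
]
PAY = [
    {"Dept": "D1", "Sal": 10_000},
    {"Dept": "D2", "Sal": 20_000},
    {"Dept": "D3", "Sal": 30_000},
]


def select(relation, predicate):
    """Primary horizontal fragmentation of the owner: a plain selection."""
    return [row for row in relation if predicate(row)]


def semijoin(member, owner_fragment, attr):
    """Rows of member that match some row of owner_fragment on attr."""
    keys = {row[attr] for row in owner_fragment}
    return [row for row in member if row[attr] in keys]


# Fragment the owner first, then derive the member's fragments from it.
pay_1 = select(PAY, lambda r: r["Sal"] <= 20_000)
pay_2 = select(PAY, lambda r: r["Sal"] > 20_000)
emp_1 = semijoin(EMP, pay_1, "Dept")
emp_2 = semijoin(EMP, pay_2, "Dept")

print([r["Id"] for r in emp_1], [r["Id"] for r in emp_2])
```

Note that the member rows come back unchanged: a semijoin only filters EMP and never adds PAY's columns to it.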

Primary Horizontal Fragmentation
- Defined by a selection operation on the owner relations of a database schema.
- Given a relation R, its horizontal fragments are given by
    R_i = σ_{F_i}(R), 1 ≤ i ≤ w
  where F_i is the selection formula used to obtain fragment R_i.
- The example mentioned in slide 20 can be represented using the above formula as
    Emp_1 = σ_{Sal ≤ 20K}(Emp)
    Emp_2 = σ_{Sal > 20K}(Emp)
Vertical Fragmentation
Grouping
- Starts by assigning each attribute to one fragment.
- At each step, joins some of the fragments until some criterion is satisfied.
- Results in overlapping fragments.
Splitting
- Starts with a relation and decides on a beneficial partitioning based on the access behavior of applications to the attributes.
- Fits more naturally within the top-down design.
- Generates non-overlapping fragments.

Hybrid Fragmentation
- A simple horizontal or vertical fragmentation of a database schema is often not sufficient to satisfy the requirements of user applications.
- In certain cases, a vertical fragmentation may be followed by a horizontal one, or vice versa.
- Since two types of partitioning strategies are applied one after the other, this alternative is called hybrid fragmentation.

Fragmentation tree:
    R
      R_1
        R_11
        R_12
      R_2
        R_21
        R_22
        R_23

- In the case of horizontal fragmentation, one has to stop when each fragment consists of only one tuple, whereas the termination point for vertical fragmentation is one attribute per fragment.
- The example discussed in slides 20 and 26 can be converted into a hybrid fragmentation. Reconstruction applies a join within each branch and a union across branches:
    R = (R_11 ⋈ R_12) ∪ (R_21 ⋈ R_22 ⋈ R_23)
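A hybrid fragmentation and its reconstruction can be sketched by fragmenting horizontally first and then vertically, and rebuilding with a join inside each branch followed by a union across branches. The example below assumes the Id, Name, Sal, Dept relation used earlier and splits each horizontal fragment into only two vertical fragments (the tree above has three under R_2); all helper names are hypothetical.

```python
EMP = [
    {"Id": 100, "Name": "A", "Sal": 10_000, "Dept": "D1"},
    {"Id": 200, "Name": "B", "Sal": 20_000, "Dept": "D2"},
    {"Id": 300, "Name": "C", "Sal": 30_000, "Dept": "D3"},
]


def select(relation, predicate):
    """Horizontal step: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Vertical step: keep only the listed columns."""
    return [{a: row[a] for a in attributes} for row in relation]


def join(left, right, key):
    """Join two vertical fragments on their shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**l, **right_by_key[l[key]]} for l in left if l[key] in right_by_key]


# Horizontal fragmentation first (R_1, R_2) ...
r1 = select(EMP, lambda r: r["Sal"] <= 20_000)
r2 = select(EMP, lambda r: r["Sal"] > 20_000)

# ... then vertical fragmentation of each horizontal fragment.
r11, r12 = project(r1, ["Id", "Name"]), project(r1, ["Id", "Sal", "Dept"])
r21, r22 = project(r2, ["Id", "Name"]), project(r2, ["Id", "Sal", "Dept"])

# Reconstruction: join within each branch, then union across branches.
# The horizontal fragments are disjoint, so list concatenation acts as union.
rebuilt = join(r11, r12, "Id") + join(r21, r22, "Id")

print(sorted(rebuilt, key=lambda r: r["Id"]) == EMP)
```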