PMIT-6103 Advanced Database Systems


PMIT-6103 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University

Schedule Continue from 16.01.2015-…… Every week Friday From 2:30 PM-4:30 PM NB: Schedule may change

Grading Policy Attendance =10% Exercise test =10% Instant test Assignment Presentation Class Test (Average of three) =20% Final Examination =60% Total =100%

Course Plan Introduction (Lecture 01) Overview of Relational DBMS (Lecture 02, 03) Distributed Database Design (Lecture 04) Overview of Query Processing (Lecture 05) Distributed Query Processing (Lecture 06) Distributed Transaction Management (Lecture 07) Distributed Concurrency Control (Lecture 08, 09) Reliability (Lecture 10, 11) Parallel Database Systems (Lecture 12, 13) Distributed Object DBMS (Lecture 14) Tutorial-1 Tutorial-2 Tutorial-3 Tutorial-4

Text Book Principles of Distributed Database Systems, M. Tamer Özsu & Patrick Valduriez

Exam schedule Tutorial Date and Time Tutorial-01 06th February 2015 20th March 2015 NB: Schedule may change

Lecture 01 Introduction to DDBMS

Outline Introduction Distributed Database System Applications Distributed DBMS Promises Problem Areas Architectural Models for Distributed DBMSs

Database Management [Figure: application programs access the data through the DBMS, which provides data description, data manipulation, and control.]

Integrating Database and Communication Technology [Figure: database technology (integration) combined with computer networks (distribution) yields distributed database systems (integration).]

Distributed Processing (Distributed Computing Systems) A number of autonomous processing elements that are interconnected by a computer network and that cooperate in performing their assigned tasks. A “processing element” is a computing device that can execute a program on its own.

What is being distributed? Processing logic: processing logic or processing elements are distributed Functions: Various functions of a computer system could be delegated to various pieces of hardware or software Data: Data used by a number of applications may be distributed to a number of processing sites Control: The control of the execution of various tasks might be distributed instead of being performed by one computer system.

What is a Distributed Database System? The term “distributed database system” (DDBS) refers jointly to the distributed database and the distributed DBMS. A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network. A distributed database management system (D–DBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.

What is a Distributed Database System? Physical distribution does not necessarily imply that the computer systems are geographically far apart; they may be in the same room. Communication between them takes place over a network rather than through shared memory or shared disk (as in multiprocessor systems), with the network as the only shared resource.

What is not a DDBS? A timesharing computer system. A loosely or tightly coupled multiprocessor system: it is not a DDBS because in a DDBS communication between computer systems takes place over a network rather than through shared memory or shared disk, with the network as the only shared resource. A database system that resides at one of the nodes of a network of computers: this is a centralized database on a network node.

Timesharing computer system The CPU time is shared by different processes Time slice is defined by the OS, for sharing CPU time between processes.

Shared-Memory Architecture (Tightly coupled) [Figure: processors up to Pn sharing memory M and disk D.] Not a DDBS

Shared-Nothing Architecture (Loosely coupled) Each processor node has its own primary and secondary memory and may also have its own peripherals. Such systems are quite similar to the distributed environment, but there are differences; the fundamental one is the mode of operation. Database systems that run over multiprocessor systems are called parallel database systems. [Figure: nodes 1 to n, each with its own processor (P1 ... Pn), memory (M1 ... Mn), and disk (D1 ... Dn).] Not a DDBS

Centralized DBMS on a Network [Figure: Sites 1 to 5 connected by a communication network.] Not a DDBS

Distributed DBMS Environment [Figure: Sites 1 to 5 connected by a communication network.]

Distributed DBMS - Reality [Figure: user queries and user applications at several sites, each site running DBMS software, all connected through a communication subsystem.]

Implicit Assumptions Data are stored at a number of sites; each site logically consists of a single processor. Processors at different sites are interconnected by a computer network; this excludes multiprocessors (parallel database systems). The distributed database is a database, not a collection of files; the data are logically related, as exhibited in the users’ access patterns (relational data model). The D-DBMS is a full-fledged DBMS, not a remote file system.

Applications Manufacturing - especially multi-plant manufacturing Military command and control Electronic fund transfers and electronic trading Corporate MIS Airline reservations Hotel chains Any organization which has a decentralized organization structure

Distributed DBMS Promises Transparent management of distributed, fragmented, and replicated data Improved reliability/availability through distributed transactions Improved performance Easier and more economical system expansion

Transparent management of distributed, fragmented, and replicated data Example: Four relations: EMP(ENO, ENAME, TITLE), PROJ(PNO, PNAME, BUDGET), SAL(TITLE, AMT), ASG(ENO, PNO, RESP, DUR). For a centralized DBMS, find the names and salaries of the employees who have worked on a project for more than 12 months:
SELECT ENAME, AMT
FROM EMP, ASG, SAL
WHERE ASG.DUR > 12 AND EMP.ENO = ASG.ENO AND SAL.TITLE = EMP.TITLE
Fully transparent access means that users can still pose the query exactly as written above, without paying any attention to the fragmentation, location, or replication of data, and let the system worry about resolving these issues.

Example

EMP
ENO  ENAME      TITLE
E1   J. Doe     Elect. Eng.
E2   M. Smith   Syst. Anal.
E3   A. Lee     Mech. Eng.
E4   J. Miller  Programmer
E5   B. Casey   Syst. Anal.
E6   L. Chu     Elect. Eng.
E7   R. Davis   Mech. Eng.
E8   J. Jones   Syst. Anal.

ASG
ENO  PNO  RESP        DUR
E1   P1   Manager     12
E2   P1   Analyst     24
E2   P2   Analyst     6
E3   P3   Consultant  10
E3   P4   Engineer    48
E4   P2   Programmer  18
E5   P2   Manager     24
E6   P4   Manager     48
E7   P3   Engineer    36
E7   P5   Engineer    23
E8   P3   Manager     40

PROJ
PNO  PNAME              BUDGET
P1   Instrumentation    150000
P2   Database Develop.  135000
P3   CAD/CAM            250000
P4   Maintenance        310000

SAL
TITLE        AMT
Elect. Eng.  40000
Syst. Anal.  34000
Mech. Eng.   27000
Programmer   24000
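To make the centralized case concrete, the query can be run against the sample relations. The following is a minimal sketch that assumes an in-memory SQLite database as a stand-in for the centralized DBMS; the function names are hypothetical, PROJ is left out because the query does not touch it, and the first ASG row is assumed to be (E1, P1), which does not affect the result since its duration is not above 12.

```python
import sqlite3

# Rows copied from the sample EMP, ASG and SAL relations.
EMP = [("E1", "J. Doe", "Elect. Eng."), ("E2", "M. Smith", "Syst. Anal."),
       ("E3", "A. Lee", "Mech. Eng."), ("E4", "J. Miller", "Programmer"),
       ("E5", "B. Casey", "Syst. Anal."), ("E6", "L. Chu", "Elect. Eng."),
       ("E7", "R. Davis", "Mech. Eng."), ("E8", "J. Jones", "Syst. Anal.")]
ASG = [("E1", "P1", "Manager", 12),  # ENO/PNO of this row are an assumption
       ("E2", "P1", "Analyst", 24), ("E2", "P2", "Analyst", 6),
       ("E3", "P3", "Consultant", 10), ("E3", "P4", "Engineer", 48),
       ("E4", "P2", "Programmer", 18), ("E5", "P2", "Manager", 24),
       ("E6", "P4", "Manager", 48), ("E7", "P3", "Engineer", 36),
       ("E7", "P5", "Engineer", 23), ("E8", "P3", "Manager", 40)]
SAL = [("Elect. Eng.", 40000), ("Syst. Anal.", 34000),
       ("Mech. Eng.", 27000), ("Programmer", 24000)]

# The query exactly as a user of the centralized DBMS would pose it.
QUERY = """
SELECT ENAME, AMT
FROM EMP, ASG, SAL
WHERE ASG.DUR > 12
  AND EMP.ENO = ASG.ENO
  AND SAL.TITLE = EMP.TITLE
"""


def load_centralized():
    """Create one database holding all relations (the centralized case)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE EMP (ENO TEXT, ENAME TEXT, TITLE TEXT)")
    conn.execute("CREATE TABLE ASG (ENO TEXT, PNO TEXT, RESP TEXT, DUR INTEGER)")
    conn.execute("CREATE TABLE SAL (TITLE TEXT, AMT INTEGER)")
    conn.executemany("INSERT INTO EMP VALUES (?, ?, ?)", EMP)
    conn.executemany("INSERT INTO ASG VALUES (?, ?, ?, ?)", ASG)
    conn.executemany("INSERT INTO SAL VALUES (?, ?)", SAL)
    return conn


def long_assignments(conn):
    """Return (ENAME, AMT) pairs, one per assignment longer than 12 months."""
    return sorted(conn.execute(QUERY).fetchall())


if __name__ == "__main__":
    for ename, amt in long_assignments(load_centralized()):
        print(ename, amt)
```

An employee with two qualifying assignments appears twice, because the query has no DISTINCT.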

Transparent management of distributed, fragmented, and replicated data We may want to localize data so that data about the employees in the Waterloo office are stored in Waterloo, those about the Boston office are stored in Boston, and so forth. The same applies to the project and salary information. That is, the data are distributed. To do this, we partition each of the relations and store each partition at a different site. This is known as fragmentation. Data that are commonly accessed by one user can be placed on that user’s local machine as well as on the machine of another user with the same access requirements. That is, the data are replicated.

Transparent Access Fully transparent access means that users can still write the query without paying any attention to the fragmentation, location, or replication of data, and let the system worry about resolving these issues. [Figure: sites Boston, Montreal, Paris, New York, and Tokyo connected by a communication network, with employee, project, and assignment fragments stored at the sites (for example, Paris employees, Boston projects, Montreal assignments, and New York projects with budget > 200000).]
SELECT ENAME, AMT
FROM EMP, ASG, SAL
WHERE DUR > 12 AND EMP.ENO = ASG.ENO AND SAL.TITLE = EMP.TITLE
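One way to picture transparent access is a global relation defined as a view over per-site fragments. The sketch below is only an illustration under loud assumptions: a single in-memory SQLite database stands in for the network of sites, the table names EMP_BOSTON and EMP_PARIS are hypothetical, and the assignment of employees to offices is invented for the example. A real distributed DBMS would resolve the fragments across the network.

```python
import sqlite3


def build_site_fragments():
    """Create two hypothetical EMP fragments and a global EMP view over them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Each table stands in for the fragment stored at one site.
        CREATE TABLE EMP_BOSTON (ENO TEXT, ENAME TEXT, TITLE TEXT);
        CREATE TABLE EMP_PARIS  (ENO TEXT, ENAME TEXT, TITLE TEXT);
        INSERT INTO EMP_BOSTON VALUES ('E1', 'J. Doe', 'Elect. Eng.');
        INSERT INTO EMP_BOSTON VALUES ('E2', 'M. Smith', 'Syst. Anal.');
        INSERT INTO EMP_PARIS  VALUES ('E3', 'A. Lee', 'Mech. Eng.');

        -- The user sees one relation named EMP; the view hides the fragments.
        CREATE VIEW EMP AS
            SELECT ENO, ENAME, TITLE FROM EMP_BOSTON
            UNION ALL
            SELECT ENO, ENAME, TITLE FROM EMP_PARIS;
    """)
    return conn


def employee_names(conn):
    """Query the global relation without naming any fragment or site."""
    return [row[0] for row in conn.execute("SELECT ENAME FROM EMP ORDER BY ENO")]


if __name__ == "__main__":
    print(employee_names(build_site_fragments()))
```

The query inside `employee_names` is the same text a user would write against a centralized EMP table, which is the point of the exercise.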

Transparent management of distributed, fragmented, and replicated data A transparent system “hides” the implementation details from users. The fundamental issue is to provide data independence in the distributed environment, through: network (distribution) transparency; replication transparency; fragmentation transparency (horizontal fragmentation: selection; vertical fragmentation: projection; hybrid).

Data independence It refers to the immunity of user applications to changes in the definition and organization of data. Logical data independence Logical data independence refers to the immunity of user applications to changes in the logical structure (i.e., schema) of the database. Physical data independence Deals with hiding the details of the storage structure from user applications.

Network Transparency In centralized database systems, the only available resource that needs to be shielded from the user is the data. In a distributed database environment there is a second resource that needs to be managed in much the same manner: the network. The user should be protected from the operational details of the network, possibly even hiding the existence of the network. Then there would be no difference between database applications that run on a centralized database and those that run on a distributed database. This type of transparency is referred to as network transparency or distribution transparency.

Network Transparency From a DBMS perspective, distribution transparency requires that users do not have to specify where data are located. Sometimes two types of distribution transparency are identified: location transparency and naming transparency.

Network Transparency Location transparency refers to the fact that the command used to perform a task is independent of both the location of the data and the system on which an operation is carried out. Naming transparency means that a unique name is provided for each object in the database. In the absence of naming transparency, users are required to embed the location name as part of the object name.
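The difference between the two cases can be sketched with a small name resolver. Everything here is hypothetical: the catalog contents, the site names, and the `name@site` notation are assumptions made for illustration, not the syntax of any particular DBMS.

```python
# Hypothetical global catalog: object name -> site where the object is stored.
CATALOG = {"EMP": "Boston", "PROJ": "Paris", "ASG": "Montreal"}


def resolve_transparent(name):
    """With naming transparency the user supplies only the unique object name;
    the system looks up the location in its catalog."""
    return CATALOG[name], name


def resolve_explicit(qualified_name):
    """Without naming transparency the user must embed the location in the
    object name, written here in an assumed 'name@site' form."""
    name, site = qualified_name.split("@")
    return site, name


if __name__ == "__main__":
    print(resolve_transparent("EMP"))
    print(resolve_explicit("EMP@Boston"))
```

If EMP later moves to another site, only the catalog entry changes in the first case, whereas every query that uses the qualified name has to be rewritten in the second.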

Replication Transparency Data are distributed in a replicated fashion across the machines on a network. If one of the machines fails, a copy of the data is still available on another machine on the network. Replication increases the reliability and availability of data, and it increases the locality of reference.

Replication Transparency When data are replicated, the transparency issue is this: users should not be aware of the existence of copies, and the system should handle the management of copies. Users should not be involved in handling copies or have to specify that a certain action can and/or should be taken on multiple copies.
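As a rough illustration of the system handling the copies, the toy class below applies each write to every copy and serves each read from any reachable copy. The class name, its methods, and the read-one/write-all policy are assumptions chosen for the sketch; they are one simple policy among several and not a description of how any particular D-DBMS manages replicas.

```python
class ReplicatedStore:
    """Toy manager for identical copies of the same data at several sites."""

    def __init__(self, site_names):
        self.copies = {site: {} for site in site_names}  # site -> local copy
        self.up = {site: True for site in site_names}    # site -> reachable?

    def fail(self, site):
        """Simulate a site failure or an unreachable site."""
        self.up[site] = False

    def write(self, key, value):
        # The user issues a single write; the system updates every copy.
        if not all(self.up.values()):
            raise RuntimeError("a replica is unreachable; write-all cannot complete")
        for copy in self.copies.values():
            copy[key] = value

    def read(self, key):
        # Any reachable copy will do; the user never names a site.
        for site, copy in self.copies.items():
            if self.up[site]:
                return copy[key]
        raise RuntimeError("no replica reachable")


if __name__ == "__main__":
    store = ReplicatedStore(["Boston", "Paris"])
    store.write("E1", "J. Doe")
    store.fail("Boston")
    print(store.read("E1"))  # served from the remaining copy
```

The write path refusing to proceed when a copy is unreachable hints at why updates under failures are harder than reads in a replicated setting.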

Fragmentation Transparency Fragmentation can increase performance, availability, and reliability. It can also reduce the negative effects of replication: each replica is not the full relation but only a subset of it, so less space is required and fewer data items need to be managed.

Fragmentation Transparency Horizontal fragmentation: a relation is partitioned into a set of sub-relations, each of which has a subset of the tuples (rows) of the original relation. Vertical fragmentation: each sub-relation is defined on a subset of the attributes (columns) of the original relation.
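Both kinds of fragmentation can be sketched on a few EMP rows: horizontal fragmentation as a selection, vertical fragmentation as a projection. The helper names and the choice of fragmentation predicate (engineers versus everyone else) are assumptions made for illustration, as is keeping the key ENO in both vertical fragments so that the relation can be rebuilt by a join.

```python
# The first four EMP tuples from the example, as dictionaries.
EMP = [
    {"ENO": "E1", "ENAME": "J. Doe", "TITLE": "Elect. Eng."},
    {"ENO": "E2", "ENAME": "M. Smith", "TITLE": "Syst. Anal."},
    {"ENO": "E3", "ENAME": "A. Lee", "TITLE": "Mech. Eng."},
    {"ENO": "E4", "ENAME": "J. Miller", "TITLE": "Programmer"},
]


def horizontal_fragment(relation, predicate):
    """Selection: keep whole tuples (rows) that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def vertical_fragment(relation, attributes):
    """Projection: keep only the named attributes (columns) of every tuple."""
    return [{a: row[a] for a in attributes} for row in relation]


def rejoin(left, right, key):
    """Rebuild tuples from two vertical fragments by joining on the key."""
    by_key = {row[key]: row for row in right}
    return [{**row, **by_key[row[key]]} for row in left]


def is_engineer(row):
    return row["TITLE"].endswith("Eng.")


# Horizontal fragments: disjoint subsets of the rows.
engineers = horizontal_fragment(EMP, is_engineer)
others = horizontal_fragment(EMP, lambda row: not is_engineer(row))

# Vertical fragments: subsets of the columns, both carrying the key ENO.
names = vertical_fragment(EMP, ["ENO", "ENAME"])
titles = vertical_fragment(EMP, ["ENO", "TITLE"])

if __name__ == "__main__":
    print([row["ENO"] for row in engineers], [row["ENO"] for row in others])
    print(rejoin(names, titles, "ENO") == EMP)
```

The union of the horizontal fragments and the join of the vertical fragments both give back the original relation, which is what lets the system hide the fragments from the user.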

Reliability Through Distributed Transactions Distributed DBMSs are intended to improve reliability since they have replicated components and thereby eliminate single points of failure. The failure of a single site, or the failure of a communication link which makes one or more sites unreachable, is not sufficient to bring down the entire system.

Improved Performance Data are stored in close proximity to their points of use (also called data localization), which requires some support for fragmentation and replication. This has two potential advantages: since each site handles only a portion of the database, contention for CPU and I/O services is not as severe as for centralized databases; and localization reduces the remote access delays that are usually involved in wide area networks.

System Expansion The issue is database scaling. One aspect of easier system expansion is economics: it normally costs much less to put together a system of “smaller” computers with the equivalent power of a single big machine.

Problem Areas First, data may be replicated in a distributed environment. A distributed database can be designed so that the entire database, or portions of it, reside at different sites of a computer network. Second, if some sites fail (e.g., through hardware or software malfunction), or if some communication links fail (making some of the sites unreachable) while an update is being executed, the effects will not be reflected on the data residing at the failing or unreachable sites. Third, since each site cannot have instantaneous information on the actions currently being carried out at the other sites, the synchronization of transactions on multiple sites is considerably harder than for a centralized system.

Architectural Models for Distributed DBMSs The possible ways in which a distributed DBMS may be architected are classified along three dimensions: (1) the autonomy of local systems, (2) their distribution, and (3) their heterogeneity.

Architectural Models for Distributed DBMSs Autonomy Autonomy refers to the distribution (or decentralization) of control, not of data. It indicates the degree to which individual DBMSs can operate independently. Autonomy is a function of a number of factors, such as whether the component systems (i.e., individual DBMSs) exchange information, whether they can independently execute transactions, and whether one is allowed to modify them.

Architectural Models for Distributed DBMSs Dimensions of Autonomy Design autonomy Individual DBMSs are free to use the data models and transaction management techniques that they prefer. Communication autonomy Each of the individual DBMSs is free to make its own decision as to what type of information it wants to provide to the other DBMSs or to the software that controls their global execution. Execution autonomy Each DBMS can execute the transactions that are submitted to it in any way that it wants to.

Architectural Models for Distributed DBMSs Distribution The distribution dimension of the taxonomy deals with data: the data are physically distributed over multiple sites, while the user sees them as one logical pool. There are a number of ways in which DBMSs have been distributed, which fall into two classes: client/server distribution and peer-to-peer distribution (or full distribution).

Architectural Models for Distributed DBMSs Client/server distribution The client/server distribution concentrates data management duties at servers while the clients focus on providing the application environment including the user interface. The communication duties are shared between the client machines and servers.

Architectural Models for Distributed DBMSs Peer-to-peer distribution (or full distribution). In peer-to-peer systems, there is no distinction of client machines versus servers. Each machine has full DBMS functionality and can communicate with other machines to execute queries and transactions.

Architectural Models for Distributed DBMSs Heterogeneity Heterogeneity may take various forms, ranging from hardware heterogeneity and differences in networking protocols to variations in data managers. Heterogeneity in query languages not only involves the use of completely different data access paradigms in different data models, but also covers differences in languages even when the individual systems use the same data model.

Thank You

Exercise What is the basic difference between database systems and distributed database systems? What is being distributed? Define loosely and tightly coupled multiprocessor systems. Draw the “Distributed DBMS - Reality” diagram. What do you mean by replicated data? What are the promises of a distributed DBMS?