# Distributed Database Management System Lecture 8.

## Presentation on theme: "Distributed Database Management System Lecture 8."— Presentation transcript:

Distributed Database Management System Lecture 8

BZUPAGES.COM

**Semijoin.** The semijoin of R with S is the subset of tuples of R that participate in the join of R with S. If A and B are the attribute sets of R and S:

$$R \ltimes_F S = \Pi_A(R \bowtie_F S) = R \bowtie_F \Pi_{A \cap B}(S)$$
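A minimal Python sketch of the semijoin definition, assuming each relation is a list of dicts and the join predicate F is an ordinary Python function. The EMP/DEP relations and their values are hypothetical. The sketch also computes $\Pi_A(R \bowtie_F S)$ directly so the two forms can be compared.

```python
def semijoin(r, s, cond):
    """R ⋉_F S: keep each tuple of R that joins with at least one tuple of S under F."""
    return [t for t in r if any(cond(t, u) for u in s)]


def join_then_project(r, s, cond):
    """Π_A(R ⋈_F S), where A is the attribute set of R, for comparison."""
    result = []
    for t in r:
        for u in s:
            if cond(t, u) and t not in result:  # projection removes duplicates
                result.append(t)
    return result


# Hypothetical example relations (not from the slides)
EMP = [
    {"eNo": "E1", "dNo": 1},
    {"eNo": "E2", "dNo": 2},
    {"eNo": "E3", "dNo": 3},
]
DEP = [
    {"dNo": 1, "dName": "Sales"},
    {"dNo": 3, "dName": "HR"},
]


def same_dept(e, d):
    """The join predicate F: EMP.dNo = DEP.dNo."""
    return e["dNo"] == d["dNo"]


print([t["eNo"] for t in semijoin(EMP, DEP, same_dept)])  # ['E1', 'E3']
```

Only attributes of R appear in the result; S is used solely to decide which R tuples survive.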

**Division.** The division of a relation R of degree r by a relation S of degree s is the set of $(r-s)$-tuples t such that, for all s-tuples u in S, the tuple tu is in R. With A denoting the attributes of R that are not in S:

$$R \div S = \Pi_A(R) - \Pi_A\big((\Pi_A(R) \times S) - R\big)$$

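Example: find the employees who work on all projects with a budget of more than 10M.

R:

| eNo | pNo | pName | budget |
|-----|-----|-------|--------|
| E1 | P1 | Bridge | 11.5m |
| E2 | P1 | Bridge | 11.5m |
| E1 | P3 | Tower | 10.2m |
| E4 | P2 | Mosque | 9.1m |

S (projects with a budget of more than 10M):

| pNo | pName | budget |
|-----|-------|--------|
| P1 | Bridge | 11.5m |
| P3 | Tower | 10.2m |

R ÷ S = {E1}, since only E1 appears in R with both P1 and P3.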
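A minimal Python sketch that evaluates the division formula $\Pi_A(R) - \Pi_A((\Pi_A(R) \times S) - R)$ step by step on the employee/project data, assuming relations are lists of dicts and the caller supplies the attribute lists A and S (the function names `project` and `divide` are illustrative).

```python
from itertools import product


def project(rel, attrs):
    """Π_attrs(rel) as a set of tuples (duplicates vanish automatically)."""
    return {tuple(row[a] for a in attrs) for row in rel}


def divide(r, s, a_attrs, s_attrs):
    """R ÷ S computed as Π_A(R) − Π_A((Π_A(R) × S) − R)."""
    pa_r = project(r, a_attrs)                          # Π_A(R)
    s_rows = project(s, s_attrs)                        # S
    r_rows = project(r, a_attrs + s_attrs)              # R, columns ordered A then S
    combos = {t + u for t, u in product(pa_r, s_rows)}  # Π_A(R) × S
    missing = combos - r_rows                           # (Π_A(R) × S) − R
    disqualified = {m[:len(a_attrs)] for m in missing}  # Π_A(...)
    return pa_r - disqualified


R = [
    {"eNo": "E1", "pNo": "P1", "pName": "Bridge", "budget": "11.5m"},
    {"eNo": "E2", "pNo": "P1", "pName": "Bridge", "budget": "11.5m"},
    {"eNo": "E1", "pNo": "P3", "pName": "Tower", "budget": "10.2m"},
    {"eNo": "E4", "pNo": "P2", "pName": "Mosque", "budget": "9.1m"},
]
S = [
    {"pNo": "P1", "pName": "Bridge", "budget": "11.5m"},
    {"pNo": "P3", "pName": "Tower", "budget": "10.2m"},
]

print(divide(R, S, ["eNo"], ["pNo", "pName", "budget"]))  # {('E1',)}
```

E2 and E4 are disqualified because the combinations (E2, P3), (E4, P1) and (E4, P3) are missing from R.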

### Relational Calculus

Rather than saying how to obtain the result, we say what the result is by specifying the relationship between data.

**Tuple relational calculus.** It is based on first-order predicate logic. A query is expressed as

$$\{\, t \mid F(t) \,\}$$

where t is a tuple variable and F is a well-formed formula (wff). It reads: find the set of all tuples t such that F(t) is true, where F is the predicate condition.

**Atomic formulas**

- Tuple-variable membership expressions, specified as R(t) or R.t.
- Conditions of the form s[A] θ t[B] or s[A] θ c, where s and t are tuple variables, A and B are attributes, c is a constant, and θ is a comparison operator (=, ≠, <, ≤, >, ≥).

**SQL** is a language based on tuple-oriented calculus.

**Example:** `SELECT EMP.eName, DEP.dName FROM EMP, DEP WHERE EMP.dNo = DEP.dNo`
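A minimal sketch showing the tuple-calculus reading of the SQL example. It runs the query in an in-memory SQLite database through Python's standard `sqlite3` module, then evaluates the equivalent formula with a set comprehension in which `e` and `d` play the role of tuple variables. The table rows are hypothetical; only the table and column names come from the query.

```python
import sqlite3

# Hypothetical sample rows; only the table and column names come from the query.
emp_rows = [("Ali", 1), ("Sara", 2), ("Omar", 3)]
dep_rows = [(1, "Sales"), (2, "HR")]

# 1) Run the SQL query itself in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (eName TEXT, dNo INTEGER)")
conn.execute("CREATE TABLE DEP (dNo INTEGER, dName TEXT)")
conn.executemany("INSERT INTO EMP VALUES (?, ?)", emp_rows)
conn.executemany("INSERT INTO DEP VALUES (?, ?)", dep_rows)
sql_result = set(conn.execute(
    "SELECT EMP.eName, DEP.dName FROM EMP, DEP WHERE EMP.dNo = DEP.dNo"
).fetchall())
conn.close()

# 2) The same query read as a tuple calculus formula:
#    { t | ∃e ∃d (EMP(e) ∧ DEP(d) ∧ e[dNo] = d[dNo]
#                  ∧ t[eName] = e[eName] ∧ t[dName] = d[dName]) }
EMP = [{"eName": n, "dNo": d} for n, d in emp_rows]
DEP = [{"dNo": d, "dName": n} for d, n in dep_rows]
trc_result = {
    (e["eName"], d["dName"])
    for e in EMP            # tuple variable e ranges over EMP
    for d in DEP            # tuple variable d ranges over DEP
    if e["dNo"] == d["dNo"]
}

print(sorted(trc_result))        # [('Ali', 'Sales'), ('Sara', 'HR')]
print(sql_result == trc_result)  # True
```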

**Domain relational calculus.** Domain variables range over the values in attribute domains rather than over whole tuples; a tuple is specified through these domain variables.

A query in DRC has the form

$$\{\, x_1, x_2, \ldots, x_n \mid F(x_1, \ldots, x_n) \,\}$$

where F is a wff and the $x_i$ are its free variables.

Implementation: QBE. Query by Example (QBE) is a method of creating database queries using examples based on a text string, the name of a document, or a list of documents. The QBE system converts the user input into a formal database query. This approach lets the user perform powerful searches without having to learn a more formalized query mechanism such as Structured Query Language (SQL).
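A minimal sketch of a DRC query in Python, reusing the employee/project relation R from the division example with budgets written as numbers in millions (an assumption for the comparison). Unpacking each tuple into `e, p, n, b` mimics domain variables: `e` is free, while `p`, `n` and `b` are existentially quantified.

```python
# R(eNo, pNo, pName, budget in millions): the employee/project data of the division example
R = {
    ("E1", "P1", "Bridge", 11.5),
    ("E2", "P1", "Bridge", 11.5),
    ("E1", "P3", "Tower", 10.2),
    ("E4", "P2", "Mosque", 9.1),
}

# DRC: { e | ∃p ∃n ∃b ( R(e, p, n, b) ∧ b > 10 ) }
# e is the free domain variable; p, n, b are existentially quantified.
result = {e for (e, p, n, b) in R if b > 10}
print(sorted(result))  # ['E1', 'E2']
```

This asks for employees on *some* project above 10M, which is weaker than the division query (employees on *all* such projects).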

### Interface with Programming Languages

- **Tightly coupled:** the programming language and the database language are merged.
- **Loosely coupled:** the programming language is extended with special database concepts.

### Computer Networks

A computer network is a system for communication between two or more computers.

The computers are:

- interconnected
- autonomous

Networking involves both hardware components and software components.

Computers on a network are called nodes, sites, or hosts. In general, *node* or *host* refers to the hardware, while *site* means hardware plus software. Other equipment, such as printers and disks, may also be attached at the nodes. Equipment is connected via links and channels: a link is a physical connection, whereas a channel is a logical one.

### Data Communication

Communication links carry data in the form of digital or analog signals. Each channel has a certain capacity, that is, the amount of data it can transmit per unit of time. This capacity is referred to as bandwidth.
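Bandwidth directly sets how long it takes to put a message on a channel. A minimal sketch, assuming transmission time is just size divided by bandwidth (it ignores propagation delay and protocol overhead); the function name is illustrative.

```python
def transmission_time(size_bits: int, bandwidth_bps: float) -> float:
    """Seconds needed to push size_bits onto a channel of the given bandwidth."""
    return size_bits / bandwidth_bps


# 1 MB (8,000,000 bits) over a 100 Mbps channel
print(transmission_time(8_000_000, 100_000_000))  # 0.08
```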

Data transmitted over analog links has to be modulated, which is done by changing one of the three basic properties of a carrier signal: amplitude, frequency, or phase. At the receiving end the signal has to be demodulated. A modem (modulator/demodulator) is the device that performs this task.
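The slide does not name a particular scheme, so the sketch below picks amplitude shift keying (varying the amplitude) as one concrete example. The carrier frequency, samples per bit, the two amplitudes and the energy threshold are all hypothetical values chosen for illustration.

```python
import math

CARRIER_CYCLES_PER_BIT = 4  # hypothetical: 4 carrier cycles per bit period
SAMPLES_PER_BIT = 32


def modulate_ask(bits):
    """Amplitude shift keying: 1 sends the carrier at full amplitude, 0 at low amplitude."""
    samples = []
    for bit in bits:
        amplitude = 1.0 if bit else 0.2
        for k in range(SAMPLES_PER_BIT):
            t = k / SAMPLES_PER_BIT
            samples.append(amplitude * math.sin(2 * math.pi * CARRIER_CYCLES_PER_BIT * t))
    return samples


def demodulate_ask(samples):
    """Recover bits by measuring the average energy of each bit period."""
    bits = []
    for i in range(0, len(samples), SAMPLES_PER_BIT):
        chunk = samples[i:i + SAMPLES_PER_BIT]
        energy = sum(x * x for x in chunk) / len(chunk)  # about A²/2
        bits.append(1 if energy > 0.25 else 0)  # threshold between 0.5 and 0.02
    return bits


print(demodulate_ask(modulate_ask([1, 0, 1, 1, 0])))  # [1, 0, 1, 1, 0]
```

Frequency or phase shift keying would change the other two carrier properties instead; the modulate/demodulate pairing stays the same.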

Multiplexing is the technique that allows multiple signals to be transmitted over the same line simultaneously. There are two types: frequency-division multiplexing (FDM) and time-division multiplexing (TDM).
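FDM gives each signal its own frequency band, while TDM gives each signal its own time slot. A minimal sketch of synchronous TDM, assuming every frame carries one fixed slot per input stream and an idle marker fills slots whose source has nothing to send; names are illustrative.

```python
from itertools import zip_longest

IDLE = None  # filler for a slot whose source has nothing to send


def tdm_multiplex(streams):
    """Synchronous TDM: each frame carries one slot per input stream, in fixed order."""
    return [list(frame) for frame in zip_longest(*streams, fillvalue=IDLE)]


def tdm_demultiplex(frames, n_streams):
    """The receiver knows slot i of every frame belongs to stream i."""
    out = [[] for _ in range(n_streams)]
    for frame in frames:
        for i, unit in enumerate(frame):
            if unit is not IDLE:
                out[i].append(unit)
    return out


streams = [["a1", "a2", "a3"], ["b1", "b2"], ["c1", "c2", "c3"]]
frames = tdm_multiplex(streams)
print(frames)  # [['a1', 'b1', 'c1'], ['a2', 'b2', 'c2'], ['a3', None, 'c3']]
```

The idle slot in the last frame shows the cost of fixed slots: capacity is reserved even when a source is silent.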

**Mode of operation**

- Simplex: the link operates in only one direction, e.g., a link to a printer.
- Half duplex: the link can transmit in both directions, but not simultaneously; it has to be "turned around".
- Full duplex: transmission in both directions simultaneously.

**Performance of a communication system** depends on:

- bandwidth
- mode of operation
- the software employed, e.g., redundancies within the message and headers and trailers added to it

Message format: header block | text | error check. The header block contains the source address, destination address, message number, packet number, acknowledgement, and control information.
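A minimal sketch of that message layout as a Python dataclass. The field set follows the slide; the field types, the text encoding of the header, and the choice of CRC-32 (via the standard `zlib.crc32`) as the error check are assumptions for illustration.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Message:
    # Header block (field set from the slide; types are assumptions)
    source: str
    destination: str
    message_no: int
    packet_no: int
    ack: int
    control: str
    # Text (payload)
    text: bytes
    # Error check, computed over header + text
    error_check: int = 0

    def _covered_bytes(self) -> bytes:
        header = (f"{self.source}|{self.destination}|{self.message_no}|"
                  f"{self.packet_no}|{self.ack}|{self.control}")
        return header.encode() + b"|" + self.text

    def seal(self) -> "Message":
        """Sender fills in the error check before transmitting."""
        self.error_check = zlib.crc32(self._covered_bytes())
        return self

    def is_intact(self) -> bool:
        """Receiver recomputes the check and compares."""
        return self.error_check == zlib.crc32(self._covered_bytes())


msg = Message("S1", "S3", message_no=7, packet_no=1, ack=0,
              control="DATA", text=b"hello").seal()
print(msg.is_intact())  # True
msg.text = b"hellp"     # simulate a transmission error
print(msg.is_intact())  # False
```

Everything except `text` is overhead that travels with every packet, which is why headers and trailers affect performance.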

### Types of Networks

**Classification criteria**

- interconnection structure (topology)
- transmission mode
- geographical distribution (scale)

### Topology-based Classification

Figure: star network with a central control node.

**Star**

- All communication goes through a central node.
- This places an excessive load on the central node.
- The main disadvantage is that if the central node fails, the whole network goes down.

Figure: ring network (unidirectional ring; each computer attaches through an interface).

**Ring network.** Computers are connected by transmission media in the form of a loop. Each station also serves as a repeater: it repeats the signal that it receives. Control is generally managed via a token.

A token, with a certain bit pattern indicating that the network is free, circulates around the network. Any site wanting to communicate grabs the token, sets it to busy, and then sends its message. When the communication is over, the site sets the token's bit pattern back to free.
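A minimal simulation of the token protocol just described, assuming a unidirectional ring, one message sent per token visit, and a fixed number of rounds. Station names, queue contents and the `token_ring` function are hypothetical.

```python
from collections import deque

FREE, BUSY = "free", "busy"


def token_ring(stations, queues, rounds=2):
    """Circulate a token around a unidirectional ring.

    stations: ring order, e.g. ["A", "B", "C", "D"]
    queues:   station -> deque of messages waiting to be sent
    Returns the transmissions in the order they happened.
    """
    token = FREE
    log = []
    for _ in range(rounds):
        for station in stations:             # token passes to the next station
            if token == FREE and queues.get(station):
                token = BUSY                 # grab the token, set it to busy
                msg = queues[station].popleft()
                log.append((station, msg))   # only the token holder transmits
                token = FREE                 # communication over: set it to free
    return log


queues = {"A": deque(["m1", "m2"]), "C": deque(["m3"])}
print(token_ring(["A", "B", "C", "D"], queues))
# [('A', 'm1'), ('C', 'm3'), ('A', 'm2')]
```

Because A must release the token after one message, C gets its turn before A's second message: the token enforces fair, collision-free access.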

To improve reliability, a double-loop topology has been proposed; it lessens the chance of network failure when a single node breaks down.

Figure: bus network.

**Bus topology.** A common channel is used to transmit and receive data. Link control is performed using:

- CSMA (carrier sense multiple access)
- CSMA/CD (CSMA with collision detection)

In addition, a token can also be used.

**CSMA/CD.** Nodes behave as in CSMA (a node listens to the bus and transmits only when it is idle), except that they keep listening to the bus even after they have transmitted. The purpose is to detect whether a collision has occurred; a collision occurs when multiple sites try to transmit at the same time.

When a collision is detected, the sites abort their transmissions, wait for an arbitrary time, and retransmit the message.
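The slides only say "an arbitrary time". The sketch below swaps in Ethernet's truncated binary exponential backoff (IEEE 802.3) as one concrete rule: after the n-th collision a station waits a random number of slots in $[0, 2^{\min(n,10)} - 1]$ and gives up after 16 attempts. The collision-resolution loop is a simplified model (the station with the earliest slot wins; ties collide again), and all function names are hypothetical.

```python
import random

MAX_ATTEMPTS = 16   # 802.3 abandons the frame after 16 attempts
BACKOFF_LIMIT = 10  # exponent stops growing after 10 collisions


def backoff_slots(collisions: int, rng: random.Random) -> int:
    """Truncated binary exponential backoff: slots to wait after the n-th collision."""
    if collisions >= MAX_ATTEMPTS:
        raise RuntimeError("too many collisions, transmission aborted")
    k = min(collisions, BACKOFF_LIMIT)
    return rng.randint(0, 2 ** k - 1)


def resolve_collision(stations, rng: random.Random):
    """Repeat backoff until exactly one station picks the earliest slot.

    Simplified model: the winner transmits first and the others sense a busy
    bus and defer; equal earliest slots mean a new collision.
    """
    collisions = 1
    while True:
        choices = {s: backoff_slots(collisions, rng) for s in stations}
        earliest = min(choices.values())
        winners = [s for s, slot in choices.items() if slot == earliest]
        if len(winners) == 1:
            return winners[0], collisions
        collisions += 1  # they picked the same slot and collided again


winner, collisions = resolve_collision(["A", "B"], random.Random(42))
print(f"{winner} transmits first after {collisions} collision(s)")
```

Doubling the range after each collision makes repeated collisions increasingly unlikely as load grows.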

Figure: meshed network.

**Meshed network.** Every computer is connected to every other computer. This gives maximum reliability, but it is impractical even for moderately sized networks.
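The impracticality follows from simple arithmetic: a full mesh of n computers needs one link per pair, $n(n-1)/2$ links, and n − 1 interfaces at every computer. A small sketch:

```python
def mesh_links(n: int) -> int:
    """Point-to-point links needed to connect every pair of n computers."""
    return n * (n - 1) // 2


for n in (5, 10, 50, 100):
    print(n, mesh_links(n))
# 5 10
# 10 45
# 50 1225
# 100 4950
```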

### Transmission Mode

- Point to point (unicast)
- Broadcast (multipoint networks)

**Point-to-point networks.** The intermediate nodes check the destination address in the message header; if the message is not for them, they forward it to the next intermediate node. The communication medium is generally coaxial cable, twisted pair, or fibre-optic cable.
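A minimal sketch of hop-by-hop forwarding, assuming a hypothetical four-node chain A–B–C–D and a precomputed next-hop table at each node. Each node compares the destination in the header with its own name and forwards if they differ.

```python
# Hypothetical 4-node network: NEXT_HOP[node][destination] -> neighbour to forward to
NEXT_HOP = {
    "A": {"B": "B", "C": "B", "D": "B"},
    "B": {"A": "A", "C": "C", "D": "C"},
    "C": {"A": "B", "B": "B", "D": "D"},
    "D": {"A": "C", "B": "C", "C": "C"},
}


def deliver(source, destination):
    """Each node checks the destination address and forwards if it is not its own."""
    path = [source]
    node = source
    while node != destination:           # message not addressed to this node
        node = NEXT_HOP[node][destination]
        path.append(node)
    return path


print(deliver("A", "D"))  # ['A', 'B', 'C', 'D']
```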

**Broadcast networks**

- A common channel is used by all nodes.
- Every message is received by all nodes, and each node checks whether it is the owner.
- Multicasting: a message is sent to a certain subset of the nodes in the network.
- Broadcast networks are generally radio- or satellite-based.
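A minimal sketch of the ownership check on a shared channel, assuming a hypothetical five-node network and a message whose `to` field is a set of addressees: one node for an individual message, a subset for multicast, all nodes for broadcast.

```python
NODES = ["A", "B", "C", "D", "E"]


def transmit(message):
    """Shared channel: every node receives the message, then checks if it is addressed to it."""
    received_by = list(NODES)                               # all nodes hear the transmission
    return [n for n in received_by if n in message["to"]]   # ownership check


print(transmit({"to": {"C"}, "body": "x"}))        # ['C']
print(transmit({"to": {"B", "D"}, "body": "y"}))   # ['B', 'D']  (multicast)
print(transmit({"to": set(NODES), "body": "z"}))   # ['A', 'B', 'C', 'D', 'E']  (broadcast)
```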

In satellite-based networks, each site beams its transmission to the satellite, which beams it back at a different frequency. Broadcast networks can also use microwave, either over satellite or terrestrial links.

### Scale

- Local area networks (LANs)
- Metropolitan area networks (MANs)
- Wide area networks (WANs)

The distinction between them is blurred, but it still exists. The major distinction is probably the protocols they use, discussed next.

**WANs** are used between cities, across a country, or even across continents. Compared with LANs, they offer low bandwidth and high latency, owing to the different switching, equipment, and transmission media involved.

WANs can be broadcast or point to point. Point-to-point WANs use either:

- circuit switching: generally used in telephone connections; the connection between sender and receiver is maintained until the end of the communication, or
- packet switching.

In packet switching, the message is broken into packets and each packet is transmitted individually. Packets may take different routes to the same destination, so they may arrive out of order; the destination then has to sort them back into the original order.
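A minimal sketch of packetizing and reassembly, assuming each packet carries a sequence number and a fixed-size slice of the message; a seeded shuffle stands in for packets taking different routes. Function names are illustrative.

```python
import random


def packetize(message: str, size: int):
    """Split a message into (sequence number, chunk) packets."""
    return [(i, message[j:j + size])
            for i, j in enumerate(range(0, len(message), size))]


def reassemble(packets):
    """Destination sorts by sequence number to restore the original order."""
    return "".join(chunk for _, chunk in sorted(packets))


packets = packetize("DISTRIBUTED DATABASES", 4)
random.Random(1).shuffle(packets)  # packets took different routes, arrived out of order
print(reassemble(packets))         # DISTRIBUTED DATABASES
```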

**Advantages of packet switching**

- Higher utilization of links, since a link is not dedicated to a single communication.
- Computer communication is bursty rather than continuous, so others can use the link in the meantime.
- A message can be sent in parallel.

**LANs** cover a small geographical area (usually within about 2 km) and offer high bandwidth and low latency. The technology is mainly Ethernet, now at 100/1000 Mbps.

**MANs** lie between LANs and WANs: they cover a city or a portion of one and can be viewed as larger LANs.


### Protocol Standards

Connecting computers is not enough to establish communication; it also requires software systems called protocols. A protocol is a set of rules and formats for exchanging data, and protocols are arranged into layers called a protocol suite or stack.

A WAN faces the greatest heterogeneity (varying equipment, word lengths, speeds, coding schemes, etc.), so it needs protocols more than other networks do. The most widely known WAN protocol architecture is ISO/OSI (International Organization for Standardization, Open Systems Interconnection).

**ISO/OSI architecture**

- The network is built in seven layers.
- Interfaces pass information between layers.
- Protocols operate between corresponding layers at different sites.
- The lower three layers form the communication subnet, which is responsible for providing reliable physical communication.
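A minimal sketch of how layering works in practice: on the way down each layer wraps the data from the layer above with its own header, and on the way up the peer layer at the receiving site strips exactly the header its counterpart added. The bracketed text headers are a stand-in assumption; real layers use binary headers (and some add trailers).

```python
OSI_LAYERS = [
    "application", "presentation", "session", "transport",
    "network", "data link", "physical",  # the last three form the communication subnet
]


def send_down(data: str) -> str:
    """Each layer wraps what it gets from the layer above with its own header."""
    for layer in OSI_LAYERS[:-1]:        # the physical layer just transmits the bits
        data = f"[{layer}]{data}"
    return data


def receive_up(frame: str) -> str:
    """Each peer layer at the receiving site strips the header its counterpart added."""
    for layer in reversed(OSI_LAYERS[:-1]):
        prefix = f"[{layer}]"
        if not frame.startswith(prefix):
            raise ValueError(f"unexpected header at the {layer} layer")
        frame = frame[len(prefix):]
    return frame


wire = send_down("query")
print(wire)  # [data link][network][transport][session][presentation][application]query
print(receive_up(wire))  # query
```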


**TCP/IP architecture.** Another popular architecture, with five layers. LAN standardization is carried out by IEEE Committee 802, which has specified different standards for different protocols (e.g., 802.3 for CSMA/CD and 802.5 for token ring).

### Distributed DBMS Architecture

The architecture of a system defines its structure: the components of the system, the function performed by each component, and the relationships among the components.

Three major DDBMS architectures are discussed:

- peer to peer
- client/server
- multidatabase

These are idealized architectures; practical installations may vary.

**DBMS standardization.** A conceptual framework is needed whose purpose is to divide standardization work into manageable pieces and to show, at a general level, how these pieces are related to one another. Approaches:

- **Component-based:** the components of the system are defined together with the interrelationships between them. This is good for the design and implementation of the system; however, it is difficult to determine the functionality of the system by looking at its individual components.

- **Function-based:** classes of users are identified together with the functionality that the system will provide for each class. The objectives of the system are clearly identified, but how are these objectives to be achieved?

- **Data-based:** identify the different types of data description (views) and specify the functional units that will realize and/or use data according to these views.

In practice, every aspect has to be considered; these classification schemes are orthogonal. A committee for DBMS standardization was established by ANSI in 1972 under SPARC (Standards Planning and Requirements Committee).

The committee published its initial report in 1975 and another in 1977. The resulting framework, whose full name is the "ANSI/X3/SPARC DBMS Framework", is mainly based on data organization.

Figure: ANSI/SPARC reference model. Users see several external views, each defined by an external schema; these map onto a single conceptual view (conceptual schema), which in turn maps onto the internal view (internal schema).

**Dimensions for DDBS architecture.** *Autonomy* refers to the distribution of control, not of data. It indicates the degree to which individual DBMSs can operate independently. Types include design, communication, and execution autonomy. The degree of autonomy varies across DDBS architectures.

*Distribution* deals with data. Logically, data appears to be placed in a single location, but physically it may be spread across different locations.

*Heterogeneity* refers to the differences in hardware and software among the individual databases, such as different machines, operating systems, data models, DBMSs, or query languages.


**Major DDBS architectures, I: client-server architecture**

- The term is used with different meanings; generally, client and server refer to processes, which may be running on the same machine.
- In the context of a DDBS, both client and server are machines, not processes (Fig 4.4).
- The server performs most of the data management: query processing, transaction management, and storage management.
- The client mainly has the application and the user interface, plus a client module of the DBMS; data and locks are sometimes cached there as well.

- The client passes user queries to the server without trying to understand or optimize them.
- Configurations:
  - one server, multiple clients;
  - multiple servers, accessed either one server at a time or multiple servers (transparently) at a time, which is a DDBS.

**Major DDBS architectures, II: peer-to-peer distributed systems**

- Heterogeneous databases at each site define local internal schemas.
- On top of these are local conceptual schemas; the overall view is depicted by the global conceptual schema, which supports the external schemas (Fig 4.5).

- Interprets user commands and formats the results.
- Checks whether the user query can be processed.
- Determines an optimized execution strategy and translates global queries into local ones.
- Coordinates the distributed execution of user requests.

- Chooses the best access path to any data item.
- Ensures the consistency of local data, even in case of failure.
- Physically accesses the data according to the commands generated by the query optimizer, and interacts with the OS.

**Major DDBS architectures, III: multidatabase system.** A multidatabase system provides access to multiple autonomous, heterogeneous, and distributed databases. Two major architectures:

- global schema architecture
- federated schema architecture

Figure: multidatabase system architectures. In the global schema architecture, each local schema is mapped by schema translation into a component schema; the component schemas are combined by schema integration into a global schema, on which the external schemas are defined. In the federated database architecture, each local schema is translated into a component schema, each component provides export schemas, these are integrated into federated schemas, and external schemas are defined on top of the federated schemas.

**Global directory issues.** A directory is a database that contains data about data (metadata); in a DDBS it is called the global directory. There are three issues:

- a single global directory, or a local one for each site;
- location: whether to keep it at a single site or distributed;
- a single copy or replication.

All three issues are orthogonal to each other. That concludes chapter 4; questions?

Figure: Distributed database environments (adapted from Bell and Grimson, 1992).

**Distributed database options**

- Homogeneous: the same DBMS at each node.
  - Autonomous: independent DBMSs.
  - Non-autonomous: a central, coordinating DBMS.
- Heterogeneous: different DBMSs at different nodes.
  - Gateways: simple paths are created to other databases without the benefits of one logical database.
  - Systems: support some or all of the functionality of one logical database.
    - Full DBMS functionality: all distributed database functions.
    - Partial multidatabase: some distributed database functions.
      - Federated: supports local databases for unique data requests.
        - Loose integration: local databases have their own schemas.
        - Tight integration: local databases use a common schema.
      - Unfederated: requires all access to go through a central, coordinating module.

**Homogeneous, non-autonomous database**

- Data is distributed across all the nodes.
- The same DBMS is used at each node.
- All data is managed by the distributed DBMS (there is no exclusively local data).
- All access is through one global schema.
- The global schema is the union of all the local schemas.

**Focus on the following heterogeneous environment**

- Data is distributed across all the nodes.
- Different DBMSs may be used at each node.
- Local access is done using the local DBMS and schema.
- Remote access is done using the global schema.