PMIT-6102 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.

Lecture 02: Introduction to DDBMS; Overview of Relational DBMS

Introduction
- Distributed DBMS Promises
- Problem Areas
- Architectural Models for Distributed DBMSs

In centralized database systems, the only available resource that needs to be shielded from the user is the data. In a distributed database environment, there is a second resource that needs to be managed in much the same manner: the network. The user should be protected from the operational details of the network, possibly even from the existence of the network. There would then be no difference between database applications that run on a centralized database and those that run on a distributed database. This type of transparency is referred to as network transparency or distribution transparency.

From a DBMS perspective, distribution transparency requires that users do not have to specify where data are located. Sometimes two types of distribution transparency are identified:
- Location transparency
- Naming transparency

Location transparency refers to the fact that the command used to perform a task is independent of both the location of the data and the system on which an operation is carried out. Naming transparency means that a unique name is provided for each object in the database. In the absence of naming transparency, users are required to embed the location name as part of the object name.
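A minimal Python sketch of the idea, assuming a hypothetical global catalog that maps logical object names to sites. The site names and the `resolve` helper are illustrative only and do not come from any particular DDBMS.

```python
# Hypothetical global catalog: logical object name -> site holding the data.
# The site names are made up for illustration.
CATALOG = {
    "account": "site_a",
    "loan": "site_b",
}


def resolve(object_name: str) -> str:
    """Return the location-qualified name for a logical object name.

    With naming and location transparency the system performs this lookup.
    Without it, the user would have to write "site_a.account" directly.
    """
    site = CATALOG[object_name]
    return f"{site}.{object_name}"


# The user refers only to "account"; the system supplies the location.
print(resolve("account"))
```

If the data moved to another site, only the catalog entry would change; the name the user writes would stay the same.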

Data can be distributed in a replicated fashion across the machines on a network. If one of the machines fails, a copy of the data is still available on another machine on the network. Replication therefore increases the reliability and availability of data, and it also increases the locality of reference.

When data are replicated, the transparency issue is:
- The users should not be aware of the existence of copies, and the system should handle the management of copies.
- The users should not be involved with handling copies, nor have to specify that a certain action can and/or should be taken on multiple copies.

Fragmentation can increase performance, availability, and reliability. It can also reduce the negative effects of replication: each replica is not the full relation but only a subset of it, so less space is required and fewer data items need to be managed.

Horizontal fragmentation: a relation is partitioned into a set of sub-relations, each of which has a subset of the tuples (rows) of the original relation. Vertical fragmentation: each sub-relation is defined on a subset of the attributes (columns) of the original relation.
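The two kinds of fragmentation can be sketched in Python as follows. The rows of account are made-up sample values. Keeping the key (account_number) in every vertical fragment is an assumption made here so that the original relation can be rebuilt; the slide itself does not state it.

```python
# Illustrative rows for account(account-number, branch-name, balance).
# The values are made up for this sketch.
account = [
    {"account_number": "A-101", "branch_name": "Downtown", "balance": 500},
    {"account_number": "A-215", "branch_name": "Mianus", "balance": 700},
    {"account_number": "A-102", "branch_name": "Perryridge", "balance": 400},
]


def horizontal_fragment(rows, predicate):
    """Keep whole tuples (rows) that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def vertical_fragment(rows, attributes):
    """Keep only the listed attributes (columns) of every tuple."""
    return [{a: row[a] for a in attributes} for row in rows]


# Horizontal fragments: each holds a subset of the rows.
h1 = horizontal_fragment(account, lambda r: r["balance"] < 500)
h2 = horizontal_fragment(account, lambda r: r["balance"] >= 500)

# Vertical fragments: each holds a subset of the columns plus the key.
v1 = vertical_fragment(account, ["account_number", "branch_name"])
v2 = vertical_fragment(account, ["account_number", "balance"])

# Horizontal fragments are recombined by union, vertical ones by joining on the key.
by_key = {row["account_number"]: row for row in v2}
rebuilt = [{**row, **by_key[row["account_number"]]} for row in v1]
```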

Distributed DBMSs are intended to improve reliability since they have replicated components and thereby eliminate single points of failure. The failure of a single site, or the failure of a communication link which makes one or more sites unreachable, is not sufficient to bring down the entire system.

Data can be stored in close proximity to its points of use (also called data localization). This requires some support for fragmentation and replication. It has two potential advantages:
- Since each site handles only a portion of the database, contention for CPU and I/O services is not as severe as for centralized databases.
- Localization reduces remote access delays that are usually involved in wide area networks.

The issue is database scaling. One aspect of easier system expansion is economics: it normally costs much less to put together a system of “smaller” computers with the equivalent power of a single big machine.

First, data may be replicated in a distributed environment. A distributed database can be designed so that the entire database, or portions of it, reside at different sites of a computer network. Second, if some sites fail (e.g., by either hardware or software malfunction), or if some communication links fail (making some of the sites unreachable) while an update is being executed, the effects will not be reflected on the data residing at the failing or unreachable sites. Third, since each site cannot have instantaneous information on the actions currently being carried out at the other sites, the synchronization of transactions on multiple sites is considerably harder than for a centralized system.

The possible ways in which a distributed DBMS may be architected are classified along three dimensions: (1) autonomy of local systems, (2) their distribution, and (3) their heterogeneity.

Autonomy
Autonomy refers to the distribution (or decentralization) of control, not of data. It indicates the degree to which individual DBMSs can operate independently. Autonomy is a function of a number of factors, such as whether the component systems (i.e., individual DBMSs) exchange information, whether they can independently execute transactions, and whether one is allowed to modify them.

Dimensions of Autonomy
- Design autonomy: individual DBMSs are free to use the data models and transaction management techniques that they prefer.
- Communication autonomy: each of the individual DBMSs is free to make its own decision as to what type of information it wants to provide to the other DBMSs or to the software that controls their global execution.
- Execution autonomy: each DBMS can execute the transactions that are submitted to it in any way that it wants to.

Distribution
The distribution dimension of the taxonomy deals with data. It considers the physical distribution of data over multiple sites, while the user sees the data as one logical pool. There are a number of ways DBMSs have been distributed. Two classes:
- client/server distribution
- peer-to-peer distribution (or full distribution)

Client/server distribution
The client/server distribution concentrates data management duties at servers, while the clients focus on providing the application environment, including the user interface. The communication duties are shared between the client machines and servers.

Peer-to-peer distribution (or full distribution)
In peer-to-peer systems, there is no distinction between client machines and servers. Each machine has full DBMS functionality and can communicate with other machines to execute queries and transactions.

Heterogeneity
Heterogeneity ranges from hardware heterogeneity and differences in networking protocols to variations in data managers. Heterogeneity in query languages not only involves the use of completely different data access paradigms in different data models, but also covers differences in languages even when the individual systems use the same data model.

Overview of Relational DBMS
- Structure of Relational Databases
- Relational Algebra

Most distributed database technology has been developed using the relational model.
- It is a very simple model.
- It is often a good match for the way we think about our data.
Example of a relation: account (account-number, branch-name, balance)

Simplest approach (not always best): convert each entity set to a relation and each relationship to a relation. Entity set → relation: the attributes of the entity set become the attributes of the relation. The entity set account, with attributes account-number, balance, and branch-name, becomes: account (account-number, branch-name, balance)

Table = relation. Column headers = attributes. Row = tuple.
Relation schema = name(attributes) + other structure info, e.g., keys and other constraints. Example: Account (account-number, branch-name, balance). The order of attributes is arbitrary, but in practice we need to assume the order given in the relation schema.
Relation instance = current set of rows for a relation schema.
Database schema = collection of relation schemas.

Relation REL (A1, A2, ..., An) as a table:

A1   A2   A3   ...  An
a1   a2   a3   ...  an
b1   b2   a3   ...  cn
a1   c3   b3   ...  bn
...
x1   v2   d3   ...  wn

Set-theoretic view:
- Domain: a set of values, like a data type
- n-tuples (V1, V2, ..., Vn) such that V1 ∈ D1, V2 ∈ D2, ..., Vn ∈ Dn
- Tuples = members of a relation instance
- Arity = number of domains
- Components = values in a tuple
- Domains correspond with attributes
- Cardinality = number of tuples

Relation as table:
- Rows = tuples
- Columns = components
- Names of columns = attributes
- Set of attribute names = schema

Each attribute of a relation has a name. The set of allowed values for each attribute is called the domain of the attribute. Attribute values are (normally) required to be atomic, that is, indivisible:
- E.g., multivalued attribute values are not atomic.
- E.g., composite attribute values are not atomic.
The special value null is a member of every domain.

A1, A2, …, An are attributes. R = (A1, A2, …, An) is a relation schema. E.g., Customer-schema = (customer-name, customer-street, customer-city). r(R) is a relation on the relation schema R. E.g., customer (Customer-schema).

The current values (relation instance) of a relation are specified by a table. An element t of r is a tuple, represented by a row in a table.
[Figure: the customer relation, with attributes (columns) customer-name, customer-street, customer-city and tuples (rows) for Jones, Smith, Curry, and Lindsay]
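As a small illustration of these terms, a relation instance can be modelled in Python as a set of tuples over a fixed schema. The rows are illustrative sample values, and the `Customer` type is a name introduced only for this sketch.

```python
from typing import NamedTuple


class Customer(NamedTuple):
    """Relation schema: Customer-schema = (customer-name, customer-street, customer-city)."""
    customer_name: str
    customer_street: str
    customer_city: str


# A relation instance is a set of tuples, so a repeated row is stored only once.
customer = {
    Customer("Jones", "Main", "Harrison"),
    Customer("Smith", "North", "Rye"),
    Customer("Smith", "North", "Rye"),  # duplicate, collapses into the row above
}

arity = len(Customer._fields)  # number of attributes in the schema
cardinality = len(customer)    # number of tuples in the instance

print(arity, cardinality)
```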

A database consists of multiple relations. Information about an enterprise is broken up into parts, with each relation storing one part of the information. E.g.:
- account: stores information about accounts
- depositor: stores information about which customer owns which account
- customer: stores information about customers
Storing all information as a single relation such as bank (account-number, balance, customer-name, ..) results in
- repetition of information (e.g., a customer who owns two accounts)
- the need for null values (e.g., to represent a customer without an account)
Normalization theory deals with how to design relational schemas.

[Figures: the customer relation, the branch relation, the depositor relation, and the account relation]

[Figure: the borrower relation]

The loan relation:

loan-number  branch-name  amount
L-11         Round Hill   900
L-14         Downtown     1500
L-15         Perryridge   1500
L-16         Perryridge   1300
L-17         Downtown     1000
L-23         Redwood      2000
L-93         Mianus       500

A superkey is a set of attributes within a table whose values can be used to uniquely identify a tuple. A candidate key is a minimal set of attributes necessary to identify a tuple; it is also called a minimal superkey. For example, given an employee schema consisting of the attributes employeeID, name, job, and departmentID, we could use employeeID in combination with any or all other attributes of this table to uniquely identify a tuple in the table. Examples of superkeys in this schema would be {employeeID, name}, {employeeID, name, job}, and {employeeID, name, job, departmentID}. The last example is known as the trivial superkey, because it uses all attributes of the table to identify the tuple. In a real database we do not need values for all of those attributes to identify a tuple. We only need, per our example, the set {employeeID}. This is a minimal superkey, that is, a minimal set of attributes that can be used to identify a single tuple. So employeeID is a candidate key. Although several candidate keys may exist, one of the candidate keys is selected to be the primary key.
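The definitions can be tried out with a short Python sketch. The employee rows are made up, and the helper names `is_superkey` and `candidate_keys` are introduced only for this example. Note the limitation: a key is a constraint on the schema, so testing a single instance can only show that an attribute set is consistent with being a key for these rows, not prove that it is one.

```python
from itertools import combinations

ATTRS = ("employeeID", "name", "job", "departmentID")

# Illustrative rows; the values are made up for this sketch.
employees = [
    (1, "Ayesha", "Clerk", 10),
    (2, "Rahim", "Clerk", 10),
    (3, "Ayesha", "Analyst", 20),
    (4, "Ayesha", "Clerk", 10),
]


def is_superkey(attrs, rows=employees):
    """True if no two rows agree on all of the given attributes."""
    idx = [ATTRS.index(a) for a in attrs]
    projected = {tuple(row[i] for i in idx) for row in rows}
    return len(projected) == len(rows)


def candidate_keys(rows=employees):
    """Minimal superkeys: superkeys that contain no smaller superkey."""
    keys = []
    for size in range(1, len(ATTRS) + 1):
        for combo in combinations(ATTRS, size):
            contains_smaller_key = any(set(k) <= set(combo) for k in keys)
            if is_superkey(combo, rows) and not contains_smaller_key:
                keys.append(combo)
    return keys


print(is_superkey(("employeeID", "name")))  # a superkey, but not minimal
print(candidate_keys())
```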

Strong entity set: the primary key of the entity set becomes the primary key of the relation.
Weak entity set: the primary key of the relation consists of the union of the primary key of the strong entity set and the discriminator of the weak entity set.
Relationship set: the union of the primary keys of the related entity sets becomes a superkey of the relation.
- For binary many-to-one relationship sets, the primary key of the “many” entity set becomes the relation’s primary key.
- For one-to-one relationship sets, the relation’s primary key can be that of either entity set.
- For many-to-many relationship sets, the union of the primary keys becomes the relation’s primary key.

A query language is a language in which a user requests information from the database. Categories of languages:
- Procedural: the user instructs the system to perform a sequence of operations on the database to compute the desired result.
- Non-procedural: the user describes the desired information without giving a specific procedure for obtaining that information.
“Pure” languages:
- Relational Algebra
- Tuple Relational Calculus
- Domain Relational Calculus

Relational algebra is a procedural language with six basic operators:
- select
- project
- union
- set difference
- Cartesian product
- rename
The operators take one or two relations as inputs and give a new relation as a result.

Select operation example: σ_{A=B ∧ D>5}(r), applied to a relation r.
[Figure: relation r and the result of σ_{A=B ∧ D>5}(r)]

Notation: σ_p(r), where p is called the selection predicate. Defined as: σ_p(r) = { t | t ∈ r and p(t) }, where p is a formula in propositional calculus consisting of terms connected by ∧ (and), ∨ (or), ¬ (not). Each term is one of: <attribute> op <attribute> or <attribute> op <constant>, where op is one of: =, ≠, >, ≥, <, ≤.

Example of selection: σ_{branch-name = “Perryridge”}(loan)
[Figure: result of σ_{branch-name = “Perryridge”}(loan)]
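A minimal Python sketch of selection, treating a relation as a set of tuples and the predicate p as a function on a tuple. The loan rows are those of the loan relation in these slides; the `select` helper is a name introduced only for this sketch.

```python
# loan(loan-number, branch-name, amount)
loan = {
    ("L-11", "Round Hill", 900),
    ("L-14", "Downtown", 1500),
    ("L-15", "Perryridge", 1500),
    ("L-16", "Perryridge", 1300),
    ("L-17", "Downtown", 1000),
    ("L-23", "Redwood", 2000),
    ("L-93", "Mianus", 500),
}


def select(relation, predicate):
    """sigma_p(r) = { t | t in r and p(t) }"""
    return {t for t in relation if predicate(t)}


# sigma_{branch-name = "Perryridge"}(loan); branch-name is column 1.
perryridge_loans = select(loan, lambda t: t[1] == "Perryridge")
print(sorted(perryridge_loans))
```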

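Project operation example: π_{A,C}(r), applied to a relation r. Duplicate rows are removed from the result.
[Figure: relation r and the result of π_{A,C}(r)]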

Notation: π_{A1, A2, …, Ak}(r), where A1, A2, …, Ak are attribute names and r is a relation name. The result is defined as the relation of k columns obtained by erasing the columns that are not listed. Duplicate rows are removed from the result, since relations are sets. E.g., to eliminate the branch-name attribute of account: π_{account-number, balance}(account)
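A minimal Python sketch of projection over a set of tuples. It uses the rows of the loan relation from these slides rather than account, because rows for account are not listed; the `project` helper is a name introduced only for this sketch.

```python
LOAN_SCHEMA = ("loan-number", "branch-name", "amount")
loan = {
    ("L-11", "Round Hill", 900),
    ("L-14", "Downtown", 1500),
    ("L-15", "Perryridge", 1500),
    ("L-16", "Perryridge", 1300),
    ("L-17", "Downtown", 1000),
    ("L-23", "Redwood", 2000),
    ("L-93", "Mianus", 500),
}


def project(schema, relation, attributes):
    """pi_{A1..Ak}(r): keep the listed columns; the set drops duplicate rows."""
    idx = [schema.index(a) for a in attributes]
    return {tuple(t[i] for i in idx) for t in relation}


# Seven loans, but only five distinct branch names survive the projection.
branches = project(LOAN_SCHEMA, loan, ["branch-name"])
print(len(loan), len(branches))
```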

Union operation example: relations r and s, and the result r ∪ s.
[Figure: relations r, s and r ∪ s]

Notation: r ∪ s. Defined as: r ∪ s = { t | t ∈ r or t ∈ s }. For r ∪ s to be valid:
1. r and s must have the same arity (same number of attributes).
2. The attribute domains must be compatible (e.g., the 2nd column of r deals with the same type of values as does the 2nd column of s).
E.g., to find all customers with either an account or a loan: π_{customer-name}(depositor) ∪ π_{customer-name}(borrower)

Names of all customers who have either a loan or an account: π_{customer-name}(depositor) ∪ π_{customer-name}(borrower)
[Figure: result of the union operation]

Notation: r − s. Defined as: r − s = { t | t ∈ r and t ∉ s }. Set differences must be taken between compatible relations:
- r and s must have the same arity
- attribute domains of r and s must be compatible
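Union and set difference map directly onto Python set operations once compatibility has been checked. In this sketch the Python type of each column stands in for its domain, which is a simplifying assumption. The relations r and s hold made-up values, and the helper names are introduced only for this example.

```python
def signature(relation):
    """Set of per-tuple column type lists; Python types stand in for domains."""
    return {tuple(type(v) for v in t) for t in relation}


def check_compatible(r, s):
    """Raise ValueError unless r and s agree on arity and column types."""
    if len(signature(r) | signature(s)) > 1:
        raise ValueError("relations are not union-compatible")


def union(r, s):
    """r union s = { t | t in r or t in s }"""
    check_compatible(r, s)
    return r | s


def difference(r, s):
    """r minus s = { t | t in r and t not in s }"""
    check_compatible(r, s)
    return r - s


# Made-up relations over (A, B).
r = {("alpha", 1), ("alpha", 2), ("beta", 1)}
s = {("alpha", 2), ("beta", 3)}

print(sorted(union(r, s)))
print(sorted(difference(r, s)))
```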

Set difference example: relations r and s, and the result r − s.
[Figure: relations r, s and r − s]

Notation: r × s. Defined as: r × s = { t q | t ∈ r and q ∈ s }. Assume that the attributes of r(R) and s(S) are disjoint (that is, R ∩ S = ∅). If the attributes of r(R) and s(S) are not disjoint, then renaming must be used.
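A minimal Python sketch of the Cartesian product, including the disjointness requirement on attribute names. The schemas and rows are made up, and `cartesian_product` is a name introduced only for this sketch.

```python
def cartesian_product(r_schema, r, s_schema, s):
    """r x s = { t q | t in r and q in s }, with the two schemas concatenated."""
    shared = set(r_schema) & set(s_schema)
    if shared:
        raise ValueError(f"attributes {sorted(shared)} must be renamed first")
    return r_schema + s_schema, {t + q for t in r for q in s}


# Made-up relations r(A, B) and s(C, D, E).
r_schema, r = ("A", "B"), {("alpha", 1), ("beta", 2)}
s_schema, s = ("C", "D", "E"), {
    ("alpha", 10, "a"),
    ("beta", 10, "a"),
    ("beta", 20, "b"),
    ("gamma", 10, "b"),
}

schema, rows = cartesian_product(r_schema, r, s_schema, s)
print(schema, len(rows))  # every tuple of r is paired with every tuple of s
```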

Cartesian-product operation example: relations r and s, and the result r × s.
[Figure: relations r, s and r × s]

Expressions can be built using multiple operations. Example: σ_{A=C}(r × s)
[Figure: r × s and the result of σ_{A=C}(r × s)]

The rename operation allows us to refer to a relation by more than one name. Example: ρ_x(E) returns the expression E under the name x. If a relational-algebra expression E has arity n, then ρ_{x(A1, A2, …, An)}(E) returns the result of expression E under the name x, and with the attributes renamed to A1, A2, …, An.
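One way to picture rename in Python is to carry the relation name and attribute names alongside the rows. The `Relation` type and the `rename` helper below are names introduced only for this sketch, and the two loan rows are a small sample.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    name: str
    schema: tuple
    rows: frozenset


def rename(rel, new_name, new_attrs=None):
    """rho_x(E) or rho_{x(A1..An)}(E): new name, optionally new attribute names."""
    if new_attrs is None:
        new_attrs = rel.schema
    if len(new_attrs) != len(rel.schema):
        raise ValueError("need exactly one new attribute name per column")
    return Relation(new_name, tuple(new_attrs), rel.rows)


loan = Relation(
    "loan",
    ("loan-number", "branch-name", "amount"),
    frozenset({("L-15", "Perryridge", 1500), ("L-16", "Perryridge", 1300)}),
)

# Same rows under a second name, with every attribute renamed.
d = rename(loan, "d", ("d-loan-number", "d-branch-name", "d-amount"))
print(d.name, d.schema)
```

After such a rename the attribute sets of loan and d are disjoint, so the two could be combined with a Cartesian product.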

branch (branch-name, branch-city, assets)
customer (customer-name, customer-street, customer-city)
account (account-number, branch-name, balance)
loan (loan-number, branch-name, amount)
depositor (customer-name, account-number)
borrower (customer-name, loan-number)

Find all loans of over $1200:
σ_{amount > 1200}(loan)

loan-number  branch-name  amount
L-14         Downtown     1500
L-15         Perryridge   1500
L-16         Perryridge   1300
L-23         Redwood      2000

Find the loan number for each loan of an amount greater than $1200:
π_{loan-number}(σ_{amount > 1200}(loan))

loan-number
L-14
L-15
L-16
L-23
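The two queries can be replayed in Python over the loan relation from these slides. This is only a sketch with sets of tuples; the `select` and `project` helpers are names introduced for the example.

```python
LOAN_SCHEMA = ("loan-number", "branch-name", "amount")
loan = {
    ("L-11", "Round Hill", 900),
    ("L-14", "Downtown", 1500),
    ("L-15", "Perryridge", 1500),
    ("L-16", "Perryridge", 1300),
    ("L-17", "Downtown", 1000),
    ("L-23", "Redwood", 2000),
    ("L-93", "Mianus", 500),
}


def select(relation, predicate):
    """sigma_p(r)"""
    return {t for t in relation if predicate(t)}


def project(schema, relation, attributes):
    """pi_{A1..Ak}(r)"""
    idx = [schema.index(a) for a in attributes]
    return {tuple(t[i] for i in idx) for t in relation}


# sigma_{amount > 1200}(loan); amount is column 2.
big_loans = select(loan, lambda t: t[2] > 1200)

# pi_{loan-number}(sigma_{amount > 1200}(loan)): projection applied to the selection.
big_loan_numbers = project(LOAN_SCHEMA, big_loans, ["loan-number"])
print(sorted(big_loan_numbers))
```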

Find the names of all customers who have a loan, an account, or both, from the bank:
π_{customer-name}(borrower) ∪ π_{customer-name}(depositor)

Find the names of all customers who have a loan and an account at the bank:
π_{customer-name}(borrower) ∩ π_{customer-name}(depositor)

Find the names of all customers who have a loan at the Perryridge branch:
π_{customer-name}(σ_{branch-name = “Perryridge”}(σ_{borrower.loan-number = loan.loan-number}(borrower × loan)))

Find the names of all customers who have a loan at the Perryridge branch but do not have an account at any branch of the bank:
π_{customer-name}(σ_{branch-name = “Perryridge”}(σ_{borrower.loan-number = loan.loan-number}(borrower × loan))) − π_{customer-name}(depositor)

[Figure: result of borrower × loan]

[Figure: result of σ_{branch-name = “Perryridge”}(borrower × loan)]
[Figure: result of π_{customer-name}(σ_{branch-name = “Perryridge”}(σ_{borrower.loan-number = loan.loan-number}(borrower × loan)))]

Result of π_{customer-name}(σ_{branch-name = “Perryridge”}(σ_{borrower.loan-number = loan.loan-number}(borrower × loan))) − π_{customer-name}(depositor):

customer-name
Adams
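The Perryridge query can be traced step by step in Python. The loan rows are those of the loan relation in these slides. The borrower and depositor rows are not listed in the slides, so the ones below are made-up sample data, chosen only so that the final answer is a single customer as in the result above.

```python
# loan(loan-number, branch-name, amount), as listed in the slides.
loan = {
    ("L-11", "Round Hill", 900),
    ("L-14", "Downtown", 1500),
    ("L-15", "Perryridge", 1500),
    ("L-16", "Perryridge", 1300),
    ("L-17", "Downtown", 1000),
    ("L-23", "Redwood", 2000),
    ("L-93", "Mianus", 500),
}
# Assumed sample rows: borrower(customer-name, loan-number).
borrower = {("Adams", "L-16"), ("Hayes", "L-15"), ("Jones", "L-17"), ("Smith", "L-11")}
# Assumed sample rows: depositor(customer-name, account-number).
depositor = {("Hayes", "A-102"), ("Jones", "A-217"), ("Lindsay", "A-222")}

# borrower x loan: columns are
# (customer-name, borrower.loan-number, loan.loan-number, branch-name, amount)
product = {b + l for b in borrower for l in loan}

# sigma_{borrower.loan-number = loan.loan-number}
matched = {t for t in product if t[1] == t[2]}

# sigma_{branch-name = "Perryridge"}
at_perryridge = {t for t in matched if t[3] == "Perryridge"}

# pi_{customer-name} of the loan side and of depositor
with_loan = {(t[0],) for t in at_perryridge}
with_account = {(t[0],) for t in depositor}

# Set difference: Perryridge borrowers who hold no account.
result = with_loan - with_account
print(result)
```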

Customers with an account but no loan: π_{customer-name}(depositor) − π_{customer-name}(borrower)
