Distributed Database Management Systems


Distributed Database Management Systems Lecture 17

Virtual University of Pakistan

In this Lecture
Continue with vertical fragmentation (VF)
Information requirement
Attribute affinities

Replication of key attributes does not violate the disjointness condition.

Vertical Fragmentation Information Requirements Virtual University of Pakistan

The basic idea of VF is access efficiency.
The information requirement is application based.
Attribute affinities are obtained from more primitive usage data.

(80-20 rule) Attribute usage values are defined for a set of queries Q = {q1, q2, …, qq} that will run on the relation R[A1, A2, …, An].

Attribute usage value use(qi, Aj):

$$\mathrm{use}(q_i, A_j) = \begin{cases} 1 & \text{if attribute } A_j \text{ is referenced by query } q_i \\ 0 & \text{otherwise} \end{cases}$$

use(qi, •) can be defined accordingly.

PROJ(jNo, jName, budget, loc)
q1: SELECT BUDGET FROM PROJ WHERE JNO = Value
q2: SELECT JNAME, BUDGET FROM PROJ

q3: SELECT JNAME FROM PROJ WHERE LOC = Value
q4: SELECT SUM(BUDGET) FROM PROJ WHERE LOC = Value
Let A1 = jNo, A2 = jName, A3 = budget, A4 = loc

Attribute Usage Matrix

      A1  A2  A3  A4
q1     1   0   1   0
q2     0   1   1   0
q3     0   1   0   1
q4     0   0   1   1
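The attribute usage matrix follows mechanically from the four queries on PROJ. The sketch below is only an illustration: the names `ATTRIBUTES`, `QUERY_ATTRS`, and `usage_matrix` are assumptions introduced here, and the attribute sets were read off the SELECT lists and WHERE clauses by hand rather than parsed from SQL.

```python
# Attributes of PROJ in the order A1..A4 used in the lecture.
ATTRIBUTES = ["JNO", "JNAME", "BUDGET", "LOC"]

# Attributes referenced by each query (SELECT list plus WHERE clause).
# These sets are transcribed by hand from q1..q4; no SQL parsing is done.
QUERY_ATTRS = {
    "q1": {"BUDGET", "JNO"},
    "q2": {"JNAME", "BUDGET"},
    "q3": {"JNAME", "LOC"},
    "q4": {"BUDGET", "LOC"},
}


def use(query, attribute):
    """Return 1 if the query references the attribute, 0 otherwise."""
    return 1 if attribute in QUERY_ATTRS[query] else 0


# One row per query, one column per attribute.
usage_matrix = [[use(q, a) for a in ATTRIBUTES] for q in QUERY_ATTRS]

for q, row in zip(QUERY_ATTRS, usage_matrix):
    print(q, row)
```

Each printed row corresponds to one row of the attribute usage matrix.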

The attribute usage matrix (AUM) does not represent the query frequency at different sites.
The attribute affinity between two attributes Ai and Aj, aff(Ai, Aj), of a relation R(A1, A2, …, An) with respect to the application set Q = {q1, q2, …, qq} is

$$\mathrm{aff}(A_i, A_j) = \sum_{k \,\mid\, \mathrm{use}(q_k, A_i) = 1 \,\wedge\, \mathrm{use}(q_k, A_j) = 1} \;\; \sum_{\forall S_l} \mathrm{ref}_l(q_k)\, \mathrm{acc}_l(q_k)$$

where ref_l(qk) is the number of accesses to attributes (Ai, Aj) for each execution of qk at site Sl, and acc_l(qk) is the application access frequency measure from Sl.

Attribute Usage Matrix

      A1  A2  A3  A4
q1     1   0   1   0
q2     0   1   1   0
q3     0   1   0   1
q4     0   0   1   1

Access Frequency Matrix

      S1  S2  S3
q1    15  20  10
q2     5   0   0
q3    25  25  25
q4     3   0   0

acc1(q1) = 15, acc2(q1) = 20, acc3(q1) = 10
acc1(q2) = 5, acc2(q2) = 0, acc3(q2) = 0
acc1(q3) = 25, acc2(q3) = 25, acc3(q3) = 25
acc1(q4) = 3, acc2(q4) = 0, acc3(q4) = 0

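Here ref_l(qk) is taken as 1 for every query and site.

Only q4 accesses both A3 and A4:
$$\mathrm{aff}(A_3, A_4) = \sum_{k=4} \sum_{l=1}^{3} \mathrm{ref}_l(q_k)\, \mathrm{acc}_l(q_k) = 3 \cdot 1 + 0 + 0 = 3$$

aff(A1, A2) = 0, since no qi accesses them both.

Both q2 and q3 access A2:
$$\mathrm{aff}(A_2, A_2) = (5 \cdot 1 + 0 + 0) + (25 \cdot 1 + 25 \cdot 1 + 25 \cdot 1) = 5 + 75 = 80$$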


Attribute affinity matrix (AA)

      A1  A2  A3  A4
A1    45   0  45   0
A2     0  80   5  75
A3    45   5  53   3
A4     0  75   3  78
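The attribute affinity matrix can be reproduced from the attribute usage values and the access frequencies given earlier. The following is a minimal sketch, assuming ref_l(qk) = 1 for every query and site, as in the worked values of the lecture; the names `USE`, `ACC`, `REF`, and `aff` are introduced here for illustration only.

```python
# Attribute usage matrix: rows q1..q4, columns A1..A4.
USE = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]

# Access frequencies: rows q1..q4, columns S1..S3.
ACC = [
    [15, 20, 10],
    [5, 0, 0],
    [25, 25, 25],
    [3, 0, 0],
]

# Assumption: one access to the attribute pair per execution, at every site.
REF = 1


def aff(i, j):
    """Affinity of attributes with 0-based indexes i and j."""
    total = 0
    for k, row in enumerate(USE):
        # Only queries that reference both attributes contribute.
        if row[i] == 1 and row[j] == 1:
            total += sum(REF * acc for acc in ACC[k])
    return total


n = len(USE[0])
AA = [[aff(i, j) for j in range(n)] for i in range(n)]

for row in AA:
    print(row)
```

For example, `aff(2, 3)` corresponds to aff(A3, A4) and `aff(1, 1)` to aff(A2, A2).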

Clustering Algorithm

VF is based on identifying groups of attributes using AA.
Vertical clustering is based on the Bond Energy Algorithm (BEA), which uses AA to identify groups of similar items.

Attributes with large affinity values are grouped together, and attributes with small affinity values are grouped together.
BEA takes AA as input and generates the clustered affinity matrix (CA).
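The slides do not spell out the steps of BEA, so the following is only a sketch of one common textbook formulation: columns are inserted one at a time at the position with the largest net contribution. The helper names `bond`, `cont`, and `bea_order`, the choice to start with the first two columns, and the tie-breaking rule (leftmost position wins) are assumptions made here. The AA values are those of the PROJ example.

```python
# Attribute affinity matrix AA for PROJ (rows and columns: A1, A2, A3, A4).
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]


def bond(aa, x, y):
    """Bond between columns x and y; None stands for an empty border column."""
    if x is None or y is None:
        return 0
    return sum(row[x] * row[y] for row in aa)


def cont(aa, left, mid, right):
    """Net contribution of placing column mid between columns left and right."""
    return (2 * bond(aa, left, mid)
            + 2 * bond(aa, mid, right)
            - 2 * bond(aa, left, right))


def bea_order(aa):
    """Return a column ordering built by greedy insertion (assumed variant)."""
    order = [0, 1]  # start with the first two columns in place
    for col in range(2, len(aa)):
        best_pos, best_gain = 0, None
        # Try every gap, including the two borders.
        for pos in range(len(order) + 1):
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            gain = cont(aa, left, col, right)
            if best_gain is None or gain > best_gain:
                best_pos, best_gain = pos, gain
        order.insert(best_pos, col)
    return order


order = bea_order(AA)
# Apply the same permutation to rows and columns to obtain CA.
CA = [[AA[i][j] for j in order] for i in order]

print([f"A{i + 1}" for i in order])
for row in CA:
    print(row)
```

Under these assumptions the ordering for this example comes out as A1, A3, A2, A4, which places the large values 45/53 and 80/75/78 in two blocks along the diagonal of CA.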

Global Affinity Measure (AM)

The affinity measure is a single value calculated from the positions of the elements in AA and their surrounding elements.


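$$AM = \sum_{i=1}^{n} \sum_{j=1}^{n} \mathrm{aff}(A_i, A_j)\,\bigl[\mathrm{aff}(A_i, A_{j-1}) + \mathrm{aff}(A_i, A_{j+1}) + \mathrm{aff}(A_{i-1}, A_j) + \mathrm{aff}(A_{i+1}, A_j)\bigr]$$

with the boundary conditions

$$\mathrm{aff}(A_0, A_j) = \mathrm{aff}(A_i, A_0) = \mathrm{aff}(A_{n+1}, A_j) = \mathrm{aff}(A_i, A_{n+1}) = 0$$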
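The formula can be evaluated directly on the AA matrix of the PROJ example. This is a minimal sketch; the function names `aff_at` and `affinity_measure` are assumptions introduced here, and out-of-range indexes are treated as 0 to model the boundary conditions.

```python
# Attribute affinity matrix AA for PROJ (rows and columns: A1, A2, A3, A4).
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]


def aff_at(aa, i, j):
    """Element (i, j) of the matrix, or 0 outside it (boundary condition)."""
    n = len(aa)
    if 0 <= i < n and 0 <= j < n:
        return aa[i][j]
    return 0


def affinity_measure(aa):
    """Sum each element times the sum of its four neighbours."""
    n = len(aa)
    total = 0
    for i in range(n):
        for j in range(n):
            neighbours = (aff_at(aa, i, j - 1) + aff_at(aa, i, j + 1)
                          + aff_at(aa, i - 1, j) + aff_at(aa, i + 1, j))
            total += aa[i][j] * neighbours
    return total


am = affinity_measure(AA)
print(am)
```

Because each element is weighted by its neighbours, the value depends on the order of the rows and columns, not only on the affinity values themselves.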
