Term 2, 2004, Lecture 9, Distributed Databases. Marian Ursu, Department of Computing, Goldsmiths College. Distributed databases 3.


Slide 1: Distributed databases 3

Slide 2: Outline
  - generalities
  - objectives
  - problems

Slide 3: Section 1 (generalities)

Slide 4: Introduction [diagram; labels: communication network, server, application, server, DBMS in its own right]

Slide 5: Introduction
  - distributed database = collection of connected sites
  - each site is a DB in its own right (1)
      - has its own DBMS and its own users
      - operations can be performed locally as if the DB were not distributed
  - the sites collaborate (transparently from the user's point of view)
  - the union of all DBs = the DB of the whole organisation (institution) (as opposed to (1))
  - physical or logical distribution
  - strict homogeneity (assumption)

Slide 6: Motivation
  - advantages
      - matches the structure of the organisation
          - example
      - efficiency of processing
          - data is stored close to where it is used
      - increased accessibility
          - remote DBs can be accessed
  - disadvantage
      - complexity

Slide 7: Implementations (systems)
  - commercial
      - ORACLE (Oracle Corporation)
      - INGRES/STAR (Ask Group Inc., Ingres Division)
      - DB2 (IBM)
  - they all provide some features for distributed databases

Slide 8: Fundamental principle
  - to the user, a distributed DB system should look exactly like a non-distributed DB system

Slide 9: Section 2 (objectives)

Slide 10: Objectives
  - local autonomy
  - no reliance on a central site
  - location independence
  - fragmentation independence
  - replication independence
  - distributed query processing
  - distributed transaction management

Slide 11: Objectives are:
  - not independent of each other
  - not exhaustive
  - sometimes contradictory
  - of differing importance (for the user)

Slide 12: Local autonomy
  - all operations at a given site are fully controlled by that site
      - not achievable (why?)
      - therefore, autonomy should be achieved to the maximum extent possible
  - local data is locally owned and managed
      - local data belongs to the local server even if it is accessible from other servers
      - security, integrity, ... are the responsibility of the local server

Slide 13: No reliance on a central site
  - reasons
      - bottleneck
      - vulnerability
  - conclusion
      - all sites must be equal

Slide 14: Location independence
  - users should not have to know where data is physically stored
  - why do you think this is needed?
      - think of application programs
  - what does this objective look like?

Slide 15: Data fragmentation
  - data fragmentation: a relation can be divided into fragments for storage purposes
  - motivation: performance (data is stored where it is most often used)
  - definition: fragment = any subrelation derivable via restriction or projection

Slide 16: Data fragmentation - example

    FRAGMENT Emp INTO
        Lo_Emp AT SITE 'London' WHERE Dept_id = 'Sales'
        Le_Emp AT SITE 'Leeds'  WHERE Dept_id = 'Dev' ;
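The FRAGMENT statement above can be read as two restrictions on Emp, one per site. The following is a minimal Python sketch of that reading, not the syntax of any real DBMS. The sample rows, the `emp_id` and `name` attributes, and the helper names `fragment` and `reconstruct` are assumptions made for illustration; only the fragment names, site names and Dept_id values come from the slide.

```python
# Hypothetical rows of Emp; emp_id and name are invented for the example.
emp = [
    {"emp_id": 1, "name": "Ann", "dept_id": "Sales"},
    {"emp_id": 2, "name": "Bob", "dept_id": "Dev"},
    {"emp_id": 3, "name": "Cy", "dept_id": "Sales"},
]

# Fragment name -> (site, restriction predicate), mirroring the FRAGMENT statement.
fragment_defs = {
    "Lo_Emp": ("London", lambda row: row["dept_id"] == "Sales"),
    "Le_Emp": ("Leeds", lambda row: row["dept_id"] == "Dev"),
}


def fragment(relation, defs):
    """Apply each restriction to the relation and group the rows by fragment."""
    return {
        name: {"site": site, "rows": [r for r in relation if pred(r)]}
        for name, (site, pred) in defs.items()
    }


def reconstruct(fragments):
    """Rebuild the relation as the union of its horizontal fragments."""
    rows = [r for frag in fragments.values() for r in frag["rows"]]
    return sorted(rows, key=lambda r: r["emp_id"])


fragments = fragment(emp, fragment_defs)
```

Because the two restrictions are disjoint and together cover every row in this sample, the union of the fragments gives back the original relation. A fragmentation by projection would instead be reconstructed with a join.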

Slide 17: Fragmentation independence / transparency
  - users should perceive data as if it were not fragmented
      - why?
  - it is the optimiser's responsibility to determine which fragments need to be physically accessed
  - similar to views
      - retrieving
      - updating (JOIN and UNION views)
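One way to picture the optimiser's role here: the user queries Emp as a whole, and the system works out which fragments can contain matching rows. The sketch below assumes the two-fragment layout of the earlier example and handles only an equality condition on Dept_id. `FRAGMENTS` and `fragments_to_access` are hypothetical names, and a real optimiser would reason about far more general predicates.

```python
# Hypothetical fragmentation metadata: fragment -> (site, Dept_id value it holds).
FRAGMENTS = {
    "Lo_Emp": ("London", "Sales"),
    "Le_Emp": ("Leeds", "Dev"),
}


def fragments_to_access(dept_id=None):
    """Return the fragments that a query on Emp must touch.

    dept_id is the value in an equality condition on Dept_id, or None
    when the query has no such condition.
    """
    if dept_id is None:
        return sorted(FRAGMENTS)  # no condition: every fragment is needed
    return sorted(
        name for name, (_site, held) in FRAGMENTS.items() if held == dept_id
    )
```

A query restricted to `Dept_id = 'Dev'` then touches only the Leeds fragment, while the user still writes the query against Emp.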

Slide 18: Data replication
  - copies of the same fragment can exist at different sites
  - reasons
      - better availability
      - better performance
  - disadvantage
      - update propagation

Slide 19: Replication independence / transparency
  - users should not have to be aware of data replication
  - it is the optimiser's responsibility to choose which replica to use
  - commercial systems: no full support for replication independence (update problems); primary copy

Slide 20: Distributed query processing
  - the system must have set-level operators
      - one record at a time: too many messages (traffic)
      - relational systems are indicated
  - optimisation is particularly relevant!
      - find the best way to move data across the network
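The message-count argument can be made concrete with a toy model. The sketch below assumes that every remote call costs exactly one request and one reply, however large the reply is. The `RemoteSite` class and both scan functions are hypothetical and stand in for a real network protocol.

```python
class RemoteSite:
    """Toy remote site that counts the messages exchanged with it."""

    def __init__(self, rows):
        self.rows = rows
        self.messages = 0

    def fetch_one(self, index):
        """Record-at-a-time access: one request and one reply per record."""
        self.messages += 2
        return self.rows[index] if index < len(self.rows) else None

    def fetch_where(self, predicate):
        """Set-level access: one request, one reply carrying the whole result."""
        self.messages += 2
        return [r for r in self.rows if predicate(r)]


def scan_record_at_a_time(site, predicate):
    """Pull every record across the network and filter at the requesting site."""
    result, i = [], 0
    while (row := site.fetch_one(i)) is not None:
        if predicate(row):
            result.append(row)
        i += 1
    return result


def is_multiple_of_ten(x):
    return x % 10 == 0


site_a = RemoteSite(list(range(100)))
site_b = RemoteSite(list(range(100)))
slow = scan_record_at_a_time(site_a, is_multiple_of_ten)
fast = site_b.fetch_where(is_multiple_of_ten)
```

Both scans return the same ten rows. In this model the record-at-a-time scan exchanges 202 messages (100 records plus one end-of-data probe, two messages each), while the set-level request exchanges 2.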

Slide 21: Section 3 (problems)

Slide 22: Problems
  - occur due to network utilisation
  - aim: minimise network utilisation
  - query processing
  - catalogue management
  - update propagation
  - recovery control
  - concurrency control

Slide 23: Query processing in a distributed environment
  - query execution is distributed
  - query optimisation is distributed
      - global optimisation
      - local optimisation
  - example
      - query on relation R issued at site X
      - part of R, say R_y, stored at Y
      - part of R, say R_z, stored at Z
      - where is the query going to be executed?
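One hedged way to approach the closing question is to compare candidate strategies by how much data each one moves. The sketch below assumes a deliberately crude cost model that counts tuples shipped and ignores message latency and local processing time. The fragment sizes, the selectivity and all function names are invented for illustration.

```python
def cost_ship_fragments(sizes):
    """Strategy 1: move R_y and R_z to X in full, then evaluate the query at X."""
    return sum(sizes.values())


def cost_ship_results(sizes, selectivity):
    """Strategy 2: evaluate the restriction at Y and Z, ship only matches to X."""
    return sum(round(n * selectivity) for n in sizes.values())


def choose_strategy(sizes, selectivity):
    """Global optimisation step: pick the strategy that ships the fewest tuples."""
    options = {
        "ship fragments to X": cost_ship_fragments(sizes),
        "evaluate at Y and Z, ship results to X": cost_ship_results(sizes, selectivity),
    }
    return min(options.items(), key=lambda kv: kv[1])


# Hypothetical statistics: tuples per fragment, and the fraction that qualifies.
sizes = {"Y": 10000, "Z": 4000}
best = choose_strategy(sizes, selectivity=0.01)
```

With these numbers the second strategy ships 140 tuples instead of 14000. Choosing between strategies corresponds to global optimisation; how Y and Z each evaluate their part is left to local optimisation.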

Slide 24: Catalogue management
  - what other data does the catalogue include?
      - fragmentation, replication, ...
  - where should the catalogue be stored?
      - centralised
      - fully replicated
          - loss of autonomy (update propagation!)
      - partitioned
          - non-local operations are very expensive!
      - combination of first and third

Slide 25: Central Catalogue
  - all updates, including local updates, have to be recorded in the central catalogue
  - disadvantages:
      - bottleneck
      - conflicts with the "no reliance on a central site" objective

Slide 26: Fully Replicated Catalogue
  - the entire database catalogue (not only the local one) is stored at each site
  - every time an update is made, it has to be recorded at each site
  - disadvantages
      - loss of local autonomy
      - updates consume time and network traffic

Slide 27: Update propagation
  - problems because of replication
      - data might become less available
  - primary copy scheme
      - one copy is designated the primary copy (unique)
      - primary copies exist at different sites (distributed)
      - an update is logically complete once the primary copy has been updated
      - the site holding the primary copy then has to propagate the update
      - violation of local autonomy
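The primary copy scheme can be sketched in a few lines. The following toy model is an illustration under stated assumptions, not a description of how any commercial system implements it: `Site`, `PrimaryCopyItem`, the key name and the pending-update queue are all hypothetical.

```python
class Site:
    """A site holding local copies of replicated items."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class PrimaryCopyItem:
    """One replicated item with a designated (unique) primary copy."""

    def __init__(self, key, primary, secondaries):
        self.key = key
        self.primary = primary
        self.secondaries = secondaries
        # Updates accepted by the primary but not yet applied at a secondary.
        self.pending = {s.name: [] for s in secondaries}

    def update(self, value):
        """The update is logically complete once the primary copy is written."""
        if not self.primary.up:
            raise RuntimeError("primary copy unavailable")
        self.primary.data[self.key] = value
        for s in self.secondaries:
            self.pending[s.name].append(value)
        self.propagate()

    def propagate(self):
        """The primary's site pushes queued updates to every reachable secondary."""
        for s in self.secondaries:
            if s.up:
                for value in self.pending[s.name]:
                    s.data[self.key] = value
                self.pending[s.name].clear()


london, leeds, york = Site("London"), Site("Leeds"), Site("York")
item = PrimaryCopyItem("emp:1:salary", london, [leeds, york])

york.up = False          # one secondary is unreachable
item.update(30000)       # still succeeds: only the primary must be available
missing_during_outage = "emp:1:salary" not in york.data

york.up = True           # the secondary comes back
item.propagate()         # the primary's site catches it up
```

The update succeeds while York is down, which is the availability gain. The cost is that an update cannot proceed when the primary's site is unavailable, and a secondary's copy is changed at the primary's initiative, which is where local autonomy is compromised.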

Slide 28: Concurrency control
  - locking overhead: increased number of messages
  - primary copy strategy
      - locking only the primary copy
      - the primary copy's site will propagate the update
      - loss of autonomy (severe)
  - global deadlock
      - two interlocked sites (waiting for each other)
      - cannot be detected using a site's local wait-for graph alone; therefore, communication overhead
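The global deadlock point can be illustrated with two tiny wait-for graphs. In the sketch below, each edge `(waiter, holder)` means that the first transaction waits for the second. The transactions T1 and T2, the sites X and Y, and the `has_cycle` helper are assumptions made for the example.

```python
def has_cycle(edges):
    """Detect a cycle in a wait-for graph given as (waiter, holder) pairs."""
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, set()).add(holder)
        graph.setdefault(holder, set())

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:
                return True  # back edge: a cycle, hence a deadlock
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in graph)


# Hypothetical situation: at site X, T1 waits for T2; at site Y, T2 waits for T1.
local_x = [("T1", "T2")]
local_y = [("T2", "T1")]

deadlock_seen_at_x = has_cycle(local_x)
deadlock_seen_at_y = has_cycle(local_y)
deadlock_seen_globally = has_cycle(local_x + local_y)
```

Neither local graph contains a cycle, so neither site sees a deadlock on its own. The cycle only appears once the graphs are combined, and building that combined view is what requires the sites to exchange messages.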


Slide 30: Conclusion
  - generalities
  - objectives (in brief)
  - problems (in brief)

