1 Institut für Scientific Computing – Universität Wien, P. Brezany. Parallel and Distributed Database Systems (Parallele und Verteilte Datenbanksysteme). Univ.-Prof. Dr. Peter Brezany, Institut für Scientific Computing, Universität Wien. Tel Office hours: Tue, Course portal:

2 Motivation. [Figure labels: Business; Medicine; Scientific experiments; Simulations; Earth observations; Data and data exploration cloud]

3 The Knowledge Discovery Process. [Figure labels: Cleaning and Integration; Data Warehouse; Selection and Transformation; Data Mining; Evaluation and Presentation; Knowledge; OLAP; Online Analytical Mining; OLAP Queries]

4 Data Preprocessing (Fig. 3.1)

5 EcoGRID Sketch. [Figure labels: Waste; Air; Soil; Water; Emissions; Biodiversity; Forests; Distributed Data Flow Analysis; Geo-Statistic Reporting; Popular Presentation; Prediction Models; Distributed Applications; Distributed Data Mining; …; Statistic; Common Ontology]

6 Management of TBI Patients. Traumatic brain injuries (TBIs) typically result from accidents in which the head strikes an object. The treatment of TBI patients is very resource-intensive. The trajectory of TBI patient management: – Trauma event – First aid – Transportation to hospital – Acute hospital care – Home care. All of these phases involve collecting data into databases, which are currently managed by individual hospitals. Usage of mobile communication devices.

7 Data Mining Accuracy vs. Data Size. [Figure: accuracy plotted against sampled data size; labels: 100%; available data size; assumed]

8 The GridMiner Project in Vienna. GridMiner: a knowledge discovery Grid infrastructure (http://www.gridminer.org/) – OGSA-based architecture – Workflow management – Grid-aware data preprocessing and data mining services – Data mediation service – OLAP service – GUI – Current implementation on top of Globus Toolkit 3.2. Applications: exploration of ecological data, management of patients with traumatic brain injuries. Research exhibition available.

9 Literature: on the course web page.

10 Distributed Memory Architecture (Shared Nothing). [Figure labels: Local Memory (×4); CPU; Interconnection Network]

11 DMM: Shared Disk Architecture. [Figure labels: Local Memory (×4); CPU; Interconnection Network; Global Shared Disk Subsystem]

12 Shared Memory Architecture (Shared Everything, SMP). [Figure labels: CPU; Interconnection Network; Global Shared Memory]

13 Cluster of SMPs. [Figure labels: 4-CPU SMP (×4); CPU; Interconnection Network]

14 High-Performance I/O Systems

15

16 Note: RAID technology is introduced in a separate set of lecture notes.

17 Main literature: Principles of Distributed Database Systems

18 Distributed Database System (DDBS) Technology – Introduction. DDBS technology is the union of what appear to be two diametrically opposed approaches to data processing: database systems and computer network technologies. Database systems have taken us from a paradigm of data processing in which each application defined and maintained its own data (figure follows) to one in which the data is defined and administered centrally (figure follows). The result is data independence: the application programs are immune to changes in the logical or physical organization of the data, and vice versa. One of the major motivations is the desire to integrate the operational data of an enterprise and to provide centralized, and thus controlled, access to that data.

19 DDBS – Introduction (cont.) The technology of computer networks promotes a mode of work that goes against all centralization efforts. How can these two contrasting approaches be synthesized to produce a technology that is more powerful and more promising than either one alone? – The key is the realization that the most important objective of database technology is integration, not centralization. Neither of these terms necessarily implies the other. – It is possible to achieve integration without centralization, and that is exactly what distributed database technology attempts to achieve.

20 Distributed Database System Technology - Introduction

21

22 Central Database on a Network - Example. [Figure labels: Communication Network; Boston; Paris; San Francisco; Edmonton]

23 Distributed Database System (DDBS) - Definitions. Definition 1: Distributed database. A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network. Definition 2: Distributed database management system (distributed DBMS). It is the software system that permits the management of the distributed database and makes the distribution transparent to the users. A DDBS is not a "collection of files" that can be individually stored at each node of a computer network. To form a DDBS, the files must not only be logically related; there must also be structure among them, and access must be via a common interface. The physical distribution of data is very important: it creates problems that are not encountered when the databases reside in the same computer system.

24 Promises of DDBSs 1. Transparent Management of Distributed and Replicated Data. Transparency refers to the separation of the higher-level semantics of a system from lower-level implementation issues; a transparent system "hides" the implementation details from the user. Example (next slide): Consider an engineering firm that has offices in several cities. – It is preferable to localize the data so that data about the employees in the Edmonton office is stored in Edmonton, and so forth. The same applies to the project information. In this process we partition each of the relations and store each partition at a different site; this is known as fragmentation. – It may be preferable to duplicate some of this data at other sites for performance and reliability reasons. The result is a distributed database that is fragmented and replicated. Fully transparent access means that users can still pose queries in the same form as to a centralized system, without paying any attention to the fragmentation, location, or replication of data, and let the system worry about resolving these issues.
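The fragmentation and replication described on this slide can be illustrated with a small sketch. It is not taken from the lecture: the employee rows, the `DistributedEmployees` class, and the choice to replicate the Boston fragment at Paris are assumptions made purely for illustration. The point is that the caller of `select` never names a site.

```python
# Hypothetical data: employee rows tagged with the office they belong to.
EMPLOYEES = [
    {"eno": "E1", "name": "J. Doe", "office": "Edmonton"},
    {"eno": "E2", "name": "M. Smith", "office": "Boston"},
    {"eno": "E3", "name": "A. Lee", "office": "Paris"},
    {"eno": "E4", "name": "L. Chu", "office": "Boston"},
]


def fragment_by_office(rows):
    """Horizontal fragmentation: one fragment per office value."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row["office"], []).append(row)
    return fragments


class DistributedEmployees:
    """Toy system (illustrative only): each site stores one or more fragments."""

    def __init__(self, rows, replicas):
        self.sites = {}  # site name -> {fragment name -> rows}
        for office, frag in fragment_by_office(rows).items():
            # One copy at the office's own site, plus copies at replica sites.
            for site in [office] + replicas.get(office, []):
                self.sites.setdefault(site, {})[office] = list(frag)

    def select(self, predicate):
        """Query the logical relation; the caller never mentions a site."""
        seen, result = set(), []
        for fragments in self.sites.values():
            for rows in fragments.values():
                for row in rows:
                    # Replicated rows are returned only once.
                    if row["eno"] not in seen and predicate(row):
                        seen.add(row["eno"])
                        result.append(row)
        return sorted(result, key=lambda r: r["eno"])


db = DistributedEmployees(EMPLOYEES, replicas={"Boston": ["Paris"]})
boston_staff = db.select(lambda r: r["office"] == "Boston")
print([r["eno"] for r in boston_staff])
```

A real distributed DBMS would consult a directory to pick one copy instead of scanning every site; the sketch scans everything only to keep the transparency idea visible.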

25 Distributed Database System Environment - Example. [Figure labels: Communication Network connecting Boston, Paris, San Francisco, Edmonton. Stored data, grouped in the order the labels appear: Boston employees, Paris employees, Boston projects | Paris employees, Paris projects, Boston employees, Boston projects | Edmonton employees, Paris projects, Edmonton projects | San Francisco employees, San Francisco projects]

26 Promises of DDBSs 2. Reliability Through Distributed Transactions. Distributed DBMSs are intended to improve reliability, since they have replicated components and thereby eliminate single points of failure. The failure of a single site, or the failure of a communication link that makes one or more sites unreachable, is not sufficient to bring down the entire system. In the case of a distributed database, this means that some of the data may be unreachable, but with proper care, users may be permitted to access other parts of the distributed database. The "proper care" comes in the form of support for distributed transactions.
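As a rough illustration of why one failed site need not bring down the whole system, the sketch below reads a replicated item and falls back to another copy when a site is unreachable. It shows only replica failover for reads, not a distributed transaction protocol; the `Site` class, `SiteDown` exception, and the sample data are hypothetical.

```python
class SiteDown(Exception):
    """Raised when a site (or the link to it) is unavailable."""


class Site:
    """Hypothetical site holding a local copy of some data items."""

    def __init__(self, name, data, up=True):
        self.name, self.data, self.up = name, dict(data), up

    def read(self, key):
        if not self.up:
            raise SiteDown(self.name)
        return self.data[key]


def read_with_failover(key, replicas):
    """Try each replica in turn; a single failed site does not fail the read."""
    for site in replicas:
        try:
            return site.read(key), site.name
        except SiteDown:
            continue  # this copy is unreachable, try the next one
    raise SiteDown(f"no reachable replica holds {key!r}")


boston = Site("Boston", {"P1": "CAD upgrade"}, up=False)  # simulated failure
paris = Site("Paris", {"P1": "CAD upgrade"})
value, served_by = read_with_failover("P1", [boston, paris])
print(value, "served by", served_by)
```

Updates are the harder case: keeping all copies consistent when some sites are down is what distributed transaction support has to provide.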

27 Promises of DDBSs 3. Improved Performance. 1. A distributed DBMS fragments the conceptual database, enabling data to be stored in close proximity to its points of use. 2. The inherent parallelism of distributed systems may be exploited for inter-query and intra-query parallelism. Inter-query parallelism results from the ability to execute multiple queries at the same time. Intra-query parallelism is achieved by breaking up a single query into a number of subqueries, each of which is executed at a different site and accesses a different part of the distributed database.
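Intra-query parallelism can be sketched as follows, assuming a hypothetical PROJECTS relation fragmented across three sites. Threads stand in for the sites here; in a real system each subquery would run on a different machine. All names and figures are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fragments of a PROJECTS relation, one per site: (pno, budget).
FRAGMENTS = {
    "Boston": [("P1", 150_000), ("P2", 90_000)],
    "Paris": [("P3", 250_000)],
    "Edmonton": [("P4", 310_000), ("P5", 40_000)],
}


def local_subquery(site, min_budget):
    """Subquery executed at one site: scans only that site's fragment."""
    return [pno for pno, budget in FRAGMENTS[site] if budget >= min_budget]


def global_query(min_budget):
    """Split one query into a subquery per site, run them concurrently,
    and union the partial results."""
    with ThreadPoolExecutor(max_workers=len(FRAGMENTS)) as pool:
        partials = pool.map(lambda s: local_subquery(s, min_budget), FRAGMENTS)
        return sorted(pno for part in partials for pno in part)


print(global_query(100_000))
```

Inter-query parallelism would instead submit several independent calls to `global_query` at the same time.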

28 Promises of DDBSs 4. Easier System Expansion. In a distributed environment, it is much easier to accommodate increasing database sizes. Major system overhauls are seldom necessary; expansion can usually be handled by adding processing and storage power to the network. A linear increase in "power" may not always be attainable, since it also depends on the overhead of distribution. It normally costs much less to put together a system of smaller computers with the equivalent power of a single big machine.

29 Problem Areas: – Distributed database design – Distributed query processing – Distributed directory management – Distributed concurrency control – Distributed deadlock management – Heterogeneous databases

30 Distributed DBMS Architecture. The architecture of a system defines its structure: the components of the system are identified, the function of each component is specified, and the interrelationships and interactions among these components are defined. In this part we classify DBMS architectures. These are idealized views; many research and commercially available systems may deviate from them. We use a classification (next slides) that characterizes systems with respect to (1) the autonomy of local systems, (2) their distribution, and (3) their heterogeneity.

31 Autonomy. Autonomy refers to the distribution of control, not of data. It indicates the degree to which individual DBMSs can operate independently. Requirements of an autonomous system: – The local operations of the individual DBMSs are not affected by their participation in a multidatabase system. – The manner in which the individual DBMSs process and optimize queries should not be affected by the execution of global queries that access multiple databases. – System consistency or operation should not be compromised when individual DBMSs join or leave the multidatabase confederation.

32 Distribution. Whereas autonomy refers to the distribution of control, the distribution dimension of the taxonomy deals with data. DBMSs have been distributed in a number of ways. We abstract two alternative classes: – client/server distribution – peer-to-peer distribution (or full distribution)

33 Heterogeneity. Heterogeneity may occur in different forms: – hardware – data models – query languages – transaction management protocols

34 Architecture Model

35 DBMS Architectures: – Client/server architecture (not of interest for this course) – Distributed database architecture – Multidatabase architecture

36 Client/Server Architecture. Typically there is one central database server and a larger number of networked workstations that store no relevant data. The user at the workstation sees the full functionality of the DBMS. The system behaves like a centralized database system; the communication is transparent to the user.

37 Client/Server Architecture (cont.)

38 Distributed Database System. Here there are several database servers, and particular data may be stored on only one machine or on several (replicated). It is a virtual database whose components are physically mapped onto a number of different, actually existing DBMSs. Transactions can in this case span several DBMSs. It is a collection of data that: – belongs to the same system because of common, linking properties – is distributed over different computers in the network – where each computer has its own database – and can carry out tasks locally and autonomously

39 Distributed Database System (cont.) – Simultaneous use of the computing power of several machines – The data-access bottleneck of centralized database systems is avoided because the data is distributed (and possibly replicated) – The data is managed by one database system – Distribution transparency – Basis: 4-level schema architecture

40 Review: ANSI/SPARC Architecture. [Figure labels: Users; External views (External Schema); Conceptual view (Conceptual Schema); Internal view (Internal Schema)] The internal view deals with the physical definition and organization of data. The location of data on different storage devices and the access mechanisms used to reach and manipulate data are the issues dealt with at this level. The conceptual schema is an abstract definition of the database; it is the "real view" of the enterprise being modeled in the database. The requirements of individual applications or the restrictions of the physical storage media are not considered. The external view is concerned with how users view the database. An individual user's view represents the portion of the database that will be accessed by that user, as well as the relationships that the user would like to see among the data. A view can be shared among a number of users.
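The three levels can be made concrete with a small example using Python's built-in `sqlite3` module. This is only a sketch: the `emp` table, the `emp_directory` view, and the index name are invented, and mapping an index to the "internal level" is a simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: the full logical definition of the relation.
conn.execute(
    "CREATE TABLE emp (eno TEXT PRIMARY KEY, name TEXT, office TEXT, salary INTEGER)"
)

# Internal level (simplified): a physical access structure.
# Queries never mention it; the DBMS decides whether to use it.
conn.execute("CREATE INDEX emp_office_idx ON emp (office)")

# External view: the portion of the database one user group sees (no salary).
conn.execute("CREATE VIEW emp_directory AS SELECT eno, name, office FROM emp")

conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [
        ("E1", "J. Doe", "Edmonton", 70000),
        ("E2", "M. Smith", "Boston", 82000),
        ("E3", "A. Lee", "Paris", 65000),
    ],
)

# A user of the external view queries it like an ordinary relation.
rows = conn.execute(
    "SELECT name FROM emp_directory WHERE office = ? ORDER BY name", ("Paris",)
).fetchall()
columns = [d[0] for d in conn.execute("SELECT * FROM emp_directory").description]
print(rows, columns)
```

Dropping or adding the index changes nothing in the queries above, which is the data independence the layered architecture is meant to provide.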

41 Distributed Database System (cont.): 4-Level Schema Architecture. [Figure labels: external schema 1 ... external schema N; global conceptual schema; local conceptual schema; local internal schema]

42 Functional Schematic of an Integrated Distributed DBMS. The global directory/dictionary (GD/D) permits the required global mappings. Local mappings are performed by a local directory/dictionary (LD/D).

43 Components of a Distributed DBMS. User processor: 1. The user interface handler is responsible for interpreting user commands and formatting the result data. 2. The semantic data controller uses the integrity constraints and authorizations that are defined as part of the global conceptual schema to check whether the user query can be processed. 3. The global query optimizer and decomposer determines an execution strategy that minimizes a cost function, and translates the global queries into local ones using the global and local conceptual schemas as well as the global directory. 4. The distributed execution monitor coordinates the distributed execution of the user request. Data processor: 1. The local query optimizer is responsible for choosing the best access path to any data item. (The term access path refers to the data structures and algorithms that are used to access data. A typical access path is an index on one or more attributes of a relation.) 2. The local recovery manager is responsible for making sure that the local database remains consistent. 3. The run-time support processor physically accesses the database according to the physical commands in the schedule generated by the query optimizer.
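The decomposer's job of translating a global query into local ones can be sketched in a few lines. Everything here is hypothetical: the layout of `GLOBAL_DIRECTORY`, the local table names, and the restriction to a single equality predicate on `office` are simplifying assumptions, and no cost-based optimization is attempted.

```python
# Hypothetical global directory: global relation -> fragments, each recorded as
# (site, local table name, office value covered by the fragment).
GLOBAL_DIRECTORY = {
    "EMP": [
        ("Boston", "emp_boston", "Boston"),
        ("Paris", "emp_paris", "Paris"),
        ("Edmonton", "emp_edmonton", "Edmonton"),
    ]
}


def decompose(relation, office=None):
    """Translate a global selection on `relation` into per-site local queries.

    Sites whose fragment cannot contain matching rows are skipped.
    """
    plan = []
    for site, local_table, covered in GLOBAL_DIRECTORY[relation]:
        if office is not None and office != covered:
            continue  # this fragment holds no rows for the requested office
        where = f" WHERE office = '{office}'" if office is not None else ""
        plan.append((site, f"SELECT * FROM {local_table}{where}"))
    return plan


for site, sql in decompose("EMP", office="Paris"):
    print(site, "->", sql)
```

The distributed execution monitor would then ship each local query to its site and combine the results; a production system would also use bound parameters instead of building SQL strings.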

44 Multidatabase System. – An MDBS is a federation of several database systems. – The conceptual schema represents only the part of the data that the local DBMSs are willing to share. – Local applications can access each DBS. – Each DBS may contain data that has no relationship to the data of other DBSs.

45 Multidatabase System: Model with a Global Conceptual Schema. [Figure labels: LES (local external schema); GES (global external schema); GKS (global conceptual schema); LKS 1 ... LKS n (local conceptual schemas); LIS 1 ... LIS n (local internal schemas)]

46 Multidatabase System (cont.): Model without a Global Conceptual Schema. [Figure labels: ES 1, ES 2, ... ES n (external schemas); LKS 1, LKS 2, LKS 3 (local conceptual schemas); LIS 1, LIS 2, LIS 3 (local internal schemas); Multidatabase layer; Local system layer]

47 Components of an MDBS

48 Directory Management Strategies - Alternatives

