PMIT-6102 Advanced Database Systems


1 PMIT-6102 Advanced Database Systems
By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University

2 Lecture -14 Parallel Database Systems

3 Outline
Parallel Database Systems
Fundamental Functional Architecture
Parallel DBMS Architectures: shared-memory, shared-disk and shared-nothing

4 Parallel Database Systems
A parallel computer, or multiprocessor, is a special kind of distributed system made of a number of nodes (processors, memories and disks) connected by a very fast network within one or more cabinets in the same room. Data distribution can be exploited to increase performance (through parallelism) and availability (through replication). They can support very large databases with very high loads. Implementation of parallel database systems naturally relies on distributed database techniques.

5 Advantages
A parallel database system should provide the following advantages.
High performance: Parallelism can increase throughput, using inter-query parallelism, and decrease transaction response times, using intra-query parallelism.
Inter-query parallelism is a form of parallelism in the evaluation of database queries in which several different queries execute concurrently on multiple processors to improve the overall throughput of the system.
Intra-query parallelism is a form of parallelism in the evaluation of database queries in which a single query is decomposed into smaller tasks that execute concurrently on multiple processors.
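The difference between the two forms of parallelism can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the partitioned table `PARTITIONS`, the sub-tasks `scan_sum` and `count_even`, and the helper names are hypothetical, and Python threads merely stand in for the processors of a parallel machine.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "table", split into four partitions.
PARTITIONS = [list(range(i, 100, 4)) for i in range(4)]


def scan_sum(partition):
    """Sub-task: sum the values of one partition."""
    return sum(partition)


def count_even(partition):
    """Sub-task: count the even values of one partition."""
    return sum(1 for v in partition if v % 2 == 0)


def run_query_serial(subtask):
    """Evaluate one whole query on a single worker, partition after partition."""
    return sum(subtask(p) for p in PARTITIONS)


def inter_query(queries, workers=4):
    """Inter-query parallelism: several different queries run concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_query_serial, queries))


def intra_query(subtask, workers=4):
    """Intra-query parallelism: one query is split into per-partition tasks."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(subtask, PARTITIONS))
```

Calling `inter_query([scan_sum, count_even])` runs two queries side by side, which helps throughput, while `intra_query(scan_sum)` spreads a single query over all partitions, which helps its response time.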

6 Advantages High-availability
Because a parallel database system consists of many redundant components, it can increase data availability and fault tolerance. Replicating data at several nodes is useful to support failover, a fault-tolerance technique that enables automatic redirection of transactions from a failed node to another node that stores a copy of the data. This provides uninterrupted service to users.
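Failover can be illustrated with a minimal sketch. The `Node` class, the `NodeDown` exception and `read_with_failover` are hypothetical names introduced for this example; a real system would also have to detect failures and keep the copies synchronized.

```python
class NodeDown(Exception):
    """Raised when a request is sent to a failed node."""


class Node:
    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)  # this node's copy of the data
        self.alive = True

    def read(self, key):
        if not self.alive:
            raise NodeDown(self.name)
        return self.data[key]


def read_with_failover(replicas, key):
    """Try each replica in turn and redirect to the next one on failure."""
    for node in replicas:
        try:
            return node.name, node.read(key)
        except NodeDown:
            continue  # automatic redirection to another node holding a copy
    raise RuntimeError("no live replica holds the data")
```

If the first replica is marked as failed, the read is served by the second one and the caller never sees the failure.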

7 Advantages Extensibility
Extensibility is the ability to expand the system smoothly by adding processing and storage power to the system. Ideally, the parallel database system should demonstrate two extensibility advantages: linear speedup and linear scale-up. Linear speedup refers to a linear increase in performance for a constant database size while the number of nodes (i.e., processing and storage power) is increased linearly. Linear scale-up refers to sustained performance for a linear increase in both database size and number of nodes.
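Both metrics are usually computed as ratios of elapsed times. The sketch below assumes those common ratio definitions, which the slides do not spell out; the function names and the 5% tolerance are illustrative choices.

```python
def speedup(time_small_system, time_large_system):
    """Constant database size: elapsed time on the small system divided by
    elapsed time on the larger system."""
    return time_small_system / time_large_system


def scale_up(time_small, time_large):
    """Database size and number of nodes grow by the same factor: elapsed
    time of the small configuration divided by that of the large one."""
    return time_small / time_large


def is_linear_speedup(n_small, n_large, t_small, t_large, tol=0.05):
    """Speedup is linear when it matches the growth factor in nodes."""
    ideal = n_large / n_small
    return abs(speedup(t_small, t_large) - ideal) <= tol * ideal


def is_linear_scale_up(t_small, t_large, tol=0.05):
    """Scale-up is linear when performance is sustained (ratio close to 1)."""
    return abs(scale_up(t_small, t_large) - 1.0) <= tol
```

For example, a query that takes 100 s on 2 nodes and 25 s on 8 nodes shows a speedup of 4 for 4 times as many nodes, which is linear.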

8 Advantages: Extensibility
[Figure: Extensibility Metrics]

9 Functional Architecture
The functions supported by a parallel database system can be divided into three subsystems, much like in a typical DBMS: the Session Manager, the Transaction Manager and the Data Manager.

10 Functional Architecture
Session Manager: It plays the role of a transaction monitor, providing support for client interactions with the server. In particular, it performs the connections and disconnections between the client processes and the two other subsystems. Therefore, it initiates and closes user sessions (which may contain multiple transactions). In the case of OLTP sessions, the session manager is able to trigger the execution of pre-loaded transaction code within data manager modules.

11 Functional Architecture
Transaction Manager: It receives client transactions related to query compilation and execution. It can access the database directory that holds all meta-information about data and programs. Depending on the transaction, it activates the various compilation phases, triggers query execution, and returns the results as well as error codes to the client application. Because it supervises transaction execution and commit, it may trigger the recovery procedure in case of transaction failure. To speed up query execution, it may optimize and parallelize the query at compile-time.

12 Functional Architecture
Data Manager: It provides all the low-level functions needed to run compiled queries in parallel, i.e., database operator execution, parallel transaction support, cache management, etc. If the transaction manager is able to compile dataflow control, then synchronization and communication among data manager modules is possible. Otherwise, transaction control and synchronization must be done by a transaction manager module.

13 Parallel DBMS Architectures
There are three basic parallel computer architectures depending on how main memory or disk is shared: shared-memory, shared-disk and shared-nothing. Hybrid architectures such as NUMA or cluster try to combine the benefits of the basic architectures.

14 Parallel DBMS Architectures
Shared-Memory
In the shared-memory approach, any processor has access to any memory module or disk unit through a fast interconnect (e.g., a high-speed bus or a cross-bar switch). All the processors are under the control of a single operating system. All shared-memory parallel database products today can exploit inter-query parallelism to provide high transaction throughput and intra-query parallelism to reduce the response time of decision-support queries.

15 Parallel DBMS Architectures
Shared-Memory
[Figure: Shared-Memory Architecture]

16 Parallel DBMS Architectures
Shared-Memory
Shared-memory has two strong advantages: simplicity and load balancing.
Simplicity: Since meta-information (directory) and control information (e.g., lock tables) can be shared by all processors, writing database software is not very different than for single-processor computers. Intra-query parallelism requires some parallelization but remains rather simple.
Load balancing: Load balancing is easy to achieve since it can be done at run-time using the shared memory, by allocating each new task to the least busy processor.
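The "least busy processor" rule can be sketched as follows. This is a minimal illustration, assuming each task has a known cost; the function `assign_tasks` is hypothetical, and a heap stands in for the load table that a real system would keep in shared memory.

```python
import heapq


def assign_tasks(task_costs, n_processors):
    """Allocate each new task to the currently least busy processor.

    Returns, for each processor, the list of task indices it received.
    """
    # Heap of (current load, processor id); the smallest load is on top.
    heap = [(0, p) for p in range(n_processors)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_processors)]
    for task_id, cost in enumerate(task_costs):
        load, proc = heapq.heappop(heap)  # least busy processor
        assignment[proc].append(task_id)
        heapq.heappush(heap, (load + cost, proc))
    return assignment
```

With task costs `[5, 3, 2, 4]` and two processors, tasks 0 and 3 go to the first processor and tasks 1 and 2 to the second.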

17 Parallel DBMS Architectures
Shared-Memory
Shared-memory has three problems: high cost, limited extensibility and low availability.
High cost: The cost is incurred by the interconnect, which requires fairly complex hardware because of the need to link each processor to each memory module or disk.
Limited extensibility: With faster processors (even with larger caches), conflicting accesses to the shared memory increase rapidly and degrade performance. Therefore, extensibility is limited to a few tens of processors, typically up to 16 for the best cost/performance using 4-processor boards.
Low availability: Since the memory space is shared by all processors, a memory fault may affect most processors, thereby hurting availability. The solution is to use duplex memory with a redundant interconnect.

18 Parallel DBMS Architectures
Shared-Disk
In the shared-disk approach, any processor has access to any disk unit through the interconnect but exclusive (non-shared) access to its main memory. Each processor-memory node is under the control of its own copy of the operating system. Each processor can then access database pages on the shared disk and cache them into its own memory. Since different processors can access the same page in conflicting update modes, global cache consistency is needed. The first parallel DBMS that used shared-disk is Oracle, with an efficient implementation of a distributed lock manager for cache consistency. Other major DBMS vendors, such as IBM, provide shared-disk implementations.
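Why global cache consistency is needed can be seen in a toy model. The sketch below is a simplified invalidation scheme with write-through, chosen for brevity; it is not a description of Oracle's or any vendor's lock manager, and the classes `SharedDisk`, `LockManager` and `CacheNode` are hypothetical.

```python
class SharedDisk:
    """Disk pages visible to every node."""

    def __init__(self, pages):
        self.pages = dict(pages)


class LockManager:
    """Tracks which nodes cache each page and invalidates stale copies."""

    def __init__(self):
        self.cached_by = {}  # page id -> set of nodes holding a copy

    def register(self, page_id, node):
        self.cached_by.setdefault(page_id, set()).add(node)

    def invalidate_others(self, page_id, writer):
        for node in self.cached_by.get(page_id, set()) - {writer}:
            node.cache.pop(page_id, None)  # drop the stale copy
        self.cached_by[page_id] = {writer}


class CacheNode:
    """A processor-memory node with a private cache of disk pages."""

    def __init__(self, disk, locks):
        self.disk, self.locks = disk, locks
        self.cache = {}  # private main memory

    def read(self, page_id):
        if page_id not in self.cache:
            self.cache[page_id] = self.disk.pages[page_id]
            self.locks.register(page_id, self)
        return self.cache[page_id]

    def write(self, page_id, value):
        self.locks.invalidate_others(page_id, self)
        self.cache[page_id] = value
        self.disk.pages[page_id] = value  # write-through, for simplicity
```

Without the invalidation step, a node that cached the page before the update would keep returning the old value from its own memory.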

19 Shared-Disk

20 Parallel DBMS Architectures
Shared-disk has a number of advantages: lower cost, high extensibility, load balancing, availability, and easy migration from centralized systems.
Lower cost: The cost of the interconnect is significantly less than with shared-memory since standard bus technology may be used.
High extensibility: Given that each processor has enough main memory, interference on the shared disk can be minimized. Thus, extensibility can be better, typically up to a hundred processors.
Availability: Since memory faults can be isolated from other nodes, availability can be higher.

21 Shared-Nothing
In the shared-nothing approach, each processor has exclusive access to its main memory and disk unit(s). Similar to shared-disk, each processor-memory-disk node is under the control of its own copy of the operating system. Each node can be viewed as a local site (with its own database and software) in a distributed database system. Therefore, most solutions designed for distributed databases, such as database fragmentation, distributed transaction management and distributed query processing, may be reused. Using a fast interconnect, it is possible to accommodate large numbers of nodes. This architecture is often called Massively Parallel Processor (MPP).
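Database fragmentation across shared-nothing nodes can be sketched with hash partitioning. This is one possible fragmentation scheme, chosen here for illustration; the functions `hash_partition` and `parallel_sum` are hypothetical, integer keys are assumed, and threads stand in for independent nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def hash_partition(rows, key, n_nodes):
    """Fragment a table horizontally: each row is stored on exactly one node."""
    fragments = [[] for _ in range(n_nodes)]
    for row in rows:
        fragments[hash(row[key]) % n_nodes].append(row)
    return fragments


def parallel_sum(fragments, column):
    """Each node aggregates its own fragment; only partial sums are combined."""
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        partials = pool.map(lambda frag: sum(r[column] for r in frag), fragments)
        return sum(partials)
```

Because every node works only on its local fragment, the interconnect carries one partial result per node rather than the rows themselves.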

22 Shared-Nothing
The first major parallel DBMS product was Teradata’s Database Computer, which could accommodate a thousand processors in its early version. Other major DBMS vendors, such as IBM and Microsoft, provide shared-nothing implementations.

23 Shared-Nothing

24 Shared-Nothing
As demonstrated by the existing products, shared-nothing has three main virtues: lower cost, high extensibility and high availability.
Lower cost: The cost advantage is better than that of shared-disk, which requires a special interconnect for the disks.
High extensibility: By implementing a distributed database design that favors the smooth incremental growth of the system by the addition of new nodes, extensibility can be better (in the thousands of nodes).
High availability: By replicating data on multiple nodes, high availability can also be achieved.

25 Thank You

