A View over Distributed Databases
Vinod Bobba
Illinois State University
Billy Lim, ITK 478
1/16/2019
Agenda
- Introduction
- Point of view
  - Distributed over Centralized
  - Fragmentation
  - Distributed Concurrency Control
- Conclusion
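Distributed Databases
- A global database is defined as though it were centralized.
- Portions of it are distributed across a variety of interconnected sites.
- Types
  - Homogeneous: the same DBMS runs at every site
  - Heterogeneous: different DBMSs run at different sites and are connected through gateway protocols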
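The idea of a global database that "looks centralized" can be illustrated with a global catalog that records where each relation lives, so users name only the relation and never the site. This is a minimal sketch; the `GlobalCatalog` class and the relation and site names are hypothetical.

```python
# Hypothetical global catalog: class, relation, and site names are illustrative only.
from dataclasses import dataclass, field


@dataclass
class GlobalCatalog:
    # Maps each relation name to the site that stores it.
    locations: dict = field(default_factory=dict)

    def register(self, relation: str, site: str) -> None:
        self.locations[relation] = site

    def route(self, relation: str) -> str:
        """Return the site a query on `relation` should be sent to.

        The caller names only the relation, never the site, so the
        database looks centralized from the user's point of view.
        """
        try:
            return self.locations[relation]
        except KeyError:
            raise LookupError(f"unknown relation: {relation}") from None


catalog = GlobalCatalog()
catalog.register("customers", "chicago")
catalog.register("orders", "springfield")
print(catalog.route("orders"))  # springfield
```

In a real system the catalog itself is data that must be stored, and often replicated, somewhere; the sketch keeps it in one in-memory dictionary for simplicity.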
Distributed over Centralized
- Reflects the organizational structure
- No single point of failure
- Better performance, since data can be stored close to where it is used
- Easier to modify and expand
Data Replication
- Advantages
  - Reliability and fast response
  - May avoid complicated distributed transaction integrity routines
  - Decouples nodes
- Disadvantages
  - Additional storage space is required.
  - Update operations take additional time.
  - Integrity exposure: incorrect data may be read if replicated copies are not updated simultaneously.
  - Therefore, replication is better suited to non-volatile data.
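The trade-off between update time and integrity exposure can be shown with a toy replicated item that supports both an eager (synchronous) write and a lazy (asynchronous) write. This is a sketch under simplifying assumptions: each site's copy is just a dictionary entry, and the `ReplicatedItem` class and site names are hypothetical.

```python
class ReplicatedItem:
    """Toy model of one data item copied to several sites (illustrative only)."""

    def __init__(self, sites, value):
        # Every site starts with its own copy of the same value.
        self.copies = {site: value for site in sites}
        self.pending = []  # updates not yet propagated (lazy mode only)

    def write_eager(self, value):
        # Synchronous update: every copy changes before the write returns.
        # Consistent, but the write costs time at every site.
        for site in self.copies:
            self.copies[site] = value

    def write_lazy(self, origin, value):
        # Asynchronous update: only the origin copy changes now;
        # the other copies are refreshed later by propagate().
        self.copies[origin] = value
        self.pending.append(value)

    def propagate(self):
        # Apply queued updates to every copy, in order.
        for value in self.pending:
            for site in self.copies:
                self.copies[site] = value
        self.pending.clear()

    def read(self, site):
        # Reads are local and fast, the main benefit of replication.
        return self.copies[site]


item = ReplicatedItem(["A", "B"], 100)
item.write_lazy("A", 120)
print(item.read("B"))  # 100: a stale read before propagation
item.propagate()
print(item.read("B"))  # 120
```

The stale read at site B is exactly the integrity exposure the slide warns about; for non-volatile data, writes are rare, so that window seldom matters.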
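Fragmentation
- Importance
- Breaking relations into fragments
  - Horizontal fragmentation: each fragment is a subset of the rows (a selection)
  - Vertical fragmentation: each fragment is a subset of the columns (a projection) that keeps the key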
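Both kinds of fragmentation, and how each is reassembled, can be sketched on a small in-memory relation. The `employees` rows, the `emp_id` key, and the helper functions are hypothetical; rows are plain dictionaries rather than tuples in a DBMS.

```python
# Toy EMPLOYEE relation; all data is made up for illustration.
employees = [
    {"emp_id": 1, "name": "Ana", "site": "Chicago", "salary": 70000},
    {"emp_id": 2, "name": "Raj", "site": "Normal", "salary": 65000},
    {"emp_id": 3, "name": "Lee", "site": "Chicago", "salary": 80000},
]


def horizontal_fragment(rows, predicate):
    # Selection: keep whole rows that satisfy the predicate.
    return [row for row in rows if predicate(row)]


def vertical_fragment(rows, columns, key="emp_id"):
    # Projection: keep the chosen columns plus the key so fragments can be rejoined.
    keep = [key] + [c for c in columns if c != key]
    return [{c: row[c] for c in keep} for row in rows]


chicago = horizontal_fragment(employees, lambda r: r["site"] == "Chicago")
other = horizontal_fragment(employees, lambda r: r["site"] != "Chicago")
# Horizontal fragments are reassembled with a union.
assert sorted(chicago + other, key=lambda r: r["emp_id"]) == employees

public = vertical_fragment(employees, ["name", "site"])
payroll = vertical_fragment(employees, ["salary"])
# Vertical fragments are reassembled with a join on the key.
by_key = {r["emp_id"]: r for r in payroll}
rejoined = [{**r, **by_key[r["emp_id"]]} for r in public]
assert rejoined == employees
```

Keeping the key in every vertical fragment is what makes the join lossless; without it, salaries could not be matched back to names.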
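Fragmentation: Allocating Fragments
- Goal: minimize the cost of transmission
- Allocation strategies
  - Redundant: a fragment may be stored at several sites
  - Nonredundant: each fragment is stored at exactly one site
- The best allocation depends on the query type
- The allocation problem can be expressed as a linear integer formulation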
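A tiny nonredundant allocation problem can be written as a 0-1 integer program: choose x[f][s] in {0, 1} with exactly one site per fragment, respect each site's capacity, and minimize transmission cost. The sketch below solves such an instance by exhaustive enumeration instead of an integer-programming solver, which only works for very small inputs. The cost model (every remote access ships the whole fragment), the capacity constraint, and all numbers are assumptions for illustration.

```python
from itertools import product

# Hypothetical inputs: fragment sizes and per-site access frequencies.
sizes = {"F1": 10, "F2": 4, "F3": 6}
freq = {
    "F1": {"S1": 5, "S2": 1},
    "F2": {"S1": 3, "S2": 2},
    "F3": {"S1": 4, "S2": 1},
}
sites = ["S1", "S2"]
capacity = {"S1": 2, "S2": 2}  # assumed limit on fragments per site


def transmission_cost(assignment):
    # Each access from a site that does not hold the fragment ships the whole fragment.
    return sum(
        freq[f][t] * sizes[f]
        for f, s in assignment.items()
        for t in sites
        if t != s
    )


def best_allocation():
    frags = list(sizes)
    best, best_cost = None, float("inf")
    # Enumerate every assignment with exactly one site per fragment.
    for choice in product(sites, repeat=len(frags)):
        if any(choice.count(s) > capacity[s] for s in sites):
            continue  # violates a site's capacity
        assignment = dict(zip(frags, choice))
        cost = transmission_cost(assignment)
        if cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost


print(best_allocation())  # ({'F1': 'S1', 'F2': 'S2', 'F3': 'S1'}, 28)
```

Without the capacity limit every fragment would simply go to the site that uses it most; the constraint is what makes this a genuine integer program, and with more fragments and sites a real ILP solver would replace the brute-force loop.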
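Distributed Concurrency Control
- Transactions and the transaction manager
- Lock management
  - Centralized: a single site manages all locks
  - Primary copy: locks for an object are managed at the site holding its primary copy
  - Fully distributed: each site manages locks for the data stored there
- Deadlock detection: wait-for graphs
- Phantom deadlocks: deadlocks that are detected but do not actually exist, caused by delays in propagating wait-for information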
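Global deadlock detection with wait-for graphs can be sketched as merging the local graphs and searching for a cycle. In this minimal example each site sees no cycle on its own, but the union does. The transaction names and the two-site split are hypothetical.

```python
# Local wait-for graphs: an edge T1 -> T2 means T1 waits for a lock held by T2.
site_a = {"T1": {"T2"}}
site_b = {"T2": {"T1"}}


def merge(*graphs):
    # Build the global wait-for graph as the union of the local ones.
    merged = {}
    for g in graphs:
        for tx, waits_for in g.items():
            merged.setdefault(tx, set()).update(waits_for)
    return merged


def has_cycle(graph):
    # Depth-first search; reaching a node already on the current path means a cycle.
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GREY
        for nxt in graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


print(has_cycle(site_a), has_cycle(site_b))  # False False
print(has_cycle(merge(site_a, site_b)))      # True
```

If the local graphs are collected at different moments, a stale edge (for example, from a transaction that has since aborted) can close a cycle that no longer exists; that is how a phantom deadlock arises.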
Distributed Concurrency Control (continued)
- Distributed recovery
  - Aborting transactions
  - Commit protocols
    - Two-phase commit (2PC): coordinates the activities of the different sites involved in a transaction
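The two phases of 2PC can be sketched as a coordinator that first collects a vote from every participant and then broadcasts one global decision. This is a simplified model: the `Participant` class and its `can_commit` flag are hypothetical stand-ins for a site's local checks, and logging, timeouts, and site failures are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # A hypothetical site; `can_commit` stands in for its local checks.
    name: str
    can_commit: bool = True
    state: str = "active"

    def prepare(self) -> bool:
        # Phase 1: vote. A real site would force-write a log record before voting.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, decision: str) -> None:
        # Phase 2: apply the coordinator's global decision.
        self.state = decision


def two_phase_commit(participants):
    # Phase 1: collect votes from every site involved in the transaction.
    votes = [p.prepare() for p in participants]
    # Commit only if all votes are yes; one "no" aborts the transaction everywhere.
    decision = "committed" if all(votes) else "aborted"
    # Phase 2: broadcast the decision to every participant.
    for p in participants:
        p.finish(decision)
    return decision


print(two_phase_commit([Participant("A"), Participant("B")]))  # committed
print(two_phase_commit([Participant("A"), Participant("B", can_commit=False)]))  # aborted
```

The key property is atomicity: every site ends in the same state, either all committed or all aborted.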
Better Performance
- Exploit parallelism
  - Interquery: different queries run in parallel
  - Intraquery: parts of a single query run in parallel
- Commercial systems
  - Concurrency control and recovery protocols are required for synchronization
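Intraquery parallelism can be sketched as a scatter-gather query: one AVG query is split into per-fragment subqueries that run concurrently, and their partial results are combined. Here threads from Python's `concurrent.futures` stand in for remote sites, which is an assumption for illustration; the fragment data and helper names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical horizontal fragments of one SALES relation, one list per site.
fragments = {
    "site1": [120, 80, 45],
    "site2": [300, 15],
    "site3": [60, 60, 60],
}


def local_aggregate(rows):
    # Each site computes a partial result over its own fragment.
    return sum(rows), len(rows)


def parallel_average(frags):
    # Run the per-fragment subqueries concurrently (threads stand in for sites).
    with ThreadPoolExecutor(max_workers=len(frags)) as pool:
        partials = list(pool.map(local_aggregate, frags.values()))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    # An average of averages would be wrong; combine SUM and COUNT instead.
    return total / count


print(parallel_average(fragments))  # 92.5
```

Shipping only (sum, count) pairs instead of whole fragments also keeps communication cost low.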
Problems
- Network scaling
- Complex data
- New database technologies

Needed
- Advanced replica control protocols
- Advanced transaction models
- Processing methods
Pros
- Increased reliability and availability
- Local control over data
- Modular growth
- Lower communication costs
- Faster response for certain queries
Cons
- Processing overhead
- Data integrity exposure
- Slower response for certain queries
- Software cost and complexity
Q&A
Questions are welcome.