EM416: Choosing the Correct Data Movement Technology. Chris Kleisath, Director of Engineering, iAnywhere Solutions

Overview When choosing a data movement technology you need to consider: The business requirements for the distributed database. The technological limitations of your environment. The development and administrative resources available.

[Diagram: enterprise data movement between a central datastore and mobile computing, embedded computing, and workgroup computing sites over wired and wireless communication links.]

Distributed Systems Using any data movement technology means we have a distributed database. This almost always implies some form of distributed application.

What is a Distributed System? C.J. Date’s working definition: “A distributed database system consists of a collection of sites, connected together via some sort of network, in which: Each site is a database system in its own right. Sites have agreed to work together (if necessary), so that a user at any site can access data anywhere in the network exactly as if the data were all stored at the user’s own site.”

Distributed Systems Practical Factors Not all systems require that all data be available to all sites. Not all systems require that all data be consistent between all sites all of the time. The degree to which your system must meet the ideal definition is the single biggest factor in choosing your data movement technology.

Issues when distributing data: Local autonomy. Data partitioning (fragmentation). Consistency. Transaction control. Accessibility (connection). Topology.

Local autonomy Each site should operate independently of the other sites. No site should depend on another site for its successful functioning. A centralized database provides the lowest level of local autonomy. Decentralized systems provide the highest level of local autonomy.

Data Partitioning Also known as fragmentation. Only the data needed by a site is present at the site. The database at a site is a “complete” subset of the data. Some data will need to be duplicated between sites.

Data Partitioning: Update Anywhere If multiple sites insert into the same table, primary keys must be unique across the entire distributed system. If multiple sites are able to change the same row, a conflict detection and resolution mechanism is required.
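One common way to meet the uniqueness requirement is to make the site identifier part of every primary key. The sketch below is only an illustration of that idea: `SiteKeyGenerator` and the site names are hypothetical, and real products typically supply their own key-allocation mechanisms.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Iterator, Tuple


@dataclass
class SiteKeyGenerator:
    """Hands out primary keys of the form (site_id, local_sequence)."""
    site_id: str
    _counter: Iterator[int] = field(
        default_factory=lambda: count(1), init=False, repr=False
    )

    def next_key(self) -> Tuple[str, int]:
        # The site_id component keeps keys from different sites disjoint,
        # so sites never need to coordinate at insert time.
        return (self.site_id, next(self._counter))


waterloo = SiteKeyGenerator("waterloo")
paris = SiteKeyGenerator("paris")

# Both sites insert independently; no two keys can collide.
keys = [waterloo.next_key(), paris.next_key(), waterloo.next_key()]
```

The local sequence may repeat across sites; uniqueness comes from the pair, not from either part alone.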

Tight vs. Loose Consistency Which version of the data is being used? [Diagram: the Waterloo and Paris sites each hold a copy of the Account table (AcctNo, Bal).]

Consistency Tight consistency requires all data to be in a consistent state. Loose consistency permits data to be “out-of-date”. Latency is the measure of how long it takes the data to become consistent. In some cases the data is never fully consistent, since there are always changes that have not yet been moved.

Transaction Control Your chosen technology must pass the ACID test: Atomicity, Consistency, Isolation, Durability. Only committed data should move. Committed data must move. Failure to successfully move committed data must be detectable. Changes must be applied in the same order on all databases.

Accessibility What kind of “network” do you have between the sites? High-speed LAN/WAN. Low-speed dial-up (RAS). Wireless. Indirect (e-mail, ftp). Internet (HTTP). Sneaker-net.

Topology What kind of relationship exists between the sites? Peer-to-peer: Each site can transfer data to any other site. No centralized master copy can exist. Conflict resolution is extremely difficult because there is no single place to detect and resolve the conflict.

Topology Hierarchical: Each site passes data up and down the hierarchy. A central master copy (consolidated database) exists. Data must pass through the consolidated database to move to another site. Conflict detection and resolution is implemented on the consolidated database.

Other Issues Number of sites: Some technologies are better suited to mass deployment. Vendors: Are the databases at each site the same product? Is the technology commercially available and supported?

Factors Summary Each of the following factors will influence your choice of data movement technologies: Local autonomy. Data partitioning (fragmentation). Consistency. Transaction control. Accessibility (connection). Topology.

Types of Data Movement All technologies can be categorized as one of: Online, Synchronization, or Replication.

Online Changes are made “simultaneously” on all databases. [Diagram: a $100 withdrawal is applied to the Account table (AcctNo, Bal) at both the Waterloo and Paris sites while the user sees “Please wait while your account is being updated…”]

Online In its “simplest” form the application updates all of the databases directly. The underlying technology is normally two-phase commit. Sybase products: EAServer. Not really data movement, but appropriate in some systems.

Online Characteristics: Local Autonomy Very low level of local autonomy. If one site is down the entire system is down. [Diagram: with one of the Waterloo and Paris sites unavailable, the $100 withdrawal against the Account table (AcctNo, Bal) fails with “Sorry the System is Unavailable”.]

Online Characteristics: Data Partitioning Data can be partitioned as required. If the data is partitioned the application must update the row(s) everywhere. Since transactions are applied at all databases simultaneously no primary key or conflict issues arise.

Online Characteristics: Consistency Use when tight consistency is an absolute requirement. Transactions will succeed or fail on all databases.

Online Characteristics: Transaction Control A Distributed Transaction Server (DTS) should be used. Ensures the transaction is applied on all sites or not at all. Very expensive to code yourself. Both ASA and ASE provide support for a DTS.
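The “all sites or not at all” behavior is what a two-phase commit provides. The following is a toy, in-memory sketch of the idea, not how EAServer or any DTS is implemented: the `Participant` class and `two_phase_commit` function are hypothetical names, and a real coordinator would also need durable logging, timeouts, and crash recovery.

```python
class Participant:
    """A stand-in for one site's database in a two-phase commit."""

    def __init__(self, name, balance, available=True):
        self.name = name
        self.balance = balance
        self.available = available
        self._pending = None

    def prepare(self, delta):
        # Phase 1: vote yes only if the site is reachable and the change is valid.
        if not self.available or self.balance + delta < 0:
            return False
        self._pending = delta
        return True

    def commit(self):
        # Phase 2 (success): make the prepared change permanent.
        self.balance += self._pending
        self._pending = None

    def rollback(self):
        # Phase 2 (failure): discard any prepared change.
        self._pending = None


def two_phase_commit(participants, delta):
    """Apply delta at every site, or at none of them."""
    votes = [p.prepare(delta) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


waterloo = Participant("Waterloo", 500)
paris = Participant("Paris", 500)
ok = two_phase_commit([waterloo, paris], -100)      # both sites up

paris.available = False
failed = two_phase_commit([waterloo, paris], -100)  # one site down
```

With one site unavailable the second withdrawal is refused everywhere, which is the low local autonomy the Online approach trades for tight consistency.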

Online Characteristics: Accessibility Requires a reliable network connection between sites. Transactions will fail if one database is unavailable. Application speed will be affected by network speed.

Online Characteristics: Topology Typically a peer-to-peer topology. Since all databases are updated at once, no master copy is required.

Online Characteristics: Other Issues “Simple” to understand: looks just like a centralized database. Very few sites can be supported: consider the cost of updating many databases at once. Vendors: heterogeneous environments are “easily” supported.

Synchronization The current state of the data is moved between databases. Can be a complete refresh or only the rows that have changed. Sybase products: MobiLink.

Synchronization [Diagram: two copies of the Product table (sku_key, qty_oh) showing qty_oh values 10, 9, and 8, with the statement UPDATE Product SET qty_oh = 8 WHERE sku_key = 1234.]

Synchronization Characteristics: Local Autonomy High local autonomy. The site database must have all of the data required for the application to run.

Synchronization Characteristics Data Partitioning Data is usually partitioned. Each site has common data and site specific data. Update anywhere requires: Unique primary keys. Conflict detection and resolution mechanism.
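A simple way to detect an update conflict is for the remote to upload the value it started from along with its new value, so the consolidated side can tell whether the row changed in the meantime. The sketch below assumes that approach; `apply_upload` is a hypothetical helper, and resolving by applying the difference is only one possible policy, suited to quantities such as stock on hand.

```python
def apply_upload(consolidated, sku_key, old_qty, new_qty):
    """Apply one uploaded update; return True if a conflict was detected."""
    current = consolidated[sku_key]
    if current == old_qty:
        # No other site changed the row since this remote last synchronized.
        consolidated[sku_key] = new_qty
        return False
    # Conflict: another site changed the row first. Resolve it here by
    # applying the remote's change as a delta (one possible policy).
    consolidated[sku_key] = current + (new_qty - old_qty)
    return True


consolidated = {1234: 10}
first = apply_upload(consolidated, 1234, old_qty=10, new_qty=9)   # remote A sold 1
second = apply_upload(consolidated, 1234, old_qty=10, new_qty=8)  # remote B sold 2
```

Other policies, such as “first update wins” or “consolidated always wins”, fit the same detection step and differ only in the conflict branch.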

Synchronization Characteristics Consistency Low to high consistency. Data is only consistent immediately after synchronization. Frequency of synchronization affects level of consistency but in all cases there is some latency.

Synchronization Characteristics: Transaction Control Transaction boundaries are not maintained. Some operation sequences cannot be synchronized (e.g. an insert followed by a delete of a row with the same primary key value). Most synchronization technologies “batch” the operations, e.g. all deletes, then inserts, then updates.
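To see why transaction boundaries are lost, consider a synchronizer that only compares the table before and after a series of transactions. This is a minimal sketch under that assumption; `net_changes` and the sample rows are hypothetical and do not represent any product's upload format.

```python
def net_changes(before, after):
    """Compare two snapshots of a table (dicts keyed by primary key)."""
    deletes = sorted(k for k in before if k not in after)
    inserts = sorted((k, after[k]) for k in after if k not in before)
    updates = sorted((k, after[k]) for k in after
                     if k in before and before[k] != after[k])
    # Batched by operation type: deletes, then inserts, then updates.
    return {"deletes": deletes, "inserts": inserts, "updates": updates}


before = {1234: 10, 5678: 3}
after = dict(before)

# Transaction 1: insert a new row and update an existing one.
after[9999] = 1
after[1234] = 9
# Transaction 2: delete the new row and update the same existing row again.
del after[9999]
after[1234] = 8

upload = net_changes(before, after)
```

The insert and delete of row 9999 cancel out and the intermediate value 9 never leaves the site; only the final state is moved, regardless of how many transactions produced it.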

Synchronization Characteristics Accessibility Requires a stable network connection during the synchronization process. Connection speed affects the amount of data that can be reasonably synchronized.

Synchronization Characteristics Topology Both peer-to-peer and hierarchical topologies are possible. Peer-to-peer is difficult if update anywhere is permitted. Which copy of the data is correct? Who resolves an update conflict?

Synchronization Characteristics: Other Issues Heterogeneous environments can be supported. Be aware of compatibility issues, e.g. Oracle allows only one LONG column per table, whereas ASA has no such limit on LONG VARCHAR columns. Because each site synchronizes independently, many sites can be supported.

MobiLink [Diagram: consolidated databases (ASA, ASE, Microsoft, Oracle, IBM) synchronize with remotes (ASA, PalmOS, CE, pagers, phones) over HTTP, TCP/IP, HotSync, wireless, and serial links.]

MobiLink Characteristics Complete local autonomy. Complete control over data partitioning on the consolidated through the use of scripts. Uses the consolidated database’s scripting language or Java. No partitioning allowed on the remote.

MobiLink Characteristics Session based. Only changed records are synchronized. Connection only required while synchronizing. Bi-directional by default. Medium to high latency. Low to medium data volume.

MobiLink Characteristics Hierarchical topology. The consolidated can be any ODBC-based database (Sybase, Microsoft, Oracle, IBM). ASA and/or UltraLite remotes. Optimized for thousands of remotes. Scalable based on the consolidated database’s capabilities.

MobiLink Synchronization Components [Diagram: the remote data store and remote database server (ASA or UltraLite), with a MobiLink client (ASA or UltraLite), connect over TCP/IP to the MobiLink server, which connects over ODBC to the consolidated database server and its consolidated data store.]

MobiLink Synchronization Server Provides interface between consolidated database and remote server. Works with ODBC-based host databases. Responsible for ensuring the synchronization process completes. Supports multiple simultaneous synchronizations.

MobiLink Consolidated Synchronization Logic SQL statements executed against the consolidated database. Written in language of consolidated database or Java. Guides the synchronization server. Controls the flow of data in both directions. Handles conflicts.

MobiLink Remote Synchronization Logic ASA and UltraLite keep track of changes to the data. A synchronization component is provided to: scan for changes to create the upload stream; receive the download stream and apply the changes to the remote.

Replication Transactions (changes) are moved between the databases. Uses a store and forward mechanism. Sites must have a common starting point. Sybase products: SQL Remote, Replication Server.

Replication [Diagram: two copies of the Product table (sku_key, qty_oh) showing qty_oh values 10 and 9, with the statements UPDATE Product SET qty_oh = 8 WHERE sku_key = 1234 and UPDATE Product SET qty_oh = 9 WHERE sku_key = … moved between them.]

Replication Characteristics Local Autonomy High local autonomy. Database must have all of the data required for the application to run.

Replication Characteristics Data Partitioning Data is usually partitioned. Each site has common data and site specific data. Update anywhere requires: Unique primary keys. Conflict detection and resolution mechanism.

Replication Characteristics Consistency Low to high consistency is possible. Speed of store and forward messaging system determines how consistent the database is. Some latency is always present.

Replication Characteristics: Transaction Control A mechanism must exist to guarantee that transactions are sent and applied in the correct order and that no transactions are skipped.
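One way to provide this guarantee over a store and forward channel is to number the transactions and have the receiver apply them only in strict sequence. The sketch below assumes that scheme; `ReplicateSite` and its methods are hypothetical names and do not describe how SQL Remote or Replication Server track their messages.

```python
class ReplicateSite:
    """Applies numbered transactions strictly in order, each exactly once."""

    def __init__(self):
        self.last_applied = 0
        self.applied = []
        self._held = {}  # transactions that arrived ahead of a gap

    def receive(self, seq, txn):
        if seq <= self.last_applied:
            return  # duplicate delivery: already applied
        self._held[seq] = txn
        # Apply every transaction that is now contiguous with the last one.
        while self.last_applied + 1 in self._held:
            self.last_applied += 1
            self.applied.append(self._held.pop(self.last_applied))

    def missing(self):
        """Sequence numbers the sender should be asked to resend."""
        if not self._held:
            return []
        return [s for s in range(self.last_applied + 1, max(self._held))
                if s not in self._held]


site = ReplicateSite()
site.receive(1, "txn-1")
site.receive(3, "txn-3")  # arrives before 2, so it is held back
gap = site.missing()
site.receive(2, "txn-2")
site.receive(2, "txn-2")  # duplicate is ignored
```

Holding back out-of-order arrivals and reporting the gap covers all three requirements: correct order, no skipped transactions, and no transaction applied twice.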

Replication Characteristics: Accessibility Whether a direct connection is required depends on the latency requirements. A direct connection is not required in high-latency implementations.

Replication Characteristics Topology Both peer-to-peer and hierarchical topologies can be used. Conflict resolution normally requires a hierarchical model.

Replication Characteristics Other Issues Only transactions are moved therefore: It is possible to support many sites. Throughput is usually independent of database size.

Replication Server [Diagram: Replication Server sits between the primary sites and the replicate sites. Components shown: Replication Agents, DirectCONNECT (native drivers), DirectCONNECT/Anywhere (ODBC), and Replication Toolkit for MVS. Databases shown on both sides: Adaptive Server Enterprise, Adaptive Server Anywhere, Oracle, Informix, and OS/390 DB2.]

Replication Server Characteristics Transactions are sent to Replication Server which stores and forwards them to the interested sites. Assumes there is normally a high speed connection. Near real time (low latency). High data volumes. Moderate number of sites. Heterogeneous databases supported. Uni-directional by default.

Replication Server [Diagram: Replication Agents at the primary sites pass transactions to Replication Server, which delivers them to the replicate sites.]

Replication Server Components: Primary Site Origin of the data being moved. RDBMSs from multiple vendors are supported. Keeps a record of all transactions. Normally this is in the transaction log, but it depends on the RDBMS.

Replication Server Components Replication Agent Scans the primary site’s record of transactions. Passes the committed transactions, in the order they were applied, to Replication Server.

Replication Server Components: Replication Server Receives transactions from the Replication Agents. Stores the transactions until they are successfully applied on all replicate sites. Maintains a connection to all replicate sites. Automatically recovers when a connection is dropped and restored. Determines which sites require each transaction and applies the transactions in the correct order.

Replication Server Components: Replication Server Prevents “circular” transactions. Provides user-programmable “function strings” to allow manipulation of the transaction, for example data conversions (e.g. date formats) and conversion of SQL in heterogeneous environments. Detects SQL errors.
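A “circular” transaction is a change that is replicated back to the site where it originated when replication runs in both directions. A common way to prevent this is to tag each change with its origin and never forward it back there. The sketch below illustrates only that idea; the `Site` class and `replicate` function are hypothetical and are not Replication Server's mechanism.

```python
class Site:
    def __init__(self, name):
        self.name = name
        self.log = []  # (origin, change) pairs recorded at this site

    def local_change(self, change):
        self.log.append((self.name, change))

    def apply_replicated(self, origin, change):
        # Keep the original origin so the change is never mistaken
        # for a new local transaction.
        self.log.append((origin, change))

    def outbound(self, destination):
        # Never forward a change back to the site it came from.
        return [(o, c) for (o, c) in self.log if o != destination.name]


def replicate(source, destination):
    seen = set(destination.log)
    for origin, change in source.outbound(destination):
        if (origin, change) not in seen:
            destination.apply_replicated(origin, change)


a, b = Site("A"), Site("B")
a.local_change("qty_oh=9")
replicate(a, b)  # A -> B: the change is delivered
replicate(b, a)  # B -> A: the change originated at A, so nothing is sent
```

Without the origin tag, site A would receive its own update again and could forward it to B once more, looping indefinitely.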

Replication Server Components Replicate Site Applies SQL sent by Replication Server. A replicate site can also be defined as a primary site if bi-directional replication is required.

SQL Remote [Diagram: a consolidated database (ASE or ASA) exchanges messages with ASA remotes over MAPI, VIM, FILE, FTP, or SMTP.]

SQL Remote Characteristics Complete local autonomy. Partitioning based on column values, subqueries, and WHERE clauses. Message based (no connection): MAPI (Microsoft), VIM (Lotus), SMTP, FTP, and file. Very loose consistency.
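Partitioning by column value amounts to selecting, for each remote, only the rows whose column matches that remote. The sketch below shows the idea with Python's built-in `sqlite3` module; the `customer` table, the `rep_id` column, and `rows_for_remote` are hypothetical, and this is plain SQL rather than SQL Remote's publication syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, rep_id TEXT);
    INSERT INTO customer VALUES (1, 'Acme', 'rep_a');
    INSERT INTO customer VALUES (2, 'Globex', 'rep_b');
    INSERT INTO customer VALUES (3, 'Initech', 'rep_a');
""")


def rows_for_remote(rep_id):
    """Rows a given remote subscribes to, selected by column value."""
    cur = conn.execute(
        "SELECT id, name FROM customer WHERE rep_id = ? ORDER BY id",
        (rep_id,),
    )
    return cur.fetchall()


rep_a_rows = rows_for_remote("rep_a")
rep_b_rows = rows_for_remote("rep_b")
```

Each remote ends up with a complete subset for its own work while the consolidated database keeps every row.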

SQL Remote Characteristics Built-in guaranteed message delivery. Hierarchical: the consolidated is either ASA or ASE; remotes are ASA. Homogeneous. Many (thousands of) remotes. Low to medium data volumes.

SQL Remote Components [Diagram: the remote database server and its remote data store, with a Message Agent, exchange messages through the message system with a Message Agent at the consolidated database server and its consolidated data store.]

SQL Remote Components Consolidated Database Contains a copy of all data that is replicating. Performs conflict detection and resolution. Transactions are recorded in the transaction log. Maintains additional data in the transaction log about what transactions are eligible to replicate and how they are partitioned.

SQL Remote Components: Message Agent Scans the transaction log for committed transactions that are eligible to replicate. Builds messages for the sites that have subscribed to the transactions. Interfaces with the message system. Guarantees that transactions are sent in the correct order, applied in the correct order, applied only once, and never skipped.

SQL Remote Components Message Agent Receives transactions from the message system. Applies the transactions. Prevents “circular” transactions. Detects update conflicts. Detects SQL errors.

SQL Remote Components: Message System Provides the store and forward technology. Support for: MAPI, SMTP, VIM, file, FTP.

SQL Remote Components Remote Database Contains data the site is subscribed to. Transactions are recorded in the transaction log. Maintains additional data in the transaction log about what transactions are eligible to replicate and how they are partitioned.

Which Technology should I Choose? Depends on the business requirements and technological infrastructure available. Consistency and latency are the biggest factors.

Use EAServer When … Absolute consistency is required (Zero latency). Transactions must fail when one of the site databases is unavailable. There are very few sites.

Use MobiLink When … Latency is permitted. Local autonomy is required. A reliable connection exists. You have low to medium data volumes. There are heterogeneous databases. You do not require transaction boundaries to be maintained. You have a hierarchical topology. You have many remotes. You must know when your changes have been synchronized.

Use Replication Server When … Near real-time consistency is required. You have high data volumes. Local autonomy is required. There are heterogeneous databases. You require transaction boundaries to be maintained. You have a peer-to-peer topology, or you can implement a hierarchical topology. There is a small number of sites.

Use SQL Remote When … Latency is not a factor. No direct connection exists (or is not permitted) or the connection is unreliable. Local autonomy is required. You have low to medium data volumes. Homogeneous (ASA & ASE) databases. Hierarchical topology. You have many remotes. You require transaction boundaries to be maintained.
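The four “Use … When” lists can be read as a small capability table. The sketch below encodes a simplified reading of them: the `Traits` fields, the True/False values, and the `candidates` function are this example's own assumptions, and several criteria (data volume, topology, local autonomy) are left out.

```python
from typing import NamedTuple


class Traits(NamedTuple):
    zero_latency: bool           # absolute consistency across sites
    works_offline: bool          # no direct connection needed
    heterogeneous: bool          # mixed database vendors
    keeps_txn_boundaries: bool   # transactions move as transactions
    many_sites: bool             # suited to mass deployment


# A simplified reading of the criteria; real selection needs more nuance.
PRODUCTS = {
    "EAServer": Traits(True, False, True, True, False),
    "MobiLink": Traits(False, False, True, False, True),
    "Replication Server": Traits(False, False, True, True, False),
    "SQL Remote": Traits(False, True, False, True, True),
}


def candidates(**required):
    """Return the products that offer every capability passed as True."""
    return [name for name, traits in PRODUCTS.items()
            if all(getattr(traits, key)
                   for key, wanted in required.items() if wanted)]


mobile_mixed = candidates(many_sites=True, heterogeneous=True)
offline = candidates(works_offline=True)
strict = candidates(zero_latency=True)
```

An empty result is itself useful: it signals that no single product meets the requirements and that combining products may be the better route.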

Or Combine them All four products are compatible. Use the strengths of each to solve your business problems. Replication Server or EAServer between main geographical databases. MobiLink or SQL Remote for mass deployed devices.

Summary Sybase has many different methods of maintaining data in distributed databases. Your business requirements dictate which method is best. All the technologies can be used together on the same database.
