Presentation is loading. Please wait.

Presentation is loading. Please wait.

EM416 Choosing the Correct Data Movement Technology Chris Kleisath Director of Engineering iAnywhere Solutions

Similar presentations


Presentation on theme: "EM416 Choosing the Correct Data Movement Technology Chris Kleisath Director of Engineering iAnywhere Solutions"— Presentation transcript:

1 EM416 Choosing the Correct Data Movement Technology Chris Kleisath Director of Engineering iAnywhere Solutions kleisath@ianywhere.com

2 Overview When choosing a data movement technology you need to consider: The business requirements for the distributed database. The technological limitations of your environment. The development and administrative resources available.

3 Central Datastore Mobile Computing Embedded Computing Workgroup Computing Enterprise Data Movement Wired and Wireless Communication Links

4 Distributed Systems Using any data movement technology means we have a distributed database. This almost always implies some form of distributed application.

5 What is a Distributed System? C.J. Date’s working definition: “A distributed database system consists of a collection of sites, connected together via some sort of network, in which: Each site is a database system in its own right Sites have agreed to work together (if necessary), so that a user at any site can access data anywhere in the network exactly as if the data were all stored at the user’s own site.”

6 Distributed Systems Practical Factors Not all systems require that all data be available to all sites. Not all systems require that all data be consistent between all sites all of the time. The degree to which your system must meet the ideal definition is the single biggest factor in choosing your data movement technology.

7 Issues when distributing data Local autonomy Data partitioning (fragmentation) Consistency Transaction control Accessibility (connection) Topology

8 Local autonomy Each site should operate independently of the other sites. No site should depend on another site for its successful functioning. A centralized database provides the lowest level of local autonomy. Decentralized systems provide the highest level of local autonomy.

9 Data Partitioning Also known as fragmentation. Only the data needed by a site is present at the site. The database at a site is a “complete” subset of the data. Some data will need to be duplicated between sites.

10 Data Partitioning Update Anywhere Primary keys must be unique across the entire distributed system. If multiple sites insert into the same table. Requires a conflict detection and resolution mechanism. If multiple sites are able to change the same row.

11 Tight vs. Loose Consistency Which version of the data is being used? Waterloo Acct2001000 Account AcctNoBal Paris Acct2001000 Account AcctNoBal

12 Consistency Tight consistency requires all data to be in a consistent state. Loose consistency permits data to be “out-of-date”. Latency is the measure of how long it takes the data to become consistent. In some cases it is never consistent since there are always changes that have not been moved.

13 Transaction Control Your chosen technology must pass the ACID test Atomicity, Consistency, Isolation, Durability Only committed data should move. Committed data must move. Failure to successfully move committed data must be detectable. Changes must be applied in the same order on all databases.

14 Accessibility What kind of “network” do you have between the sites? High-speed LAN/WAN Low-speed Dial-up (RAS) Wireless Indirect (email, ftp) Internet (HTTP) Sneaker-net

15 Topology What kind of relationship exists between the sites. Peer-to-peer Each site can transfer data to any other site. No centralized master copy can exist. Conflict resolution is extremely difficult There is no place to detect and resolve the conflict

16 Topology Hierarchical Each site passes data up and down the hierarchy. A central master copy (consolidated database) exists. Data must pass through the consolidated to move to another site. Conflict detection and resolution is implemented on the consolidated.

17 Other Issues Number of sites Some technologies are better suited to mass deployment. Vendors Are the databases at each site the same product? Is the technology commercially available and supported?

18 Factors Summary Each of the following factors will influence your choice of data movement technologies: Local autonomy Data partitioning (fragmentation) Consistency Transaction control Accessibility (connection) Topology

19 Types of Data Movement All technologies can be categorized as one of: Online Synchronization Replication

20 Online Changes are made “simultaneously” on all databases. Waterloo Acct2001000 Account AcctNoBal Paris Acct2001000 Account AcctNoBal Withdraw $100 Please wait while your account is being updated…

21 Online In its “simplest” form the application updates all of the databases directly. Underlying technology is normally “Two Phase Commits” Sybase products: EAServer Not really data movement but is appropriate in some systems.

22 Online Characteristics: Local Autonomy Very low level of local autonomy. If one site is down the entire system is down. Waterloo Acct2001000 Account AcctNoBal Paris Acct2001000 Account AcctNoBal Withdraw $100 Sorry the System is Unavailable X

23 Online Characteristics: Data Partitioning Data can be partitioned as required. If the data is partitioned the application must update the row(s) everywhere. Since transactions are applied at all databases simultaneously no primary key or conflict issues arise.

24 Online Characteristics: Consistency Use when tight consistency is an absolute requirement. Transactions will succeed or fail on all databases.

25 Online Characteristics: Transaction Control A Distributed Transaction Server (DTS) should be used. Ensures the transaction is applied on all sites or not at all. Very expensive to code yourself. Both ASA and ASE provide support for a DTS

26 Online Characteristics: Accessibility Requires a reliable network connection between sites. Transactions will fail if one database is unavailable. Application speed will be affected by network speed.

27 Online Characteristics Typically a peer-to-peer topology. Since all databases are updated at once no master copy is required.

28 Online Characteristics Other Issues “Simple” to understand Looks just like a centralized database. Very few sites can be supported Consider the cost of updating many databases at once. Vendors Heterogeneous environments are “easily” supported.

29 Synchronization Current state of the data is moved between databases. Can be a complete refresh or only the rows that have changed. Sybase products: MobiLink

30 Synchronization 123410 Product sku_keyqty_oh 123410 Product sku_keyqty_oh 10 9 X 10 9 8 X X 10 8 X UPDATE Product SET qty_oh = 8 WHERE sku_key = 1234

31 Synchronization Characteristics Local Autonomy High local autonomy Site database must have all of the data required for the application to run.

32 Synchronization Characteristics Data Partitioning Data is usually partitioned. Each site has common data and site specific data. Update anywhere requires: Unique primary keys. Conflict detection and resolution mechanism.

33 Synchronization Characteristics Consistency Low to high consistency. Data is only consistent immediately after synchronization. Frequency of synchronization affects level of consistency but in all cases there is some latency.

34 Synchronization Characteristics Transaction Control Transaction boundaries are not maintained. Some operation sequences can not be synchronized. (i.e. insert then delete of a row with the same primary key value) Most synchronization technologies “batch” the operations. e.g. all deletes, then inserts, then updates

35 Synchronization Characteristics Accessibility Requires a stable network connection during the synchronization process. Connection speed affects the amount of data that can be reasonably synchronized.

36 Synchronization Characteristics Topology Both peer-to-peer and hierarchical topologies are possible. Peer-to-peer is difficult if update anywhere is permitted. Which copy of the data is correct? Who resolves an update conflict?

37 Synchronization Characteristics Other Issues Heterogeneous environments can be supported. Be aware of compatibility issues. E.g. Oracle allows 1 varchar column/table, ASA has no limit. Because each site synchronizes independently many sites can be supported.

38 MobiLink ASA, ASE, Microsoft, Oracle, IBM ASA, PalmOS, CE, Pagers, Phones HTTP, TCPIPHotSync, Wireless Serial

39 MobiLink Characteristics Complete local autonomy. Complete control over data partitioning on the consolidated through the use of scripts. Uses the consolidated database’s scripting language or Java. No partitioning allowed on the remote.

40 MobiLink Characteristics Session based. Only changed records are synchronized. Connection only required while synchronizing. Bi-directional by default. Medium to high latency. Low to medium data volume.

41 MobiLink Characteristics Hierarchical topology. Consolidated can be any ODBC-based database Sybase, Microsoft, Oracle, IBM ASA and/or UltraLite remotes. Optimized for thousands of remotes. Scalable based on consolidated database’s capabilities.

42 MobiLink Synchronization Components TCP/IP ODBC MobiLink Client (ASA or UltraLite) Consolidated Data Store Consolidated Database Server Remote Data Store Remote Database Server (ASA or UltraLite) MobiLink Server TCP/IP

43 MobiLink Synchronization Server Provides interface between consolidated database and remote server. Works with ODBC-based host databases. Responsible for ensuring the synchronization process completes. Supports multiple simultaneous synchronizations.

44 MobiLink Consolidated Synchronization Logic SQL statements executed against the consolidated database. Written in language of consolidated database or Java. Guides the synchronization server. Controls the flow of data in both directions. Handles conflicts.

45 MobiLink Remote Synchronization Logic ASA and UltraLite keep track of changes to the data. A synchronization component is provided to: Scan for changes to create the upload stream Receive the download stream and apply the changes to the remote

46 Replication Transactions (changes) are moved between the databases. Uses store and forward mechanism. Site(s) must have a common starting point. Sybase Products: SQL Remote Replication Server

47 Replication 123410 Product sku_keyqty_oh Product sku_keyqty_oh 10 9 X 10 9 8 X X UPDATE Product SET qty_oh = 8 WHERE sku_key = 1234 UPDATE Product SET qty_oh = 9 WHERE sku_key = 1234 123410 9 X 10 9 8 X X

48 Replication Characteristics Local Autonomy High local autonomy. Database must have all of the data required for the application to run.

49 Replication Characteristics Data Partitioning Data is usually partitioned. Each site has common data and site specific data. Update anywhere requires: Unique primary keys. Conflict detection and resolution mechanism.

50 Replication Characteristics Consistency Low to high consistency is possible. Speed of store and forward messaging system determines how consistent the database is. Some latency is always present.

51 Replication Characteristics Transaction Control Mechanism must exist to guarantee transactions are: Sent and applied in the correct order. No transactions are skipped

52 Replication Characteristics Accessibility Whether a direct connection is required or not is dependant on the latency requirements. Not required in high latency implementations.

53 Replication Characteristics Topology Both peer-to-peer and hierarchical topologies can be used. Conflict resolution normally requires a hierarchical model.

54 Replication Characteristics Other Issues Only transactions are moved therefore: It is possible to support many sites. Throughput is usually independent of database size.

55 Replication Server Adaptive Server Replication Agent Replication Agent Replication Server DirectCONNECT (Native drivers) DirectCONNECT (Native drivers) Adaptive Server/Enterprise Adaptive Server/Anywhere Oracle Informix OS/390 DB2 Replication Toolkit for MVS Adaptive Server/Enterprise Adaptive Server/Anywhere Oracle Informix OS/390 DB2 Replication Toolkit for MVS Replicate Sites Primary Sites DirectCONNECT/ Anywhere (ODBC) DirectCONNECT/ Anywhere (ODBC)

56 Replication Server Characteristics Transactions are sent to Replication Server which stores and forwards them to the interested sites. Assumes there is normally a high speed connection. Near real time (low latency). High data volumes. Moderate number of sites. Heterogeneous databases supported. Uni-directional by default.

57 Replication Server Replication Agent Replication Agent Replication Server Replicate Sites Primary Sites

58 Replication Server Components Primary Site Origin of the data being moved. Multiple vendors RDBMS supported. Keeps a record of all transactions. Normally this is in the transaction log but it depends on the RDBMS.

59 Replication Server Components Replication Agent Scans the primary site’s record of transactions. Passes the committed transactions, in the order they were applied, to Replication Server.

60 Replication Server Components Replication Server Receives transactions from the Replication Agents. Stores the transactions until they are successfully applied on all replicate sites. Maintains a connection to all replicate sites. Automatically recovers when a connection is dropped and restored Determines which site(s) require the transaction and applies them in the correct order.

61 Replication Server Components Replication Server Prevents “circular” transactions. Provides user programmable “function strings” to allow manipulation of the transaction. Data conversions (e.g. date formats) Conversion of SQL in heterogeneous environments. Detects SQL errors.

62 Replication Server Components Replicate Site Applies SQL sent by Replication Server. A replicate site can also be defined as a primary site if bi-directional replication is required.

63 SQL Remote ASE MAPIVIM FILE FTP SMTP ASA MAPIVIM FILE FTP SMTP ASA OR

64 SQL Remote Characteristics Complete local autonomy. Partitioning based on: Column values Subqueries Where clauses Message based (no connection) MAPI (Microsoft), VIM (Lotus), SMTP, FTP and File Very loose consistency.

65 SQL Remote Characteristics Built in guaranteed message delivery. Hierarchical Consolidated is either ASA or ASE Remotes are ASA Homogeneous. Many (thousands) of remotes. Low to medium data volumes.

66 SQL Remote Components Message Agent Remote Data Store Remote Database Server Message System Message Agent Consolidated Data Store Consolidated Database Server

67 SQL Remote Components Consolidated Database Contains a copy of all data that is replicating. Performs conflict detection and resolution. Transactions are recorded in the transaction log. Maintains additional data in the transaction log about what transactions are eligible to replicate and how they are partitioned.

68 SQL Remote Components Message Agent Scans the transaction log for committed transactions that are eligible to replicate. Builds messages for the sites that have subscribed to the transactions. Interfaces with the message system. Guarantees that transactions are: Sent in the correct order. Applied in the correct order and only applied once. No transactions are skipped.

69 SQL Remote Components Message Agent Receives transactions from the message system. Applies the transactions. Prevents “circular” transactions. Detects update conflicts. Detects SQL errors.

70 SQL Remote Components Message System Provides the store and forward technology. Support for: MAPI SMTP VIM File FTP

71 SQL Remote Components Remote Database Contains data the site is subscribed to. Transactions are recorded in the transaction log. Maintains additional data in the transaction log about what transactions are eligible to replicate and how they are partitioned.

72 Which Technology should I Choose? Depends on the business requirements and technological infrastructure available. Consistency and latency are the biggest factors.

73 Use EAServer When … Absolute consistency is required (Zero latency). Transactions must fail when one of the site databases is unavailable. There are very few sites.

74 Use MobiLink When … Latency is permitted. Local autonomy is required. A reliable connection exists. You have low to medium data volumes. There are heterogeneous databases. You do not require transaction boundaries to be maintained. You have a hierarchical topology. You have many remotes. You must know when your changes have been synchronized.

75 Use Replication Server When … Near real-time consistency is required. You have high data volumes. Local autonomy is required. There are heterogeneous databases. You require transaction boundaries to be maintained. You have a peer-to-peer topology. You can implement a hierarchical topology A small number of sites.

76 Use SQL Remote when … Latency is not a factor. No direct connection exists (or is not permitted) or the connection is unreliable,. Local autonomy is required. You have low to medium data volumes. Homogeneous (ASA & ASE) databases. Hierarchical topology. You have many remotes. You require transaction boundaries to be maintained.

77 Or Combine them All four products are compatible. Use the strengths of each to solve your business problems. Replication Server or EAServer between main geographical databases. MobiLink or SQL Remote for mass deployed devices.

78 Summary Sybase has many different methods of maintaining data in distributed databases. Your business requirements dictate which method is best. All the technologies can be used together on the same database.

79 EM 416 Choosing the Correct Data Movement Technology Chris Kleisath Director of Engineering iAnywhere Solutions kleisath@ianywhere.com


Download ppt "EM416 Choosing the Correct Data Movement Technology Chris Kleisath Director of Engineering iAnywhere Solutions"

Similar presentations


Ads by Google