Presentation on theme: "Rev. 8.2012 SYBASE IMDB DREAMS COME TRUE Andrew Melkonyan Senior Database Architect Ness Pro Division, NESS Professional Services, Israel."— Presentation transcript:
Rev. 8.2012 SYBASE IMDB DREAMS COME TRUE Andrew Melkonyan Senior Database Architect Ness Pro Division, NESS Professional Services, Israel
ARE YOU MULTITASKING? How often do you switch to another task waiting for DB operation to finish? The truth is, it has nothing to do with multitasking. It is a waste of time. A necessary evil. Hmm, is it?… Imagine loading your half-terabyte database in 5 minutes instead of two hours… Imagine running reorg rebuild or building clustered index on a 10 GB table within 2 minutes instead of half a day… Imagine your batches that cursor on 100K-s of inserts & 100K-s of updates return within minutes, instead of hours… days… We are not talking about dreams. We are talking about IMDB – an internal feature available to any ASE customer out of the box.
IMDB IS NOT A DREAM. IT IS REAL. Lets inspect some numbers:
A BIT OF HISTORY The idea of In-memory DB (Sybase IMDB) or Main Memory DB (MMDB technology) is not new. Neither it is unique to Sybase. First technological attempts go back to mid 80s. Today, vendors other than Sybase have come out with their MMDB solutions (Oracle, IBM, Open-Source Community): MM-DBMS, MARS, HALO, OBE, TPK, System M, Fast Path TimesTen, IBMs SolidDB, Open source Fast DB, Monet DB, H2 and HSQLDB, SAP HANA. The need for fast-performing DBMS is well understood.
SYBASE: INNOVATIVE APPROACH What is unique to Sybase is that Sybase has chosen not to treat this technology as a separate product, with its SQL semantics, its connectivity drivers, it own optimizer or it own data/index structure. Having done that may have brought Sybase MMDB solution more performance advantages in certain circumstances. As it is, Sybase greatest advantage is that its IMDB solution is integral to ASE. As such it may be implemented within seconds with almost zero learning curve.
AD IMDB: RELAXED ACID RDBMS In fact, rather than inventing new architecture Sybase rethought basic RDBMS principle of ACID transactional integrity and made it customizable within ASE server itself, with full ACID set as default. The ACID principle has been 'sacred' for RDBMS for years: A = Atomic: For all modifications in a transaction these are either all performed or all rolled back C = Consistent: Transactions must respect integrity constraints I = Isolated: Other users can never see a transaction in an intermediate state D = Durable: Committed transactions are guaranteed, persistent and cannot be undone From ASE 15.5 on, ASE allows to relax Atomicity and Durability in the existing ASE DB environment. This flexibility allowed two high- performance configurations to surface: IMDB and RDDB.
IMDB VS. DRDB Traditional DRDB 1.Format physical layer (~disk) 2.DISK INIT initialize physical layer 3.CREATE DB build a DB on its top Data resides on a physical disk and is accessed through memory layer (LIO – PIO chain). IMDB alternative 1.Allocate ASE memory space (sp_cacheconfig) 2.DISK INIT it as if it were disk 3.CREATE DB build a DB on its top Data resides completely in memory. No I/O. No intermediate layer to the data.
AD IMDB: RELAXED ACID RDBMS Getting rid of physical storage has its price: Once shut down, the data is gone. Transactional integrity is not guaranteed (no rollback or commit, transactional integrity preserved per-statement only). On startup, IMDB is recreated from its template (which may include both data and SQL code – full DRDB template, must be careful with this! – or may be an empty slate). No possibility to dump transaction log or perform incremental transaction log load. IMDB is fully consistent on index/page level, very fast – can be loaded within minutes, but its data is volatile.
IMDB: INTERNAL OPTIMIZATIONS In addition to having zero I/O footprint, IMDB avoids some bottlenecks typical to DRDB databases: IMDB caches have no LRU/MRU chain, no wash marker, no buffer pools, no APF and no buffer replacement policy. redo log is not needed since transaction log needs not be recovered. Transactions fitting in ULC are discarded: no log semaphore contention. Relaxed LOG/ULC pinning (for data page written to disk, its log pages must be written first and ULC may need to be flushed to log; for concurrent updates to DATAROW-locked page, we must flush ULC first). No disk writes due to page splits. No frequent updates to the database timestamp needed for recovery. Optimizations around deletes using an index scan for DOL. Optimizations around bulk inserts into DOL with non-unique indices. Minimally logged DML support in IMDB and RDDB.
IMDB: REPLICATED ARCHITECTURE Since IMDB dies on shutdown, support for various replicated architectures have been developed in the recent ASE/RS versions. As of today, IMDB supports functioning as both primary and replicate site. To the left is architecture with IMDB functioning as Primary ASE:
IMDB: REPLICATED ARCHITECTURE Below is replicate architecture with IMDB serving as Replicate ASE (replication to the template DB is used for recovery from IMDB shutdown):
IMDB: ONLY THREE STEPS AWAY These are the only steps needed to setup IMDB database: Setup IMDB cache: sp_cacheconfig 'IMDB_CACHE', '2048M', 'inmemory_storage', 'cache_partitions=4 Initialize IMDB Disk: disk init name = 'imdb_datadev1', physname = 'IMDB_CACHE', size = '1024M', type = 'inmemory disk init name = 'imdb_logdev1', physname = 'IMDB_CACHE', size = '512M', type = 'inmemory' Create IMDB Database: create inmemory [temporary] database IMDB [use ASE_DB as template] on imdb_datadev1 = '1024M' log on imdb_logdev1 = '512M' with durability = no_recovery [, dml_logging = minimal --select into is a must for this]
IMDB IS NOT A DREAM. IT IS REAL. Load your data in minutes and start performing in seconds:
ARE YOU INTRIGUED? Imagine the following situation: your batch process runs 24 hours processing 100s of tables with 100GB+ of data & indices. What is the most of its time spent on? I/O, poor indexing, transaction log management, data cache management, page splits, &c. Setting up the IMDB database will take you 5 minutes. Loading 100GB DB will take a couple more minutes. BCP in (in the worst case) 100GB+ will take you less than an hour. Running the same batch on the freshly created IMDB will probably end well before the same batch running on DRDB has done half of its work. And if while running the batch you suddenly discover that you are missing some crucial index and the optimizer goes wild building it will be a matter of minutes. This is amazing flexibility and amazing freedom compared to the fear of ever touching poorly performing production batch…
ARE YOU INTRIGUED? Imagine the following situation: you are facing a new project for which mining the current data is crucial. You have hundreds of gigs of it. You need to be able to group, test, select appropriate index structures. Each empirical test on DRDB takes lots and lots of time… It is a matter of having enough RAM and an available IMDB database to turn this into a playground for kids. Mining data within seconds or minutes. Testing index structures. Rebuilding. No fear. No waste of time. You dont have to think twice before changing existing legacy SQL code which was written for small tables and now runs badly against huge tables. You may just test and decide based on empirical data. Again, no fear of change. No waste of time…
ARE YOU INTRIGUED? IMDB opens the horizons for performance and allows flexibility and freedom unthought-of before. You are dictating the rules, not your storage. It is DB than serves your needs, not you who is serving the needs of your DB…