Howard Fosdick (630)-279-4286 (C) 2004 FCI World's Largest Databases
Who Am I?
- Hands-on DBA (and SA) for …
  - Oracle, DB2, SQL Server
  - Unix, Linux, Windows
- Founder: IDUG, MWDUG, CAMP
- Author, Speaker
- Independent Contractor
- (630)-279-4286 email@example.com
Outline
1. What's a Big Database?
2. DSS
3. OLTP
4. Observations
Statistics Sources
1. Winter Corp. -- Database Top Ten
   -- Yearly survey
   -- Vendor neutral
   -- Free at: www.wintercorp.com
2. Survey.com -- High-End BI/DW Competitive Analysis
   -- Survey of 150 companies w/ big warehouses
   -- Free at: www.survey.com
Thank you to both sources
Classifying Large Databases
DSS
- Decision Support Systems (DSS)
- Online Analytical Processing (OLAP)
- Data Warehouses (DW)
- Multi-dimensional Databases (MDD)
+ Query oriented, mainly read-only
OLTP
- Online Transaction Processing (OLTP)
+ Update with short transactions (transaction = small CPU & data resources)
Commercial IT vs. Scientific/Research databases
What's a Large Database?
Database Size
- User data
- User data plus metadata & indexes
- DASD farm
Users
- Concurrent users
- Total user population
Load
- Concurrent queries
- Queries / day or hour (simple vs complex queries)
VLDB = Very Large Database
Good definitions and measurements are key to success
Data Warehouses vs. Data Marts
DW
- Application neutral
- Service multiple organizational needs
- Largest systems are usually data warehouses
DM
- Application specific
- Organizationally focused
What's Driving the Growth of Large Data Warehouses?
Web Sites --
- Clickstream data
Retail --
- Transaction Level Detail (TLD)

!!!!! Super Big Groceries !!!!!
Preferred Customer Card #283736
Hello, I'm Scot94
03/04/04 02:38 3284 03 2918 33
Store 493 Loc 229
PRETTY-LADY HAIRCLR 1 5.99
AARP MAGAZINE 1 4.95
DIAPERS 2 10.00
BEER SIX-PACK 1 3.45
Tax 2.40
BAL 36.79
Cash 40.00
Change 3.21
Save this Receipt -- Get $2.00 off on Prozac When You Buy Super-Baby Food !

Understanding customer behavior means $$$ !
What's Driving the Growth of Large Data Warehouses?
Necessary Preconditions --
- Cheap hardware
- Higher reliability / availability (based on dynamic hardware swapping)
- Better software
- Lax privacy laws in USA
EU curtails cross-usage of data
EU has stronger privacy laws
Large DSS Systems
Diagram labels: Query Users; Unix mainframe (Sun E12/15K, HP Superdome, IBM Regatta); Storage Area Network (EMC, Hitachi, HP, LSI)
Unix mainframes --
+ Dynamically add/drop CPUs, RAM (Sun calls it partitioning)
+ High reliability (as good as clusters or mainframes)
+ Capacity on Demand
SANs --
+ Flash (snap) backup (OS-level backup)
+ Large cache
+ Intelligent data placement/movement
Where Did IBM Mainframes Go?
Diagram labels: Big Silicon; Big Iron; 1994; 2004
+ Hello Linux !
+ Good for --
  + Consolidation platform
  + Legacy systems
  + Virtualization (multi-OS platform)
Poof! -- Goodbye…
-- Largest databases
-- Smaller mainframes (VM, VSE)
-- Reliability advantage eroded
-- High cost per CPU
Oracle Rising
Joined the Top Ten list 3 to 5 years ago
8i added essential DSS technologies...
+ Partitions
+ New ROWID (for bigger databases)
+ Thorough parallelism (DML, DDL, utilities)
+ Index improvements (bitmap IXs, function-based, desc, others)
+ Resource Manager (proactive)
+ Materialized Views
+ Large memory mgmt
+ Optimizer is partition-aware
+ Online DDL operations and utilities
Why Not Oracle Clustering?
+ Great for non-disruptive scaling of existing systems...
But the biggest systems tend not to use it
-- Unix mainframe no longer requires clustering for reliability, availability or easy scalability
-- Clustering means complexity in minimizing the…
-- Locking issues
9i improved this via Cache Fusion -- but SMP Unix mainframe will still be favored
Where's SQL Server 2000?
Big in OLTP but lacks essential DSS technologies...
-- Parallelism restricted to SELECTs
   -- Needs it for other DML, DDL, utilities
-- Partitions
-- Wintel restriction
Yukon ?
-- Many new features... ready for Top Ten DSS ?
(Features = partitioning, database mirroring, mirrored backups, online indexing & restore, fast recovery, ANSI 1999 T-SQL, CLR support, native XML, XML Query, better .NET support, Reporting Services, Service Broker (async messaging), extensible data types…)
Where's Open Source?
Linux
+ 2.6 kernel now out
+ More CPUs (to 16)
+ More RAM (> 4+ Gig)
+ Better threading, file system support
MySQL and PostgreSQL
-- Top out at 500,000 page views per day (EWeek 2003) (or 15 per second)
+ Improving rapidly
Prediction -- open source will support big databases but not Top Ten list sites
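The page-view figure above can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the function names are made up for this example, and the slide does not say how "15 per second" was derived. Spread evenly over 24 hours, 500,000 page views is under 6 per second; 15 per second would correspond to the same daily total concentrated in roughly 9 hours, which is one possible (assumed) reading of the slide.

```python
SECONDS_PER_HOUR = 3600


def average_rate_per_second(events_per_day: float, active_hours: float = 24.0) -> float:
    """Average events per second if the daily total is spread over active_hours."""
    if not 0 < active_hours <= 24:
        raise ValueError("active_hours must be in (0, 24]")
    return events_per_day / (active_hours * SECONDS_PER_HOUR)


def hours_needed_for_rate(events_per_day: float, rate_per_second: float) -> float:
    """Hours of activity per day that would produce the given sustained rate."""
    if rate_per_second <= 0:
        raise ValueError("rate_per_second must be positive")
    return events_per_day / rate_per_second / SECONDS_PER_HOUR


if __name__ == "__main__":
    daily_page_views = 500_000  # figure quoted on the slide
    print(f"24-hour average: {average_rate_per_second(daily_page_views):.2f} per second")
    print(f"Hours of load at 15/s: {hours_needed_for_rate(daily_page_views, 15):.2f}")
```

The choice of a 24-hour or a busy-hours window is exactly the kind of definition that needs to be stated when comparing throughput claims.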
Risks of Large DWs
40% of IT projects fail due to …
- Management (time & budget issues)
Large warehouses are unforgiving -- Survey.com
Design issues critical
- Database design
- Query design (and EXPLAINs)
- ETL design and scheduling
Pre-program wherever possible (control users and the resources they use)
Monitoring and alerts
Scale gradually (staggered loads on a schedule…)
Benchmarks (after each Scaling Point)
Risks of Large DWs
Partitioning data properly is critical
- For better physical management (utilities)
- Optimizers use this info
- Parallelism via multiple partitions
How to partition
- Depends on data usage
- Examples: geographical, hash, unique id, ranges…
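The partitioning schemes named above can be sketched as simple routing functions that map a row's key to a partition number. This is a minimal illustration in Python, not any vendor's implementation: the function names, the region map, and the quarter boundaries are all assumptions invented for the example.

```python
import bisect
import zlib
from datetime import date
from typing import Dict, Sequence


def hash_partition(key: str, num_partitions: int) -> int:
    """Hash partitioning: spread keys evenly, with no regard to their order."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def range_partition(value: date, upper_bounds: Sequence[date]) -> int:
    """Range partitioning: upper_bounds are sorted, exclusive upper limits.

    Values at or beyond the last bound fall into a final overflow partition.
    """
    return bisect.bisect_right(upper_bounds, value)


def list_partition(region: str, region_map: Dict[str, int], default: int) -> int:
    """Geographical (list) partitioning: an explicit region-to-partition map."""
    return region_map.get(region, default)


if __name__ == "__main__":
    # Hypothetical quarterly boundaries for a sales table.
    quarters = [date(2004, 4, 1), date(2004, 7, 1), date(2004, 10, 1)]
    print(range_partition(date(2004, 3, 4), quarters))   # first quarter
    print(hash_partition("customer-283736", 8))           # some partition in 0..7
    print(list_partition("Midwest", {"Midwest": 0, "East": 1}, default=2))
```

Range partitioning suits queries that filter on the partitioning column (a date, for instance), since whole partitions can be skipped; hash partitioning favors even spread for parallel work. Which one fits depends on data usage, as the slide says.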
Architectures
- Large SMP mainframe
- Shared-disk clusters
- Shared-nothing (Massively Parallel Processing or MPP)
The architectural debate means far less than it used to !
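To make the shared-nothing idea concrete, here is a small sketch of the scatter-gather pattern: each node aggregates only the rows it owns, and a coordinator merges the partial results. It is a toy model under stated assumptions. Threads stand in for separate machines, and the row layout and function names are hypothetical, not taken from any product.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, float]                 # (store_id, sale_amount), hypothetical layout
Partial = Dict[str, Tuple[float, int]]  # store_id -> (running total, row count)


def local_aggregate(rows: List[Row]) -> Partial:
    """Work done on one node, using only the rows that node owns."""
    partial: Partial = {}
    for store, amount in rows:
        total, count = partial.get(store, (0.0, 0))
        partial[store] = (total + amount, count + 1)
    return partial


def merge_partials(partials: Iterable[Partial]) -> Partial:
    """Coordinator step: combine per-node totals and counts."""
    merged: Partial = {}
    for partial in partials:
        for store, (total, count) in partial.items():
            merged_total, merged_count = merged.get(store, (0.0, 0))
            merged[store] = (merged_total + total, merged_count + count)
    return merged


def parallel_average(nodes: List[List[Row]]) -> Dict[str, float]:
    """Average sale per store across all nodes (threads simulate the nodes)."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        partials = list(pool.map(local_aggregate, nodes))
    merged = merge_partials(partials)
    return {store: total / count for store, (total, count) in merged.items()}


if __name__ == "__main__":
    node_a = [("493", 10.0), ("229", 4.0)]
    node_b = [("493", 20.0)]
    print(parallel_average([node_a, node_b]))
```

Nodes return totals and counts rather than averages because averages cannot be merged correctly without the counts; only small partial results cross the network, which is the main attraction of the shared-nothing design.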
Vendor Architectures
Product : Architecture : Implementation
DB2 UDB for z/OS : Shared-disk clustering : DB2 Data Sharing on Sysplex
DB2 UDB for LUW : Shared nothing : DB2 UDB ESE partitioning feature
Oracle : Shared-disk clustering or SMP : Real Application Clusters (RAC) -- previously known as Oracle Parallel Server (OPS)
SQL Server 2000 : Shared nothing or SMP : Customer-developed partitioning based on SQL Server features
Teradata : Shared nothing : Teradata on NCR MPP
DBMS Licensing Costs
Products shown on the cost scale: Open Source (MySQL, PostgreSQL), SQL Server 2000, DB2 UDB, Oracle, Teradata
Diagram labels: $; $$$$$; Biggest DSS Systems; Biggest OLTP Systems; TCO ?
Database pricing varies by the options selected and by the deal an IT organization cuts with the vendor. Your mileage may vary!
+ Low-cost SQL Server supports the biggest OLTP systems
-- Pressure on Teradata to keep its niche
+ Open Source DBMSs have a role but it's not Top Ten databases
Multi-Machine Mixed Systems
Sabre / Travelocity (EWeek, 2/23/04)
- 45 Linux w/ MySQL servers
- 17 Himalaya NonStop w/ Master database
- Diagram labels: (Transactional updates), (Fare look-up and routing)
Multi-Machine Mixed Systems
Omaha Steaks (EWeek 2003)
- 17 Linux w/ MySQL servers
- iSeries DB2
- Diagram labels: (Shopping cart), (Transactional updates)
* 50,000 to 68,000 daily sessions
* 1 year in Production / 8 Million sessions
Conclusions
- Databases are growing exponentially
- IT is closing in on Scientific/Research databases
- Multiple machine mixed systems are becoming popular (monolithic central databases are no longer the only game in town)
- Mixed use databases are becoming more common
  - Multiple applications
  - Read and update
- Open Source supports large systems -- but not Top Ten
- VLDBs are instructive -- but unique in some ways