Howard Fosdick (630)-279-4286 (C) 2004 FCI World's Largest Databases.

Presentation transcript:

Howard Fosdick (630) (C) 2004 FCI World's Largest Databases

Who Am I?
Hands-on DBA (and SA) for …
  Oracle, DB2, SQL Server
  Unix, Linux, Windows
Founder: IDUG, MWDUG, CAMP
Author, Speaker
Independent Contractor (630)

Outline
1. What's a Big Database
2. DSS
3. OLTP
4. Observations

Statistics Sources
1. Winter Corp. -- Database Top Ten
   -- Yearly survey
   -- Vendor neutral
   -- Free at:
2. Survey.com -- High-End BI/DW Competitive Analysis
   -- Survey of 150 companies w/ big warehouses
   -- Free at:
Thank You to both sources

Classifying Large Databases
DSS:
  Decision Support Systems (DSS)
  Online Analytical Processing (OLAP)
  Data Warehouses (DW)
  Multi-dimensional Databases (MDD)
  + Query oriented, mainly Read-only
OLTP:
  Online Transaction Processing (OLTP)
  + Update with short transactions (transaction = small CPU & data resources)
Commercial IT vs. Scientific/Research databases

What's a Large Database?
Database Size
  - User data
  - User data plus metadata & indexes
  - DASD farm
Users
  - Concurrent users
  - Total user population
Load
  - Concurrent queries
  - Queries / day or hour (simple vs complex queries)
VLDB = Very Large Database
Good definitions and measurements are key to success

II. World's Biggest DSS Systems

Data Warehouses vs. Data Marts
DW:
  Application neutral
  Service multiple organizational needs
  Largest systems are usually data warehouses
DM:
  Application specific
  Organizationally focused

What's Driving the Growth of Large Data Warehouses?
Web Sites --
  - Clickstream data
Retail --
  - Transaction Level Detail (TLD)
!!!!! Super Big Groceries !!!!!
Preferred Customer Card #
Hello, I'm Scot94 03/04/04 02: Store 493 Loc 229
PRETTY-LADY HAIRCLR
AARP MAGAZINE
DIAPERS
BEER SIX-PACK
Tax 2.40
BAL
Cash
Change 3.21
Save this Receipt -- Get $2.00 off on Prozac When You Buy Super-Baby Food!
Understanding customer behavior means $$$!

What's Driving the Growth of Large Data Warehouses?
Necessary Preconditions --
  Cheap Hardware
  Higher reliability / availability (based on dynamic hardware swapping)
  Better Software
  Lax privacy laws in USA
EU curtails cross-usage of data
EU has stronger privacy laws

World's Largest DSS Systems
Way bigger than just 3 years ago
All Unix mainframes
All use SANs (Storage Area Networks) (aka ESS)
No IBM Mainframes
No Windows or Wintel
No SQL Server
No Linux or Open Source databases
NCR/Teradata niche market at 2.7% (Gartner 05/28/03)
Goodbye Informix!
© 2003 Winter Corp.
Database Size = disk storage for user tables, indices, aggregates

Large DSS Systems
Diagram: Query Users; Unix mainframe (Sun E12/15K, HP Superdome, IBM Regatta); Storage Area Network (EMC, Hitachi, HP, LSI)
Unix mainframes --
  + Dynamically add/drop CPUs, RAM (Sun calls it partitioning)
  + High reliability (as good as clusters or Mainframes)
  + Capacity on Demand
SANs --
  + Flash (snap) backup (OS-level backup)
  + Large Cache
  + Intelligent data placement/movement

Example Evolution – Scaling a Unix Mainframe 8 16 Gig RAM 12 concurrent users Gig RAM Gig RAM 25 concurrent users 35 concurrent users Other upgrades: Oracle 8i -> 9i Sun E10K -> E12K

World's Largest DSS Systems -- Windows
Way smaller than Unix systems
Way bigger than just 3 years ago
Oracle vs SQL Server (like market share battle for Windows DBMSs)
Also use SANs (Storage Area Networks)
No IBM DB2 UDB
No Teradata
© 2003 Winter Corp.

World's Largest DSS Systems -- By Peak Workload © 2003 Winter Corp.

Where Did IBM Mainframes Go?
Big Silicon, Big Iron
+ Hello Linux!
+ Good for --
  + Consolidation platform
  + Legacy systems
  + Virtualization (multi-OS platform)
Poof! -- Goodbye…
  -- Largest databases
  -- Smaller mainframes (VM, VSE)
  -- Reliability advantage eroded
  -- High cost per CPU

Oracle Rising
Joined the Top Ten list 3 to 5 years ago
8i added essential DSS technologies...
  + Partitions
  + New ROW ID (for bigger databases)
  + Thorough Parallelism (DML, DDL, utilities)
  + Index improvements (bit mapped IXs, function-based, desc, others)
  + Resource Manager (proactive)
  + Materialized Views
  + Large memory mgmt
  + Optimizer is Partition-aware
  + Online DDL operations and Utilities

Example Oracle Warehouses
Sites: Amazon | Best Buy | Colgate | Telecom Italia Mobile
System: HP Superdome | Sun 15K | IBM p690 Regatta | HP AlphaServer
Architecture: SMP Cluster
Storage: EMC IBMEMC
Processors: node cluster
Oracle Version: 9i | 8i | 9i | 8i
DB Size: 13 T | 6.3 T | 3.8 T | 16 T
Number of Tables: ,0001,200
Detail Data: Clickstream data | Sales Transaction data | Varied detail data | Call detail records
User Population: 80016,0006,
Concurrent Users:
DBAs: 2 | 2 | n/a | 3
Peak Workload: 4300 queries / day | 150,000 queries / 4 hour period | 14,200 steps / day | 700 M records loaded / day
© 2003 Winter Corp.

Why Not Oracle Clustering?
+ Great for non-disruptive scaling of existing systems...
But the biggest systems tend not to use it
  -- Unix mainframe no longer requires clustering for reliability, availability or easy scalability
  -- Clustering means complexity in minimizing the…
  -- Locking issues
9i improved this via Cache Fusion, but SMP Unix mainframe will still be favored

Where's SQL Server 2000?
Big in OLTP but lacks essential DSS technologies
  -- Parallelism restricted to SELECTs; needs it for other DML, DDL, utilities
  -- Partitions
  -- Wintel restriction
Yukon?
  -- Many new features... ready for Top Ten DSS?
  (Features = partitioning, database mirroring, mirrored backups, online Indexing & Restore, fast recovery, ANSI 1999 T-SQL, CLR support, native XML, XML Query, better .NET support, Reporting Services, Service Broker (async messaging), extensible data types…)

Where's Open Source?
Linux kernel now out
  + More CPUs (to 16)
  + More RAM (> 4+ Gig)
  + Better threading, file system support
MySQL and PostgreSQL
  -- Top out at 500,000 page views per day (EWeek 2003) (or 15 per second)
  + Improving rapidly
Prediction: open source will support big databases, but not Top Ten list sites
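As a quick check on the page-view figures quoted above, the arithmetic can be sketched in Python. Spread evenly over 24 hours, 500,000 page views a day averages a little under 6 per second. The slide's 15 per second matches the same daily total only if the traffic is concentrated into roughly 9 hours. That concentration is an assumption made here to reconcile the two numbers; the slide does not say how its per-second figure was derived, and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(events_per_day: float) -> float:
    """Events per second if the daily total is spread evenly over 24 hours."""
    return events_per_day / SECONDS_PER_DAY


def active_hours(events_per_day: float, rate_per_second: float) -> float:
    """Hours of activity needed to reach the daily total at a constant rate."""
    return events_per_day / rate_per_second / 3600


print(round(average_rate(500_000), 2))      # 5.79
print(round(active_hours(500_000, 15), 2))  # 9.26
```

Whichever reading is intended, it is worth stating whether a quoted rate is a 24-hour average or a busy-period figure, since the two differ here by a factor of about 2.6.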

Risks of Large DWs
40% of IT projects fail due to …
  Management (time & budget issues)
Large warehouses are unforgiving -- Survey.com
Design issues critical
  Database Design
  Query design (and EXPLAINs)
  ETL design and scheduling
Pre-program wherever possible (control users and the resources they use)
Monitoring and alerts
Scale gradually (staggered loads on a schedule…)
Benchmarks (after each Scaling Point)

Risks of Large DWs
Partitioning data properly is critical
  For better physical management (utilities)
  Optimizers use this info
  Parallelism via multiple partitions
How to partition
  Depends on data usage
  Examples: geographical, hash, unique id, ranges…
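Two of the partitioning schemes named above, hash and ranges, can be illustrated with a minimal Python sketch. The function names, the partition count, and the quarterly date boundaries are all assumptions chosen for illustration; they do not come from the slides or from any particular DBMS.

```python
import bisect
import zlib
from datetime import date
from typing import Sequence


def hash_partition(key: str, num_partitions: int) -> int:
    """Assign a key to a partition by hashing, which spreads rows evenly."""
    # zlib.crc32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def range_partition(value: date, upper_bounds: Sequence[date]) -> int:
    """Assign a value to a partition using sorted, exclusive upper bounds.

    Values at or beyond the last bound fall into one extra overflow partition.
    """
    # bisect_right counts the bounds <= value, so a value equal to a bound
    # lands in the next partition (bounds are exclusive).
    return bisect.bisect_right(upper_bounds, value)


# Hypothetical quarterly boundaries for a sales table.
QUARTERS = [date(2004, 4, 1), date(2004, 7, 1), date(2004, 10, 1)]

print(range_partition(date(2004, 3, 4), QUARTERS))    # 0
print(range_partition(date(2004, 12, 25), QUARTERS))  # 3
print(hash_partition("store-493", 8))
```

Range partitioning keeps related rows together, which suits date-driven loads and purges; hash partitioning trades that locality for an even spread, which helps parallel queries. Which one fits depends on data usage, as the slide says.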

III. World's Biggest OLTP Systems

World's Largest OLTP Systems
© 2003 Winter Corp.
Wintel mainframes arrive!
SQL Server arrives
Use SANs
CA can do the job (but has tiny overall database market share)
Oracle has big systems -- but not in the top ten

World's Largest OLTP Systems -- Unix -- Windows © 2003 Winter Corp.

World's Largest OLTP Systems -- By Number of Rows © 2003 Winter Corp.

OLTP Observations
Wintel mainframes w/ SQL Server displace MVS/CICS
SQL Server dominates Wintel OLTP
Great for pre-programmed, resource-limited txns
Oracle dominates Unix OLTP

IV. Observations

Architectures
Large SMP mainframe
Shared-disk Clusters
Shared-nothing (Massively Parallel Processing or MPP)
The architectural debate means far less than it used to!

Vendor Architectures
Product | Architecture | Implementation
DB2 UDB for z/OS | Shared-disk clustering | DB2 Data Sharing on Sysplex
DB2 UDB for LUW | Shared nothing | DB2 UDB ESE partitioning feature
Oracle | Shared-disk clustering or SMP | Real Application Clusters (RAC) -- previously known as Oracle Parallel Server (OPS)
SQL Server 2000 | Shared nothing or SMP | Customer-developed partitioning based on SQL Server features
Teradata | Shared nothing | Teradata on NCR MPP

DBMS Licensing Costs
Cost scale, $ (low) to $$$$$ (high): Open Source (MySQL, PostgreSQL), SQL Server 2000, DB2 UDB, Oracle, Teradata
Database pricing varies by the options selected and by the deal an IT organization cuts with the vendor. Your mileage may vary!
Biggest DSS Systems
Biggest OLTP Systems
TCO?
+ Low-cost SQL Server supports the biggest OLTP systems
-- Pressure on Teradata to keep its niche
+ Open Source DBMSs have a role but it's not Top Ten databases

DW Labor Costs
© 2002 Survey.com
Like TCO, Labor Costs may be an un-measurable …
Figures applicable across sites?
Every vendor claims lowest labor costs
Terabytes per DBA may be non-linear!
1 or 2 DBAs for a 24/7 site?
Development staff will be larger than Maintenance staff
Your mileage will vary
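The "terabytes per DBA" measure mentioned above can be worked through with the DB Size and DBAs rows from the Example Oracle Warehouses slide. This is a rough sketch only: the DBA counts (2, 2, n/a, 3) are a reading of a row that is run together in the transcript, and the variable and function names are illustrative.

```python
from typing import Optional

# (site, database size in TB, number of DBAs); None where the slide says n/a.
SITES = [
    ("Amazon", 13.0, 2),
    ("Best Buy", 6.3, 2),
    ("Colgate", 3.8, None),
    ("Telecom Italia Mobile", 16.0, 3),
]


def terabytes_per_dba(size_tb: float, dbas: Optional[int]) -> Optional[float]:
    """Return TB managed per DBA, or None when the DBA count is unknown."""
    if not dbas:
        return None
    return round(size_tb / dbas, 2)


RATIOS = {site: terabytes_per_dba(size, dbas) for site, size, dbas in SITES}

for site, ratio in RATIOS.items():
    print(site, ratio)
```

On these figures the ratio ranges from about 3 to 6.5 TB per DBA across sites of different sizes, which is consistent with the slide's caution that the measure does not scale linearly and may not transfer between sites.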

Multi-Machine Mixed Systems
Sabre / Travelocity (EWeek, 2/23/04)
45 Linux w/ MySQL servers (Transactional updates)
17 Himalaya Non-stop w/ Master database (Fare look-up and routing)

Multi-Machine Mixed Systems
Omaha Steaks (EWeek 2003)
17 Linux w/ MySQL servers (Shopping cart)
  * 50,000 to 68,000 daily sessions
  * 1 year in Production / 8 Million sessions
ISeries DB2 (Transactional updates)

Conclusions
Databases are growing exponentially
IT is closing in on Scientific/Research databases
Multiple machine mixed systems are becoming popular (Monolithic central databases are no longer the only game in town)
Mixed use databases are becoming more common
  Multiple applications
  Read and update
Open Source supports large systems -- but not Top Ten
VLDBs are instructive -- but unique in some ways

Questions?