1 Commodity Database Servers Jim Gray Microsoft Research

1 1 Commodity Database Servers Jim Gray Microsoft Research Gray@Microsoft.com http://Research.Microsoft.com/~Gray/talks

2 2 Outline: Status report on Commodity Server Performance; Why Most VLDBs will be Multi-Media Servers; Preview of Microsoft's SQL Server 7

3 3 Status Report on Commodity Server Performance Standards: –TPC, –SpecWeb,... Product benchmarks: e.g. –SAP, –PeopleSoft,… Both indicate that –NT is 18 months behind Unix-SMP performance –but clusters can make up the difference

4 4 TPC-C
SMP
–HP 9000, 16 cpu, Sybase 11: 52.1 ktpmC, 82 $/tpmC
–NEC, 8 cpu, SQL Server: 14.9 ktpmC, 60 $/tpmC
Cluster
–IBM SP2, 12x8 cpu, Oracle 8.2: 57 ktpmC, 148 $/tpmC
Predict: a large and inexpensive NT cluster number this year.
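Because $/tpmC is the priced configuration divided by throughput, the figures on this slide imply a total system price for each entry. A minimal Python sketch of that arithmetic, using only the numbers quoted on the slide; the dictionary labels and the function name are illustrative, not part of the talk:

```python
# Throughput (tpmC) and price/performance ($/tpmC) as quoted on the slide.
results = {
    "HP 9000, 16 cpu, Sybase 11": (52_100, 82),
    "NEC, 8 cpu, SQL Server": (14_900, 60),
    "IBM SP2, 12x8 cpu, Oracle": (57_000, 148),
}


def implied_price_musd(tpmc, dollars_per_tpmc):
    """Approximate priced system cost in millions of dollars."""
    return tpmc * dollars_per_tpmc / 1e6


for system, (tpmc, dollars_per_tpmc) in results.items():
    price = implied_price_musd(tpmc, dollars_per_tpmc)
    print(f"{system}: about {price:.1f} M$")
```

Read this way, the NT entry is both the slowest and by far the cheapest system listed, which is the gap the predicted NT cluster result would have to close.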

5 5 TPC-D Performance Champions: NCR/Teradata –1 TB: 32x4 node cluster –300 GB: 24x4 node cluster –100 GB: 8x4 node cluster All use Teradata software on NCR WorldMark Intel-based hardware

6 6 Outline: Status report on Commodity Server Performance; Why Most VLDBs will be Multi-Media Servers; Preview of Microsoft's SQL Server 7

7 7 VLDB Reality Test
California DMV
–~20 million cars, drivers, doctors, barbers, ...
–Some drivers have moving violations
–DMV knows about 1.5 KB about each one
–30 GB total.
Microsoft: too big, says DoJ
–40 B$ revenue (in company lifetime)
–~1 billion unit sales: @ 100 B = 100 GB
–~100 M customers: @ 1 KB = 100 GB
Wal-Mart (no one bigger!)
–Sells 10 B items per year
–100 bytes/item => 1 TB
AT&T
–300 M calls per day (peak day)
–10 B calls per year
–100 B/call = 1 TB
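The estimates on this slide are all the same back-of-envelope product: record count times bytes per record. A small Python sketch that reproduces them from the slide's own inputs; decimal units (1 KB = 1,000 bytes) and the helper names are assumptions made for illustration:

```python
# Decimal units, as the slide's round numbers suggest (an assumption).
KB, GB, TB = 10**3, 10**9, 10**12


def size_bytes(records, bytes_per_record):
    """Total size of a dataset with fixed-size records."""
    return records * bytes_per_record


def human(n):
    """Format a byte count in GB or TB."""
    return f"{n / TB:g} TB" if n >= TB else f"{n / GB:g} GB"


estimates = {
    "California DMV": size_bytes(20 * 10**6, 1500),        # 20 M records at 1.5 KB
    "Microsoft unit sales": size_bytes(10**9, 100),         # 1 B sales at 100 B
    "Microsoft customers": size_bytes(100 * 10**6, 1 * KB), # 100 M at 1 KB
    "Wal-Mart items/year": size_bytes(10 * 10**9, 100),     # 10 B items at 100 B
    "AT&T calls/year": size_bytes(10 * 10**9, 100),         # 10 B calls at 100 B
}

for name, n in estimates.items():
    print(f"{name}: {human(n)}")
```

Even the largest conventional transaction sources land at about 1 TB per year, which is the point the slide is making.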

8 8 VLDB Reality Test
It's HARD to find 1 TB of transaction data
–100 M web hits/day
–250 B/hit
–1 TB/year
It's HARD to find 1 TB of text data
–100 M web pages
–10 KB/page
–= 1 TB
How do they do it?
Lots of indices?
–No: that is only 3x
Precomputed aggregates?
–Yes: OLAP benchmark: start at 30 MB, use 2.7 GB or 6 GB database
–But: this is dumb
Email?
–Microsoft: 6 TB
–Hotmail: 3.5 TB
–AOL?

9 9 Data Tidal Wave
Seagate 47 GB drive @ 3 k$
–100 GB penny-per-MB drive coming in 2000
10 $/GB = 10 k$/Terabyte! (in Y2K)
–Everyone can afford one
What's a terror bite?
–If you sell ten billion items a year (e.g., Wal-Mart)
–And you record 100 bytes on each one
–Then you got a Terror Bite
Where will the terror bytes come from?
–Multimedia (like the TerraServer) and...

10 10 Multi Media: Very Large DBs Photo is 100 KB, not 100 B –So, photo DBs are 1,000x larger Examples: –Scanned documents –Photo records of products/people/places –Surveillance –Scientific monitoring

11 11 Some TerrorByte Databases EOS/DIS (picture of planet each week) –15 PB by 2007 Federal Reserve Clearing house: images of checks –15 PB by 2006 (7 year history) Sloan Digital Sky Survey: –40 TB raw, 2 TB cooked TerraServer:

12 12 Scaleup - Big Database
Build a 1 TB SQL Server database
–Show off Windows NT and SQL Server scalability
–Stress test the product
Data must be
–1 TB
–Unencumbered
–Interesting to everyone everywhere
–And not offensive to anyone anywhere
Loaded
–1.1 M place names from Encarta World Atlas
–1 M sq km from USGS (1 meter resolution)
–2 M sq km from Russian Space Agency (2 m)
Will be on the web (world's largest atlas)
Sell images with Commerce Server.
USGS CRADA: 3 TB more coming.

13 13 TerraServer: World's Largest PC! 324 disks (2.9 terabytes); 8 x 440 MHz Alpha CPUs; 10 GB DRAM; NT EE & SQL 7.0. Photo of the planet: USGS and Russian images

14 14 Background
Earth is 500 Tera-meters square
–USA is 10 tm²
–100 tm² land in 70ºN to 70ºS
We have pictures of 6% of it
–3 tsm from USGS
–2 tsm from Russian Space Agency
Compress 5:1 (JPEG) to 1.5 TB.
Slice into 10 KB chunks
Store chunks in DB
Navigate with
–Encarta Atlas globe gazetteer
–StreetsPlus in the USA
–40x60 km² jump image
–20x30 km² browse image
–10x15 km² thumbnail
–1.8x1.2 km² tile
Someday
–multi-spectral image
–of everywhere
–once a day / hour
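The compression and chunking figures on this slide fix the rough number of rows the image table must hold. A short Python sketch of that arithmetic, assuming decimal units and an average chunk of exactly 10 KB (both simplifications; the variable names are illustrative):

```python
KB, GB, TB = 10**3, 10**9, 10**12

compressed_bytes = 1_500 * GB   # 1.5 TB after JPEG compression
compression_ratio = 5           # 5:1, as stated on the slide
chunk_bytes = 10 * KB           # one stored chunk

raw_bytes = compressed_bytes * compression_ratio
chunk_count = compressed_bytes // chunk_bytes

print(f"raw imagery: {raw_bytes / TB:g} TB")
print(f"stored chunks: {chunk_count:,}")
```

So the design implies on the order of a hundred million small rows rather than a few huge blobs, which is what makes it a database scalability test.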

15 15 USGS Digital Ortho Quads (DOQ) US Geological Survey; 3 TeraBytes; Most data not yet published; Based on a CRADA –TerraServer makes data available. (Map labels: USGS DOQ, 1x1 meter, 4 TB, Continental US, New Data Coming)

16 16 Russian Space Agency (SovInformSputnik) SPIN-2 (Aerial Images is Worldwide Distributor). 1.5 meter geo-rectified imagery of (almost) anywhere. Almost equal-area projection. De-classified satellite photos (from 200 km). More data coming (1 m). Want to sell imagery on Internet. Putting 2 tm² onto TerraServer.

17 17 http://www.TerraServer.com Demo Microsoft BackOffice SPIN-2

18 18 1TB Database Server AlphaServer 8400 4x400. 10 GB RAM 324 StorageWorks disks 10 drive tape library (STC Timber Wolf DLT7000 ) SPIN-2 Hardware

19 19 Terra-Server Web Site Software (diagram labels): Web Client: browser, HTML, Java Viewer; The Internet; Internet Information Server 4.0; Active Server Pages; MTS; Image Server; Image Delivery Application; Microsoft Automap ActiveX Server; Automap Server; Microsoft Site Server EE; Image Provider Site(s); SQL Server 7 (Sphinx); Terra-Server DB; Terra-Server Stored Procedures

20 20 System Management & Maintenance Backup and Recovery –STC 9717 Tape robot –Legato NetWorker –Sphinx Backup/Restore Utility –Clocked at 80 MBps!! SQL Server Enterprise Mgr –DBA Maintenance –SQL Performance Monitor

21 21 TerraServer File Group Layout
Convert 324 disks to 28 RAID5 sets plus 28 spare drives
Make 4 NT volumes (RAID 50): E:, F:, G:, H:
595 GB per volume
Build 30 20 GB files on each volume
DB is File Group of 120 files

22 22 Gazetteer Design Classic Snowflake Schema Fast First hint to Optimizer

23 23 Image Data Design Image pyramid stored in DBMS (250 M recs)

24 24 Image Delivery and Load (diagram labels) Hardware: Alpha Server 8400; Alpha Server 4100 (x2); Enterprise Storage Array (ESA); 108 9.1 GB Drives (x3); 60 4.3 GB Drives; STC DLT Tape Library; 100mbit EtherSwitch. Load path: DLT Tape; tar; \DropN; ImgCutter; \Images; LoadMgr; DoJob; Wait 4 Load; DB; NT Backup. Numbered load steps: 10: ImgCutter; 20: Partition; 30: ThumbImg; 40: BrowseImg; 45: JumpImg; 50: TileImg; 55: Meta Data; 60: Tile Meta; 70: Img Meta; 80: Update Place...

25 25 SQL 7 Testimonial We started using it March 4 1997 –SQL 7 Pre-Alpha –SQL 7 Alpha –SQL 7 Beta 1 –SQL 7 Beta Loaded the DB twice –(we made application mistakes) Now doing it right Reliability: Great! SQL 7 never lost data Ease of use: Great! Functionality: Great!

26 26 Outline: Status report on Commodity Server Performance; Why Most VLDBs will be Multi-Media Servers; Preview of Microsoft's SQL Server 7

27 27 SQL 7: Easy & Functional. Easy: Dynamic self management; Multi-site management; Alert/response management; Job scheduling and execution; Scriptable management; Profiling/tuning tools; Fully Unicode; English Language Query; Integrated text search engine. (Section tabs: Easy, Scalability, Data Warehousing)

28 28 Made It Easier! (fewer knobs) Desktop & Workgroups –Auto Configure Engine / Dynamic Disk/memory –Reduce Learning Curve, Increase Productivity –Self-Managing SQLAgent, Wizards, Task Pads Large Organizations –Deploy/manage hundreds of SQL Servers –Lower TCO for Large Environments –Multi-Server Operations / Lights-out Environment

29 29 Multi-Site Management Admin servers from one place; Automate simple stuff; Wizards for common stuff; Manage arrays of servers –operations, security,… –Replication –Import/export Interface is scriptable –COM object model –Script with Java, VB,... Scheduling and Multi-step jobs

30 30 DBA and Developer Tools Built-in GUI –data/schema design –data query & edit –integrated with programming tools SQL Server Profiler –Selected server events and trace criteria –Capture output to screen or replay SQL Server Expert –Analyzes actual server usage history –Makes recommendations to improve performance –Recommends Index design –Recommends operations procedures

31 31 Wizards and GUIs Wizards galore (over 50 at last count) MS Access as a query interface Built-in data access tools (integrated with tools) Graphical show plan

32 32 Many New Wizards... Create a Database; Scheduled Backup; Create a Maintenance Plan; Create a Scheduled Job; Create an Alert; Security Wizard; Import Data to SQL Server; Export Data From SQL Server; Clustering (Wolfpack); Index Tuning Wizard; Web Assistant; Register Servers; Configure Replication; Create Publication; Create Pull Subscription; Create Push Subscription; Replication Partitioning; Create an Index; Create a Stored Procedure; Create a View; More to come...

33 33 Distributed Management Objects (SQL-DMO) COM Interfaces for administering SQL Server –Embedded Administration (no UI) All Administration Functions Supported –Server, Database Configurations, Settings –Object Creation, Security, Replication, Scripting,.. –40+ Objects, 1000+ properties and methods Integration Interface for ISV Administration –E.g., Baan using DMO for Scripted App Install Scripting via VBA and JScript + DCOM

34 34 DMO: Object Model (Overview) Users Databases Logins DB Options Configurations Alerts Operators Tasks Jobs SQLAgent Transaction Logs Publications Remote Login Linked Servers Columns Indexes View Stored Procs Table Files FileGroups Keys (PK/FK) Triggers Rules Defaults SQL Server

35 35 DMO Scripting: Backup a Database
' Create Server Object
Set MyServer = CreateObject("SQLDMO.SQLServer")
' Create Backup Object
Set MyBackup = CreateObject("SQLDMO.Backup")
' Identify Server
MyServer.Name = "MSSALES"
' Windows NT Auth
MyServer.LoginSecure = True
' Connect
MyServer.Connect
' Database to backup
MyBackup.Database = "SALESII"
' Backup Location, then Name Backup File
MyBackup.Files = "\\MyServer\Backups\" _
    + MyBackup.Database + ".bak"
' Back it Up
MyBackup.SQLBackup MyServer
' We're Done!
MyServer.Disconnect

36 36 Scalability: Win9x/NTW version; Dynamic row-level locking; Improved query optimizer; Intra-query parallelism; 64-bit support; Replication; Distributed query; High Availability Clusters. (Section tabs: Easy, Scalability, Data Warehousing)

37 37 Scale Down to Windows 95-98 Full function (same as NTW); Self managing; Many tools; Integration with Next MS Access; Great for embedded apps

38 38 Replication: Transactional and Merge; Remote update; ODBC and OLE DB subscribers; Wizards; Performance. (Diagram labels: Publisher; Distributor; Subscriber; Updating Subscriber (immediate updates); 2PC, RPC; DB2; CICS; VSAM; OS 390)

39 39 Parallel Query: SMP & Disk Parallelism; Plus Distributed; Plus Hash Join (fanciest on the planet); Plus Optimized Partitioned views. (Diagram: Disks, 50,000 rows; Local Agg., 4 x 50 rows; Global Agg.; Result, 50 rows. Aggregates: # of emp. per group, total inc. per group)
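The local/global aggregation in this slide's diagram can be sketched as a two-phase group-by: each partition reduces its rows to one partial row per group, and a final step merges the partials. The Python below is a minimal illustration under stated assumptions: the data is synthetic, the four partitions are plain list slices, and the local phase runs sequentially here although each call is independent and could run on its own CPU or disk. Function names are hypothetical, not SQL Server internals.

```python
from collections import defaultdict


def local_aggregate(rows):
    """Phase 1: per-partition partials, group -> [employee count, total income]."""
    partial = defaultdict(lambda: [0, 0])
    for group, income in rows:
        partial[group][0] += 1
        partial[group][1] += income
    return dict(partial)


def global_aggregate(partials):
    """Phase 2: merge the partials by adding counts and totals per group."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for group, (count, total) in partial.items():
            merged[group][0] += count
            merged[group][1] += total
    return dict(merged)


# Synthetic data (made up): 50,000 employee rows spread over 50 groups.
rows = [(i % 50, 1000 + i % 7) for i in range(50_000)]

# Four contiguous partitions, standing in for four disks.
size = len(rows) // 4
partitions = [rows[k * size:(k + 1) * size] for k in range(4)]

partials = [local_aggregate(p) for p in partitions]  # 4 x 50 partial rows
result = global_aggregate(partials)                  # 50 result rows

print(len(partials), [len(p) for p in partials], len(result))
```

The design point is that only 4 x 50 small partial rows cross into the global step instead of 50,000 base rows; this works because count and sum can be combined from partial results.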

40 40 Distributed Heterogeneous Queries Data Fusion / Integration. Join spreadsheets, databases, directories, text DBs, etc. Any source that exposes OLE DB interfaces. SQL Server as gateway, even on the desktop. (Diagram: SQL 7.0 Query Processor over Database (DB2, VSAM, Oracle, …), Spreadsheet, Photos, Mail, Maps, Documents and the Web, Directory Service)

41 41 Utilities The Key to LARGE Databases Backup –Fuzzy –Parallel –Incremental –Restartable Recovery –Fast –File granularity Reorganize –shrinks file –reclusters file Auto-repair

42 42 Data Warehousing: Warehousing Framework; Visual data modeler; Microsoft repository; Data transformation services (DTS); Plato & Dcube - Multi Dimensional Data Cubes; English query 2.0; Built-in text-index engine. (Section tabs: Easy, Scalability, Data Warehousing)

43 43 Key Microsoft Data Warehouse Programs Data Warehouse Framework (DWF) –Process -- for building, using and managing –Pipeline -- for metadata flow –Protocols -- to integrate components Data Warehouse Alliance (DWA) –Partners -- ISVs pledged to the framework and its parts –Products -- complete spectrum from Microsoft and third-parties

44 44 Microsoft Data Warehousing Framework (diagram labels) Components: Operational Data (OLE-DB**); Data Warehouse Design (logical/physical schema*, data flow**); Data Transformations (DTS**); Data Marts (SQL Server** & OLAP Server**); Data Mart Design** (Cubes/Star schema); End-User Tools (Excel**, Access, English Query); Data Warehouse Management (Console*, Scheduling**, Events**, Topology*); OLE-DB**. Phases: Building, Using, Managing. Flows: Meta-Data Flow, Data Flow. Microsoft Repository** (Persistent Shared Meta-Data): DB Schema**, Transformation**, Scheduling, OLAP. Legend: ** available in SQL Server 7 (* partially)

45 45 Alliance for Data Warehousing
DW Build: BMC, Data Mirror, Execusoft, Informatica, Microsoft, Platinum Technology, Praxis, Prism, Sagent, SAS, Sterling, V-Mark
DW Access: Andyne, Business Objects, Cognos, IQ Software, Microsoft, NCR Data Mining, Pilot, Platinum Technology, Sagent, SAS, Seagate, Wall Data
Technical and marketing relationship
Supports SQL Server storage engine
Third-party products tested with BackOffice

46 46 DW Alliance Milestones 9/96 - Launched with 8 founding members 3/97 - Design review 1/97 - 6/97 - Expanded to 21 members 7/97 - Repository design review –Team development of shared metadata 9/97 - OLE DB for OLAP API specification 1H98 - Integration development with Sphinx DTS and Replication APIs

47 47 Microsoft Repository
Based on joint Sterling/Microsoft design (shipped 97Q2)
Wide distribution: VB, Visual Studio and third parties
Designed with over 60 vendors
Extended to support DB schema, transformations, OLAP
–Key element of the DW Framework
UML is abstract model
Everything viewable in UML terms
Model legend: UML = Unified Modeling Language; UMX = UML Extensions; CDE = Component Descriptions; COM = Component Object Model; DBM = Database Model; DTM = Data Type Model; GEN = Generic; SQL = Microsoft SQL Server; OCL = Oracle

48 48 Repository & Data Warehousing Common infrastructure -- the meta-data pipeline Supports interoperability between data warehousing tools and products Process: –Initial spec developed with 12 vendors –Gathering feedback now –Final spec review in Redmond, 2/98

49 49 Data Transformation
Diagram labels: Repository; Metadata; Transforms; Oracle > SQL Server; Transformation Objects; ActiveX Scripts; SQLAgent Multiserver Operations; Data Pump (IDTSDataPump, IUnknown)
Example Transform() script:
Function Example()
    If DTSSource("CreditRating") = 1 Then
        DTSDestination("Risk") = "Good"
    ElseIf DTSSource("Credit") = 2 Then
        DTSDestination("Risk") = "Average"
    ElseIf DTSSource("Credit") = 3 Then
        DTSDestination("Risk") = "Bad"
    Else
        Example = DTS_SkipRow
    End If
End Function
Workflow system manages Data Pump
–Pre-defined transforms using the DTS GUI
–Procedural VB Script, JavaScript, VBA, any COM
Multi-stream in, Multi-stream out
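The slide's script shows one per-row transform; the data pump's job is to apply it to every source row and drop rows the transform asks to skip. A rough Python rendering of that loop, as a sketch only: `SKIP_ROW`, `transform`, and `pump` are hypothetical names standing in for the slide's `DTS_SkipRow` and the pump itself, the column name `CreditRating` is taken from the script's first branch, and the sample rows are made up.

```python
# Hypothetical sentinel standing in for the slide's DTS_SkipRow.
SKIP_ROW = object()

RISK_BY_RATING = {1: "Good", 2: "Average", 3: "Bad"}


def transform(source_row):
    """Map a source row to a destination row, or ask the pump to skip it."""
    rating = source_row.get("CreditRating")
    if rating not in RISK_BY_RATING:
        return SKIP_ROW
    return {**source_row, "Risk": RISK_BY_RATING[rating]}


def pump(rows):
    """Apply the transform row by row and yield only the rows it kept."""
    for row in rows:
        out = transform(row)
        if out is not SKIP_ROW:
            yield out


# Made-up source rows for illustration.
source = [
    {"Customer": "A", "CreditRating": 1},
    {"Customer": "B", "CreditRating": 3},
    {"Customer": "C", "CreditRating": 9},  # unknown rating, will be skipped
]

for row in pump(source):
    print(row)
```

Keeping the transform a pure function of one row is what lets the pump stream data without buffering the whole table.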

50 50 Transformations Data quality and validation –Missing values, scrubbing, exception handling Data integration –Heterogeneous query, join keys, elim. dups Transforms –Combine/decompose multiple columns to one Aggregation Central metadata –Business rules, data lineage

51 51 Flexible Architecture Debates between MOLAP and ROLAP vendors obscure customer needs. Plato is the product that best supports MOLAP, ROLAP and Hybrid, and offers the most seamless integration of all three. Users & apps only see cubes. (Diagram labels: MOLAP: User View, Data load, Persistent Store; ROLAP: User View, Data access, MD Cache; Hybrid: User View, MD Cache)

52 52 Plato and Dcube and HOLAP (diagram labels) Source table; Partition 1, Partition 2, Partition 3; ROLAP, MD; SQL; Europe, USA, Asia; Plato server; Plato Designer; Dcube client app (User 1); Dcube client app (User 2). Cube example: makes CHEVY, FORD; years 1990, 1991, 1992, 1993; colors RED, WHITE, BLUE; aggregates By Color, By Make & Year, By Color & Year, By Make, By Year, Sum
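The cube labels on this slide (By Color, By Make & Year, By Make, Sum, and so on) are the group-bys over every subset of the dimensions. A minimal Python sketch of that idea; the dimension names follow the slide, while the fact rows and sales figures are entirely made up for illustration and `cube` is a hypothetical helper, not Plato's API:

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("make", "year", "color")


def cube(facts, dimensions=DIMENSIONS, measure="sales"):
    """Return {dimension subset: {key tuple: summed measure}} for every subset."""
    result = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(int)
            for fact in facts:
                totals[tuple(fact[d] for d in dims)] += fact[measure]
            result[dims] = dict(totals)
    return result


# Made-up fact rows; only the dimension values come from the slide.
facts = [
    {"make": "CHEVY", "year": 1990, "color": "RED", "sales": 5},
    {"make": "CHEVY", "year": 1991, "color": "WHITE", "sales": 3},
    {"make": "FORD", "year": 1990, "color": "BLUE", "sales": 4},
    {"make": "FORD", "year": 1992, "color": "RED", "sales": 2},
]

aggregates = cube(facts)
print(len(aggregates), "group-bys")
print("Sum:", aggregates[()])
print("By Make:", aggregates[("make",)])
print("By Color:", aggregates[("color",)])
```

With three dimensions there are 2^3 = 8 group-bys, and the count doubles with each added dimension, which is why a server has to choose which ones to precompute.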

53 53 How Plato Handles Data Explosion (Diagram: Fact Table; aggregations Quarter x Product Family, Quarter x Product, Month x Product Family, Month x Product) Aggregation Wizard finds the aggregations that feed the most other aggregations
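One way to read "feeds the most other aggregations" is as a fan-out count over the aggregation lattice: an aggregation can feed another if it is at least as detailed on every dimension. The Python sketch below ranks the four aggregations named on the slide that way. It is an illustration of the idea only, not the wizard's actual algorithm, which the slides do not spell out; the hierarchy ordering (Month below Quarter, Product below Product Family) and the function names are assumptions.

```python
from itertools import product

# Levels listed from finest to coarsest (ordering assumed from the slide labels).
HIERARCHIES = {
    "time": ["Month", "Quarter"],
    "product": ["Product", "Product Family"],
}


def candidate_aggregations(hierarchies):
    """Every combination of one level per dimension."""
    return list(product(*hierarchies.values()))


def can_feed(source, target, hierarchies):
    """True if target can be computed by rolling up source."""
    return all(
        levels.index(s) <= levels.index(t)
        for levels, s, t in zip(hierarchies.values(), source, target)
    )


def feed_counts(hierarchies):
    """For each aggregation, how many other aggregations it can feed."""
    aggs = candidate_aggregations(hierarchies)
    return {
        a: sum(1 for b in aggs if b != a and can_feed(a, b, hierarchies))
        for a in aggs
    }


counts = feed_counts(HIERARCHIES)
for agg, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(" x ".join(agg), "feeds", n)
```

A real selection would also weigh the estimated size of each aggregation against the speedup it buys, since the most detailed aggregation feeds everything but is also the largest.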

54 54 How Plato Handles Data Explosion Aggregation Wizard finds the 80-20 rule in the data –The 20 percent of all possible pre-aggregations that provide 80 percent of the performance gain –Analyzes level counts for each dimension and parent-child ratios for each level Independent of OLAP data model

55 55 OLE DB For OLAP OLE DB extensions to access MD data –Part of OLE DB 2.0 One new object: Dataset Enhancements to existing objects Heavily leverages OLE DB

56 56 OLE DB For OLAP Objects And Interfaces Command CoCreateInstance Enumerator Data source Session Schema Rowsets Flattened Rowset Range Rowset Dataset

57 57 English Query

58 58 OBJECT RELATIONAL The Next Great DBMS Wave All the DB vendors are adding objects Microsoft is adding DBs to Objects Integration with COM+ Gives user-defined types and objects Plug-ins will be Billion dollar industry –Blades for SQL Server razor

59 59 Outline: Status report on Commodity Server Performance; Why Most VLDBs will be Multi-Media Servers; Preview of Microsoft's SQL Server 7

