Data Management in a Highly Connected World James Hamilton Microsoft SQL Server March 3, 2000.


1 Data Management in a Highly Connected World James Hamilton JamesRH@microsoft.com Microsoft SQL Server March 3, 2000

2 Agenda
  Client Tier
    Number of devices
    Device interconnect fabric
    Standard programming infrastructure
    Client tier database issues
      Resource requirements
      Implementation language
      Administrative cost implications
      Development cost implications
  Middle Tier
  Server Tier
  Summary

3 How Many Clients?
  1998 US WWW users (IDC)
    US: 51M; worldwide: 131M
  2001 estimates:
    Worldwide: 319M users
    515M connected devices
  ½ billion connected clients
    Conservative estimate based upon conventional device counts

4 Other Device Types
  TVs, VCRs, stoves, thermostats, microwaves, CD players, computers, garage door openers, lights, sprinklers, appliances, driveway de-icers, security systems, refrigerators, health monitoring, etc.
  Sony evangelizing IEEE 1394 Interconnect
    http://www.sel.sony.com/semi/iee1394wp.html
  Microsoft & consortium evangelizing Universal Plug & Play
    www.upnp.org
  WAP: Wireless Application Protocol
    http://www.wap.net/

5 Device Interconnect Infrastructure
  Power line control
    X10: http://www.x10.org
    Sunbeam Thalia: http://www.thaliaproducts.com/

6 Why Connect These Devices?
  TV guide & auto VCR programming
  CD label info & song list download
  Sharing data & resources
  Set clocks (flashing 12:00)
  Fire and burglar alarms
  Persist thermometer settings
  Feedback & data sharing based systems:
    Temperature control & power blind interaction
    Occupancy directed heating and lighting

7 Device Communication Implications
  The need is there
  Infrastructure is going in:
    Wireless
    Power line communications
    Unused twisted pair (phone) bandwidth
  Connectable devices & infrastructure arriving & being deployed
  On order of billions of client devices

8 Device Interconnect Example


10 Device Interconnect Example

11 Device Interconnect Example
  Diagram labels: 660 Gallon Marine Aquarium; 130 Gallon F/W Aquarium; Filtration Plant; Home Sprinklers; Bedroom; Living Room; Den; Deck; X10 Backbone; Ethernet Backbone; Ethernet Hub; Windows NT Server; 56 kbps line

12 Improvements For Example
  Cooperation of lighting, A/C and power blind systems
  Alarms and remote notification for failures in:
    Circulation pump
    Heating & cooling
    Salinity & other water chemistry changes
    Filtration system
  Feedback directed systems

13 Palmtop Resource Trends
  Palmtops I've purchased through the years
  All about same cost & physical size
  Chart: palmtop RAM vs. Moore's Law; log axis 0.1 to 100; years 1990 to 2002
    Sharp IQ7000 (0.125M)
    Sharp IQ8300 (0.25M)
    HP 95LX (0.5M)
    HP 100LX (1M)
    HP 200LX (2M)
    Everex A20 (4M)
    Casio E105 (32M)

14 O/S Memory Requirements
  Windows memory requirements over time
  Chart: desktop RAM vs. Moore's Law; log axis 0.1 to 100; years 1985 to 1999
    Windows 1.0 (256K)
    Windows 2.0 (512K)
    Windows 3.0 (2M)
    WFW 3.1 (3M)
    Windows95 (4M)
    Windows98 (16M)
    Windows 2000 (64M)
    128M (unattached chart label)

15 Smartcard Resource Trends
  Source: Denis Roberson, PIN/Card-Tech/NCR
  Chart: Memory Size (Bits) by year, 1990 to 2004; axis labels 3 K, 10 K, 1 M, 300 M; marker "You are here"

16 Devices Smaller Than PDAs
  Qualcomm PDQ
    2 MB total memory
    Same mem curve as PDAs … just 2 to 5 years behind
  Nokia 9000il
    8 MB total memory

17 Digital Cameras
  Make       Model              Memory
  Agfa       CL30               60 to 360MB
  Canon      PowerShot S20      6 to 176MB
  Epson      PhotoPC 850Z       10 to 120MB
  Kodak      DC-280             32 to 245MB
  Olympus    D-340R             18 to 120MB
  Panasonic  Palmcam PV-SD4090  450 to 1,500MB
  Sanyo      VPC-SX500          19 to 120MB

18 Resource Trend Implications
  Device resources at constant cost are growing at super-Moore rates
    Same but 2 to 3 yrs behind desktop system growth
  Same is true of each class of devices
    Telephones trail PDAs but again grow at the same rate
  Memory growth is not the problem
    However devices always smaller than desktops
    Devices more specialized so resource consumption less … can still run standard vertical app slice

19 Standard Infrastructure at Client
  Clearly specialized user interface S/W needed
  But we have the memory resources to support:
    Standard communications stack (TCP/IP)
    Standard O/S software
    Standard data management S/W with query
    Transparent replication
  Symmetric multi-tiered infrastructure S/W:
    Leverage best development environments
    No need to rewrite millions of redundant lines of code
    More heavily used & tested so fewer bugs
    Better productivity in programming to richer platform
  A full DBMS at client both practical & useful

20 Client-Side Database Issues
  Honey I shrunk the database (SIGMOD99):
    DB Footprint
    Implementation Language
  Both issues either largely irrelevant or soon to be:
    Resource availability trends support standard infrastructure S/W
    Dominant costs: admin, operations & user training, and programming
    Vertical slice of standard apps rather than full custom infrastructure

21 DB Implementation Language
  Special DB implementation language (Java) argument:
    centers on auto-installation of S/W infrastructure
    Auto-install is absolutely vital, but independent of implementation language
    Auto-install not enough: client should be a cache of recently used S/W and data
  Full DBMS at client
    Client-side cache of recently accessed data
    Optimizer selected access path choice:
      driven by accuracy & currency requirements
      balanced against connectivity state & communications costs
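The access path choice described on this slide can be pictured as a small decision function. The following is only a sketch under stated assumptions: the names `Request` and `choose_path`, the staleness threshold, and the cost figures are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class Request:
    max_staleness_s: float   # currency requirement: oldest acceptable data age
    cache_age_s: float       # age of the client-side cached copy
    connected: bool          # current connectivity state
    round_trip_cost: float   # assumed cost of asking the server
    cache_cost: float = 1.0  # assumed cost of reading the local cache


def choose_path(req: Request) -> str:
    """Return 'cache', 'server', or 'unavailable' for one request."""
    cache_fresh_enough = req.cache_age_s <= req.max_staleness_s
    if not req.connected:
        # Offline: the cache is the only option, and only if it is current enough.
        return "cache" if cache_fresh_enough else "unavailable"
    if cache_fresh_enough and req.cache_cost <= req.round_trip_cost:
        # Currency requirement met and the local read is cheaper.
        return "cache"
    return "server"
```

For instance, `choose_path(Request(60, 10, True, 50))` picks the cache, while tightening `max_staleness_s` to 5 sends the same request to the server.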

22 Admin Costs Still Dominate
  '60s large system mentality still prevails:
    Optimizing precious machine resources is false economy
    Admin & education costs more important
  TCO education from the PC world repeated
  Each app requires admin and user training … much cheaper to roll out 1 infrastructure across multiple form factors
  Sony PlayStation has 3MB RAM & Flash
  Nokia 9000il phone has 8MB RAM
  Trending towards 64M palmtop in 2001
  Vertical app slice resource requirement can be met

23 Dev Costs Over Memory Costs
  Specialty RTOSs have weak dev environments
  Quality & quantity of apps driven by:
    Dev environment quality
    Availability of trained programmers
  Requirement for custom client development & configuration greatly reduces deployment speed
  Same apps have wide range of device form factors
  Symmetric client/server execution environment
    DB components and data treated uniformly
    Both replicated to client as needed

24 Client Side Summary
  On order of billions of connected client devices
    Most are non-conventional computing devices
  All devices include standard DB components
  Standard physical & logical device interconnect standards will emerge
  DB implementation language irrelevant
  Device DB resource consumption much less important than ease of:
    Installation
    Administration
    Programming
  Symmetric client/server execution environment

25 Agenda
  Client Tier
  Middle Tier
    High Availability via redundant data & metadata
    Fault Isolation domains
    XML
    Mid-tier Caching
  Server Tier
  Summary

26 High Availability is Tough
  Availability  Annual Lost Data Access  Number of Nines
  90%           ~5 weeks                 1
  99%           <4 days                  2
  99.9%         <9 hours                 3
  99.99%        ~1 hour                  4
  99.999%       ~5 min                   5
  99.9999%      ~30 sec                  6
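Downtime figures of this kind come from simple arithmetic: the unavailable fraction of a year. The sketch below assumes a 365-day year; the function name `annual_downtime_seconds` is illustrative only.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a non-leap year


def annual_downtime_seconds(nines: int) -> float:
    """Expected seconds of lost access per year at the given number of nines."""
    unavailability = 10 ** -nines  # e.g. 3 nines -> 0.001
    return SECONDS_PER_YEAR * unavailability


for n in range(1, 7):
    seconds = annual_downtime_seconds(n)
    print(f"{n} nines: {seconds / 86400:.3f} days = {seconds / 60:.1f} minutes per year")
```

Each extra nine cuts the allowed downtime by a factor of ten, which is why the last rows leave room for seconds rather than hours.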

27 Server Availability: Heisenbugs
  Industry good at finding functional errors
  Multi-user & application interactions hard:
    Sequences of statistically unlikely events
    Heisenbugs (http://research.microsoft.com/~gray/talks)
    Testing for these is exponentially expensive
  Server stack is nearing 100 MLOC
  Long testing and beta cycles delay software release
  System size & complexity growth inevitable:
    Re-try operation (Microsoft Exchange)
    Re-run operation against redundant data copy (Tandem)
    Fail fast design approach is robust but only acceptable with redundant access to redundant copies of data

28 The Inktomi Lesson
  Inktomi web search engine (Brewer, SIGMOD98)
  Quickly evolving software:
    Memory leaks, race conditions, etc. considered normal
    Don't attempt to test & beta until quality high
  System availability of paramount importance
    Individual node availability unimportant
  Shared nothing cluster
  Exploit ability to fail individual nodes:
    Automatic reboots avoid memory leaks
    Automatic restart of failed nodes
    Fail fast: fail & restart when redundant checks fail
    Replace failed hardware weekly (mostly disks)
  Dark machine room
    No panic midnight calls to admins
    Mask failures rather than futile attempt to avoid

29 Apply to High Value TP Data?
  Inktomi model:
    Scales to 100s of nodes
    S/W evolves quickly
    Low testing costs and no-beta requirement
    Exploits ability to lose individual node without impacting system availability
    Ability to temporarily lose some data w/o significantly impacting query quality
  Can't lose data availability in most TP systems
  Redundant data allows node loss w/o data availability loss
  Inktomi model with redundant data & metadata a potential solution

30 Redundant Data & Metadata
  TP point access to data nearly solved problem
    TP systems scale with user number, people on planet, or business size
    All trending at sub-Moore rates
  Data analysis systems growing far faster than Moore's Law:
    Greg's law: 2x every 9 to 12 months (Patterson, SIGMOD98)
    Seriously super-Moore implying that no single system can scale sufficiently: clusters are the only solution
  Storage trending to free with access speed limiting factor
  Detailed data distribution statistics need to be maintained
  Improve access speed & availability using redundant data (indexes, materialized views, etc.)
  Async update for stats, indexes, mat views
  Data path choice based upon needed currency & accuracy
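To make the "super-Moore" claim concrete, the sketch below compares compound growth under different doubling periods. It reads "2x every 9 to 12" as months and uses an 18-month doubling period for Moore's Law; both are assumptions, as the slide states neither. The function name `growth_factor` is hypothetical.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Multiplicative growth after `years` given a fixed doubling period."""
    return 2 ** (years * 12 / doubling_months)


YEARS = 3
moore = growth_factor(YEARS, 18)      # assumed Moore's Law doubling period
greg_slow = growth_factor(YEARS, 12)  # slow end of the 9-to-12 range
greg_fast = growth_factor(YEARS, 9)   # fast end of the 9-to-12 range
print(f"After {YEARS} years: Moore x{moore:.0f}, data x{greg_slow:.0f} to x{greg_fast:.0f}")
```

Under these assumptions the gap between data volume and single-system capacity itself doubles to quadruples within three years, which is the argument for clusters.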

31 Affordable Availability
  Web-enabled direct access model driving high availability requirements:
    recent high profile failures at eTrade and Charles Schwab
  Web model enabling competition in information access
    Drives much faster server side software innovation which negatively impacts quality
  Dark machine room approach requires auto-admin and data redundancy
    Inktomi model (Eric Brewer, SIGMOD98)
  42% of system failures admin error (Gray)
    Paging admin at 2am to fix problem is dangerous

32 Connection Model/Architecture
  Diagram labels: Client, Server Node, Server Cloud
  Redundant data & metadata
  Shared nothing
  Single system image
  Symmetric server nodes
  Any client connects to any server
  All nodes SAN-connected

33 Compilation & Execution Model
  Diagram labels: Client, Server Cloud, Server Thread
  Server thread stages: Lex analyze, Parse, Normalize, Optimize, Code generate, Query execute
  Query execution on many subthreads synchronized by root thread

34 Node Loss/Rejoin
  Diagram labels: Client, Server Cloud
  Lose node:
    Recompile
    Re-execute
  Execution in progress
  Rejoin:
    Node local recovery
    Rejoin cluster
    Recover global data at rejoining node
    Rejoin cluster
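The "lose node: recompile, re-execute" behavior can be sketched as a retry loop over redundant copies. This is a minimal illustration, not the design presented in the talk: `NodeDown`, `compile_plan`, `run_query`, and the `read_partition` callback are all hypothetical names, and the "query" is reduced to summing one value per partition.

```python
class NodeDown(Exception):
    """Raised by read_partition with the failed node's name as its argument."""


def compile_plan(partitions, replicas, live_nodes):
    """Map each partition to one live node that holds a copy of it."""
    plan = {}
    for partition in partitions:
        candidates = [n for n in replicas[partition] if n in live_nodes]
        if not candidates:
            raise RuntimeError(f"no live copy of partition {partition}")
        plan[partition] = candidates[0]
    return plan


def run_query(partitions, replicas, live_nodes, read_partition):
    """Execute the query; on node loss, drop the node, recompile, re-execute."""
    live = set(live_nodes)
    while True:
        plan = compile_plan(partitions, replicas, live)
        try:
            return sum(read_partition(node, p) for p, node in plan.items())
        except NodeDown as failure:
            live.discard(failure.args[0])  # stop using the failed node
```

The loop terminates because every failure removes a node; once a partition has no live copy left, `compile_plan` raises instead of returning a partial answer.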

35 Redundant Data Update Model
  Diagram labels: Client, Server Cloud
  Updates are standard parallel query plans
  Optimizer manages redundant access paths
  Query plan responsible for access plan management:
    No significant new technology
    Similar to materialized view & index updates today

36 Fault Isolation Domains
  Trade single-node perf for redundant data checks:
    Complex error recovery more likely to be wrong than original forward processing code
    Many redundant checks are compiled out of retail versions when shipped
  Fail fast rather than attempting to repair:
    Bring down node for mem-based data structure faults
    Don't patch inconsistent data … copies keep system available
  If anything goes wrong fire the node and continue:
    Attempt node restart
    Auto-reinstall O/S, DB and recreate DB partition
    Mark node dead for later replacement
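The three recovery steps listed at the end of this slide form an escalation ladder. A minimal sketch, assuming hypothetical `restart` and `reinstall` callables that return `True` on success (neither is an API from the talk):

```python
def recover_node(node, restart, reinstall):
    """Escalate through the recovery steps; never attempt in-place repair."""
    if restart(node):
        return "restarted"    # step 1: attempt node restart
    if reinstall(node):
        return "reinstalled"  # step 2: reinstall O/S and DB, recreate partition
    return "dead"             # step 3: mark for later hardware replacement
```

The point of the design is that every step replaces state wholesale from a known-good source, so no error-recovery code ever has to reason about a partially corrupted node.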

37 Data Structure Matters
  Most internet content is unstructured text
    restricted to simple Boolean search techniques
  Docs have structure, just not explicit
  Yahoo hand categorizes content
    indexing limited & human involvement doesn't scale well
  XML is a good mix of simplicity, flexibility, & potential richness
    Structure description language of internet
    DBMSs need to support as first class datatype
  Too few librarians in world
    so all information must be self-describing

38 Relational to XML
  SELECT … FOR XML
    FOR XML RAW (return an XML rowset)
    FOR XML AUTO (exploit RI, name matching, etc.)
    FOR XML EXPLICIT (maximal control)
  Annotated Schema
    Mapping between XML and relational schema expressed in XML
  Templates
    Encapsulated parameterized query
  XSL/T support
  XPATH support
  Direct URL access (SQL owned virtual root)
    SELECT … FOR XML
    Annotated schema
    Templates

39 XML to Relational
  XML bulk load
    Templates and Annotated Schema
  SQL Server hosted XML tree
    Directly insert document into SQL Server hosted XML tree
    Select from server hosted XML tree rowset & insert into SQL tables
  XML Data type support
    Hierarchical full text search

40 XML: Example
  http://SRV1/nwind?sql=SELECT+DISTINCT+ContactTitle+FROM+Customers+WHERE+ContactTitle+LIKE+'Sa%25'+ORDER+BY+ContactTitle+FOR+XML+AUTO
  Result set:
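The example URL is an ordinary SQL statement with spaces encoded as `+` and the `%` wildcard encoded as `%25`. The sketch below rebuilds it with Python's standard library (reading the slide's URL with its line-wrap gaps removed); the helper name `for_xml_url` is hypothetical, and `SRV1` and `nwind` are the server and virtual root from the slide.

```python
from urllib.parse import quote_plus


def for_xml_url(server: str, vroot: str, sql: str) -> str:
    """Build a direct-URL query string for a SQL statement."""
    # safe="'" keeps the single quotes literal, as they appear on the slide.
    return f"http://{server}/{vroot}?sql={quote_plus(sql, safe=chr(39))}"


sql = (
    "SELECT DISTINCT ContactTitle FROM Customers "
    "WHERE ContactTitle LIKE 'Sa%' ORDER BY ContactTitle FOR XML AUTO"
)
print(for_xml_url("SRV1", "nwind", sql))
```

Encoding the statement programmatically avoids the easy mistake of leaving a bare `%` in the URL, which a server would try to interpret as the start of an escape sequence.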

41 Mid-Tier Cache Requirements
  Non-proprietary multi-lingual programming
  Symmetric mid-tier & server programming model
  Non-connected, stateless programming model
  High scale thread pool based
  Efficient main memory DB support
  Full query over local cache
    Query over just cached data, or
    Query over full corpus (server interaction required)
  Ability to handle network partitions & server failure
  Support for life-time attributed data:
    Transactional (possibly multi-server)
    Near real time
    Every N time units
    Read only
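One way to read "life-time attributed data" is that each cached item carries a freshness class that decides when the cache must go back to the server. The sketch below illustrates that reading only; `Lifetime`, `MidTierCache`, the one-second near-real-time window, and the `fetch` callback are all assumptions, and a real transactional class would need server-side validation rather than a plain refetch.

```python
import time
from enum import Enum


class Lifetime(Enum):
    TRANSACTIONAL = "transactional"    # always go to the server
    NEAR_REAL_TIME = "near_real_time"  # tolerate a very short staleness window
    PERIODIC = "periodic"              # refresh every N time units
    READ_ONLY = "read_only"            # never expires


class MidTierCache:
    def __init__(self, fetch, near_real_time_s=1.0, clock=time.monotonic):
        self._fetch = fetch            # hypothetical server call: key -> value
        self._nrt = near_real_time_s
        self._clock = clock
        self._entries = {}             # key -> (value, loaded_at)

    def get(self, key, lifetime, period_s=None):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and self._is_fresh(now - entry[1], lifetime, period_s):
            return entry[0]
        value = self._fetch(key)
        self._entries[key] = (value, now)
        return value

    def _is_fresh(self, age, lifetime, period_s):
        if lifetime is Lifetime.READ_ONLY:
            return True
        if lifetime is Lifetime.TRANSACTIONAL:
            return False
        if lifetime is Lifetime.NEAR_REAL_TIME:
            return age <= self._nrt
        if period_s is None:
            raise ValueError("PERIODIC lifetime needs period_s")
        return age <= period_s
```

Passing the clock in as a parameter keeps the expiry logic testable without sleeping.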

42 Agenda
  Client Tier
  Middle Tier
  Server Tier
    Affordable computing by the slice
    Everything online
    Disks are actually getting slower
    Processing moves to storage
    Approximate answers quickly
    Semi-structured storage support
    Administrative issues
  Summary

43 Server-Side Changes
  Server databases more functionally rich than often required
  Trend reversal:
    Less at the server-tier with richer mid-tier
  Focus at back-end shifts to:
    Reliability, Availability, and Scalability
    Reducing administrative costs
  Server side trends:
    Scalability over single-node performance
    Everything online
    Affordable availability in high scale systems

44 Compaq/Microsoft TPC-C Benchmark
  System                 O/S           Database          tpmC     Total price  $/tpmC
  ProLiant 8500 Cluster  Windows 2000  SQL Server 2000   227,079  $4,341,603   $19.12
  ProLiant 8500 Cluster  Windows 2000  SQL Server 2000   152,207  $2,880,431   $18.93
  IBM RS/6000 S80        AIX 4.3.3     Oracle v8.1.6     135,815  $7,156,910   $52.70
  Escala EPC2400         AIX 4.3.3     Oracle v8.1.6     135,815  $7,462,215   $54.94
  Enterprise 6500        Solaris 2.6   Oracle 8i v8.1.6  135,461  $13,153,324  $97.10
  These are the top 5 TPC-C results reported as of February 17, 2000.
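The price/performance figure on this slide is total system price divided by throughput (tpmC). The sketch below recomputes it for four of the results; the pairing of each tpmC value with a system is inferred from the arithmetic, since the transcript lists the throughput numbers separately from the system names. The fifth result (152,207 tpmC at $2,880,431) computes to about $18.92 from the figures as transcribed, a cent below the slide's $18.93.

```python
# (system, total price in USD, tpmC); tpmC pairing is inferred, see lead-in.
results = [
    ("ProLiant 8500 Cluster", 4_341_603, 227_079),
    ("IBM RS/6000 S80", 7_156_910, 135_815),
    ("Escala EPC2400", 7_462_215, 135_815),
    ("Enterprise 6500", 13_153_324, 135_461),
]


def price_per_tpmc(price_usd: int, tpmc: int) -> float:
    """Dollars per transaction-per-minute of TPC-C throughput."""
    return price_usd / tpmc


for system, price, tpmc in results:
    print(f"{system}: ${price_per_tpmc(price, tpmc):.2f}/tpmC")
```

Sorting by this ratio rather than by raw tpmC is what motivates the "computing by the slice" argument: the cluster of commodity nodes wins on both axes here.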

45 Computing by the Slice
  Source: TPC report executive summary

46 Just Save Everything
  Able to store all info produced on earth (Lesk):
    Paper sources: less than 160 TB
    Cinema: less than 166 TB
    Images: 520,000 TB
    Broadcasting: 80,000 TB
    Sound: 60 TB
    Telephony: 4,000,000 TB
  These data yield 5,000 petabytes
    Others estimate upwards of 12,000 petabytes
  Worldwide 1998 storage production: 13,000 petabytes
  No need to manage deletion of old data
  Most data never accessed by a human
    Access aggregations & analysis, not point fetch
  More storage than data allows for greater redundancy:
    indexes, materialized views, statistics, & other metadata
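The per-source estimates can be totaled to see how they relate to the rounded "5,000 petabytes" figure. This is a quick check, assuming 1 PB = 1,000 TB and treating the "less than" values for paper and cinema as upper bounds.

```python
# Estimates from the slide, in terabytes (paper and cinema are upper bounds).
sources_tb = {
    "Paper": 160,
    "Cinema": 166,
    "Images": 520_000,
    "Broadcasting": 80_000,
    "Sound": 60,
    "Telephony": 4_000_000,
}

total_tb = sum(sources_tb.values())
total_pb = total_tb / 1000  # assumes decimal petabytes
telephony_share = sources_tb["Telephony"] / total_tb

print(f"Total: {total_tb:,} TB (about {total_pb:,.0f} PB)")
print(f"Telephony share: {telephony_share:.0%}")
```

The sum is dominated by telephony, and it sits well below the 13,000 PB of storage produced in 1998, which is the basis for the "more storage than data" point.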

47 Disks are Becoming Black Holes
  Seagate Cheetah 73
    Fast: 10k RPM, 5.6 ms access, 16 MB cache
    But very large: 73.4 GB
  Result? Black hole: 2.4 accesses/sec/GB
    Large data caches required
    Employ redundant access paths
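The "black hole" number follows directly from the drive's access time and capacity: random accesses per second, divided by gigabytes stored. A minimal sketch (the function name is illustrative, and it assumes one access completes before the next begins):

```python
def accesses_per_sec_per_gb(access_ms: float, capacity_gb: float) -> float:
    """Random accesses per second available for each gigabyte stored."""
    accesses_per_sec = 1000.0 / access_ms  # one access at a time
    return accesses_per_sec / capacity_gb


density = accesses_per_sec_per_gb(access_ms=5.6, capacity_gb=73.4)
print(f"{density:.1f} accesses/sec/GB")
```

Because capacity grows much faster than access time shrinks, this ratio falls with every drive generation, which is why caches and redundant access paths matter more over time.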

48 Processing Moves Towards Storage
  Trends:
    I/O bus bandwidth is bottleneck
    Switched serial nets support very high bandwidth
    Processor/memory interface is bottleneck
    Growing CPU/DRAM perf gap leading to most CPU cycles in stalls
  Combine CPU, serial network, memory, & disk in single package
    E.g. David Patterson ISTORE project

49 Processing Moves Towards Storage
  Each disk forms part of multi-thousand node cluster
  Redundant data masks failure
    RAID-like approach
  Each cyberbrick commodity H/W and S/W
    O/S, database, and other server software
  Each slice plugged in & personality set
    (e.g. database or SAP app server)
    No other configuration required
  On failure of S/W or H/W, redundant nodes pick up workload
    Replace failed components at leisure
    Predictive failure models

50 Approximate Answers Quickly
  DB systems focus on absolute correctness
  As size grows, correct answer increasingly expensive
  Text search systems depend upon quick approx answer
  Approx answer with statistical confidence bound:
    Steadily improve result until user satisfied
    Ripple Joins for Online Aggregation (Hellerstein, SIGMOD99)
  Allows rapid exploration of large search spaces:
    Conventional full accuracy only when needed
    Run query on incomplete mid-tier cache?
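The idea of an answer that steadily improves, with a confidence bound attached, can be shown with a running average. This sketch is not the ripple join algorithm cited on the slide; it is a much simpler single-table estimator using Welford's running variance and a normal-approximation interval, and it assumes rows arrive in random order. The name `online_mean` and its parameters are hypothetical.

```python
import math


def online_mean(values, z=1.96, report_every=1000):
    """Yield (rows_seen, running_mean, half_width) as more rows are scanned."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # Welford's update of the sum of squares
        if n % report_every == 0 and n > 1:
            std_err = math.sqrt(m2 / (n - 1) / n)
            yield n, mean, z * std_err  # roughly a 95% interval when z=1.96
```

A caller can iterate over the generator and stop as soon as the reported half-width is tight enough, which is the "until user satisfied" behavior in miniature.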

51 Semi-Structured Storage Support
  Example applications:
    Directory systems (e.g. Microsoft Active Directory)
    Document management systems
  Storage characteristics:
    Flexible & sparse schema support
    Fine grained security
    Recursive query
    Notification based extensibility common
  XML support important
  Particularly difficult to support when native SQL access is also allowed
  Important area for RDBMS expansion

52 Examples: Performance W/O Admin
  Multiple cached plans for different parameter marker sub-domains
  Async statistics gathering
  Async optimization
  Feedback-directed techniques:
    Adapting number of histogram buckets
    Re-optimizing when cardinality errors discovered during execution
    Re-optimize with additional data distribution info gained during previous execution
  Optimizer-created indexing structures:
    Add indexes when needed (Exchange & AS/400)
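The first item, multiple cached plans per parameter sub-domain, can be illustrated with a cache keyed by the range a parameter value falls into. This is a sketch for a single numeric parameter; `PlanCache`, the boundary list, and the `optimize` callback are hypothetical, and a real optimizer would derive the sub-domains from its statistics.

```python
import bisect


class PlanCache:
    """Cache one plan per sub-domain of a single numeric parameter marker."""

    def __init__(self, boundaries, optimize):
        self._boundaries = sorted(boundaries)  # split points between sub-domains
        self._optimize = optimize              # hypothetical: value -> plan
        self._plans = {}                       # sub-domain index -> cached plan

    def plan_for(self, value):
        bucket = bisect.bisect_right(self._boundaries, value)
        if bucket not in self._plans:
            # Optimize once per sub-domain, then reuse for later values.
            self._plans[bucket] = self._optimize(value)
        return self._plans[bucket]
```

With a single boundary at 100, a selective value such as 5 and an unselective value such as 5000 each get their own plan, while 5 and 50 share one without a second optimizer call.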

53 Summary
  After 30 years, DB technology more relevant than ever:
    Database innovations required at all tiers
  All devices run standard DB components
  Symmetric multi-tier programming model
  Hierarchical caching model
  Administration including installation disappears
  All info online & machine accessible
  Symmetric programming model on all tiers
  Redundant data for availability & performance
  Increased dependence on approximate answers
  Support for semi-structured apps
  Mid-tier & Client: data moves to the processors
  Server-Tier: processing moves to data

54 Data Management in a Highly Connected World James Hamilton JamesRH@microsoft.com Microsoft SQL Server March 3, 2000

