1 Where Is Database Research Headed? Jeffrey D. Ullman DASFAA March 26, 2003.

1 Where Is Database Research Headed? Jeffrey D. Ullman DASFAA March 26, 2003

2 Outline 1.Core values for database research. u“Biggest” data. uQuery optimization. 2.Some interesting directions.

3 New Directions 1.Information integration. 2.Stream processing. 3.Semistructured data and XML. 4.Peer-to-peer and grid databases. 5.Data mining.

4 Core Database Values uObvious: we deal with the largest amount of data possible. uLess obvious: very-high-level languages are core. wBig data must be dealt with in uniform ways. uLeast obvious: query optimization essential for success. wCompare APL (failure) with SQL (success).

6 Information Integration uRelated sources of data need to be viewed as one whole. uApplications: catalogs (seeing products from many suppliers), digital libraries, scientific databases, enterprise-wide information resources, etc., etc.

7 Local and Global Schemas uSources each have their own local schema = ways their data is stored, organized, and represented. uIntegration requires a global schema and mechanisms to translate between the global schema and each local schema.

8 Two Approaches 1.Warehousing : Collect data from sources into a “warehouse” periodically. Do queries at the warehouse, while the sources execute transactions invisibly. 2.Mediation : Virtual warehouse processes queries by translating between common schema and local schemas at sources.

9 Warehouse Diagram Warehouse Wrapper Source 1Source 2

10 A Mediator Mediator Wrapper Source 1Source 2 User query Query Result

11 Two Mediation Approaches 1.Query-centric : Mediator processes queries into steps executed at sources. wEnosys sells first example as BEA’s “liquid data.” 2.View-centric : Sources are defined in terms of global relations; mediator finds all ways to build query from views.

12 Very Simple Example uSuppose Dell wants to buy a bus and a disk that share the same protocol. uGlobal schema: Buses(manf,model,protocol) and Disks(manf,model,protocol). uLocal schemas: each bus or disk manufacturer has a (model,protocol) relation --- manf is implied.

13 Example: Query-Centric uMediator might start by querying each bus manufacturer for model-protocol pairs. wThe wrapper would turn them into triples by adding the manf component. uThen, for each protocol returned, mediator would query disk manufacturers for disks with that protocol. wAgain, wrapper adds manf component.

14 Example: View-Centric uSources’ capabilities are defined in terms of the global predicates. wE.g., Hitachi’s disk database could be defined by HitachiView(M,P) = Disks(’Hitachi’,M,P). uMediator discovers all combinations of a bus and disk “view,” joined for equal protocols. w“Answering queries using views” --- the theory for finding all solutions to a query.

15 Research Issues uOptimization, optimization, optimization. wIn query centric systems: how do we choose a plan? E.g., is it better to ask about buses first, or disks? wIn view-centric systems, how do we select a sufficient set of solutions to get most or all of the possible answers?

17 Stream Management Systems uAdds to the relation a stream datatype = infinite sequence of tuples that arrive at a port one-at-a-time. uApplications: Telecom billing, intrusion detection, monitoring Web hits, sensor networks, etc., etc.

18 Stream-DBMS Architecture Conventional relations Scratch storage Query processor Stream inputs Stream outputs Ad-hoc queries Standing queries

19 Stanford Approach (Widom, Motwani) uCentral idea is the window, a relation that is formed from a stream by some rule. wExamples: “last 10 tuples,” “all tuples in the past 24 hours.” uQuery language is SQL-like, with diction for converting a stream to a window to a relation.

20 Example: SELECT … FROM Stream1 [last 10] as Window1,… WHERE Window1.a = 5 AND …

21 Research Challenges uAgain, optimization is central. wNew language constructs and data types make old ideas less useful. uSemantics is not 100% clear. wExample: when you join two windows created with different time limits, what does the result represent in terms of the original streams? It matters if you want to apply algebraic laws to expressions.

22 MIT-Centered Approach (Stonebraker, others) uDefine useful operations on streams. uQuery language is sequence of operations. uOptimization is still the key issue, here in an algebraic setting.

23 Peer-to-Peer and Grids uPeer-to-peer systems are application- level attempts to share information and/or processes. uGrid computing is an attempt to bring P2P support to the operating-system level.

24 P2P Applications 1.File sharing as in Napster, Kazaa, etc. 2.Specific scientific applications: Seti@home, Folding@home. 3.Distributed databases, e.g., digital libraries. 4.Replication within an intranet for high availability.

25 Additional Grid Goals 1.Scientific applications routinely solved using a network of workstations. 2.Reselling of unused cycles. 3.Global resources, e.g., buy your storage over the Internet rather than manage your own local disks.

26 Peer-to-Peer Databases uData is distributed among independent sources. uSimilar to information-integration, but much looser constraints on cooperation. uApplications: sharing of library resources, protection from failures by replication, etc., etc.

27 P2P DBMS Architecture Me Peers My data My clients My requests Requests from others

28 P2P Research Issues 1.Protocols and strategies for trading storage. How do I accept bids for someone to make a copy of my data? Will they keep it forever? Storage auction strategies? 2.Query and search strategies. How far to search? How to manage competing requests?

29 P2P Research Status uEarly successful P2P systems: SETI@home, folding@home. uNapster and others made a lot of progress in architectures, but gave the field a “bad name.” uAnalysis of data-retrieval optimization just beginning.

30 Semistructured Data uThis data model uses trees or graphs instead of relations. uKey application: information integration, where global data is perceived as “flexible objects,” with a variety of fields and structures. uEvolved into XML, XSL, XPATH, XQUERY, etc.

31 Example: Semistructured Data Graph Bud A.B. Gold1995 Maple St.Joe’s M’lob beer bar manf servedAt name addr prize yearaward root The bar object for Joe’s Bar The beer object for Bud Notice unusual data

32 XML and Semistructured Data uXML (Extensible Markup Language) uses a semistructured data model to represent documents. Example: Joe’s Maple St. … …

33 XML Applications uSharing data in standard format. uUsed in Enterprise Information Integration systems as global schema. uStoring data with no fixed schema.

34 Querying XML Data uXQUERY is new standard for querying XML documents. wVery-high-level, similar to SQL. uResearch just beginning on how to optimize queries about XML documents. wTechniques not similar to those applied successfully in SQL systems.

1 Where Is Database Research Headed? Jeffrey D. Ullman DASFAA March 26, 2003.

Similar presentations

Presentation on theme: "1 Where Is Database Research Headed? Jeffrey D. Ullman DASFAA March 26, 2003."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

1 Where Is Database Research Headed? Jeffrey D. Ullman DASFAA March 26, 2003.

Similar presentations

Presentation on theme: "1 Where Is Database Research Headed? Jeffrey D. Ullman DASFAA March 26, 2003."— Presentation transcript:

Similar presentations

About project

Feedback