1 Where Is Database Research Headed? Jeffrey D. Ullman DASFAA March 26, 2003.

Slides:



Advertisements
Similar presentations
Distributed Data Processing
Advertisements

XML DOCUMENTS AND DATABASES
Evaluation of Relational Operators CS634 Lecture 11, Mar Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Mining Data Streams.
Advanced Database Systems September 2013 Dr. Fatemeh Ahmadi-Abkenari 1.
Peer-to-peer archival data trading Brian Cooper Joint work with Hector Garcia-Molina (and others) Stanford University.
Databases. Database Information is not useful if not organized In database, data are organized in a way that people find meaningful and useful. Database.
Zero-programming Sensor Network Deployment 學生:張中禹 指導教授:溫志煜老師 日期: 5/7.
Copyright 2002 Prentice-Hall, Inc. Modern Systems Analysis and Design Third Edition Jeffrey A. Hoffer Joey F. George Joseph S. Valacich Chapter 16 Designing.
1 Information Integration Mediators Semistructured Data Answering Queries Using Views.
Credit: Slides are an adaptation of slides from Jeffrey D. Ullman 1.
1 Lecture 13: Database Heterogeneity Debriefing Project Phase 2.
Winter 2002Arthur Keller – CS 18018–1 Schedule Today: Mar. 12 (T) u Semistructured Data, XML, XQuery. u Read Sections Assignment 8 due. Mar. 14.
Ch 12 Distributed Systems Architectures
Chapter 14 The Second Component: The Database.
Databases and Database Management System. 2 Goals comprehensive introduction to –the design of databases –database transaction processing –the use of.
1 XML Semistructured Data Extensible Markup Language Document Type Definitions.
Automatic Data Ramon Lawrence University of Manitoba
Peer-to-peer archival data trading Brian Cooper and Hector Garcia-Molina Stanford University.
1 Information Integration Mediators Warehousing Answering Queries Using Views.
1 XML Semistructured Data Extensible Markup Language Document Type Definitions.
Client-Server Processing and Distributed Databases
Database Systems Chapter 1 The Worlds of Database Systems.
4/20/2017.
Cloud and Big Data Summer School, Stockholm, Aug Jeffrey D. Ullman.
Chapter 11 Databases.
Databases From A to Boyce Codd. What is a database? It depends on your point of view. For Manovich, a database is a means of structuring information in.
Cli/Serv.: JXTA/151 Client/Server Distributed Systems v Objective –explain JXTA, a support environment for P2P services and applications ,
1 Telematica di Base Applicazioni P2P. 2 The Peer-to-Peer System Architecture  peer-to-peer is a network architecture where computer resources and services.
1 Distributed Monitoring of Peer-to-Peer Systems By Serge Abiteboul, Bogdan Marinoiu Docflow meeting, Bordeaux.
 DATABASE DATABASE  DATABASE ENVIRONMENT DATABASE ENVIRONMENT  WHY STUDY DATABASE WHY STUDY DATABASE  DBMS & ITS FUNCTIONS DBMS & ITS FUNCTIONS 
CSCE 520- Relational Data Model Lecture 2. Relational Data Model The following slides are reused by the permission of the author, J. Ullman, from the.
CPS 216: Advanced Database Systems Shivnath Babu Fall 2006.
Chapter 7: Database Systems Succeeding with Technology: Second Edition.
Physical Database Design Chapter 6. Physical Design and implementation 1.Translate global logical data model for target DBMS  1.1Design base relations.
DBSQL 14-1 Copyright © Genetic Computer School 2009 Chapter 14 Microsoft SQL Server.
XML & Mediators Thitima Sirikangwalkul Wai Sum Mong April 10, 2003.
Jeopardy K201 – The Computer In Business Exam Review 2 Manjit Trehan.
Winter 2006Keller, Ullman, Cushing18–1 Plan 1.Information integration: important new application that motivates what follows. 2.Semistructured data: a.
Constraints on Relations Foreign Keys Local and Global Constraints Triggers Following lecture slides are modified from Jeff Ullman’s slides
1 CS1368 Introduction* Relational Model, Schemas, SQL Semistructured Model, XML * The slides in this lecture are adapted from slides used in Standford's.
5 - 1 Copyright © 2006, The McGraw-Hill Companies, Inc. All rights reserved.
4 - 1 Copyright © 2006, The McGraw-Hill Companies, Inc. All rights reserved. Computer Software Chapter 4.
1 Introduction Relational Model, Schemas, SQL Semistructured Model, XML The slides were made by Jeffrey D. Ullman for the Introduction to Databases course.
1 DocFlow - kick off Monitoring 1 Distributed Monitoring in P2P Systems Serge Abiteboul, Bogdan Marinoiu INRIA-Futurs and Univ. Paris 11.
1 Information Integration Mediators Warehousing Answering Queries Using Views Slides are modified from Dr. Ullman’s notes.
Jeff Ullman: Introduction to XML 1 XML Semistructured Data Extensible Markup Language Document Type Definitions.
Semistructured Data Extensible Markup Language Document Type Definitions Zaki Malik November 04, 2008.
Management Information Systems, 4 th Edition 1 Chapter 8 Data and Knowledge Management.
Copyright 2004 John Wiley & Sons, Inc Information Technology: Strategic Decision Making For Managers Henry C. Lucas Jr. John Wiley & Sons, Inc Dinesh.
Data Integration Hanna Zhong Department of Computer Science University of Illinois, Urbana-Champaign 11/12/2009.
CSCE 520- Relational Data Model Lecture 2. Oracle login Login from the linux lab or ssh to one of the linux servers using your cse username and password.
INRIA - Progress report DBGlobe meeting - Athens November 29 th, 2002.
Fall CSE330/CIS550: Introduction to Database Management Systems Prof. Susan Davidson Office: 278 Moore Office hours: TTh
1 Introduction to Database Systems, CS420 SQL Views and Indexes.
Chapter 1 Overview of Databases and Transaction Processing.
Christoph F. Eick: Final Words COSC Topics Covered in COSC 3480  Data models (ER, Relational, XML)  Using data models; learning how to store real.
Where Is Database Research Headed?
Foreign Keys Local and Global Constraints Triggers
Relational Algebra Chapter 4, Part A
The Top 10 Reasons Why Federated Can’t Succeed
Chapter 15 QUERY EXECUTION.
CPSC-310 Database Systems
1.1 The Evolution of Database Systems
Relational Algebra Chapter 4, Sections 4.1 – 4.2
MANAGING DATA RESOURCES
2/18/2019.
Database Architecture
Information Integration
Presentation transcript:

1 Where Is Database Research Headed? Jeffrey D. Ullman DASFAA March 26, 2003

2 Outline 1.Core values for database research. u“Biggest” data. uQuery optimization. 2.Some interesting directions.

3 New Directions 1.Information integration. 2.Stream processing. 3.Semistructured data and XML. 4.Peer-to-peer and grid databases. 5.Data mining.

4 Core Database Values uObvious: we deal with the largest amount of data possible. uLess obvious: very-high-level languages are core. wBig data must be dealt with in uniform ways. uLeast obvious: query optimization essential for success. wCompare APL (failure) with SQL (success).

5 New Directions 1.Information integration. 2.Stream processing. 3.Semistructured data and XML. 4.Peer-to-peer and grid databases. 5.Data mining.

6 Information Integration uRelated sources of data need to be viewed as one whole. uApplications: catalogs (seeing products from many suppliers), digital libraries, scientific databases, enterprise-wide information resources, etc., etc.

7 Local and Global Schemas uSources each have their own local schema = ways their data is stored, organized, and represented. uIntegration requires a global schema and mechanisms to translate between the global schema and each local schema.

8 Two Approaches 1.Warehousing : Collect data from sources into a “warehouse” periodically. Do queries at the warehouse, while the sources execute transactions invisibly. 2.Mediation : Virtual warehouse processes queries by translating between common schema and local schemas at sources.

9 Warehouse Diagram Warehouse Wrapper Source 1Source 2

10 A Mediator Mediator Wrapper Source 1Source 2 User query Query Result

11 Two Mediation Approaches 1.Query-centric : Mediator processes queries into steps executed at sources. wEnosys sells first example as BEA’s “liquid data.” 2.View-centric : Sources are defined in terms of global relations; mediator finds all ways to build query from views.

12 Very Simple Example uSuppose Dell wants to buy a bus and a disk that share the same protocol. uGlobal schema: Buses(manf,model,protocol) and Disks(manf,model,protocol). uLocal schemas: each bus or disk manufacturer has a (model,protocol) relation --- manf is implied.

13 Example: Query-Centric uMediator might start by querying each bus manufacturer for model-protocol pairs. wThe wrapper would turn them into triples by adding the manf component. uThen, for each protocol returned, mediator would query disk manufacturers for disks with that protocol. wAgain, wrapper adds manf component.

14 Example: View-Centric uSources’ capabilities are defined in terms of the global predicates. wE.g., Hitachi’s disk database could be defined by HitachiView(M,P) = Disks(’Hitachi’,M,P). uMediator discovers all combinations of a bus and disk “view,” joined for equal protocols. w“Answering queries using views” --- the theory for finding all solutions to a query.

15 Research Issues uOptimization, optimization, optimization. wIn query centric systems: how do we choose a plan? E.g., is it better to ask about buses first, or disks? wIn view-centric systems, how do we select a sufficient set of solutions to get most or all of the possible answers?

16 New Directions 1.Information integration. 2.Stream processing. 3.Semistructured data and XML. 4.Peer-to-peer and grid databases. 5.Data mining.

17 Stream Management Systems uAdds to the relation a stream datatype = infinite sequence of tuples that arrive at a port one-at-a-time. uApplications: Telecom billing, intrusion detection, monitoring Web hits, sensor networks, etc., etc.

18 Stream-DBMS Architecture Conventional relations Scratch storage Query processor Stream inputs Stream outputs Ad-hoc queries Standing queries

19 Stanford Approach (Widom, Motwani) uCentral idea is the window, a relation that is formed from a stream by some rule. wExamples: “last 10 tuples,” “all tuples in the past 24 hours.” uQuery language is SQL-like, with diction for converting a stream to a window to a relation.

20 Example: SELECT … FROM Stream1 [last 10] as Window1,… WHERE Window1.a = 5 AND …

21 Research Challenges uAgain, optimization is central. wNew language constructs and data types make old ideas less useful. uSemantics is not 100% clear. wExample: when you join two windows created with different time limits, what does the result represent in terms of the original streams? It matters if you want to apply algebraic laws to expressions.

22 MIT-Centered Approach (Stonebraker, others) uDefine useful operations on streams. uQuery language is sequence of operations. uOptimization is still the key issue, here in an algebraic setting.

23 Peer-to-Peer and Grids uPeer-to-peer systems are application- level attempts to share information and/or processes. uGrid computing is an attempt to bring P2P support to the operating-system level.

24 P2P Applications 1.File sharing as in Napster, Kazaa, etc. 2.Specific scientific applications: 3.Distributed databases, e.g., digital libraries. 4.Replication within an intranet for high availability.

25 Additional Grid Goals 1.Scientific applications routinely solved using a network of workstations. 2.Reselling of unused cycles. 3.Global resources, e.g., buy your storage over the Internet rather than manage your own local disks.

26 Peer-to-Peer Databases uData is distributed among independent sources. uSimilar to information-integration, but much looser constraints on cooperation. uApplications: sharing of library resources, protection from failures by replication, etc., etc.

27 P2P DBMS Architecture Me Peers My data My clients My requests Requests from others

28 P2P Research Issues 1.Protocols and strategies for trading storage. How do I accept bids for someone to make a copy of my data? Will they keep it forever? Storage auction strategies? 2.Query and search strategies. How far to search? How to manage competing requests?

29 P2P Research Status uEarly successful P2P systems: uNapster and others made a lot of progress in architectures, but gave the field a “bad name.” uAnalysis of data-retrieval optimization just beginning.

30 Semistructured Data uThis data model uses trees or graphs instead of relations. uKey application: information integration, where global data is perceived as “flexible objects,” with a variety of fields and structures. uEvolved into XML, XSL, XPATH, XQUERY, etc.

31 Example: Semistructured Data Graph Bud A.B. Gold1995 Maple St.Joe’s M’lob beer bar manf servedAt name addr prize yearaward root The bar object for Joe’s Bar The beer object for Bud Notice unusual data

32 XML and Semistructured Data uXML (Extensible Markup Language) uses a semistructured data model to represent documents. Example: Joe’s Maple St. … …

33 XML Applications uSharing data in standard format. uUsed in Enterprise Information Integration systems as global schema. uStoring data with no fixed schema.

34 Querying XML Data uXQUERY is new standard for querying XML documents. wVery-high-level, similar to SQL. uResearch just beginning on how to optimize queries about XML documents. wTechniques not similar to those applied successfully in SQL systems.