Research Principles Revealed Jennifer Widom Stanford University.

Slides:



Advertisements
Similar presentations
Creating Tables. 2 home back first prev next last What Will I Learn? List and provide an example of each of the number, character, and date data types.
Advertisements

1 11. Streaming Data Management Chapter 18 Current Issues: Streaming Data and Cloud Computing The 3rd edition of the textbook.
RDF and RDB 1 Some slides adapted from a presentation by Ivan Herman at the Semantic Technology & Business Conference, 2012.
INFS614, Fall 08 1 Relational Algebra Lecture 4. INFS614, Fall 08 2 Relational Query Languages v Query languages: Allow manipulation and retrieval of.
Uncertainty Lineage Data Bases Very Large Data Bases
Query Optimization over Web Services Utkarsh Srivastava Jennifer Widom Jennifer Widom Kamesh Munagala Rajeev Motwani.
Alert: Transforming Passive to Active DBMS IBM Almaden Research Centner CS561.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 28 Database Systems I The Relational Data Model.
Chapter 1: Data Models and DBMS Architecture Title: What Goes Around Comes Around Authors: M. Stonebraker, J. Hellerstein Pages: 2-40.
1 ICS 223: Transaction Processing and Distributed Data Management Winter 2008 Professor Sharad Mehrotra Information and Computer Science University of.
1 SQL: Structured Query Language (‘Sequel’) Chapter 5 (cont.)
Databases and Database Management System. 2 Goals comprehensive introduction to –the design of databases –database transaction processing –the use of.
Dataspaces: A New Abstraction for Data Management Mike Franklin, Alon Halevy, David Maier, Jennifer Widom.
Representation Formalisms for Uncertain Data Jennifer Widom with Anish Das Sarma Omar Benjelloun Alon Halevy Trio and other participants in the Trio Project.
Cs3431 Triggers vs Constraints Section 7.5. cs3431 Triggers (Make DB Active) Trigger: A procedure that starts automatically if specified changes occur.
Triggers and Active Databases Alexandra Klenova Meghan Russ Josh Sunshine Abe Weinograd.
CS 405G: Introduction to Database Systems 24 NoSQL Reuse some slides of Jennifer Widom Chen Qian University of Kentucky.
MS Access: Database Concepts Instructor: Vicki Weidler.
MIS 710 Module 0 Database fundamentals Arijit Sengupta.
CS462: Introduction to Database Systems. ©Silberschatz, Korth and Sudarshan1.2Database System Concepts Course Information Instructor  Kyoung-Don (KD)
1 CS222: Principles of Database Management Fall 2010 Professor Chen Li Department of Computer Science University of California, Irvine Notes 01.
 Introduction Introduction  Purpose of Database SystemsPurpose of Database Systems  Levels of Abstraction Levels of Abstraction  Instances and Schemas.
The Relational Model. Review Why use a DBMS? OS provides RAM and disk.
Course Introduction Introduction to Databases Instructor: Joe Bockhorst University of Wisconsin - Milwaukee.
Database Technical Session By: Prof. Adarsh Patel.
MIS 385/MBA 664 Systems Implementation with DBMS/ Database Management Dave Salisbury ( )
Mini-Project on Web Data Analysis DANIEL DEUTCH. Data Management “Data management is the development, execution and supervision of plans, policies, programs.
XML (with a bias towards query language issues) A boring research topic? A new frontier? A means to keep standards people busy? Prepared by S. Abiteboul.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 2: Intro to Relational.
Databases From A to Boyce Codd. What is a database? It depends on your point of view. For Manovich, a database is a means of structuring information in.
The TSIMMIS Approach to Mediation: Data Models and Languages Hector Garcia-Molina Yannis Papakonstantinou Dallan Quass Anand Rajaraman Yehoshua Sagiv Jeffrey.
DBMS 2001Notes 1: Introduction1 Principles of Database Management Systems (Tietokannanhallintajärjestelmät) Pekka Kilpeläinen Fall 2001.
A Survey Based Seminar: Data Cleaning & Uncertain Data Management Speaker: Shawn Yang Supervisor: Dr. Reynold Cheng Prof. David Cheung
Data-Centric Human Computation Jennifer Widom Stanford University.
Information System Development Courses Figure: ISD Course Structure.
Presenter: Shanshan Lu 03/04/2010
ECE450 - Software Engineering II1 ECE450 – Software Engineering II Today: Design Patterns IX Interpreter, Mediator, Template Method recap.
Intro: 1 What is a Database? Collection of Dynamic Data –Large Large of yesteryear now fits on a PC (small DBs) Many applications require even more (terabytes,
Goals for Presentation Explain the basics of software development methodologies Explain basic XP elements Show the structure of an XP project Give a few.
Jennifer Widom NoSQL Systems Motivation. Jennifer Widom NoSQL: The Name  “SQL” = Traditional relational DBMS  Recognition over past decade or so: Not.
© 2006 University of Kansas An LSID resolver for specimens and a digression into issues raised by the use of GUIDs Steve Perry
Assoc. Prof. Dr. Ahmet Turan ÖZCERİT.  The concept of Data, Information and Knowledge  The fundamental terms:  Database and database system  Database.
Chapter 18 Object Database Management Systems. Outline Motivation for object database management Object-oriented principles Architectures for object database.
Good Papers and Good Research Jennifer Widom Stanford University Shamelessly drawn from Research Principles Revealed “Research Principles Revealed” Codd.
Chapter 9: Web Services and Databases Title: NiagaraCQ: A Scalable Continuous Query System for Internet Databases Authors: Jianjun Chen, David J. DeWitt,
Jennifer Widom Relational Databases The Relational Model.
Databases Databases are collections of information; our study repeats a theme: Tell the computer the structure, and it can help you! © 2004, Lawrence Snyder.
Chapter 3 The Relational Model. Why Study the Relational Model? Most widely used model. Vendors: IBM, Informix, Microsoft, Oracle, Sybase, etc. “Legacy.
2014 Inflection Points in Research and Teaching Jennifer Widom Stanford University October 10, 2014 #GHC
CS422 Principles of Database Systems Stored Procedures and Triggers Chengyu Sun California State University, Los Angeles.
CS3431-B111 CS3431 – Database Systems I Logistics Instructor: Mohamed Eltabakh
1 Working Models for Uncertain Data Anish Das Sarma, Omar Benjelloun, Alon Halevy, Jennifer Widom Stanford InfoLab.
Mining Data Streams (Part 1)
CS 405G: Introduction to Database Systems
S. Sudarshan CS632 Course, Mar 2004 IIT Bombay
Introduction to Database Systems CSE 444
Semi-Structured Data and Agile Application Development
Relational Algebra Chapter 4, Part A
Deco: Declarative Crowdsourcing
Course Overview - Database Systems
Teaching slides Chapter 8.
Relational Algebra Chapter 4, Sections 4.1 – 4.2
View and Index Selection Problem in Data Warehousing Environments
The Trio System for Data, Uncertainty, and Lineage: Overview and Demo
Introduction to Database Systems CSE 444
Database Management Systems
Introduction to Database Systems CSE 444
Introduction to Database Systems CSE 444
MIS 385/MBA 664 Systems Implementation with DBMS/ Database Management
Best Practices in Higher Education Student Data Warehousing Forum
Presentation transcript:

Research Principles Revealed Jennifer Widom Stanford University

2 But First, Some Thanks  Four Extra-Special People  Superb Students  Terrific Collaborators

3 Extra-Special #1 Laura Haas Hired a PL/logic person with minimal DB experience The Perfect Manager – Mentored instead of managed – Ensured I could devote nearly all of my time to research – Sported a great button

4 Extra-Special #2 Stefano Ceri Incredible run of summer collaborations (IBM and Stanford) Jennifer  Stefano  Success Details Intuition

5 Extra-Special #3 and #4 Hector Garcia-Molina and Jeff Ullman Colleagues, mentors, book co-authors Neighbors, baby-sitters, sailing crew, kids sports photographers, … { Hector, Jeff, Jennifer } Research collaborations in all 2 3 subsets

6 Superb Ph.D. Students

7 Terrific Collaborators* Serge Abiteboul Brian Babcock Elena Baralis Omar Benjelloun Sudarshan Chawathe Bobbie Cochrane Shel Finkelstein Alon Halevy Rajeev Motwani Anand Rajaraman Shuky Sagiv Janet Wiener * Significant # co-authored papers in DBLP

8 Now to the “Technical” Part …

9 Research Principles Revealed 1. Topic Selection 2. The Research 3. Dissemination Disclaimer These principles work for me. Your mileage may vary! Disclaimer These principles work for me. Your mileage may vary!

10 Major Research Areas Active Databases Data Warehousing Semistructured Data “Lore” Data Streams Uncertainty and Lineage “Trio” Constraints Triggers Incremental View Maintenance Incremental View Maintenance Triggers Incremental View Maintenance Lineage

11 Incremental View Maintenance Major Research Areas Active Databases Data Warehousing Semistructured Data “Lore” Data Streams Uncertainty and Lineage “Trio” Constraints TriggersLineage Incremental View Maintenance

12 Finding Research Areas I’m not a visionary (In fact, I’m “anti-visionary”) Never know what my next area will be Some combination of “gut feeling” and luck

13 Finding Research Areas Active Databases Data Warehousing Semistructured Data Data Streams Uncertainty and Lineage Data Integration

14 Finding Research Areas Uncertainty and Lineage

15 Finding Research Topics One recipe for a successful database research project Pick a simple but fundamental assumption underlying traditional database systems Drop it Must reconsider all aspects of data management and query processing – Many Ph.D. theses – Prototype from scratch

16 Example “simple but fundamental assumptions” Schema declared in advance Persistent data sets Tuples contain values Reconsidering “all aspects” Data model Query language Storage and indexing structures Query processing and optimization Concurrency control, recovery Application and user interfaces Finding Research Topics Semistructured data Data streams Uncertain data

17 The Research Itself Critical triple for any new kind of database system Do all of them In this order Cleanly and carefully (a research luxury)  Solid foundations, then implementation Data Model Query Language System

18 Cleanly and carefully Nailing Down a New Data Model

19 Example: “A data stream is an unbounded sequence of [tuple timestamp] pairs” Temperature Sensor 1: [(72) 2:05] [(75) 2:20] [(74) 2:21] [(74) 2:24] [(81) 2:45] … Temperature Sensor 2: [(73) 2:03] [(76) 2:20] [(73) 2:22] [(75) 2:22] [(79) 2:40] … Nailing Down a New Data Model

20 Nailing Down a New Data Model Example: “A data stream is an unbounded sequence of [tuple timestamp] pairs” Temperature Sensor 1: [(72) 2:05] [(75) 2:20] [(74) 2:21] [(74) 2:24] [(81) 2:45] … Temperature Sensor 2: [(73) 2:03] [(76) 2:20] [(73) 2:22] [(75) 2:22] [(79) 2:40] …  Duplicate timestamps in streams?  If yes, is order relevant?

21 Nailing Down a New Data Model Example: “A data stream is an unbounded sequence of [tuple timestamp] pairs” Temperature Sensor 1: [(72) 2:05] [(75) 2:20] [(74) 2:21] [(74) 2:24] [(81) 2:45] … Temperature Sensor 2: [(73) 2:03] [(76) 2:20] [(73) 2:22] [(75) 2:22] [(79) 2:40] …  Are timestamps coordinated across streams? Duplicates? Order relevant?

22 Nailing Down a New Data Model Example: “A data stream is an unbounded sequence of [tuple timestamp] pairs” Temperature Sensor 1: [(72) 2:05] [(75) 2:20] [(74) 2:21] [(74) 2:24] [(81) 2:45] … Temperature Sensor 2: [(73) 2:03] [(76) 2:20] [(73) 2:22] [(75) 2:22] [(79) 2:40] … Sample Query (continuous) “Average discrepancy between sensors” Result depends heavily on model

23 Data Model for Trio Project Relative expressiveness Closure properties Only “complete” model Only understandable models In the end, lineage saved the day In the end, lineage saved the day Possible models R

24 The Research Triple Data Model Query Language System Query Language

25 Query Language Design  Notoriously difficult to publish  But potential for huge long-term impact  Semantics can be surprisingly tricky Cleanly and carefully  Solid foundations, then implementation Data Model Query Language System

26 The IBM-Almaden Years Developing an active rule (trigger) system “We finished our rule system ages ago” Transition tables, Conflicts, Confluence, … “Write Code!”

27 The IBM-Almaden Years Developing an active rule (trigger) system “We finished our rule system ages ago” “ Yeah, but what does it do?”

28 The IBM-Almaden Years Developing an active rule (trigger) system “ Yeah, but what does it do?” “Umm … I’ll need to run it to find out”

29 The IBM-Almaden Years Developing an active rule (trigger) system “Umm … I’ll need to run it to find out” Disclaimer These principles work for me. Your mileage may vary. Disclaimer These principles work for me. Your mileage may vary.

30 Tricky Semantics Example #1 Semistructured data (warm-up) Query: SELECT Student WHERE Advisor=‘Widom’ 123 Susan CS ● ● ● Error? Empty result? Warning?

31 Tricky Semantics Example #1 Semistructured data (warm-up) Query: SELECT Student WHERE Advisor=‘Widom’ 123 Susan CS ● ● ● Lore Empty result Warning

32 Tricky Semantics Example #1 Semistructured data (warm-up) Query: SELECT Student WHERE Advisor=‘Widom’ 123 Garcia Widom ● ● ● Lore Implicit ∃

33 Tricky Semantics Example #2 Trigger 1: WHEN X makes sale > 500 THEN increase X’s salary by 1000 Trigger 2: WHEN average salary increases > 10% THEN increase everyone’s salary by 500 Inserts: Sale(Mary,600) Sale(Mary,800) Sale(Mary,550) How many increases for Mary? If each causes average > 10%, how many global raises? What if global raise causes average > 10%?

34 Tricky Semantics Example #3 Temperature Sensor: [(72) 2:00] [(74) 2:00] [(76) 2:00] [(60) 8:00] [(58) 8:00] [(56) 8:00] Query (continuous): Average of most recent three readings

35 Tricky Semantics Example #3 Temperature Sensor: [(72) 2:00] [(74) 2:00] [(76) 2:00] [(60) 8:00] [(58) 8:00] [(56) 8:00] Query (continuous): Average of most recent three readings System A: 74, 58

36 Tricky Semantics Example #3 Temperature Sensor: [(72) 2:00] [(74) 2:00] [(76) 2:00] [(60) 8:00] [(58) 8:00] [(56) 8:00] Query (continuous): Average of most recent three readings System A: 74, 58 System B: 74, 70, 64.7, 58

37 Tables: Sigmod(year,loc,…) Climate(loc,temp,…) Query: Temperature at SIGMOD 2010 The “It’s Just SQL” Trap Sigmod (year, loc) 2010 London ∥ New York Climate (loc, temp) London[ 55 – 68 ] New York[ 64 – 79 ] SELECT S.temp FROM Sigmod S, Climate C WHERE S.loc = C.loc AND S.year = 2010 SELECT S.temp FROM Sigmod S, Climate C WHERE S.loc = C.loc AND S.year = 2010

38 The “It’s Just SQL” Trap Syntax is one thing (actually it’s nothing) Semantics is another, as we’ve seen ― Semistructured ― Continuous ― Uncertain ―

39 Result Taming the Semantic Trickiness  Reuse existing (relational) semantics whenever possible Uncertain data — semantics of query Q D D D 1, D 2, …, D n possible instances Q on each instance representation of instances Q(D 1 ), Q(D 2 ), …, Q(D n ) (implementation) 30 years of refinement

40 Taming the Semantic Trickiness  Reuse existing (relational) semantics whenever possible Semantics of stream queries Streams Relations Window Istream / Dstream 30 years of refinement Relational

41 Taming the Semantic Trickiness  Reuse existing (relational) semantics whenever possible Active databases: “transition tables” Lore: semantics based on OQL 3 years of refinement

42 System The Research Triple Data Model Query Language System Impact “Write Code!”

43 Truth in Advertising System Data Model Query Language System As research evolves, always revisit all three Cleanly and carefully!

44 Disseminating Research Results  If it’s important, don’t wait No place for secrecy (or laziness) in research Every place for being first with new idea or result Post on Web, inflict on friends SIGMOD/VLDB conferences are not the only place for important work Send to workshops, SIGMOD Record, … Make software available and easy to use Decent interfaces, run-able over web

45 Summary: Five Points 1 Don’t dismiss the types (intuition  visionary) And don’t forget the 2 Data Model + Query Language + System Solid foundations, then implementation 3 QL semantics: surprisingly tricky Reuse existing (relational) semantics whenever possible Intuition Details

46 Summary: Five Points 4 Don’t be secretive or lazy Disseminate ideas, papers, and software 5 If all else fails, try stirring in the key ingredient: Incremental View Maintenance

47 Thank You Serge Abiteboul Brian Babcock Elena Baralis Omar Benjelloun Sudarshan Chawathe Bobbie Cochrane Shel Finkelstein Alon Halevy Rajeev Motwani Anand Rajaraman Shuky Sagiv Janet Wiener