Goals of the Course
- Purpose:
    - Principles of building database applications.
    - Foundations of database management systems.
    - Issues in building database systems.
    - Have fun: databases are not just bunches of tuples.
    - Not an introduction to the nitty-gritty of any specific commercial system.
Grading
- Paper homeworks: 25%
    - Very little regurgitation.
    - Meant to be challenging (i.e., fun).
- Two programming projects: 40%
    - Work in pairs.
    - Build a database application.
    - Build an XML query processor.
- Final exam: 25% (currently scheduled for Dec. 14th).
- Intangibles (e.g., participation): 10%
Textbook
- Two-volume collection, available as a pair in the bookstore:
    - A First Course in Database Systems (Ullman and Widom)
    - Database System Implementation (Garcia-Molina, Ullman and Widom)
- A few comments about the books.
Other Useful Texts
- Database Management Systems (Ramakrishnan and Gehrke)
- Foundations of Databases (Abiteboul, Hull and Vianu)
- Parallel and Distributed DBMS (Ozsu and Valduriez)
- Transaction Processing (Gray and Reuter)
- Database Systems (Silberschatz, Korth and Sudarshan)
- Principles of Transaction Processing (Bernstein and Newcomer)
- Readings in Database Systems (Stonebraker and Hellerstein)
- Proceedings of the SIGMOD, VLDB, and PODS conferences.
Real Prerequisites
- Operating systems
- Data structures and algorithms
- Distributed systems
- Complexity theory
- Mathematical logic
- Knowledge representation
- User interface design
- Programming languages
- Artificial intelligence (search)
- Greek, Hebrew, French
Why Use a DBMS?
Suppose we are building a system to store the information pertaining to the university. Several questions arise:
- How do we store the data? (file organization, etc.)
- How do we query the data? (write programs…)
- How do we make sure that updates don’t mess things up?
- How do we provide different views on the data? (registrar versus students)
- How do we deal with crashes?
Way too complicated! Go buy a database system!
Why Use a DBMS?
- Large amounts of data (gigabytes, terabytes)
- Data is very structured
- Persistent data
- Valuable data
- Performance requirements
- Concurrent access to the data
- Restricted access to data
All programs manipulate data, so why use a database? Many data manipulation tasks involve recurring operations:
Functionality of a DBMS
- Persistent storage management
- Transaction management
- Resiliency: recovery from crashes.
- Separation between logical and physical views of the data.
    - High-level query and data manipulation language.
    - Efficient query processing.
- Interface with programming languages
Bird’s Eye View of
- How to build a database application.
- The different components of a database system.
Building an Application with a Database System
- Requirements modeling (conceptual, pictures)
    - Decide what entities should be part of the application and how they should be linked.
- Schema design and implementation
    - Decide on a set of tables and attributes.
    - Define the tables in the database system.
    - Populate the database (insert tuples).
- Write application programs using the DBMS
    - Way easier now that the data management is taken care of.
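The schema-design steps above (decide on tables, define them, populate them) can be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in DBMS, and the table layout, column assignments, and sample rows are assumptions made up for the example, not the course's actual schema.

```python
import sqlite3


def build_university_db() -> sqlite3.Connection:
    """Define and populate a tiny, hypothetical university schema."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database

    # Step 1: define the tables (hypothetical columns, chosen for illustration).
    conn.executescript(
        """
        CREATE TABLE Student (ssn TEXT PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE Course  (name TEXT PRIMARY KEY, category TEXT);
        CREATE TABLE Takes (
            ssn     TEXT REFERENCES Student(ssn),
            course  TEXT REFERENCES Course(name),
            quarter TEXT,
            PRIMARY KEY (ssn, course, quarter)
        );
        """
    )

    # Step 2: populate the database (insert tuples). Sample data is invented.
    conn.executemany(
        "INSERT INTO Student VALUES (?, ?)",
        [("111", "Alice"), ("222", "Bob")],
    )
    conn.execute("INSERT INTO Course VALUES (?, ?)", ("CSE444", "databases"))
    conn.execute(
        "INSERT INTO Takes VALUES (?, ?, ?)", ("111", "CSE444", "Fall, 1997")
    )
    conn.commit()
    return conn
```

Calling `build_university_db()` returns an open connection that application code can then query.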
Conceptual Modeling
[Entity-relationship diagram. Entities: Professor, Student, Course. Relationships: Advises, Takes, Teaches. Attribute labels shown: address, name, field, name, category, quarter, name, ssn.]
Schema Design and Implementation
[Table: Students]
- Note: separation of the logical view from the physical view of the data.
- Normalization (theory).
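One way to see the separation of logical and physical views is that a physical change, such as adding an index, leaves the query and its answer unchanged. The sketch below assumes Python's sqlite3 module as the DBMS; the columns of `Students` and the index name are hypothetical, since the slide does not list them.

```python
import sqlite3


def lookup(conn: sqlite3.Connection, ssn: str) -> list:
    """Logical-level query: it never mentions how the rows are stored."""
    rows = conn.execute("SELECT name FROM Students WHERE ssn = ?", (ssn,))
    return [r[0] for r in rows]


def demo() -> tuple:
    conn = sqlite3.connect(":memory:")
    # Hypothetical columns; the slide only names the table.
    conn.execute("CREATE TABLE Students (ssn TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO Students VALUES (?, ?)",
        [("111", "Alice"), ("222", "Bob")],
    )

    before = lookup(conn, "222")

    # Physical-level change: add an index. The query text stays the same.
    conn.execute("CREATE INDEX idx_students_ssn ON Students (ssn)")

    after = lookup(conn, "222")
    conn.close()
    return before, after
```

Here `demo()` returns the same answer for both lookups; only the access path available to the query processor differs.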
Querying a Database
- Find all the students who have taken CSE444 in Fall, 1997.
- S(tructured) Q(uery) L(anguage):
      select E.name
      from Enroll E
      where E.course = 'CSE444' and
            E.quarter = 'Fall, 1997'
- The query processor figures out how to answer the query efficiently.
- An acquired taste…
- Other query languages exist (OO, OR, Datalog).
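The query above can be tried out with a small script. This is a minimal sketch assuming Python's sqlite3 module; the `Enroll` columns follow the query (`name`, `course`, `quarter`), while the sample rows and function names are invented for illustration.

```python
import sqlite3


def demo_db() -> sqlite3.Connection:
    """Build a small Enroll table with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Enroll (name TEXT, course TEXT, quarter TEXT)")
    conn.executemany(
        "INSERT INTO Enroll VALUES (?, ?, ?)",
        [
            ("Alice", "CSE444", "Fall, 1997"),
            ("Bob", "CSE444", "Winter, 1998"),
            ("Carol", "CSE444", "Fall, 1997"),
            ("Dave", "CSE451", "Fall, 1997"),
        ],
    )
    conn.commit()
    return conn


def students_who_took(conn: sqlite3.Connection, course: str, quarter: str) -> list:
    """Run the slide's query; ORDER BY is added only to make output stable."""
    rows = conn.execute(
        "SELECT E.name FROM Enroll E "
        "WHERE E.course = ? AND E.quarter = ? "
        "ORDER BY E.name",
        (course, quarter),
    )
    return [r[0] for r in rows]
```

The query says what to retrieve, not how; choosing a scan or an index lookup is left to the query processor.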
Writing Application Code
- Use ODBC/JDBC.
- Create a connection with a database.
- Embed SQL in application code.
- Specify transaction boundaries.
- May need physical tuning of the database.
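The slide names ODBC/JDBC; the sketch below swaps in Python's DB-API with the built-in sqlite3 module, which follows the same pattern (open a connection, embed SQL, mark transaction boundaries). The `Enroll` table and the helper names are assumptions made for the example.

```python
import sqlite3


def enroll(conn: sqlite3.Connection, name: str, course: str, quarter: str) -> None:
    """Insert one enrollment inside an explicit transaction boundary."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        # Embedded SQL with placeholders; values are passed separately.
        conn.execute(
            "INSERT INTO Enroll (name, course, quarter) VALUES (?, ?, ?)",
            (name, course, quarter),
        )


def main() -> int:
    # Create a connection with a database (in-memory here, for the demo).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Enroll (name TEXT, course TEXT, quarter TEXT)")

    enroll(conn, "Alice", "CSE444", "Fall, 1997")

    count = conn.execute("SELECT COUNT(*) FROM Enroll").fetchone()[0]
    conn.close()
    return count
```

With a JDBC or ODBC driver the calls have different names, but the three steps are the same.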
Storage Management
- Becomes a hard problem because of the interaction with the other levels of the DBMS:
    - What are we storing?
    - Efficient indexing, single- and multi-dimensional.
    - Exploit “semantic” knowledge.
- Issue: interaction with the operating system. Should we rely on the OS?
TP and Recovery
- For efficient use of resources, we want concurrent access to data.
- Systems sometimes crash.
ACID
- A “real” database guarantees ACID:
    - Atomicity: all or nothing of a transaction.
    - Consistency: always leave the DB consistent.
    - Isolation: every transaction runs as if it’s the only one in the system.
    - Durability: if committed, we really mean it.
- Do we really want ACID?
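Atomicity and consistency from the list above can be illustrated with a small sketch. It assumes Python's sqlite3 module, and the `Account` table, the transfer scenario, and the function names are hypothetical examples rather than anything from the course. The transfer credits first and debits second, so a failed debit leaves a half-done transaction that must be undone.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The CHECK constraint is the consistency rule: no negative balances.
    conn.execute(
        "CREATE TABLE Account ("
        "owner TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO Account VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money atomically; return False if the transaction was rolled back."""
    try:
        with conn:  # commit on success, rollback on any exception
            conn.execute(
                "UPDATE Account SET balance = balance + ? WHERE owner = ?",
                (amount, dst),
            )
            # If this violates the CHECK constraint, the credit above is undone too.
            conn.execute(
                "UPDATE Account SET balance = balance - ? WHERE owner = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT owner, balance FROM Account"))
```

A transfer that would overdraw the source account returns `False` and leaves both balances untouched, which is the "all or nothing" behavior.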
Data Integration
Uniform query capability across autonomous, heterogeneous data sources on LAN, WAN, or Internet.
XML: Semi-structured Data
- Emerging format for data exchange on the web and between applications.
eXtensible Markup Language:
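To make "semi-structured" concrete, here is a small sketch using Python's standard xml.etree.ElementTree module. The document and its element names (`students`, `student`, `name`, `ssn`, `course`) are invented for illustration. Note that the two records do not share the same shape: one has an `ssn` and one course, the other has no `ssn` and two courses.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: records are similar but not identical in structure.
DOC = """
<students>
  <student>
    <name>Alice</name>
    <ssn>111</ssn>
    <course quarter="Fall, 1997">CSE444</course>
  </student>
  <student>
    <name>Bob</name>
    <course quarter="Fall, 1997">CSE444</course>
    <course quarter="Winter, 1998">CSE451</course>
  </student>
</students>
"""


def names_taking(xml_text: str, course: str) -> list:
    """Return names of students with at least one matching <course> element."""
    root = ET.fromstring(xml_text.strip())
    return [
        student.findtext("name")
        for student in root.findall("student")
        if any(c.text == course for c in student.findall("course"))
    ]
```

A query over such data has to tolerate missing and repeated elements, which a fixed relational schema would not allow without extra tables or nulls.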
Database Industry
- Relational databases are a great success of theoretical ideas.
- Oracle has a market cap of over $200B.
- Other players: IBM, MS, Sybase, Informix.
- Trends:
    - Warehousing and decision support
    - Data integration
    - XML, XML, XML.
Course Outline (cont)
- Query execution (Zack Ives)
    - Algorithms for joins, selections, projections.
- Query optimization
- Data integration
- Semi-structured data
- Transaction processing and recovery (Phil Bernstein)
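As a taste of the join algorithms mentioned under query execution, here is a minimal in-memory hash join. The outline does not say which algorithms are covered; hash join is simply one common choice picked for this sketch, and the row format (Python dicts) and the function name are assumptions.

```python
from collections import defaultdict


def hash_join(left: list, right: list, left_key: str, right_key: str) -> list:
    """Equi-join two lists of dict rows using a build phase and a probe phase."""
    # Build phase: hash every left row on its join key.
    buckets = defaultdict(list)
    for row in left:
        buckets[row[left_key]].append(row)

    # Probe phase: look up each right row's key and emit the merged rows.
    result = []
    for r in right:
        for l in buckets.get(r[right_key], []):
            result.append({**l, **r})
    return result


# Made-up sample relations.
students = [{"ssn": "111", "name": "Alice"}, {"ssn": "222", "name": "Bob"}]
takes = [
    {"ssn": "111", "course": "CSE444"},
    {"ssn": "111", "course": "CSE451"},
    {"ssn": "333", "course": "CSE444"},
]
joined = hash_join(students, takes, "ssn", "ssn")
```

Each input is read once, so the cost is roughly linear in the two input sizes, compared with the quadratic cost of comparing every pair in a naive nested-loop join.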
Projects
- Goal: identify and solve a problem in database systems.
- (Almost) anything goes.
- Groups of 2-3.
- Groups assembled by the end of week 2.
- Proposals due at the end of week 3.
- Touch base with me every two weeks.
- Example projects on the web site.
- Start early.