Presentation is loading. Please wait.

Presentation is loading. Please wait.

CPSC 534A – Background Rachel Pottinger January 13 and 18, 2005.

Similar presentations


Presentation on theme: "CPSC 534A – Background Rachel Pottinger January 13 and 18, 2005."— Presentation transcript:

1 CPSC 534A – Background Rachel Pottinger January 13 and 18, 2005

2 Administrative notes Please note you’re supposed to sign up for one paper presentation and one discussion… for different papers Please sign up for the mailing list WebCT has been populated – make sure you can access it HW 1 is on the web, due beginning of class a week from today

3 Overview of the next two classes Relational databases Entity Relationship (ER) diagrams Object Oriented Databases (OODBs) XML Other data types Database internals (Briefly) An extremely brief introduction to category theory Metadata management examples are interspersed

4 Relational Database Basics What’s in a relational database? Relational Algebra SQL Datalog

5 Relational Data Representation PName Price Category Manufacturer gizmo $19.99 gadgets GizmoWorks Power gizmo $29.99 gadgets GizmoWorks SingleTouch $ photography Canon MultiTouch $ household Hitachi Tuples or rows Attribute names or columns Relation or table

6 Relational schema representation Every attribute has an atomic type (e.g., Char, integer) Relation Schema: Column headings: relation name + attribute names + attribute types Product(VarChar PName, real Price, VarChar Category, VarChar Manfacturer) often types are left off: Product(PName, Price, Category, Manfacturer) Relation instance: The values in a table. Database Schema: a set of relation schemas in the database. Database instance: a relation instance for every relation in the schema.

7 Querying – Relational Algebra Select (  )- chose tuples from a relation Project (  )- chose attributes from relation Join ( ⋈ ) - allows combining of 2 relations Set-difference (  ) Tuples in relation 1, but not in relation 2. Union (  ) Cartesian Product (×) Each tuple of R1 with each tuple in R2

8 Find products where the manufacturer is GizmoWorks PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks SingleTouch$149.99PhotographyCanon MultiTouch$203.99HouseholdHitachi Product PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks Selection: σ Manufacturer = GizmoWorks Product

9 Find the Name, Price, and Manufacturers of products whose price is greater than 100 PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks SingleTouch$149.99PhotographyCanon MultiTouch$203.99HouseholdHitachi Product PNamePriceManufacturer SingleTouch$149.99Canon MultiTouch$203.99Hitachi Selection + Projection: π Name, Price, Manufacturer (σ Price > 100 Product)

10 Find the product names and price of products that cost less than $200 and have manufacturers where there is a Company that has a CName that matches the manufacturer, and its country is Japan PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks SingleTouch$149.99PhotographyCanon MultiTouch$203.99HouseholdHitachi Product Company CnameStockPriceCountry GizmoWorks25USA Canon65Japan Hitachi15Japan PNamePrice SingleTouch$ π PName, Price ((σ Price < 200 Product) ⋈ Manufacturer = Cname (σ Country = ‘Japan’ Company))

11 When are two relations related? You guess they are I tell you so Constraints say so A key is a set of attributes whose values are unique; we underline a key Product(PName, Price, Category, Manfacturer) Foreign keys are a method for schema designers to tell you so A foreign key states that an attribute is a reference to the key of another relation ex: Product.Manufacturer is foreign key of Company Gives information and enforces constraint

12 SQL Data Manipulation Language (DML) Query one or more tables Insert/delete/modify tuples in tables Data Definition Language (DDL) Create/alter/delete tables and their attributes Transact-SQL Idea: package a sequence of SQL statements  server

13 Querying – SQL Standard language for querying and manipulating data Structured Query Language Many standards out there: ANSI SQL SQL92 (a.k.a. SQL2) SQL99 (a.k.a. SQL3) Vendors support various subsets of these What we discuss is common to all of them

14 SQL basics Basic form: (many many more bells and whistles in addition) Select attributes From relations (possibly multiple, joined) Where conditions (selections)

15 SQL – Selections SELECT * FROM Company WHERE country=“Canada” AND stockPrice > 50 Some things allowed in the WHERE clause: attribute names of the relation(s) used in the FROM. comparison operators: =, <>,, = apply arithmetic operations: stockPrice*2 operations on strings (e.g., “||” for concatenation). Lexicographic order on strings. Pattern matching: s LIKE p Special stuff for comparing dates and times.

16 SQL – Projections SELECT name AS company, stockPrice AS price FROM Company WHERE country=“Canada” AND stockPrice > 50 SELECT name, stock price FROM Company WHERE country=“Canada” AND stockPrice > 50 Select only a subset of the attributes Rename the attributes in the resulting table

17 SQL – Joins SELECT name, store FROM Person, Purchase WHERE name=buyer AND city=“Vancouver” AND product=“gizmo” Product ( name, price, category, maker) Purchase (buyer, seller, store, product) Company (name, stock price, country) Person( name, phone number, city)

18 Selection: σ Manufacturer = GizmoWorks (Product) PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks SingleTouch$149.99PhotographyCanon MultiTouch$203.99HouseholdHitachi Product PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks What’s the SQL?

19 Selection + Projection: π Name, Price, Manufacturer ( σ Price > 100 Product) PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks SingleTouch$149.99PhotographyCanon MultiTouch$203.99HouseholdHitachi Product PNamePriceManufacturer SingleTouch$149.99Canon MultiTouch$203.99Hitachi What’s the SQL?

20 π PName, Price (( σ Price <= 200 Product) ⋈ Manufacturer = Cname ( σ Country = ‘Japan’ Company)) PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks SingleTouch$149.99PhotographyCanon MultiTouch$203.99HouseholdHitachi Product Company CnameStockPriceCountry GizmoWorks25USA Canon65Japan Hitachi15Japan PNamePrice SingleTouch$ What’s the SQL?

21 More SQL – Outer Joins What happens if there’s no value available? PNameCountry GizmoUSA PowergizmoUSA SingleTouchJapan MultiTouchJapan PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks SingleTouch$149.99PhotographyCanon MultiTouch$203.99HouseholdHitachi Product Company CnameStockPriceCountry GizmoWorks25USA Canon65Japan Hitachi15Japan Select pname, Country From Product, Company Where Manufacturer = Cname Foo$1.99GadgetsBar Select pname, Country From Product outer join Company on Manufacturer = Cname FooNULL

22 Querying – Datalog Enables expressing recursive queries More convenient for analysis Some people find it easier to understand Without recursion but with negation it is equivalent in power to relational algebra and SQL Limited version of Prolog (no functions)

23 Datalog Rules and Queries A datalog rule has the following form: head :- atom1, atom2, …, atom,… You can read this as then :- if... ExpensiveProduct(N) :- Product(N,M,P) & P > $100 CanadianProduct(N) :- Product(N,M,P) & Company(M, “Canada”, SP) IntlProd(N) :- Product(N,M,P) & NOT Company(M, “Canada”, SP) Distinguished variable Existential variables Subgoal or EDB Negated subgoal - also denoted by ¬ constant Arithmetic comparison or interpreted predicate Head or IDB

24 Conjunctive Queries A subset of Datalog Only relations appear in the right hand side of rules No negation Functionally equivalent to Select, Project, Join queries Very popular in modeling relationships between databases

25 Selection: σ Manufacturer = GizmoWorks (Product) PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks SingleTouch$149.99PhotographyCanon MultiTouch$203.99HouseholdHitachi Product PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks What’s the Datalog?

26 Selection + Projection: π Name, Price, Manufacturer ( σ Price > 100 Product) PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks SingleTouch$149.99PhotographyCanon MultiTouch$203.99HouseholdHitachi Product PNamePriceManufacturer SingleTouch$149.99Canon MultiTouch$203.99Hitachi What’s the Datalog?

27 π Pname,Price (( σ Price <= 200 Product) ⋈ Manufacturer = Cname ( σ Country = ‘Japan’ Company)) PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks SingleTouch$149.99PhotographyCanon MultiTouch$203.99HouseholdHitachi Product Company CnameStockPriceCountry GizmoWorks25USA Canon65Japan Hitachi15Japan PNamePrice SingleTouch$ What’s the Datalog?

28 Bonus Relational Goodness: Views Views are relations, except that they are not physically stored. (Materialized views are stored) They are used mostly in order to simplify complex queries and to define conceptually different views of the database to different classes of users. Used also to model relationships between databases View: purchases of telephony products: CREATE VIEW telephony-purchases AS SELECT product, buyer, seller, store FROM Purchase, Product WHERE Purchase.product = Product.name AND Product.category = “telephony”

29 Summarizing Relational DBs Relational perspective: Data is stored in relations. Relations have attributes. Data instances are tuples. SQL perspective: Data is stored in tables. Tables have columns. Data instances are rows. Query languages Relational algebra – mathematical base for understanding query languages SQL – very widely used Datalog – based on Prolog, very popular with theoreticians Views allow complex queries to be written simply

30 Relational Metadata problems

31 Fodors wunder ground AAA Expedia Data Integration: Planning a Beach Vacation BeachGood Weather Cheap Flight weather. com Orbitz

32 Data Integration System Architecture Local Database 1 Local Database N Mediated Schema Local Schema 1Local Schema N Orbitz Expedia Virtual database User Query “Airport”

33 Data Translation Data exists in two different schemas. You have data in one, and you want to put data into the other How are the schemas related to one another? How do you change the data from one to another?

34 Data Warehousing Data Warehouses store vast quantities of data for fast query processing, but only batch updating. Import schemas of data sources Identify overlapping attributes, etc. Build data cleaning scripts Build data transformation scripts Enable data lineage tracing

35 Schema Evolution and Data Migration Schemas change over time; data must change with it. How do we deal with schema changes? How can we make it easy for the data to migrate How do we handle applications built on the old schema that store in the new database?

36 Outline Relational databases Entity Relationship (ER) diagrams Object Oriented Databases (OODBs) XML Other data types Database internals (Briefly) An extremely brief introduction to category theory

37 Entity / Relationship Diagrams Entities Attributes Relationships between entities Product address buys

38 Keys in E/R Diagrams Every entity set must have a key Product namecategory price

39 address namesin Person buys makes employs Company Product namecategory stockprice name price

40 Multiplicity of E/R Relations one-one: many-one many-many abcdabcd abcdabcd abcdabcd

41 address namesin Person buys makes employs Company Product namecategory stockprice name price What does this say ?

42 Roles in Relationships Purchase What if we need an entity set twice in one relationship? Product Person Store salesperson buyer

43 Attributes on Relationships Purchase Product Person Store date

44 Product namecategory price isa Educational ProductSoftware Product Age Groupplatforms Subclasses in E/R Diagrams

45 Keys in E/R Diagrams address nameSIN Person Product namecategory price No formal way to specify multiple keys in E/R diagrams Underline:

46 From E/R Diagrams to Relational Schema Entity set  relation Relationship  relation

47 Entity Set to Relation Product namecategory price Product(name, category, price) name category price gizmo gadgets $19.99

48 Relationships to Relations makes Company Product namecategory Stock price name Makes(product-name, product-category, company-name, year) Product-name Product-Category Company-name Starting-year gizmo gadgets gizmoWorks 1963 Start Year price (watch out for attribute name conflicts)

49 Relationships to Relations makes Company Product namecategory Stock price name No need for Makes. Modify Product: name category price StartYear companyName gizmo gadgets gizmoWorks Start Year price

50 Multi-way Relationships to Relations Purchase Product Person Store name price sinname address Purchase(,, )

51 Summarizing ER diagrams Entities, relationships, and attributes Also has inheritance Used to design schemas, then relational derived from it

52 Metadata problems: Mapping ER to Relational Database design Map ER model to SQL schema Reverse engineer SQL schema to ER model

53 Metadata problems: Round Trip Engineering Design in ER Implement in relational Modify the relational schema How do we change the ER diagram?

54 View integration Define use-case scenario Identify views for each use-case Integrate views into a conceptual schema

55 CPSC 534a Background: Part 2 Rachel Pottinger January 18, 2005

56 Administrative notes Please sign up for papers if you haven’t already (if there’s enough time, we’ll do this at the end of class) Remember that the first reading responses are due 9pm Wednesday Mail me if you can’t access WebCT Remember the 1 st homework is due beginning of class Thursday General theory – trying to make sure you understand basics and have thought about it – not looking for one, true, answer. State any assumptions you make If you can’t figure out a detail on how to transform ER to relational based on class discussion, write an explanation as to what you did and why. Any other questions? Office hours?

57 Outline Relational databases Entity Relationship (ER) diagrams Object Oriented Databases (OODBs) XML Other data types Database internals (Briefly) An extremely brief introduction to category theory

58 Object-Oriented DBMS’s Started late 80’s Main idea: Toss the relational model ! Use the OO model – e.g. C++ classes Standards group: ODMG = Object Data Management Group. OQL = Object Query Language, tries to imitate SQL in an OO framework.

59 The OO Plan ODMG imagines OO-DBMS vendors implementing an OO language like C++ with extensions (OQL) that allow the programmer to transfer data between the database and “host language” seamlessly. A brief diversion: the impedance mismatch

60 OO Implementation Options Build a new database from scratch (O 2 ) Elegant extension of SQL Later adopted by ODMG in the OQL language Used to help build XML query languages Make a programming language persistent (ObjectStore) No query language Niche market ObjectStore is still around, renamed to Exelon, stores XML objects now

61 ODL ODL is used to define persistent classes, those whose objects may be stored permanently in the database. ODL classes look like Entity sets with binary relationships, plus methods. ODL class definitions are part of the extended, OO host language.

62 ODL – remind you of anything? interface Student extends Person (extent Students) { attribute string major; relationship Set takes inverse stds;} interface Person (extent People key sin) { attribute string sin; attribute string dept; attribute string name;} interface Course (extent Crs key cid) { attribute string cid; attribute string cname; relationship Person instructor; relationship Set stds inverse takes;}

63 Why did OO Fail? Why are relational databases so popular? Very simple abstraction; don’t have to think about programming when storing data. Very well optimized Relational db are very well entrenched – not enough advantages, and no good exit strategy… Metadata failure

64 Merging Relational and OODBs Object-oriented models support interesting data types – not just flat files. Maps, multimedia, etc. The relational model supports very-high- level queries. Object-relational databases are an attempt to get the best of both. All major commercial DBs today have OR versions – full spec in SQL99, but your mileage may vary.

65 Outline Relational databases Entity Relationship (ER) diagrams Object Oriented Databases (OODBs) XML Other data types Database internals (Briefly) An extremely brief introduction to category theory

66 XML eXtensible Markup Language XML 1.0 – a recommendation from W3C, 1998 Roots: SGML (from document community - works great for them; from db perspective, very nasty). After the roots: a format for sharing data

67 Why XML is of Interest to Us XML is just syntax for data Note: we have no syntax for relational data But XML is not relational: semistructured This is exciting because: Can translate any data to XML Can ship XML over the Web (HTTP) Can input XML into any application Thus: data sharing and exchange on the Web

68 XML Data Sharing and Exchange application relational data Transform Integrate Warehouse XML DataWEB (HTTP) application legacy data object-relational Think of all the metadata problems!

69 From HTML to XML HTML describes the presentation

70 HTML Bibliography Foundations of Databases Abiteboul, Hull, Vianu Addison Wesley, 1995 Data on the Web Abiteoul, Buneman, Suciu Morgan Kaufmann, 1999 Bibliography Foundations of Databases Abiteboul, Hull, Vianu Addison Wesley, 1995 Data on the Web Abiteoul, Buneman, Suciu Morgan Kaufmann, 1999

71 XML Foundations… Abiteboul Hull Vianu Addison Wesley 1995 … Foundations… Abiteboul Hull Vianu Addison Wesley 1995 … XML describes the content

72 XML Document Mary Maple 345 Seattle John Thailand Mary Maple 345 Seattle John Thailand person elements name elements attributes

73 XML Terminology Elements enclosed within tags: … nested within other elements: … can be empty abbreviated as can have Attributes … XML document has as single ROOT element

74 Buzzwords What is XML? W3C data exchange format Hierarchical data model Self-describing Semi-structured

75 XML as a Tree !! Mary Maple 345 Seattle John Thailand Mary Maple 345 Seattle John Thailand data person Mary name address streetnocity Maple345 Seattle name address John Thai phone id o555 Element node Text node Attribute node Minor Detail: Order matters !!!

76 XML is self-describing Schema elements become part of the data In XML,, are part of the data, and are repeated many times Relational schema: persons(name,phone) defined separately for the data and is fixed Consequence: XML is much more flexible

77 Relational Data as XML John 3634 Sue 6343 Dick 6363 John 3634 Sue 6343 Dick 6363 person name phone “John”3634“Sue”“Dick” persons XML:

78 XML is semi-structured Missing elements: Could represent in a table with nulls John 1234 Joe John 1234 Joe  no phone ! namephone John1234 Joe-

79 XML is semi-structured Repeated elements Impossible in tables: Mary Mary  two phones ! namephone Mary ???

80 XML is semi-structured Elements with different types in different objects Heterogeneous collections: can contain both s and s John Smith 1234 John Smith 1234  structured name !

81 Summarizing XML XML has first class elements and second class attributes XML is semi-structured XML is nested XML is a tree XML is a huge buzzword Will XML replace relational databases?

82 Outline Relational databases Entity Relationship (ER) diagrams Object Oriented Databases (OODBs) XML Other data types Database internals (Briefly) An extremely brief introduction to category theory

83 Other data formats Makefiles Forms Application code

84 Other Metadata Applications Message Mapping Map messages from one format to another Scientific data management Merge schemas from related experiments Manage transformations of experimental data Track evolution of schemas and transformations DB Application development Map SQL schema to default form Map business rule to SQL constraints and form validation code Manage dependencies between code and schemas and forms

85 Outline Relational databases Entity Relationship (ER) diagrams Object Oriented Databases (OODBs) XML Other data types Database internals (Briefly) An extremely brief introduction to category theory

86 How SQL Gets Executed: Query Execution Plans Select Pname, Price From Product, Company Where Manufacturer = Cname AND Price <= 200 AND Country = ‘Japan’ ProductCompany ⋈ Manufacturer = Cname σ Price < 200 ^ Country = ‘Japan’ π Pname, Price Query optimization also specifies the algorithms for each operator; then queries can be executed

87 Overview of Query Optimization Plan: Tree of ordered Relational Algebra operators and choice of algorithm for each operator Two main issues: For a given query, what plans are considered? Algorithm to search plan space for cheapest (estimated) plan. How is the cost of a plan estimated? Ideally: Want to find best plan. Practically: Avoid worst plans. Some tactics Do selections early Use materialized views

88 Query Execution Now that we have the plan, what do we do with it? How do deal with paging in data, etc. New research covers new paradigms where interleaved with optimization

89 Transactions Address two issues: Access by multiple users Remember the “client-server” architecture: one server with many clients Protection against crashes

90 Transactions Transaction = group of statements that must be executed atomically Transaction properties: ACID ATOMICITY = all or nothing CONSISTENCY = leave database in consistent state ISOLATION = as if it were the only transaction in the system DURABILITY = store on disk !

91 Transactions in SQL In “ad-hoc” SQL: Default: each statement = one transaction In “embedded” SQL: BEGIN TRANSACTION [SQL statements] COMMIT or ROLLBACK (=ABORT)

92 Transactions: Serializability Serializability = the technical term for isolation An execution is serial if it is completely before or completely after any other function’s execution An execution is serializable if it equivalent to one that is serial DBMS can offer serializability guarantees

93 Serializability Enforced with locks, like in Operating Systems ! But this is not enough: LOCK A [write A=1] UNLOCK A LOCK B [write B=2] UNLOCK B LOCK A [write A=1] UNLOCK A LOCK B [write B=2] UNLOCK B LOCK A [write A=3] UNLOCK A LOCK B [write B=4] UNLOCK B LOCK A [write A=3] UNLOCK A LOCK B [write B=4] UNLOCK B User 1 User 2 What is wrong ? time

94 Outline Relational databases Entity Relationship (ER) diagrams Object Oriented Databases (OODBs) XML Other data types Database internals (Briefly) An extremely brief introduction to category theory

95 Category Theory “Category theory is a mathematical theory that deals in an abstract way with mathematical structures and relationships between them. It is half-jokingly known as ‘generalized abstract nonsense’.” [wikipedia] There is a lot of scary category theory out there. You only need to know a few terms.

96 Category Theory Started in 1945 General mathematical theory of structures and systems of structures. Reveals how structures of different kinds are related to one another, as well as the universal components of a family of structures of a given kind. It is considered by many as being an alternative to set theory as a foundation for mathematics. It is very, very, very abstract

97 Category Theory Definitions C is a graph. There are two classes: the objects or nodes obj(C) and the morphisms or edges or arrows mor(C). Any morphism f  mor(C) has a source and target object f : a  b. For any composable pair of morphisms f : a  b and g : b  c, there is a composition morphism (g f) : a  c. A functor translates objects and morphisms from one category to another Diagrams commute if one can follow any path through the diagram and obtain the same result by composition f g ab c g f ab cd

98 Why you need a bit of category theory Lots of people like to use the term “morphism” It’s motivation behind a number of views – understanding this can make reading papers easier If you’re theoretically minded, it can give you a good way to think about the problem

99 Overall background recap There are many different data models. We covered: Relational Databases Entity-Relationship Diagrams Object Oriented Databases XML Changing around schemas within data models creates metadata problems. So does changing schemas between data models Databases have some (largely hidden) internal processes; some of these will be related to in other papers we’ve read Theory can be handy to ground your reading.

100 Now what? Time to read papers Prepare paper responses – it’ll help you focus on the paper, and allow for the discussion leader to prepare better discussion You all have different backgrounds, interests, and insights. Bring them into class!


Download ppt "CPSC 534A – Background Rachel Pottinger January 13 and 18, 2005."

Similar presentations


Ads by Google