Presentation is loading. Please wait.

Presentation is loading. Please wait.

2003.10.02 - SLIDE 1IS 202 – FALL 2003 Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2003

Similar presentations


Presentation on theme: "2003.10.02 - SLIDE 1IS 202 – FALL 2003 Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2003"— Presentation transcript:

1 2003.10.02 - SLIDE 1IS 202 – FALL 2003 Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2003 http://www.sims.berkeley.edu/academics/courses/is202/f03/ SIMS 202: Information Organization and Retrieval Lecture 12: Database Design

2 2003.10.02 - SLIDE 2IS 202 – FALL 2003 Lecture Overview Review –Databases and Database Design –Database Life Cycle –ER Diagrams Database Design Normalization Discussion Questions

3 2003.10.02 - SLIDE 3IS 202 – FALL 2003 Lecture Overview Review –Databases and Database Design –Database Life Cycle –ER Diagrams Database Design Normalization Discussion Questions

4 2003.10.02 - SLIDE 4IS 202 – FALL 2003 Models (1) Conceptual Model Logical Model External Model Conceptual requirements Conceptual requirements Conceptual requirements Conceptual requirements Application 1 Application 2Application 3Application 4 Application 2 Application 3 Application 4 External Model External Model External Model Internal Model

5 2003.10.02 - SLIDE 5IS 202 – FALL 2003 Database System Life Cycle Growth, Change, & Maintenance 6 Operations 5 Integration 4 Design 1 Conversion 3 Physical Creation 2

6 2003.10.02 - SLIDE 6IS 202 – FALL 2003 Another View of the Life Cycle Operations 5 Conversion 3 Physical Creation 2 Growth, Change 6 Integration 4 Design 1

7 2003.10.02 - SLIDE 7IS 202 – FALL 2003 Database Design Process Conceptual Model Logical Model External Model Conceptual requirements Conceptual requirements Conceptual requirements Conceptual requirements Application 1 Application 2Application 3Application 4 Application 2 Application 3 Application 4 External Model External Model External Model Internal Model

8 2003.10.02 - SLIDE 8IS 202 – FALL 2003 Entity An Entity is an object in the real world (or even imaginary worlds) about which we want or need to maintain information –Persons (e.g.: customers in a business, employees, authors) –Things (e.g.: purchase orders, meetings, parts, companies) Employee

9 2003.10.02 - SLIDE 9IS 202 – FALL 2003 Attributes Attributes are the significant properties or characteristics of an entity that help identify it and provide the information needed to interact with it or use it (This is the Metadata for the entities) Employee Last Middle First Name SSN Age Birthdate Projects

10 2003.10.02 - SLIDE 10IS 202 – FALL 2003 Relationships Relationships are the associations between entities They can involve one or more entities and belong to particular relationship types –One to One –One to Many –Many to Many

11 2003.10.02 - SLIDE 11IS 202 – FALL 2003 Relationships Class Attends Student Part Supplies project parts Supplier Project

12 2003.10.02 - SLIDE 12IS 202 – FALL 2003 Types of Relationships Concerned only with cardinality of relationship Truck Assigned EmployeeProject Assigned EmployeeProject Assigned Employee 11 n n 1 m Chen ER notation

13 2003.10.02 - SLIDE 13IS 202 – FALL 2003 More Complex Relationships Project Evaluation Employee Manager 1/n/n 1/1/1 n/n/1 Project Assigned Employee 4(2-10) 1 SSNProjectDate Manages Employee Manages Is Managed By 1 n

14 2003.10.02 - SLIDE 14IS 202 – FALL 2003 Weak Entities Owe existence entirely to another entity Order-line Contains Order Invoice # Part# Rep# QuantityInvoice#

15 2003.10.02 - SLIDE 15IS 202 – FALL 2003 Supertype and Subtype Entities Clerk Is one of Sales-rep Invoice Other Employee Sold Manages

16 2003.10.02 - SLIDE 16IS 202 – FALL 2003 Many to Many Relationships Employee Project Is Assigned Project Assignment Assigned SSN Proj# SSN Proj# Hours

17 2003.10.02 - SLIDE 17IS 202 – FALL 2003 Lecture Overview Review –Databases and Database Design –Database Life Cycle –ER Diagrams Database Design Normalization Discussion Questions

18 2003.10.02 - SLIDE 18IS 202 – FALL 2003 Database Design Process Conceptual Model Logical Model External Model Conceptual requirements Conceptual requirements Conceptual requirements Conceptual requirements Application 1 Application 2Application 3Application 4 Application 2 Application 3 Application 4 External Model External Model External Model Internal Model

19 2003.10.02 - SLIDE 19IS 202 – FALL 2003 Database Design Process Conceptual Model Logical Model External Model Conceptual requirements Conceptual requirements Conceptual requirements Conceptual requirements Application 1 Application 2Application 3Application 4 Application 2 Application 3 Application 4 External Model External Model External Model Internal Model

20 2003.10.02 - SLIDE 20IS 202 – FALL 2003 Requirements Analysis Conceptual Requirements –Systems Analysis Process Examine all of the information sources used in existing applications Identify the characteristics of each data element –Numeric –Text –Date/time –Etc. Examine the tasks carried out using the information Examine results or reports created using the information

21 2003.10.02 - SLIDE 21IS 202 – FALL 2003 Database Design Process Conceptual Model Logical Model External Model Conceptual requirements Conceptual requirements Conceptual requirements Conceptual requirements Application 1 Application 2Application 3Application 4 Application 2 Application 3 Application 4 External Model External Model External Model Internal Model

22 2003.10.02 - SLIDE 22IS 202 – FALL 2003 Conceptual Design Conceptual Model –Merge the collective needs of all applications –Determine what Entities are being used Some object about which information is to maintained –What are the Attributes of those entities? Properties or characteristics of the entity What attributes uniquely identify the entity –What are the Relationships between entities How the entities interact with each other?

23 2003.10.02 - SLIDE 23IS 202 – FALL 2003 Developing a Conceptual Model Overall view of the database that integrates all the needed information discovered during the requirements analysis Elements of the Conceptual Model are represented by diagrams, Entity-Relationship or ER Diagrams, that show the meanings and relationships of those elements independent of any particular database systems or implementation details Can also be represented using other modeling tools (such as UML)

24 2003.10.02 - SLIDE 24IS 202 – FALL 2003 Database Design Process Conceptual Model Logical Model External Model Conceptual requirements Conceptual requirements Conceptual requirements Conceptual requirements Application 1 Application 2Application 3Application 4 Application 2 Application 3 Application 4 External Model External Model External Model Internal Model

25 2003.10.02 - SLIDE 25IS 202 – FALL 2003 Logical Design Logical Model –How is each entity and relationship represented in the Data Model of the DBMS Hierarchic? Network? Relational? Object-Oriented?

26 2003.10.02 - SLIDE 26IS 202 – FALL 2003 Database Design Process Conceptual Model Logical Model External Model Conceptual requirements Conceptual requirements Conceptual requirements Conceptual requirements Application 1 Application 2Application 3Application 4 Application 2 Application 3 Application 4 External Model External Model External Model Internal Model

27 2003.10.02 - SLIDE 27IS 202 – FALL 2003 Physical Design Internal Model –Choices of index file structure –Choices of data storage formats –Choices of disk layout

28 2003.10.02 - SLIDE 28IS 202 – FALL 2003 Database Design Process Conceptual Model Logical Model External Model Conceptual requirements Conceptual requirements Conceptual requirements Conceptual requirements Application 1 Application 2Application 3Application 4 Application 2 Application 3 Application 4 External Model External Model External Model Internal Model

29 2003.10.02 - SLIDE 29IS 202 – FALL 2003 Database Application Design External Model –User views of the integrated database –Making the old (or updated) applications work with the new database design

30 2003.10.02 - SLIDE 30IS 202 – FALL 2003 Terms and Concepts Key –An attribute or set of attributes used to identify or locate records in a file Primary Key –An attribute or set of attributes that uniquely identifies each record in a file Candidate Key –An attribute or set of attributes that might be used as a primary key

31 2003.10.02 - SLIDE 31IS 202 – FALL 2003 Lecture Overview Review –Databases and Database Design –Database Life Cycle –ER Diagrams Database Design Normalization Discussion Questions

32 2003.10.02 - SLIDE 32IS 202 – FALL 2003 Normalization Normalization theory is based on the observation that relations with certain properties are more effective in inserting, updating and deleting data than other sets of relations containing the same data Normalization is a multi-step process beginning with an “unnormalized” relation

33 2003.10.02 - SLIDE 33IS 202 – FALL 2003 Normal Forms First Normal Form (1NF) Second Normal Form (2NF) Third Normal Form (3NF) Boyce-Codd Normal Form (BCNF) Fourth Normal Form (4NF) Fifth Normal Form (5NF)

34 2003.10.02 - SLIDE 34IS 202 – FALL 2003 Normalization Boyce- Codd and Higher Functional dependency of nonkey attributes on the primary key - Atomic values only Full Functional dependency of nonkey attributes on the primary key No transitive dependency between nonkey attributes All determinants are candidate keys - Single multivalued dependency

35 2003.10.02 - SLIDE 35IS 202 – FALL 2003 Unnormalized Relations First step in normalization is to convert the data into a two-dimensional table In unnormalized relations data can repeat within a column (The following is a highly contrived example that actually bears only a slight resemblance to the current implementation of the Phone/Photo project database)

36 2003.10.02 - SLIDE 36IS 202 – FALL 2003 Unnormalized Relations

37 2003.10.02 - SLIDE 37IS 202 – FALL 2003 First Normal Form To move to First Normal Form a relation must contain only atomic values at each row and column –No repeating groups –A column or set of columns is called a Candidate Key when its values can uniquely identify the row in the relation

38 2003.10.02 - SLIDE 38IS 202 – FALL 2003 First Normal Form

39 2003.10.02 - SLIDE 39IS 202 – FALL 2003 1NF Storage Anomalies Insertion: A new person has not yet taken a picture -- hence no Picture # -- Since Picture # is part of the key we can’t insert Insertion: If People is are known and likely to be photographed, but haven’t been yet -- there is be no way to include that person in the database Update: If a Person changes status (e.g. Mary Jones becomes a Student) we have to change multiple rows in the database Deletion (type 1): Deleting a Person record may also delete all info about People in the pictures Deletion (type 2): When there are functional dependencies (like Object and Object_features) changing one item eliminates other information

40 2003.10.02 - SLIDE 40IS 202 – FALL 2003 Second Normal Form A relation is said to be in Second Normal Form when every nonkey attribute is fully functionally dependent on the primary key –That is, every nonkey attribute needs the full primary key for unique identification

41 2003.10.02 - SLIDE 41IS 202 – FALL 2003 Second Normal Form Person Table

42 2003.10.02 - SLIDE 42IS 202 – FALL 2003 Second Normal Form People Table

43 2003.10.02 - SLIDE 43IS 202 – FALL 2003 Second Normal Form Picture Table

44 2003.10.02 - SLIDE 44IS 202 – FALL 2003 1NF Storage Anomalies Removed Insertion: Can now enter new Persons who haven’t yet taken pictures Insertion: Can now enter People who haven’t been photographed Deletion (type 1): If Charles Brown withdraws his photos the corresponding tuples from Person and Picture tables can be deleted without losing information on David Rosen Update: If John White takes a third picture, and has changed status (e.g., graduate), we only need to change the Person table in one place

45 2003.10.02 - SLIDE 45IS 202 – FALL 2003 2NF Storage Anomalies Insertion: Cannot enter the fact that a particular object has a particular feature unless it is associated with a particular picture Deletion: If John White describes some other object that Beth Little has while shopping, we lose the fact that the bookbag is blue Update: If the features of an object change change we have to update multiple occurrences of object features

46 2003.10.02 - SLIDE 46IS 202 – FALL 2003 Third Normal Form A relation is said to be in Third Normal Form if there are no transitive functional dependencies between nonkey attributes –When one nonkey attribute can be determined with one or more nonkey attributes there is said to be a transitive functional dependency The Obect_Feature column in the Picture table is determined by the Object –Object_Feature is transitively functionally dependent on Object so Picture is not 3NF

47 2003.10.02 - SLIDE 47IS 202 – FALL 2003 Third Normal Form Picture Table

48 2003.10.02 - SLIDE 48IS 202 – FALL 2003 Third Normal Form Object Table

49 2003.10.02 - SLIDE 49IS 202 – FALL 2003 2NF Storage Anomalies Removed Insertion: We can now enter the fact that an object has a particular feature Deletion: If John White describes some other object that Beth Little has while shopping, we don’t lose the fact that the bookbag is blue Update: The features for each object appear only once

50 2003.10.02 - SLIDE 50IS 202 – FALL 2003 Boyce-Codd Normal Form Most 3NF relations are also BCNF relations A 3NF relation is NOT in BCNF if: –Candidate keys in the relation are composite keys (they are not single attributes) –There is more than one candidate key in the relation, and –The keys are not disjoint, that is, some attributes in the keys are common

51 2003.10.02 - SLIDE 51IS 202 – FALL 2003 Most 3NF Relations Are Also BCNF – Is This One?

52 2003.10.02 - SLIDE 52IS 202 – FALL 2003 BCNF Relations

53 2003.10.02 - SLIDE 53IS 202 – FALL 2003 Additional Issues Why separate Person and People? –They are really all People/Persons in different roles Shouldn’t a picture have a unique ID regardless of Who is in it? Can’t we have multiple people in the same picture, multiple objects, etc.? Can’t objects have multiple characteristics?

54 2003.10.02 - SLIDE 54IS 202 – FALL 2003 BCNF Relations

55 2003.10.02 - SLIDE 55IS 202 – FALL 2003 BCNF Added Capabilities Can now have a picture with no (identified) people in it Can have multiple objects, activities, and people associated with each picture

56 2003.10.02 - SLIDE 56IS 202 – FALL 2003 Fourth Normal Form Any relation is in Fourth Normal Form if it is BCNF and any multivalued dependencies are trivial Eliminate non-trivial multivalued dependencies by projecting into simpler tables

57 2003.10.02 - SLIDE 57IS 202 – FALL 2003 Fifth Normal Form A relation is in 5NF if every join dependency in the relation is implied by the keys of the relation Implies that relations that have been decomposed in previous NF can be recombined via natural joins to recreate the original relation

58 2003.10.02 - SLIDE 58IS 202 – FALL 2003 Fifth Normal Form Relations People Table

59 2003.10.02 - SLIDE 59IS 202 – FALL 2003 Normalizing to Death Normalization splits database information across multiple tables To retrieve complete information from a normalized database, the JOIN operation must be used JOIN tends to be expensive in terms of processing time, and very large joins are very expensive

60 2003.10.02 - SLIDE 60IS 202 – FALL 2003 Lecture Overview Review –Databases and Database Design –Database Life Cycle –ER Diagrams Database Design Normalization Discussion Questions

61 2003.10.02 - SLIDE 61IS 202 – FALL 2003 Questions: Brooke Maury Discussion Questions on Hoffer & McFadden: If the goal of the relational database model is to encode a ‘conceptual’ design into a logical design, is it possible that improved technology and the development of new modeling techniques will supplant the RDBMS? Specifically, what impact will XML and the development of document engineering have on organizing information in multiple normalized tables? Conversely, what does the relational model have that would be lost if a conceptual design was encoded in another model?

62 2003.10.02 - SLIDE 62IS 202 – FALL 2003 Questions: Brooke Maury The drive to develop the RDBM was in part motivated by a need to minimize the space required and improve the performance of database systems by removing redundancies. What impact will very inexpensive data storage and computing power have on the relational database model and the third normal form especially?

63 2003.10.02 - SLIDE 63IS 202 – FALL 2003 Questions: Shane Ahern Discussion Questions for "Logical Database Design and the Relational Model" Is the normalization process described really necessary? When I design a database schema, I find that by thinking of tables in terms of they entities they represent (employees, sales, events), I avoid most of the problems of normalization that the process seeks to address (i.e. salesperson and region in Sales table, salesperson is clearly a distinct entity from sales). If the formal process described in the article is not followed, are there potential pitfalls that might lead to problems with your database schema?

64 2003.10.02 - SLIDE 64IS 202 – FALL 2003 Questions: Shane Ahern The article points out that "the relational model does not yet directly support supertype/subtype relationships." Once the tables in a relational database have been decomposed to third normal form, the database is efficient from systems point-of-view, but the tables no longer represent a representation of the data that is intuitive to humans. The object-oriented model more accurately mirrors the way we think about the concepts that we wish to store in databases. So perhaps object-oriented database systems are worth considering. What about XML databases?

65 2003.10.02 - SLIDE 65IS 202 – FALL 2003 Questions: Arthur Law The three models that we have been presented with, Entity Relationship Model, NIAM Model, and Object Oriented Model all enforce a specific thought process in the organization and relationship between items in a database. With all of our recent discussion of computers understanding natural language are these methods now out of date with how we should be organizing information? Should we use artificial intelligence or learning algorithms to statistically determine the relationship between entities or is there still value in using these models?

66 2003.10.02 - SLIDE 66IS 202 – FALL 2003 Questions: Arthur Law Each model is approximately one decade apart in development and a quick Google search shows that companies are using databases with one of the three models. However, as new models arise there doesn't seem too much interest in migrating from one data model to another. Which makes sense given that an organization using a given model probably finds that it works. Now with the proliferation of XML, we see more information being shared between organizations, so are we fated for an expensive and lengthy translation process between databases? Or should all DB administrators be responsible for upgrading to the latest model?

67 2003.10.02 - SLIDE 67IS 202 – FALL 2003 Lecture Overview Review –Databases and Database Design –Database Life Cycle –ER Diagrams Database Design Normalization Discussion Questions Next Time/Readings

68 2003.10.02 - SLIDE 68IS 202 – FALL 2003 Next Time Guest Lecture – Bob Glushko on XML and “Document Engineering” Readings on Class website No assigned discussion questions (but bring your questions on the readings)


Download ppt "2003.10.02 - SLIDE 1IS 202 – FALL 2003 Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2003"

Similar presentations


Ads by Google