DATABASES AND DATA WAREHOUSES: A Gold Mine of Information
1 CHAPTER 4 - DATABASES AND DATA WAREHOUSES: A Gold Mine of Information
2 Introduction: Today, Organizations Need...
  Information to compete effectively
  Information just to stay alive in the information age
  Information organized in such a way that you can easily and quickly get to it
  Information-processing tools that help you work with information
3 Introduction: YOUR FOCUS IN THIS CHAPTER
  The Difference Between Logical and Physical Views of Information
  Databases and Database Management Systems
  How You Can Develop Database Applications
  Data Warehouses and Data Mining Tools
4 Information Revisited: THREE THINGS ORGANIZATIONS DO WITH INFORMATION
  1. Process information in the form of transactions
  2. Use information to make a decision
  3. Manage information while it’s used
5 Information Revisited: PROCESSING INFORMATION IN THE FORM OF TRANSACTIONS
  Such as payroll processing, order processing, and handling your registration requests for classes.
  This is called ONLINE TRANSACTION PROCESSING (OLTP) - the gathering of input information, processing that information, and updating existing information to reflect the gathered and processed information.
  Operational databases support OLTP.
6 Information Revisited: USING INFORMATION TO MAKE A DECISION
  For answering such questions as, “How many senior-level marketing majors have not taken statistics?”
  This is called ONLINE ANALYTICAL PROCESSING (OLAP) - the manipulation of information to support decision making.
  Data warehouses support OLAP.
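
The contrast between OLTP and OLAP can be sketched with Python's built-in sqlite3 module. The `registration` table, its columns, and the student rows are hypothetical, and the table is deliberately flat to keep the example short. The insert plays the role of transaction processing, and the count answers the slide's sample question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE registration (student TEXT, major TEXT, level TEXT, course TEXT)"
)

# OLTP: registration requests are gathered, processed, and stored as a transaction.
with conn:
    conn.executemany(
        "INSERT INTO registration VALUES (?, ?, ?, ?)",
        [
            ("Ann", "Marketing", "Senior", "Statistics"),
            ("Ann", "Marketing", "Senior", "Finance"),
            ("Bob", "Marketing", "Senior", "Finance"),
            ("Cy", "Marketing", "Junior", "Finance"),
        ],
    )

# OLAP: how many senior-level marketing majors have not taken statistics?
senior_marketing_without_stats = conn.execute(
    """
    SELECT COUNT(DISTINCT student)
    FROM registration
    WHERE major = 'Marketing'
      AND level = 'Senior'
      AND student NOT IN (
          SELECT student FROM registration WHERE course = 'Statistics'
      )
    """
).fetchone()[0]

print(senior_marketing_without_stats)
```

In practice the analytical query would usually run against a data warehouse rather than the operational database; a single in-memory database is used here only for brevity.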
7 Information Revisited: MANAGING INFORMATION WHILE IT’S USED
  Determining who can view or use information
  Specifying how to back up information
  Identifying what storage technologies to use
  Most importantly, managing information includes organizing it so that people can logically use it without having to know anything about its physical structure. The difference between logical and physical is key.
8 Information Revisited
  In managing information, physical deals with the structure of information as it resides on various storage media.
  Logical deals with how knowledge workers view their information needs, and includes such terms as:
  CHARACTER - our smallest unit of information.
  FIELD - group of related characters.
  RECORD - group of related fields.
  FILE - group of related records.
  DATABASE - group of logically associated files.
  DATA WAREHOUSE - information from many databases.
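
As a rough illustration of the logical hierarchy above, the terms can be mapped onto plain Python values. Every name and value here is made up for the sketch, and a real DBMS does not store information this way.

```python
# A FIELD is a group of related characters; a CHARACTER is the smallest unit.
part_name = "Hammer"
characters = list(part_name)

# A RECORD is a group of related fields.
hammer = {"part_number": 1003, "part_name": part_name, "facility_number": 291}
drill = {"part_number": 1005, "part_name": "Drill", "facility_number": 378}

# A FILE is a group of related records.
part_file = [hammer, drill]
facility_file = [
    {"facility_number": 291, "location": "Main warehouse"},
    {"facility_number": 378, "location": "Annex"},
]

# A DATABASE is a group of logically associated files.
inventory_database = {"Part": part_file, "Facility": facility_file}

print(len(characters), len(hammer), len(part_file), len(inventory_database))
```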
9 Databases
  DATABASE - a collection of information that you organize and access according to the logical structure of that information.
  A database is actually composed of two parts:
  1. the information itself - the files that are logically associated
  2. the logical structure of the information - called the data dictionary
10 Databases: A Database Is a Collection of Information
  Most databases contain two or more files with related information.
  The Inventory database (Figure 4.4, page 125) contains two files - Part and Facility.
  These two files are logically related because parts are stored in facilities and because you would use both of these files to manage your inventory.
11 Databases: A Database Contains a Logical Structure
  You organize and access a database by its logical structure, not its physical position.
  DATA DICTIONARY - contains the logical structure of information in a database.
  The data dictionary contains the logical properties that describe information in a database.
  See Figure 4.5 (page 126) for the data dictionary of the Percentage Markup field in the Inventory database.
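
To make the idea of a data dictionary concrete, here is a minimal sketch that uses SQLite's `PRAGMA table_info` to read back the logical properties of each field. The `part` table, its column names, and the default markup value are assumptions for illustration; they are not taken from Figure 4.5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE part (
        part_number       INTEGER PRIMARY KEY,
        part_name         TEXT NOT NULL,
        percentage_markup REAL NOT NULL DEFAULT 0.25
    )
    """
)

# PRAGMA table_info returns one row per field:
# (column id, name, declared type, not-null flag, default value, primary-key flag)
data_dictionary = {
    name: {
        "type": col_type,
        "required": bool(notnull),
        "default": default,
        "primary_key": bool(pk),
    }
    for _cid, name, col_type, notnull, default, pk in conn.execute(
        "PRAGMA table_info(part)"
    )
}

for field, properties in data_dictionary.items():
    print(field, properties)
```

The point of the sketch is that the description of the information lives alongside the information itself and can be queried without knowing how anything is physically stored.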
12 Databases: A Database Has Logical Ties Among the Information
  A PRIMARY KEY is a field in a database file that uniquely describes each record.
  A FOREIGN KEY is a primary key of one file that also appears in another file. So, foreign keys specify how files are logically related.
  For example, the Part and Facility files are logically related. So, in Figure 4.4 you can see that Facility Number (the primary key for the Facility file) exists in the Part file (where it’s a foreign key).
13 Databases: A Database Contains Built-in Integrity Constraints
  An INTEGRITY CONSTRAINT is a rule that helps assure the quality of the information in a database.
  A registration database at your school includes integrity constraints concerning prerequisites for certain classes.
  Our Inventory database includes an integrity constraint that says a part in the Part file cannot be assigned to a facility that does not exist in the Facility file.
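
The Inventory constraint described above can be sketched with a primary key, a foreign key, and SQLite's foreign-key enforcement. The column names and sample rows are hypothetical. Note that SQLite checks foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# Facility Number is the primary key of Facility and a foreign key in Part.
conn.execute(
    "CREATE TABLE facility (facility_number INTEGER PRIMARY KEY, manager TEXT)"
)
conn.execute(
    """
    CREATE TABLE part (
        part_number     INTEGER PRIMARY KEY,
        part_name       TEXT,
        facility_number INTEGER NOT NULL REFERENCES facility(facility_number)
    )
    """
)

conn.execute("INSERT INTO facility VALUES (291, 'Avery')")
conn.execute("INSERT INTO part VALUES (1003, 'Hammer', 291)")  # facility exists

try:
    # Facility 999 is not in the Facility file, so the DBMS should refuse this part.
    conn.execute("INSERT INTO part VALUES (1005, 'Drill', 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

part_count = conn.execute("SELECT COUNT(*) FROM part").fetchone()[0]
print(rejected, part_count)
```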
14 Database Management Systems
  DATABASE MANAGEMENT SYSTEM (DBMS) - the software you use to specify the logical organization for a database and access it.
  A DBMS contains 5 software components:
  1. DBMS engine
  2. Data definition subsystem
  3. Data manipulation subsystem
  4. Application generation subsystem
  5. Data administration subsystem
15 DBMSs
  DBMS ENGINE - accepts logical requests from the various other DBMS subsystems, converts them to their physical equivalent, and actually accesses the database and data dictionary as they exist on a storage device.
  Recall that:
  PHYSICAL VIEW deals with how information is physically arranged, stored, and accessed on some type of secondary storage device.
  LOGICAL VIEW focuses on how you need to arrange and access information to meet your particular business needs.
16 DBMSs
  DATA DEFINITION SUBSYSTEM - helps you create and maintain the data dictionary and define the structure of the files in a database.
  You use this subsystem to define the logical structure of the information when you first create a database.
  Once you’ve created a database, you use this subsystem to define new fields, delete fields, or change field properties.
  Figure 4.5 (page 126) contains this subsystem screen for the Part file.
17 DBMSs
  DATA MANIPULATION SUBSYSTEM - helps you add, change, and delete information in a database and mine it for valuable information.
  This subsystem is most often the primary interface between you as a user and the information contained in a database.
  Tools in this subsystem include views, report generators, query-by-example tools, and structured query language.
18 DBMSs: DATA MANIPULATION TOOLS
  VIEW - allows you to see the content of a database file, make whatever changes you want, perform simple sorting, and query to find the location of specific information. See Figure 4.7 (page 129).
  REPORT GENERATOR - helps you quickly define formats of reports and what information you want to see in a report. See Figures 4.8 and 4.9 (page 130).
19 DBMSs: DATA MANIPULATION TOOLS
  QUERY-BY-EXAMPLE (QBE) TOOL - helps you graphically design the answer to a question. Figure 4.10 (page 130) shows the QBE for displaying the names and phone numbers of facility managers in charge of parts that cost more than $10.
  STRUCTURED QUERY LANGUAGE (SQL) - a standardized fourth-generation language found in most database environments. SQL performs the same kind of query as QBE, except that you write a statement instead of pointing, clicking, and dragging.
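
For comparison with the QBE example, the same question (names and phone numbers of facility managers in charge of parts that cost more than $10) might be written as an SQL statement along the following lines. The table layout, column names, and rows are assumptions, since Figure 4.4 is not reproduced in this transcript; the statement is run through Python's sqlite3 module so the sketch is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE facility (
        facility_number INTEGER PRIMARY KEY,
        manager_name    TEXT,
        phone           TEXT
    );
    CREATE TABLE part (
        part_number     INTEGER PRIMARY KEY,
        part_name       TEXT,
        cost            REAL,
        facility_number INTEGER REFERENCES facility(facility_number)
    );
    INSERT INTO facility VALUES (1, 'Avery', '555-0101'), (2, 'Blake', '555-0102');
    INSERT INTO part VALUES
        (10, 'Hammer', 12.50, 1),
        (11, 'Nail',    0.05, 2),
        (12, 'Drill',  45.00, 1);
    """
)

# Join Part to Facility through the foreign key, then filter on cost.
managers = conn.execute(
    """
    SELECT DISTINCT f.manager_name, f.phone
    FROM facility AS f
    JOIN part AS p ON p.facility_number = f.facility_number
    WHERE p.cost > 10
    ORDER BY f.manager_name
    """
).fetchall()

print(managers)
```

`DISTINCT` keeps a manager from appearing once per qualifying part; without it, Avery would be listed twice in this sample.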
20 DBMSs
  APPLICATION GENERATION SUBSYSTEM - contains facilities to help you develop transaction-intensive applications. This subsystem includes:
  Tools for creating data entry screens (See Figure 4.12 page 131 for an example)
  Programming languages specific to a particular DBMS
  Interfaces to commonly used programming languages that are independent of any DBMS.
21 DBMSs
  DATA ADMINISTRATION SUBSYSTEM - helps you manage the overall database environment by providing facilities for:
  Backup and recovery
  Security management
  Query optimization
  Reorganization
  Concurrency control
  Change management
22 Database Models
  THE RELATIONAL DATABASE MODEL - a database model that uses a series of two-dimensional tables or files to store information.
  This is the most popular model.
  Each table is called a RELATION.
  A relation contains information about a particular ENTITY CLASS (a concept - people, places, or things - about which you wish to store information and that you can identify with a unique key).
23 Database Models
  Figure 4.14 (page 136) shows a relational database for a video rental store.
  The entity classes are Customer, Video, Video Rental, and Distributor.
  Notice how these tables are related to each other through the use of foreign keys.
  In the Video Rental relation, you’ll find a primary key that uses more than one field to create a unique description. This is called a COMPOSITE PRIMARY KEY.
  A primary key that uses only one field is called an ATOMIC PRIMARY KEY.
24 Database Models
  THE OBJECT-ORIENTED (O-O) DATABASE MODEL - a database model that brings together, stores, and allows you to work with both information and procedures that act on the information.
  An OBJECT-ORIENTED DATABASE MANAGEMENT SYSTEM (O-O DBMS) is the DBMS software that allows you to develop and work with an O-O database.
25 Database Models
  This model takes advantage of the concept of an OBJECT - a software module containing information that describes an entity class along with a list of procedures that can act on the information describing the entity class.
  Figure 4.15 (page 138) shows the same video rental store using the O-O database model.
  Notice that the objects (entity classes) - which include Customer, Video Rental, Video, and Distributor - contain both information and procedures for working with that information.
  See Appendix C for more on objects.
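
The idea of an object that carries both information and the procedures that act on it can be sketched as a small Python class. The `Video` fields and the `rent` and `give_back` procedures are invented for illustration and are not taken from Figure 4.15; an O-O DBMS would also store such objects persistently, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass
class Video:
    # Information describing the entity class
    video_id: int
    title: str
    checked_out: bool = False

    # Procedures that act on that information
    def rent(self) -> None:
        if self.checked_out:
            raise ValueError(f"Video {self.video_id} is already checked out")
        self.checked_out = True

    def give_back(self) -> None:
        self.checked_out = False


video = Video(video_id=101, title="Example Title")
video.rent()
status_after_rent = video.checked_out
video.give_back()
status_after_return = video.checked_out
print(status_after_rent, status_after_return)
```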
26 Developing Databases: DEVELOPING YOUR OWN DATABASE
  Being able to develop your own database is a part of knowledge worker computing.
  Building a database for your personal needs includes the following 4 steps:
  1. Defining entity classes and primary keys
  2. Defining relationships among entity classes
  3. Defining information (fields) for each relation
  4. Using a data definition language to create the database
  Follow along as we build the database to support the report in Figure 4.16 on page 140.
27 Developing Databases: #1 - DEFINING ENTITY CLASSES AND PRIMARY KEYS
  From the report in Figure 4.16, you can identify the entity classes as Employee, Department, and Job.
  Now, for each entity class, you must define a primary key that provides a unique description. These include:
  Employee entity class - Emp ID (e.g., 2345 for Smith)
  Department entity class - Dept (e.g., 15)
  Job entity class - Job (e.g., 14 for Acct)
28 Developing Databases: #2 - DEFINING RELATIONSHIPS AMONG ENTITY CLASSES
  For this step, use an ENTITY-RELATIONSHIP (E-R) DIAGRAM, a graphical method of representing entity classes and their relationships.
  See Figure 4.17 (page 140) for the initial E-R diagram of our database and a listing of E-R diagram symbols.
29 Developing Databases
  EMPLOYEE M:1 DEPARTMENT
  An Employee must be assigned to a Department.
  An Employee cannot be assigned to more than one Department.
  A Department may have many Employees assigned to it.
  A Department is not required to have any Employees assigned to it.
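
One way the four M:1 rules above might translate into table definitions is sketched below. The column names follow the Emp ID and Dept keys from step 1; the department names and the extra rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE department (dept INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT,
        -- NOT NULL: an Employee must be assigned to a Department.
        -- A single dept column: an Employee cannot have more than one.
        dept   INTEGER NOT NULL REFERENCES department(dept)
    );
    -- Department 20 has no employees, which the rules allow.
    INSERT INTO department VALUES (15, 'Accounting'), (20, 'Marketing');
    -- Two employees share department 15: a Department may have many Employees.
    INSERT INTO employee VALUES (2345, 'Smith', 15), (2346, 'Jones', 15);
    """
)

try:
    conn.execute("INSERT INTO employee VALUES (2347, 'Lee', NULL)")
    unassigned_allowed = True
except sqlite3.IntegrityError:
    unassigned_allowed = False

empty_department_ok = (
    conn.execute("SELECT COUNT(*) FROM department WHERE dept = 20").fetchone()[0] == 1
)
print(unassigned_allowed, empty_department_ok)
```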
30 Developing Databases
  After building the initial E-R diagram, you must follow the process of normalization.
  NORMALIZATION is a process of assuring that a relational database structure can be implemented as a series of two-dimensional tables.
  Normalization includes the following 3 steps:
  1. Eliminate repeating groups or M:M relationships
  2. Assure that each field in a relation depends only on the primary key of that relation
  3. Remove all derived fields from the relations.
31 Developing Databases
  The first rule of normalization states that no M:M relationships can exist.
  There is an M:M between Employee and Job.
  You eliminate this by creating an INTERSECTION RELATION - a relation you create to eliminate a repeating group.
  An intersection relation will have a composite primary key that consists of the primary key fields from the two intersecting relations.
  In Figure 4.18 (page 142), we created an intersection relation called Employee-Job to eliminate the M:M relationship.
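
A minimal sketch of the Employee-Job intersection relation follows. The composite primary key is built from Emp ID and Job as the slide describes; the second employee, the second job, and the assignments are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE job (job INTEGER PRIMARY KEY, job_name TEXT);
    -- Intersection relation: one row per (employee, job) pairing.
    CREATE TABLE employee_job (
        emp_id INTEGER NOT NULL REFERENCES employee(emp_id),
        job    INTEGER NOT NULL REFERENCES job(job),
        PRIMARY KEY (emp_id, job)  -- composite primary key
    );
    INSERT INTO employee VALUES (2345, 'Smith'), (2346, 'Jones');
    INSERT INTO job VALUES (14, 'Acct'), (19, 'Sales');
    INSERT INTO employee_job VALUES (2345, 14), (2345, 19), (2346, 14);
    """
)

# Many jobs per employee, and many employees per job, without repeating groups.
jobs_for_smith = [
    row[0]
    for row in conn.execute(
        "SELECT job FROM employee_job WHERE emp_id = 2345 ORDER BY job"
    )
]
employees_in_acct = [
    row[0]
    for row in conn.execute(
        "SELECT emp_id FROM employee_job WHERE job = 14 ORDER BY emp_id"
    )
]

try:
    # The same pairing twice would violate the composite primary key.
    conn.execute("INSERT INTO employee_job VALUES (2345, 14)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(jobs_for_smith, employees_in_acct, duplicate_rejected)
```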
32 Developing Databases: #3 - DEFINING INFORMATION (FIELDS) FOR EACH RELATION
  In this step, you follow rules #2 and #3 of normalization.
  Your goal here is two-fold:
  1. Make sure that the information in each relation is indeed in the correct relation
  2. Make sure that the information cannot be derived from other information.
33 Developing Databases
  To determine if information is in the correct relation, ask: “Does this piece of information depend only on the primary key for this relation?”
  If the answer is yes, the information is in the correct relation.
  In the Employee relation (Figure 4.20 page 144), we currently store Dept Sup. Does Dept Sup depend on Emp ID?
  The answer is no - Dept Sup depends on Dept, so it should be in the Department relation.
34 Developing Databases
  Derived information - information that can be mathematically determined from other information - should not be stored in your database.
  For example, # Emp is a field in the Department relation.
  However, we can simply count the number of occurrences of each Dept in the Employee relation and determine the number of employees.
  So, we remove # Emp from the database.
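
The counting described above can be sketched as a grouped query, so that # Emp is computed when needed instead of being stored. The employee rows and the second department are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE department (dept INTEGER PRIMARY KEY);
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        dept   INTEGER NOT NULL REFERENCES department(dept)
    );
    INSERT INTO department VALUES (15), (20);
    INSERT INTO employee VALUES (2345, 15), (2346, 15), (2347, 20);
    """
)

# Derive the number of employees by counting occurrences of each Dept.
employees_per_dept = dict(
    conn.execute("SELECT dept, COUNT(*) FROM employee GROUP BY dept")
)
print(employees_per_dept)
```

Because the count is derived at query time, it cannot fall out of step with the Employee relation the way a stored # Emp field could.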
35 Developing Databases: #4 - USING A DATA DEFINITION LANGUAGE TO CREATE THE DATABASE
  The final step is to actually create the relations you identified in steps 1-3.
  You do this with a data definition language.
  This step includes:
  Developing a data dictionary
  Defining the various relations
  Defining primary keys and relationships
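
The three parts of this step can be sketched by writing the data dictionary down as a Python structure and generating data definition statements from it. The dictionary layout and the `to_ddl` helper are hypothetical conveniences, not how any particular DBMS works, and the helper assumes a foreign key has the same field name as the primary key it references.

```python
import sqlite3

# Hypothetical data dictionary for the Employee, Department, and Job example.
data_dictionary = {
    "department": {
        "fields": {"dept": "INTEGER", "dept_name": "TEXT"},
        "primary_key": ["dept"],
        "foreign_keys": {},
    },
    "job": {
        "fields": {"job": "INTEGER", "job_name": "TEXT"},
        "primary_key": ["job"],
        "foreign_keys": {},
    },
    "employee": {
        "fields": {"emp_id": "INTEGER", "name": "TEXT", "dept": "INTEGER"},
        "primary_key": ["emp_id"],
        "foreign_keys": {"dept": "department"},
    },
    "employee_job": {
        "fields": {"emp_id": "INTEGER", "job": "INTEGER"},
        "primary_key": ["emp_id", "job"],
        "foreign_keys": {"emp_id": "employee", "job": "job"},
    },
}


def to_ddl(relation: str, spec: dict) -> str:
    """Build one CREATE TABLE statement from a data dictionary entry."""
    parts = [f"{field} {col_type}" for field, col_type in spec["fields"].items()]
    parts.append(f"PRIMARY KEY ({', '.join(spec['primary_key'])})")
    for field, target in spec["foreign_keys"].items():
        parts.append(f"FOREIGN KEY ({field}) REFERENCES {target}({field})")
    return f"CREATE TABLE {relation} ({', '.join(parts)})"


conn = sqlite3.connect(":memory:")
for relation, spec in data_dictionary.items():
    conn.execute(to_ddl(relation, spec))

tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```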
36 Data Warehouses
  DATA WAREHOUSE - a logical collection of information - gathered from many different operational databases - that supports business analysis activities and decision-making tasks.
  Data warehouses...
  are a logical extension of databases
  support OLAP
  are among the newest and hottest buzzwords and concepts in the IT field.
37 Data Warehouses: DATA WAREHOUSE FEATURES
  Data warehouses combine information from different databases
    Making them a true repository of all an organization’s information
  Data warehouses are multi-dimensional
    As opposed to 2 dimensions in the relational model
    Often called hypercubes (See Figure 4.23 page 148)
  Data warehouses support decision making
    While databases support OLTP, data warehouses support OLAP
38 Data Warehouses
  DATA MINING TOOLS - the software tools you use to query information in a data warehouse.
  QUERY-AND-REPORTING TOOLS - QBE tools, SQL, and report generators.
  INTELLIGENT AGENTS - various artificial intelligence tools that form the basis for “information discovery” in OLAP.
  MULTIDIMENSIONAL ANALYSIS (MDA) TOOLS - slice-and-dice techniques that allow you to view multidimensional information from different perspectives.
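
Slice-and-dice can be illustrated on a tiny in-memory cube. The three dimensions (product, region, quarter), the units-sold measure, and the helper names `slice_cube` and `roll_up` are all assumptions for this sketch; real MDA tools work against a data warehouse rather than a Python dictionary.

```python
from collections import defaultdict

# Hypothetical cube: (product, region, quarter) -> units sold
DIMENSIONS = ("product", "region", "quarter")
cube = {
    ("Hammer", "East", "Q1"): 10,
    ("Hammer", "West", "Q1"): 7,
    ("Hammer", "East", "Q2"): 12,
    ("Drill", "East", "Q1"): 3,
    ("Drill", "West", "Q2"): 5,
}


def slice_cube(cells: dict, dimension: str, value: str) -> dict:
    """Keep only the cells where one dimension equals a fixed value."""
    axis = DIMENSIONS.index(dimension)
    return {key: units for key, units in cells.items() if key[axis] == value}


def roll_up(cells: dict, dimension: str) -> dict:
    """Total the measure for each value of one dimension."""
    axis = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for key, units in cells.items():
        totals[key[axis]] += units
    return dict(totals)


# Two perspectives on the same information.
units_by_product = roll_up(cube, "product")
east_units_by_quarter = roll_up(slice_cube(cube, "region", "East"), "quarter")
print(units_by_product, east_units_by_quarter)
```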
39 Data Warehouses: IMPORTANT CONSIDERATIONS IN USING A DATA WAREHOUSE
  Do you need a data warehouse?
  Do you already have a data warehouse?
  Who will the users be?
  How up-to-date must the information be?
  What data mining tools do you need?
40 Managing Information: MANAGING THE INFORMATION RESOURCE
  How will changes in technology affect organizing and managing information?
  What types of database models and databases are most appropriate?
  Who should oversee the organization’s information?
41 Managing Information: OVERSEEING YOUR ORGANIZATION’S INFORMATION
  The CHIEF INFORMATION OFFICER (CIO) is the IT manager who directs all IT systems and personnel while communicating directly with the highest levels of the organization.
  DATA ADMINISTRATION plans for, oversees the development of, and monitors the information resource.
  DATABASE ADMINISTRATION is responsible for the more technical and operational aspects of managing information in databases.
42 Managing Information: MANAGING THE INFORMATION RESOURCE
  Is information ownership a consideration?
  What are the ethics involved in organizing and managing information?
  How should databases and database applications be developed and maintained?
43 TO SUMMARIZE: How we view information
  The physical view of information deals with how information is physically arranged, stored, and accessed on some type of secondary storage device.
  The logical view of information focuses on how you need to arrange and access information to meet your particular business needs.
  A database is a collection of information that you organize and access according to the logical structure of that information.
  The data dictionary contains the logical structure of information in a database.
44 TO SUMMARIZE
  A database management system is the software you use to specify the logical organization for a database and access it.
  Popular database models include the relational model and the object-oriented model.
  The four steps of developing a personal database application include:
  1. Define entity classes and primary keys
  2. Define relationships among entity classes
  3. Define information (fields) for each relation
  4. Use a data definition language to create the database
45 TO SUMMARIZE
  A data warehouse is a logical collection of information - gathered from many different operational databases - that supports business analysis activities and decision-making tasks.
  Data mining tools - the software tools you use to query information in a data warehouse - include query-and-reporting tools, intelligent agents, and multidimensional analysis (MDA) tools.