Presentation on theme: "DATABASES AND DATA WAREHOUSES Searching for Revenue - Google"— Presentation transcript:
1DATABASES AND DATA WAREHOUSES Searching for Revenue - Google. CHAPTER 6: DATABASES AND DATA WAREHOUSES. Opening Case: Searching for Revenue - Google. UNIT TWO OPENING CASE – Additional Case Information: It Takes a Village to Write an Encyclopedia. This case focuses on the invention of wiki technology and the Wikipedia encyclopedia. Start the class off by taking a brief tour of Wikipedia so students can see the online edits. Wikipedia is located at … Ask your students if they know which topic area has the largest number of daily changes. One of the hottest areas in Wikipedia is the Star Trek entries, which are changed more than any other topic area in the entire encyclopedia. Wiki technology is taking off and people are finding new uses for the technology daily. Wiki is being used for collaboration among many businesses. Wiki is being used in education in a number of ways to support learning: A teacher could post some key revision words for students to expand into definitions or pages. Students could work in groups on collaborative documents such as a group report. Course notes could be refined over the duration of the course by both students and teachers. Students could research new topics and contribute their findings. A wiki could be used as a portfolio showing development of a project. A teacher can start a writing prompt and have students add parts to create a comprehensive class writing activity. A teacher could start a story and students could create links off it, which would allow the story to follow different, interactive paths. States and school districts can develop and edit curricula by allowing teachers to add in activities and assessments. A wiki would be a great tool for collaboratively constructing answers to exam questions. It is a great tool for a team of students involved in project work. Students can annotate each other's work.
2Chapter Six Overview. SECTION 6.1 – DATABASE FUNDAMENTALS: Understanding Information; Database Fundamentals; Database Advantages; Relational Database Fundamentals; Database Management Systems; Integrating Data Among Multiple Databases. SECTION 6.2 – DATA WAREHOUSE FUNDAMENTALS: Accessing Organizational Information; History of Data Warehousing; Data Warehouse Fundamentals; Business Intelligence; Data Mining. Chapter 6 introduces data, information quality, databases, data mining, and data warehouses in detail and highlights why and how information adds value to an organization.
3DATABASE FUNDAMENTALS SECTION 6.1DATABASE FUNDAMENTALSCLASSROOM OPENERGREAT BUSINESS DECISIONS – Julius Reuter Uses Carrier Pigeons to Transfer InformationIn 1850, the idea that sending and receiving information could add business value was born. Julius Reuter began a business that bridged the gap between Belgium and Germany. Reuter built one of the first information management companies built on the premise that customers would be prepared to pay for information that was timely and accurate.Reuter used carrier pigeons to forward stock market and commodity prices from Brussels to Germany. Customers quickly realized that with the early receipt of vital information they could make fortunes. Those who had money at stake in the stock market were prepared to pay handsomely for early information from a reputable source, even if it was a pigeon. Eventually, Reuter’s business grew from 45 pigeons to over 200 pigeons.Eventually the telegraph bridged the gap between Brussels to Germany, and Reuter’s brilliantly conceived temporary monopoly was closed.
4LEARNING OUTCOMESList, describe, and provide an example of each of the five characteristics of high quality informationDefine the relationship between a database and a database management systemDescribe the advantages an organization can gain by using a database.6.1. List, describe, and provide an example of each of the five characteristics of high quality information.Accuracy determines if all values are correct. Example – is the name spelled correctly?Completeness determines if any values are missing. Example - is the address complete?Consistency ensures that aggregate or summary information is in agreement with detailed information. Example – do totals equal the true total of the individual fields?Uniqueness ensures that each transaction, entity, and event is represented only once in the information. Example – are there any duplicate customers?Timeliness determines if the information is current with respect to the business requirement. Example – is the information updated weekly?6.2. Define the relationship between a database and a database management system.A database management system manages the database. The DBMS determines how information is entered, accessed, displayed, and the rules surrounding the fundamental operation of the database.6.3. Describe the advantages an organization can gain by using a database.Database advantages from a business perspective includeIncreased flexibilityIncreased scalability and performanceReduced information redundancyIncreased information integrity (quality)Increased information security
5LEARNING OUTCOMES. Define the fundamental concepts of the relational database model. Describe the role and purpose of a database management system and list the four components of a database management system. Describe the two primary methods for integrating information across multiple databases. 6.4. Define the fundamental concepts of the relational database model. The relational database model stores information in the form of logically related two-dimensional tables. Entities, entity classes, attributes, primary keys, and foreign keys are all fundamental concepts included in the relational database model. 6.5. Describe the role and purpose of a database management system and list the four components of a database management system. A database management system (DBMS) is software through which users and application programs interact with a database. The user sends requests to the DBMS and the DBMS performs the actual manipulation of the information in the database. There are two primary ways that users can interact with a DBMS: directly and indirectly. The four components in a DBMS include: Data definition component – helps create and maintain the data dictionary and the structure of the database. Data manipulation component – allows users to create, read, update, and delete information in a database. Application generation component – includes tools for creating visually appealing and easy-to-use applications. Data administration component – provides tools for managing the overall database environment by providing facilities for backup, recovery, security, and performance. 6.6. Describe the two primary methods for integrating information across multiple databases. Forward integration – takes information entered into a given system and sends it automatically to all downstream systems and processes. Backward integration – takes information entered into a given system and sends it automatically to all upstream systems and processes.
6UNDERSTANDING INFORMATION Information is everywhere in an organizationEmployees must be able to obtain and analyze the many different levels, formats, and granularities of organizational information to make decisionsSuccessfully collecting, compiling, sorting, and analyzing information can provide tremendous insight into how an organization is performingGranularity refers to the extent of detail within the information (fine and detailed or “coarse” and abstract information)Have you ever had to correlate two different formats, levels, or granularities of information?How did you correlate the information?Taking a hard look at organizational information can yield exciting and unexpected results such as potential new markets, new ways of reaching customers, and even new ways of doing business
7UNDERSTANDING INFORMATION Information granularity – refers to the extent of detail within the information (fine and detailed or coarse and abstract)LevelsFormatsGranularitiesThis is a good place to discuss the Samsung Electronics and Staples examples from the textStudents should understand that information varies and different levels, formats, and granularities of information can be found throughout an organizationCLASSROOM EXERCISEOrganizing InformationBreak your students into groups and assign each group a different information type from Figure 6.2Ask the students to find examples of the different kinds of information they might encounter in an organization for their information typeFor example, information formats for a spreadsheet might include a profit and loss statement or a market analysisAsk your students to determine potential issues that might arise from having different types of informationAsk your students what happens if the information does not correlateFor example, the customer letters sent out do not match the customers and customer addresses in the databaseFor example, the total on the customer’s bill does not add up to the individual line items
8Information QualityBusiness decisions are only as good as the quality of the information used to make the decisionsCharacteristics of high quality information include:AccuracyCompletenessConsistencyUniquenessTimelinessDo you have any examples of a time when you encountered a problem due to low quality information?For example, you did not receive a package because the address was incorrect or missingList the business ramifications that can occur for an organization that maintains low quality informationCharacteristics of High Quality InformationAccuracy Are all the values correct? For example, is the name spelled correctly? Is the dollar amount recorded properly?Completeness Are any of the values missing? For example, is the address complete including street, city, state, and zip code?Consistency Is aggregate or summary information in agreement with detailed information?For example, do all total fields equal the true total of the individual fields?Uniqueness Is each transaction, entity, and event represented only once in the information?For example, are there any duplicate customers?Timeliness Is the information current with respect to the business requirements? For example, is information updated weekly, daily, or hourly?CLASSROOM EXERCISEInquiring about InformationBreak your students into groups and ask each group to provide an additional example of each of the five common characteristics of high quality information that is not provided in the above figureFor example, Accuracy – does a purchase price on a bill match the item description on the bill?Item 1: Kids juice cup, cost $10,000Chances are a kids juice cup would not cost $10,000 and this is an inaccurate item
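The completeness, uniqueness, and consistency checks described on this slide can be sketched in a few lines of Python. This is only an illustrative sketch: the customer records, the field names, and the helper functions are hypothetical and do not come from the text, and amounts are kept in whole cents to avoid rounding surprises.

```python
from collections import Counter

# Hypothetical customer records; names and field layout are illustrative only.
customers = [
    {"id": 1, "name": "Dave's Sub Shop", "street": "12 Main St",
     "city": "Denver", "state": "CO", "zip": "80202"},
    {"id": 2, "name": "Pizza Palace", "street": "",
     "city": "Denver", "state": "CO", "zip": "80203"},
    {"id": 3, "name": "Dave's Sub Shop", "street": "12 Main St",
     "city": "Denver", "state": "CO", "zip": "80202"},
]

REQUIRED_FIELDS = ("name", "street", "city", "state", "zip")


def incomplete_records(records):
    """Completeness: ids of records with any missing required value."""
    return [r["id"] for r in records
            if any(not r.get(field) for field in REQUIRED_FIELDS)]


def duplicate_names(records):
    """Uniqueness: customer names that appear more than once."""
    counts = Counter(r["name"] for r in records)
    return sorted(name for name, n in counts.items() if n > 1)


def totals_consistent(line_items_cents, stated_total_cents):
    """Consistency: does the summary total equal the sum of the line items?"""
    return sum(line_items_cents) == stated_total_cents


print(incomplete_records(customers))
print(duplicate_names(customers))
print(totals_consistent([1000, 250], 1250))
```

Accuracy and timeliness are harder to automate because they depend on an outside source of truth (the real spelling of a name, the business's refresh schedule), which is why they are left out of this sketch.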
9Information Quality. Low quality information example. Walk through each of the six issues and have your students extrapolate a potential business problem that might be associated with each issue. The example does not state what type of database or spreadsheet this information is contained in (sales, marketing, customer service, billing, etc.), so allow your students to use their imagination when they are extrapolating the potential business problems. Issue 1: Without a first name it would be impossible to correlate this customer with customers in other databases (Sales, Marketing, Billing, Customer Service) to gain a complete customer view (CRM). Issue 2: Without a complete street address there is no possible way to communicate with this customer via mail or deliveries. An order might be sitting in a warehouse waiting for the complete address before shipping. The company has spent time and money processing an order that might never be completed. Issue 3: If this is the same customer, the company will waste money sending out two sets of promotions and advertisements to the same customer. It might also send two identical orders and have to incur the expense of one order being returned. Issue 4: This is a good example of where cleaning data is difficult because this may or may not be an error. There are many times when a phone and a fax have the same number. Since the phone number is also in the address field, chances are that the number is inaccurate. Issue 5: The business would have no way of communicating with this customer via … Issue 6: The company could determine the area code based on the customer’s address. This takes time, which costs the company money. This is a good reason to ensure that information is entered correctly the first time. All incorrect information needs to be fixed, which costs time and money.
10Understanding the Costs of Poor Information The four primary sources of low quality information include:Online customers intentionally enter inaccurate information to protect their privacyInformation from different systems have different entry standards and formatsCall center operators enter abbreviated or erroneous information by accident or to save timeThird party and external information contains inconsistencies, inaccuracies, and errorsAddressing the above sources of information inaccuracies will significantly improve the quality of organizational informationDetermine a few additional sources of low quality informationA customer service representative could accidentally transpose a number in an address or misspell a last name
11Understanding the Costs of Poor Information. Potential business effects resulting from low quality information include: inability to accurately track customers; difficulty identifying valuable customers; inability to identify selling opportunities; marketing to nonexistent customers; difficulty tracking revenue due to inaccurate invoices; inability to build strong customer relationships. Can you list any additional business effects resulting from poor information? (Focus on organizational strategies such as SCM, CRM, and ERP.) Poor information could cause the SCM system to order too much inventory from a supplier based on inaccurate orders. Poor information could cause a CRM system to send an expensive promotional item (such as a fruit basket) to the wrong address of one of its best customers. What occurs when you have the inability to build strong customer relationships? Increased buyer power. Gartner podcasts are excellent course resources; there is currently a good podcast on the cost of poor data to an organization.
12Understanding the Benefits of Good Information High quality information can significantly improve the chances of making a good decisionGood decisions can directly impact an organization's bottom lineCLASSROOM EXERCISEUnderstanding Information’s QualityBreak your students into groups and ask them to compile a list of all of the issues found in the following information (the table is located in the IM – cut and paste onto a slide or display on the projector)Ask your students to also list why most low quality information errors occur and what an organization can do to help implement high quality information
13DATABASE FUNDAMENTALS Information is everywhere in an organizationInformation is stored in databasesDatabase – maintains information about various types of objects (inventory), events (transactions), people (employees), and places (warehouses)How many of you are familiar with databases?What kinds of databases can be found around your college?Student registrationCourse evaluationPayrollParking servicesExplain to your students that almost every business decision is based on informationThe information required to make these decisions is typically stored in databases
14DATABASE FUNDAMENTALS Database models include:Hierarchical database model – information is organized into a tree-like structure (using parent/child relationships) in such a way that it cannot have too many relationshipsNetwork database model – a flexible way of representing objects and their relationshipsRelational database model – stores information in the form of logically related two-dimensional tablesMost organizations use the relational database modelThis text focuses on the relational database modelDiscuss the Coca-Cola Bottling Company of Egypt example in the text (Fig 6.5)
15DATABASE ADVANTAGES. Database advantages from a business perspective include: increased flexibility; increased scalability and performance; reduced information redundancy; increased information integrity (quality); increased information security. These advantages are discussed in detail over the next several slides. A good way to explain databases is to compare them to spreadsheets. What are the limitations when using a spreadsheet? Limited number of rows and columns (Excel 2003 and earlier: 65,536 rows by 256 columns; later versions allow 1,048,576 rows by 16,384 columns). Once you exceed the row limit you have outgrown your spreadsheet. Only one user can access the spreadsheet at a time. Users can view all information in the spreadsheet. Users can change all information in the spreadsheet. All of the disadvantages associated with a spreadsheet are fixed when using a database.
16Increased Flexibility A well-designed database should:Handle changes quickly and easilyProvide users with different viewsHave only one physical viewPhysical view – deals with the physical storage of information on a storage deviceHave multiple logical viewsLogical view – focuses on how users logically access informationThe separation between logical and physical views is what allows each user to access database information differentlyWhat would happen if a new database called “RealData” hit the market and allowed only one logical view?The “RealData” database simply would never sell. With only one logical view every person in an entire organization would have the same viewDefine two database views for your school’s student database (one for students, and one for instructors)What does the student view display when a student accesses the school’s student database?Courses enrolledGradesTuitionCredits for graduationWhat does the instructor view display when an instructor accesses the school’s student database?Courses teachingStudents in each coursePayment informationVacation time
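The split between one physical view and several logical views can be illustrated with SQL views. The sketch below uses Python's built-in sqlite3 module; the enrollment table, its columns, and the sample rows are hypothetical and chosen only to mirror the student and instructor views discussed on this slide.

```python
import sqlite3

# One physical store: a single in-memory table (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE enrollment (
    student_id   INTEGER,
    student_name TEXT,
    course       TEXT,
    instructor   TEXT,
    grade        TEXT,
    tuition_due  INTEGER
);
INSERT INTO enrollment VALUES
    (1, 'Ana', 'MIS 101', 'Dr. Lee', 'A', 500),
    (2, 'Ben', 'MIS 101', 'Dr. Lee', 'B', 0),
    (1, 'Ana', 'ACC 200', 'Dr. Kim', 'B', 500);

-- Logical view for students: courses, grades, and tuition.
CREATE VIEW student_view AS
    SELECT student_id, course, grade, tuition_due FROM enrollment;

-- Logical view for instructors: who is in each course, no tuition data.
CREATE VIEW instructor_view AS
    SELECT instructor, course, student_name FROM enrollment;
""")

student_rows = conn.execute(
    "SELECT course, grade FROM student_view "
    "WHERE student_id = 1 ORDER BY course"
).fetchall()

instructor_rows = conn.execute(
    "SELECT student_name FROM instructor_view "
    "WHERE instructor = 'Dr. Lee' ORDER BY student_name"
).fetchall()

print(student_rows)
print(instructor_rows)
```

Both views read the same stored rows, so a grade change made once is visible through either view without copying data.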
17Increased Scalability and Performance. A database must scale to meet increased demand while maintaining acceptable performance levels. Scalability – refers to how well a system can adapt to increased demands. Performance – measures how quickly a system performs a certain process or transaction. What happens to a business if it suddenly experiences a 60 percent growth in sales and its IT systems fail with all of the increased activity? Remind your students that a big part of developing successful IT systems is being able to anticipate future growth. CLASSROOM EXERCISE: Building an ER Diagram. Break your students into groups and ask them to create an entity relationship diagram similar to the one in Figure 6.5 for a company or product of their choice. If the students are uncomfortable with databases, you should recommend that they stick to a company similar to the TCCBCE, perhaps a snack food producer, mountain bike equipment producer, or even a footwear producer. If your students are more comfortable with databases, ask them to choose a company that would challenge them, such as a fast food restaurant, online book seller, or even a university’s course registration system. The important part of this exercise is for your students to begin to understand how the tables in a database relate. Be sure their ER diagrams include primary keys and foreign keys. Have your students present their ER diagrams to the class and ask the students to find any potential errors with the diagrams.
18Reduced Redundancy Databases reduce information redundancy Redundancy – the duplication of information or storing the same information in multiple placesInconsistency is one of the primary problems with redundant informationOne of the primary goals of a database is to eliminate information redundancy by recording each piece of information in only one placeThis is a good time to tie the discussion back to the material in the previous chapter, low quality informationRecall what happens when a single customer is stored twice with different phone numbers, addresses, or order information in a single database
19Increased Integrity (Quality). Information integrity – measures the quality of information. Integrity constraints – rules that help ensure the quality of information. Relational integrity constraints – rules that enforce basic and fundamental information-based constraints. Business-critical integrity constraints – rules that enforce business rules vital to an organization’s success and often require more insight and knowledge than relational integrity constraints. Can you define two relational integrity constraints for an ordering system? Users cannot create an order for a nonexistent customer. An order cannot be shipped without an address. Can you define two business-critical integrity constraints for an ordering system? Product returns are not accepted for fresh product 15 days after purchase. A discount maximum of 20 percent.
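Two of the example constraints above can be expressed directly in a schema. The following is a minimal sketch using Python's built-in sqlite3 module, assuming a hypothetical two-table ordering system: a foreign key stands in for the relational constraint "no order for a nonexistent customer", and a CHECK clause stands in for the business-critical "discount maximum of 20 percent". The table names, column names, and the try_insert helper are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id     INTEGER PRIMARY KEY,
    -- Relational integrity: every order must point at an existing customer.
    customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
    -- Business-critical integrity: discounts are capped at 20 percent.
    discount_pct INTEGER NOT NULL DEFAULT 0
                 CHECK (discount_pct BETWEEN 0 AND 20)
);
INSERT INTO customer VALUES (1, 'Dave''s Sub Shop');
""")


def try_insert(order_id, customer_id, discount_pct):
    """Return True if the DBMS accepts the order, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders VALUES (?, ?, ?)",
                (order_id, customer_id, discount_pct),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(100, 1, 10)             # valid order
unknown_customer = try_insert(101, 99, 10)    # customer 99 does not exist
excessive_discount = try_insert(102, 1, 35)   # discount above the 20 percent cap
print(accepted, unknown_customer, excessive_discount)
```

Putting the rules in the schema means the DBMS rejects bad rows no matter which application tries to write them.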
20Increased Security. Information is an organizational asset and must be protected. Databases offer several security features including: Password – provides authentication of the user. Access level – determines who has access to the different types of information. Access control – determines types of user access, such as read-only access. Why would you want to define access level security? Access levels will typically mimic the hierarchical structure of the organization and protect organizational information from being viewed and manipulated by individuals who should not have access to sensitive or confidential information. Low level employees typically have the lowest levels of access. High level employees typically have access to all types of database information. For example, you would not want analysts viewing all salary information for the entire company. In general: Analysts can usually only view their own salary. Managers have higher access and can view the salaries of all their team members, but cannot view other managers’ salaries. Directors can view all of their managers’ and analysts’ salaries, but not other directors’ salaries. The CFO and CEO can view every employee’s salary.
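The salary-visibility rules above amount to a walk up the reporting chain. The sketch below is one possible way to express them in Python; the org chart, the employee identifiers, and the function names are all hypothetical, and a real DBMS would normally enforce such rules through its own access-control features instead of application code.

```python
# Hypothetical org chart: employee id -> (role, id of direct manager).
EMPLOYEES = {
    "ceo":   ("CEO", None),
    "dir_a": ("Director", "ceo"),
    "mgr_a": ("Manager", "dir_a"),
    "mgr_b": ("Manager", "dir_a"),
    "ana":   ("Analyst", "mgr_a"),
    "ben":   ("Analyst", "mgr_b"),
}


def reports_to(employee, boss):
    """True if boss appears anywhere above employee in the reporting chain."""
    manager = EMPLOYEES[employee][1]
    while manager is not None:
        if manager == boss:
            return True
        manager = EMPLOYEES[manager][1]
    return False


def can_view_salary(viewer, target):
    """Apply the access levels described on the slide."""
    if viewer == target:
        return True                      # everyone can see their own salary
    if EMPLOYEES[viewer][0] in ("CEO", "CFO"):
        return True                      # top executives see every salary
    return reports_to(target, viewer)    # otherwise only people below you


print(can_view_salary("mgr_a", "ana"))   # manager viewing a team member
print(can_view_salary("mgr_a", "mgr_b")) # manager viewing a peer
```

Because the check follows the reporting chain, adding a new level to the hierarchy needs no change to the rule itself.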
21RELATIONAL DATABASE FUNDAMENTALS Entity – a person, place, thing, transaction, or event about which information is storedThe rows in each table contain the entitiesIn Figure 6.5 CUSTOMER includes Dave’s Sub Shop and Pizza Palace entitiesEntity class (table) – a collection of similar entitiesIn Figure 6.5 CUSTOMER, ORDER, ORDER LINE, DISTRIBUTOR, and PRODUCT entity classesThis text focuses on the relational database modelReview Figure 6.5What kinds of additional entity classes might be found in this database?INVENTORY, MARKETING CAMPAIGN, SALES QUOTE, INVOICE, PAYMENTWhat kinds of additional entities might be found in the CUSTOMER table?Could include any additional customer – Joe’s Mexican Restaurant, Fitness Forever, and Summer’s Flower Shop (these are all fictitious)
22RELATIONAL DATABASE FUNDAMENTALS Attributes (fields, columns) – characteristics or properties of an entity classThe columns in each table contain the attributesIn Figure 6.5 attributes for CUSTOMER include:Customer IDCustomer NameContact NamePhoneReview Figure 6.5What kinds of additional attributes might be found in the CUSTOMER table for Dave’s Sub Shop?Could include any additional customer information:AddressFaxCell phone
23RELATIONAL DATABASE FUNDAMENTALS. Primary keys and foreign keys identify the various entity classes (tables) in the database. Primary key – a field (or group of fields) that uniquely identifies a given entity in a table. Foreign key – a primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables. Review Figure 6.5. Explain to your students that the logic that correlates the tables is implemented through the primary and foreign keys. For example: Hawkins Shipping in the DISTRIBUTOR table has a primary key called Distributor ID – DEN8001. Notice that Hawkins Shipping (Distributor ID DEN8001) is responsible for delivering orders … and … Therefore, Distributor ID in the ORDER table creates a logical relationship (who shipped what order) between ORDER and DISTRIBUTOR.
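The way a foreign key links ORDER back to DISTRIBUTOR and CUSTOMER can be tried out with a small schema. This sketch uses Python's built-in sqlite3 module. Hawkins Shipping, DEN8001, Dave's Sub Shop, and Pizza Palace are taken from the slide text, but the column names, the customer IDs, and the order numbers 9001 and 9002 are made up for illustration and are not the contents of Figure 6.5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE distributor (
    distributor_id TEXT PRIMARY KEY,      -- primary key
    name           TEXT NOT NULL
);
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,      -- primary key
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id       INTEGER PRIMARY KEY,
    customer_id    INTEGER REFERENCES customer(customer_id),       -- foreign key
    distributor_id TEXT    REFERENCES distributor(distributor_id)  -- foreign key
);
INSERT INTO distributor VALUES ('DEN8001', 'Hawkins Shipping');
INSERT INTO customer VALUES (1, 'Dave''s Sub Shop'), (2, 'Pizza Palace');
-- Illustrative order numbers only.
INSERT INTO orders VALUES (9001, 1, 'DEN8001'), (9002, 1, 'DEN8001');
""")


def orders_per_customer():
    """How many orders has each customer placed (zero included)?"""
    rows = conn.execute("""
        SELECT c.name, COUNT(o.order_id)
        FROM customer AS c
        LEFT JOIN orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id
        ORDER BY c.name
    """).fetchall()
    return dict(rows)


def distributors_for(customer_name):
    """Who shipped this customer's orders? Follows both foreign keys."""
    rows = conn.execute("""
        SELECT DISTINCT d.name
        FROM orders AS o
        JOIN customer    AS c ON c.customer_id    = o.customer_id
        JOIN distributor AS d ON d.distributor_id = o.distributor_id
        WHERE c.name = ?
    """, (customer_name,)).fetchall()
    return [row[0] for row in rows]


print(orders_per_customer())
print(distributors_for("Dave's Sub Shop"))
```

The LEFT JOIN keeps customers with no orders in the result, which is how a question like "how many orders has Pizza Palace placed?" can come back as zero.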
24Potential relational database for Coca-Cola. Walk your students through the relational database model in Figure 6.5. To ensure your students are grasping the concepts, ask them to answer the following: How many orders have been placed for T’s Fun Zone? Ans: 1 Order IT 34563. How many orders have been placed for Pizza Palace? Ans: None. How many items are included in Dave’s Sub Shop’s two orders? Ans: Order has 3 items and order has one item for a total of 4 items in both orders. Who is responsible for distributing Dave’s Sub Shop’s orders? Ans: Hawkins Shipping. Which products are included in Order 34562? Ans: 300 Vanilla Coke.
25DATABASE MANAGEMENT SYSTEMS Database management systems (DBMS) – software through which users and application programs interact with a databaseDiscuss the two primary forms of user interaction with a databaseDirect interaction –The user interacts directly with the DBMSThe DBMS obtains the information from the databaseIndirect interactionUser interacts with an application (i.e., payroll application, manufacturing application, sales application)The application interacts with the DBMS
26DATABASE MANAGEMENT SYSTEMS. Four components of a DBMS. The components of the DBMS are discussed in detail on the following slides. A DBMS contains: Data definition component – helps create and maintain the data dictionary and the structure of the database. Data manipulation component – allows users to create, read, update, and delete information in a database. Application generation component – includes tools for creating visually appealing and easy-to-use applications. Data administration component – provides tools for managing the overall database environment by providing facilities for backup, recovery, security, and performance.
27Data Definition Component Data definition component – creates and maintains the data dictionary and the structure of the databaseThe data definition component includes the data dictionaryData dictionary – a file that stores definitions of information types, identifies the primary and foreign keys, and maintains the relationships among the tablesThe data dictionary is an important part of the DBMS because users can consult the dictionary to determine the different types of database informationThe following slide displays an example of logical field properties found in a database
28Data Definition Component Data dictionary essentially defines the logical properties of the information that the database containsLogical properties displayed in the figure vary depending on the type of informationA typical address field can accept numbers, letters, and special characters (Relational integrity constraint)The validation rule requiring that a discount cannot exceed 100 percent (Business-critical integrity constraint)CLASSROOM EXERCISEProperties of LogicBreak your students into groups and assign a different logical property from the above figure to each group (Do not assign the Field Name property)Ask your students to define a few additional examples of the logical property they were assignedHave your students present their answers to the entire class
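Many DBMSs let users query the data dictionary itself. As a rough illustration, the sketch below uses Python's built-in sqlite3 module and SQLite's PRAGMA table_info statement to list the logical properties of each field. The product table and the describe helper are hypothetical, and the helper assumes it is only ever called with a trusted table name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE product (
    product_id   INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    -- Validation rule from the slide: a discount cannot exceed 100 percent.
    discount_pct INTEGER DEFAULT 0 CHECK (discount_pct <= 100)
)
""")


def describe(table):
    """Return {field name: (declared type, required?, default, primary key?)}.

    The table name is interpolated into the statement, so pass trusted names only.
    """
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return {
        name: (col_type, bool(notnull), default, bool(pk))
        for _cid, name, col_type, notnull, default, pk in info
    }


fields = describe("product")
for field_name, properties in fields.items():
    print(field_name, properties)
```

Other database products expose the same kind of metadata through different mechanisms, so the exact statement here should be read as specific to SQLite.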
29Data Manipulation Component Data manipulation component – allows users to create, read, update, and delete information in a databaseA DBMS contains several data manipulation tools:View – allows users to see, change, sort, and query the database contentReport generator – users can define report formatsQuery-by-example (QBE) – users can graphically design the answers to specific questionsStructured query language (SQL) – query languageViews and report generators are the most common data manipulation tools used by non-IT personnelA query is a simple question such as “How many orders were placed today?”QBE tools are popular because users manipulate a drag-and-drop GUI to graphically build a questionThere are many different types of QBE tools including BRIOWhat benefits might you receive from using a tool such as BRIO?Without knowing a QBE tool, a person will have to wait for someone else to gather the information to answer their questionsThis could take days in some organizationsSQL is rarely used by non-IT personnel
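The four data manipulation operations (create, read, update, delete) and the sample query "How many orders were placed today?" can be tried with a few SQL statements. This is a minimal sketch using Python's built-in sqlite3 module; the orders table, its columns, and the sample rows are hypothetical.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, placed_on TEXT)"
)

today = date.today().isoformat()

# Create: add three orders, two of them dated today.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "Dave's Sub Shop", today),
        (2, "Pizza Palace", today),
        (3, "Dave's Sub Shop", "2000-01-01"),
    ],
)

# Update: reassign order 2 to a different customer.
conn.execute(
    "UPDATE orders SET customer = ? WHERE order_id = ?", ("T's Fun Zone", 2)
)

# Delete: remove the old order.
conn.execute("DELETE FROM orders WHERE order_id = ?", (3,))

# Read: the query from the slide, "How many orders were placed today?"
orders_today = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE placed_on = ?", (today,)
).fetchone()[0]

print(orders_today)
```

A QBE tool would build an equivalent SELECT statement behind the scenes from the user's drag-and-drop choices.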
30Data Manipulation Component Sample report using Microsoft Access Report GeneratorThe above figure displays a sample report created with Microsoft AccessAn IT specialist is typically required to build the reportOnce the report is built, any user can run the report as frequently as they wish by simply clicking on a button
31Data Manipulation Component Sample report using Access Query-By-Example (QBE) toolThe above figure displays a QBE graphical queryThe above figure displays the call-outs that explain the fields in the queryThe results of the query are displayed in Figure on the next slide
32Data Manipulation Component Results from the query in Figure 6.10Explain to your students that the results from this query have not been placed in a report, hence they are not formatted and do not look visually appealingIf this query was built into a report, the user could simply run the report, which would run the query, and display the results in a nice formatted report
33Data Manipulation Component. SQL version of the QBE query in Figure 6.10. The figure displays the code that is required to generate the answer in the previous figure. This figure should demonstrate to the students the value of a QBE tool. Without the QBE tool, the user would have to write this complex code to get the answer to a question. The QBE tool automatically generates the SQL code. All the user has to do is drag and drop on the graphical interface.
34Application Generation and Data Administration Components. Application generation component – includes tools for creating visually appealing and easy-to-use applications. Data administration component – provides tools for managing the overall database environment by providing facilities for backup, recovery, security, and performance. IT specialists primarily use these components. IT specialists directly interact with the data administration component. Typically, higher level individuals oversee the use of the data administration component. For example, the CPO is responsible for ensuring the ethical and legal use of information; therefore, he or she would direct the use of the security features of the data administration component and implement policies and procedures concerning who has access to different types of information.
35INTEGRATING DATA AMONG MULTIPLE DATABASES Integration – allows separate systems to communicate directly with each otherForward integration – takes information entered into a given system and sends it automatically to all downstream systems and processesBackward integration – takes information entered into a given system and sends it automatically to all upstream systems and processesOne of the biggest benefits of integration is that organizations only have to enter information into the systems once and it is automatically sent to all of the other systems throughout the organizationThis feature alone creates huge advantages for organizations because it reduces information redundancy and ensures accuracy and completenessWithout integrations an organization would have to enter information into every single system that requires the information from marketing and sales to billing and customer serviceFor example, customer information would have to be manually entered into the marketing, sales, ordering, inventory, billing, and shipping databases. (Each of these systems are separate and would have their own database – if the company doesn’t have a complete ERP installed.)Entering the same customer information into multiple systems is redundant, and chances of making a mistake in one of the systems is highIntegrations offer many advantages, but for the most part, the automated flow of information among separate systems is the biggest benefit
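Forward integration, where information is entered once and flows automatically to every downstream system, can be modeled as a simple chain. The Python sketch below is only an illustration under assumed names: the System class, the chain helper, the four system names, and the record key "C-1" are hypothetical and do not describe any particular integration product.

```python
class System:
    """A stand-alone system with its own database (modeled here as a dict)."""

    def __init__(self, name):
        self.name = name
        self.records = {}
        self.downstream = None  # next system in the business process, if any

    def receive(self, key, record):
        """Store the record, then forward it automatically downstream."""
        self.records[key] = dict(record)
        if self.downstream is not None:
            self.downstream.receive(key, record)


def chain(*systems):
    """Wire systems into a forward integration and return the entry point."""
    for upstream, downstream in zip(systems, systems[1:]):
        upstream.downstream = downstream
    return systems[0]


sales = System("sales")
order_entry = System("order entry")
fulfillment = System("order fulfillment")
billing = System("billing")

entry_point = chain(sales, order_entry, fulfillment, billing)

# The customer is entered once, at the start of the integration.
entry_point.receive("C-1", {"customer": "Dave's Sub Shop"})

for system in (sales, order_entry, fulfillment, billing):
    print(system.name, system.records)
```

A backward integration would use the same idea with the links pointing upstream, and allowing entry only at the first system reflects the "one source of record" constraint many organizations adopt.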
36INTEGRATING DATA AMONG MULTIPLE DATABASES Forward and backward integrationIdentify the arrows along the top of the figure when explaining forward integrationsBasically, all information flows forward along the business processSales enters the information when it is negotiating the sale (looking for opportunities)The information is then passed to the order entry system when the order is actually placedThe order fulfillment system picks the products from the warehouse, packs the products, labels boxes, etcOnce the order is filled and shipped, the customer is billedWhat would happen if users could enter order information directly into the billing system?The systems would quickly become out-of-sync. There might be bills for nonexistent orders, or orders that do not have any bills (if someone deleted a bill)For this reason organizations typically place a business-critical integrity constraint on integrated systems: With a forward integration the information must be entered in the sales system, you could not enter information directly into the billing systemIntegrations are expensive to build and maintainIntegrations are difficult to implementFor these reasons many organizations only build forward integrations and use business-critical integrity constraints to ensure all information is always entered only at the start of the integration (one source of record)Identify the arrows along the bottom of the figure when explaining backward integrationsBasically, all information flows backward along the business processBilling enters information and this information is passed back to the order systemThe order fulfillment system passes the information back to the order entry systemThe order entry system passes the information back to the sales systemWhy would an organization want to build both forward and backward integrations?This allows users to enter information at any point in the business process and the information is automatically sent upstream and downstream to all other 
systemsFor example, if order fulfillment determined that they could not fulfill an order (the product had been discontinued), they could simply enter this information into the database and it would be sent automatically upstream to the sales representative who could contact the customer and downstream to billing to remove the item from the bill
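The forward and backward flows described in these notes can be sketched in a few lines of Python. This is only an illustrative sketch: the system names, the `enter_record` helper, and the sample records are assumptions made for the example, not part of any real integration product.

```python
# Hypothetical systems listed in business-process order (illustrative names).
PIPELINE = ["sales", "order_entry", "order_fulfillment", "billing"]


def make_systems():
    """Create one empty record store (a dict) per system."""
    return {name: {} for name in PIPELINE}


def forward_integrate(systems, source, record_id, record):
    """Store a record in `source` and copy it to every downstream system."""
    start = PIPELINE.index(source)
    for name in PIPELINE[start:]:
        systems[name][record_id] = dict(record)


def backward_integrate(systems, source, record_id, record):
    """Store a record in `source` and copy it to every upstream system."""
    end = PIPELINE.index(source)
    for name in PIPELINE[: end + 1]:
        systems[name][record_id] = dict(record)


def enter_record(systems, source, record_id, record, forward_only=True):
    """Apply the business-critical integrity constraint, then integrate."""
    if forward_only and source != PIPELINE[0]:
        # Forward-only organizations accept input at the start of the process.
        raise ValueError(f"records must be entered in {PIPELINE[0]!r}, not {source!r}")
    forward_integrate(systems, source, record_id, record)
    if not forward_only:
        backward_integrate(systems, source, record_id, record)


# Forward integration only: the sale is entered once and flows downstream.
systems = make_systems()
enter_record(systems, "sales", 1001, {"customer": "Acme", "item": "widget"})

# Forward and backward integration: fulfillment reports a discontinued item.
both_ways = make_systems()
enter_record(both_ways, "order_fulfillment", 7, {"status": "discontinued"}, forward_only=False)
```

With `forward_only=True`, an attempt to enter a record directly into billing raises an error, which mirrors the single-source-of-record constraint discussed above.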
37INTEGRATING DATA AMONG MULTIPLE DATABASES Building a central repository specifically for integrated informationThe above figure displays an example of customer information integrated using this methodUsers can create, read, update, and delete in the main customer repository, and it is automatically sent to all of the other databasesThis method does not follow the business process when building the integrationsBusiness-critical integrity constraints still need to be built to ensure information is only ever entered into the customer repository, otherwise the information will become out-of-sync
38OPENING CASE QUESTIONS Google How did the Web site RateMyProfessors.com solve its problem of low-quality information?Review the five common characteristics of high-quality information and rank them in order of importance to Google’s businessWhat would be the ramifications to Google’s business if the search information it presented to its customers was of low quality?1. How did the Web site RateMyProfessors.com solve its problem of low-quality information?The developers of the Web site turned to Google’s API to create an automatic verification tool. If Google finds enough mentions in conjunction with a new professor or university to be added to the database, then it considers the information valid and posts it to the Web site.2. Review the five common characteristics of high-quality information and rank them in order of importance to Google’s business.Student answers to this question will vary depending on their personal views and experiences with technology. The important part of the question is understanding the students’ justifications for their order. Potential order of importance:Timeliness – Google’s information must be timely. If users are receiving old and outdated answers to their queries, they will not use Google for long.Accuracy – Google’s search results must be accurateConsistency – Google’s results must be consistent. Users will not trust the system if it provides different results for the same queryCompleteness – Google’s search results need to be complete; however, users understand that there could be thousands of answers to a search result and are not anticipating that Google find and provide thousands of answers for each queryUniqueness – Google’s users expect to receive unique answers to their queries, not the same search site listed over and over again3. 
What would be the ramifications to Google’s business if the search information it presented to its customers was of low quality?Displaying links that do not work, links that have nothing to do with the query, or multiple duplication of links will cause customers to switch to a different search engine. If Google’s search results were of low-quality, they would quickly lose business. Since providing search results is Google’s primary line of business, it must display high-quality search results.
39 OPENING CASE QUESTIONS Google
Describe the different types of databases. Why should Google use a relational database?
Identify the different types of entity, entity classes, attributes, keys, and relationships that might be stored in Google’s AdWords relational database
4. Describe the different types of databases. Why should Google use a relational database?
There are many different models for organizing information in a database, including the hierarchical database, the network database, and the most prevalent, the relational database model.
In a hierarchical database model, information is organized into a tree-like structure that allows repeating information using parent/child relationships, in such a way that it cannot have too many relationships. Hierarchical structures were widely used in the first mainframe database management systems. However, owing to their restrictions, hierarchical structures often cannot be used to relate to structures that exist in the real world.
The network database model is a flexible way of representing objects and their relationships. Where the hierarchical model structures information as a tree of records, with each record having one parent record and many children, the network model allows each record to have multiple parent and child records, forming a lattice structure.
The relational database model stores information in the form of logically related two-dimensional tables. Entities, entity classes, attributes, primary keys, and foreign keys are all fundamental concepts included in the relational database model.
5. 
Identify the different types of entity, entity classes, attributes, keys, and relationships that might be stored in Google’s Adwords relational database.Entity classes could include:DOCUMENT TITLESEARCH TERMWORDLOCATIONWEB PAGEAttributes could include:AuthorTitleKey wordsCategoryWeb site locationLowest bidHighest bidTotal hitsEach table would need to define a primary key and could include:Document IDSearch item IDLocation IDCompany IDThe tables in the database would have 1-to-1 relationships, 1-to-many relationships, and many-to-many relationships. If you are planning on having your students design and build an ERD please review the associated Access and Database Technology Plug-Ins.
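If students want to see how primary keys, foreign keys, and a many-to-many relationship fit together, a small sketch using Python's built-in `sqlite3` module may help. The table and column names below are assumptions chosen for illustration; they are not Google's actual AdWords schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE company (
    company_id INTEGER PRIMARY KEY,          -- primary key
    name       TEXT NOT NULL
);
CREATE TABLE search_term (
    search_term_id INTEGER PRIMARY KEY,      -- primary key
    term           TEXT NOT NULL UNIQUE
);
-- Many companies bid on many search terms; this table resolves the
-- many-to-many relationship into two one-to-many relationships.
CREATE TABLE bid (
    company_id     INTEGER NOT NULL REFERENCES company(company_id),
    search_term_id INTEGER NOT NULL REFERENCES search_term(search_term_id),
    amount         REAL NOT NULL,
    PRIMARY KEY (company_id, search_term_id)
);
""")

# Made-up sample rows.
conn.executemany("INSERT INTO company VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO search_term VALUES (?, ?)", [(10, "running shoes")])
conn.executemany("INSERT INTO bid VALUES (?, ?, ?)", [(1, 10, 0.40), (2, 10, 0.75)])

# Lowest and highest bid for one search term, derived from the bid table.
lowest, highest = conn.execute(
    "SELECT MIN(amount), MAX(amount) FROM bid WHERE search_term_id = ?", (10,)
).fetchone()
```

Inserting a bid for a company ID that does not exist fails with an integrity error, which is the foreign key doing its job.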
41 LEARNING OUTCOMES
Describe the roles and purposes of data warehouses and data marts in an organization
Compare the multidimensional nature of data warehouses (and data marts) with the two-dimensional nature of databases
6.7 Describe the roles and purposes of data warehouses and data marts in an organization
The primary purpose of data warehouses and data marts is to perform analytical processing, or OLAP
The insights into organizational information that can be gained from analytical processing are instrumental in setting strategic directions and goals
6.8 Compare the multidimensional nature of data warehouses (and data marts) with the two-dimensional nature of databases
Databases contain information in a series of two-dimensional tables, which means that you can only ever view two dimensions of information at one time. In a data warehouse and data mart, information is multidimensional; it contains layers of columns and rows. Each layer in a data warehouse or data mart represents information according to an additional dimension. Dimensions could include such things as products, promotions, stores, category, region, stock price, date, time, and even the weather. The ability to look at information from different dimensions can add tremendous business insight.
42LEARNING OUTCOMESIdentify the importance of ensuring the cleanliness of information throughout an organizationExplain the relationship between business intelligence and a data warehouse6.9 Identify the importance of ensuring the cleanliness of information throughout an organizationAn organization must maintain high-quality information in the data warehouseInformation cleansing and scrubbing is a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete informationWithout high-quality information the organization will be unable to make good business decisions6.10 Explain the relationship between business intelligence and a data warehouse.A data warehouse is an enabler of business intelligence. The purpose of a data warehouse is to pull all kinds of disparate information into a single location where it is cleansed and scrubbed for analysis.
43 HISTORY OF DATA WAREHOUSING
Data warehouses extend the transformation of data into information
In the 1990s executives became less concerned with the day-to-day business operations and more concerned with overall business functions
The data warehouse provided the ability to support decision making without disrupting the day-to-day operations
CLASSROOM OPENER
GREAT BUSINESS DECISIONS – Bill Inmon – The Father of the Data Warehouse
Bill Inmon is recognized as the "father of the data warehouse" and co-creator of the "Corporate Information Factory." He has 35 years of experience in database technology management and data warehouse design. He is known globally for his seminars on developing data warehouses and has been a keynote speaker for every major computing association and many industry conferences, seminars, and tradeshows.
As an author, Bill has written about a variety of topics on the building, usage, and maintenance of the data warehouse and the Corporate Information Factory. He has written more than 650 articles, many of which have been published in major computer journals such as Datamation, ComputerWorld, DM Review, and Byte Magazine. Bill currently publishes a free weekly newsletter for the Business Intelligence Network, and has been a major contributor since its inception.
44DATA WAREHOUSE FUNDAMENTALS Data warehouse – a logical collection of information – gathered from many different operational databases – that supports business analysis activities and decision-making tasksThe primary purpose of a data warehouse is to aggregate information throughout an organization into a single repository for decision-making purposesWhat is the primary difference between a database and data warehouse?The primary difference between a database and a data warehouse is that a database stores information for a single application, whereas a data warehouse stores information from multiple databases, or multiple applications, and external information such as industry informationThis enables cross-functional analysis, industry analysis, market analysis, etc., all from a single repositoryData warehouses support only analytical processing (OLAP)
45DATA WAREHOUSE FUNDAMENTALS Extraction, transformation, and loading (ETL) – a process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouseData mart – contains a subset of data warehouse informationThe ETL process gathers data from the internal and external databases and passes it to the data warehouseThe ETL process also gathers data from the data warehouse and passes it to the data marts
46 DATA WAREHOUSE FUNDAMENTALS
The data warehouse modeled in the above figure compiles information from internal databases (transactional/operational databases) and external databases through ETL
It then sends subsets of information to the data marts through the ETL process
Ask your students to distinguish between a data warehouse and a data mart
Ans: A data warehouse has an enterprisewide organizational focus, while a data mart focuses on a subset of information for a given business unit such as finance
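The extract, transform, and load steps can be demonstrated with a short Python sketch. Everything here is assumed for the sake of the example: the three source systems, their field names, and the sample rows are made up, and a real ETL tool would do far more.

```python
# Hypothetical source extracts; each system names its fields differently.
SOURCES = {
    "sales": [{"cust": "ACME corp ", "amt": "120.50", "region": "west"}],
    "billing": [{"customer_name": "Globex", "amount_usd": 80, "region": "East"}],
    "external": [{"name": "Initech", "value": "15", "region": "WEST"}],
}

# Common enterprise definition: which source field holds the name and amount.
FIELD_MAP = {
    "sales": ("cust", "amt"),
    "billing": ("customer_name", "amount_usd"),
    "external": ("name", "value"),
}


def extract(sources):
    """Yield (source, raw_row) pairs from every internal and external source."""
    for source, rows in sources.items():
        for row in rows:
            yield source, row


def transform(source, row):
    """Map a source-specific row onto the common enterprise definition."""
    name_key, amount_key = FIELD_MAP[source]
    return {
        "customer": row[name_key].strip().title(),
        "amount": float(row[amount_key]),
        "region": row["region"].strip().lower(),
        "source": source,
    }


def load(rows, target):
    """Append transformed rows to the target store and return it."""
    target.extend(rows)
    return target


# Sources -> data warehouse, then a subset of the warehouse -> data mart.
warehouse = load((transform(s, r) for s, r in extract(SOURCES)), [])
west_mart = load((row for row in warehouse if row["region"] == "west"), [])
```

The second `load` call reflects the point in the notes that data marts are filled from the warehouse, not directly from the operational systems.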
47 Multidimensional Analysis
Databases contain information in a series of two-dimensional tables
In a data warehouse and data mart, information is multidimensional; it contains layers of columns and rows
Dimension – a particular attribute of information
Each layer in a data warehouse or data mart represents information according to an additional dimension
Dimensions could include such things as:
Products
Promotions
Stores
Category
Region
Stock price
Date
Time
Weather
Why is the ability to look at information based on different dimensions critical to a business’s success?
Ans: The ability to look at information from different dimensions can add tremendous business insight
By slicing and dicing the information, a business can uncover great unexpected insights
48Multidimensional Analysis Cube – common term for the representation of multidimensional informationUsers can slice and dice the cube to drill down into the informationCube A represents store information (the layers), product information (the rows), and promotion information (the columns)Cube B represents a slice of information displaying promotion II for all products at all storesCube C represents a slice of information displaying promotion III for product B at store 2CLASSROOM EXERCISEAnalyzing Multiple Dimensions of InformationJump! is a company that specializes in making sports equipment, primarily basketballs, footballs, and soccer balls. The company currently sells to four primary distributors and buys all of its raw materials and manufacturing materials from a single vendor. Break your students into groups and ask them to develop a single cube of information that would give the company the greatest insight into its business (or business intelligence) given the following choices:Product A, B, C, and DDistributor X, Y, and ZPromotion I, II, and IIISalesSeasonDate/TimeSalesperson Karen and JohnVendor SmithsonRemember you can pick only 3 dimensions of information for the cube, they need to pick the best 3ProductPromotionThese give the three most business-critical pieces of information
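For students who want to see slicing and dicing in code, the three cubes on this slide can be approximated with a Python dictionary keyed by (store, product, promotion). The sales figures are made up, and the `slice_cube` helper is an assumption for illustration, not a feature of any particular OLAP tool.

```python
from itertools import product

STORES = ["Store 1", "Store 2", "Store 3"]
PRODUCTS = ["Product A", "Product B", "Product C"]
PROMOTIONS = ["Promo I", "Promo II", "Promo III"]

# Cube A: one made-up sales figure per (store, product, promotion) cell.
cube = {
    key: 10 * (i + 1)
    for i, key in enumerate(product(STORES, PRODUCTS, PROMOTIONS))
}


def slice_cube(cube, store=None, product_name=None, promotion=None):
    """Keep only the cells that match every dimension value that was given."""
    wanted = (store, product_name, promotion)
    return {
        key: value
        for key, value in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    }


# Cube B: promotion II for all products at all stores.
cube_b = slice_cube(cube, promotion="Promo II")

# Cube C: promotion III for product B at store 2.
cube_c = slice_cube(cube, store="Store 2", product_name="Product B", promotion="Promo III")
```

Fixing one dimension yields a two-dimensional slice (nine cells here), and fixing all three yields a single cell.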
49 Multidimensional Analysis
Data mining – the process of analyzing data to extract information not offered by the raw data alone
To perform data mining users need data-mining tools
Data-mining tool – uses a variety of techniques to find patterns and relationships in large volumes of information and infers rules that predict future behavior and guide decision making
Data mining can begin at a summary information level (coarse granularity) and progress through increasing levels of detail (drilling down), or the reverse (drilling up)
Data-mining tools include query tools, reporting tools, multidimensional analysis tools, statistical tools, and intelligent agents
Ask your students to provide an example of what an accountant might discover through the use of data-mining tools
Ans: An accountant could drill down into the details of all expenses and revenues, finding great business intelligence ranging from which employees are spending the most money on long-distance phone calls to which customers are returning the most products
Could the data warehousing team at Enron have discovered the accounting inaccuracies that caused the company to go bankrupt?
If they did spot them, what should the team have done?
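Drilling down and drilling up amount to summarizing the same rows at different levels of granularity. The sketch below assumes a made-up expense table and a hypothetical `summarize` helper; it is meant only to make the idea concrete for the accountant example.

```python
from collections import defaultdict

# Made-up expense rows: (department, employee, expense_type, amount).
EXPENSES = [
    ("Sales", "Karen", "long-distance calls", 120.0),
    ("Sales", "John", "long-distance calls", 45.0),
    ("Sales", "Karen", "travel", 300.0),
    ("Support", "Lee", "long-distance calls", 80.0),
]


def summarize(rows, depth):
    """Total the amounts, grouping by the first `depth` levels of detail."""
    totals = defaultdict(float)
    for *keys, amount in rows:
        totals[tuple(keys[:depth])] += amount
    return dict(totals)


coarse = summarize(EXPENSES, 1)  # drilled up: one total per department
detail = summarize(EXPENSES, 2)  # drilled down: department and employee
```

Increasing `depth` moves from coarse summary information toward finer detail, and decreasing it moves back up.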
50 Information Cleansing or Scrubbing
An organization must maintain high-quality data in the data warehouse
Information cleansing or scrubbing – a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information
This is an excellent time to return to the information learned in Chapter 6 on high-quality and low-quality information
What would happen if the information contained in the data warehouse was only about 70 percent accurate?
Would you use this information to make business decisions?
Is it realistic to assume that an organization could get to a 100% accuracy level on information contained in its data warehouse?
No, it is too expensive
51 Information Cleansing or Scrubbing
Contact information in an operational system
Taking a look at customer information highlights why information cleansing and scrubbing is necessary
Customer information exists in several operational systems
In each system, all details of this customer information could change, from the customer ID to contact information
Determining which contact information is accurate and correct for this customer depends on the business process that is being executed
52Information Cleansing or Scrubbing Standardizing Customer name from Operational SystemsAsk your students if they have ever received more than one piece of identical mail, such as a flyer, catalog, or applicationIf so, ask them why this might have occurredCould it have occurred because their name was in many different disparate systems?What is the cost to the business of sending multiple identical marketing materials to the same customers?ExpenseRisk of alienating customers
53Information Cleansing or Scrubbing Information cleansing allows an organization to fix these types of inconsistencies and cleans the data in the data warehouse
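A small Python sketch can show what standardizing a customer name across operational systems might look like. The records, the formatting rules, and the choice to match on name alone are all simplifying assumptions; real cleansing tools use much richer matching logic.

```python
import re

# Hypothetical records for one customer as held by three operational systems.
RAW = [
    {"system": "sales", "name": "  JOHN  Q. public", "phone": "(303) 555-0100"},
    {"system": "billing", "name": "John Q Public", "phone": "303.555.0100"},
    {"system": "service", "name": "john q. public ", "phone": None},
]


def standardize_name(name):
    """Drop punctuation, collapse whitespace, and use consistent capitalization."""
    no_punct = re.sub(r"[^\w\s]", "", name)
    return " ".join(no_punct.split()).title()


def standardize_phone(phone):
    """Keep digits only; treat anything that is not 10 digits as missing."""
    if phone is None:
        return None
    digits = re.sub(r"\D", "", phone)
    return digits if len(digits) == 10 else None


def cleanse(records):
    """Merge records that share a standardized name into one clean record."""
    clean = {}
    for rec in records:
        name = standardize_name(rec["name"])
        merged = clean.setdefault(name, {"name": name, "phone": None, "systems": []})
        merged["systems"].append(rec["system"])
        # Keep the first usable phone number found for this customer.
        merged["phone"] = merged["phone"] or standardize_phone(rec["phone"])
    return list(clean.values())


customers = cleanse(RAW)
```

After cleansing, the three inconsistent entries collapse into one customer, so only one catalog would be mailed.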
54 Information Cleansing or Scrubbing
Accurate and complete information
Why do you think most businesses cannot achieve 100% accurate and complete information?
If they had to choose a percentage for acceptable information, what would it be and why?
Some companies are willing to go as low as 20% complete just to find business intelligence
Few organizations will go below 50% accurate – the information is useless if it is not accurate
Achieving perfect information is almost impossible
The more complete and accurate an organization wants to get its information, the more it costs
The tradeoff in pursuing perfect information lies in accuracy versus completeness
Accurate information means it is correct, while complete information means there are no blanks
Most organizations determine a percentage high enough to make good decisions at a reasonable cost, such as 85% accurate and 65% complete
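To make the two percentages concrete, here is one possible way to measure them in Python. The definitions used are assumptions for this sketch: completeness is the share of field values that are not blank, and accuracy is the share of non-blank values that match a verified reference. The sample records are made up.

```python
def completeness(records, fields):
    """Percentage of field values that are not blank."""
    total = len(records) * len(fields)
    filled = sum(1 for r in records for f in fields if r.get(f) not in (None, ""))
    return 100.0 * filled / total


def accuracy(records, verified, fields, key="id"):
    """Percentage of non-blank values that match a verified reference."""
    checked = correct = 0
    for r in records:
        truth = verified[r[key]]
        for f in fields:
            if r.get(f) in (None, ""):
                continue  # blanks count against completeness, not accuracy
            checked += 1
            correct += r[f] == truth[f]
    return 100.0 * correct / checked


FIELDS = ["city", "phone"]
RECORDS = [
    {"id": 1, "city": "Denver", "phone": "555-0100"},
    {"id": 2, "city": "Boulder", "phone": ""},        # blank phone
    {"id": 3, "city": "Dever", "phone": "555-0102"},  # misspelled city
    {"id": 4, "city": None, "phone": "555-0103"},     # blank city
]
VERIFIED = {
    1: {"city": "Denver", "phone": "555-0100"},
    2: {"city": "Boulder", "phone": "555-0101"},
    3: {"city": "Denver", "phone": "555-0102"},
    4: {"city": "Aurora", "phone": "555-0103"},
}

pct_complete = completeness(RECORDS, FIELDS)
pct_accurate = accuracy(RECORDS, VERIFIED, FIELDS)
```

Verifying every value against a trusted reference is exactly the step that becomes expensive at scale, which is why organizations settle on target percentages.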
55 BUSINESS INTELLIGENCE
Business intelligence – information that people use to support their decision-making efforts
Principal BI enablers include:
Technology
People
Culture
Technology
Even the smallest company with BI software can do sophisticated analyses today that were unavailable to the largest organizations a generation ago. The largest companies today can create enterprisewide BI systems that compute and monitor metrics on virtually every variable important for managing the company. How is this possible? The answer is technology, the most significant enabler of business intelligence.
People
Understanding the role of people in BI allows organizations to systematically create insight and turn these insights into actions. Organizations can improve their decision making by having the right people making the decisions. This usually means a manager who is in the field and close to the customer rather than an analyst rich in data but poor in experience. In recent years “business intelligence for the masses” has been an important trend, and many organizations have made great strides in providing sophisticated yet simple analytical tools and information to a much larger user population than previously possible.
Culture
A key responsibility of executives is to shape and manage corporate culture. The extent to which the BI attitude flourishes in an organization depends in large part on the organization’s culture. Perhaps the most important step an organization can take to encourage BI is to measure the performance of the organization against a set of key indicators. The actions of publishing what the organization thinks are the most important indicators, measuring these indicators, and analyzing the results to guide improvement display a strong commitment to BI throughout the organization.
56DATA MININGData-mining software includes many forms of AI such as neural networks and expert systemsData-mining tools apply algorithms to information sets to uncover inherent trends and patterns in the informationAnalysts use this information to develop new business strategies and business solutionsAsk your students to identify an organization that would “not” benefit from investing in data warehousing and data-mining toolsAns: None
57DATA MINING Common forms of data-mining analysis capabilities include: Cluster analysisAssociation detectionStatistical analysisCan you explain the difference between cluster analysis, association detection, and statistical analysis?Cluster analysis - a technique used to divide an information set into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possibleAssociation detection – reveals the degree to which variables are related and the nature and frequency of these relationships in the informationStatistical analysis – performs such functions as information correlations, distributions, calculations, and variance analysisCluster analysis, association detection, and statistical analysis are covered in detail over the next few slides
58Cluster AnalysisCluster analysis – a technique used to divide an information set into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possibleCRM systems depend on cluster analysis to segment customer information and identify behavioral traitsSome examples of cluster analysis include:Consumer goods by content, brand loyalty or similarityProduct market typology for tailoring sales strategiesRetail store layouts and sales performancesCorporate decision strategies using social preferencesControl, communication, and distribution of organizationsIndustry processes, products, and materialsDesign of assembly line control functionsCharacter recognition logic in OCR readersData base relationships in management information systems
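One common clustering technique is k-means, shown here as a bare-bones Python sketch. The slide does not name a specific algorithm, so k-means is a swapped-in example; the customer data (visits per month, average spend) is made up, and the starting centroids are chosen naively.

```python
from math import dist


def k_means(points, k, iterations=10):
    """Group points into k clusters by repeatedly reassigning to the nearest centroid."""
    centroids = list(points[:k])  # naive deterministic start: the first k points
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids, groups


# Made-up customers: (visits per month, average spend in dollars).
CUSTOMERS = [(1, 20), (2, 22), (1, 18), (9, 95), (10, 100), (8, 90)]
centroids, segments = k_means(CUSTOMERS, k=2)
```

On this data the customers separate into an occasional, low-spend segment and a frequent, high-spend segment, which is the kind of segmentation a CRM system might use.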
59Association Detection Association detection – reveals the degree to which variables are related and the nature and frequency of these relationships in the informationMarket basket analysis – analyzes such items as Web sites and checkout scanner information to detect customers’ buying behavior and predict future behavior by identifying affinities among customers’ choices of products and servicesMaytag uses association detection to ensure that each generation of appliances is better than the previous generationMaytag’s warranty analysis tool automatically detects potential issues, provides quick and easy access to reports, and performs multidimensional analysis on all warranty information
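Market basket analysis is often introduced through two simple measures, support and confidence. The Python sketch below computes both on a handful of made-up baskets; the item names and helper functions are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations

# Made-up checkout baskets.
BASKETS = [
    {"chips", "salsa", "soda"},
    {"chips", "salsa"},
    {"chips", "soda"},
    {"bread", "butter"},
    {"chips", "salsa", "bread"},
]


def pair_support(baskets):
    """Fraction of all baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return {pair: n / len(baskets) for pair, n in counts.items()}


def confidence(baskets, if_item, then_item):
    """Of the baskets containing `if_item`, the fraction that also contain `then_item`."""
    with_if = [b for b in baskets if if_item in b]
    return sum(then_item in b for b in with_if) / len(with_if)


support = pair_support(BASKETS)
chips_then_salsa = confidence(BASKETS, "chips", "salsa")
salsa_then_chips = confidence(BASKETS, "salsa", "chips")
```

Support captures how frequently two items appear together, while confidence captures how strongly one purchase suggests the other; note that confidence is not symmetric.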
60Statistical AnalysisStatistical analysis – performs such functions as information correlations, distributions, calculations, and variance analysisForecast – predictions made on the basis of time-series informationTime-series information – time-stamped information collected at a particular frequencyKraft uses statistical analysis to assure consistent flavor, color, aroma, texture, and appearance for all of its lines of foodsKraft evaluates every manufacturing procedure, from recipe instructions to cookie dough shapes and sizes to ensure that the billions of Kraft products that reach consumers each year taste great (and the same) with every biteNestle Italiana uses data mining and statistical analysis to determine production forecasts for seasonal confectionery productsThe company’s data-mining solution gathers, organizes, and analyzes massive volumes of information to produce powerful models that identify trends and predict confectionery sales
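A forecast built from time-series information can be as simple as a moving average. The sketch below is one assumed approach, not the method Kraft or Nestle Italiana uses; the monthly sales figures are made up.

```python
from statistics import mean, pvariance

# Made-up time-series information: (month, units sold), collected monthly.
SALES = [("2024-01", 100), ("2024-02", 110), ("2024-03", 120), ("2024-04", 130)]


def moving_average_forecast(series, window=3):
    """Forecast the next period as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("not enough observations for the chosen window")
    recent = [value for _, value in series[-window:]]
    return sum(recent) / window


values = [value for _, value in SALES]
average = mean(values)        # central tendency of the whole series
spread = pvariance(values)    # population variance of the whole series
next_month = moving_average_forecast(SALES)
```

A moving average smooths out noise but lags behind a steady trend, so it is best treated as a baseline to compare richer forecasting models against.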
61OPENING CASE QUESTIONS Google How could Google use a data warehouse to improve its business operations?Why would Google need to scrub and cleanse the information in its data warehouse?Identify a data mart that Google’s marketing and sales department might use to track and analyze its AdWords revenue6. How could Google use a data warehouse to improve its business operations?Google could use a data warehouse to contain not only internal organization information, but also external information such as market trends, competitor information, and industry trends. Google could then analyze its business across markets, among its competitors, and throughout different industries.7. Why would Google need to scrub and cleanse the information in its data warehouse?Google must maintain high-quality information in its data warehouse. Information cleansing and scrubbing is a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information. Without high-quality information Google will be unable to make good business decisions.8. Identify a data mart that Google’s marketing and sales department might use to track and analyze its AdWords revenue.One potential data mart might include information broken down by industry (products, telecommunications, health care, energy, travel, human services) and tracked against revenue by companies. This would tell Google which industries are using AdWords and which industries are untapped. It would also tell Google which customers in each industry are taking advantage of AdWords and perhaps would benefit from a specialized marketing plan, and which customers are not yet taking advantage of AdWords and might be interested in learning about the product.
62CLOSING CASE ONE Fishing for Quality Explain the importance of high-quality information for the Alaska Department of Fish and GameReview the five common characteristics of high quality information and rank them in order of importance for the Alaska Department of Fish and GameHow could data warehouses and data marts be used to help the Alaska Department of Fish and Game improve the efficiency and effectiveness of its operations?1. Explain the importance of high-quality information for the Alaska Department of Fish and Game.If the department receives low quality information from fish counts then either too many fish escape or too many are caught. Allowing too many salmon to swim upstream could deprive fishermen of their livelihoods. Allowing too many to be caught before they swim upstream to spawn could diminish fish populations- yielding devastating effects for years to come.2. Review the five common characteristics of high quality information and rank them in order of importance for the Alaska Department of Fish and Game.Student answers to this question will vary depending on their personal views and experiences with technology. The important part of the question is understanding the student’s justifications for their order. Potential order of importance:Timeliness – Without timely information the department can not make fishing decisionsAccuracy – inaccurate information will lead to the department making the wrong decisionsCompleteness – incomplete information will make it harder for the department to make decisions regarding the amount of fish. Incomplete information probably occurs frequently since part of the process, fish escapement, is performed manuallyConsistency – information inconsistency probably occurs since the fish escapement is performed manuallyUniqueness – a fish ticket could be mistakenly entered twice3. 
How could data warehouses and data marts be used to help the Alaska Department of Fish and Game improve the efficiency and effectiveness of its operations?A data warehouse is a logical collection of information - gathered from many different operational databases - that supports business analysis activities and decision-making tasks. The primary purpose of a data warehouse is to aggregate information throughout an organization into a single repository in such a way that employees can make decisions and undertake business analysis activities. Therefore, while databases store the details of all transactions (for instance, the sale of a product) and events (hiring a new employee), data warehouses store that same information but in an aggregated form more suited to supporting decision making tasks. Aggregation, in this instance, can include totals, counts, averages, and the like. The Alaska department of Fish and Game could use a data warehouse to track all of its information, including external information such as weather, environmental issues, and fish markets. This would allow the department to make informed decisions with all possible variables.The data warehouse sends subsets of the information to data marts. A data mart contains a subset of data warehouse information. To distinguish between data warehouses and data marts, think of data warehouses as having a more organizational focus and data marts having focused information subsets particular to the needs of a given business unit such as finance or production and operations. The department could use data marts to monitor small subsets of information.
63CLOSING CASE ONE Fishing for Quality What two data marts might the Alaska Department of Fish and Game want to build to help it analyze its operational performance?Do the managers at the Alaska Department of Fish and Game actually have all of the information they require to make an accurate decision? Explain the statement “it is never possible to have all of the information required to make the best decision possible”4. What two data marts might the Alaska Department of Fish and Game want to build to help it analyze its operational performance?The department might have a data mart for:Daily CatchesFishery InformationSalmon SeasonFish SpeciesCatch InformationMarketing InformationWeatherEnvironmentFishermenMarket5. Do the managers at the Alaska Department of Fish and Game actually have all of the information they require to make an accurate decision? Explain the statement “it is never possible to have all of the information required to make the best decision possible.”No, the managers at the Alaska Department of Fish and Game will never have every single piece of information. It would be almost impossible to count every single fish. However, they have enough to make an accurate estimate as to the number of fish. If you wait to have every single piece of information you would probably never make a decision. We typically receive enough information to make an accurate decision. Of course, the more information you have, the better the decision you can make, but if you wait to get every piece of information you will take too long to make the decision.
64CLOSING CASE TWO Mining the Data Warehouse Explain how Ben & Jerry’s is using business intelligence tools to remain successful and competitive in a saturated marketIdentify why information cleansing and scrubbing is critical to California Pizza Kitchen’s business intelligence tool’s success1. Explain how Ben & Jerry’s is using business intelligence tools to remain successful and competitive in a saturated market.Ben & jerry’s tracks the ingredients and life of each pint in a data warehouse. If a consumer calls in with a complaint, the consumer affairs staff matches up the pint with which supplier’s mile, eggs, or cherries, etc. did not meet the organization’s near-obsession with quality.2. Identify why information cleansing and scrubbing is critical to California Pizza Kitchen’s business intelligence tool’s success.Financial statements must be as accurate and complete as possible. There have been too many instances in the past where shoddy financial statements have lead to financial crisis such as Enron and WorldCom. It does not matter how good or how many BI tools California Pizza Kitchen uses; if the core data is dirty the results will be inaccurate.
65 CLOSING CASE TWO – Mining the Data Warehouse
3. Illustrate why 100 percent accurate and complete information is impossible for Noodles & Company to obtain.
Noodles & Company will never have 100 percent accurate and complete information. Perfect information is pricey, and achieving it is almost impossible. The more complete and accurate an organization wants its information to be, the more it costs. The tradeoff with perfect information lies in accuracy versus completeness. Accurate information means it is correct, while complete information means there are no blanks. Most organizations determine a percentage high enough to make good decisions at a reasonable cost, such as 85% accurate and 65% complete.
4. Describe how each of the companies above is using BI from their data warehouse to gain a competitive advantage.
Ben & Jerry’s is using BI to improve quality. Customers know that a pint of Ben & Jerry’s ice cream is of the highest quality.
California Pizza Kitchen and Noodles & Company are using BI to improve financial analysis capabilities. Both companies can now receive more accurate and complete financial views of their businesses.
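The accuracy-versus-completeness distinction in question 3 can be shown with a small Python sketch. Everything here is an assumption made for illustration: the store records, the "verified" reference values, and the two helper functions are invented, and a real organization would define these measures in its own way. Completeness is taken as the share of fields that are not blank; accuracy is taken as the share of filled-in fields that match a verified value.

```python
def completeness(records, fields):
    """Share of fields that are filled in (no blanks)."""
    total = len(records) * len(fields)
    filled = sum(1 for r in records for f in fields if r.get(f) not in (None, ""))
    return filled / total if total else 0.0


def accuracy(records, verified, fields):
    """Share of filled-in fields that match the verified reference values."""
    checked = correct = 0
    for rec, ref in zip(records, verified):
        for f in fields:
            if rec.get(f) in (None, ""):
                continue  # blanks count against completeness, not accuracy
            checked += 1
            if rec[f] == ref[f]:
                correct += 1
    return correct / checked if checked else 0.0


FIELDS = ["name", "city", "sales"]

# Invented warehouse rows: one blank city, one misspelled city, one blank sales figure.
stored = [
    {"name": "Store 1", "city": "Denver", "sales": 1200},
    {"name": "Store 2", "city": "", "sales": 950},
    {"name": "Store 3", "city": "Bolder", "sales": None},
    {"name": "Store 4", "city": "Austin", "sales": 800},
]

# Invented "verified" values the rows are checked against.
verified = [
    {"name": "Store 1", "city": "Denver", "sales": 1200},
    {"name": "Store 2", "city": "Boulder", "sales": 950},
    {"name": "Store 3", "city": "Boulder", "sales": 1100},
    {"name": "Store 4", "city": "Austin", "sales": 800},
]

complete_pct = completeness(stored, FIELDS)
accurate_pct = accuracy(stored, verified, FIELDS)
print(f"complete: {complete_pct:.0%}, accurate: {accurate_pct:.0%}")
```

The two numbers move independently: filling in the blanks raises completeness but can lower accuracy if the new values are guesses, which is the tradeoff the answer describes.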
66 CLOSING CASE THREE – Harrah’s
1. Identify the effects low-quality information might have on Harrah’s service-oriented business strategy.
Using the wrong information can lead to making the wrong decision. Making the wrong decision can cost time, money, and even reputations. Business decisions are only as good as the information used to make them. Low-quality information leads to low-quality business decisions. High-quality information can significantly improve the chances of making a good business decision and directly affect an organization’s bottom line. Harrah’s must use high-quality information whenever it is making business decisions, especially decisions that affect its service-oriented business strategy.
2. How does Harrah’s use database technologies to implement its service-oriented strategy?
Harrah’s implements a service-oriented strategy called Total Rewards. Total Rewards allows Harrah’s to give every single customer the appropriate amount of personal attention, whether it’s leaving sweets in the hotel room or offering free meals. Total Rewards works by providing each customer with an account and a corresponding card that the player swipes each time he or she plays a casino game. The program collects information, via a database, on the amount of time the customers gamble, their total winnings and losses, and their betting strategies. Customers earn points based on the amount of time they spend gambling, which they can then exchange for comps such as free dinners, hotel rooms, tickets to shows, and even cash.
3. Harrah’s was one of the first casino companies to find value in offering rewards to customers who visit multiple Harrah’s locations. Describe the effects on the company if it did not build any integrations among the databases located at each of its casinos.
Without database integration among its hotels and casinos, Harrah’s would be unable to determine a customer’s true value to the company. For example, a customer who spends $500,000 at one casino might be treated like royalty. The same customer could visit another Harrah’s location, but since the information is not integrated, the new location would have no idea that it had a high-rolling customer on the premises and might not treat the customer accordingly.
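The integration point in question 3 can be illustrated with a short Python sketch. The location names, customer IDs, and spending amounts are invented for illustration, and the in-memory dictionaries merely stand in for separate casino databases; this is not a description of Harrah’s actual systems. The sketch combines each location’s records so that a customer’s total value across properties becomes visible.

```python
from collections import defaultdict

# Invented per-location records: (customer ID, amount spent at that location).
location_databases = {
    "Las Vegas": [("C-1001", 500_000), ("C-1002", 1_200)],
    "Atlantic City": [("C-1001", 3_000), ("C-1003", 45_000)],
    "New Orleans": [("C-1002", 800), ("C-1003", 60_000)],
}


def integrated_customer_value(databases):
    """Total each customer's spending across every location."""
    totals = defaultdict(int)
    for records in databases.values():
        for customer_id, spend in records:
            totals[customer_id] += spend
    return dict(totals)


# Without integration, Atlantic City sees only its own records.
local_view = dict(location_databases["Atlantic City"])

# With integration, every location can see the customer's true value.
company_view = integrated_customer_value(location_databases)

print("Atlantic City alone:", local_view["C-1001"])
print("All locations:", company_view["C-1001"])
```

In the invented data, customer C-1001 looks like a modest spender from Atlantic City’s local records but is the company’s highest-value customer once the locations are combined.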
67 CLOSING CASE THREE – Harrah’s
4. Estimate the potential impact to Harrah’s business if there is a security breach in its customer information.
Some customers have concerns regarding Harrah’s information collection strategy since they want to keep their gambling information private. If there were a security violation and sensitive customer information were compromised, Harrah’s would risk losing its customers’ trust and their business.
5. Identify three different types of data marts Harrah’s might want to build to help it analyze its operational performance.
Answers to this question will vary. Potential answers include (1) customers’ spending habits across properties, (2) repeat customer spending habits at a single location, and (3) dealer sales at a location and across locations.
68 CLOSING CASE THREE – Harrah’s
6. What might occur if Harrah’s fails to clean or scrub its information before loading it into its data warehouse?
Harrah’s must maintain high-quality information in its data warehouse. Information cleansing and scrubbing is a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information. Without high-quality information, Harrah’s will be unable to make good business decisions and operate its service-oriented strategy. Potential business effects resulting from low-quality information include:
Inability to accurately track customers
Difficulty identifying valuable customers
Inability to identify selling opportunities
Marketing to nonexistent customers
Difficulty tracking revenue due to inaccurate invoices
Inability to build strong customer relationships, which increases buyer power
7. Describe cluster analysis, association detection, and statistical analysis and explain how Harrah’s could use each one to gain insights into its business.
Cluster analysis is a technique used to divide an information set into mutually exclusive groups such that the members of each group are as close together as possible and the different groups are as far apart as possible. Cluster analysis is frequently used to segment customer information for customer relationship management systems to help organizations identify customers with similar behavioral traits, such as clusters of best customers or one-time customers. Cluster analysis can also uncover naturally occurring patterns in information.
Association detection reveals the degree to which variables are related and the nature and frequency of these relationships in the information.
Statistical analysis performs such functions as information correlations, distributions, calculations, and variance analysis, just to name a few.
Harrah’s can use all of the above to uncover customer patterns and make sure it is taking full advantage of customer relationship management strategies. It could also use the tools to uncover patterns in food, drink, and room availability to optimize its supply chain.
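For instructors who want a hands-on illustration of the three techniques in question 7, the Python sketch below shows a toy version of each. All data is invented, and the specific methods are assumptions chosen for simplicity: a one-dimensional k-means stands in for cluster analysis, a simple rule-confidence calculation stands in for association detection, and mean and variance stand in for statistical analysis. None of this is presented as what Harrah’s actually runs.

```python
from statistics import mean, pvariance

# Invented data for illustration only.
daily_spend = [20, 35, 50, 900, 1100, 1300]  # dollars per customer
visits = [                                   # amenities used on each visit
    {"slots", "buffet"},
    {"slots", "buffet", "show"},
    {"poker", "show"},
    {"slots", "buffet"},
    {"poker"},
]


def kmeans_1d(values, centers, rounds=10):
    """Cluster analysis: group values around the nearest center (1-D k-means)."""
    groups = [[] for _ in centers]
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the average of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


def confidence(baskets, antecedent, consequent):
    """Association detection: of visits with `antecedent`, the share that also had `consequent`."""
    with_a = [b for b in baskets if antecedent in b]
    if not with_a:
        return 0.0
    return sum(1 for b in with_a if consequent in b) / len(with_a)


# Cluster analysis: separate casual customers from high rollers.
centers, groups = kmeans_1d(daily_spend, centers=[0, 1000])

# Association detection: how often does one amenity go with another?
slots_to_buffet = confidence(visits, "slots", "buffet")
poker_to_show = confidence(visits, "poker", "show")

# Statistical analysis: basic summary figures for spending.
average_spend = mean(daily_spend)
spend_variance = pvariance(daily_spend)

print("clusters:", groups)
print("slots -> buffet:", slots_to_buffet, "| poker -> show:", poker_to_show)
print("average spend:", average_spend, "| variance:", round(spend_variance, 1))
```

The clustering step splits the invented customers into a low-spend group and a high-spend group, the association step shows that every invented slots visit also included the buffet, and the summary statistics show how misleading a single average can be when two very different customer groups are mixed together.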