CHAPTER 7 Databases and Data Warehouses

Opening Case: It Takes a Village to Write an Encyclopedia

Chapter Seven Overview
SECTION 7.1 – DATABASES
  Organizational Data and Information
  Storing Transactional Information
  Relational Database Fundamentals
  Relational Database Advantages
  Database Management Systems
  Integrating Data Among Multiple Databases
SECTION 7.2 – DATA WAREHOUSING
  History of Data Warehousing
  Data Warehouse Fundamentals
  Business Intelligence
  Operational, Tactical, and Strategic BI
  Data Mining
  Business Benefits of BI
Chapter 7 introduces data, information quality, databases, data mining, and data warehouses in detail, and highlights why and how information adds value to an organization.

LEARNING OUTCOMES
Understand the defining value characteristics of both transactional data and analytical information, and the need for organizations to have data and information that are timely and of high quality.
Describe relational database fundamentals and advantages.
Understand how users interact with a database management system, the advantage of data-driven Web sites, and the primary methods of integrating data and information across multiple databases in organizations.
Describe data warehouse fundamentals and advantages.
Understand business intelligence, data mining, and the relationship between business intelligence and data warehousing.

SECTION 7.1 DATABASES
CLASSROOM OPENER
GREAT BUSINESS DECISIONS – Julius Reuter Uses Carrier Pigeons to Transfer Information
In 1850, the idea that sending and receiving information could add business value was born. Julius Reuter began a business that bridged the gap between Belgium and Germany. Reuter founded one of the first information management companies on the premise that customers would be prepared to pay for information that was timely and accurate. Reuter used carrier pigeons to forward stock market and commodity prices from Brussels to Germany. Customers quickly realized that with the early receipt of vital information they could make fortunes. Those who had money at stake in the stock market were prepared to pay handsomely for early information from a reputable source, even if it was a pigeon. Reuter’s business grew from 45 pigeons to over 200 pigeons. Eventually the telegraph bridged the gap between Brussels and Germany, and Reuter’s brilliantly conceived temporary monopoly was closed.

ORGANIZATIONAL DATA AND INFORMATION
Data are raw facts that describe the characteristics of an event.
Information is data converted into a meaningful and useful context.
Granularity refers to the extent of detail within the information (fine and detailed, or coarse and abstract).
Have you ever had to correlate two different formats, levels, or granularities of information? How did you correlate the information?
Taking a hard look at organizational information can yield exciting and unexpected results, such as potential new markets, new ways of reaching customers, and even new ways of doing business.

Information granularity – refers to the extent of detail within the information (fine and detailed or coarse and abstract)
  Levels
  Formats
  Granularities
This is a good place to discuss the Samsung Electronics and Staples examples from the text.
Students should understand that information varies, and that different levels, formats, and granularities of information can be found throughout an organization.
CLASSROOM EXERCISE Organizing Information
Break your students into groups and assign each group a different information type from Figure 7.2.
Ask the students to find examples of the different kinds of information they might encounter in an organization for their information type.
  For example, information formats for a spreadsheet might include a profit and loss statement or a market analysis.
Ask your students to determine potential issues that might arise from having different types of information.
Ask your students what happens if the information does not correlate.
  For example, the customer letters sent out do not match the customers and customer addresses in the database.
  For example, the total on the customer’s bill does not add up to the individual line items.

The Value of Transactional Data and Analytical Information
Transactional data encompasses all of the data contained within a single business process or unit of work, and its primary purpose is to support the performing of daily operational tasks. Analytical information encompasses all organizational information, and its primary purpose is to support the performing of higher-level analysis tasks.


The Value of Timely Data and Information
Real-time is immediate
  Real-time data
  Real-time information
  Real-time system

The Value of Quality Data and Information

Low-quality information example

The four primary sources of low-quality information include:
  Online customers intentionally enter inaccurate information to protect their privacy
  Data or information from different systems have different entry standards and formats
  Call centre operators enter abbreviated or erroneous information by accident or to save time
  Third party and external information contains inconsistencies, inaccuracies, and errors
Addressing the above sources of information inaccuracies will significantly improve the quality of organizational information.
Determine a few additional sources of low-quality information.
  A customer service representative could accidentally transpose a number in an address or misspell a last name.

Understanding the Costs of Poor Information
Potential business effects resulting from low-quality information include:
  Inability to accurately track customers
  Difficulty identifying valuable customers
  Inability to identify selling opportunities
  Marketing to nonexistent customers
  Difficulty tracking revenue due to inaccurate invoices
  Inability to build strong customer relationships
Can you list any additional business effects resulting from poor information? (Focus on organizational strategies such as SCM, CRM, and ERP.)
  Poor information could cause the SCM system to order too much inventory from a supplier based on inaccurate orders.
  Poor information could cause a CRM system to send an expensive promotional item (such as a fruit basket) to the wrong address of one of its best customers.
What occurs when you have the inability to build strong customer relationships?
  Increased buyer power
Gartner podcasts are excellent course resources; there is currently a good podcast on the cost of poor data to an organization.

Understanding the Benefits of Good Information
High-quality information can significantly improve the chances of making a good decision.
Good decisions can directly impact an organization's bottom line.
CLASSROOM EXERCISE Understanding Information’s Quality
Break your students into groups and ask them to compile a list of all of the issues found in the table on page 5 of the Chapter 7 Instructor’s Manual (cut and paste onto a slide or display on the projector).
Ask your students to also list why most low-quality information errors occur and what an organization can do to help implement high-quality information.

RELATIONAL DATABASE FUNDAMENTALS
Information is everywhere in an organization.
Information is stored in databases.
Database – maintains information about various types of objects (inventory), events (transactions), people (employees), and places (warehouses)
How many of you are familiar with databases? What kinds of databases can be found around your school?
  Student registration
  Course evaluation
  Payroll
  Parking services
  The iTunes software on your iPod
Explain to your students that almost every business decision is based on information. The information required to make these decisions is typically stored in databases.

RELATIONAL DATABASE FUNDAMENTALS
Database models include:
  Hierarchical database model – information is organized into a tree-like structure (using parent/child relationships) in such a way that it cannot have too many relationships
  Network database model – a flexible way of representing objects and their relationships
  Relational database model – stores information in the form of logically related two-dimensional tables
Most organizations use the relational database model.
This text focuses on the relational database model.
Discuss the Coca-Cola Bottling Company of Egypt example in the text (Figure 7.5).

Entities, Entity Classes, and Attributes
Entity – a person, place, thing, transaction, or event about which information is stored
  The rows in each table contain the entities
  In Figure 7.5, CUSTOMER includes the Dave’s Sub Shop and Pizza Palace entities
Entity class (table) – a collection of similar entities
  In Figure 7.5, the entity classes are CUSTOMER, ORDER, ORDER LINE, DISTRIBUTOR, and PRODUCT
Review Figure 7.5.
What kinds of additional entity classes might be found in this database?
  INVENTORY, MARKETING CAMPAIGN, SALES QUOTE, INVOICE, PAYMENT
What kinds of additional entities might be found in the CUSTOMER table?
  Could include any additional customer: Joe’s Mexican Restaurant, Fitness Forever, and Summer’s Flower Shop (these are all fictitious)

Attributes (fields, columns) – characteristics or properties of an entity class
  The columns in each table contain the attributes
  In Figure 7.5, attributes for CUSTOMER include:
    Customer ID
    Customer Name
    Contact Name
    Phone
Review Figure 7.5.
What kinds of additional attributes might be found in the CUSTOMER table for Dave’s Sub Shop?
  Could include any additional customer information:
    Address
    Fax
    Cell phone

Potential relational database for Coca-Cola
Walk your students through the relational database model in Figure 7.5. To ensure your students are grasping the concepts, ask them to answer the following:
How many orders have been placed for T’s Fun Zone?
  Ans: 1 (order 34563)
How many orders have been placed for Pizza Palace?
  Ans: None
How many items are included in Dave’s Sub Shop’s two orders?
  Ans: One order has 3 items and the other has 1 item, for a total of 4 items in both orders.
Who is responsible for distributing Dave’s Sub Shop’s orders?
  Ans: Manitoba Shipping
Which products are included in Order 34562?
  Ans: 300 Vanilla Coke

Keys and Relationships
Primary keys and foreign keys identify the various entity classes (tables) in the database.
  Primary key – a field (or group of fields) that uniquely identifies a given entity in a table
  Foreign key – a primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables
Review Figure 7.5.
Explain to your students that the logic that correlates the tables is implemented through the primary keys and the foreign keys that refer to them. For example:
  Manitoba Shipping in the DISTRIBUTOR table has a primary key called Distributor ID, with the value MB8001
  Notice that Manitoba Shipping (Distributor ID MB8001) is responsible for delivering orders, including order 34562
  Therefore, Distributor ID in the ORDER table creates a logical relationship (who shipped what order) between ORDER and DISTRIBUTOR
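The relationship between ORDER and DISTRIBUTOR can be tried out with a small in-memory database. The following is only a sketch using Python's built-in sqlite3 module: the table layouts, the numeric customer ID, and the assumption that order 34562 belongs to Dave's Sub Shop are illustrative and are not taken from Figure 7.5. Only the distributor ID MB8001, the distributor name, and the order number come from the notes above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE distributor (
    distributor_id TEXT PRIMARY KEY,   -- primary key of DISTRIBUTOR
    name           TEXT NOT NULL
);
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,   -- hypothetical ID value
    name        TEXT NOT NULL
);
CREATE TABLE "order" (
    order_id       INTEGER PRIMARY KEY,
    -- foreign keys: primary keys of other tables stored as attributes here
    customer_id    INTEGER NOT NULL REFERENCES customer(customer_id),
    distributor_id TEXT    NOT NULL REFERENCES distributor(distributor_id)
);
INSERT INTO distributor VALUES ('MB8001', 'Manitoba Shipping');
INSERT INTO customer VALUES (1, 'Dave''s Sub Shop');
-- assumption: order 34562 was placed by customer 1
INSERT INTO "order" VALUES (34562, 1, 'MB8001');
""")


def distributor_for(order_id):
    """Follow the foreign key from ORDER to DISTRIBUTOR (who shipped what order)."""
    row = conn.execute(
        'SELECT d.name FROM "order" AS o '
        "JOIN distributor AS d ON d.distributor_id = o.distributor_id "
        "WHERE o.order_id = ?",
        (order_id,),
    ).fetchone()
    return row[0] if row else None
```

Calling distributor_for(34562) follows the Distributor ID stored in the order row back to the DISTRIBUTOR table, which is the "who shipped what order" relationship described above.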

RELATIONAL DATABASE ADVANTAGES
Database advantages from a business perspective include:
  Increased flexibility
  Increased scalability and performance
  Reduced redundancy
  Increased integrity (quality)
  Increased security
A good way to explain databases is to compare them to spreadsheets. What are the limitations when using a spreadsheet?
  Limited number of rows and columns (older versions of Excel allow 65,536 rows by 256 columns; Excel 2007 and later allow 1,048,576 rows by 16,384 columns)
  Once you need more rows than the limit, you have outgrown your spreadsheet
  Only one user can access the spreadsheet
  Users can view all information in the spreadsheet
  Users can change all information in the spreadsheet
All of the disadvantages associated with a spreadsheet are fixed when using a database.
These advantages are discussed in detail over the next several slides.

Increased Flexibility
A well-designed database should:
  Handle changes quickly and easily
  Provide users with different views
  Have only one physical view
    Physical view – deals with the physical storage of information on a storage device
  Have multiple logical views
    Logical view – focuses on how users logically access information
The separation between logical and physical views is what allows each user to access database information differently.
What would happen if a new database called “RealData” hit the market and allowed only one logical view?
  The “RealData” database simply would never sell. With only one logical view, every person in an entire organization would have the same view.
Define two database views for your school’s student database (one for students, and one for instructors).
What does the student view display when a student accesses the school’s student database?
  Courses enrolled
  Grades
  Tuition
  Credits for graduation
What does the instructor view display when an instructor accesses the school’s student database?
  Courses teaching
  Students in each course
  Payment information
  Vacation time
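One way to show students the difference between a single physical table and several logical views is with SQL views. The sketch below uses Python's built-in sqlite3 module; the enrollment table, its columns, and the sample names are all hypothetical and cover only a small part of the student and instructor views listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One physical table (hypothetical layout and data)
CREATE TABLE enrollment (
    student    TEXT,
    course     TEXT,
    instructor TEXT,
    grade      TEXT
);
INSERT INTO enrollment VALUES
    ('Ana', 'MIS 101', 'Dr. Lee', 'A'),
    ('Ben', 'MIS 101', 'Dr. Lee', 'B'),
    ('Ana', 'ACC 200', 'Dr. Roy', 'B');

-- Two logical views over the same stored rows
CREATE VIEW student_view AS
    SELECT student, course, grade FROM enrollment;
CREATE VIEW instructor_view AS
    SELECT instructor, course, student FROM enrollment;
""")


def courses_for_student(student):
    """Student view: courses enrolled and grades."""
    return conn.execute(
        "SELECT course, grade FROM student_view WHERE student = ? ORDER BY course",
        (student,),
    ).fetchall()


def roster_for_instructor(instructor):
    """Instructor view: students in each course taught."""
    return conn.execute(
        "SELECT course, student FROM instructor_view "
        "WHERE instructor = ? ORDER BY course, student",
        (instructor,),
    ).fetchall()
```

Both functions read the same stored rows; only the logical view differs, which is the point of separating logical views from the physical view.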

Increased Scalability and Performance
A database must scale to meet increased demand, while maintaining acceptable performance levels.
  Scalability – refers to how well a system can adapt to increased demands
  Performance – measures how quickly a system performs a certain process or transaction
What happens to a business if it suddenly experiences 60 percent growth in sales and its IT systems fail under the increased activity?
Remind your students that a big part of developing successful IT systems is being able to anticipate future growth.
CLASSROOM EXERCISE Building an ER Diagram
Break your students into groups and ask them to create an entity relationship diagram similar to the one in Figure 7.5 for a company or product of their choice. If the students are uncomfortable with databases, you should recommend that they stick to a company similar to the TCCBCE, perhaps a snack food producer, mountain bike equipment producer, or even a footwear producer. If your students are more comfortable with databases, ask them to choose a company that would challenge them, such as a fast food restaurant, online book seller, or even a university’s course registration system. The important part of this exercise is for your students to begin to understand how the tables in a database relate. Be sure their ER diagrams include primary keys and foreign keys. Have your students present their ER diagrams to the class and ask the students to find any potential errors with the diagrams.

Reduced Redundancy
Databases reduce information redundancy.
  Redundancy – the duplication of information or storing the same information in multiple places
Inconsistency is one of the primary problems with redundant information.
One of the primary goals of a database is to eliminate information redundancy by recording each piece of information in only one place.
This is a good time to tie the discussion back to the earlier material on low-quality information.
Recall what happens when a single customer is stored twice with different phone numbers, addresses, or order information in a single database.

Increased Integrity (Quality)
Information integrity – measures the quality of information
Integrity constraint – rules that help ensure the quality of information
  Relational integrity constraint – rule that enforces basic and fundamental information-based constraints
  Business-critical integrity constraint – rule that enforces business rules vital to an organization’s success and often requires more insight and knowledge than relational integrity constraints
Can you define two relational integrity constraints for an ordering system?
  Users cannot create an order for a nonexistent customer
  An order cannot be shipped without an address
Can you define two business-critical integrity constraints for an ordering system?
  Product returns are not accepted for fresh product 15 days after purchase
  A discount maximum of 20 percent
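Three of the example constraints above can be demonstrated in a few lines. This is a sketch using Python's built-in sqlite3 module, with hypothetical table and column names. It approximates "an order cannot be shipped without an address" by requiring an address when the order is created, and it models the 20 percent discount maximum as a CHECK constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id         INTEGER PRIMARY KEY,
    -- relational constraint: the customer must already exist
    customer_id      INTEGER NOT NULL REFERENCES customer(customer_id),
    -- relational constraint: an address is required
    ship_address     TEXT NOT NULL,
    -- business-critical constraint: discount maximum of 20 percent
    discount_percent REAL NOT NULL DEFAULT 0
        CHECK (discount_percent BETWEEN 0 AND 20)
);
INSERT INTO customer VALUES (1, 'Pizza Palace');
""")


def try_insert(customer_id, ship_address, discount_percent):
    """Return True if the order is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders (customer_id, ship_address, discount_percent) "
                "VALUES (?, ?, ?)",
                (customer_id, ship_address, discount_percent),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Here the database itself rejects the bad rows, so every application that writes to the orders table gets the same protection without repeating the checks.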

Increased Security
Information is an organizational asset and must be protected.
Databases offer several security features including:
  Password – provides authentication of the user
  Access level – determines who has access to the different types of information
  Access control – determines types of user access, such as read-only access
Why would you want to define access level security?
  Access levels will typically mimic the hierarchical structure of the organization and protect organizational information from being viewed and manipulated by individuals who should not have access to sensitive or confidential information
  Low-level employees typically have the lowest levels of access
  High-level employees typically have access to all types of database information
For example, you would not want analysts viewing all salary information for the entire company. In general:
  Analysts can usually only view their own salary
  Managers have higher access and can view the salaries of all their team members, but cannot view other managers’ salaries
  Directors can view all of their managers’ and analysts’ salaries, but not other directors’ salaries
  The CFO and CEO can view every employee’s salary
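The salary example can be expressed as a simple rule over the reporting hierarchy: a person can view a salary if it is their own or if the other person is somewhere below them in the reporting chain. The sketch below is one possible way to model that rule in Python; the names and the org chart are made up, and a real DBMS would enforce this through its own access-level features rather than application code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Employee:
    name: str
    manager: Optional[str] = None  # direct superior; None at the top


# Hypothetical org chart: CEO > directors > managers > analyst
STAFF = {
    e.name: e
    for e in [
        Employee("CEO"),
        Employee("Director 1", "CEO"),
        Employee("Director 2", "CEO"),
        Employee("Manager 1", "Director 1"),
        Employee("Manager 2", "Director 1"),
        Employee("Analyst 1", "Manager 1"),
    ]
}


def reports_to(employee, boss):
    """True if `boss` appears anywhere above `employee` in the reporting chain."""
    current = STAFF[employee].manager
    while current is not None:
        if current == boss:
            return True
        current = STAFF[current].manager
    return False


def can_view_salary(viewer, target):
    """Own salary, or the salary of anyone in the viewer's reporting line."""
    return viewer == target or reports_to(target, viewer)
```

With this rule an analyst sees only their own salary, a manager sees their team but not other managers, a director sees their managers and analysts but not other directors, and the person at the top sees everyone.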

DATABASE MANAGEMENT SYSTEMS
Database management systems (DBMS) – software through which users and application programs interact with a database
Discuss the two primary forms of user interaction with a database:
  Direct interaction
    The user interacts directly with the DBMS
    The DBMS obtains the information from the database
  Indirect interaction
    The user interacts with an application (e.g., payroll application, manufacturing application, sales application)
    The application interacts with the DBMS
    The DBMS obtains the information from the database

Data-Driven Web Sites
Data-driven Web site – an interactive Web site kept updated and relevant to the needs of its customers

Data-Driven Web Site Advantages

Querying Data-Driven Web Sites

INTEGRATING DATA AMONG MULTIPLE DATABASES
Integration – allows separate systems to communicate directly with each other
  Forward integration – takes information entered into a given system and sends it automatically to all downstream systems and processes
  Backward integration – takes information entered into a given system and sends it automatically to all upstream systems and processes
One of the biggest benefits of integration is that organizations only have to enter information into the systems once and it is automatically sent to all of the other systems throughout the organization. This feature alone creates huge advantages for organizations because it reduces information redundancy and ensures accuracy and completeness.
Without integrations, an organization would have to enter information into every single system that requires the information, from marketing and sales to billing and customer service.
  For example, customer information would have to be manually entered into the marketing, sales, ordering, inventory, billing, and shipping databases. (Each of these systems is separate and would have its own database if the company does not have a complete ERP installed.)
  Entering the same customer information into multiple systems is redundant, and the chance of making a mistake in one of the systems is high.
Integrations offer many advantages, but for the most part, the automated flow of information among separate systems is the biggest benefit.

Forward and backward integration
Identify the arrows along the top of the figure when explaining forward integrations. Basically, all data flows forward along the business process:
  Sales enters the data when it is negotiating the sale (looking for opportunities)
  The data is then passed to the order entry system when the order is actually placed
  The order fulfillment system picks the products from the warehouse, packs the products, labels boxes, etc.
  Once the order is filled and shipped, the customer is billed
What would happen if users could enter order data directly into the billing system?
  The systems would quickly become out of sync. There might be bills for nonexistent orders, or orders that do not have any bills (if someone deleted a bill).
  For this reason organizations typically place a business-critical integrity constraint on integrated systems: with a forward integration the information must be entered in the sales system; you could not enter information directly into the billing system.
Integrations are expensive to build and maintain.
Integrations are difficult to implement.
For these reasons many organizations only build forward integrations and use business-critical integrity constraints to ensure all information is always entered only at the start of the integration (one source of record).
Identify the arrows along the bottom of the figure when explaining backward integrations. Basically, all information flows backward along the business process:
  Billing enters information and this information is passed back to the order fulfillment system
  The order fulfillment system passes the information back to the order entry system
  The order entry system passes the information back to the sales system
Why would an organization want to build both forward and backward integrations?
  This allows users to enter information at any point in the business process and the information is automatically sent upstream and downstream to all other systems.
  For example, if order fulfillment determined that it could not fulfill an order (the product had been discontinued), it could simply enter this information into the database and it would be sent automatically upstream to the sales representative, who could contact the customer, and downstream to billing to remove the item from the bill.
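The flow of information along the business process can be illustrated with a toy model. The sketch below treats each system's database as a Python dictionary and copies a record downstream, upstream, or both; the system names follow the sales, order entry, order fulfillment, and billing sequence described above, while the function name and record layout are hypothetical.

```python
# Systems in business-process order (upstream first, downstream last)
SYSTEMS = ["sales", "order_entry", "order_fulfillment", "billing"]


def propagate(databases, source, record_id, record, forward=True, backward=False):
    """Store a record in `source`, then copy it downstream and/or upstream.

    Returns the list of systems that received the record.
    """
    start = SYSTEMS.index(source)
    targets = [source]
    if forward:
        targets += SYSTEMS[start + 1:]   # all downstream systems
    if backward:
        targets += SYSTEMS[:start]       # all upstream systems
    for name in targets:
        databases[name][record_id] = dict(record)  # each system keeps its own copy
    return targets
```

With forward=True only, a change entered in order fulfillment reaches billing but never reaches sales, which is why organizations that build only forward integrations insist that information be entered at the start of the process. Setting backward=True as well lets the discontinued-product update in the example above reach the sales representative.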

Building a central repository specifically for integrated information
The above figure displays an example of customer data integrated using this method.
Users can create, read, update, and delete information in the main customer repository, and the changes are automatically sent to all of the other databases.
This method does not follow the business process when building the integrations.
Business-critical integrity constraints still need to be built to ensure information is only ever entered into the customer repository; otherwise the information will become out of sync.

OPENING CASE QUESTIONS It Takes a Village to Write an Encyclopedia
Determine if an entry in Wikipedia is an example of transactional information or analytical information.
  Answers will vary, as there are examples of both types of information contained within Wikipedia, although the majority of entries would constitute transactional information as they don’t contain what would be considered “all organizational information”.
What is the impact to Wikipedia if the information contained in its database is of low quality?
  Displaying information that is incomplete or considered incorrect, or failing to present any entry for a query, will cause customers to switch to a different encyclopaedia. If Wikipedia’s search results were of low quality, it would quickly lose business. Since providing encyclopaedia entries is Wikipedia’s primary line of business, it must display high-quality entries.
Review the five common characteristics of high-quality information and rank them in order of importance.
  Student answers to this question will vary depending on their personal views and experiences with technology. The important part of the answer is that students are justifying their order.
  Timeliness – Wikipedia’s information must be timely. If users are receiving old and outdated answers to their queries, they will not use Wikipedia for long.
  Accuracy – Wikipedia’s search results must be accurate.
  Consistency – Wikipedia’s results must be consistent. Users will not trust the system if it provides different results for the same query.
  Completeness – Wikipedia’s search results need to be complete; however, users understand that there could be more than one answer to a search and are not anticipating that Wikipedia find and provide thousands of answers for each query.
  Uniqueness – Wikipedia’s users expect to receive unique answers to their queries, not the same information listed over and over again.
How is Wikipedia resolving the problem of poor information?
  Information is monitored by editors and other users, along with a system that requires people adding information to Wikipedia to register.
Identify the different types of entities that might be stored in Wikipedia’s database.
  Entities may include:
    Subject
    Web Page
    Content
    References
    Editors
    Registered Users
Why is database technology so important to Wikipedia’s business model?
  There may be some variety in the answers to this question, but the answers should focus on the two key abilities that database technology delivers to Wikipedia. The first is the ability for users to make effective searches and decrease their search costs, and the second is the set of advantages that a data-driven Web site gives Wikipedia.

SECTION 7.2 DATA WAREHOUSING

HISTORY OF DATA WAREHOUSING
Data warehouses extend the transformation of data into information.
In the 1990s, executives became less concerned with the day-to-day business operations and more concerned with overall business functions.
The data warehouse provided the ability to support decision making without disrupting the day-to-day operations.
CLASSROOM OPENER
GREAT BUSINESS DECISIONS – Bill Inmon – The Father of the Data Warehouse
Bill Inmon is recognized as the "father of the data warehouse" and co-creator of the "Corporate Information Factory." He has 35 years of experience in database technology management and data warehouse design. He is known globally for his seminars on developing data warehouses and has been a keynote speaker for every major computing association and many industry conferences, seminars, and tradeshows. As an author, Bill has written about a variety of topics on the building, usage, and maintenance of the data warehouse and the Corporate Information Factory. He has written more than 650 articles, many of them published in major computer journals such as Datamation, ComputerWorld, DM Review and Byte Magazine. Bill currently publishes a free weekly newsletter for the Business Intelligence Network, and has been a major contributor since its inception.

DATA WAREHOUSE FUNDAMENTALS
Data warehouse – a logical collection of information, gathered from many different operational databases, that supports business analysis activities and decision-making tasks
The primary purpose of a data warehouse is to aggregate information throughout an organization into a single repository for decision-making purposes.
What is the primary difference between a database and a data warehouse?
  A database stores information for a single application, whereas a data warehouse stores information from multiple databases, or multiple applications, and external information such as industry information.
  This enables cross-functional analysis, industry analysis, market analysis, etc., all from a single repository.
  Data warehouses support only analytical processing (OLAP).

Extraction, transformation, and loading (ETL) – a process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouse
Data mart – contains a subset of data warehouse information
The ETL process gathers data from the internal and external databases and passes it to the data warehouse.
The ETL process also gathers data from the data warehouse and passes it to the data marts.

The data warehouse modeled in the above figure compiles information from internal databases (transactional/operational databases) and external databases through ETL.
It then sends subsets of information to the data marts through the ETL process.
Ask your students to distinguish between a data warehouse and a data mart.
  Ans: A data warehouse has an enterprise-wide organizational focus, while a data mart focuses on a subset of information for a given business unit such as finance.
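The extract, transform, and load steps can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the two source systems, their field names (customer versus cust_name, amount versus amt_cents), and the idea of building a data mart by filtering on the originating system. A production ETL tool would do far more, but the three stages are the same.

```python
def extract(sources):
    """Pull every row out of every source database."""
    for source_name, rows in sources.items():
        for row in rows:
            yield source_name, row


def transform(source_name, row):
    """Map source-specific fields and formats onto one common definition."""
    name = row.get("customer") or row.get("cust_name") or ""
    if "amount" in row:
        amount = float(row["amount"])      # already in dollars
    else:
        amount = row["amt_cents"] / 100    # convert cents to dollars
    return {
        "customer": name.strip().title(),
        "amount": amount,
        "source": source_name,
    }


def run_etl(sources):
    """Extract from all sources, transform each row, load into the warehouse."""
    warehouse = []
    for source_name, row in extract(sources):
        warehouse.append(transform(source_name, row))
    return warehouse


def build_data_mart(warehouse, source_name):
    """A data mart holds a subset of the warehouse (here, one source's rows)."""
    return [record for record in warehouse if record["source"] == source_name]
```

For example, a sales row {"customer": " pizza palace ", "amount": "120.50"} and a billing row {"cust_name": "PIZZA PALACE", "amt_cents": 12050} end up with the same customer name and amount once they pass through transform, which is what "a common set of enterprise definitions" means in practice.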

Multidimensional Analysis
Databases contain information in a series of two-dimensional tables.
In a data warehouse and data mart, information is multidimensional; it contains layers of columns and rows.
  Dimension – a particular attribute of information
Each layer in a data warehouse or data mart represents information according to an additional dimension.
Dimensions could include such things as:
  Products
  Promotions
  Stores
  Category
  Region
  Stock price
  Date
  Time
  Weather
Why is the ability to look at information based on different dimensions critical to a business’s success?
  Ans: The ability to look at information from different dimensions can add tremendous business insight. By slicing and dicing the information, a business can uncover great unexpected insights.

Multidimensional Analysis
Cube – common term for the representation of multidimensional information
Users can slice and dice the cube to drill down into the information.
  Cube A represents store information (the layers), product information (the rows), and promotion information (the columns)
  Cube B represents a slice of information displaying promotion II for all products at all stores
  Cube C represents a slice of information displaying promotion III for product B at store 2
CLASSROOM EXERCISE Analyzing Multiple Dimensions of Information
Jump! is a company that specializes in making sports equipment, primarily basketballs, footballs, and soccer balls. The company currently sells to four primary distributors and buys all of its raw materials and manufacturing materials from a single vendor. Break your students into groups and ask them to develop a single cube of information that would give the company the greatest insight into its business (or business intelligence) given the following choices:
  Product A, B, C, and D
  Distributor X, Y, and Z
  Promotion I, II, and III
  Sales
  Season
  Date/Time
  Salespersons Karen and John
  Vendor Smithson
Remember that students can pick only 3 dimensions of information for the cube, so they need to pick the best 3:
  Product
  Promotion
These give the three most business-critical pieces of information.
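The three cubes described above (Cube A as the full cube, Cube B as one promotion across all products and stores, Cube C as a single cell) can be mimicked with a dictionary keyed by store, product, and promotion. This is only a sketch: the store, product, and promotion labels and the sales figures are invented, and the values are derived from the index positions purely so that results are easy to check by hand.

```python
from itertools import product

STORES = ["Store 1", "Store 2", "Store 3"]            # the layers
PRODUCTS = ["Product A", "Product B", "Product C"]    # the rows
PROMOTIONS = ["Promo I", "Promo II", "Promo III"]     # the columns

# Cube A: one hypothetical sales figure per (store, product, promotion) cell.
# The value 100*store + 10*product + promotion is made up for easy checking.
cube = {
    (s, p, m): 100 * si + 10 * pi + mi
    for (si, s), (pi, p), (mi, m) in product(
        enumerate(STORES, 1), enumerate(PRODUCTS, 1), enumerate(PROMOTIONS, 1)
    )
}


def slice_cube(cube, store=None, product_name=None, promotion=None):
    """Keep only the cells matching every dimension value that was given."""
    return {
        key: value
        for key, value in cube.items()
        if (store is None or key[0] == store)
        and (product_name is None or key[1] == product_name)
        and (promotion is None or key[2] == promotion)
    }


# Cube B: promotion II for all products at all stores
cube_b = slice_cube(cube, promotion="Promo II")
# Cube C: promotion III for product B at store 2
cube_c = slice_cube(cube, store="Store 2", product_name="Product B", promotion="Promo III")
```

Fixing one dimension produces a slice, and fixing all three narrows the cube down to a single cell.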

Information Cleansing or Scrubbing
An organization must maintain high-quality data in the data warehouse.
Information cleansing or scrubbing – a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information
What would happen if the information contained in the data warehouse was only about 70 percent accurate? Would you use this information to make business decisions?
Is it realistic to assume that an organization could get to a 100 percent accuracy level on information contained in its data warehouse?
  No, it is too expensive.

Contact information in an operational system
Taking a look at customer information highlights why information cleansing and scrubbing is necessary.
Customer information exists in several operational systems.
In each system, all details of this customer information could change, from the customer ID to contact information.
Determining which contact information is accurate and correct for this customer depends on the business process that is being executed.

Standardizing Customer Name from Operational Systems
Ask your students if they have ever received more than one piece of identical mail, such as a flyer, catalogue, or application.
If so, ask them why this might have occurred. Could it have occurred because their name was in many different disparate systems?
What is the cost to the business of sending multiple identical marketing materials to the same customers?
  Expense
  Risk of alienating customers

Information cleansing allows an organization to fix these types of inconsistencies and cleans the data in the data warehouse
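One way to picture this kind of cleansing is a small name-standardization routine. The sketch below assumes three hypothetical operational systems that store the same customer's name in different formats; the records and the `standardize_name` rules are illustrative only, and real cleansing tools apply far richer rules.

```python
import re

# Hypothetical records for one customer as held in three operational systems.
records = [
    {"system": "sales", "name": "  Smith,  John "},
    {"system": "billing", "name": "JOHN SMITH"},
    {"system": "service", "name": "john smith"},
]


def standardize_name(raw):
    """Normalise a name to 'First Last' form using a simplified rule set."""
    name = re.sub(r"\s+", " ", raw).strip()  # collapse stray whitespace
    if "," in name:  # convert "Last, First" to "First Last"
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    return name.title()  # crude capitalisation; mishandles names like "McDonald"


# After standardization the three variants collapse to a single customer name.
unique_names = {standardize_name(record["name"]) for record in records}
```

Once the variants collapse to one value, the organization can mail the customer once instead of three times.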

Accurate and complete information
Why do you think most businesses cannot achieve 100% accurate and complete information? If they had to choose a percentage for acceptable information, what would it be and why? Some companies are willing to go as low as 20% complete just to find business intelligence. Few organizations will go below 50% accurate, because the information is useless if it is not accurate. Achieving perfect information is almost impossible. The more complete and accurate an organization wants its information to be, the more it costs. The tradeoff for perfect information lies in accuracy versus completeness. Accurate information means it is correct, while complete information means there are no blanks. Most organizations determine a percentage high enough to make good decisions at a reasonable cost, such as 85% accurate and 65% complete.
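The two measures can be made concrete with a small calculation. The sketch below assumes a hypothetical customer table in which `None` marks a blank, plus an assumed audit result listing which filled-in cells turned out to be wrong; none of these values come from the chapter.

```python
# Hypothetical customer rows; None marks a blank field.
rows = [
    {"name": "Ann Lee", "phone": "555-0101", "city": "Toronto"},
    {"name": "Bob Roy", "phone": None, "city": "Calgary"},
    {"name": "Cy Wong", "phone": "555-0103", "city": None},
    {"name": "Di Park", "phone": None, "city": None},
]

# Assumed outcome of a manual audit: filled-in cells found to be incorrect.
known_wrong = {("Ann Lee", "city")}


def completeness(rows):
    """Share of all cells that are not blank."""
    cells = [value for row in rows for value in row.values()]
    return sum(value is not None for value in cells) / len(cells)


def accuracy(rows, known_wrong):
    """Share of filled-in cells that the audit did not flag as wrong."""
    filled = [
        (row["name"], field)
        for row in rows
        for field, value in row.items()
        if value is not None
    ]
    correct = [cell for cell in filled if cell not in known_wrong]
    return len(correct) / len(filled)


completeness_pct = round(100 * completeness(rows), 1)
accuracy_pct = round(100 * accuracy(rows, known_wrong), 1)
```

Here 8 of 12 cells are filled and 7 of those 8 are correct, which shows how a table can score well on accuracy while still being far from complete.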

BUSINESS INTELLIGENCE
Business intelligence – information that people use to support their decision-making efforts

BI information analysis

How BI can answer tough customer questions

OPERATIONAL, TACTICAL, AND STRATEGIC BI
The three forms of BI must work towards a common goal

BI’s Operational Value
The latency between a “business event” and an “action taken”

DATA MINING
Data mining – the process of analyzing data to extract information not offered by the raw data alone. To perform data mining, users need data-mining tools. Data-mining tool – uses a variety of techniques to find patterns and relationships in large volumes of information and infers rules that predict future behaviour and guide decision making. Data mining can begin at a summary information level (coarse granularity) and progress through increasing levels of detail (drilling down), or the reverse (drilling up). Data-mining tools include query tools, reporting tools, multidimensional analysis tools, statistical tools, and intelligent agents. Ask your students to provide an example of what an accountant might discover through the use of data-mining tools. Ans: An accountant could drill down into the details of all of the expenses and revenues, finding great business intelligence, including which employees are spending the most money on long-distance phone calls and which customers are returning the most products. Could the data warehousing team at Enron have discovered the accounting inaccuracies that caused the company to go bankrupt? If they did spot them, what should the team have done?
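Drilling down can be sketched as moving from a coarse total to a finer breakdown of the same records. The expense records and the `total_by` helper below are hypothetical and only illustrate the idea for the accountant example.

```python
from collections import defaultdict

# Hypothetical expense records: (department, employee, amount).
expenses = [
    ("Sales", "Karen", 120.0),
    ("Sales", "John", 80.0),
    ("Sales", "Karen", 50.0),
    ("Finance", "Priya", 40.0),
]


def total_by(records, key_index):
    """Sum the amount column, grouped by the column at key_index."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[2]
    return dict(totals)


# Coarse granularity: one total per department.
by_department = total_by(expenses, 0)

# Drilling down: look inside the Sales total, employee by employee.
sales_only = [record for record in expenses if record[0] == "Sales"]
by_employee = total_by(sales_only, 1)
```

Drilling up is the reverse step: starting from `by_employee` and returning to the department totals.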

DATA MINING Common forms of data-mining analysis capabilities include:
Cluster analysis, association detection, and statistical analysis. Can you explain the difference between cluster analysis, association detection, and statistical analysis? Cluster analysis – a technique used to divide an information set into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possible. Association detection – reveals the degree to which variables are related and the nature and frequency of these relationships in the information. Statistical analysis – performs such functions as information correlations, distributions, calculations, and variance analysis. Cluster analysis, association detection, and statistical analysis are covered in detail over the next few slides.

Cluster Analysis
Cluster analysis – a technique used to divide an information set into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possible. CRM systems depend on cluster analysis to segment customer information and identify behavioural traits. Some examples of cluster analysis include: consumer goods by content, brand loyalty or similarity; product market typology for tailoring sales strategies; retail store layouts and sales performances; corporate decision strategies using social preferences; control, communication, and distribution of organizations; industry processes, products, and materials; design of assembly line control functions; character recognition logic in OCR readers; database relationships in management information systems.
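The chapter does not name a particular clustering algorithm. As one common choice, the sketch below runs a plain k-means on one-dimensional data to split hypothetical customers into low and high spenders; the spending figures and starting centroids are assumptions made for the example.

```python
def k_means_1d(values, centroids, iterations=10):
    """Plain k-means on one-dimensional data with given starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(
                range(len(centroids)), key=lambda i: abs(value - centroids[i])
            )
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # stop once nothing moves
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical annual spend per customer.
annual_spend = [10, 12, 11, 95, 100, 105]
centroids, clusters = k_means_1d(annual_spend, centroids=[10, 105])
```

Each value lands in exactly one group, which matches the "mutually exclusive groups" wording of the definition.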

Association Detection
Association detection – reveals the degree to which variables are related and the nature and frequency of these relationships in the information. Market basket analysis – analyzes such items as Web sites and checkout scanner information to detect customers’ buying behaviour and predict future behaviour by identifying affinities among customers’ choices of products and services. Maytag uses association detection to ensure that each generation of appliances is better than the previous generation. Maytag’s warranty analysis tool automatically detects potential issues, provides quick and easy access to reports, and performs multidimensional analysis on all warranty information.
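Market basket analysis is often summarised with two measures, support and confidence. These terms are standard in the field but are not defined in the chapter, and the baskets below are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical checkout baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

# How many baskets contain each item, and each unordered pair of items.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)


def support(item_a, item_b):
    """Fraction of all baskets that contain both items."""
    pair = tuple(sorted((item_a, item_b)))
    return pair_counts[pair] / len(baskets)


def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the share with the consequent."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]


bread_milk_support = support("bread", "milk")
butter_then_bread = confidence("butter", "bread")
```

In this toy data every basket with butter also has bread, so the rule "butter implies bread" has a confidence of 1.0, while bread and milk appear together in half of all baskets.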

Statistical Analysis
Statistical analysis – performs such functions as information correlations, distributions, calculations, and variance analysis. Forecast – predictions made on the basis of time-series information. Time-series information – time-stamped information collected at a particular frequency. Kraft uses statistical analysis to assure consistent flavour, colour, aroma, texture, and appearance for all of its lines of foods. Kraft evaluates every manufacturing procedure, from recipe instructions to cookie dough shapes and sizes, to ensure that the billions of Kraft products that reach consumers each year taste great (and the same) with every bite. Nestle Italiana uses data mining and statistical analysis to determine production forecasts for seasonal confectionery products. The company’s data-mining solution gathers, organizes, and analyzes massive volumes of information to produce powerful models that identify trends and predict confectionery sales.
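A forecast from time-series information can be as simple as a moving average. The sketch below is one basic technique chosen for illustration, not the method used by Kraft or Nestle Italiana, and the monthly sales figures are made up.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the averaging window")
    return sum(series[-window:]) / window


# Hypothetical time-series information: units sold per month, oldest first.
monthly_sales = [100, 120, 110, 130, 150, 140]

# The forecast for month seven uses months four to six.
next_month = moving_average_forecast(monthly_sales)
```

A larger window smooths out month-to-month noise but reacts more slowly to a genuine change in trend.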

BUSINESS BENEFITS OF BI
Categories of BI benefits: direct quantifiable benefits, indirect quantifiable benefits, unpredictable benefits, and intangible benefits.

OPENING CASE QUESTIONS It Takes a Village to Write an Encyclopedia
How could Wikipedia use a data warehouse to improve its business operations? Why must Wikipedia cleanse or scrub the information in its data warehouse? How could a company use information from Wikipedia to gain business intelligence? Choose one of the three common forms of data-mining analysis and explain how Wikipedia could use it to gain BI. How can Wikipedia use tactical, operational, and strategic BI? How could Wikipedia use a data warehouse to improve its business operations? Wikipedia could use a data warehouse to contain not only internal organization information, but also external information such as market trends, competitor information, and industry trends. Wikipedia could then analyze its business across markets, among its competitors, and throughout different industries. Why must Wikipedia cleanse or scrub the information in its data warehouse? Wikipedia must maintain high-quality information in its data warehouse. Information cleansing and scrubbing is a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information. Without high-quality information, Wikipedia will be unable to make good business decisions. How could a company use information from Wikipedia to gain business intelligence? A company could gain business intelligence by using Wikipedia as a source of basic information or knowledge on topics the company is interested in. This would be similar to how students today use Wikipedia for basic topic knowledge and as a starting point for their research or data mining. Choose one of the three common forms of data-mining analysis and explain how Wikipedia could use it to gain BI. Student answers will vary depending on the form they choose to discuss. The key item to look for in their answers is that they have linked the technique to what Wikipedia does with its data. The most common answer will likely be association detection.
How can Wikipedia use tactical, operational, and strategic BI? Wikipedia can use different types of BI to achieve different strategic and operational goals. Examples of how Wikipedia could use each are listed below. Operational BI would focus on keeping Wikipedia operations (databases, servers, etc.) in sync with the loads from users. Tactical BI would focus on keeping the information and data that are Wikipedia’s product up to date and meeting the needs of users. Strategic BI would focus on how to keep Wikipedia relevant to users over the long term.

CLOSING CASE ONE Scouting for Quality
Explain the importance of high-quality information for Scouts Canada. Review the five common characteristics of high-quality information and rank them in order of importance for Scouts Canada. How could data warehouses and data marts be used to help Scouts Canada improve the efficiency and effectiveness of its operations? 1. Explain the importance of high-quality information for Scouts Canada. If the national office receives low-quality information from Scouting groups, then it does not know which scouts are covered by insurance, creating a significant liability risk for Scouts Canada. 2. Review the five common characteristics of high-quality information and rank them in order of importance for Scouts Canada. Student answers to this question will vary depending on their personal views and experiences with technology. The important part of the question is understanding the students’ justifications for their order. Potential order of importance: Timeliness – without timely information the department cannot update insurance properly. Accuracy – inaccurate information will lead to scouts not being insured and the national office not knowing how many members there are across Canada. Completeness – incomplete information will make it harder for the national office to make decisions regarding administration. Incomplete information probably occurs frequently since part of the process, maintaining national records, is performed manually. Consistency – information inconsistency probably occurs since membership data entry is performed manually. Uniqueness – scouting membership data can be entered in duplicate. 3. How could data warehouses and data marts be used to help Scouts Canada improve the efficiency and effectiveness of its operations? A data warehouse is a logical collection of information, gathered from many different operational databases, that supports business analysis activities and decision-making tasks.
The primary purpose of a data warehouse is to aggregate information throughout an organization into a single repository in such a way that employees can make decisions and undertake business analysis activities. Therefore, while databases store the details of all transactions (for instance, the membership of a new boy scout) and events (accepting a new Scout leader), data warehouses store that same information but in an aggregated form more suited to supporting decision-making tasks. Aggregation, in this instance, can include totals, counts, averages, and the like. Scouts Canada could use a data warehouse to track all of its information, allowing the organization to make informed decisions with all possible variables. The data warehouse sends subsets of the information to data marts. A data mart contains a subset of data warehouse information. To distinguish between data warehouses and data marts, think of data warehouses as having a more organizational focus and data marts as having focused information subsets particular to the needs of a given business unit such as finance or production and operations. The organization could use data marts to monitor small subsets of information.
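The step from transactional detail to aggregated warehouse information, and from the warehouse to a data mart, can be sketched in a few lines. The membership transactions, province codes, and group names below are hypothetical and are not taken from the Scouts Canada case.

```python
from collections import defaultdict

# Hypothetical transactional rows: (province, scouting_group, new_members).
transactions = [
    ("ON", "1st Ottawa", 5),
    ("ON", "2nd Toronto", 3),
    ("BC", "1st Victoria", 4),
    ("ON", "1st Ottawa", 2),
]


def aggregate(rows):
    """Summarise transactions per province as a count, a total, and an average."""
    grouped = defaultdict(list)
    for province, _group, new_members in rows:
        grouped[province].append(new_members)
    return {
        province: {
            "count": len(values),
            "total": sum(values),
            "average": sum(values) / len(values),
        }
        for province, values in grouped.items()
    }


# The "warehouse" holds the aggregated view of every province.
warehouse = aggregate(transactions)

# A "data mart" is a focused subset, here only the Ontario figures.
ontario_mart = {
    province: summary for province, summary in warehouse.items() if province == "ON"
}
```

The individual transactions stay in the operational database; the warehouse keeps only the totals, counts, and averages that decision makers need.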

CLOSING CASE ONE Scouting for Quality
What kinds of data marts might Scouts Canada want to build to help it analyze its operational performance? Do the managers at Scouts Canada actually have all of the information they require to make an accurate decision? Explain the statement “it is never possible to have all of the information required to make the best decision possible.” 4. What kinds of data marts might Scouts Canada want to build to help it analyze its operational performance? The organization might have a data mart for: quarterly membership changes, Scout leader certifications, and insurance premium changes. 5. Do the administrators at Scouts Canada actually have all of the information they require to make an accurate decision? Explain the statement “it is never possible to have all of the information required to make the best decision possible.” No, the administrators at Scouts Canada will never have every single piece of information. It would be almost impossible to attend every scouting meeting and count every scout. If you wait to have every single piece of information, you will probably never make a decision. We typically receive enough information to make an accurate decision. Of course, the more information you have, the better the decision you can make, but if you wait to get every piece of information you will take too long to make the decision.

CLOSING CASE TWO Google
How did the Web site RateMyProfessors.com solve its problem of low-quality information? Review the five common characteristics of high-quality information and rank them in order of importance to Google’s business. What would be the ramifications to Google’s business if the search information it presented to its customers was of low quality? Describe the different types of databases. Why should Google use a relational database? Identify the different types of entities, entity classes, attributes, keys, and relationships that might be stored in Google’s AdWords relational database. How did the Web site RateMyProfessors.com solve its problem of low-quality information? The developers of the Web site turned to Google’s API to create an automatic verification tool. If Google finds enough mentions in conjunction with a new professor or university to be added to the database, then it considers the information valid and posts it to the Web site. Review the five common characteristics of high-quality information and rank them in order of importance to Google’s business. Student answers to this question will vary depending on their personal views and experiences with technology. The important part of the question is understanding the students’ justifications for their order. Potential order of importance: Timeliness – Google’s information must be timely. If users are receiving old and outdated answers to their queries, they will not use Google for long. Accuracy – Google’s search results must be accurate. Consistency – Google’s results must be consistent. 
Users will not trust the system if it provides different results for the same query. Completeness – Google’s search results need to be complete; however, users understand that there could be thousands of answers to a query and do not expect Google to find and provide thousands of answers for each one. Uniqueness – Google’s users expect to receive unique answers to their queries, not the same search site listed over and over again. What would be the ramifications to Google’s business if the search information it presented to its customers was of low quality? Displaying links that do not work, links that have nothing to do with the query, or duplicate links will cause customers to switch to a different search engine. If Google’s search results were of low quality, it would quickly lose business. Since providing search results is Google’s primary line of business, it must display high-quality search results. Describe the different types of databases. Why should Google use a relational database? There are many different models for organizing information in a database, including the hierarchical database, the network database, and the most prevalent, the relational database model. In a hierarchical database model, information is organized into a tree-like structure that allows repeating information using parent/child relationships, in such a way that it cannot have too many relationships. Hierarchical structures were widely used in the first mainframe database management systems. However, owing to their restrictions, hierarchical structures often cannot be used to relate to structures that exist in the real world. The network database model is a flexible way of representing objects and their relationships. Where the hierarchical model structures information as a tree of records, with each record having one parent record and many children, the network model allows each record to have multiple parent and child records, forming a lattice structure. 
The relational database model is a type of database that stores information in the form of logically related two-dimensional tables. Entities, entity classes, attributes, primary keys, and foreign keys are all fundamental concepts included in the relational database model. Identify the different types of entities, entity classes, attributes, keys, and relationships that might be stored in Google’s AdWords relational database. Entity classes could include: DOCUMENT, TITLE, SEARCH TERM, WORD, LOCATION, WEB PAGE. Attributes could include: author, title, key words, category, Web site location, lowest bid, highest bid, total hits. Each table would need to define a primary key, which could include: Document ID, Search item ID, Location ID, Company ID. The tables in the database would have 1-to-1 relationships, 1-to-many relationships, and many-to-many relationships. If you are planning on having your students design and build an ERD, please review the associated Access and Database Technology Plug-Ins.
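To show how primary keys, foreign keys, and a 1-to-many relationship fit together, the sketch below builds two related tables with Python's built-in `sqlite3` module. The table names, columns, and rows are a hypothetical simplification and do not describe Google's actual AdWords schema.

```python
import sqlite3

# In-memory database so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

# Two entity classes as two-dimensional tables. One company can own many
# search terms, which is a 1-to-many relationship.
conn.executescript(
    """
    CREATE TABLE company (
        company_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE search_term (
        term_id     INTEGER PRIMARY KEY,
        term        TEXT NOT NULL,
        highest_bid REAL,
        company_id  INTEGER NOT NULL REFERENCES company(company_id)
    );
    """
)

conn.execute("INSERT INTO company VALUES (1, 'Acme Travel')")
conn.executemany(
    "INSERT INTO search_term VALUES (?, ?, ?, ?)",
    [(1, "cheap flights", 1.25, 1), (2, "hotel deals", 0.90, 1)],
)

# The foreign key lets a query relate rows across the two tables.
rows = conn.execute(
    """
    SELECT c.name, s.term
    FROM company AS c
    JOIN search_term AS s ON s.company_id = c.company_id
    ORDER BY s.term_id
    """
).fetchall()
```

Because the foreign key is enforced, an attempt to insert a search term for a company ID that does not exist raises `sqlite3.IntegrityError`.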

CLOSING CASE TWO Google
How could Google use a data warehouse to improve its business operations? Why would Google need to scrub and cleanse the information in its data warehouse? Identify a data mart that Google’s marketing and sales department might use to track and analyze its AdWords revenue. How could Google use a data warehouse to improve its business operations? Google could use a data warehouse to contain not only internal organization information, but also external information such as market trends, competitor information, and industry trends. Google could then analyze its business across markets, among its competitors, and throughout different industries. Why would Google need to scrub and cleanse the information in its data warehouse? Google must maintain high-quality information in its data warehouse. Information cleansing and scrubbing is a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information. Without high-quality information Google will be unable to make good business decisions. Identify a data mart that Google’s marketing and sales department might use to track and analyze its AdWords revenue. One potential data mart might include information broken down by industry (products, telecommunications, health care, energy, travel, human services) and tracked against revenue by companies. This would tell Google which industries are using AdWords and which industries are untapped. It would also tell Google which customers in each industry are taking advantage of AdWords and perhaps would benefit from a specialized marketing plan, and which customers are not yet taking advantage of AdWords and might be interested in learning about the product.

CLOSING CASE THREE Harrah’s
Identify the effects poor information might have on Harrah’s service-oriented business strategy. How does Harrah’s use database technologies to implement its service-oriented strategy? Harrah’s was one of the first casino companies to find value in offering rewards to customers who visit multiple Harrah’s locations. Describe the effects on the company if it did not build any integrations among the databases located at each of its casinos. How could Harrah’s use distributed databases or a data warehouse to synchronize customer information? 1. Identify the effects low-quality information might have on Harrah’s service-oriented business strategy. Using the wrong information can lead to making the wrong decision. Making the wrong decision can cost time, money, and even reputations. Business decisions are only as good as the information used to make the decision. Low-quality information leads to low-quality business decisions. High-quality information can significantly improve the chances of making a good business decision and directly affect an organization’s bottom line. Harrah’s must use high-quality information whenever it is making business decisions, especially decisions that affect its service-oriented business strategy. 2. How does Harrah’s use database technologies to implement its service-oriented strategy? Harrah’s implements a service-oriented strategy called Total Rewards. Total Rewards allows Harrah’s to give every single customer the appropriate amount of personal attention, whether it’s leaving sweets in the hotel room or offering free meals. Total Rewards works by providing each customer with an account and a corresponding card that the player swipes each time he or she plays a casino game. The program collects information, via a database, on the amount of time the customers gamble, their total winnings and losses, and their betting strategies. 
Customers earn points based on the amount of time they spend gambling, which they can then exchange for comps such as free dinners, hotel rooms, tickets to shows, and even cash. 3. Harrah’s was one of the first casino companies to find value in offering rewards to customers who visit multiple Harrah’s locations. Describe the effects on the company if it did not build any integrations among the databases located at each of its casinos. How could Harrah’s use distributed databases or a data warehouse to synchronize customer information? Without database integration among its hotels and casinos, Harrah’s would be unable to determine a customer’s true value to the company. For example, a customer who spends $500,000 at one casino might be treated like royalty. This same customer could visit another Harrah’s location, but since the information is not integrated, the new location would have no idea that it had a high-rolling customer on the premises and might not treat the customer accordingly. Distributed databases or a data warehouse could be used to help make this data centrally available with a higher degree of data quality.

Estimate the potential impact to Harrah’s business if there is a security breach in its customer information. Identify three different types of data marts Harrah’s might want to build to help it analyze its operational performance. 4. Estimate the potential impact to Harrah’s business if there is a security breach in its customer information Some customers have concerns regarding Harrah’s information collection strategy since they want to keep their gambling information private. If there was a security violation and sensitive customer information was compromised Harrah’s would risk losing its customers’ trust and their business. 5. Identify three different types of data marts Harrah’s might want to build to help it analyze its operational performance Answers to this question will vary. Potential answers include (1) customers’ spending habits across properties, (2) repeat customer spending habits at a single location, (3) dealer sales at a location and across locations.

What might occur if Harrah’s fails to clean or scrub its information before loading it into its data warehouse? Describe cluster analysis, association detection, and statistical analysis and explain how Harrah’s could use each one to gain insights into its business. 6. What might occur if Harrah’s fails to clean or scrub its information before loading it into its data warehouse? Harrah’s must maintain high quality information in its data warehouse. Information cleansing and scrubbing is a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information. Without high quality information Harrah’s will be unable to make good business decisions and operate its service-oriented strategy. Potential business effects resulting from low quality information include: Inability to accurately track customers Difficulty identifying valuable customers Inability to identify selling opportunities Marketing to nonexistent customers Difficulty tracking revenue due to inaccurate invoices Inability to build strong customer relationships – which increases buyer power 7. Describe cluster analysis, association detection, and statistical analysis and explain how Harrah’s could use each one to gain insights into its business. Cluster analysis is a technique used to divide an information set into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possible. Cluster analysis is frequently used to segment customer information for customer relationship management systems to help organizations identify customers with similar behavioural traits, such as clusters of best customers or one-time customers. Cluster analysis also has the ability to uncover naturally occurring patterns in information. Association detection reveals the degree to which variables are related and the nature and frequency of these relationships in the information. 
Statistical analysis performs such functions as information correlations, distributions, calculations, and variance analysis, just to name a few. Harrah’s can use all of the above to uncover customer patterns to ensure it is taking advantage of customer relationship management strategies with its customers. It could also use the tools to uncover patterns in food, drink, and room availability to optimize its supply chain.
