Chapter 9: Data Warehousing


Chapter 9: Data Warehousing Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration Gonzaga University Spokane, WA 99258 chen@jepson.gonzaga.edu

Objectives
- Definition of terms
- Reasons for the information gap between information needs and availability
- Reasons for the need for data warehousing
- Describe three levels of data warehouse architectures (ETL)
- Describe two components of the star schema
- Estimate fact table size
- Design a data mart
- Develop requirements for a data mart
- OLAP, data mining, and its applications

A Solution to the Information Gap
One solution to the information gap is the ______ _________, which consolidates and integrates information from many different sources and arranges it in a meaningful format for making accurate business decisions.
Answer: data warehouse

Two Things to Know About Data Warehousing
1. The major factor driving the need for data warehousing: businesses need an integrated view of company information.
2. Which of the following organizational trends does not encourage the need for data warehousing?
a) Multiple, nonsynchronized systems
b) Focus on customer relationship management
c) Downsizing
d) Focus on supplier relationship management
Answer: c) Downsizing

Need for Data Warehousing
- Integrated, company-wide view of high-quality information (from disparate databases)
- Separation of operational and informational systems and data (for improved performance)
Table 9-1: Comparison of Operational and Informational Systems

DATA WAREHOUSE FUNDAMENTALS
Data warehouse: a logical collection of information, gathered from many different operational databases, that supports business analysis activities and decision-making tasks.
The primary purpose of a data warehouse is to aggregate information throughout an organization into a single repository for decision-making purposes.
What is the primary difference between a database and a data warehouse? A database stores information for a single application, whereas a data warehouse stores information from multiple databases, or multiple applications, plus external information such as industry information. This enables cross-functional analysis, industry analysis, market analysis, and so on, all from a single repository.
Data warehouses support only analytical processing (OLAP).

Definition
Data Warehouse: a subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes
- Subject-oriented: organized around key high-level entities of the enterprise, e.g., customers, patients, students, products
- Integrated: consistent naming conventions, formats, and encoding structures; drawn from multiple data sources
- Time-variant: data in the warehouse contain a time dimension so that they may be used to study trends and changes
- Non-updatable: read-only, periodically refreshed
Data Mart: a data warehouse that is limited in scope; it contains a subset of data warehouse information

History Leading to Data Warehousing Improvement in database technologies, especially relational DBMSs Advances in computer hardware, including mass storage and parallel architectures Emergence of end-user computing with powerful interfaces and tools Advances in middleware, enabling heterogeneous database connectivity Recognition of difference between operational and informational systems

Issues with Company-Wide View Inconsistent key structures Synonyms Free-form vs. structured fields Inconsistent data values Missing data See figure 9-1 for example

Figure 9-1 Examples of heterogeneous data

Database vs. Data Warehouse
[Figure: a DBMS compared with a data warehouse and data mining]

Data Marts and the Data Warehouse
Legacy systems feed data to the warehouse. The warehouse feeds specialized information to departments (data marts).
[Figure: legacy systems and the operational data store feed the organizational data warehouse through ETL; the warehouse feeds the Finance, Accounting, Marketing, and Sales data marts through ETL]

The Data Mart is More Specialized
The data mart serves the needs of one business unit, not the organization.
Organizational data warehouse:
- Corporate
- Highly granular data
- Normalized design
- Robust historical data
- Large data volume
- Data-model-driven data
- Versatile
- General-purpose DBMS technologies
Data marts:
- Departmentalized
- Summarized, aggregated data
- Star join design
- Limited historical data
- Limited data volume
- Requirements-driven data
- Focused on departmental needs
- Multidimensional DBMS technologies
[Figure: the organizational data warehouse feeds the Sales, Finance, Marketing, and Accounting data marts through ETL]

Organizational Trends Motivating Data Warehouses No single system of records Multiple systems not synchronized Organizational need to analyze activities in a balanced way Customer relationship management Supplier relationship management

Separating Operational and Informational Systems Operational system – a system that is used to run a business in real time, based on current data; also called a system of record Informational system – a system designed to support decision making based on historical point-in-time and prediction data for complex queries or data-mining applications

Position of the Data Warehouse Within the Organization

DATA WAREHOUSE FUNDAMENTALS (cont.) Extraction, transformation, and loading (ETL) – a process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouse The ETL process gathers data from the internal and external databases and passes it to the data warehouse The ETL process also gathers data from the data warehouse and passes it to the data marts

Data Warehouse Architectures
- Independent data mart
- Dependent data mart and operational data store
- Logical data mart and real-time data warehouse
- Three-layer architecture
All involve some form of extraction, transformation, and loading (ETL).

Figure 9-2 Independent data mart data warehousing architecture
- Data marts: mini-warehouses, limited in scope
- Separate ETL for each independent data mart
- Data access complexity due to multiple data marts

Figure 9-3 Dependent data mart with operational data store: a three-level architecture
- ODS provides an option for obtaining current data
- Single ETL for the enterprise data warehouse (EDW)
- Dependent data marts loaded from the EDW
- Simpler data access

Figure 9-4 Logical data mart and real-time warehouse architecture
- ODS and data warehouse are one and the same
- Data marts are NOT separate databases, but logical views of the data warehouse, so it is easier to create new data marts
- Near real-time ETL for the data warehouse

The ETL Process: another perspective and example
- Capture/Extract (E)
- Scrub or data cleansing
- Transform (T)
- Load and Index (L)
ETL = extract, transform, and load

Capture/Extract: obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse
- Static extract = capturing a snapshot of the source data at a point in time
- Incremental extract = capturing changes that have occurred since the last static extract
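To make the two extract styles concrete, here is a minimal sketch. The row layout and the `updated_at` change-tracking column are assumptions made for illustration; they do not come from the slides, and real sources may track changes differently (for example, through a DBMS log).

```python
from datetime import datetime

# Hypothetical source rows; "updated_at" is an assumed change-tracking column.
SOURCE_ROWS = [
    {"id": 1, "name": "Pat Burton", "updated_at": datetime(2024, 1, 5)},
    {"id": 2, "name": "Lee Wong", "updated_at": datetime(2024, 2, 10)},
    {"id": 3, "name": "Ana Silva", "updated_at": datetime(2024, 3, 1)},
]


def static_extract(rows):
    """Static extract: copy every source row as it exists right now."""
    return [dict(row) for row in rows]


def incremental_extract(rows, last_extract_time):
    """Incremental extract: copy only rows changed after the previous extract."""
    return [dict(row) for row in rows if row["updated_at"] > last_extract_time]


snapshot = static_extract(SOURCE_ROWS)
changes = incremental_extract(SOURCE_ROWS, datetime(2024, 2, 1))
print(len(snapshot), [row["id"] for row in changes])
```

The incremental version moves far less data per run, at the price of needing some reliable way to tell which rows changed.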

Scrub/Cleanse: uses pattern recognition and AI techniques to upgrade data quality
- Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies
- Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data

Transform = convert data from the format of the operational system to the format of the data warehouse
Record-level:
- Selection: data partitioning
- Joining: data combining
- Aggregation: data summarization
Field-level:
- Single-field: from one field to one field
- Multi-field: from many fields to one, or one field to many
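The record-level and field-level transforms listed above can be sketched in a few lines. All record layouts, names, and amounts below are hypothetical and serve only to show one instance of each transform.

```python
from collections import defaultdict

# Hypothetical operational records (names and values are illustrative only).
orders = [
    {"cust_id": 1, "region": "West", "amount": 100.0},
    {"cust_id": 2, "region": "East", "amount": 50.0},
    {"cust_id": 1, "region": "West", "amount": 25.0},
]
customers = {
    1: {"first": "Pat", "last": "Burton"},
    2: {"first": "Lee", "last": "Wong"},
}

# Record-level selection: keep only one partition of the data.
west_orders = [o for o in orders if o["region"] == "West"]


def full_name(cust_id):
    """Multi-field transform: two source fields become one target field."""
    customer = customers[cust_id]
    return customer["first"] + " " + customer["last"]


# Record-level joining: combine each order with data from its customer record.
joined = [{**o, "full_name": full_name(o["cust_id"])} for o in orders]

# Record-level aggregation: summarize order amounts by region.
totals = defaultdict(float)
for o in orders:
    totals[o["region"]] += o["amount"]

print(len(west_orders), joined[0]["full_name"], dict(totals))
```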

Load/Index = place transformed data into the warehouse and create indexes
- Refresh mode: bulk rewriting of target data at periodic intervals
- Update mode: only changes in source data are written to the data warehouse
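The difference between the two load modes can be sketched as follows. The warehouse is modeled as a plain list of rows, and update mode appends changed rows instead of overwriting old ones; both choices are assumptions made for this sketch, not something the slides specify.

```python
def load_refresh(target, source_rows):
    """Refresh mode: discard the target contents and rewrite them in bulk."""
    target[:] = [dict(row) for row in source_rows]


def load_update(target, changed_rows):
    """Update mode: write only the changed source rows to the target.

    Appending (rather than overwriting) is an assumption of this sketch.
    """
    target.extend(dict(row) for row in changed_rows)


warehouse = []
load_refresh(warehouse, [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}])
load_update(warehouse, [{"id": 2, "qty": 9}])
print(len(warehouse))
```

Refresh mode is simple but rewrites everything on each run; update mode touches far fewer rows and pairs naturally with an incremental extract.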

Information Cleansing or Scrubbing
An organization must maintain high-quality data in the data warehouse.
Information cleansing or scrubbing: a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information.
This is an excellent time to return to the information learned in Chapter 6 on high-quality and low-quality information.
- What would happen if the information contained in the data warehouse was only about 70 percent accurate? Would you use this information to make business decisions?
- Is it realistic to assume that an organization could reach 100% accuracy for the information contained in its data warehouse? No, it is too expensive.

Information Cleansing or Scrubbing: Standardizing Customer Names from Operational Systems
Ask your students whether they have ever received more than one piece of identical mail, such as a flyer, catalog, or application. If so, ask them why this might have occurred. Could it be because their name was stored in several disparate systems? What is the cost to the business of sending multiple identical marketing materials to the same customers?
- Expense
- Risk of alienating customers
In the example, information about Pat (or Patti) Burton was entered in different ways and saved in different operational systems (Sales, Customer Service, and Billing). Cleansing software reconciles these records and saves the accurate information in Customer Information, a single repository (database) that holds it in a single, consistent form.

Information Cleansing or Scrubbing
Information cleansing allows an organization to fix these types of inconsistencies and clean the data in the data warehouse.
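As a rough illustration of name standardization, the sketch below reconciles three spellings of the same customer into one record. The nickname map, the sample records, and the `standardize` helper are all hypothetical; commercial cleansing software applies far richer matching rules.

```python
# Hypothetical nickname map; real cleansing tools use much larger rule sets.
NICKNAMES = {"patti": "pat", "patricia": "pat"}

# The same customer as entered in three operational systems (illustrative data).
records = [
    {"system": "Sales", "name": "Patti Burton"},
    {"system": "Customer Service", "name": "  pat burton"},
    {"system": "Billing", "name": "Pat  BURTON"},
]


def standardize(name):
    """Trim whitespace, unify case, and map known nicknames to one form."""
    parts = name.lower().split()
    if not parts:
        return ""
    parts[0] = NICKNAMES.get(parts[0], parts[0])
    return " ".join(part.capitalize() for part in parts)


# One consistent entry per customer, as in a single customer repository.
customer_info = sorted({standardize(r["name"]) for r in records})
print(customer_info)
```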

Information Cleansing or Scrubbing: Accurate and Complete Information
Why do you think most businesses cannot achieve 100% accurate and complete information? If they had to choose a percentage for acceptable information, what would it be and why?
- Some companies are willing to go as low as 20% complete just to find business intelligence
- Few organizations will go below 50% accurate; the information is useless if it is not accurate
- Achieving perfect information is almost impossible
- The more complete and accurate an organization wants its information to be, the more it costs
- The tradeoff for perfect information lies in accuracy versus completeness
- Accurate information is correct, while complete information has no blanks
- Most organizations determine a percentage high enough to make good decisions at a reasonable cost, such as 85% accurate and 65% complete
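One possible way to put numbers on the two measures is sketched below. It assumes a set of verified reference values is available to check accuracy against, which is often the expensive part in practice; the rows, field names, and reference values are all hypothetical.

```python
# Hypothetical warehouse rows and verified reference values.
rows = [
    {"id": 1, "city": "Spokane", "zip": "99258"},
    {"id": 2, "city": "Spokane", "zip": ""},
    {"id": 3, "city": "Seatle", "zip": "98101"},   # misspelled city
    {"id": 4, "city": None, "zip": "99201"},
]
verified = {
    1: {"city": "Spokane", "zip": "99258"},
    2: {"city": "Spokane", "zip": "99202"},
    3: {"city": "Seattle", "zip": "98101"},
    4: {"city": "Spokane", "zip": "99201"},
}
FIELDS = ["city", "zip"]


def is_filled(value):
    """A value counts as present when it is neither None nor blank."""
    return value not in (None, "")


def completeness(rows, fields):
    """Share of field values that are not blank."""
    filled = sum(is_filled(r.get(f)) for r in rows for f in fields)
    return filled / (len(rows) * len(fields))


def accuracy(rows, verified, fields):
    """Share of non-blank values that match the verified reference."""
    filled = [(r["id"], f, r[f]) for r in rows for f in fields if is_filled(r.get(f))]
    correct = sum(value == verified[rid][f] for rid, f, value in filled)
    return correct / len(filled)


print(completeness(rows, FIELDS), accuracy(rows, verified, FIELDS))
```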

Representation of Data in a DW
Dimensional modeling: a retrieval-based system that supports high-volume query access. It not only accommodates but also speeds up the processing of complex multidimensional queries. There are two approaches:
1. ______ schema: the most commonly used and simplest style of dimensional modeling
- Contains a fact table surrounded by and connected to several dimension tables
- The fact table contains the quantitative measures (numerical values) needed for decision analysis and query reporting; foreign keys link it to the dimension tables
- Dimension tables contain classification and aggregation information about the values in the fact table (i.e., attributes describing the data in the fact table)
2. ___________ schema: an extension of the star schema in which the diagram resembles a snowflake in shape
Answers: 1. Star; 2. Snowflake

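Fact Table vs. Dimensional Table
[Figure: dimensional tables, each with a primary key (pk), connect to a fact table whose composite primary key (cpk) is built from foreign keys (fk); the fact table resolves a many-to-many (M:N) relationship]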

Figure 9-5 Three-layer data architecture for a data warehouse

Data Characteristics: Status vs. Event Data
Figure 9-6 Example of DBMS log entry
Event = a database action (create/update/delete) that results from a transaction
[Figure labels: Status, Event]

Data Characteristics Transient vs. Periodic Data Figure 9-7 Transient operational data With transient data, changes to existing records are written over previous records, thus destroying the previous data content

Data Characteristics Transient vs. Periodic Data Figure 9-8 Periodic warehouse data Periodic data are never physically altered or deleted once they have been added to the store
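The contrast between the two kinds of data can be sketched with two small stores. The account key, balances, and dates are made up for the illustration.

```python
transient = {}  # operational store: one current row per key
periodic = []   # warehouse store: append-only history


def record_change(key, balance, as_of):
    """Apply the same change to both stores to show the difference."""
    # Transient: the new values overwrite the previous record.
    transient[key] = {"balance": balance, "as_of": as_of}
    # Periodic: a new row is added; earlier rows are never altered or deleted.
    periodic.append({"key": key, "balance": balance, "as_of": as_of})


record_change("acct-1", 100, "2024-01-01")
record_change("acct-1", 75, "2024-02-01")
print(transient["acct-1"]["balance"], len(periodic))
```

After the second change, only the periodic store can still answer what the balance was in January.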

Other Data Warehouse Changes New descriptive attributes New business activity attributes New classes of descriptive attributes Descriptive attributes become more refined Descriptive data are related to one another New source of data

Data Reconciliation
Typical operational data is:
- Transient: not historical
- Not normalized (perhaps due to denormalization for performance)
- Restricted in scope: not comprehensive
- Sometimes poor quality: inconsistencies and errors
After ETL, data should be:
- Detailed: not summarized yet
- Historical: periodic
- Normalized: 3rd normal form or higher
- Comprehensive: enterprise-wide perspective
- Timely: current enough to assist decision-making
- Quality controlled: accurate with full integrity

Derived Data
Objectives:
- Ease of use for decision support applications
- Fast response to predefined user queries
- Customized data for particular target audiences
- Ad-hoc query support
- Data mining capabilities
Characteristics:
- Detailed (mostly periodic) data
- Aggregate (for summary)
- Distributed (to departmental servers)
Most common data model = star schema (also called "dimensional model")

Figure 9-9 Components of a star schema
- Fact tables contain factual (descriptive) or quantitative data (numerical values)
- 1:N relationship between dimension tables and fact tables
- Dimension tables are denormalized to maximize performance
- Dimension tables contain descriptions about the subjects of the business (values in the fact table)
- Excellent for ad-hoc queries, but bad for online transaction processing

Figure 9-10 Star schema example Fact table provides statistics for sales broken down by product, period and store dimensions

Figure 9-11 Star schema with sample data
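A star schema of this shape can be tried out with Python's built-in `sqlite3` module. The table names, column names, and sample rows below are illustrative assumptions and are not copied from Figures 9-10 or 9-11.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per product, store, and period.
CREATE TABLE product (product_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE store   (store_key   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE period  (period_key  INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Fact table: foreign keys to each dimension plus numeric measures.
CREATE TABLE sales (
    product_key  INTEGER REFERENCES product(product_key),
    store_key    INTEGER REFERENCES store(store_key),
    period_key   INTEGER REFERENCES period(period_key),
    units_sold   INTEGER,
    dollars_sold REAL,
    PRIMARY KEY (product_key, store_key, period_key)
);

INSERT INTO product VALUES (1, 'Basketball'), (2, 'Football');
INSERT INTO store   VALUES (1, 'Spokane'), (2, 'Seattle');
INSERT INTO period  VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO sales   VALUES (1, 1, 1, 10, 250.0), (1, 2, 1, 4, 100.0),
                           (2, 1, 2, 6, 180.0), (1, 1, 2, 5, 125.0);
""")

# Typical star-join query: total dollars sold per product.
rows = conn.execute("""
    SELECT p.description, SUM(f.dollars_sold)
    FROM sales AS f
    JOIN product AS p ON p.product_key = f.product_key
    GROUP BY p.description
    ORDER BY p.description
""").fetchall()
print(rows)
```

Each analytical question becomes a join from the fact table to whichever dimensions it is grouped or filtered by.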

Surrogate Dimension Keys
Dimension table keys should be surrogate (non-intelligent and non-business related), because:
- Business keys may change over time
- Surrogate keys help keep track of nonkey attribute values for a given production key
- Surrogate keys are simpler and shorter
- Surrogate keys can be the same length and format for all keys
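A minimal sketch of surrogate key assignment follows. The `SurrogateKeyMap` class and the sample business keys are hypothetical, and the one-to-one mapping is a simplification: when dimension history is kept, one business key can end up with several surrogate keys.

```python
from itertools import count


class SurrogateKeyMap:
    """Hands out meaningless integer keys for business (production) keys."""

    def __init__(self):
        self._next_key = count(1)
        self._keys = {}

    def key_for(self, business_key):
        # Reuse the existing surrogate key, or assign the next integer.
        if business_key not in self._keys:
            self._keys[business_key] = next(self._next_key)
        return self._keys[business_key]


key_map = SurrogateKeyMap()
first = key_map.key_for("SKU-A-2024")
second = key_map.key_for("SKU-B-2024")
again = key_map.key_for("SKU-A-2024")
print(first, second, again)
```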

Grain of the Fact Table
Granularity of the fact table: what level of detail do you want?
- Transactional grain: finest level
- Aggregated grain: more summarized
- Finer grain means better market basket analysis capability
- Finer grain means more dimension tables and more rows in the fact table
- In Web-based commerce, the finest granularity is a click

Duration of the Database Natural duration–13 months or 5 quarters Financial institutions may need longer duration Older data is more difficult to source and cleanse

Size of Fact Table Depends on the number of dimensions and the grain of the fact table Number of rows = product of number of possible values for each dimension associated with the fact table Example: assume the following for Figure 9-11: Total rows calculated as follows (assuming only half the products record sales for a given month):
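The row-count rule above can be written as a small calculation. The dimension sizes, field count, and bytes per field below are illustrative assumptions chosen for the sketch; they are not taken from Figure 9-11, and the byte estimate is an extension of the row-count rule.

```python
from math import prod


def fact_table_rows(dimension_sizes):
    """Rows = product of the number of possible values for each dimension."""
    return prod(dimension_sizes.values())


def fact_table_bytes(rows, fields_per_row, bytes_per_field):
    """Rough storage estimate: every row holds the same number of fixed-size fields."""
    return rows * fields_per_row * bytes_per_field


# Illustrative assumptions: 1,000 stores, 10,000 products of which only half
# record sales in a given month, and 24 monthly periods.
sizes = {"store": 1_000, "product": 10_000 // 2, "period": 24}
rows = fact_table_rows(sizes)
size = fact_table_bytes(rows, fields_per_row=6, bytes_per_field=4)
print(rows, size)
```

Because the sizes multiply, adding one more dimension or moving to a finer grain increases the estimate by a whole factor, not by a fixed amount.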

Break! (Ch. 9)
Exercise #5: a, b, c (p. 422), with the following assumptions:
1. The length of a fiscal period is one month
2. The data mart will contain five years of historical data
3. Approximately 5 percent of the policies experience some type of change each month
4. There are 8 fields in each record (row)
HW #3 (p. 422): a, b, c. Assume one professor per course section.
All computations for b and c must be shown to receive credit.

Figure 9-12 Modeling dates
Fact tables contain time-period data, so date dimensions are important.

Variations of the Star Schema
Multiple fact tables
- Can improve performance
- Often used to store facts for different combinations of dimensions
- Conformed dimensions
Factless fact tables
- No nonkey data, but foreign keys for associated dimensions
- Used for tracking events and inventory coverage

Figure 9-13 Conformed dimensions
Two fact tables mean two (connected) star schemas. A conformed dimension is associated with multiple fact tables.

Figure 9-14a Factless fact table showing occurrence of an event No data in fact table, just keys associating dimension records Fact table forms an n-ary relationship between dimensions

Normalizing Dimension Tables
Multivalued dimensions
- Facts qualified by a set of values for the same business subject
- Normalization involves creating a table for an associative entity between dimensions
Hierarchies
- Sometimes a dimension forms a natural, fixed-depth hierarchy
- Design options:
  - Include all information for each level in a single denormalized table
  - Normalize the dimension into a nested set of 1:M table relationships

Figure 9-15 Multivalued dimension
The helper table is an associative entity that implements an M:N relationship between dimension and fact.

Figure 9-16 Fixed product hierarchy Dimension hierarchies help to provide levels of aggregation for users wanting summary information in a data warehouse.

Slowly Changing Dimensions (SCD) Need to maintain knowledge of the past One option: for each changing attribute, create a current value field and many old-valued fields (multivalued) Better option: create a new dimension table row each time the dimension object changes, with all dimension characteristics at the time of change

Figure 9-18 Example of Type 2 SCD Customer dimension table The dimension table contains several records for the same customer. The specific customer record to use depends on the key and the date of the fact, which should be between start and end dates of the SCD customer record.
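The Type 2 approach can be sketched as two operations: adding a new dimension row when the customer changes, and picking the row whose date range covers the date of a fact. The column names, the far-future `OPEN_END` marker, and the sample customer are assumptions made for this sketch.

```python
from datetime import date, timedelta

OPEN_END = date(9999, 12, 31)  # assumed marker for the current version

# Hypothetical customer dimension with one current row.
customer_dim = [
    {"customer_key": 101, "customer_id": "C1", "city": "Spokane",
     "start": date(2020, 1, 1), "end": OPEN_END},
]


def apply_change(dim, customer_id, new_city, change_date):
    """Type 2 change: close the current row and add a new row with a new key."""
    current = next(r for r in dim
                   if r["customer_id"] == customer_id and r["end"] == OPEN_END)
    current["end"] = change_date - timedelta(days=1)
    dim.append({
        "customer_key": max(r["customer_key"] for r in dim) + 1,
        "customer_id": customer_id,
        "city": new_city,
        "start": change_date,
        "end": OPEN_END,
    })


def version_for(dim, customer_id, fact_date):
    """Find the dimension row whose start/end dates cover the fact's date."""
    for row in dim:
        if row["customer_id"] == customer_id and row["start"] <= fact_date <= row["end"]:
            return row
    return None


apply_change(customer_dim, "C1", "Seattle", date(2022, 7, 1))
old = version_for(customer_dim, "C1", date(2021, 3, 15))
new = version_for(customer_dim, "C1", date(2023, 1, 1))
print(old["city"], new["city"], new["customer_key"])
```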

Figure 9-19 Dimension segmentation For rapidly changing attributes (hot attributes), Type 2 SCD approach creates too many rows and too much redundant data. Use segmentation instead.

10 Essential Rules for Dimensional Modeling
1. Use atomic facts
2. Create single-process fact tables
3. Include a date dimension for each fact table
4. Enforce consistent grain
5. Disallow null keys in fact tables
6. Honor hierarchies
7. Decode dimension tables
8. Use surrogate keys
9. Conform dimensions
10. Balance requirements with actual data

Other Data Warehouse Advances
Columnar databases
- Issue of Big Data (huge volume, often unstructured)
- Columnar databases optimize storage for summary data of few columns (a different need than OLTP)
- Data compression
- Examples: Sybase, Vertica, Infobright
NoSQL ("Not only SQL")
- Deals with unstructured data
- Examples: MongoDB, CouchDB, Apache Cassandra

The User Interface: Metadata (data catalog)
- Identify subjects of the data mart
- Identify dimensions and facts
- Indicate how data is derived from enterprise data warehouses, including derivation rules
- Indicate how data is derived from the operational data store, including derivation rules
- Identify available reports and predefined queries
- Identify data analysis techniques (e.g., drill-down)
- Identify responsible people

Online Analytical Processing (OLAP) Tools
The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques
- Relational OLAP (ROLAP): traditional relational representation
- Multidimensional OLAP (MOLAP): cube structure
OLAP operations:
- Cube slicing: come up with a 2-D view of the data
- Drill-down: going from summary to more detailed views
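The two OLAP operations can be imitated on a tiny in-memory cube. The cube layout (product, region, month), the sales figures, and the helper names are hypothetical; real OLAP tools work on far larger structures with specialized storage.

```python
from collections import defaultdict

# Hypothetical cube cells: (product, region, month) -> sales amount.
cube = {
    ("Shoes", "West", "Jan"): 100, ("Shoes", "West", "Feb"): 120,
    ("Shoes", "East", "Jan"): 80,  ("Hats", "West", "Jan"): 30,
    ("Hats", "East", "Feb"): 45,
}


def slice_cube(cells, product):
    """Slicing: fix one dimension value to get a 2-D (region x month) view."""
    return {(r, m): v for (p, r, m), v in cells.items() if p == product}


def roll_up(cells, axis):
    """Summarize the cells along one dimension (0=product, 1=region, 2=month)."""
    totals = defaultdict(int)
    for key, value in cells.items():
        totals[key[axis]] += value
    return dict(totals)


def drill_down(cells, product):
    """Drill-down: from one product's summary total to its per-region detail."""
    detail = {k: v for k, v in cells.items() if k[0] == product}
    return roll_up(detail, axis=1)


summary = roll_up(cube, axis=0)
shoes_by_region = drill_down(cube, "Shoes")
hats_slice = slice_cube(cube, "Hats")
print(summary, shoes_by_region, hats_slice)
```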

Multidimensional Analysis
Databases contain information in a series of two-dimensional tables. In a data warehouse and data mart, information is multidimensional: it contains layers of columns and rows.
Dimension: a particular attribute of information. Each layer in a data warehouse or data mart represents information according to an additional dimension.
Dimensions could include such things as: products, promotions, stores, category, region, stock price, date, time, weather.
Why is the ability to look at information based on different dimensions critical to a business's success?
Answer: The ability to look at information from different dimensions can add tremendous business insight. By slicing and dicing the information, a business can uncover unexpected insights.

Figure 9-21 Slicing a data cube (Hoffer's text, chapter 11)
[Figure axis labels: REGION, CUSTOMER]

Multidimensional Analysis
Cube: common term for the representation of multidimensional information. Users can slice and dice the cube to drill down into the information.
- Cube A represents store information (the layers), product information (the rows), and promotion information (the columns)
- Cube B represents a slice of information displaying promotion II for all products at all stores
- Cube C represents a slice of information displaying promotion III for product B at store 2
CLASSROOM EXERCISE: Analyzing Multiple Dimensions of Information
Jump! is a company that specializes in making sports equipment, primarily basketballs, footballs, and soccer balls. The company currently sells to four primary distributors and buys all of its raw materials and manufacturing materials from a single vendor. Break your students into groups and ask them to develop a single cube of information that would give the company the greatest insight into its business (or business intelligence), given the following choices:
1. Product A, B, C, and D
2. Distributor X, Y, and Z
3. Promotion I, II, and III
4. Sales
5. Season
6. Date/Time
7. Salesperson Karen and John
8. Vendor Smithson
Remember that students can pick only 3 dimensions of information for the cube, so they need to pick the best 3.
Answer: Product, Sales, and Promotion. These give the three most business-critical pieces of information.

Figure 9-22: Example of drill-down
Starting with summary data, users can obtain details for particular cells.
[Figure panels: summary report; drill-down with color added]

Business Performance Mgmt (BPM) Figure 9-25 Sample Dashboard BPM systems allow managers to measure, monitor, and manage key activities and processes to achieve organizational goals. Dashboards are often used to provide an information system in support of BPM. Charts like these are examples of data visualization, the representation of data in graphical and multimedia formats for human analysis.

OLAP and its Applications
Which software and function enable you to create OLAP applications?
Answer: Excel with PivotTables

Multidimensional Analysis
Data mining: the process of analyzing data to extract information not offered by the raw data alone. To perform data mining, users need data-mining tools.
Data-mining tool: uses a variety of techniques to find patterns and relationships in large volumes of information and infers rules that predict future behavior and guide decision making. An example is the grocery store in the UK (see next slide).
Data mining can begin at a summary information level (coarse granularity) and progress through increasing levels of detail (drilling down), or the reverse (drilling up).
Data-mining tools include query tools, reporting tools, multidimensional analysis tools, statistical tools, and intelligent agents.
Ask your students to provide an example of what an accountant might discover through the use of data-mining tools.
Answer: An accountant could drill down into the details of all expenses and revenue and find valuable business intelligence, from which employees are spending the most money on long-distance phone calls to which customers are returning the most products.
Could the data warehousing team at Enron have discovered the accounting inaccuracies that caused the company to go bankrupt? If they did spot them, what should the team have done?

CRM and Data Mining (BI) Example
A grocery store in the U.K. found the following "patterns":
- Every Thursday afternoon, young fathers shop at the store (why?)
- Two items are always included in their shopping list: diapers and beer
What other decisions should a store manager make (in terms of store layout)? Consider short term vs. long term.
This is an example of cross-selling. Other types of promotion: up-sell, bundled-sell.
IT (e.g., BI) helps to find valuable information; decision makers then make timely, correct decisions that improve or create competitive advantage.

More on OLTP vs. OLAP
The figure depicts a relational database environment with two tables. The first table contains information about pet owners; the second, information about pets. The tables are related by the single column they have in common: Owner_ID. By relating tables to one another, we can reduce ____________ of data and improve database performance. The process of breaking tables apart and thereby reducing data redundancy is called _______________.
Fig. Extra-a: A simple database with a relation between two tables.
Answers: redundancy; normalization

OLTP vs. OLAP (cont.)
Most relational databases that are designed to handle a high number of reads and writes (updates and retrievals of information) are referred to as ________ (OnLine Transaction Processing) systems. OLTP systems are very efficient for high-volume activities such as cashiering, where many items are being recorded via bar code scanners in a very short period of time. However, using OLTP databases for analysis is generally not very efficient, because in order to retrieve data from multiple tables at the same time, a query containing ________ must be used.
Answers: OLTP; joins

OLTP vs. OLAP (cont.)
In order to keep our transactional databases running quickly and smoothly, we may wish to create a data warehouse. A data warehouse is a type of large database (including both current and historical data) that has been _____________ and archived. Denormalization is the process of intentionally combining some tables into a single table in spite of the fact that this may introduce duplicate data in some columns.
Fig. Extra-b: A combination of the tables into a single dataset.
The figure depicts what our simple example data might look like if it were in a data warehouse. When we design databases in this way, we reduce the number of joins necessary to query related data, thereby speeding up the process of analyzing our data. Databases designed in this manner are called __________ (OnLine Analytical Processing) systems.
Answers: denormalized; OLAP
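The pet-owner example can be reproduced in a few lines to show what denormalization does. Only the shared owner ID column comes from the slides (Owner_ID, written `owner_id` here); the other column names and the sample rows are made up for the sketch.

```python
# Normalized form: two tables related by owner_id (sample data is illustrative).
owners = [
    {"owner_id": 1, "owner_name": "Kim"},
    {"owner_id": 2, "owner_name": "Raj"},
]
pets = [
    {"pet_id": 10, "pet_name": "Rex", "owner_id": 1},
    {"pet_id": 11, "pet_name": "Tom", "owner_id": 1},
    {"pet_id": 12, "pet_name": "Bo", "owner_id": 2},
]

# Denormalize once, at copy time, so later analysis needs no join.
owner_by_id = {o["owner_id"]: o for o in owners}
denormalized = [
    {**pet, "owner_name": owner_by_id[pet["owner_id"]]["owner_name"]}
    for pet in pets
]

for row in denormalized:
    print(row)
```

The owner name "Kim" now appears in two rows: that duplicated data is the price paid for join-free queries.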

OLTP vs. OLAP (cont.) Transactional systems and analytical systems have conflicting purposes when it comes to database speed and performance. For this reason, it is difficult to design a single system that will serve both purposes. This is why data warehouses generally contain archived data. Archived data are data that have been copied out of a transactional database. Denormalization typically takes place at the time data are copied out of the transactional system. It is important to keep in mind that if a copy of the data is made in the data warehouse, the data may become out-of-synch. This happens when a copy is made in the data warehouse and then, later, a change to the original record is made in the source database. Data mining activities performed on out-of-synch records may be useless or, worse, misleading. An alternative archiving method would be to move the data out of the transactional system. This ensures that data won’t get out-of-synch; however, it also makes the data unavailable should a user of the transactional system need to view or update it.
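
A small sketch of how a copied record drifts out-of-synch, using plain Python dictionaries as stand-ins for the source database and the warehouse. The record layout, the key 101, and the helper name out_of_synch are all hypothetical.

```python
import copy

# Stand-in for the transactional (source) database, keyed by record ID.
source = {101: {"Owner_Name": "Ana", "Phone": "555-0100"}}

# Archiving by copying: the warehouse gets its own independent copy.
warehouse = copy.deepcopy(source)

# Later, the original record is changed in the source system only.
source[101]["Phone"] = "555-0199"


def out_of_synch(source, warehouse):
    """Return IDs whose warehouse copy no longer matches the source record."""
    return [key for key, record in warehouse.items() if source.get(key) != record]


stale_ids = out_of_synch(source, warehouse)
```

Here stale_ids contains 101, because the warehouse still holds the old phone number. Moving the record instead of copying it would avoid the mismatch, at the cost the slide notes: the transactional system could no longer view or update it.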

Data Mining Knowledge discovery using a blend of statistical, AI, and computer graphics techniques. Goals: explain observed events or conditions; confirm hypotheses; explore data for new or unexpected relationships.

DATA MINING Data-mining software includes many forms of AI, such as neural networks and expert systems. Data-mining tools apply algorithms to information sets to uncover inherent trends and patterns in the information. Analysts use this information to develop new business strategies and business solutions. Ask your students to identify an organization that would “not” benefit from investing in data warehousing and data-mining tools. Ans: None. CLASSROOM EXERCISE: Analyzing Multiple Dimensions of Information. Jump! is a company that specializes in making sports equipment, primarily basketballs, footballs, and soccer balls. The company currently sells to four primary distributors and buys all of its raw materials and manufacturing materials from a single vendor. Break your students into groups and ask them to develop a single cube of information that would give the company the greatest insight into its business (or business intelligence). Product: A, B, C, and D; Distributor: X, Y, and Z; Promotion: I, II, and III; Sales; Season; Date/Time; Salesperson: Karen and John; Vendor: Smithson.

Data Mining Examples A telephone company used a data mining tool to analyze its customer data warehouse. The data mining tool found about 10,000 supposedly residential customers who were spending over $1,000 monthly on phone bills. After further study, the phone company discovered that they were really small business owners trying to avoid paying business rates.

Data Mining Examples (cont.) 65% of customers who did not use the credit card in the last six months are 88% likely to cancel their accounts. If age < 30 and income <= $25,000 and credit rating < 3 and credit amount > $25,000 then the minimum loan term is 10 years. 82% of customers who bought a new TV 27" or larger are 90% likely to buy an entertainment center within the next 4 weeks.
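
Rules like the TV example are often scored with two measures: support (the share of all customers who match both the "if" and the "then" parts) and confidence (the share of customers matching the "if" part who also match the "then" part). The slide does not say how its percentages were derived, so the sketch below illustrates these two measures on made-up data. It is not a reconstruction of the slide's figures, and the item names and the helper rule_stats are hypothetical.

```python
# Hypothetical purchase histories: one set of items per customer.
customers = [
    {"tv", "entertainment_center"},
    {"tv", "entertainment_center"},
    {"tv"},
    {"stereo"},
    {"tv", "entertainment_center", "stereo"},
]


def rule_stats(baskets, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    # Customers whose purchases include every item in the antecedent.
    with_antecedent = [b for b in baskets if antecedent <= b]
    # Of those, the customers who also bought every item in the consequent.
    with_both = [b for b in with_antecedent if consequent <= b]
    support = len(with_both) / len(baskets)
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence


support, confidence = rule_stats(customers, {"tv"}, {"entertainment_center"})
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

On this toy data, four of five customers bought a TV and three of those also bought an entertainment center, giving a support of 0.60 and a confidence of 0.75.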

Sustainable Competitive Advantages Any sustainable competitive advantages? How can an organization sustain its competitive advantage? Firms may create/improve their competitive advantages only if they: have to learn, employ approach, capacity revenue management learning to learn and learning to change (life-long learning environment)

BUSINESS INTELLIGENCE Business intelligence – information that people use to support their decision-making efforts. Principal BI enablers include: Technology, People, Culture. Technology: Even the smallest company with BI software can do sophisticated analyses today that were unavailable to the largest organizations a generation ago. The largest companies today can create enterprisewide BI systems that compute and monitor metrics on virtually every variable important for managing the company. How is this possible? The answer is technology, the most significant enabler of business intelligence. People: Understanding the role of people in BI allows organizations to systematically create insight and turn these insights into actions. Organizations can improve their decision making by having the right people making the decisions. This usually means a manager who is in the field and close to the customer rather than an analyst rich in data but poor in experience. In recent years “business intelligence for the masses” has been an important trend, and many organizations have made great strides in providing sophisticated yet simple analytical tools and information to a much larger user population than previously possible. Culture: A key responsibility of executives is to shape and manage corporate culture. The extent to which the BI attitude flourishes in an organization depends in large part on the organization’s culture. Perhaps the most important step an organization can take to encourage BI is to measure the performance of the organization against a set of key indicators. The actions of publishing what the organization thinks are the most important indicators, measuring these indicators, and analyzing the results to guide improvement display a strong commitment to BI throughout the organization.

Working Smarter, Not Harder Overlapping Human/Organizational (Culture, Process)/Technological factors in BI/KM: People (workforce); Organizational Processes; Technology (IT infrastructure). IS – IT, Organization and Management. Figure labels: People, Organizational Processes, Technology, Knowledge.

Essential Value Propositions for a Successful Company: Business Model, Core Competency, Execution. Set corporate goals and get executive sponsorship for the initiative. “First, you have to have a business model; then the company needs to set corporate goals and get executive sponsorship for the initiative.” “Start with your business objectives for the product or service the company is selling, figure out where it is in the lifecycle, and determine which phase of CRM to focus on; for example, the company should determine whether it wants to focus on acquiring customers, retaining customers, or up-selling and cross-selling to customers.” Examples: Dell vs. Gateway and Toyota vs. GM/Ford

Relationship between the Organizational Knowledge and Core Competency Core competency: can be transferred and reused efficiently and effectively across functional areas (sharing and collaboration). Best practices: a specific business context. Top-down: Initiates an assessment of the firm’s core K, which is fundamental to the business. Outcome: the core K asset is identified. The firm develops a strategy and establishes the technical infrastructures, organizational mechanisms, and business processes necessary for managing the core K as an asset. The firm makes decisions on how to embed KM in everyday business processes. Bottom-up: Follows knowledge creation and reuse (the K “harvesting” process). K hunting refers to the process of collecting K, harvesting to the process of filtering, and hardening to the process of structuring tacit, useful K into explicit, reusable K. K hardening begins with applying harvested organizational K to a specific business context, which produces best practices. Once the best practice is developed in the critical business areas, the K that generalizes beyond the original context is identified and retained as a CORE COMPETENCY. Figure labels: IT, People, Culture, Organizational knowledge.

BI: Big Data And Data Warehousing Two paradigms in BI: Data Warehouse and Big Data. Both compete with each other in turning data into actionable information. However, in recent years, the variety and complexity of data have made the data warehouse incapable of keeping up with changing needs. Big Data: A new paradigm that the world of IT was forced to develop, not because of the volume of the structured data but because of the variety and the velocity. Velocity: the need to process data coming in at a rapid rate added velocity to the equation.

Introduction to Big Data Analytics Not just big! Volume, Variety, Velocity: structured, unstructured, or in a stream. Two aspects for studying “Big Data”: storing and processing/analyzing “Big Data”. Push computation to the data instead of pushing data to a computing node.

Break! (Ch. 9) Exercise #5 – a, b, c (p. 422). HW, with the following assumptions: 1. The length of a fiscal period is one month. 2. The data mart will contain five years of historical data. 3. Approximately 5 percent of the policies experience some type of change each month. HW #3 (p. 422): ALL computations for b & c should be shown to get credit.