McGraw-Hill-Ryerson©2015 The McGraw-Hill Companies, All Rights Reserved Opening Case: The Case for Business Intelligence at Netflix.


1 McGraw-Hill-Ryerson©2015 The McGraw-Hill Companies, All Rights Reserved Opening Case: The Case for Business Intelligence at Netflix

2 Copyright © 2015 McGraw-Hill Ryerson Limited
Chapter Four Overview
SECTION 4.1 – DATABASES
– Storing Transactional Data
– Relational Database Fundamentals
– Relational Database Advantages
– Database Management Systems
– Data-Driven Web Sites
SECTION 4.2 – DATA WAREHOUSING
– Accessing Organizational Information
– History of Data Warehousing
– Data Warehouse Fundamentals
– Data Mining

3 Learning Outcomes
1. Describe the structure of a relational database.
2. Describe the advantages of storing data in a relational database.
3. Explain how users interact with a database management system, the advantage of data-driven Web sites, and the primary methods of integrating data and information across multiple databases in organizations.
4. Describe data warehouse fundamentals and advantages.
5. Describe data mining and explain the relationship between data mining and data warehousing.

4 DATABASES

5 Storing Transactional Data (Learning Outcome 4.1)
Transactional data is stored in databases.
Database
– A collection of records
– The schema describes the data the database holds, the objects (data items) represented, and the relationships among them
– Database (or data) models describe how the schema is organized. The most common is the relational model, which uses multiple tables set up in rows and columns
Example of a Relational Database Table

6 Database Fundamentals (Learning Outcome 4.1)
Database models include:
– Hierarchical database model: information is organized into a tree-like structure that allows for repeating data. One parent record has many subordinate (or child) records.
– Network database model: a flexible way of representing objects and their relationships. Subordinate (or child) records can have many parent records, forming a complex, multi-dimensional lattice structure.
– Relational database model: stores data in the form of logically related two-dimensional tables.

7 Relational Database Fundamentals (Learning Outcome 4.1)
– Entity class: a category of person, place, thing or event about which information is stored.
– Entity: an individual person, place, or thing, or an individual occurrence of an event, about which information is stored.
– Table: collects the data for an entity class. For example, one table is for Customers, another for Orders, another for Products.
– Record: a row containing the data for one entity belonging to that class.
– Field (attribute): a column indicating a characteristic stored for each entity.
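To make the vocabulary above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the city column, and the second customer are hypothetical illustrations; only Customer ID 23 and Dave's Sub Shop come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The table collects the data for one entity class (Customer).
# Each column is a field (attribute); the column names are assumptions.
conn.execute("CREATE TABLE Customer (customer_id INTEGER, name TEXT, city TEXT)")

# Each row is a record describing one entity of that class.
# The cities and the second customer are made up for illustration.
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [(23, "Dave's Sub Shop", "Toronto"), (24, "Hypothetical Deli", "Ottawa")],
)

# PRAGMA table_info lists the columns; index 1 of each row is the column name.
fields = [row[1] for row in conn.execute("PRAGMA table_info(Customer)")]
records = conn.execute("SELECT * FROM Customer ORDER BY customer_id").fetchall()

print("Fields:", fields)
for record in records:
    print("Record:", record)
```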

8 Relationship Fundamentals (Learning Outcome 4.1)
From Figure 4.1: Potential Relational Database for the Coca-Cola Bottling Company
This is a fictitious sample order by Dave’s Sub Shop for Barq’s Root Beer from Coca-Cola, with transactional data to be stored in a database.

9 Relational Database Fundamentals: Relationship Fundamentals (Learning Outcome 4.1)
From Figure 4.1: Potential Relational Database for the Coca-Cola Bottling Company
– Example of an Entity Class (Table)
– Examples of Entities (Records)

10 Storing Data in a Relational Database (Learning Outcome 4.1)
Figure 4.1: Potential Relational Database for the Coca-Cola Bottling Company
Data is stored in tables according to its particular category.

11 Relating Data through Keys (Learning Outcome 4.1)
– Primary key: a field (or group of fields) whose values uniquely identify a given record in a table.
– Foreign key: a primary key of one table that appears as a field in another table. A value in the foreign key of one table corresponds to a value in the primary key of another table.
– Relationships: the data from one table is linked to another when the computer finds a match between the values in the primary key of one table and the values in the foreign key of another table.

12 Relational Database Fundamentals (Learning Outcome 4.1)
See Figure 4.1: Potential Relational Database for the Coca-Cola Bottling Company
1. The Customer table is linked to the Order table by means of the Customer ID field.
2. Customer ID is a primary key in the Customer table. The value 23 is a unique identifier for Dave’s Sub Shop.
3. Customer ID is the foreign key in the Order table. The value 23 identifies Dave’s Sub Shop and links the order to the customer.
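The link between the Customer and Order tables can be sketched with Python's sqlite3 module. This is an illustration under assumptions, not the textbook's schema: the table is named Orders here to avoid the SQL keyword ORDER, and the column names and order number 1 are hypothetical. Customer ID 23, Dave's Sub Shop, and Barq's Root Beer come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# customer_id is the primary key of Customer.
conn.execute("CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# customer_id appears again in Orders as a foreign key pointing back to Customer.
conn.execute(
    """
    CREATE TABLE Orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES Customer(customer_id),
        product TEXT NOT NULL
    )
    """
)

conn.execute("INSERT INTO Customer VALUES (?, ?)", (23, "Dave's Sub Shop"))
conn.execute("INSERT INTO Orders VALUES (?, ?, ?)", (1, 23, "Barq's Root Beer"))

# The join matches foreign-key values in Orders to primary-key values in Customer.
joined = conn.execute(
    """
    SELECT o.order_id, c.name, o.product
    FROM Orders AS o
    JOIN Customer AS c ON c.customer_id = o.customer_id
    """
).fetchall()
print(joined)
```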

13 Relational Database Advantages (Learning Outcome 4.2)
Increased Flexibility
– Handle changes quickly and easily
– Provide users with different views
– Have only one physical view
  Physical view: deals with the physical storage of information on a storage device
– Have multiple logical views
  Logical view: focuses on how users logically access information
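One way to picture a single physical view with several logical views is a base table plus SQL views, sketched here with Python's sqlite3 module. SQL views are only one possible mechanism, and the view names, the credit_limit column, and its value are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One stored table: the data is physically kept in one place.
conn.execute(
    "CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, name TEXT, credit_limit REAL)"
)
conn.execute("INSERT INTO Customer VALUES (?, ?, ?)", (23, "Dave's Sub Shop", 500.0))

# Two logical views over the same stored data, each tailored to a group of users.
conn.execute("CREATE VIEW SalesView AS SELECT customer_id, name FROM Customer")
conn.execute("CREATE VIEW CreditView AS SELECT name, credit_limit FROM Customer")

sales_rows = conn.execute("SELECT * FROM SalesView").fetchall()
credit_rows = conn.execute("SELECT * FROM CreditView").fetchall()
print(sales_rows)
print(credit_rows)
```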

14 Relational Database Advantages (Learning Outcome 4.2)
Increased Scalability and Performance
A database must be able to grow or shrink to meet changing demand while maintaining acceptable performance levels.
– Scalability refers to how well a system can adapt its capacity to changing demands.
– Performance measures how quickly a system performs a certain process or transaction.

15 Relational Database Advantages (Learning Outcome 4.2)
Reduced Data Redundancy
Data (information) redundancy is the duplication of information, or storing the same information in multiple places.
Problems include:
– Inconsistency of data describing the same thing.
– Wasted space, and wasted time entering and updating.
– Difficulty securing data in many places.

16 Relational Database Advantages (Learning Outcome 4.2)
Increased Information Integrity (Quality)
Information integrity measures the quality of information.
Integrity constraints are rules that help ensure the quality of information:
– Relational integrity constraints are rules enforcing data structures and the accurate storage, analysis, and display of information.
– Business-critical integrity constraints are rules supporting operational requirements such as return policies and credit terms.
– Integrity constraints support error reduction and increase the use of organizational data.
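As a rough sketch of the two kinds of constraint, the sqlite3 example below uses a foreign key as a relational integrity constraint and a CHECK clause as a business-critical one. The 500-dollar credit cap, the table layout, and the helper try_insert are hypothetical and stand in for whatever credit terms an organization actually sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE Orders (
        order_id INTEGER PRIMARY KEY,
        -- Relational integrity: every order must reference an existing customer.
        customer_id INTEGER NOT NULL REFERENCES Customer(customer_id),
        -- Business-critical integrity (hypothetical credit term): total capped at 500.
        order_total REAL NOT NULL CHECK (order_total <= 500)
    )
    """
)
conn.execute("INSERT INTO Customer VALUES (?, ?)", (23, "Dave's Sub Shop"))


def try_insert(order):
    """Attempt to insert an order tuple and report whether the constraints accepted it."""
    try:
        conn.execute("INSERT INTO Orders VALUES (?, ?, ?)", order)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    try_insert((1, 23, 120.0)),  # valid customer, within the credit cap
    try_insert((2, 99, 50.0)),   # customer 99 does not exist
    try_insert((3, 23, 900.0)),  # exceeds the hypothetical credit cap
]
for line in results:
    print(line)
```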

17 Relational Database Advantages (Learning Outcome 4.2)
Increased Security
Information is an organizational asset and must be protected.
Databases offer several security features, including:
– Password: provides authentication of the user
– Access level: determines who has access to the different types of information
– Access control: determines types of user access, such as read-only, read-write, and read-write-copy

18 Database Management Systems (DBMS) (Learning Outcome 4.3)
Software through which users and application programs interact with a database.
Figure 4.2: Interacting Directly and Indirectly with a Database Through a DBMS

19 Data-Driven Web Sites
An interactive Web site that uses a database to stay updated and relevant to the needs of its customers.
Figure: A Data-Driven Website
1) Search Engine
2) Database
3) Search Query Results

20 Data-Driven Web Site Advantages (See Figure)
– Development capability: allows the website owner to make changes at any time with little or no training.
– Content management capability: faster turnaround time and more accurate updates.
– Future expandability: easier changes to layout, displays, and functionality.
– Minimization of human error: “error-trapping” mechanisms ensure content and formats are correct.
– Lower production and update costs: data entry personnel are trained more quickly and are less expensive than programmers.
– More efficiency: the system cascades changes through the site.
– Better stability: the system tracks templates and source files.

21 Data Integration (Learning Outcome 4.3)
Allows separate systems to communicate directly with each other.
– Forward integration takes information entered into a given system and sends it automatically to all downstream systems and processes.
– Backward integration takes information entered into a given system and sends it automatically to all upstream systems and processes.

22 Forward and Backward Integration (Learning Outcome 4.3)
Figure 4.5: Forward and Backward Customer Data Integration

23 Integrated Customer Data
Figure: Integrated Customer Data

24 OPENING CASE QUESTIONS: The Case for Business Intelligence at Netflix
1. What is the impact to Netflix if the information contained in its database is of low quality?
2. Review the five common characteristics of high-quality information and rank them in order of importance to Netflix.
3. How might Netflix resolve issues of poor information in its customer movie reviews?
4. Identify the different types of entities that might be stored in Netflix's database.
5. Why is database technology so important to Netflix and its business model?

25 DATA WAREHOUSING

26 History of Data Warehousing (Learning Outcome 4.4)
In the 1990s, functional systems were too cumbersome and inefficient
– Operational systems and data were not integrated
– Little historic data, little trend information
– Quality issues
– Good for transaction processing, not analysis
At the turn of the millennium
– Data scattered over too many platforms
– Complex analysis was not timely

27 Data Warehouse Fundamentals (Learning Outcome 4.4)
Data warehouse
– A logical collection of information
– Gathered from many different operational databases
– Supports strategic business analysis activities and decision-making tasks
Primary purpose
– To aggregate information throughout an organization
– Not a location for ALL data, only data of interest

28 Characteristics of Data Warehouses (Learning Outcome 4.4)
– Subject oriented: information is organized around a major organizational subject area, e.g., customers
– Integrated: sourced from a variety of internal operational systems and external databases into a coherent whole
– Time-variant: time-stamped according to its cycle (daily, yearly, etc.)
– Non-volatile: once loaded, data does not change

29 Data Warehouse Fundamentals (Learning Outcome 4.4)
Extraction, transformation, and loading (ETL)
– A process that extracts information from internal and external databases,
– transforms the information using a common set of enterprise definitions, and
– loads the information into a data warehouse.
Data mart
– Contains a subset of data warehouse information
– Extracted to be analyzed for specific objectives
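The three ETL steps can be sketched in a few lines of Python. Everything here is a hypothetical illustration: the two source systems, their field names, the sample rows, and the "common enterprise definition" (an upper-cased customer name and a numeric amount) are assumptions, and a plain list stands in for the warehouse.

```python
def extract():
    """Pull rows from two hypothetical operational systems with different layouts."""
    sales_system = [{"cust": " dave's sub shop", "amt": "120.50"}]
    billing_system = [{"customer_name": "PIZZA PALACE ", "total": 80}]
    return sales_system, billing_system


def transform(sales_rows, billing_rows):
    """Map both layouts onto one common definition: customer (upper case), amount (float)."""
    common = []
    for row in sales_rows:
        common.append({"customer": row["cust"].strip().upper(), "amount": float(row["amt"])})
    for row in billing_rows:
        common.append(
            {"customer": row["customer_name"].strip().upper(), "amount": float(row["total"])}
        )
    return common


def load(rows, warehouse):
    """Append the transformed rows to the warehouse (here, just a list)."""
    warehouse.extend(rows)
    return warehouse


warehouse = load(transform(*extract()), [])
for row in warehouse:
    print(row)
```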

30 Model of a Typical Data Warehouse (figure)

31 Multi-dimensional Analysis (Learning Outcome 4.4)
Databases contain information in two-dimensional tables: rows and columns.
Data warehouse information is multi-dimensional: layers of rows and columns.
– Each dimension is a particular characteristic of the information; an attribute.
– Cube is a common term for the representation of multi-dimensional information.

32 Multi-dimensional Analysis
Figure: A Cube of Information for Performing Multi-Dimensional Analysis on Three Stores for Five Products and Four Promotions
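A cube with three stores, five products, and four promotions can be sketched as a Python dictionary keyed by one value from each dimension. The store, product, and promotion labels and the sales numbers are placeholders, and the helper slice_total is a hypothetical name.

```python
from itertools import product

stores = ["Store 1", "Store 2", "Store 3"]
products = ["Product 1", "Product 2", "Product 3", "Product 4", "Product 5"]
promotions = ["Promo I", "Promo II", "Promo III", "Promo IV"]

# Each cell of the cube is addressed by (store, product, promotion).
# The values are deterministic placeholder sales figures, not real data.
cube = {
    (s, p, r): (si + 1) * 100 + (pi + 1) * 10 + (ri + 1)
    for (si, s), (pi, p), (ri, r) in product(
        enumerate(stores), enumerate(products), enumerate(promotions)
    )
}


def slice_total(store=None, product_name=None, promotion=None):
    """Sum the cells that match the given dimension values (None means 'all')."""
    return sum(
        value
        for (s, p, r), value in cube.items()
        if (store is None or s == store)
        and (product_name is None or p == product_name)
        and (promotion is None or r == promotion)
    )


print(len(cube))                       # number of cells: 3 x 5 x 4
print(slice_total(store="Store 1"))    # one layer of the cube
print(slice_total())                   # the whole cube
```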

33 Information Cleansing or Scrubbing (Learning Outcome 4.4)
– A process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information.
– Software tools use sophisticated algorithms to parse, standardize, correct, match, and consolidate warehouse information.
– Cleansing is done during the ETL process and again once the information is in the warehouse.
– Critical when data exists in several operational systems.
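A toy version of the standardize, discard, and consolidate steps might look like the sketch below. The sample records, the rule that a valid phone number has exactly ten digits, and the function name cleanse are all assumptions made for illustration; commercial cleansing tools use far more sophisticated matching.

```python
def cleanse(records):
    """Standardize names and phones, discard incomplete records, merge duplicates."""
    cleaned = {}
    for rec in records:
        # Standardize: collapse whitespace, upper-case the name, keep only phone digits.
        name = " ".join((rec.get("name") or "").split()).upper()
        phone = "".join(ch for ch in (rec.get("phone") or "") if ch.isdigit())
        # Discard incomplete or incorrect records (hypothetical 10-digit rule).
        if not name or len(phone) != 10:
            continue
        # Consolidate: records with the same standardized name and phone are merged.
        cleaned[(name, phone)] = {"name": name, "phone": phone}
    return list(cleaned.values())


# Hypothetical contact data as it might appear in different operational systems.
raw = [
    {"name": "Dave's  Sub Shop", "phone": "(416) 555-0100"},
    {"name": "dave's sub shop ", "phone": "416-555-0100"},
    {"name": "", "phone": "416-555-0199"},
    {"name": "Pizza Palace", "phone": "555-0123"},
]
result = cleanse(raw)
print(result)
```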

34 Information Cleansing or Scrubbing
Figure: Customer Contact Data in Operational Systems

35 Standardizing Customer Name from Operational Systems (figure)

36 Information Cleansing or Scrubbing (figure)

37 Accurate and Complete Information (figure)

38 Data Mining (Learning Outcome 4.5)
The process of analyzing data to extract information.
– Drilling down progresses through increasing levels of detail.
– Drilling up works through increasing levels of summarization.
Data mining tools
– A variety of techniques that find patterns and relationships in large volumes of information.
– Specialized technologies and functionalities, including query tools, reporting tools, statistical tools, and intelligent agents.

39 Data Mining Activities (Learning Outcome 4.5)
Apply algorithms to information sets to uncover inherent trends and patterns, which are used to develop new business strategies.
– Classification: assigning records to one of a pre-defined set of classes
– Estimation: determining the values for an unknown continuous variable behaviour
– Affinity grouping: determining which things go together
– Clustering: breaking up a heterogeneous population of records into a number of more homogeneous subgroups

40 Data Mining Output (Learning Outcome 4.5)
Figure 4.13: SC Johnson: Changes in Consumer Environmental Behaviour

41 Data Mining Techniques (Learning Outcome 4.5)
Cluster analysis
A statistical technique used to divide an information set into mutually exclusive groups such that the members of each group are as close together as possible and the different groups are as far apart as possible.
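The slides do not name a particular clustering algorithm. As one common example, the sketch below runs a bare-bones one-dimensional k-means: each value joins the group with the nearest centre, then each centre moves to the mean of its group. The data (say, visits per customer), the starting centres, and the function name are hypothetical.

```python
def k_means_1d(values, centers, iterations=10):
    """Assign each value to its nearest centre, then move each centre to its group mean."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # An empty group keeps its previous centre.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical data: two obvious groups of customers by number of visits.
visits = [1, 2, 3, 10, 11, 12]
centers, groups = k_means_1d(visits, centers=[0.0, 5.0])
print(centers)
print(groups)
```

Each value ends up in exactly one group, which matches the "mutually exclusive groups" wording in the definition above.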

42 Association Detection (Learning Outcome 4.5)
Association detection
Reveals the relationship between variables, along with the nature and frequency of the relationships.
Rule generators
– Form business rules from the data mining applications
– Predict business events and their probability of occurrence
Market basket analysis
– Analyzes websites and checkout scanners
– Predicts future buyer behaviour
Figure 4.14: Data Collection for Market Basket Analysis
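Market basket analysis is often summarized with two measures, support (how often two items are bought together) and confidence (how often the second item appears given the first). The sketch below computes both for made-up baskets; the items, the baskets, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical checkout-scanner data: one set of items per transaction.
baskets = [
    {"sub", "root beer"},
    {"sub", "root beer", "chips"},
    {"sub", "chips"},
    {"root beer"},
]

# How many baskets contain each item, and each unordered pair of items.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    frozenset(pair) for basket in baskets for pair in combinations(sorted(basket), 2)
)


def support(a, b):
    """Fraction of all baskets that contain both a and b."""
    return pair_counts[frozenset((a, b))] / len(baskets)


def confidence(a, b):
    """Fraction of the baskets containing a that also contain b."""
    return pair_counts[frozenset((a, b))] / item_counts[a]


print(support("sub", "root beer"))
print(confidence("sub", "root beer"))
print(confidence("chips", "sub"))
```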

43 Statistical Analysis (Learning Outcome 4.5)
Performs such functions as information correlations, distributions, calculations, and variance analysis.
– SENECA defines qualitative variables and assigns them numerical scales, then builds models, forecasts, and trends based on consumer testing.
– Forecast: a prediction made on the basis of time-series information.
– Time-series information: data collected at regular, equally spaced periods. Used for trend analysis.
– Many large vendors provide end-to-end data mining decision tools with predictive analytical capabilities.
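As a very simple illustration of forecasting from time-series information, the sketch below predicts the next period as the average of the most recent periods (a moving average). This is only one basic technique, not the method any particular vendor tool uses, and the monthly sales figures are made up.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    return sum(series[-window:]) / window


# Hypothetical time series: sales collected at regular monthly intervals.
monthly_sales = [100, 110, 120, 130]
forecast = moving_average_forecast(monthly_sales, window=3)
print(forecast)
```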

44 OPENING CASE QUESTIONS: The Case for Business Intelligence at Netflix
6. Why must Netflix cleanse or scrub the information in its database?
7. Choose one of the three common forms of data mining analysis and explain how Netflix is using it to gain BI.
8. How might Netflix be using tactical, operational, and strategic BI?

45 CLOSING CASE ONE: Scouting for Quality
1. Explain the importance of high-quality information for Scouts Canada.
2. Review the five common characteristics of high-quality information and rank them in order of importance for Scouts Canada.
3. How could data warehouses and data marts be used to help Scouts Canada improve the efficiency and effectiveness of its operations? Its decision making?

46 CLOSING CASE ONE: Scouting for Quality
4. What kinds of data marts might Scouts Canada want to build to help it analyze its operational performance?
5. Do the managers at Scouts Canada actually have all of the information they require to make an accurate decision? Explain the statement “it is never possible to have all of the information required to make the best decision possible.”

47 CLOSING CASE TWO: Searching for Revenue: Google
1. Review the five common characteristics of high-quality information and rank them in order of importance to Google’s business.
2. What would be the ramifications for Google’s business if the search information it presented to its customers was of low quality?
3. Describe the different types of databases. Why should Google use a relational database?
4. Identify the different types of entities, entity classes, attributes, keys, and relationships that might be stored in Google’s AdWords relational database.

48 CLOSING CASE TWO: Searching for Revenue: Google
5. How might Google use a data warehouse to improve its business operations?
6. Why would Google need to scrub and cleanse the information in its data warehouse?
7. Identify a data mart that Google’s marketing and sales department might use to track and analyze its AdWords revenue.

49 CLOSING CASE THREE: Caesars - Gambling Big on Technology
1. Identify the effects poor information might have on Caesar’s service-oriented business strategy.
2. How does Caesar’s use database technologies to implement its service-oriented strategy?
3. Caesar’s was one of the first casino companies to find value in offering rewards to customers who visit multiple Caesar’s locations. Describe the effects on the company if it did not build any integration among the databases located at each of its casinos. How could Caesar’s use distributed databases or a data warehouse to synchronize customer information?

50 CLOSING CASE THREE: Caesars - Gambling Big on Technology
4. Estimate the potential impact to Caesar’s business if there is a security breach in its customer information.
5. Identify three different types of data marts Caesar’s might want to build to help it analyze its operational performance.
6. What might occur if Caesar’s fails to clean or scrub its information before loading it into its data warehouse?
7. Describe cluster analysis, association detection, and statistical analysis and explain how Caesar’s could use each one to gain insights into its business.

