Section 6 E-Biz & DATABASE Special thanks to Dr. George M. Marakas
6-2 LEARNING OUTCOMES
– List, describe, and provide an example of each of the five characteristics of high-quality information
– Define the relationship between a database and a database management system
– Describe the advantages an organization can gain by using a database
6-3 UNDERSTANDING INFORMATION
Information is everywhere in an organization. Employees must be able to obtain and analyze the many different levels, formats, and granularities of organizational information to make decisions. Successfully collecting, compiling, sorting, and analyzing information can provide tremendous insight into how an organization is performing.
6-4 UNDERSTANDING INFORMATION
Information granularity – refers to the extent of detail within the information (fine and detailed, or coarse and abstract). Organizational information varies in its:
– Levels
– Formats
– Granularities
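To make granularity concrete, here is a minimal Python sketch that rolls fine-grained (daily, per-store) sales records up to a coarser monthly total. The record layout and figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical fine-grained records: (date, store, amount)
daily_sales = [
    ("2024-01-03", "Store A", 120.0),
    ("2024-01-17", "Store B", 80.0),
    ("2024-02-05", "Store A", 200.0),
    ("2024-02-21", "Store B", 50.0),
]

def roll_up_by_month(records):
    """Aggregate daily, per-store sales into coarser monthly totals."""
    monthly = defaultdict(float)
    for date, _store, amount in records:
        month = date[:7]  # keep only "YYYY-MM"
        monthly[month] += amount
    return dict(monthly)

print(roll_up_by_month(daily_sales))  # {'2024-01': 200.0, '2024-02': 250.0}
```

The daily list can answer questions about one store on one day; the monthly view gives up that detail in exchange for a summary suited to trend reporting.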
6-5 Information Quality
Business decisions are only as good as the quality of the information used to make them. Characteristics of high-quality information include:
– Accuracy
– Completeness
– Consistency
– Uniqueness
– Timeliness
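Several of these characteristics can be checked mechanically. The Python sketch below, assuming a hypothetical list of customer records, flags violations of completeness, uniqueness, and consistency. Accuracy and timeliness usually need an outside reference (a trusted source, a required refresh date), so they are not checked here.

```python
# Hypothetical customer records; None marks a missing value.
customers = [
    {"id": 101, "name": "Jane Doe", "phone": "555-0100", "state": "CA"},
    {"id": 102, "name": "John Smith", "phone": None, "state": "ca"},
    {"id": 101, "name": "Jane Doe", "phone": "555-0100", "state": "CA"},
]

def incomplete(records):
    """Completeness: IDs of records with any missing field."""
    return [r["id"] for r in records if any(v is None for v in r.values())]

def duplicate_ids(records):
    """Uniqueness: IDs that appear more than once."""
    seen, dups = set(), set()
    for r in records:
        if r["id"] in seen:
            dups.add(r["id"])
        seen.add(r["id"])
    return sorted(dups)

def inconsistent_states(records):
    """Consistency: state codes not written as two uppercase letters."""
    return [r["id"] for r in records
            if not (len(r["state"]) == 2 and r["state"].isalpha() and r["state"].isupper())]

print(incomplete(customers))           # [102]
print(duplicate_ids(customers))        # [101]
print(inconsistent_states(customers))  # [102]
```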
6-6 Information Quality Low quality information example
6-7 Understanding the Costs of Poor Information
The four primary sources of low-quality information are:
1. Online customers intentionally enter inaccurate information to protect their privacy
2. Information from different systems has different entry standards and formats
3. Call center operators enter abbreviated or erroneous information by accident or to save time
4. Third-party and external information contains inconsistencies, inaccuracies, and errors
6-8 Understanding the Costs of Poor Information
Potential business effects resulting from low-quality information include:
– Inability to accurately track customers
– Difficulty identifying valuable customers
– Inability to identify selling opportunities
– Marketing to nonexistent customers
– Difficulty tracking revenue due to inaccurate invoices
– Inability to build strong customer relationships
6-9 Understanding the Benefits of Good Information
High-quality information can significantly improve the chances of making a good decision. Good decisions can directly impact an organization's bottom line.
6-10 DATABASE FUNDAMENTALS
Information is everywhere in an organization, and it is stored in databases.
– Database – maintains information about various types of objects (inventory), events (transactions), people (employees), and places (warehouses)
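To illustrate the relationship between a database and a database management system, here is a minimal Python sketch that uses the standard-library sqlite3 module as the DBMS. The database is the set of tables it manages; each hypothetical table holds one of the kinds of information listed above.

```python
import sqlite3

# The DBMS (SQLite, via Python's sqlite3 module) creates, stores, and queries the database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- places
    CREATE TABLE warehouses (id INTEGER PRIMARY KEY, city TEXT);
    -- people
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT,
                            warehouse_id INTEGER REFERENCES warehouses(id));
    -- objects
    CREATE TABLE inventory (sku TEXT PRIMARY KEY, description TEXT, qty INTEGER);
    -- events
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, sku TEXT REFERENCES inventory(sku),
                               qty INTEGER, employee_id INTEGER REFERENCES employees(id));
""")
conn.execute("INSERT INTO warehouses VALUES (1, 'Denver')")
conn.execute("INSERT INTO employees VALUES (1, 'Ana', 1)")
conn.execute("INSERT INTO inventory VALUES ('SKU-1', 'Widget', 40)")
conn.execute("INSERT INTO transactions VALUES (1, 'SKU-1', 5, 1)")

# Users ask the DBMS for information; it handles storage and retrieval.
row = conn.execute("""
    SELECT e.name, w.city, t.qty
    FROM transactions t
    JOIN employees e ON e.id = t.employee_id
    JOIN warehouses w ON w.id = e.warehouse_id
""").fetchone()
print(row)  # ('Ana', 'Denver', 5)
```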
6-11 DATABASE ADVANTAGES
Database advantages from a business perspective include:
– Increased flexibility
– Increased scalability and performance
– Reduced information redundancy
– Increased information integrity (quality)
– Increased information security
6-12 Increased Flexibility
A well-designed database should:
– Handle changes quickly and easily
– Provide users with different views
– Have only one physical view
   Physical view – deals with the physical storage of information on a storage device
– Have multiple logical views
   Logical view – focuses on how users logically access information
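The one-physical-view, many-logical-views idea maps directly onto SQL views. In this minimal sketch (Python's sqlite3 module; the table and view names are hypothetical), the orders are stored once, and two different users see them through two different logical views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One physical view: a single stored table.
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, "Acme", "West", 250.0),
    (2, "Birch", "East", 100.0),
    (3, "Acme", "West", 50.0),
])

# Multiple logical views over the same stored data, tailored to different users.
conn.execute("""CREATE VIEW sales_by_region AS
                SELECT region, SUM(amount) AS total FROM orders GROUP BY region""")
conn.execute("""CREATE VIEW west_customers AS
                SELECT DISTINCT customer FROM orders WHERE region = 'West'""")

print(conn.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall())
# [('East', 100.0), ('West', 300.0)]
print(conn.execute("SELECT * FROM west_customers").fetchall())
# [('Acme',)]
```

Because views store no data of their own, a change to the orders table shows up in both views immediately.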
6-13 INTEGRATING DATA AMONG MULTIPLE DATABASES
Integration – allows separate systems to communicate directly with each other
– Forward integration – takes information entered into a given system and sends it automatically to all downstream systems and processes
– Backward integration – takes information entered into a given system and sends it automatically to all upstream systems and processes
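A toy Python sketch of forward and backward integration, assuming a hypothetical chain of four systems ordered from upstream to downstream, each with its own copy of customer records. Real integrations use messaging or replication infrastructure; this only shows the direction in which updates flow.

```python
# Hypothetical chain of systems, ordered from upstream to downstream.
SYSTEMS = ["sales", "order_fulfillment", "billing", "customer_service"]

# Each system keeps its own copy of customer records.
stores = {name: {} for name in SYSTEMS}

def enter(system, customer_id, record, forward=True, backward=True):
    """Enter a record in one system and propagate it along the chain."""
    stores[system][customer_id] = dict(record)
    position = SYSTEMS.index(system)
    if forward:   # forward integration: push to every downstream system
        for name in SYSTEMS[position + 1:]:
            stores[name][customer_id] = dict(record)
    if backward:  # backward integration: push to every upstream system
        for name in SYSTEMS[:position]:
            stores[name][customer_id] = dict(record)

# A phone change entered in billing reaches every other system.
enter("billing", 42, {"phone": "555-0199"})
print(all(stores[name][42] == {"phone": "555-0199"} for name in SYSTEMS))  # True
```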
6-14 INTEGRATING DATA AMONG MULTIPLE DATABASES Forward and backward integration
6-15 INTEGRATING DATA AMONG MULTIPLE DATABASES Building a central repository specifically for integrated information
Data Warehouse Data Mining in eBiz
6-17 LEARNING OUTCOMES
– Describe the roles and purposes of data warehouses and data marts in an organization
– Compare the multidimensional nature of data warehouses (and data marts) with the two-dimensional nature of databases
– Identify the importance of ensuring the cleanliness of information throughout an organization
– Explain the relationship between business intelligence and a data warehouse
6-18 HISTORY OF DATA WAREHOUSING
Data warehouses extend the transformation of data into information. In the 1990s, executives became less concerned with day-to-day business operations and more concerned with overall business functions. The data warehouse provided the ability to support decision making without disrupting day-to-day operations.
6-19 DATA WAREHOUSE FUNDAMENTALS
Data warehouse – a logical collection of information, gathered from many different operational databases, that supports business analysis activities and decision-making tasks
The primary purpose of a data warehouse is to aggregate information throughout an organization into a single repository for decision-making purposes.
6-20 DATA WAREHOUSE FUNDAMENTALS
Extraction, transformation, and loading (ETL) – a process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouse
Data mart – contains a subset of data warehouse information
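The three ETL steps and the idea of a data mart can be sketched in a few lines of Python. The two source layouts, the region-code mapping, and all table names are hypothetical; the data mart is shown as a view for brevity, although in practice it is often a separate physical store.

```python
import sqlite3

# Extract: rows from two hypothetical operational sources with different conventions.
crm_rows = [{"cust": "ACME CORP", "region": "W", "sale": "250.00"}]
web_rows = [{"customer_name": "Birch Ltd", "region_name": "East", "amount_cents": 10000}]

REGION_CODES = {"W": "West", "E": "East"}  # assumed enterprise definition of regions

def transform():
    """Map each source onto one common set of enterprise definitions."""
    for r in crm_rows:
        yield (r["cust"].title(), REGION_CODES[r["region"]], float(r["sale"]))
    for r in web_rows:
        yield (r["customer_name"].title(), r["region_name"], r["amount_cents"] / 100)

# Load: write the transformed rows into the warehouse table.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE sales_fact (customer TEXT, region TEXT, amount REAL)")
dw.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", transform())

# Data mart: a subset of warehouse information for one department (here, the West region).
dw.execute("CREATE VIEW west_mart AS SELECT * FROM sales_fact WHERE region = 'West'")
print(dw.execute("SELECT * FROM west_mart").fetchall())  # [('Acme Corp', 'West', 250.0)]
```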
6-21 DATA WAREHOUSE FUNDAMENTALS
6-22 From Data Warehousing to Data Mining
6-23 Multidimensional Analysis
Databases contain information in a series of two-dimensional tables. In a data warehouse or data mart, information is multidimensional: it contains layers of columns and rows.
– Dimension – a particular attribute of information
6-24 Multidimensional Analysis Cube – common term for the representation of multidimensional information
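A cube can be sketched in Python as a mapping from one coordinate per dimension to a measure. The dimensions (product, region, quarter) and the unit counts below are hypothetical; the two functions show the common cube operations of slicing (fixing one dimension) and rolling up (summing out dimensions).

```python
from collections import defaultdict

# Hypothetical cube cells: (product, region, quarter) -> units sold.
cube = {
    ("Pizza", "East", "Q1"): 10, ("Pizza", "West", "Q1"): 7,
    ("Cola",  "East", "Q1"): 12, ("Cola",  "West", "Q2"): 9,
    ("Pizza", "East", "Q2"): 4,
}
DIMS = ("product", "region", "quarter")

def slice_cube(cube, dim, value):
    """Fix one dimension to a single value (a 'slice')."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}

def roll_up(cube, keep):
    """Sum out every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[tuple(key[i] for i in idx)] += value
    return dict(totals)

print(roll_up(cube, ["product"]))                               # {('Pizza',): 21, ('Cola',): 21}
print(roll_up(slice_cube(cube, "quarter", "Q1"), ["region"]))  # {('East',): 22, ('West',): 7}
```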
6-25 Multidimensional Analysis
Data mining – the process of analyzing data to extract information not offered by the raw data alone
To perform data mining, users need data-mining tools:
– Data-mining tool – uses a variety of techniques to find patterns and relationships in large volumes of information and infers rules that predict future behavior and guide decision making
6-26 Information Cleansing or Scrubbing
An organization must maintain high-quality data in the data warehouse.
Information cleansing or scrubbing – a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information
6-27 Information Cleansing or Scrubbing Contact information in an operational system
6-28 Information Cleansing or Scrubbing Standardizing Customer Names from Operational Systems
6-29 Information Cleansing or Scrubbing
6-30 Information Cleansing or Scrubbing Accurate and complete information
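Standardizing customer names from several operational systems can be illustrated with a minimal Python sketch. The three spellings, the list of titles, and the rule of dropping middle initials are all illustrative assumptions; commercial cleansing tools use far richer matching.

```python
import re

# Hypothetical spellings of one customer's name in different operational systems.
raw_names = {
    "billing":   "SMITH, JOHN A.",
    "sales":     "john smith",
    "web_store": "  Mr. John  Smith ",
}

TITLES = {"mr", "mrs", "ms", "dr"}  # assumed honorifics to discard

def standardize(name):
    """Return a 'First Last' form: fix order and case, drop titles and initials."""
    name = name.strip()
    if "," in name:  # "LAST, FIRST ..." becomes "FIRST ... LAST"
        last, first = [p.strip() for p in name.split(",", 1)]
        name = f"{first} {last}"
    tokens = [t.strip(".").lower() for t in re.split(r"\s+", name) if t]
    # Heuristic (assumption): single letters are middle initials and are dropped.
    tokens = [t for t in tokens if t not in TITLES and len(t) > 1]
    return " ".join(t.capitalize() for t in tokens)

standard = {system: standardize(n) for system, n in raw_names.items()}
print(standard)
print(len(set(standard.values())))  # 1: all three records now match
```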
6-31 DATA MINING Data-mining software includes many forms of AI, such as neural networks and expert systems.
6-32 Data Mining’s Growth in Popularity
One reason is that organizations keep accumulating more data and need tools to understand it. Another is that the human brain has trouble processing multidimensional data. A third is that machine learning techniques are becoming both more affordable and more refined.
6-33 DATA MINING
Common forms of data-mining analysis capabilities include:
– Cluster analysis
– Association detection
– Statistical analysis
6-34 Cluster Analysis
Cluster analysis – a technique used to divide an information set into mutually exclusive groups such that the members of each group are as close together as possible and the different groups are as far apart as possible
CRM systems depend on cluster analysis to segment customer information and identify behavioral traits.
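The slide does not name a clustering algorithm; k-means is one widely used method that matches the definition, and it is swapped in here as an illustration. This minimal Python sketch groups hypothetical customers described by visits per month and average spend.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on 2-D points: returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the group whose centroid is closest.
        labels = [min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                              + (p[1] - centroids[c][1]) ** 2)
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, labels

# Hypothetical customers: (visits per month, average spend in dollars)
customers = [(1, 20), (2, 25), (1, 22), (9, 180), (10, 200), (8, 190)]
centroids, labels = kmeans(customers, k=2)
print(labels)  # occasional low spenders in one group, frequent high spenders in the other
```

The assignment step keeps members of a group close together, and the update step recenters each group, so well-separated segments end up as different clusters.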
6-35 Cluster Example
6-36 Statistical Analysis
Statistical analysis – performs such functions as information correlations, distributions, calculations, and variance analysis
– Forecast – predictions made on the basis of time-series information
– Time-series information – time-stamped information collected at a particular frequency
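A forecast built from time-series information can be as simple as a moving average. The Python sketch below uses that simple method (one of many forecasting techniques, chosen here only for illustration) on a hypothetical series of monthly sales.

```python
# Hypothetical time-series information: monthly unit sales, oldest first.
monthly_sales = [120, 130, 125, 140, 150, 145]

def moving_average_forecast(series, window=3):
    """Forecast the next period as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("need at least `window` observations")
    recent = series[-window:]
    return sum(recent) / window

print(moving_average_forecast(monthly_sales))  # (140 + 150 + 145) / 3 = 145.0
```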
6-37 Association Detection
Association detection – reveals the degree to which variables are related and the nature and frequency of these relationships in the information
– Market basket analysis – analyzes such items as Web sites and checkout scanner information to detect customers’ buying behavior and predict future behavior by identifying affinities among customers’ choices of products and services
6-38 Making Accurate Predictions with Data Mining
Although the literature contains statements such as “data mining will allow us to predict who will buy a particular product,” such precise prediction of individual behavior runs against human nature. In situations where data mining is used to predict response to a marketing campaign, only about 5% of the people selected as “likely respondents” actually respond.
6-39 Making Accurate Predictions with Data Mining (cont.) Although the accuracy of predicting individual behavior is not so good, it is better than it seems, since direct marketing efforts often have “hit rates” of only about 1% without data mining.
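The comparison across these two slides is a ratio of response rates, often called lift. Taking the slides' approximate rates at face value and assuming, purely for illustration, a mailing of 10,000 pieces:

```latex
\begin{aligned}
\text{lift} &= \frac{\text{response rate with data mining}}{\text{response rate without data mining}} = \frac{5\%}{1\%} = 5 \\
\text{responses per } 10{,}000 \text{ mailed} &: \quad 10{,}000 \times 0.05 = 500 \quad \text{vs.} \quad 10{,}000 \times 0.01 = 100
\end{aligned}
```

So even though 95% of the selected people do not respond, the targeted list yields five times as many responses per piece mailed.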
6-40 Online Analytical Processing (OLAP)
Codd developed a set of 12 rules for the development of multidimensional databases:
1. Multidimensional view
2. Transparent to user
3. Accessible
4. Consistent reporting
5. Client-server architecture
6. Generic dimensionality
7. Dynamic sparse matrix handling
8. Multiuser support
9. Cross-dimensional ops
10. Intuitive manipulation
11. Flexible reporting
12. Unlimited dimension and aggregation
6-41 OLAP as Implemented To date, it does not appear that any implementation exists that satisfies all 12 rules. Some people argue it might not even be possible to attain all of them. More recently, the term OLAP has come to represent the broad category of software technology that enables multidimensional analysis of enterprise data.
6-42 Multidimensional OLAP (MOLAP)
Data can be viewed across several dimensions. Here, sales are arrayed by region and product. A fourth dimension could be added by using several graphs, perhaps at different time points. Most analyses have many more dimensions than this. MOLAP handles data as an n-dimensional hypercube.
6-43 Relational OLAP (ROLAP)
A large relational database server replaces the multidimensional one. The database contains both detailed and summarized data, allowing “drill down” techniques to be applied. SQL interfaces allow vendors to build tools that are both portable and scalable. However, this approach requires many relational tables, which may lead to substantial processor overhead on complex joins.
6-44 A Typical Relational Schema
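The ROLAP approach can be sketched with an ordinary relational engine. This minimal example uses Python's sqlite3 module and a small hypothetical star schema (one fact table, two dimension tables): a GROUP BY gives the summary level, and adding a join and a filter drills down into one region. Every extra dimension adds another join, which is where the processor overhead mentioned above comes from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A small star schema: one fact table plus two dimension tables (names are hypothetical).
conn.executescript("""
    CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE region_dim  (region_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales_fact  (product_id INTEGER, region_id INTEGER, units INTEGER);
    INSERT INTO product_dim VALUES (1, 'Pizza'), (2, 'Cola');
    INSERT INTO region_dim  VALUES (1, 'East'), (2, 'West');
    INSERT INTO sales_fact  VALUES (1, 1, 10), (2, 1, 12), (1, 2, 7), (2, 2, 9);
""")

# Summary level: total units by region.
summary = conn.execute("""
    SELECT r.name, SUM(f.units) FROM sales_fact f
    JOIN region_dim r ON r.region_id = f.region_id
    GROUP BY r.name ORDER BY r.name""").fetchall()
print(summary)  # [('East', 22), ('West', 16)]

# Drill down: break one region's total out by product.
detail = conn.execute("""
    SELECT p.name, SUM(f.units) FROM sales_fact f
    JOIN region_dim r  ON r.region_id  = f.region_id
    JOIN product_dim p ON p.product_id = f.product_id
    WHERE r.name = ? GROUP BY p.name ORDER BY p.name""", ("East",)).fetchall()
print(detail)  # [('Cola', 12), ('Pizza', 10)]
```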
6-45 Techniques Used to Mine the Data
Paralleling the popularity of data mining itself, the development of new techniques is exploding as well. Many innovations are vendor-specific, which sometimes does little to advance the state of the art. Regardless, data-mining techniques tend to fall into four major categories:
1. Classification
2. Association
3. Sequencing
4. Clustering
6-46 Classification methods The goal is to discover rules that define whether an item belongs to a particular subset or class of data. For example, if we are trying to determine which households will respond to a direct mail campaign, we will want rules that separate the “probables” from the not probables. These IF-THEN rules often are portrayed in a tree-like structure.
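Discovering IF-THEN rules can be illustrated with OneR (Holte's 1R), a very simple classifier swapped in here because its output is a one-level rule tree; full decision-tree methods split recursively. The household attributes and mailing outcomes below are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical households from a past mailing: attributes plus whether they responded.
households = [
    {"income": "high", "age": "young", "respond": "yes"},
    {"income": "high", "age": "old",   "respond": "yes"},
    {"income": "low",  "age": "young", "respond": "no"},
    {"income": "low",  "age": "old",   "respond": "no"},
    {"income": "high", "age": "young", "respond": "no"},
    {"income": "low",  "age": "young", "respond": "yes"},
]

def one_rule(rows, attributes, target):
    """OneR: pick the single attribute whose IF-THEN rules make the fewest errors."""
    best = None
    for attr in attributes:
        counts = defaultdict(Counter)
        for r in rows:
            counts[r[attr]][r[target]] += 1
        # For each attribute value, predict the most common class.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c[rules[v]] for v, c in counts.items())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best

attr, rules, errors = one_rule(households, ["income", "age"], "respond")
for value, outcome in rules.items():
    print(f"IF {attr} = {value} THEN respond = {outcome}")
print("errors on training data:", errors)
```

On this data the income rules misclassify 2 of 6 households and the age rules 3 of 6, so income is chosen to separate the "probables" from the not probables.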
6-47 The Knowledge Discovery Search Process
Steps in discovery:
– Define the business problem and obtain the data to study it.
– Use data-mining software to model the problem.
– Mine the data to search for patterns of interest.
6-48 The Knowledge Discovery Search Process (cont.)
– Review the mining results and refine them by respecifying the model.
– Once validated, make the model available to other users of the data warehouse (DW).
6-49 Creating a Data-Mining Model
Although syntax differs from vendor to vendor, building a model on top of a database is much like creating a table:
    CREATE MODEL mail_list (
        Income  character input,
        Age     integer   input,
        Respond character input
    )
To populate it with data, use an SQL INSERT:
    INSERT INTO mail_list
    SELECT income, age, respond
    FROM client_list
    WHERE region = 'Southeast'
6-50 Creating a Data-Mining Model (cont.)
The process automatically creates additional views of the model (mail_list_UNDERSTAND and mail_list_PREDICT). These can be examined:
    SELECT * FROM mail_list_UNDERSTAND
    WHERE input_column_name = 'income'
      AND input_column_value = 'high'
      AND output_column_name = 'respond'
      AND output_column_value = 'yes'
Once created, these views are treated as tables in the database, so other users can view and join them.
6-51 New Applications for Data Mining
As the technology matures, new applications are emerging, especially in two new categories: text mining and web mining. Some text-mining examples are:
– Distilling the meaning of a text
– Accurate summarization of a text
– Explication of the text theme structure
– Clustering of texts
6-52 Web mining
Web mining is a special case of text mining in which the mining occurs over a website. It enhances the website with intelligent behavior, such as suggesting related links or recommending new products. It allows you to unobtrusively learn the interests of visitors and modify their user profiles in real time. It also allows you to match resources to the interests of the visitor.
6-53 Market Basket Analysis: This is the most widely used and, in many ways, most successful data mining algorithm. It essentially determines what products people purchase together. Stores can use this information to place these products in the same area. Direct marketers can use this information to determine which new products to offer to their current customers. Inventory policies can be improved if reorder points reflect the demand for the complementary products.
6-54 Market Basket Analysis Methodology
We first need a list of transactions and what was purchased in each. This is easily obtained today from scanning cash registers. Next, we choose a list of products to analyze and tabulate how many times each was purchased with each of the others. The diagonal of the table shows how often each product is purchased in any combination (its total count), and the off-diagonal cells show how often each pair was bought together.
6-55 A Convenience Store Example
Consider the following simple example of five transactions at a convenience store:
Transaction 1: Frozen pizza, cola, milk
Transaction 2: Milk, potato chips
Transaction 3: Cola, frozen pizza
Transaction 4: Milk, pretzels
Transaction 5: Cola, pretzels
These need to be cross-tabulated and displayed in a table.
6-56 A Convenience Store Example
Pizza and cola sell together more often than any other combination; a cross-marketing opportunity? Milk sells well with everything; people probably come here specifically to buy it.

Product bought   Pizza also   Milk also   Cola also   Chips also   Pretzels also
Pizza                 2           1           2           0              0
Milk                  1           3           1           1              1
Cola                  2           1           3           0              1
Chips                 0           1           0           1              0
Pretzels              0           1           1           0              2
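A short Python sketch that builds the cross-tabulation from the five convenience-store transactions. Diagonal cells count each product's transactions; off-diagonal cells count transactions containing both products.

```python
from itertools import combinations

transactions = [
    {"Pizza", "Cola", "Milk"},
    {"Milk", "Chips"},
    {"Cola", "Pizza"},
    {"Milk", "Pretzels"},
    {"Cola", "Pretzels"},
]
products = ["Pizza", "Milk", "Cola", "Chips", "Pretzels"]

# co[a][b] = number of transactions containing both a and b; co[a][a] = total for a.
co = {a: {b: 0 for b in products} for a in products}
for basket in transactions:
    for item in basket:
        co[item][item] += 1
    for a, b in combinations(sorted(basket), 2):
        co[a][b] += 1
        co[b][a] += 1

print(f"{'':10}" + "".join(f"{p:>10}" for p in products))
for a in products:
    print(f"{a:10}" + "".join(f"{co[a][b]:>10}" for b in products))
```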
6-57 Using the Results The tabulations can immediately be translated into association rules and the numerical measures computed. Comparing this week’s table to last week’s table can immediately show the effect of this week’s promotional activities. Some rules are going to be trivial (hot dogs and buns sell together) or inexplicable (toilet rings sell only when a new hardware store is opened).
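The slide does not name the numerical measures; support, confidence, and lift are the usual ones for association rules. This minimal Python sketch computes them from the five convenience-store transactions.

```python
transactions = [
    {"Pizza", "Cola", "Milk"},
    {"Milk", "Chips"},
    {"Cola", "Pizza"},
    {"Milk", "Pretzels"},
    {"Cola", "Pretzels"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_measures(lhs, rhs):
    """Support, confidence, and lift for the rule lhs -> rhs."""
    both = support(lhs | rhs)
    confidence = both / support(lhs)
    lift = confidence / support(rhs)
    return both, confidence, lift

print(rule_measures({"Pizza"}, {"Cola"}))  # support 0.4, confidence 1.0, lift about 1.67
print(rule_measures({"Cola"}, {"Pizza"}))  # support 0.4, confidence about 0.67, lift about 1.67
```

Every pizza buyer also bought cola, but only two of three cola buyers bought pizza. A lift above 1 in both directions means the pair appears together more often than it would if the two products were bought independently.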
6-58 Limitations to Market Basket Analysis
A large number of real transactions is needed for an effective basket analysis, and the results are less reliable when products do not occur with similar frequency. The analysis can also capture results that were due to the success of previous marketing campaigns rather than the natural tendencies of customers.
6-59 Performing Analysis with Virtual Items
The sales data can be augmented with the addition of virtual items. For example, we could record that the customer was new to us, or had children. The transaction record might look like:
Item 1: Sweater
Item 2: Jacket
Item 3: New
This might allow us to see what patterns new customers have versus returning customers.
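A minimal Python sketch of virtual items, assuming hypothetical sales records that carry a new-customer flag. The flag is added to each basket as an extra item ("New" or "Returning"), so it can be counted alongside real products.

```python
from collections import Counter

# Hypothetical sales records: purchased items plus whether the customer is new.
sales = [
    ({"Sweater", "Jacket"}, True),
    ({"Sweater"}, False),
    ({"Jacket", "Scarf"}, True),
    ({"Sweater", "Scarf"}, False),
]

# Add a virtual item ("New" or "Returning") so it can be analyzed like a product.
baskets = [items | {"New" if is_new else "Returning"} for items, is_new in sales]

def items_bought_with(tag):
    """Count real items co-occurring with the given virtual item."""
    counts = Counter()
    for basket in baskets:
        if tag in basket:
            counts.update(basket - {"New", "Returning"})
    return counts

print(items_bought_with("New"))        # Jacket appears twice for new customers
print(items_bought_with("Returning"))  # Sweater appears twice for returning customers
```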
6-60 Taxonomies
The presence of items that are not purchased very frequently is an obstacle to a good market basket analysis. One way to deal with this is to eliminate products that occur with a frequency below some threshold. A better idea is to group products that fall below the threshold into broader categories. For example, four flavors of popsicle occur 9% of the time all together, but no more than 3% individually.
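Rolling infrequent items up into a taxonomy group can be sketched in Python. The flavor frequencies, the 5% threshold, and the parent mapping are hypothetical, chosen to match the popsicle example. Summing percentages assumes a transaction rarely contains two flavors at once; otherwise the group's true frequency is lower than the sum.

```python
# Hypothetical item frequencies (percent of transactions) and a taxonomy mapping.
frequency = {
    "Popsicle, cherry": 3, "Popsicle, grape": 2,
    "Popsicle, lime": 2, "Popsicle, orange": 2,
    "Cola": 25, "Milk": 30,
}
parent = {name: "Popsicles" for name in frequency if name.startswith("Popsicle")}
THRESHOLD = 5  # assumed minimum frequency (percent) for analysis

def apply_taxonomy(freq, parent, threshold):
    """Keep frequent items; roll infrequent ones up into their parent group."""
    result = {}
    for item, pct in freq.items():
        key = item if pct >= threshold else parent.get(item)
        if key is not None:  # infrequent items with no parent are dropped
            result[key] = result.get(key, 0) + pct
    return result

print(apply_taxonomy(frequency, parent, THRESHOLD))  # {'Popsicles': 9, 'Cola': 25, 'Milk': 30}
```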
6-61 Multidimensional Market Basket Analysis Rules can involve more than two items, for example Plant and Clay Pot IMPLIES Soil. These rules are built iteratively. First, pairs are found, then relevant sets of three or four. These are then pruned by removing those that occur infrequently. In an environment like a grocery store, where customers commonly buy over 100 items, rules could involve as many as 10 items.
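The iterative build-up described above (pairs first, then larger sets, pruning the infrequent ones) follows the well-known Apriori approach, which the slide does not name; it is used here as an illustration. The garden-center transactions and the minimum count of 3 are hypothetical.

```python
from itertools import combinations

# Hypothetical garden-center transactions.
transactions = [
    {"Plant", "Clay Pot", "Soil"},
    {"Plant", "Clay Pot", "Soil", "Gloves"},
    {"Plant", "Soil"},
    {"Clay Pot", "Gloves"},
    {"Plant", "Clay Pot", "Soil"},
]
MIN_COUNT = 3  # assumed minimum number of transactions an itemset must appear in

def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(set(itemset) <= t for t in transactions)

def frequent_itemsets(min_count):
    """Level-wise (Apriori-style) search: grow itemsets one item at a time."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if count([i]) >= min_count]
    found = list(level)
    while level:
        size = len(level[0]) + 1
        previous = set(level)
        # Candidates: unions of two frequent sets that are exactly one item larger...
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # ...pruned unless every smaller subset was itself frequent.
        candidates = [c for c in candidates
                      if all(frozenset(s) in previous for s in combinations(c, size - 1))]
        level = [c for c in candidates if count(c) >= min_count]
        found.extend(level)
    return found

for itemset in frequent_itemsets(MIN_COUNT):
    print(sorted(itemset), count(itemset))

# Confidence of the rule Plant and Clay Pot IMPLIES Soil
print(count({"Plant", "Clay Pot", "Soil"}) / count({"Plant", "Clay Pot"}))  # 1.0
```

Gloves is pruned at the first level, so no larger set containing it is ever counted; this pruning is what keeps rules with many items tractable.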
6-62 Current Limitations and Challenges to Data Mining
Despite its potential power and value, data mining is still a new field. Some things that have thus far limited its advancement are:
– Identification of missing information – not all knowledge gets stored in a database
– Data noise and missing values – future systems need better ways to handle these
– Large databases and high dimensionality – future applications need ways to partition data into more manageable chunks
6-63 Popular tools and languages by industry types David Smith 2012
6-64 CLOSING CASE
1. Review the five common characteristics of high-quality information and rank them in order of importance for a government organization.
2. How could data warehouses and data marts be used to help the marketing departments of travel companies improve the efficiency and effectiveness of their operations?