# Chapter Fifteen: Overview of Other Multivariate Techniques and Data Mining

## Presentation transcript


Copyright © Houghton Mifflin Company. All rights reserved.

### Dependence and Interdependence Techniques

- **Dependence technique:** One variable is designated as the dependent variable and the rest are treated as independent variables.
- **Interdependence technique:** There are no dependent and independent variable designations; all variables are treated equally in a search for underlying patterns of relationships.

### Dependence Technique: Regression Analysis

Input data:

- Dependent variable(s): metric
- Independent variable(s): metric

Primary purpose of the technique:

- Ascertain the relative importance of the independent variable(s) in explaining variation in the dependent variable
- Predict dependent variable values for given values of the independent variable(s)

### Analysis of Variance

ANOVA is appropriate when the independent variable is set at certain specific levels (called treatments in an ANOVA context) and metric measurements of the dependent variable are obtained at each of those levels.

### Example

- 24 stores chosen randomly for the study
- 8 stores randomly chosen for each treatment
  - Treatment 1: store brand sold at the regular price
  - Treatment 2: store brand sold at 50¢ off the regular price
  - Treatment 3: store brand sold at 75¢ off the regular price
- Monitor sales of the store brand for a week in each store


### After-Only Design

| Group | Assignment | Treatment | Observation |
|---|---|---|---|
| EG1 | (R) | X1 | O1 |
| EG2 | (R) | X2 | O2 |
| EG3 | (R) | X3 | O3 |

- EG1, EG2, EG3: Experiment Groups 1, 2, and 3
- X1: regular price; X2: 50¢ off; X3: 75¢ off
- O1, O2, O3: observation (monitoring unit sales data in each store)

### ANOVA: Grocery Store Hypotheses

Grocery store example:

- $H_0$: $\mu_1 = \mu_2 = \mu_3$
- $H_a$: At least one $\mu$ is different from one or more of the others

Hypotheses for $k$ treatment groups or samples:

- $H_0$: $\mu_1 = \mu_2 = \dots = \mu_k$
- $H_a$: At least one $\mu$ is different from one or more of the others
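
As a rough illustration of how these hypotheses are tested, the sketch below computes the one-way ANOVA F statistic for three treatment groups of 8 stores each. The weekly unit sales figures and the function name `one_way_anova_f` are invented for illustration; they are not data from the study.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per treatment."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-treatment sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-treatment (error) sum of squares: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical weekly unit sales for 8 stores per treatment (illustrative only).
regular_price = [38, 42, 35, 40, 37, 41, 39, 36]
fifty_off = [45, 48, 44, 50, 46, 47, 49, 43]
seventy_five_off = [52, 55, 51, 58, 54, 53, 56, 50]

f_stat, df1, df2 = one_way_anova_f([regular_price, fifty_off, seventy_five_off])
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

The null hypothesis is rejected when the computed F exceeds the critical value of the F distribution with the printed degrees of freedom at the chosen significance level.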

### Bank Customer Perceptions Study

Bank customers are grouped by gender and, within each gender, by age. Overall perceptions are measured for each group.

- Male: < 35 years, 35-64 years, > 64 years
- Female: < 35 years, 35-64 years, > 64 years

### Bank Customer Perceptions Study (Contd)

- Male and female customers differed in their overall perceptions
- Customers' perceptions differed according to their ages

### Factorial ANOVA

Factorial ANOVA is used to analyze data from a factorial design experiment.

- Using the grocery store example:
  - Add in the impact of an in-store, point-of-purchase display on orange juice
  - In addition to the pricing factor

### Factorial ANOVA

The interaction effect is calculated and F-tested:

$$SS_T = SS_{TR} + SS_E = SS_A + SS_B + SS_{AB} + SS_E$$

- $SS_A$ = effect of Treatment A
- $SS_B$ = effect of Treatment B
- $SS_{AB}$ = interaction effect
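
The decomposition of the total sum of squares can be checked numerically. The following sketch assumes a balanced design (the same number of stores in every price-by-display cell); the sales figures and the function name `factorial_sums_of_squares` are hypothetical.

```python
from math import isclose
from statistics import mean

def factorial_sums_of_squares(cells):
    """cells[i][j] holds the observations for level i of factor A and level j of factor B.

    Assumes a balanced design (same number of observations in every cell).
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    all_obs = [x for row in cells for cell in row for x in cell]
    grand = mean(all_obs)
    a_means = [mean([x for cell in row for x in cell]) for row in cells]
    b_means = [mean([x for row in cells for x in row[j]]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    ss_a = b * n * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * n * sum((m - grand) ** 2 for m in b_means)
    ss_ab = n * sum(
        (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_e = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )
    ss_t = sum((x - grand) ** 2 for x in all_obs)
    return {"SS_T": ss_t, "SS_A": ss_a, "SS_B": ss_b, "SS_AB": ss_ab, "SS_E": ss_e}

# Hypothetical unit sales: rows = price level (A), columns = display absent/present (B).
sales = [
    [[38, 42, 40], [44, 47, 45]],
    [[45, 48, 46], [55, 58, 57]],
    [[52, 55, 53], [66, 70, 68]],
]
ss = factorial_sums_of_squares(sales)
# The four components should add back up to the total sum of squares.
assert isclose(ss["SS_T"], ss["SS_A"] + ss["SS_B"] + ss["SS_AB"] + ss["SS_E"])
print(ss)
```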

### Discriminant Analysis

Identifies the distinguishing features of prespecified subgroups of units that are formed on the basis of some dependent variable.

Examples of subgroups:

- Heavy, moderate, and light users of a product
- Homeowners and renters
- Viewers and nonviewers of a television program

### Discriminant Analysis (Contd)

- Dependent variable: categorical, with as many categories as there are subgroups (heavy, moderate, and light users: 3 categories)
- Independent variables: metric-scaled

The purpose of discriminant analysis is to classify new units into one of the subgroups, given the new units' values of the independent variables.

### Using the Discriminant Function

$$Y = v_1 X_1 + v_2 X_2$$

- The discriminant weights $v_1$ and $v_2$ can be interpreted as signifying the relative importance of $X_1$ and $X_2$ in discriminating between the two groups

$$Y_{\text{new}} = v_1 X_{1,\text{new}} + v_2 X_{2,\text{new}}$$

- The program assigns the new unit either to the owner group or to the non-owner group based on the criterion value
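
A minimal sketch of this classification step is shown below. The weights, the criterion (cutoff) value, and the convention that scores at or above the cutoff go to the owner group are all assumptions made for illustration; in practice they come from the estimated discriminant function.

```python
def discriminant_score(x1, x2, v1, v2):
    """Compute Y = v1 * X1 + v2 * X2."""
    return v1 * x1 + v2 * x2

def classify(x1, x2, v1, v2, cutoff):
    """Assign a unit to a group by comparing its score with the criterion value.

    Assumption: scores at or above the cutoff indicate the owner group.
    """
    score = discriminant_score(x1, x2, v1, v2)
    return "owner" if score >= cutoff else "non-owner"

# Hypothetical weights and criterion value (not taken from the study).
V1, V2, CUTOFF = 0.5, 0.25, 30.0

print(classify(x1=50, x2=40, v1=V1, v2=V2, cutoff=CUTOFF))  # score is 35.0
print(classify(x1=30, x2=20, v1=V1, v2=V2, cutoff=CUTOFF))  # score is 20.0
```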

### Evaluating a Discriminant Function

Confusion matrix:

- Indicates the degree of correspondence, or lack thereof, between the actual groupings of the sample units and the predicted groupings obtained by classifying the same units through the discriminant function
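
The sketch below tabulates a confusion matrix from actual and predicted group labels and reports the share of correctly classified units. The six sample units and the helper names are hypothetical.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each (actual group, predicted group) pair occurs."""
    return Counter(zip(actual, predicted))

def hit_rate(actual, predicted):
    """Proportion of units whose predicted group matches the actual group."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

# Hypothetical actual and predicted groupings for six sample units.
actual = ["owner", "owner", "owner", "non-owner", "non-owner", "non-owner"]
predicted = ["owner", "owner", "non-owner", "non-owner", "non-owner", "owner"]

matrix = confusion_matrix(actual, predicted)
groups = ("owner", "non-owner")
for a in groups:
    # One row per actual group, one column per predicted group.
    print(a, [matrix[(a, p)] for p in groups])
print(f"hit rate = {hit_rate(actual, predicted):.2f}")
```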

### Usefulness of Discriminant Analysis

Discriminant analysis is very useful for:

- Defining customer segments
- Identifying critical characteristics capable of distinguishing among them
- Classifying prospective customers into appropriate segments

### Factor Analysis

A data and variable reduction technique that attempts to partition a given set of variables into groups of maximally correlated variables.

### Intuitive Explanation

Consider two statements from the Star Brand Inc. (SBI) survey:

- S1. I have been satisfied with the Star products I have purchased
- S2. When I have to purchase a home appliance in the future, it will likely be a Star product

### Factor Analysis Output and Its Interpretation

The primary output of factor analysis is a factor-loading matrix.

Achieved communality:

- Represents the proportion of variance in an original variable accounted for by all the extracted factors
- Each original variable will have an achieved communality value in the factor analysis output

### Factor Analysis Output (Contd)

The eigenvalue for a given factor measures the variance in all the variables that is accounted for by that factor.

- Note that the eigenvalue is not the percent of variance explained but the amount of variance in relation to total variance (since variables are standardized to have means of 0 and variances of 1, total variance is equal to the number of variables)
- SPSS will output a corresponding column titled '% of variance'
- A factor's eigenvalue may be computed as the sum of its squared factor loadings for all the variables
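
These quantities can be read off a factor-loading matrix directly, as the sketch below shows. The loadings are hypothetical values for six variables and two factors, and the function names are illustrative.

```python
# Hypothetical factor-loading matrix: rows = variables X1..X6, columns = factors F1, F2.
loadings = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.8, 0.2],
    [0.1, 0.7],
    [0.7, 0.1],
]

def eigenvalues(loadings):
    """Eigenvalue of each factor: sum of its squared loadings over all variables."""
    n_factors = len(loadings[0])
    return [sum(row[j] ** 2 for row in loadings) for j in range(n_factors)]

def communalities(loadings):
    """Achieved communality of each variable: sum of its squared loadings over all factors."""
    return [sum(value ** 2 for value in row) for row in loadings]

def percent_of_variance(eigenvalue, n_variables):
    """With standardized variables, total variance equals the number of variables."""
    return 100 * eigenvalue / n_variables

for j, ev in enumerate(eigenvalues(loadings), start=1):
    print(f"F{j}: eigenvalue = {ev:.2f}, "
          f"% of variance = {percent_of_variance(ev, len(loadings)):.1f}")
print([round(c, 2) for c in communalities(loadings)])
```

The same conversion applies to any eigenvalue: for example, an eigenvalue of 3.110 across 7 standardized variables corresponds to roughly 44 percent of the variance.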

### Reducing Star Data

- X1, X4, and X6 can be combined into one factor
- X2, X3, and X5 can be combined into a second factor
- 6 variables can be reduced to two factors

### Potential Applications of Factor Analysis

Used to:

- Develop concise but comprehensive multiple-item scales for measuring various marketing constructs
- Illuminate the nature of distinct dimensions underlying an existing data set
- Convert a large volume of data into a set of factor scores on a limited number of uncorrelated factors

### Cluster Analysis

- Segments objects into groups so that members within each group are similar to one another in a variety of ways
- Useful for segmenting customers, market areas, and products

### Use of Cluster Analysis

A firm offering recreational services wanted to enter a new region of the country. It gathered data on more than 100 characteristics, including:

- Demographics
- Expenditures on recreation
- Leisure time activities
- Interests of household members

The firm identified one or several household segments that are likely to be most responsive to its advertising and to its services.

### How Does Cluster Analysis Work?

Cluster analysis measures the similarity between objects on the basis of their values on the various characteristics.
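
One common way to quantify similarity is the Euclidean distance between objects, where a smaller distance means greater similarity. The slide does not name a specific measure, so the distance choice, the three households, and their already-standardized characteristic values below are assumptions for illustration.

```python
from itertools import combinations
from math import dist

# Hypothetical households scored on three standardized characteristics.
households = {
    "A": (2.0, 5.0, 1.0),
    "B": (3.0, 5.0, 1.0),
    "C": (9.0, 1.0, 7.0),
}

def pairwise_distances(objects):
    """Euclidean distance for every pair of objects (smaller = more similar)."""
    return {
        (p, q): dist(objects[p], objects[q])
        for p, q in combinations(sorted(objects), 2)
    }

distances = pairwise_distances(households)
most_similar = min(distances, key=distances.get)
print(distances)
print("most similar pair:", most_similar)
```

A clustering procedure would then group the closest objects together and keep dissimilar ones in separate clusters.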

### Multidimensional Scaling

Uncovers key dimensions underlying customers' evaluations from a series of similarity and/or preference judgments provided by customers about products or brands within a given set.

### Conjoint Analysis

- Technique for deriving the utility values that customers presumably attach to different levels of an object's attributes
- Requires respondents to compare hypothetical product profiles or brands
  - The hypothetical stimuli are descriptive profiles formed by systematically combining varying levels of certain key attributes

### Personal Computer Study

To assess the role played by attributes in customer evaluations of personal computers:

- Price: 3 levels (\$299, \$649, \$999)
- Processor: 2 levels (2.6 GHz, 2.8 GHz)
- Hard drive: 4 levels (80 GB, 120 GB, 160 GB, 200 GB)

### Personal Computer Study (Contd)

3 levels of price × 2 levels of processor speed × 4 levels of hard drive capacity = 24 different descriptive profiles of personal computers are possible.

Data collection in conjoint analysis:

- Two-factors-at-a-time approach
- Full-profile approach
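
The profile count can be verified by enumerating every combination of attribute levels. This sketch uses the levels listed for the personal computer study; the variable names are illustrative.

```python
from itertools import product

prices = [299, 649, 999]             # dollars
processors_ghz = [2.6, 2.8]
hard_drives_gb = [80, 120, 160, 200]

# Full-profile approach: every combination of the three attributes.
profiles = list(product(prices, processors_ghz, hard_drives_gb))
print(len(profiles))  # 24

# Two-factors-at-a-time approach: number of cells in each pairwise grid.
pair_grid_sizes = {
    "price x processor": len(prices) * len(processors_ghz),
    "price x hard drive": len(prices) * len(hard_drives_gb),
    "processor x hard drive": len(processors_ghz) * len(hard_drives_gb),
}
print(pair_grid_sizes)
```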

### Personal Computer Study: Two-Factors-at-a-Time Approach

| Processing speed | \$299 | \$649 | \$999 |
|---|---|---|---|
| 2.6 GHz | | | |
| 2.8 GHz | | | |

Note: This rating procedure would be repeated for the 12 price-hard drive combinations and the 8 processor speed-hard drive combinations.

### Personal Computer Study: Full-Profile Approach

Sample profile cards, each headed "Personal Computer - Desktop":

| Price | Speed | Hard drive |
|---|---|---|
| \$299 | 2.6 GHz | 80 GB |
| \$299 | 2.6 GHz | 120 GB |
| \$299 | 2.6 GHz | 160 GB |
| \$299 | 2.6 GHz | 200 GB |

Note: Customers are asked to rank order their preferences for the 24 different profiles representing all possible combinations of the three attributes.

### Relative Importance of the 3 Attributes

- Range for price = 0.8 - 0.3 = 0.5
  - Price is the most critical
- Range for hard drive capacity = 0.8 - 0.4 = 0.4
  - Hard drive capacity is the next most critical
- Range for processor speed = 0.9 - 0.6 = 0.3
  - Processor speed is the least critical

### Potential Attractiveness of Different Personal Computer Configurations

- PC Configuration A: 2.6 GHz, 120 GB, \$649
  - Total utility for the personal computer = 0.6 + 0.7 + 0.4 = 1.7
- PC Configuration B: 2.8 GHz, 160 GB, \$999
  - Total utility for the personal computer = 0.9 + 0.8 + 0.3 = 2.0

Personal Computer B is more attractive.
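
The arithmetic behind the attribute ranges and the total utilities can be sketched as follows. Only the utility values quoted in the slides are used; the dictionary layout and function name are illustrative, and utilities for levels not mentioned in the slides are left out rather than guessed.

```python
# Highest and lowest utility per attribute, as quoted in the slides.
utility_ranges = {
    "price": (0.8, 0.3),
    "hard drive": (0.8, 0.4),
    "processor": (0.9, 0.6),
}
# Relative importance = range of utilities; round to avoid float noise.
importance = {attr: round(high - low, 2) for attr, (high, low) in utility_ranges.items()}
ranking = sorted(importance, key=importance.get, reverse=True)
print(importance, ranking)

# Utilities of the specific levels used in configurations A and B.
part_worths = {
    "processor": {"2.6 GHz": 0.6, "2.8 GHz": 0.9},
    "hard drive": {"120 GB": 0.7, "160 GB": 0.8},
    "price": {"$649": 0.4, "$999": 0.3},
}

def total_utility(config):
    """Sum the utilities of the chosen level of each attribute."""
    return round(sum(part_worths[attr][level] for attr, level in config.items()), 2)

config_a = {"processor": "2.6 GHz", "hard drive": "120 GB", "price": "$649"}
config_b = {"processor": "2.8 GHz", "hard drive": "160 GB", "price": "$999"}
print(total_utility(config_a), total_utility(config_b))  # 1.7 2.0
```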

### Data Mining

- An analytic process designed to explore data to find consistent patterns or systematic relationships between variables
- Data mining is sometimes referred to as knowledge discovery in databases (KDD) or predictive analytics

### Data Mining Procedures

- Association: Looking for patterns that connect one event or characteristic to another
- Sequence or path analysis: Looking for patterns in which one event leads to a later event
- Classification: Looking for new patterns by segmenting the data into groups

### Data Mining Procedures (Contd)

- Clustering: Finding and visually documenting groups of facts or groups of customers not previously known to be similar
- Scoring models: Developing propensity scores for individual customers
- Forecasting/prediction: Discovering patterns in data that can lead to reasonable predictions about the future

### Data Mining Analysis

- The various data mining procedures are used in different types of data mining analysis
- For example, association and sequence or path analysis are used in market basket analysis, whereas classification and clustering are used for segmentation analysis

### Market Basket Analysis

- Examines customers' shopping carts to determine items that are most frequently purchased together
- From an analysis of 8 million market baskets, Mind Meld, Inc. found that beauty care items, greeting cards, and seasonal candies were often bought together by beauty-conscious customers
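
At its simplest, finding items that are purchased together means counting how often each pair of items appears in the same basket. The sketch below does this for four made-up baskets; the basket contents and the function name `pair_counts` are hypothetical, and real market basket tools add measures such as support and confidence on top of these counts.

```python
from collections import Counter
from itertools import combinations

def pair_counts(baskets):
    """Count how many baskets contain each pair of distinct items."""
    counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so each pair is counted once per basket.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

# Hypothetical baskets (illustrative only).
baskets = [
    ["beauty care", "greeting card", "seasonal candy"],
    ["beauty care", "greeting card"],
    ["greeting card", "seasonal candy"],
    ["beauty care", "milk"],
]

counts = pair_counts(baskets)
for pair, n in counts.most_common():
    print(pair, n)
```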

### Market Basket Marketing Implications

1. Ability to place products that sell together in close proximity
2. Knowing which products potentially increase impulse purchasing
3. Suggesting products to customers based on purchases by previous customers
4. Separating goods to ensure maximum coverage of the store

### Classification Models

- Decision trees are one of the most popular classification algorithms
- The basic idea underlying the decision tree technique is to hierarchically segment individuals included in the database on the basis of a designated categorical dependent variable

### Sample Decision Tree

- The first split is based on the income variable: all users of the product are high-income customers
- The second split is based on education and shows that 100% of customers have a college degree
- Thus the firm should target high-income customers with college degrees

### Scoring Models and RFM Analysis

- A score (typically a numerical value) is assigned to each record in the database and indicates the likelihood that the specific customer will exhibit a particular behavior
- Recency, Frequency, and Monetary Value (RFM) analysis is one of the most commonly used scoring models

### RFM Analysis

1. Customers who purchased recently are more likely to buy again than customers who have not purchased in a while
2. Customers who purchase frequently are more likely to buy again than customers who have made just one or two purchases
3. Customers who spent the most money in total are more likely to buy again. The most valuable customers tend to continue to become even more valuable
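
One simple way to turn these three principles into a score is to place each customer in a band from 1 (least promising) to 3 (most promising) on each dimension. The cutoffs, the three-band scale, and the function names below are assumptions chosen for illustration; actual RFM schemes vary by firm.

```python
def band(value, cutoffs):
    """Return 1, 2, or 3 depending on how many of the two cutoffs the value reaches."""
    return 1 + sum(value >= c for c in cutoffs)

def rfm_score(days_since_last_purchase, n_purchases, total_spent):
    """Return (recency, frequency, monetary) bands, each from 1 to 3.

    All cutoffs are hypothetical.
    """
    # Recency is reversed: fewer days since the last purchase is better.
    recency = 4 - band(days_since_last_purchase, (30, 180))
    frequency = band(n_purchases, (3, 10))
    monetary = band(total_spent, (100, 1000))
    return recency, frequency, monetary

print(rfm_score(days_since_last_purchase=10, n_purchases=12, total_spent=1500))  # (3, 3, 3)
print(rfm_score(days_since_last_purchase=60, n_purchases=5, total_spent=250))    # (2, 2, 2)
print(rfm_score(days_since_last_purchase=400, n_purchases=1, total_spent=50))    # (1, 1, 1)
```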

### Forecasting or Prediction Models

Logistic regression is a type of regression analysis in which the dependent variable is dichotomous, coded as either 1 or 0 to represent the occurrence or nonoccurrence of some outcome event.

$$\ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$$

where $p$ is the probability that $Y = 1$; $X_1, X_2, \dots, X_n$ are the independent variables; and $\beta_0, \beta_1, \dots, \beta_n$ are the regression coefficients.
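
In the standard logit form of the model, the log-odds of the outcome are a linear function of the independent variables, so the probability follows by inverting the logit: $p = 1 / (1 + e^{-z})$ with $z = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n$. The sketch below applies that conversion; the coefficient values and the function name are hypothetical.

```python
from math import exp

def logistic_probability(x, betas):
    """Return p = P(Y = 1) for predictor values x.

    betas[0] is the intercept; betas[1:] are the coefficients paired with x.
    """
    z = betas[0] + sum(b * xi for b, xi in zip(betas[1:], x))
    return 1.0 / (1.0 + exp(-z))

# Hypothetical coefficients: intercept, then one coefficient per predictor.
betas = [-2.0, 0.5, 1.0]
print(logistic_probability([1.0, 0.5], betas))  # z = -1.0, so p is below 0.5
print(logistic_probability([2.0, 1.0], betas))  # z = 0.0, so p is exactly 0.5
print(logistic_probability([4.0, 2.0], betas))  # z = 2.0, so p is above 0.5
```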

### Logistic Regression

- Does not assume that the relationship between the independent and dependent variables is linear
- Does not assume that the dependent variable or error terms are normally distributed

### Neural Networks

Neural network models mimic the functioning of the human brain and basically work on the principle of biological neurons.

### Neural Network Analysis

- Like the human brain, neural networks can learn: new information is assimilated and used to make decisions
- Neural network analysis has been shown to make superior predictions over more traditional forecasting techniques such as discriminant analysis and logistic regression

### New Trends in Data Mining

- New data sources such as audio or video data are leading to new techniques for analyzing that data
- Text mining analysis is used to uncover patterns and relationships within thousands of documents
- The same techniques are now being adapted to search audio and video files

### Case Study: Athenaeum Booksellers Part B

Case objectives:

- Analyze the factor analysis output for an actual research study
- Learn some of the technical aspects of factor analysis
- Infer matters of managerial significance
- Review and interpret the eigenvalues contained in the output
- Analyze and interpret the different factors that arise out of the analysis
- Consider the managerial implications of the findings

### Case Discussion

- Address the information in the first table, specifically the interpretation of the eigenvalues
- Interpret the percent of variance explained
- Interpret the factor loadings for the two factors
- What do these factors represent?
- What are the managerial implications for Mr. Karvonides?

The factor analysis tables and a copy of the customer satisfaction survey are in the textbook.

### First Factor

- Eigenvalue of 3.110 (out of a total of 7); explains roughly 44 percent of the variance
- It represents aspects of service and variety
- The implication of these findings is that Mr. K should be sure to provide good levels of service and a wide variety of products in order to satisfy customers

The following variables load heavily on the first factor: