HATCO Case: Primary Database
This example investigates a business-to-business case using data from existing customers of HATCO. The primary database consists of 100 observations on 14 separate variables. Three types of information were collected:
- Perceptions of HATCO: 7 attributes (X1–X7)
- Actual purchase outcomes: 2 specific measures (X9, X10)
- Characteristics of the purchasing companies: 5 characteristics (X8, X11–X14)
Table 2.1 Description of Database Variables (Hair et al., 1998)
Missing Data
A missing data process is any systematic event external to the respondent (e.g., data entry errors or data collection problems) or any action on the part of the respondent (such as refusal to answer) that leads to missing values. Missing data are detrimental not only because of the potential "hidden" biases they introduce into the results but also because of their practical impact on the sample size available for analysis.
Understanding the missing data
- Ignorable missing data
- Remediable missing data
Examining the pattern of missing data
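As a first look at the pattern of missing data, one can tabulate the share of missing values per variable and count how many cases remain complete, which is the practical sample-size impact noted above. The following is a minimal sketch, not the procedure used in Hair et al.: the records are made-up toy values (not the HATCO pretest data), `None` is assumed to mark a missing value, and the function name `missing_summary` is hypothetical.

```python
# Toy records for illustration only (NOT the HATCO data); None marks a missing value.
records = [
    {"X1": 4.1, "X2": 0.6, "X3": 6.9},
    {"X1": None, "X2": 3.0, "X3": 6.3},
    {"X1": 2.4, "X2": None, "X3": None},
    {"X1": 3.7, "X2": 0.7, "X3": 8.2},
    {"X1": 3.2, "X2": 4.1, "X3": 5.7},
]


def missing_summary(rows):
    """Summarize missing data by variable and by case.

    Returns a tuple of:
      - percent of missing values for each variable,
      - number of missing values for each case (row),
      - number of complete cases (rows with no missing value).
    """
    variables = list(rows[0].keys())
    n = len(rows)
    pct_missing = {
        v: 100.0 * sum(1 for r in rows if r[v] is None) / n for v in variables
    }
    missing_per_case = [sum(1 for v in variables if r[v] is None) for r in rows]
    complete_cases = sum(1 for m in missing_per_case if m == 0)
    return pct_missing, missing_per_case, complete_cases


pct_missing, missing_per_case, complete_cases = missing_summary(records)
print("Percent missing by variable:", pct_missing)
print("Missing values by case:", missing_per_case)
print("Complete cases:", complete_cases, "of", len(records))
```

Cases with several missing values, or variables with a high share of missing values, are the natural candidates to examine first when judging whether the missing data are ignorable or need a remedy.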
Table 2.2 Summary Statistics of Pretest Data (Hair et al., 1998)
Outliers
Four classes of outliers:
- Procedural error
- Extraordinary event that can be explained
- Extraordinary observation that has no explanation
- Observations that fall within the ordinary range of values on each variable but are unique in their combination of values across the variables
Detecting outliers:
- Univariate detection
- Bivariate detection
- Multivariate detection
Outlier detection
Univariate detection threshold:
- For small samples, flag cases whose standardized variable values fall outside ±2.5
- For larger samples, flag cases whose standardized variable values fall outside ±3 or ±4
Bivariate detection threshold:
- An ellipse representing the bivariate normal distribution, with coverage varying between 50 and 90 percent; cases falling outside the ellipse are flagged
Multivariate detection:
- The Mahalanobis distance D²
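The univariate and multivariate rules above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the computation behind Table 2.7: the data are made-up values, the function names are hypothetical, the sample standard deviation and covariance (divisor n - 1) are assumed, and D² is computed for two variables only so that the covariance matrix can be inverted by hand.

```python
import math


def z_scores(values):
    """Standardize values to mean 0 and unit sample standard deviation (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def univariate_outliers(values, threshold=2.5):
    """Indices of cases whose absolute standardized value exceeds the threshold."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > threshold]


def mahalanobis_d2(xs, ys):
    """Squared Mahalanobis distance D^2 of each (x, y) case from the centroid.

    Uses the 2x2 sample covariance matrix S and the closed-form inverse
    S^-1 = (1 / det) * [[syy, -sxy], [-sxy, sxx]].
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    d2 = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        d2.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return d2


# Univariate example (toy values): the last case lies far from the rest.
ratings = [5.0, 5.1, 4.9, 5.2, 4.8, 5.0, 5.1, 4.9, 5.0, 5.0, 9.0]
flagged = univariate_outliers(ratings, threshold=2.5)
print("Univariate outliers (indices):", flagged)

# Multivariate example (toy values): case 6 is ordinary on x and on y
# separately, but its combination breaks the positive x-y pattern.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.1, 2.0, 2.9, 4.2, 5.0, 6.1, 2.0, 8.0]
d2 = mahalanobis_d2(xs, ys)
most_extreme = max(range(len(d2)), key=d2.__getitem__)
print("Univariate outliers on x:", univariate_outliers(xs))
print("Univariate outliers on y:", univariate_outliers(ys))
print("Case with the largest D^2:", most_extreme)
```

In the second example neither variable flags any case on its own, yet D² singles out case 6. This is the fourth class of outlier listed earlier: values within the ordinary range on each variable but unique in combination.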
Table 2.7 Identification of Univariate and Bivariate Outliers (Hair et al., 1998)
Fig 2.3 Graphical Identification of Bivariate Outliers (Hair et al., 1998)