Chapter Fourteen Data Preparation


1 Chapter Fourteen Data Preparation
Copyright © 2010 Pearson Education, Inc.

2 Chapter Outline 1) Overview 2) The Data Preparation Process 3) Questionnaire Checking 4) Editing: Treatment of Unsatisfactory Responses 5) Coding: Coding Questions 6) Selecting a Data Analysis Strategy

3 Data Preparation Process
Fig. 14.1 The steps of the data preparation process, in order: Prepare Preliminary Plan of Data Analysis, Check Questionnaire, Edit, Code, Transcribe, Clean Data, Statistically Adjust the Data, Select Data Analysis Strategy.

4 Questionnaire Checking
A questionnaire returned from the field may be unacceptable for several reasons. Parts of the questionnaire may be incomplete. The pattern of responses may indicate that the respondent did not understand or follow the instructions. The responses show little variance. One or more pages are missing. The questionnaire is received after the preestablished cutoff date. The questionnaire is answered by someone who does not qualify for participation.
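The checks listed above can be sketched as a small screening function. This is only an illustration: the record layout (`answers`, `received`, `qualified`), the cutoff date, and the five-item questionnaire length are assumptions, not part of the original slides, and the missing-pages check is folded into the incompleteness check.

```python
from datetime import date

CUTOFF = date(2010, 6, 30)  # hypothetical preestablished cutoff date

def check_questionnaire(q, n_items=5):
    """Return the reasons a returned questionnaire is unacceptable (empty if none)."""
    problems = []
    answers = q["answers"]
    # Incomplete: too few items returned, or some items left blank (None).
    if len(answers) < n_items or any(a is None for a in answers):
        problems.append("incomplete")
    # Little variance: every answered item carries the same value.
    answered = [a for a in answers if a is not None]
    if len(answered) > 1 and len(set(answered)) == 1:
        problems.append("little variance")
    if q["received"] > CUTOFF:
        problems.append("late")
    if not q["qualified"]:
        problems.append("unqualified respondent")
    return problems

good = {"answers": [3, 5, 2, 4, 6], "received": date(2010, 6, 1), "qualified": True}
flat = {"answers": [4, 4, 4, 4, 4], "received": date(2010, 7, 5), "qualified": True}
partial = {"answers": [3, None, 2, 4, 6], "received": date(2010, 6, 1), "qualified": False}

for name, q in [("good", good), ("flat", flat), ("partial", partial)]:
    print(name, check_questionnaire(q))
```

Returning a list of reasons rather than a single pass/fail flag keeps the decision about treatment (return to the field or discard) separate from the check itself.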

5 Editing: Treatment of Unsatisfactory Responses
Returning to the Field: The questionnaires with unsatisfactory responses may be returned to the field, where the interviewers recontact the respondents.
Discarding Unsatisfactory Respondents: In this approach, the respondents with unsatisfactory responses are simply discarded.

6 Coding
Coding means assigning a code, usually a number, to each possible response to each question. The code includes an indication of the column position (field) and data record it will occupy.
Coding Questions: Fixed field codes, which mean that the number of records for each respondent is the same and the same data appear in the same column(s) for all respondents, are highly desirable. If possible, standard codes should be used for missing data. Coding of structured questions is relatively simple, since the response options are predetermined. In questions that permit a large number of responses, each possible response option should be assigned a separate column.
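A minimal sketch of fixed field coding follows. The codebook, the field order, the three-digit respondent id, and the use of 9 as the missing-data code are all assumptions chosen for illustration.

```python
MISSING = 9  # assumed standard code for missing data

# Hypothetical codebook: question -> {response text: numeric code}
CODEBOOK = {
    "sex": {"male": 1, "female": 2},
    "usage": {"light": 1, "heavy": 2},
}
FIELDS = ["sex", "usage"]  # fixed column order, one column per question

def code_record(respondent_id, responses):
    """Build a fixed-width record: 3-digit respondent id, then one digit per field."""
    codes = [CODEBOOK[f].get(responses.get(f), MISSING) for f in FIELDS]
    return f"{respondent_id:03d}" + "".join(str(c) for c in codes)

print(code_record(1, {"sex": "female", "usage": "heavy"}))
print(code_record(12, {"sex": "male"}))  # usage was not answered
```

Because every record has the same length and the same variable always sits in the same column, any later program can read a variable by position alone.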

7 Restaurant Preference
Table 14.1

9 SPSS Variable View of the Data of Table 14.1

10 Example of Questionnaire Coding
Fig. 14.3

11 Data Cleaning Consistency Checks
Consistency checks identify data that are out of range, logically inconsistent, or have extreme values. Computer packages like SPSS, SAS, EXCEL and MINITAB can be programmed to identify out-of-range values for each variable and print out the respondent code, variable code, variable name, record number, column number, and out-of-range value. Extreme values should be closely examined.
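The out-of-range check described above can be sketched in a few lines. The variable names, the 1 to 7 valid ranges, and the missing-data code 9 are assumptions for illustration; a statistical package would report more fields (record and column numbers) than this sketch does.

```python
# Hypothetical valid ranges for two 7-point scale variables.
VALID_RANGES = {"familiar": (1, 7), "attitude": (1, 7)}
MISSING = 9  # assumed missing-data code, exempt from the range check

def out_of_range(records):
    """Yield (respondent id, variable name, value) for each out-of-range value."""
    for rec in records:
        for var, (low, high) in VALID_RANGES.items():
            value = rec[var]
            if value != MISSING and not (low <= value <= high):
                yield rec["id"], var, value

records = [
    {"id": 1, "familiar": 7, "attitude": 6},
    {"id": 2, "familiar": 0, "attitude": 9},
    {"id": 3, "familiar": 4, "attitude": 8},
]
flagged = list(out_of_range(records))
for respondent, var, value in flagged:
    print(f"respondent {respondent}: {var} = {value} is out of range")
```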

13 Selecting a Data Analysis Strategy
Fig. 14.5 The data analysis strategy is shaped by four inputs: the earlier steps (1, 2, and 3) of the marketing research process; the known characteristics of the data; the properties of statistical techniques; and the background and philosophy of the researcher.

14 A Classification of Univariate Techniques
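Fig. 14.6 Univariate techniques are classified by type of data, then by number of samples:
Metric data, one sample: t test.
Metric data, two or more samples, independent: two-group t test, Z test, one-way ANOVA.
Metric data, two or more samples, related: paired t test.
Non-numeric data, one sample: frequency, chi-square, K-S, runs, binomial.
Non-numeric data, two or more samples, independent: chi-square, Mann-Whitney, median, K-S, K-W ANOVA.
Non-numeric data, two or more samples, related: sign, Wilcoxon, McNemar.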

15 A Classification of Multivariate Techniques
Fig. 14.7 Multivariate techniques are classified as dependence or interdependence techniques:
Dependence techniques, one dependent variable: cross-tabulation, analysis of variance and covariance, multiple regression, 2-group discriminant/logit, conjoint analysis.
Dependence techniques, more than one dependent variable: multivariate analysis of variance, canonical correlation, multiple discriminant analysis, structural equation modeling and path analysis.
Interdependence techniques, variable interdependence: factor analysis, confirmatory factor analysis.
Interdependence techniques, interobject similarity: cluster analysis, multidimensional scaling.

16 Chapter Fifteen Frequency Distribution, Cross-Tabulation, and Hypothesis Testing

17 Internet Usage Data Table 15.1
Table 15.1 Columns: Respondent Number, Sex, Familiarity, Internet Usage, Attitude Toward Internet, Attitude Toward Technology, Usage of Internet: Shopping, Usage of Internet: Banking.

18 Frequency Distribution
In a frequency distribution, one variable is considered at a time. A frequency distribution for a variable produces a table of frequency counts, percentages, and cumulative percentages for all the values associated with that variable.
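As a sketch of what such a table contains, the function below computes counts, percentages, and cumulative percentages for one variable. The ratings are made-up values on a 7-point scale, not the data of Table 15.1.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, count, percent, cumulative percent), sorted by value."""
    counts = Counter(values)
    n = len(values)
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value], 100 * counts[value] / n, 100 * running / n))
    return rows

ratings = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7]  # hypothetical familiarity ratings
table = frequency_table(ratings)
for value, count, pct, cum_pct in table:
    print(f"{value:>5} {count:>5} {pct:>7.1f} {cum_pct:>7.1f}")
```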

19 Frequency of Familiarity with the Internet
Table 15.2

20 Frequency Histogram Fig. 15.1 (histogram of Frequency against Familiarity)

22 Statistics Associated with Frequency Distribution: Measures of Location
The mean, or average value, is the most commonly used measure of central tendency. The mean, $\bar{X}$, is given by $\bar{X} = \sum_{i=1}^{n} X_i / n$, where $X_i$ = observed values of the variable $X$ and $n$ = number of observations (sample size). The mode is the value that occurs most frequently. It represents the highest peak of the distribution. The mode is a good measure of location when the variable is inherently categorical or has otherwise been grouped into categories.

23 Statistics Associated with Frequency Distribution: Measures of Location
The median of a sample is the middle value when the data are arranged in ascending or descending order. If the number of data points is even, the median is usually estimated as the midpoint between the two middle values – by adding the two middle values and dividing their sum by 2. The median is the 50th percentile.
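The three measures of location can be checked quickly with Python's standard `statistics` module. The ratings are made-up values, chosen with an even number of observations so that the median is the midpoint of the two middle values.

```python
import statistics

ratings = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7]  # hypothetical, n = 10

mean = statistics.mean(ratings)      # sum of the values divided by n
median = statistics.median(ratings)  # midpoint of the 5th and 6th sorted values
mode = statistics.mode(ratings)      # most frequently occurring value

print(mean, median, mode)
```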

24 Statistics Associated with Frequency Distribution: Measures of Variability
The range measures the spread of the data. It is simply the difference between the largest and smallest values in the sample. Range = $X_{\text{largest}} - X_{\text{smallest}}$

25 Statistics Associated with Frequency Distribution: Measures of Variability
The variance is the mean squared deviation from the mean. The variance can never be negative. The standard deviation is the square root of the variance. The coefficient of variation is the ratio of the standard deviation to the mean expressed as a percentage, and is a unitless measure of relative variability. $s_x = \sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1)}$ and $CV = s_x / \bar{X}$
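The measures of variability can be sketched the same way. The data are made up; `statistics.variance` and `statistics.stdev` use the sample (n - 1) denominator.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5

value_range = max(values) - min(values)
variance = statistics.variance(values)  # sum of squared deviations / (n - 1)
std_dev = statistics.stdev(values)      # square root of the variance
cv_percent = 100 * std_dev / statistics.mean(values)

print(value_range, round(variance, 3), round(std_dev, 3), round(cv_percent, 1))
```

The coefficient of variation is meaningful only for ratio-scaled variables with a nonzero mean, since it divides by the mean.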

26 Cross-Tabulation While a frequency distribution describes one variable at a time, a cross-tabulation describes two or more variables simultaneously. Cross-tabulation results in tables that reflect the joint distribution of two or more variables with a limited number of categories or distinct values, e.g., Table 15.3.

27 Gender and Internet Usage
Table 15.3 Internet Usage (rows) by Gender (columns: Male, Female, Row Total). Light (1): 5, 10, 15. Heavy (2) and Column Total: values not shown.

28 Two Variables Cross-Tabulation
Since two variables have been cross-classified, percentages could be computed either columnwise, based on column totals (Table 15.4), or rowwise, based on row totals (Table 15.5). The general rule is to compute the percentages in the direction of the independent variable, across the dependent variable. The correct way of calculating percentages is as shown in Table 15.4.
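The rule about the direction of percentages can be illustrated with a short sketch. The counts are made up and are not those of Tables 15.3 to 15.5. Sex is treated as the independent variable, so each percentage is based on the total for that sex (columnwise).

```python
from collections import Counter

# Hypothetical (sex, usage) observations; sex is the independent variable.
observations = (
    [("male", "light")] * 4 + [("male", "heavy")] * 6
    + [("female", "light")] * 7 + [("female", "heavy")] * 3
)

cell_counts = Counter(observations)
column_totals = Counter(sex for sex, _ in observations)

def column_percent(sex, usage):
    """Percentage of respondents of the given sex who fall in the usage category."""
    return 100 * cell_counts[(sex, usage)] / column_totals[sex]

for usage in ("light", "heavy"):
    print(usage, column_percent("male", usage), column_percent("female", usage))
```

Read this way, the table answers the question "what share of each sex is a heavy user?", which is the comparison the independent variable is meant to explain.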

29 Internet Usage by Gender
Table 15.4

30 Gender by Internet Usage
Table 15.5

31 Exhibit 12.5 Cross-Tabulation

32 Hypothesis Testing Related to Differences
Parametric tests assume that the variables of interest are measured on at least an interval scale. Nonparametric tests assume that the variables are measured on a nominal or ordinal scale. These tests can be further classified based on whether one or two or more samples are involved. The samples are independent if they are drawn randomly from different populations. For the purpose of analysis, data pertaining to different groups of respondents, e.g., males and females, are generally treated as independent samples. The samples are paired when the data for the two samples relate to the same group of respondents.
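For paired samples the analysis works on the difference within each pair rather than on the two sets of scores separately. A minimal sketch of the paired t statistic follows, using made-up ratings for two measurements on the same five respondents.

```python
import statistics
from math import sqrt

# Hypothetical ratings from the same five respondents on two occasions.
first = [3, 4, 5, 6, 7]
second = [5, 5, 7, 7, 9]

def paired_t(x, y):
    """t statistic for paired samples: mean difference over its standard error."""
    diffs = [b - a for a, b in zip(x, y)]
    n = len(diffs)
    std_error = statistics.stdev(diffs) / sqrt(n)
    return statistics.mean(diffs) / std_error, n - 1  # (t, degrees of freedom)

t_value, df = paired_t(first, second)
print(round(t_value, 3), df)
```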

33 A Classification of Hypothesis Testing Procedures for Examining Differences
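Fig. 15.9 Hypothesis tests are classified as parametric (metric) or non-parametric (nonmetric), then by number of samples:
Parametric tests, one sample: t test.
Parametric tests, two or more samples, independent: two-group t test, Z test.
Parametric tests, two or more samples, paired: paired t test.
Non-parametric tests, one sample: chi-square, K-S, runs, binomial.
Non-parametric tests, two or more samples, independent: chi-square, Mann-Whitney, median, K-S.
Non-parametric tests, two or more samples, paired: sign, Wilcoxon, McNemar, chi-square.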

34 Univariate Tests of Significance
Tests of one variable at a time: z-test and t-test. Appropriate for interval or ratio data.

35 Univariate Hypothesis Test

36 Bivariate Statistical Tests
Compare characteristics of two groups or two variables: cross-tabulation with chi-square; t-test to compare two means; analysis of variance (ANOVA) to compare three or more means.

37 Comparing Means
Requires interval or ratio data. The t-value is the ratio of the difference between the two sample means to the standard error of that difference, which measures how much the means of random samples vary. The t-test determines whether the difference between the two sample means is larger than chance alone would plausibly produce.
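The ratio can be computed directly. The sketch below assumes equal population variances and therefore pools the two sample variances; that choice, and the data, are assumptions for illustration (the slides do not say which form of the standard error is intended).

```python
import statistics
from math import sqrt

# Hypothetical weekly usage hours for two independent groups.
group_a = [10, 12, 14, 16, 18]
group_b = [8, 9, 10, 11, 12]

def pooled_t(x, y):
    """Two-sample t statistic with a pooled variance (equal variances assumed)."""
    n1, n2 = len(x), len(y)
    pooled_var = (
        (n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)
    ) / (n1 + n2 - 2)
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(x) - statistics.mean(y)) / std_error, n1 + n2 - 2

t_value, df = pooled_t(group_a, group_b)
print(round(t_value, 3), df)
```

The resulting t-value is then compared with the t distribution on the returned degrees of freedom to judge significance.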

38 Comparing Two Means with Independent Samples t-Test

39 Paired Samples t-Test

41 SPSS Windows The main program in SPSS is FREQUENCIES. It produces a table of frequency counts, percentages, and cumulative percentages for the values of each variable. It gives all of the associated statistics. If the data are interval scaled and only the summary statistics are desired, the DESCRIPTIVES procedure can be used. The EXPLORE procedure produces summary statistics and graphical displays, either for all of the cases or separately for groups of cases. Mean, median, variance, standard deviation, minimum, maximum, and range are some of the statistics that can be calculated.

43 SPSS Windows To select these procedures, click: Analyze>Descriptive Statistics>Frequencies; Analyze>Descriptive Statistics>Descriptives; Analyze>Descriptive Statistics>Explore. The major cross-tabulation program is CROSSTABS. This program will display the cross-classification tables and provide cell counts, row and column percentages, the chi-square test for significance, and all the measures of the strength of the association that have been discussed. To select this procedure, click: Analyze>Descriptive Statistics>Crosstabs

44 SPSS Windows
The major program for conducting parametric tests in SPSS is COMPARE MEANS. This program can be used to conduct t tests on one sample or independent or paired samples. To select these procedures using SPSS for Windows, click: Analyze>Compare Means>Means …; Analyze>Compare Means>One-Sample T Test …; Analyze>Compare Means>Independent-Samples T Test …; Analyze>Compare Means>Paired-Samples T Test …

45 SPSS Windows The nonparametric tests discussed in this chapter can be conducted using NONPARAMETRIC TESTS. To select these procedures using SPSS for Windows, click: Analyze>Nonparametric Tests>Chi-Square …; Analyze>Nonparametric Tests>Binomial …; Analyze>Nonparametric Tests>Runs …; Analyze>Nonparametric Tests>1-Sample K-S …; Analyze>Nonparametric Tests>2 Independent Samples …; Analyze>Nonparametric Tests>2 Related Samples …

46 SPSS Windows: Frequencies
Select ANALYZE on the SPSS menu bar. Click DESCRIPTIVE STATISTICS and select FREQUENCIES. Move the variable “Familiarity [familiar]” to the VARIABLE(s) box. Click STATISTICS. Select MEAN, MEDIAN, MODE, STD. DEVIATION, VARIANCE, and RANGE.

47 SPSS Windows: Frequencies
Click CONTINUE. Click CHARTS. Click HISTOGRAMS, then click CONTINUE. Click OK.

48 SPSS Windows: Cross-tabulations
Select ANALYZE on the SPSS menu bar. Click on DESCRIPTIVE STATISTICS and select CROSSTABS. Move the variable “Internet Usage Group [iusagegr]” to the ROW(S) box. Move the variable “Sex [sex]” to the COLUMN(S) box. Click on CELLS. Select OBSERVED under COUNTS and COLUMN under PERCENTAGES.

49 SPSS Windows: Cross-tabulations
Click CONTINUE. Click STATISTICS. Click on CHI-SQUARE, PHI AND CRAMER’S V. Click OK.
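For readers without SPSS, the chi-square statistic and phi coefficient requested in these steps can be sketched by hand. The 2x2 counts are made up and do not reproduce the SPSS output for the chapter's data.

```python
from math import sqrt

# Hypothetical 2x2 table of counts: rows are usage (light, heavy),
# columns are sex (male, female).
observed = [[4, 7], [6, 3]]

def chi_square(table):
    """Return (chi-square statistic, total n) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            # Expected count under independence of the row and column variables.
            expected = row_totals[i] * col_totals[j] / n
            stat += (count - expected) ** 2 / expected
    return stat, n

chi2, n = chi_square(observed)
phi = sqrt(chi2 / n)  # for a 2x2 table, Cramer's V equals phi
print(round(chi2, 3), round(phi, 3))
```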

50 SPSS Windows: One Sample t Test
Select ANALYZE from the SPSS menu bar. Click COMPARE MEANS and then ONE SAMPLE T TEST. Move “Familiarity [familiar]” into the TEST VARIABLE(S) box. Type “4” in the TEST VALUE box. Click OK.
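The statistic behind this procedure can be sketched as follows, testing the sample mean against the value 4. The ratings are made up and are not the familiarity data of Table 15.1.

```python
import statistics
from math import sqrt

TEST_VALUE = 4  # the hypothesized mean typed into the TEST VALUE box
familiarity = [5, 6, 4, 7, 3, 5, 6, 4, 5]  # hypothetical ratings, n = 9

def one_sample_t(values, test_value):
    """t statistic: (sample mean - test value) / (s / sqrt(n)), with n - 1 df."""
    n = len(values)
    std_error = statistics.stdev(values) / sqrt(n)
    return (statistics.mean(values) - test_value) / std_error, n - 1

t_value, df = one_sample_t(familiarity, TEST_VALUE)
print(round(t_value, 3), df)
```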

51 SPSS Windows: Two Independent Samples t Test
Select ANALYZE from the SPSS menu bar. Click COMPARE MEANS and then INDEPENDENT SAMPLES T TEST. Move “Internet Usage Hrs/Week [iusage]” into the TEST VARIABLE(S) box. Move “Sex [sex]” to the GROUPING VARIABLE box. Click DEFINE GROUPS. Type “1” in the GROUP 1 box and “2” in the GROUP 2 box. Click CONTINUE. Click OK.

52 Copyright © 2010 Pearson Education, Inc.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America.

