Data Analysis: Review and Practical Application using SPSS


Data of Interest
– National Insurance Company
– 1,000 questionnaires sent
– 285 respondents
Questionnaire Presentation
– Copy given in class

Coding
Coding broadly refers to the set of all tasks associated with transforming edited responses into a form that is ready for analysis.
Steps
– Transform the responses to each question into a set of meaningful categories
– Assign numerical codes to the categories
– Create a data set suitable for computer analysis

Transforming Responses into Meaningful Categories
A structured question is pre-categorized. Responses to a nonstructured (open-ended) question must be grouped into a meaningful and manageable set of categories.
Q1: In this questionnaire, how many questions are not pre-categorized?

Missing-Value Category
A missing value can stem from
– A respondent's refusal to answer a question
– An interviewer's failure to ask a question or record an answer
– A "don't know" response that does not seem legitimate
The best ways to deal with missing-value responses are preventive
– Sound questionnaire design
– Tight control over fieldwork

Assigning Numerical Codes
Assign appropriate numerical codes to responses that are not already in quantified form. The codes should be chosen so that they facilitate computer manipulation and analysis of the responses.
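A minimal Python sketch of assigning numerical codes and handling the missing-value category. The category labels, the codes 1 to 4, and the use of 9 as the missing-value code are hypothetical choices for illustration, not the codes on the National Insurance coding sheet.

```python
# Hypothetical codebook for one closed-ended question (illustrative only).
EDUCATION_CODES = {
    "High school or less": 1,
    "Some college": 2,
    "College graduate": 3,
    "Graduate degree": 4,
}
MISSING_CODE = 9  # assumed code for refusals, skipped questions, illegitimate "don't know"


def code_response(raw):
    """Return the numerical code for a raw answer, or MISSING_CODE if it is unusable."""
    if raw is None:
        return MISSING_CODE
    return EDUCATION_CODES.get(raw.strip(), MISSING_CODE)


def valid_values(codes):
    """Drop missing-value codes before analysis (similar to SPSS user-missing values)."""
    return [c for c in codes if c != MISSING_CODE]


raw_answers = ["College graduate", "Some college", None, "Don't know", "Graduate degree"]
coded = [code_response(a) for a in raw_answers]
print(coded)                # [3, 2, 9, 9, 4]
print(valid_values(coded))  # [3, 2, 4]
```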

Multiple Response Question – Rank Order Question
Please rank the following insurance companies by placing a 1 beside the company you think is best overall, a 2 beside the company you think is second best, and so on.
__________Progressive
__________Allstate
__________National
Q2: How would you code this question if it were added to the questionnaire?
This question requires as many variables (and columns) as there are objects to be ranked: 3 separate variables are needed.
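One way to code the rank-order question, sketched in Python: each company becomes its own variable holding the rank the respondent gave it. The variable names (`rank_progressive` and so on) are hypothetical.

```python
COMPANIES = ["Progressive", "Allstate", "National"]  # order on the questionnaire


def code_ranking(ranks_by_company):
    """Turn one respondent's ranking into one variable per ranked company.

    ranks_by_company maps company name -> rank written beside it (1 = best).
    """
    if sorted(ranks_by_company.values()) != list(range(1, len(COMPANIES) + 1)):
        raise ValueError("each rank from 1 to 3 must be used exactly once")
    # Hypothetical variable names: rank_progressive, rank_allstate, rank_national.
    return {f"rank_{c.lower()}": ranks_by_company[c] for c in COMPANIES}


respondent = {"Progressive": 2, "Allstate": 3, "National": 1}
print(code_ranking(respondent))
# {'rank_progressive': 2, 'rank_allstate': 3, 'rank_national': 1}
```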

Creating a Data Set
A data set is an organized collection of data records. Each sample unit within the data set is called a case or observation.
Structure of a Data Set
– The number of observations = n
– The total number of variables embedded in the questionnaire = m
– The data set is then an n × m matrix of numbers
Importance of the coding sheet: anybody can enter or check the data set. (Copy of coding sheet)

SPSS Data Set
Two views: Variable View and Data View.
– Raw variables (labels and values)
– Transformed variables (Compute and Recode)
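The n × m structure and the two kinds of transformed variables can be sketched in plain Python. The variable names, the three reliability items, and the age cut-points are hypothetical; in SPSS the same steps would be done with Compute and Recode.

```python
# Each row is one case (respondent); each column is one variable.
columns = ["id", "age", "rel1", "rel2", "rel3"]  # hypothetical variable names
data = [
    [1, 23, 6, 7, 5],
    [2, 41, 4, 5, 6],
    [3, 67, 7, 7, 7],
]
n, m = len(data), len(columns)  # the data set is an n x m matrix
print(n, m)  # 3 5


def compute_mean(row, names):
    """Compute-style transformation: average of several item columns."""
    idx = [columns.index(name) for name in names]
    return sum(row[i] for i in idx) / len(idx)


def recode_age(age):
    """Recode-style transformation: collapse age in years into 3 ordered groups."""
    if age < 30:
        return 1
    if age < 60:
        return 2
    return 3


for row in data:
    rel_avg = compute_mean(row, ["rel1", "rel2", "rel3"])
    age_group = recode_age(row[columns.index("age")])
    row.extend([rel_avg, age_group])
columns += ["rel_avg", "age_group"]
print(data[0])  # [1, 23, 6, 7, 5, 6.0, 1]
```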

Preliminary Data Analysis: Basic Descriptive Statistics
Preliminary data analysis examines the central tendency and the dispersion of the data on each variable in the data set.
– The measurement level dictates which statistics are appropriate
– Goal: get a feeling for the data
What can we do? (Limitations on the next slide.) Run descriptives. (Output 1)

Measures of Central Tendency and Dispersion for Different Types of Variables
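A rough Python sketch of the idea that the measurement level dictates which descriptives make sense. The mapping follows a common rule of thumb (mode for nominal data, adding the median for ordinal data, adding the mean and standard deviation for interval and ratio data); it is an assumption and not necessarily identical to the table on the slide.

```python
import statistics


def describe(values, level):
    """Return the summaries conventionally appropriate for a measurement level."""
    if level == "nominal":
        return {"mode": statistics.mode(values)}
    if level == "ordinal":
        return {"mode": statistics.mode(values), "median": statistics.median(values)}
    if level in ("interval", "ratio"):
        return {
            "mode": statistics.mode(values),
            "median": statistics.median(values),
            "mean": statistics.mean(values),
            "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        }
    raise ValueError(f"unknown measurement level: {level}")


print(describe([1, 2, 2, 3], "nominal"))  # {'mode': 2}
print(describe([2, 4, 4, 4, 5, 5, 7, 9], "ratio"))
```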

Why Averages May Be Misleading
Researchers tested a new sauce product and found
– The mean rating in the taste test was close to the middle of the scale, which had "very mild" and "very hot" as its bipolar adjectives
Researchers' conclusion
– Consumers want a sauce that is neither really hot nor really mild

Why Averages May Be Misleading (Cont'd)
Deeper examination revealed
– A large proportion of consumers who wanted the sauce to be mild and an equally large proportion who wanted it to be hot
Moral of the story
– A clear understanding of the distribution of responses can help a researcher avoid erroneous inferences.
Talk about skewness and kurtosis.
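The sauce example can be reproduced with made-up ratings: a sample split between mild and hot lovers has its mean in the middle of the scale even though nobody chose the middle. The sketch computes moment-based skewness and excess kurtosis; SPSS applies small-sample adjustments, so its reported values would differ somewhat.

```python
from collections import Counter

# Hypothetical ratings on a 1 (very mild) to 7 (very hot) scale:
# half the respondents want a mild sauce, half want a hot one.
ratings = [1] * 10 + [2] * 10 + [6] * 10 + [7] * 10


def moments(xs):
    """Mean, moment-based skewness, and excess kurtosis (population formulas)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return mean, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


mean, skew, kurt = moments(ratings)
print(mean)                            # 4.0, the middle of the scale
print(Counter(ratings)[4])             # 0: nobody actually chose the middle
print(round(skew, 3), round(kurt, 3))  # 0.0 -1.852 (symmetric, flat/bimodal shape)
```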

Crosstabs: Occurrences under specific conditions, most often with categorical variables. Examples to run.

Cross-Tabulations: Comparing Frequencies
Chi-square Contingency Test
A technique used to determine whether there is a statistically significant relationship between two categorical (nominal or ordinal) variables
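Before any test is run, a cross-tabulation is simply a count of how often each combination of categories occurs. A minimal Python sketch with hypothetical coded responses (education coded 1 to 3, recommendation coded yes/no):

```python
from collections import Counter


def crosstab(row_var, col_var):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(zip(row_var, col_var))
    rows = sorted(set(row_var))
    cols = sorted(set(col_var))
    return rows, cols, [[counts[(r, c)] for c in cols] for r in rows]


# Hypothetical responses (not the National Insurance data).
education = [1, 1, 2, 2, 2, 3, 3, 3, 3, 1]
recommend = ["no", "no", "yes", "no", "yes", "yes", "yes", "yes", "no", "yes"]
rows, cols, table = crosstab(education, recommend)
print(cols)  # ['no', 'yes']
for r, counts in zip(rows, table):
    print(r, counts)  # 1 [2, 1] / 2 [1, 2] / 3 [1, 3]
```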

Cross-Tabulation Using SPSS for National Insurance Company
One crucial issue in the customer survey of National Insurance Company was how a customer's education was associated with whether or not she or he would recommend National to a friend.

Need to Conduct a Chi-square Test to Reach a Conclusion
The hypotheses are:
– H0: There is no association between educational level and willingness to recommend National to a friend (the two variables are independent of each other).
– Ha: There is some association between educational level and willingness to recommend National to a friend (the two variables are not independent of each other).
– Let's do it.

Conducting the Test
The test involves comparing the actual, or observed, cell frequencies (O_ij) in the cross-tabulation with a corresponding set of expected cell frequencies (E_ij).

Expected Values
$$E_{ij} = \frac{n_i n_j}{n}$$
where $n_i$ and $n_j$ are the marginal frequencies, that is, the total number of sample units in category i of the row variable and category j of the column variable, respectively, and n is the total sample size

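Chi-square Test Statistic
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
where $O_{ij}$ and $E_{ij}$ are the observed and expected frequencies in cell (i, j), and r and c are the number of rows and columns, respectively, in the contingency table. The number of degrees of freedom associated with this chi-square statistic is given by the product (r - 1)(c - 1).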
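The expected frequencies E_ij = n_i n_j / n, the statistic χ² = Σ (O_ij − E_ij)² / E_ij, and the degrees of freedom (r − 1)(c − 1) can be computed in a few lines. This sketch uses a made-up 2 × 2 table and computes the uncorrected Pearson chi-square (no continuity correction).

```python
def chi_square_test_statistic(observed):
    """Expected counts, Pearson chi-square statistic, and df for an r x c table."""
    r, c = len(observed), len(observed[0])
    row_totals = [sum(row) for row in observed]                             # n_i
    col_totals = [sum(observed[i][j] for i in range(r)) for j in range(c)]  # n_j
    n = sum(row_totals)
    expected = [[row_totals[i] * col_totals[j] / n for j in range(c)] for i in range(r)]
    chi2 = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(r)
        for j in range(c)
    )
    return expected, chi2, (r - 1) * (c - 1)


# Hypothetical 2 x 2 table (not the National Insurance data).
observed = [[30, 20],
            [10, 40]]
expected, chi2, df = chi_square_test_statistic(observed)
print(expected)            # [[20.0, 30.0], [20.0, 30.0]]
print(round(chi2, 3), df)  # 16.667 1
```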

National Insurance Company Study (SPSS output: computed chi-square value and p-value)

National Insurance Company Study: P-Value Significance
The actual significance level (p-value) is 0.019: if there were no relationship between education and recommendation, the chance of obtaining a chi-square value as high as 10.007 would be only about 19 in 1,000. The apparent relationship between education and recommendation revealed by the sample data is therefore unlikely to have occurred by chance, and we can reject the null hypothesis at the conventional 0.05 level.
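To see where a p-value comes from, the upper tail of the chi-square distribution can be computed from the regularized incomplete gamma function. This standard-library sketch uses a power series; in practice a library routine such as `scipy.stats.chi2.sf` would normally be used. The df = 3 in the second example is hypothetical, since the slide does not give the dimensions of the education table.

```python
import math


def chi_square_p_value(chi2, df):
    """Upper-tail probability P(X >= chi2) for a chi-square variable with df degrees of freedom.

    Computes the regularized lower incomplete gamma function P(a, x), with
    a = df / 2 and x = chi2 / 2, from its power series; the p-value is 1 - P(a, x).
    """
    if chi2 <= 0:
        return 1.0
    a, x = df / 2.0, chi2 / 2.0
    term = total = 1.0 / a       # k = 0 term of the series
    k = 1
    while term > 1e-15 * total:  # add terms until they no longer matter
        term *= x / (a + k)
        total += term
        k += 1
    lower = total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return max(0.0, 1.0 - lower)


print(round(chi_square_p_value(5.991, 2), 3))   # 0.05, the familiar 5% critical value for df = 2
print(round(chi_square_p_value(10.007, 3), 4))  # 0.0185 with a hypothetical df = 3
```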

Precautions in Interpreting Cross-Tabulation Results
– Two-way tables cannot show conclusive evidence of a causal relationship
– Watch out for small cell sizes
– The risk of drawing erroneous inferences increases when more than two variables are involved

Overview of Techniques for Examining Associations
Spearman Correlation Coefficient Technique
This technique is appropriate when
– The degree of association between two sets of ranks (pertaining to two variables) is to be examined
Illustrative research question(s) this technique can answer:
– Is there a significant relationship between motivation levels of salespeople and the quality of their performance? Assume that the data on motivation and quality of performance are in the form of ranks, say, 1 through 20, for 20 salespeople who were evaluated subjectively by their supervisor on each variable.

Overview of Techniques for Examining Associations (Cont'd)
Pearson Correlation Coefficient Technique
This technique is appropriate when
– The degree of association between two metric-scaled (interval or ratio) variables is to be examined
Illustrative research question(s) this technique can answer:
– Is there a significant relationship between customers' age (measured in actual years) and their perceptions of our company's image (measured on a scale of 1 to 7)?

Spearman Correlation Coefficient
A Spearman correlation coefficient is a measure of association between two sets of ranks:
$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
where $d_i$ = the difference between the ith sample unit's ranks on the two variables and n = the total sample size
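A minimal Python version of the rank-difference formula r_s = 1 − 6 Σ d_i² / (n(n² − 1)), using five hypothetical salespeople instead of twenty. The formula is exact only when there are no tied ranks.

```python
def spearman(rank_x, rank_y):
    """Spearman r_s = 1 - 6 * sum(d_i**2) / (n * (n**2 - 1)) for two untied rank lists."""
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))  # sum of d_i^2
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical supervisor rankings of five salespeople (1 = highest).
motivation = [1, 2, 3, 4, 5]
performance = [2, 1, 4, 3, 5]
print(spearman(motivation, performance))  # 0.8
```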

Pearson Correlation Coefficient
The Pearson correlation coefficient measures the degree of association between variables that are interval- or ratio-scaled. For variables X and Y, the Pearson correlation coefficient $r_{xy}$ is given by
$$r_{xy} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{(n-1)\, s_x s_y}$$
where n = sample size (total number of data points), $\bar{X}$ and $\bar{Y}$ = means, $X_i$ and $Y_i$ = values for any sample unit i, and $s_x$ and $s_y$ = standard deviations
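The Pearson formula translates directly into Python. The service-quality numbers below are hypothetical; `statistics.correlation` (Python 3.10 and later) would return the same coefficient.

```python
import statistics


def pearson(xs, ys):
    """r_xy = sum((X_i - mean_X)(Y_i - mean_Y)) / ((n - 1) * s_x * s_y)."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)  # sample standard deviations
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cross / ((n - 1) * sx * sy)


# Hypothetical data: overall quality (10-point scale) vs. average responsiveness rating.
oq = [6, 7, 8, 9, 5, 8]
responsiveness = [4.5, 5.0, 6.0, 6.5, 4.0, 5.5]
print(round(pearson(oq, responsiveness), 3))
```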

National Insurance Company: Computing Pearson Correlations Among Service-Quality Constructs
National Insurance Company was interested in the correlations between respondents' overall service-quality perceptions (on the 10-point scale) and their average ratings along each of the five dimensions of service quality.

National Insurance Company: Computing Pearson Correlations Among Service-Quality Constructs Using SPSS

Interpreting Pearson Correlation Coefficients
– Each of the five service-quality measures (reliability, empathy, tangibles, responsiveness, and assurance) is significantly related to overall quality (OQ) at the .001 level of significance
– Responsiveness has the strongest correlation (.8625)
– Tangibles have the weakest correlation (.5038)
– All the correlations are strong enough to be meaningful

Comparing Means
Mainly t-tests and ANOVAs. Example: t-test of OQ by gender.

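Independent-Samples t-Tests
– The independent (grouping) variable has exactly two categories
– Check the equality of variances (cf. output)
– Here p = .88: if there were no real difference between the groups, a difference as large as the observed .04 would occur by chance about 88% of the time. We cannot reject the null hypothesis.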
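The t statistic behind the SPSS output can be sketched for both rows of the table (equal variances assumed and not assumed). The groups are hypothetical, and the p-value itself needs the t distribution, which the Python standard library does not provide.

```python
import math
import statistics


def independent_t(group1, group2, equal_var=True):
    """Return (t, df) for an independent-samples t-test of mean(group1) - mean(group2)."""
    n1, n2 = len(group1), len(group2)
    diff = statistics.mean(group1) - statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)  # sample variances
    if equal_var:
        # Pooled variance: the "equal variances assumed" row.
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Welch-Satterthwaite: the "equal variances not assumed" row.
        a, b = v1 / n1, v2 / n2
        se = math.sqrt(a + b)
        df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return diff / se, df


# Hypothetical overall-quality ratings for two groups of respondents.
women = [7, 8, 6, 9, 7]
men = [7, 6, 8, 7, 8]
print(independent_t(women, men))
print(independent_t(women, men, equal_var=False))
```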

Analysis of Variance
ANOVA is appropriate when the independent variable is set at certain specific levels (called treatments in an ANOVA context) and metric measurements of the dependent variable are obtained at each of those levels.

Example
– 24 stores chosen randomly for the study
– 8 stores randomly chosen for each treatment
– Treatment 1: store brand sold at the regular price
– Treatment 2: store brand sold at 50¢ off the regular price
– Treatment 3: store brand sold at 75¢ off the regular price
– Sales of the store brand are monitored for a week in each store

Table 15.2 Unit Sales Data Under Three Pricing Treatments

ANOVA: Grocery Store Hypotheses
Grocery store example
– $H_0: \mu_1 = \mu_2 = \mu_3$
– $H_a$: at least one $\mu$ is different from one or more of the others
Hypotheses for k treatment groups or samples
– $H_0: \mu_1 = \mu_2 = \dots = \mu_k$
– $H_a$: at least one $\mu$ is different from one or more of the others
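A compact sketch of the one-way ANOVA F statistic for the hypothesis that all treatment means are equal. The sales figures are hypothetical (three stores per treatment rather than eight) and are not the values in Table 15.2.

```python
def one_way_anova(groups):
    """F statistic and degrees of freedom for H0: mu_1 = mu_2 = ... = mu_k."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical weekly unit sales per store under the three pricing treatments.
regular = [30, 32, 34]
fifty_off = [40, 42, 44]
seventy_five_off = [50, 52, 54]
print(one_way_anova([regular, fifty_off, seventy_five_off]))  # (75.0, 2, 6)
```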

Exhibit 15.1 SPSS Computer Output for ANOVA Analysis

Exhibit 15.1 SPSS Computer Output for ANOVA Analysis (Cont'd)
If the three treatment means were equal, there would be less than a .001 probability of obtaining an F-value as high as 137.447.

ANOVA: OQ by recommendation, and OQ by an individual variable such as EDUC (graph), followed by post hoc tests

Overview of Techniques for Examining Associations (Cont'd)
Simple Regression Analysis Technique
This technique is appropriate when
– A mathematical function or equation linking two metric-scaled (interval or ratio) variables is to be constructed, under the assumption that the values of one of the two variables depend on the values of the other

Overview of Techniques for Examining Associations: Simple Regression Analysis (Cont'd)
Illustrative research question(s) this technique can answer:
– Are sales (measured in dollars) significantly affected by advertising expenditures (measured in dollars)?
– What proportion of the variation in sales is accounted for by variation in advertising expenditures?
– How sensitive are sales to changes in advertising expenditures?

Overview of Techniques for Examining Associations (Cont'd)
Multiple Regression Analysis Technique
This technique is appropriate under the same conditions as simple regression analysis, except that more than two variables are involved and one variable is assumed to depend on the others

Overview of Techniques for Examining Associations (Cont'd)
Illustrative research question(s) this technique can answer:
– Are sales significantly affected by advertising expenditures and price (where all three variables are measured in dollars)?
– What proportion of the variation in sales is accounted for by advertising and price?
– How sensitive are sales to changes in advertising and price?

Simple Regression Analysis
Generates a mathematical relationship (called the regression equation) between one variable designated as the dependent variable (Y) and another designated as the independent variable (X)

Independent Variable vs. Dependent Variable
Independent variable
– Explanatory or predictor variable
– Often presumed to be a cause of the other
Dependent variable
– Criterion variable
– Influenced by the independent variable

Practical Applications of Regression Equations
– The regression coefficient, or slope, can indicate how sensitive the dependent variable is to changes in the independent variable
– The regression equation is a forecasting tool for predicting the value of the dependent variable for a given value of the independent variable
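A least-squares sketch of simple regression that returns the intercept, the slope (the sensitivity of Y to X), and R², and then uses the equation as a forecasting tool inside the observed range of X. The advertising and sales figures are hypothetical.

```python
def simple_regression(xs, ys):
    """Least-squares estimates for Y = a + bX, plus R^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx    # slope: change in Y per one-unit change in X
    a = my - b * mx  # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot


# Hypothetical advertising and sales figures (thousands of dollars).
advertising = [1, 2, 3, 4, 5]
sales = [12, 15, 21, 24, 28]
a, b, r2 = simple_regression(advertising, sales)
print(round(a, 2), round(b, 2), round(r2, 3))  # 7.7 4.1 0.989
print(a + b * 3.5)  # forecast for advertising = 3.5, inside the observed range
```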

Precautions in Using Regression Analysis
– It can capture only linear associations between the dependent and independent variables
– A significant R² value does not necessarily imply a cause-and-effect association between the independent and dependent variables
– A regression equation may not yield a trustworthy prediction of the dependent variable when the value of the independent variable at which the prediction is desired is outside the range of values used in constructing the equation

Precautions in Using Regression Analysis (Cont'd)
– A regression equation based on relatively few data points cannot be trusted
– The ranges of data on the dependent and independent variables can affect the meaningfulness of a regression equation

Multiple Regression Analysis
$$\hat{Y}_i = a + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki}$$
where $\hat{Y}_i$ is the predicted value of the dependent variable for some unit i; $X_{1i}, X_{2i}, \dots, X_{ki}$ are the values on the independent variables for unit i; $b_1, b_2, \dots, b_k$ are the regression coefficients; and a is the Y-intercept, representing the prediction for Y when all independent variables are set to zero
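The coefficients a, b_1, …, b_k can be estimated by solving the least-squares normal equations (X′X)b = X′y. This standard-library sketch does so with Gauss-Jordan elimination on hypothetical data; in practice a statistical package or a QR-based solver is preferable for numerical stability and also provides standard errors and significance tests.

```python
def multiple_regression(X, y):
    """Least-squares [a, b_1, ..., b_k] for y = a + b_1*x_1 + ... + b_k*x_k."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # leading 1 for the intercept a
    p = len(rows[0])
    # Augmented normal-equation matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("independent variables are perfectly collinear")
        for i in range(p):
            if i != col:
                factor = aug[i][col] / aug[col][col]
                aug[i] = [v - factor * w for v, w in zip(aug[i], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


# Hypothetical data generated exactly from y = 2 + 3*x1 - 1*x2.
X = [[1, 0], [0, 1], [1, 1], [2, 1], [3, 2]]
y = [5, 1, 4, 7, 9]
print([round(c, 6) for c in multiple_regression(X, y)])  # [2.0, 3.0, -1.0]
```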

National Insurance Company: Multiple Regression Using SPSS
Jill and Tom were interested in conducting a multiple regression analysis in which the overall service-quality perception rating is the dependent variable and the average ratings along the five dimensions are the independent variables.

Factor Analysis
A data- and variable-reduction technique that attempts to partition a given set of variables into groups of maximally correlated variables

Factor Analysis Output and Its Interpretation
The primary output of factor analysis is a factor-loading matrix.

Table 15.4 Factor-Loading Matrix Based on Data from Study of Star Customers
– 3 variables load high on factor 1
– 3 variables load high on factor 2

Reducing Star Data
– X1, X4, and X6 can be combined into one factor
– X2, X3, and X5 can be combined into a second factor
– The 6 variables can thus be reduced to two factors
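Reading a loading matrix largely comes down to assigning each variable to the factor on which it loads most heavily. The loadings below are invented to mimic the pattern described for Table 15.4, and the 0.5 cut-off is a common rule of thumb rather than a fixed standard.

```python
# Hypothetical loadings: variable -> (loading on factor 1, loading on factor 2).
loadings = {
    "X1": (0.85, 0.10),
    "X2": (0.12, 0.80),
    "X3": (0.05, 0.77),
    "X4": (0.79, 0.20),
    "X5": (0.18, 0.83),
    "X6": (0.88, 0.02),
}


def group_by_factor(loadings, threshold=0.5):
    """Assign each variable to the factor on which its absolute loading is highest."""
    groups = {}
    for var, row in loadings.items():
        best = max(range(len(row)), key=lambda f: abs(row[f]))
        if abs(row[best]) >= threshold:  # skip variables that load weakly everywhere
            groups.setdefault(best + 1, []).append(var)
    return groups


print(group_by_factor(loadings))  # {1: ['X1', 'X4', 'X6'], 2: ['X2', 'X3', 'X5']}
```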

Potential Applications of Factor Analysis
Used to
– Develop concise but comprehensive multiple-item scales for measuring various marketing constructs
– Illuminate the nature of distinct dimensions underlying an existing data set
– Convert a large volume of data into a set of factor scores on a limited number of uncorrelated factors

Cluster Analysis
Segments objects into groups so that members within each group are similar to one another in a variety of ways. Useful for segmenting customers, market areas, and products.

Use of Cluster Analysis
A firm offering recreational services wanted to enter a new region of the country. It gathered data on more than 100 characteristics, including
– Demographics
– Expenditures on recreation
– Leisure-time activities
– Interests of household members
The firm identified one or several household segments likely to be most responsive to its advertising and its services.

How Does Cluster Analysis Work?
Cluster analysis measures the similarity between objects on the basis of their values on the various characteristics.
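The slides do not say which clustering algorithm was used; k-means with Euclidean distance is one common choice and is sketched here on two hypothetical characteristics (participation in outdoor sporting events and watching them on TV, each scored 1 to 10). The starting centers are also an assumption; real applications try several starts.

```python
import math


def kmeans(points, centers, iterations=10):
    """Plain k-means: assign points to the nearest center, then move each center to its cluster mean."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical households: (participation, TV watching), each on a 1-10 scale.
households = [(8, 2), (9, 3), (7, 1), (2, 8), (1, 9), (3, 7), (8, 8), (9, 9)]
centers, clusters = kmeans(households, centers=[(8, 2), (2, 8), (8, 8)])
print(centers)  # [(8.0, 2.0), (2.0, 8.0), (8.5, 8.5)]
```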

Exhibit 15.8 Clusters Formed by Using Data on Two Characteristics (axes: extent of participation in outdoor sporting events and extent of watching outdoor sporting events on TV, each from low to high)
