Presentation on theme: "Five types of statistical analysis"— Presentation transcript:
1 Five types of statistical analysis
Descriptive: What are the characteristics of the respondents?
Inferential: What are the characteristics of the population?
Differences: Are two or more groups the same or different?
Associative: Are two or more variables related in a systematic way?
Predictive: Can we predict one variable if we know one or more other variables?
2 General Procedure for Hypothesis Test
1. Formulate H0 (null hypothesis) and H1 (alternative hypothesis).
2. Select an appropriate test.
3. Choose the level of significance.
4. Calculate the test statistic (SPSS).
5. Determine the probability associated with the statistic, or determine the critical value of the test statistic.
3 General Procedure for Hypothesis Test (continued)
6. a) Compare the probability with the level of significance, or b) determine whether the test statistic falls in the rejection region (check tables).
7. Reject or do not reject H0.
8. Draw a conclusion.
4 1. Formulate H1 and H0
The hypothesis the researcher wants to test is called the alternative hypothesis, H1.
The opposite of the alternative hypothesis is the null hypothesis, H0: the status quo (no difference between the sample and the population, or between samples).
The objective is to DISPROVE the null hypothesis.
The significance level is the critical probability used to choose between the null hypothesis and the alternative hypothesis.
5 2. Select Appropriate Test
The selection of a proper test depends on:
Scale of the data: nominal, interval
The statistic you seek to compare: proportions (percentages), means
The sampling distribution of that statistic: normal distribution, t distribution, χ² distribution
Number of variables: univariate, bivariate, multivariate
Type of question to be answered
6 Example
A tire manufacturer believes that men are more aware of its brand. To find out, it surveys 100 customers, 65 of whom are men and 35 of whom are women. The question they are asked is: Are you aware of our brand? Yes or No. Of the men, 50 were aware and 15 were not, whereas 10 of the women were aware and 25 were not. Are these differences significant?
            Men   Women
Aware        50     10
Unaware      15     25
Total        65     35
7 1. Formulate H1 and H0
We want to know whether brand awareness is associated with gender. What are the hypotheses?
H0: There is no difference in brand awareness based on gender.
H1: There is a difference in brand awareness based on gender.
8 2. Select Appropriate Test: χ² (Chi-Square)
Used to discover whether two or more groups of one variable (dependent variable) vary significantly from each other with respect to some other variable (independent variable), that is, whether the two variables of interest are associated. Examples:
Do men and women differ with respect to product usage (heavy, medium, or light)?
Is the preference for a certain flavor (cherry or lemon) related to the geographic region (north, south, east, west)?
H0: The two variables are independent (not associated).
H1: The two variables are not independent (associated).
The data must be at the nominal level or, if interval or ratio, must be divided into categories.
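9 Awareness of Tire Manufacturer’s Brand (observed / estimated cell frequency)
            Men       Women     Total
Aware       50 / 39   10 / 21   60
Unaware     15 / 26   25 / 14   40
Total       65        35        100
Estimated cell frequency: $E_{ij} = \frac{R_i C_j}{n}$
where $R_i$ = total observed frequency in the ith row, $C_j$ = total observed frequency in the jth column, $n$ = sample size, and $E_{ij}$ = estimated cell frequency.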
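As a minimal sketch (not part of the original deck), the estimated cell frequencies can be computed directly from the row and column totals. The function name `expected_frequencies` is a hypothetical helper, and the table is assumed to be a list of rows (aware, unaware) by columns (men, women).

```python
def expected_frequencies(observed):
    """Return E_ij = R_i * C_j / n for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]          # R_i
    col_totals = [sum(col) for col in zip(*observed)]    # C_j
    n = sum(row_totals)                                  # sample size
    return [[r * c / n for c in col_totals] for r in row_totals]


# Observed counts from the tire example: rows = aware/unaware, columns = men/women
observed = [[50, 10],
            [15, 25]]
expected = expected_frequencies(observed)
print(expected)  # [[39.0, 21.0], [26.0, 14.0]]
```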
10 3. Choose Level of Significance
Whenever we draw inferences about a population, there is a risk that an incorrect conclusion will be reached. The real question is how strong the evidence in favor of the alternative hypothesis must be to reject the null hypothesis.
The significance level states the probability of incorrectly rejecting H0 when it is true. This error is commonly known as a Type I error, and its probability, α, is called the significance level of the test.
In the example, a Type I error would be committed if we said that there is a difference between men and women with respect to brand awareness when in fact there is no difference.
11 The significance level selected is typically .05 or .01, i.e., 5% or 1%.
In other words, we are willing to accept the risk that, when there is in fact no difference between men and women with respect to brand awareness, 5% (or 1%) of the samples we could draw would lead us to conclude that there is one.
12 3. Choose Level of Significance
We commit a Type II error when we incorrectly accept a null hypothesis that is false. The probability of committing a Type II error is denoted by β.
In our example, we commit a Type II error when we say that there is NO difference between men and women with respect to brand awareness (we accept the null hypothesis) when in fact there is one.
13 Type I and Type II Errors
                  Accept null           Reject null
Null is true      Correct (no error)    Type I error
Null is false     Type II error         Correct (no error)
14 Which is worse?
Both are serious, but traditionally a Type I error has been considered more serious; that is why hypothesis testing aims to reject H0 only when there is enough evidence to support doing so.
Therefore, we choose α to be as small as possible without compromising β.
For a given α, increasing the sample size decreases β (the probability of accepting the null hypothesis when it is in fact false).
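The trade-off between α, β, and sample size can be illustrated with a small Monte Carlo sketch (not from the original deck). The awareness rates used below (0.6 for both genders when H0 is true; 0.6 for men versus 0.45 for women when H1 is true) are hypothetical assumptions, the chi-square statistic is Pearson's without continuity correction, and 3.84 is the .05 critical value for 1 d.f. The helper names `chi_square_2x2` and `rejection_rate` are illustrative.

```python
import random


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    if 0 in rows or 0 in cols:
        return 0.0  # degenerate table: no evidence of association
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            e = rows[i] * cols[j] / n  # estimated cell frequency
            stat += (observed[i][j] - e) ** 2 / e
    return stat


def rejection_rate(p_men, p_women, n_men, n_women, trials=2000, critical=3.84, seed=1):
    """Fraction of simulated surveys in which H0 (independence) is rejected at alpha = .05."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        aware_men = sum(rng.random() < p_men for _ in range(n_men))
        aware_women = sum(rng.random() < p_women for _ in range(n_women))
        # rows = aware/unaware, columns = men/women
        stat = chi_square_2x2(aware_men, aware_women,
                              n_men - aware_men, n_women - aware_women)
        if stat > critical:
            rejections += 1
    return rejections / trials


# H0 true (same awareness rate): the rejection rate estimates alpha
print(rejection_rate(0.6, 0.6, 65, 35))
# H1 true: 1 - rejection rate estimates beta, for the original and a 10x larger survey
print(rejection_rate(0.6, 0.45, 65, 35), rejection_rate(0.6, 0.45, 650, 350))
```

With the same α, the larger survey should reject the false H0 far more often, i.e., it has a smaller β.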
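16 Chi-Square Test: Differences Among Groups
4. Calculate the Test Statistic
$\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{(50-39)^2}{39} + \frac{(10-21)^2}{21} + \frac{(15-26)^2}{26} + \frac{(25-14)^2}{14} = 3.103 + 5.762 + 4.654 + 8.643 \approx 22.16$
where $O_{ij}$ = observed frequency and $E_{ij}$ = estimated cell frequency.
Degrees of freedom: $d.f. = (R-1)(C-1) = (2-1)(2-1) = 1$
Chi-square test results are unstable if the expected count in any cell is lower than 5.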
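A minimal sketch of the same calculation, assuming the table is given as a list of rows; `chi_square` is a hypothetical helper name. It also returns the smallest expected count so the "expected count below 5" rule can be checked.

```python
def chi_square(observed):
    """Pearson chi-square statistic, d.f. = (R - 1)(C - 1), and the smallest expected count."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    min_expected = float("inf")
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # E_ij = R_i C_j / n
            min_expected = min(min_expected, e)
            stat += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, min_expected


stat, df, min_expected = chi_square([[50, 10], [15, 25]])
print(round(stat, 2), df, min_expected)  # 22.16 1 14.0
```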
17 Degrees of Freedom
The number of values in the final calculation of a statistic that are free to vary.
For example, to calculate the standard deviation of a random sample, we must first calculate the mean of that sample and then compute the sum of the squared deviations from that mean. While there will be n such squared deviations, only (n - 1) of them are free to assume any value whatsoever.
This is because the deviations from the mean must sum to zero: once (n - 1) of them are known, the last one is fixed, since the final value of X must be such that the sum of all the Xs divided by n equals the obtained sample mean.
All of the other (n - 1) deviations from the mean can, theoretically, have any values whatsoever.
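A short illustration using a hypothetical four-value sample (the numbers are made up for the example): the deviations from the mean sum to zero, so the last deviation is determined by the others, and the sample variance divides by n - 1.

```python
import statistics

sample = [4.0, 7.0, 1.0, 8.0]                 # hypothetical sample values
mean = statistics.mean(sample)                # 5.0
deviations = [x - mean for x in sample]       # [-1.0, 2.0, -4.0, 3.0], sums to zero

# Knowing the first n - 1 deviations fixes the last one
last = -sum(deviations[:-1])
assert last == deviations[-1]

# The sample variance therefore divides by n - 1 degrees of freedom
var = sum(d * d for d in deviations) / (len(sample) - 1)
print(var, statistics.variance(sample))  # 10.0 10.0
```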
18 5. Determine the Probability-value (Critical Value)
The p-value is the probability of obtaining a sample result at least as extreme as the one observed, given that the null hypothesis is true.
Given the value of alpha, we use statistical theory to determine the rejection region. If the sample falls into this region we reject the null hypothesis; otherwise, we accept it.
Sample evidence that falls into the rejection region is called statistically significant at the alpha level.
19 Significance from p-values (continued)
How small is a “small” p-value? This is largely a matter of semantics, but if the
p-value is less than 0.01, it provides “convincing” evidence that the alternative hypothesis is true;
p-value is between 0.01 and 0.05, there is “strong” evidence in favor of the alternative hypothesis;
p-value is between 0.05 and 0.10, it is in a “gray area”;
p-values greater than 0.10 are interpreted as weak or no evidence in support of the alternative.
20 5. Determine the Probability-value (Critical Value): Chi-square Test for Independence
Under H0, the test statistic is approximately distributed as a chi-square (χ²) distribution.
The critical value of χ² with 1 d.f. at α = .05 is 3.84; values above 3.84 fall in the rejection region. The observed value is 22.16.
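A minimal sketch of the p-value step for this 2×2 table, using only the standard library. It relies on the fact that with 1 d.f. a χ² variable is the square of a standard normal Z, so P(χ² > x) = P(|Z| > √x) = erfc(√(x/2)); this shortcut does not apply to other degrees of freedom. The `evidence` helper is a hypothetical encoding of the verbal scale on the "small p-value" slide.

```python
import math


def chi2_sf_1df(x):
    """Right-tail probability P(chi2 > x) for 1 d.f., via chi2 = Z**2."""
    return math.erfc(math.sqrt(x / 2))


def evidence(p):
    """Verbal scale for p-values, following the slide on 'small' p-values."""
    if p < 0.01:
        return "convincing"
    if p < 0.05:
        return "strong"
    if p <= 0.10:
        return "gray area"
    return "weak or none"


p = chi2_sf_1df(22.16)
print(p < 0.001, 22.16 > 3.84, evidence(p))  # True True convincing
print(round(chi2_sf_1df(3.84), 3))           # 0.05
```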
21 Reject or do not reject H0
a) Compare with the level of significance, b) determine whether the test statistic falls in the rejection region (check tables).
22.16 is greater than 3.84 and falls in the rejection region. In fact, it is significant at the .001 level: if the two variables were independent, the probability of drawing a sample with a χ² value this large would be less than 1/1000.
Reject or do not reject H0: since 22.16 is greater than 3.84, we reject the null hypothesis.
Draw a conclusion: men and women differ with respect to brand awareness; specifically, men are more brand aware than women.
22 Example 2: The manager of Pepperoni Pizza Restaurant has recently begun experimenting with a new method of baking its pepperoni pizzas. He believes that the new method produces a better-tasting pizza, but he would like to base the decision on whether to switch from the old method to the new method on customer reactions. Therefore, he performs an experiment.
23 The Experiment
For 40 randomly selected customers who order a pepperoni pizza for home delivery, he includes both an old-style pizza and a free new-style pizza in the order. All he asks is that these customers rate the difference between the pizzas on a -10 to +10 scale, where -10 means they strongly favor the old style, +10 means they strongly favor the new style, and 0 means they are indifferent between the two styles.
24 1. Formulate H1 and H0: One-Tailed Versus Two-Tailed Tests
The alternative hypothesis can be either one-tailed or two-tailed, depending on what you are trying to prove.
A one-tailed hypothesis is one where the only sample results that can lead to rejection of the null hypothesis are those in a particular direction; in the pizza example, those where the sample mean rating is positive.
A two-tailed test is one where results in either of two directions can lead to rejection of the null hypothesis.
25 1. Formulate H1 and H0: One-Tailed Versus Two-Tailed Tests (continued)
Once the hypotheses are set up, it is easy to detect whether the test is one-tailed or two-tailed. One-tailed alternatives are phrased in terms of “>” or “<”, whereas two-tailed alternatives are phrased in terms of “≠”.
The real question is whether to set up the hypotheses for a particular problem as one-tailed or two-tailed. There is no statistical answer to this question; it depends entirely on what we are trying to prove.
26 1. Formulate H1 and H0
As the manager, you would like to observe a difference between the two pizzas. If the new baking method is cheaper, you would like the preference to be for it.
Null hypothesis: H0: μ = 0 (there is no difference between the old-style and the new-style pizzas; the population mean rating is zero), where μ = population mean.
Alternative: H1: μ ≠ 0 (two-tailed test) or H1: μ > 0 (one-tailed test).
27 2. Select Appropriate Test
The one-sample t test is used to test whether the mean of the population from which a data sample is drawn is equal to a hypothesized value.
What we want to test is whether consumers prefer the new-style pizza to the old style. We assume that there is no difference (i.e., the mean of the population is zero) and want to know whether our observed result is significantly (i.e., statistically) different.
28 Type I error: rejecting the null hypothesis that the pizzas are equal when they really are perceived as equal by the customers of the entire population.
Type II error: not rejecting the null hypothesis that the pizzas are equal when they are perceived as different by the customers of the entire population.
29 3. Choose Level of Significance
The significance level selected is typically .05 or .01, i.e., 5% or 1%.
30 The ratings of the 40 randomly selected customers produce the following table and statistics. From the summary statistics, we see that the sample mean is 2.10 and the sample standard deviation is 4.717.
31 Summary Statistics
The positive sample mean provides some evidence in favor of the alternative hypothesis, but given the rather large standard deviation, does it provide enough evidence to reject H0?
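32 4. Calculate the Test Statistic
$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t_{n-1}$
With $\mu_0 = 0$: $t = \frac{2.10 - 0}{4.717/\sqrt{40}} \approx 2.816$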
33 5. Determine the Probability-value (Critical Value)
We use the right tail because the alternative is one-tailed, of the “greater than” variety. The probability beyond the observed t ≈ 2.816 in the right tail of the t distribution with n - 1 = 39 degrees of freedom is approximately 0.004.
This probability, 0.004, is the p-value for the test. It indicates that these sample results would be very unlikely if the null hypothesis were true.
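The t statistic and its one-tailed p-value can be reproduced from the summary statistics alone. The sketch below is not from the deck: since the Python standard library has no t-distribution CDF, it integrates the Student t density numerically with Simpson's rule, which is an assumption of convenience (a statistics package would normally be used). `t_pdf` and `t_sf` are hypothetical helper names.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_sf(t, df, steps=2000):
    """Right-tail probability P(T > t), via Simpson's rule on [0, t] (steps must be even)."""
    if t < 0:
        return 1 - t_sf(-t, df, steps)  # symmetry of the t distribution
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 - total * h / 3  # P(T > t) = 0.5 - P(0 < T < t)


# Summary statistics from the pizza example
n, mean, sd, mu0 = 40, 2.10, 4.717, 0.0
t = (mean - mu0) / (sd / math.sqrt(n))
p_one = t_sf(t, n - 1)           # one-tailed ("greater than") p-value
print(round(t, 3))               # 2.816
print(p_one, 2 * p_one)          # one-tailed and two-tailed p-values
```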
34 7. Reject or do not reject H0
6. Compare with the level of significance (.05) and determine whether the test statistic falls in the rejection region.
[Figure: t distribution with 39 d.f.; “Do not reject H0” region of 1 - α between -2.023 and 2.023, “Reject H0” regions of α/2 in each tail; observed t = 2.816]
7. Reject or do not reject H0: since the statistic falls in the rejection region, we reject H0 and conclude that the mean perceived difference between the pizzas is significantly different from zero.
35 8. Conclusion
Should the manager switch to the new-style pizza on the basis of these sample results?
We would probably recommend “yes”. There is no indication that the new-style pizza costs any more to make than the old-style pizza, and the sample evidence is fairly convincing that customers, on average, will prefer the new-style pizza.
Therefore, unless there are reasons for not switching (for example, costs), we recommend the switch.
36 Example 3
Suppose you are the brand manager for Tylenol, and a recent TV ad tells consumers that Advil is more effective (quicker) at treating headaches than Tylenol.
An independent random sample of 400 people with a headache is given Advil, and 260 people report that they feel better within an hour. In another independent sample of 400 people who took Tylenol, 252 reported feeling better.
Is the TV ad correct? In other words, is there a difference between the proportions of people who feel better in the two populations?
37 Hypothesis Test for Two Independent Samples
Test for mean difference:
Null hypothesis H0: μ1 = μ2
Alternative H1: μ1 ≠ μ2
Under H0, μ1 - μ2 = 0, so the test determines whether there is a difference between the parameters or not.
38 Comparison of Means: Graphically
Are the means equal?
39 2. Select Appropriate Test
In this example we have two independent samples. Other examples:
Populations of users and non-users of a brand differ in their perceptions of the brand.
High-income consumers spend more on the product than low-income consumers.
The proportion of brand-loyal users in Segment I is greater than the proportion in Segment II.
The proportion of households with Internet access in Canada exceeds that in the USA.
The test can be used for examining differences between means and between proportions.
40 2. Select Appropriate Test
The two populations are sampled, and the means and variances are computed from samples of sizes n1 and n2.
The comparison of means of independent samples assumes that the variances are equal. If the population variances are not known, an F-test is conducted to test the equality of the variances of the two populations: $F = s_1^2 / s_2^2$.
If both populations are found to have the same variance, a (pooled) t-statistic is calculated.
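A minimal sketch of the two steps on this slide, using hypothetical scores for two independent groups (the data are invented for illustration). It computes the F ratio of the sample variances, placing the larger variance on top as is common when reading one-tailed F tables, and then the equal-variance (pooled) t statistic with n1 + n2 - 2 degrees of freedom. `pooled_t` is a hypothetical helper name.

```python
import math
import statistics


def pooled_t(sample1, sample2):
    """Equal-variance two-sample t statistic and its degrees of freedom n1 + n2 - 2."""
    n1, n2 = len(sample1), len(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / se
    return t, n1 + n2 - 2


group1 = [2, 4, 6]           # hypothetical scores, sample variance 4
group2 = [1, 3, 5, 7, 9]     # hypothetical scores, sample variance 10
f_ratio = statistics.variance(group2) / statistics.variance(group1)  # larger variance on top
t, df = pooled_t(group1, group2)
print(f_ratio, round(t, 3), df)  # 2.5 -0.484 6
```

The F ratio would be compared with an F table (here with 4 and 2 d.f.) before relying on the pooled t.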
42 Tylenol vs Advil
We need to test whether the difference is zero or not.
H0: π_A - π_T = 0; H1: π_A - π_T ≠ 0, where π denotes the population proportion who feel better within an hour.
p_A = 260/400 = 0.65
p_T = 252/400 = 0.63
$z = \frac{p_A - p_T}{\sqrt{\frac{(.65)(.35)}{400} + \frac{(.63)(.37)}{400}}} = \frac{0.02}{0.0339} \approx 0.59$
For large samples the t-distribution approaches the normal distribution, so the t-test and the z-test are equivalent.
43 Differences Between Groups when Comparing Means
Ratio-scaled dependent variables:
t-test: when groups are small, or when the population standard deviation is unknown
z-test: when groups are large
44 Degrees of Freedom
d.f. = n - k, where n = n1 + n2 and k = number of groups.
For two groups, the degrees of freedom are (n1 + n2 - 2).
45 Tylenol vs Advil
α = 0.10, two-tailed: the critical values of N(0,1) are ±1.645, with α/2 in each tail.
Since 0.59 is less than the critical value of 1.645, we accept (do not reject) the null hypothesis: the samples show no significant difference between Advil and Tylenol users.
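A minimal sketch of the two-proportion z test for the Tylenol vs Advil example, using the unpooled standard error shown on the slide and the standard normal distribution via `math.erfc`. The helper names `two_proportion_z` and `two_tailed_p` are illustrative, not from the deck.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z for p1 - p2 with the unpooled standard error sqrt(p1 q1 / n1 + p2 q2 / n2)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se


def two_tailed_p(z):
    """Two-tailed p-value from the standard normal: 2 * P(Z > |z|)."""
    return math.erfc(abs(z) / math.sqrt(2))


z = two_proportion_z(260, 400, 252, 400)  # Advil vs Tylenol
print(round(z, 2), abs(z) > 1.645)        # 0.59 False
print(two_tailed_p(z))
```

A p-value well above α = 0.10 leads to the same decision as comparing z with ±1.645.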
46 Test for Means Difference on Paired Samples
What is a paired sample? When two sets of observations relate to the same respondents. Examples:
Measuring brand recall before and after an ad campaign
Shoppers consider brand name to be more important than price
Households spend more money on pizza than on hamburgers
The proportion of a bank’s customers who have a checking account exceeds the proportion who have a savings account
Since the same population is being sampled, the observations are not independent.
47 Test for Means Difference on Paired Samples
Since the two samples are not independent, we use the differences as a random sample:
$d_i = x_{1i} - x_{2i}, \quad i = 1, 2, \dots, n$
The appropriate test is a paired t-test.
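A minimal sketch of the paired t-test: it forms the differences d_i and runs a one-sample t test on them against zero, with n - 1 degrees of freedom. The golf-club ratings below are hypothetical values invented for illustration, and `paired_t` is a hypothetical helper name.

```python
import math
import statistics


def paired_t(x1, x2):
    """Paired t statistic: one-sample t on d_i = x1_i - x2_i, with n - 1 d.f."""
    d = [a - b for a, b in zip(x1, x2)]
    n = len(d)
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1


price = [5, 4, 4, 3, 5]   # hypothetical importance ratings of price (1-5)
brand = [3, 4, 2, 3, 4]   # hypothetical importance ratings of brand, same respondents
t, df = paired_t(price, brand)
print(round(t, 3), df)    # 2.236 4
```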
48 Example
Q1. When purchasing golf clubs, rate the importance (1-5) of price.
Q2. When purchasing golf clubs, rate the importance (1-5) of brand.
H0: There is no difference in importance between brand and price.
H1 (one-tailed): Price is more important than brand.
H1 (two-tailed): There is a difference in importance between brand and price.
49 What is an ANOVA?
ANOVA stands for Analysis of Variance; a one-way ANOVA involves a single factor.
Purpose: extends the test for mean difference between two independent samples to multiple samples; employed to analyze the effects of manipulations (independent variables) on a random variable (dependent variable).
50 Definitions
Dependent variable: the variable we are trying to explain, also known as the response variable (Y).
Independent variable: also known as an explanatory variable or factor (X).
Research normally involves determining whether the independent variable has an effect on the variability of the dependent variable.
51 What does ANOVA test?
The null hypothesis states that the population means of all the independent samples are equal:
H0: μ1 = μ2 = μ3 = … = μc
H1: not all of the means are equal (at least one mean differs from the others).
52 Comparing Antacids
The maker of Acid-off, an antacid stomach remedy, wants to know which type of ad results in the most positive brand attitude among consumers.
Non-comparative ad: “Acid-off provides fast relief”
Explicit comparative ad: “Acid-off provides faster relief than Tums”
Non-explicit comparative ad: “Acid-off provides the fastest relief”
Three groups of people are each exposed to one type of ad and asked to rate their attitude toward the brand.
53 Comparing Antacids
[Figure: mean brand attitude by type of ad: non-comparative, explicit comparative, non-explicit comparative]
54 The dependent variable is called the response variable; in this case it is brand attitude. The independent variables are called factors; in this case, type of ad. The different levels of a factor are called treatments; here the treatments are the three types of ads: non-comparative, explicit comparative, and non-explicit comparative.
There will be two sources of variation:
Variation within the treatments (e.g., within the non-comparative ad group)
Variation between the treatments (i.e., between the three types of ads)
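55 Decomposition of the Total Variation
The observations Y are arranged by category of the independent variable X (X1, X2, X3, …, Xc), with n observations per category, category means $\bar{Y}_1, \dots, \bar{Y}_c$, and grand mean $\bar{Y}$ over the total sample.
Total variation = between-category variation + within-category variation:
$SS_y = SS_{between} + SS_{within}$
$SS_y = \sum_{j=1}^{c}\sum_{i=1}^{n}(Y_{ij} - \bar{Y})^2, \quad SS_{between} = \sum_{j=1}^{c} n(\bar{Y}_j - \bar{Y})^2, \quad SS_{within} = \sum_{j=1}^{c}\sum_{i=1}^{n}(Y_{ij} - \bar{Y}_j)^2$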
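56 ANOVA Test
The null hypothesis is tested with the F distribution: $F = \frac{SS_{between}/(c-1)}{SS_{within}/(N-c)} \sim F_{c-1,\,N-c}$, and H0 is rejected when F falls in the right-tail rejection region.
Degrees of freedom: c - 1 and N - c, where c = number of groups and N = total number of observations (N = cn when each group has n observations).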
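A minimal sketch of the one-way ANOVA decomposition and F ratio, assuming three groups of hypothetical brand-attitude scores (invented, not the deck's data). `one_way_anova` is a hypothetical helper name; it allows unequal group sizes by weighting each category mean by its own group size.

```python
def one_way_anova(groups):
    """Return SS_between, SS_within, F, and its (c - 1, N - c) degrees of freedom."""
    all_values = [y for g in groups for y in g]
    N, c = len(all_values), len(groups)
    grand_mean = sum(all_values) / N
    # Between-category variation: category means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-category variation: observations around their own category mean
    ss_within = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    df_between, df_within = c - 1, N - c
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f, (df_between, df_within)


ads = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical scores for the three ad types
print(one_way_anova(ads))  # (54.0, 6.0, 27.0, (2, 6))
```

Here SS_y = 60 splits into 54 between and 6 within, so F = 27 would be compared with the F distribution with 2 and 6 d.f.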
57 One-way ANOVA investigates main effects: a factor has an across-the-board effect, e.g., type of ad, age, or involvement.
58 A two-way ANOVA investigates interactions: the effect of one factor depends on another factor, e.g., larger advertising effects for those with no experience, or the importance of price depending on income level and involvement with the product.