School of Computing FACULTY OF ENGINEERING MJ11 (COMP1640) Modelling, Analysis & Algorithm Design Vania Dimitrova Lecture 19 Statistical Data Analysis: Hypothesis Checking November 2011
In the previous lecture Introduction to Statistical Modelling Types of data Sampling methods Descriptive statistics Normal distribution Correlation between variables Correlation coefficient r Positive, Negative, or No correlation
In this lecture Hypothesis testing - introduction Steps Significance How to use significance tables Differences between two groups t-test Mann-Whitney U-test
Sample Subset of the population; some of the measurements of the characteristics of the population. population sample Measures describing population characteristics PARAMETERS Measures describing sample characteristics STATISTICS Statistics estimate the parameters
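To illustrate how a statistic estimates a parameter, here is a minimal sketch. The population values, the sample size of 30 and the random seed are made up for illustration and do not come from the lecture.

```python
import random
import statistics

# Hypothetical population: 1000 made-up measurements (not lecture data).
population = list(range(1, 1001))

# PARAMETER: a measure describing the whole population.
population_mean = statistics.mean(population)

# Draw a simple random sample; the seed only makes the run repeatable.
rng = random.Random(0)
sample = rng.sample(population, 30)

# STATISTIC: the same measure computed on the sample only.
sample_mean = statistics.mean(sample)

print("population mean (parameter):", population_mean)
print("sample mean (statistic):   ", sample_mean)
```

The sample mean will usually differ somewhat from the population mean; that gap is the sampling error that hypothesis testing has to account for.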
Hypothesis Statement about: the value of a population parameter (e.g. its mean), or about a particular relationship that may hold between variables (e.g. correlation), or about a difference between two populations. Null hypothesis H0: (no effect) We aim to reject it, which will support its alternative. Alternative hypothesis H1: (positive effect) We test H0 against H1; if H0 is rejected, H1 is accepted.
Research and Null Hypotheses Operationalise – find parameter to compare (e.g. mean) H1 (research hypothesis) - H0 (null hypothesis) - H1 can be: Two-directional (2-tailed) One-directional (1-tailed)
Example Hypotheses Consider the example with the European Area current accounts. Notice the difference between the means. Formulate a hypothesis based on your observation. population sample Hypotheses:
Error Sample Population Use the sample to test a hypothesis about the population. Can we be sure that the result observed in the sample is valid for the whole population, or could this be pure chance? Could it happen that we erroneously reject H0?
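Statistical significance p-value The probability, assuming H0 is valid, of obtaining a result at least as extreme as the one observed. It measures the risk of error in accepting the observed result as valid (i.e. the risk of rejecting H0 when H0 is valid). We want p to be as small as possible: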
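One way to make the "pure chance" question concrete is an exact permutation test. This is not the method used in the lecture (which relies on the t distribution and significance tables); it is swapped in here only because it shows a p-value being computed from first principles. The two groups below are hypothetical data.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """One-tailed exact permutation p-value for mean(a) - mean(b).

    Under H0 the group labels are interchangeable, so every way of
    splitting the pooled values into two groups is equally likely.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = mean(group_a) - mean(group_b)

    as_extreme = 0
    total = 0
    # Enumerate every relabelling by choosing which indices form group A.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if mean(a) - mean(b) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / total

# Hypothetical measurements for two small groups.
group_a = [12, 15, 14, 16]
group_b = [9, 10, 11, 13]
p = permutation_p_value(group_a, group_b)
print("p-value:", p)
```

A small p means that very few relabellings produce a difference as large as the observed one, so chance alone is an unlikely explanation.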
Hypothesis Testing Procedure Formulate Alternative & Null Hypotheses Decide which sampling method to use Decide what statistical method to use Define Significance level and sample size Find the sampling distribution under the assumption that H0 holds and obtain the p-value Derive conclusions about H1
What statistical test to use? Parametric (powerful tests) Assumptions: Dependent variable is continuous; Random sample; Independence of observations (tricky! – e.g. people working in groups, influences on behaviour); Normal distribution; Homogeneity of variance (samples are from populations with equal variance). Non-Parametric (limited power tests) Assumptions: At least one of the above is violated.
Statistical tests to compare groups Two groups Multiple Groups t-tests Analysis of variance Paired samples Independent samples the same sample several different samples 1 independent var. 2 independent var. m-independent var.
Paired t-test Suppose that a new functionality is introduced to an online banking system. We want to test whether it improves customer satisfaction. Suppose parametric assumptions are met. H1 (research hypothesis) - H0 (null hypothesis) -
Paired t-test calculations Old New d (difference) Run the calculations automatically
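The paired calculation can be sketched as follows. The satisfaction scores are hypothetical (the lecture's own data are not in the transcript), and the critical value is the standard t-table entry for 7 degrees of freedom at a one-tailed 0.05 level.

```python
import math
import statistics

def paired_t(old, new):
    """Return (t, degrees of freedom) for a paired t-test on new - old."""
    if len(old) != len(new):
        raise ValueError("paired samples must have the same length")
    # d: per-customer difference between the two measurements.
    d = [n - o for o, n in zip(old, new)]
    n = len(d)
    mean_d = statistics.mean(d)
    sd_d = statistics.stdev(d)          # sample standard deviation of d
    t = mean_d / (sd_d / math.sqrt(n))  # mean difference over its standard error
    return t, n - 1

# Hypothetical satisfaction scores for the same 8 customers.
old = [6, 5, 7, 4, 6, 5, 7, 6]
new = [7, 7, 8, 5, 6, 7, 8, 7]

t, df = paired_t(old, new)
T_CRITICAL = 1.895  # t table: df = 7, one-tailed, significance level 0.05
print(f"t = {t:.3f}, df = {df}")
print("reject H0" if t > T_CRITICAL else "cannot reject H0")
```

The one-tailed comparison matches a one-directional H1 (the new functionality improves satisfaction); a two-directional H1 would need the two-tailed table entry instead.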
Independent (unpaired) t-test Consider A-level students applying to a university. A random sample is obtained including students applying for BA and BSc. The A-level score for each student is calculated. A difference in the means is observed. Test whether the difference is valid for the whole population. Suppose parametric assumptions are met. H1 (research hypothesis) - H0 (null hypothesis) -
Independent (unpaired) t-test calculations Data in unpaired-t-test-example.xls Run the calculations automatically Calculations based on a weighted average of the two sample variances and
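A minimal sketch of the unpaired calculation, assuming the usual pooled-variance form in which each sample variance is weighted by its degrees of freedom. The scores below are hypothetical and are not the contents of unpaired-t-test-example.xls; the critical value is the standard t-table entry for 9 degrees of freedom at a two-tailed 0.05 level.

```python
import math
import statistics

def pooled_variance(a, b):
    """Weighted average of the two sample variances (weights: n - 1)."""
    n_a, n_b = len(a), len(b)
    var_a = statistics.variance(a)
    var_b = statistics.variance(b)
    return ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)

def independent_t(a, b):
    """Return (t, degrees of freedom) for an equal-variance unpaired t-test."""
    n_a, n_b = len(a), len(b)
    sp2 = pooled_variance(a, b)
    standard_error = math.sqrt(sp2 * (1 / n_a + 1 / n_b))
    t = (statistics.mean(a) - statistics.mean(b)) / standard_error
    return t, n_a + n_b - 2

# Hypothetical A-level scores for two independent groups of applicants.
ba = [300, 320, 280, 340, 310]
bsc = [330, 350, 360, 340, 320, 370]

t, df = independent_t(ba, bsc)
T_CRITICAL = 2.262  # t table: df = 9, two-tailed, significance level 0.05
print(f"t = {t:.3f}, df = {df}")
print("reject H0" if abs(t) > T_CRITICAL else "cannot reject H0")
```

Pooling is only appropriate when the homogeneity-of-variance assumption holds; otherwise a different test should be considered.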
Non-parametric equivalent to t-test Mann-Whitney U-test Relaxed assumptions (discrete data) Ranking (ordering) Compares medians Counts how many elements from each sample are before the elements from the other sample Less power but can be helpful Need tables for non-parametric tests
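The counting idea can be sketched directly. The data are hypothetical, ties are counted as one half (a common convention, assumed here), and the helper name `count_before` is made up for illustration. The resulting U would then be compared with a Mann-Whitney table for the two sample sizes.

```python
def count_before(first, second):
    """Count pairs (x, y) with x from `first` ordered before y from `second`.

    Ties contribute 0.5 each (an assumed convention).
    """
    total = 0.0
    for x in first:
        for y in second:
            if x < y:
                total += 1.0
            elif x == y:
                total += 0.5
    return total

def mann_whitney_u(a, b):
    """Return (U, U_a, U_b), where U is the smaller of the two counts."""
    u_a = count_before(a, b)  # times an element of a precedes one of b
    u_b = count_before(b, a)  # times an element of b precedes one of a
    return min(u_a, u_b), u_a, u_b

# Hypothetical ordinal scores for two independent groups.
group_a = [3, 5, 8, 9]
group_b = [4, 10, 12, 15, 16]

u, u_a, u_b = mann_whitney_u(group_a, group_b)
print(f"U_a = {u_a}, U_b = {u_b}, U = {u}")
```

The two counts always add up to the number of pairs (here 4 x 5 = 20), which is a handy check on a hand calculation.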
Practical task For each of the claims below identify an appropriate sampling method and statistical test that can be used to test the claim assuming that appropriate data is collected and the parametric conditions are satisfied. Formulate the research and null hypotheses. TV advertisement is more powerful than Radio advertisement. Women are more likely to buy computer games than men. Students use social computing sites more often than staff.
Summary: Hypothesis checking Hypothesis – null & alternative Comparing two groups t-test (parametric) Mann-Whitney (non-parametric) Significance level & degrees of freedom min expected value to reject H0 Calculation using specific software Online (see links on the slides) Software package, e.g. SPSS or Matlab Decision rejecting H0? Statistical significance?