ITEC6310 Research Methods in Information Technology
Instructor: Prof. Z. Yang
Course Website: http://people.math.yorku.ca/~zyang/itec6310.htm
Office: Tel 3049

2 Measures of Association
Used when you want to determine the direction and degree of relationship between variables
Various measures of association are available for different applications

3 The Pearson Product–Moment Correlation (r)
Most widely used measure of association
Value of r can range from +1 through 0 to –1
Magnitude of r tells you the degree of LINEAR relationship between variables
Sign of r tells you the direction (positive or negative) of the relationship between variables
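A minimal sketch of how r can be computed from raw scores, using the deviation-score form (sum of cross-products divided by the square root of the product of the two sums of squares). The function name `pearson_r` and both data sets are hypothetical, made up for illustration. The second data set is perfectly U-shaped, to show that r only reflects the linear part of a relationship.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and the two sums of squares.
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable has no variability")
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data: hours studied and exam score for five students.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
r_linear = pearson_r(hours, score)   # strong positive linear relationship

# A perfect but curved (U-shaped) relationship: y = x ** 2.
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]
r_curved = pearson_r(x, y)           # r captures only the LINEAR component

print(f"r (linear data) = {r_linear:.3f}")
print(f"r (curved data) = {r_curved:.3f}")
```

The curved data are perfectly predictable from x, yet r comes out at zero, which is why a scatterplot should be inspected before r is interpreted.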

4 Scatterplots Showing a Positive (a), Negative (b) and No Correlation (c)

5 The Pearson Product–Moment Correlation (r)
Presence of outliers affects the sign and magnitude of r
Variability of scores within a distribution affects the value of r
Used when scores are normally distributed
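The two cautions about outliers and variability can be illustrated with a small sketch. All numbers below are invented for the demonstration, and `pearson_r` is a hypothetical helper, not part of the course material. The first comparison adds one extreme point to nearly unrelated data; the second restricts the range of X in otherwise strongly related data.

```python
import math

def pearson_r(x, y):
    """Pearson r via deviation scores (hypothetical helper for this demo)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# 1) Outlier: five nearly unrelated points, then the same points plus (20, 20).
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 3]
r_without_outlier = pearson_r(x, y)
r_with_outlier = pearson_r(x + [20], y + [20])

# 2) Variability: y follows x with alternating +1 / -1 error.
full_x = list(range(1, 11))
full_y = [xi + (1 if i % 2 == 0 else -1) for i, xi in enumerate(full_x)]
r_full_range = pearson_r(full_x, full_y)

# Keep only the middle of the X range (X from 4 to 7).
kept = [(xi, yi) for xi, yi in zip(full_x, full_y) if 4 <= xi <= 7]
r_restricted = pearson_r([p[0] for p in kept], [p[1] for p in kept])

print(f"without outlier: {r_without_outlier:.2f}, with outlier: {r_with_outlier:.2f}")
print(f"full range: {r_full_range:.2f}, restricted range: {r_restricted:.2f}")
```

In this made-up example a single extreme case turns a weak correlation into a very strong one, and cutting down the spread of X lowers r even though the underlying relationship is unchanged.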

6 Measures of Association
Pearson Product-Moment Correlation
– Index of linear relationship between two continuously measured variables
Point-Biserial Correlation
– Index of correlation between two variables, one of which is measured on a nominal (dichotomous) scale and the other on at least an interval scale
Spearman Rank-Order Correlation (rho)
– Index of correlation between two variables measured along an ordinal scale or greater
Phi Coefficient
– Index of correlation between two variables measured along a nominal (dichotomous) scale
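As a rough sketch, the other three coefficients can all be related back to Pearson's r: the point-biserial is r computed with the dichotomous variable coded 0/1, Spearman's rho is r computed on ranks (tied scores share the average rank), and phi has a shortcut formula for a 2 x 2 table of counts. The helper names and all data below are hypothetical.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

def ranks(values):
    """Rank from 1 to n; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1      # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman rank-order correlation: Pearson r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))

def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table of counts laid out as [[a, b], [c, d]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Point-biserial: group membership coded 0/1 against an interval-scale score.
group = [0, 0, 0, 1, 1, 1]
score = [2, 3, 4, 5, 6, 7]
r_pb = pearson_r(group, score)

# Spearman: monotonic but nonlinear data still give rho = 1.
rho = spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])

# Phi: two dichotomous variables summarized as counts.
phi = phi_coefficient(30, 10, 10, 30)

print(f"point-biserial = {r_pb:.3f}, rho = {rho:.3f}, phi = {phi:.3f}")
```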

7 Example
In a recent study, researchers were interested in determining the relationship between gender and amount of time spent studying for a group of students. Which correlation coefficient should be used to assess this relationship?
A researcher wants to correlate class rank with SAT scores for a group of 50 individuals. Which correlation coefficient would (s)he use?

8 Linear Regression and Prediction
Used to find the straight line that best fits the data plotted on a scatterplot
The best-fitting straight line is known as the least squares regression line
The regression line is defined mathematically: Ŷ = a + bX, where a is the Y-intercept and b is the slope (regression weight)
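A minimal sketch of fitting the least squares line, assuming the conventional form Ŷ = a + bX with slope b = SP / SS_X (sum of cross-products over the sum of squares of X) and intercept a = mean(Y) - b * mean(X). The function name `fit_line` and the data are hypothetical.

```python
def fit_line(x, y):
    """Return (a, b) for the least squares line Y-hat = a + b * X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    b = sp / ss_x                # slope (regression weight)
    a = mean_y - b * mean_x      # intercept: the line passes through the means
    return a, b

# Hypothetical scores on X and Y for five participants.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
print(f"Y-hat = {a:.2f} + {b:.2f} * X")
```

"Least squares" means that no other straight line gives a smaller sum of squared vertical distances between the observed and predicted Y values for these data.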

9 Linear Regression and Prediction
The regression weight (b) is based on raw scores and is difficult to interpret
Both independent and dependent variables must be measured on an interval or ratio scale
You can predict a value of Y from a value of X once the regression equation has been calculated
– The difference between a predicted and an observed value of Y is the error of prediction (residual); the standard error of estimate summarizes the typical size of these errors
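A sketch of prediction and of the standard error of estimate, under two stated assumptions: the line has the form Ŷ = a + bX, and the standard error of estimate is computed as the square root of the sum of squared residuals divided by n - 2 (some texts divide by n instead). Function names and data are hypothetical.

```python
import math

def fit_line(x, y):
    """Least squares intercept a and slope b for Y-hat = a + b * X."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    b = sp / ss_x
    return mean_y - b * mean_x, b

def standard_error_of_estimate(x, y):
    """Typical size of the prediction errors (assumes an n - 2 denominator)."""
    a, b = fit_line(x, y)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]   # observed - predicted
    ss_residual = sum(e ** 2 for e in residuals)
    return math.sqrt(ss_residual / (len(x) - 2))

x = [1, 2, 3, 4, 5]          # hypothetical data
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
predicted_at_6 = a + b * 6   # predict Y for a new X value
see = standard_error_of_estimate(x, y)
print(f"predicted Y at X = 6: {predicted_at_6:.2f}")
print(f"standard error of estimate: {see:.3f}")
```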

10 Example Correlation Matrix

11 Basic Concepts
Inferential statistics
Sampling Distribution
– The distribution of a statistic (such as the mean) across every possible sample of a given size taken from a population
– The critical values of a statistic are derived from the sampling distribution for that statistic
Sampling Error
– The difference between a sample mean and the population mean
– The standard error of the mean is a measure of sampling error
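For a very small population, the sampling distribution of the mean can be listed exhaustively. The sketch below assumes a made-up population of five scores and draws every possible sample of size 2 without replacement; the variable names are illustrative only.

```python
from itertools import combinations
from statistics import mean, pstdev

population = [2, 4, 6, 8, 10]     # hypothetical population of five scores
sample_size = 2
population_mean = mean(population)

# Every possible sample of size 2 (without replacement) and its mean.
sample_means = [mean(s) for s in combinations(population, sample_size)]

# Sampling error of each sample: sample mean minus population mean.
sampling_errors = [m - population_mean for m in sample_means]

# The sampling distribution is centered on the population mean, and its
# standard deviation is the standard error of the mean for this setup.
mean_of_sample_means = mean(sample_means)
standard_error = pstdev(sample_means)

print(sorted(sample_means))
print(f"mean of sample means = {mean_of_sample_means}")
print(f"standard error       = {standard_error:.3f}")
```

Individual samples miss the population mean by as much as 3 points here, but the errors cancel out across the whole sampling distribution.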

12 Basic Concepts
Degrees of Freedom
– The number of scores in a sample with a known mean that are free to vary; defined as n – 1
– Used to find the appropriate tabled critical value of a statistic
Parametric vs. Nonparametric Statistics
– Parametric statistics make assumptions about the nature of an underlying population
– Nonparametric statistics make no assumptions about the nature of an underlying population
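The idea behind n - 1 can be shown with a tiny sketch: once the mean of a sample is fixed, only n - 1 scores can be chosen freely, because the last score is forced. The function name and numbers are hypothetical.

```python
def last_score(free_scores, mean, n):
    """Given n - 1 freely chosen scores and a fixed mean, return the one
    remaining score, which is no longer free to vary."""
    if len(free_scores) != n - 1:
        raise ValueError("expected exactly n - 1 free scores")
    return n * mean - sum(free_scores)

n = 4
known_mean = 6
free = [4, 7, 9]                       # any three values may be chosen
forced = last_score(free, known_mean, n)
df = n - 1

print(f"forced last score = {forced}, degrees of freedom = {df}")
```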

13 Relationship between Population and Samples When a Treatment Had No Effect

14 Relationship between Population and Samples When a Treatment Had an Effect

15 Null and Alternative Hypotheses
Null hypothesis
– Always predicts that there is no difference between the groups being compared
– Typically what the researcher does not expect to find
Alternative or research hypothesis

16 Statistical Errors

17 Statistical Significance
Alpha level
One-tailed test
Two-tailed test
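A sketch of how the alpha level and the number of tails interact, assuming the test statistic is a z score referred to the standard normal distribution. It relies on `statistics.NormalDist` from the Python standard library; the helper names and the observed z of 1.80 are hypothetical, and the one-tailed p-value assumes the effect lies in the predicted direction.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

def z_critical(alpha, tails):
    """Critical z: alpha is placed in one tail or split across two tails."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return std_normal.inv_cdf(1 - alpha / tails)

def p_value(z, tails):
    """Probability of a z at least this extreme if the null hypothesis is true."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    upper_tail = 1 - std_normal.cdf(abs(z))
    return upper_tail * tails

alpha = 0.05
z_observed = 1.80           # hypothetical test result

for tails in (1, 2):
    crit = z_critical(alpha, tails)
    p = p_value(z_observed, tails)
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"{tails}-tailed: critical z = {crit:.3f}, p = {p:.4f}, {decision}")
```

With these numbers the same result is significant under a one-tailed test but not under a two-tailed test, which is why the choice of test should be made before the data are examined.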

18 Example
How do inferential statistics differ from descriptive statistics?
A researcher hypothesizes that the students accepted by the Computer Science department have a higher academic average than the national average. Identify H0 and Ha. Is this a one- or two-tailed test?
A researcher collects data on children's weights from a random sample of children in the South and concludes that children in the South weigh less than the national average. The researcher, however, does not realize that the sample includes many children who are small for their age and that in reality there is no difference in weight between children in the South and the national average. What type of error is the researcher making?
If a researcher decides to use the .10 level rather than the conventional .05 level of significance, what type of error is more likely to be made? Why? If the .01 level is used, what type of error is more likely? Why?

19 Parametric Statistics
Assumptions
– Scores are sampled randomly from the population
– The sampling distribution of the mean is normal
– Within-groups variances are homogeneous
Serious violation of one or more assumptions may bias a statistical analysis

20 The t test and the z test
The t test
– Two-Sample Tests
  t test for independent samples: used when subjects were randomly assigned to your two groups
  t test for correlated samples
The z test
– For the difference between two proportions: used to determine if two proportions differ significantly
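The three tests named above can be sketched from their textbook formulas. This is an illustration under stated assumptions, not the course's own procedure: the independent-samples t uses the pooled-variance form, the correlated-samples t works on difference scores, and the z test pools the two proportions. Function names and all data are hypothetical, and the resulting statistics would still need to be compared against tabled critical values.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sum_of_squares(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def t_independent(group1, group2):
    """Pooled-variance t for two independent samples; returns (t, df)."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = (sum_of_squares(group1) + sum_of_squares(group2)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se, df

def t_correlated(first, second):
    """t for correlated (paired) samples, based on difference scores."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    se = math.sqrt(sum_of_squares(diffs) / (n - 1) / n)
    return mean(diffs) / se, n - 1

def z_two_proportions(x1, n1, x2, n2):
    """z for the difference between two independent proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

t_ind, df_ind = t_independent([5, 6, 7, 8, 9], [2, 3, 4, 5, 6])
t_cor, df_cor = t_correlated([10, 12, 9, 11], [12, 15, 10, 13])
z = z_two_proportions(60, 100, 45, 100)   # 60 of 100 versus 45 of 100

print(f"independent t({df_ind}) = {t_ind:.3f}")
print(f"correlated  t({df_cor}) = {t_cor:.3f}")
print(f"z for two proportions = {z:.3f}")
```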

21 Beyond Two Samples
The Analysis of Variance is used when you have more than two groups in an experiment
The F-ratio is the statistic computed in an Analysis of Variance and is compared to critical values of F
A significant overall F may require further planned or unplanned (post hoc) follow-up analyses
The analysis of variance may be used with unequal sample size (weighted or unweighted means analysis)
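A minimal sketch of the F-ratio for a one-factor between-subjects design: the variability between group means (mean square between) divided by the variability within groups (mean square within). The function name and the three groups of scores are hypothetical. Groups of different sizes are handled in the ordinary way, by weighting each group's squared deviation by its size.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-factor between-subjects design."""
    all_scores = [score for group in groups for score in group]
    grand_mean = sum(all_scores) / len(all_scores)

    ss_between = 0.0
    ss_within = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        # Between: how far each group mean lies from the grand mean.
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        # Within: how far each score lies from its own group mean.
        ss_within += sum((score - group_mean) ** 2 for score in group)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical scores for three treatment groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio, df_b, df_w = one_way_anova(groups)
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")
```

The computed F is then compared with the tabled critical value for the same degrees of freedom; a significant F says only that the group means differ somewhere, not which ones differ.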

22 The One-Factor Between-Subjects ANOVA
Interpreting your F ratio
Planned comparison
Unplanned comparison
Sample size
Unweighted means analysis
Weighted means analysis

23 The One-Factor Within-Subjects ANOVA
The Latin Square ANOVA
Interpreting your F ratio

24 The Two-Factor Between-Subjects ANOVA
Main effects and interactions
Sample size
Interpreting the results
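Main effects and interactions can be read from the cell means of a factorial design. The sketch below is purely descriptive (no significance test) and assumes a hypothetical 2 x 2 design with equal cell sizes, so that marginal means are simple averages of the cell means. Factor labels and values are invented.

```python
# Hypothetical cell means for a 2 x 2 between-subjects design.
# Keys are (level of factor A, level of factor B).
cell_means = {
    ("A1", "B1"): 10.0, ("A1", "B2"): 20.0,
    ("A2", "B1"): 30.0, ("A2", "B2"): 20.0,
}

def marginal_mean(factor_index, level):
    """Average the cell means over the other factor (assumes equal cell sizes)."""
    values = [m for key, m in cell_means.items() if key[factor_index] == level]
    return sum(values) / len(values)

# Main effect: difference between the marginal means of one factor.
main_effect_a = marginal_mean(0, "A2") - marginal_mean(0, "A1")
main_effect_b = marginal_mean(1, "B2") - marginal_mean(1, "B1")

# Interaction: the effect of B changes depending on the level of A.
effect_b_at_a1 = cell_means[("A1", "B2")] - cell_means[("A1", "B1")]
effect_b_at_a2 = cell_means[("A2", "B2")] - cell_means[("A2", "B1")]
interaction = effect_b_at_a1 - effect_b_at_a2

print(f"main effect of A = {main_effect_a}")
print(f"main effect of B = {main_effect_b}")
print(f"effect of B at A1 = {effect_b_at_a1}, at A2 = {effect_b_at_a2}")
print(f"interaction contrast = {interaction}")
```

In this made-up pattern factor B shows no main effect at all, yet it has opposite effects at the two levels of A, which is why an interaction should be examined before main effects are interpreted.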

25 More about Factorial Design
The Two-Factor Within-Subjects ANOVA
Mixed design
Higher-order and special-case ANOVAs
