CSCI 6960 Research Methods, HO 6, © Houman Younessi 2012. Lecture 6: Quantitative Procedures (1)


1 Research variables: Every research project, irrespective of its level of constraint, includes one or several variables that are observed with respect to one or several others. The variables that the researcher manipulates are called the independent variables. The variables whose changes the researcher is interested in, often as a result of manipulating the independent variables, are called the dependent variables.

2 In scientific research, variables must be measured. This means we have to assign a number or level to represent each variable. Each measurement yields a value called a data-point. These data are the basic units for subsequent data analysis and interpretation. The particular statistical analysis chosen will be determined largely by the scale used to collect the data. We have already studied measurement scales and their characteristics.

3 Controlling variables: Measurement is not a straightforward assignment of a number to an observed phenomenon. As we have already learnt, a measure should be:
1. Reliable
2. Valid
3. Range effective
4. Consonant
5. Practical
6. Dimensional

4 For a measure to be reliable and valid, we must ensure that it is repeatable and that it measures what it is supposed to measure. We must therefore control errors in measurement. This is best done by defining, as exactly as possible, the circumstances and procedures used to make the measurements.

5 Operational Definition: An operational definition is a definition of a variable in terms of the procedures used to measure or manipulate it. In other words, an operational definition specifies the activities of the researcher in measuring and/or manipulating a variable. Example: In the course of research on defect management in software, a researcher needed to measure the number of lines of code in a number of code segments. A precise operational definition was required to ensure reliability and validity.
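To make the lines-of-code example concrete, here is a minimal Python sketch of what one operational definition might look like. The rule used (count every line that is neither blank nor a pure `#` comment) and the function name `count_lines_of_code` are assumptions chosen for illustration; the lecture does not say which definition the researcher adopted.

```python
def count_lines_of_code(source: str) -> int:
    """Count lines of code under one hypothetical operational definition:
    a line counts if it is neither blank nor a pure '#' comment line."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


# A small code segment to measure (illustrative only).
segment = """
# compute a total
total = 0

for x in [1, 2, 3]:
    total += x  # a trailing comment does not exclude the line
"""

loc = count_lines_of_code(segment)
print(loc)  # 3
```

Because the rule is written down as a procedure, two people measuring the same segment get the same number, which is the repeatability the slide asks for.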

6 Need for Objectivity: Science stresses objectivity. But why is objectivity superior to subjectivity? One reason is that objectivity is not person-specific. Objective measures are more repeatable, and therefore more reliable, because personal taste, viewpoint and inclination do not enter into the equation. Good quantitative research therefore needs objective measures that anyone properly trained can perform, and that give the same result irrespective of who does the measuring.

7 Statistical Analysis: Statistical analysis provides objective ways of evaluating patterns of events, or patterns in our data, by computing the probability of observing such patterns by chance alone. Insisting on statistical analysis as the basis for drawing conclusions is an extension of the argument that objectivity is critical in science. Without the use of statistics, little can be learnt from most research studies.

8 Types of Statistical Analysis: There are two major types of statistical procedures used in research:
1. Descriptive Statistics
2. Inferential Statistics
Descriptive statistics simplify and organize the data collected. Inferential statistics assist in making inferences about the population of data-points represented and about the relationship between data-point populations. In other words, they help us understand what the data mean in our context.

9 Descriptive Statistics: There are two important groups of descriptive statistics:
1. Frequency counts and frequency distributions
2. Summary statistics
A frequency count is the number of subjects that fall in a given category. The placement of all the subjects in their categories forms a frequency distribution. Summary statistics describe the data with just one or two numbers, to make comparison of groups easier and to provide a basis for later analysis when inferences are made.

10 Frequency counts and distribution; Nominal and ordinal data: For nominal and ordinal data, simple frequencies are computed. Sometimes frequencies are also shown as percentages.
Frequency counts and distribution; Score data: For score data (interval and ratio), simple frequencies or frequency bands may be used. A frequency band, or grouped frequency distribution, is used to make long tables more manageable and to allow working with continuous variables.
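A short Python sketch of both kinds of frequency table follows. All values are invented for illustration (the lecture's own data table is not available here), and the 20-unit band width and the helper name `band` are assumptions.

```python
from collections import Counter

# Ordinal data: simple frequency counts, also shown as percentages.
performance = ["Good", "Fair", "Excellent", "Good", "Poor", "Good", "Fair", "Excellent"]
freq = Counter(performance)
percent = {level: 100 * count / len(performance) for level, count in freq.items()}

# Score data: grouped frequency distribution using fixed-width bands.
incomes = [32, 45, 51, 38, 67, 72, 49, 55]  # hypothetical incomes, in thousands


def band(value, width):
    """Return the half-open band [low, low + width) that contains value."""
    low = (value // width) * width
    return (low, low + width)


banded = Counter(band(x, 20) for x in incomes)

print(freq["Good"], percent["Good"])  # 3 37.5
for (low, high), count in sorted(banded.items()):
    print(f"{low} to under {high}: {count}")
```

Half-open bands are used so that every value, including a continuous one, falls into exactly one band.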

11 Example: In a (not so politically correct) research project, there were 24 subjects. Their age, income, number of years of programming experience in a 3GL, gender (Male, Female) and latest performance evaluation score (Excellent, Good, Fair, Poor) were recorded. The data are presented on the next page:

12 [Data table for the 24 subjects; not preserved in this transcript.]

13 What type (scale) of data does each of these variables generate? Now for some frequencies:
1. Frequency of males and females in our sample: [table not preserved in this transcript]
2. Frequency distribution of income: [table not preserved in this transcript]

14 Summary Statistics: Measures of central tendency: These provide an indication of the center of the distribution, where most of the scores tend to cluster. There are three principal measures of central tendency: Mode, Median, and Mean.
Mode: The most frequently occurring score in a distribution.
Median: The middle score.
Mean: The arithmetic average of the scores in a distribution.
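The three measures can be computed with Python's standard `statistics` module, as in this small sketch. The scores are invented for illustration.

```python
import statistics

scores = [2, 3, 3, 5, 7, 8, 3, 9]  # hypothetical scores

mode = statistics.mode(scores)      # most frequently occurring score
median = statistics.median(scores)  # middle score (average of the two middle scores when n is even)
mean = statistics.mean(scores)      # arithmetic average

print(mode, median, mean)  # 3 4.0 5
```

In this sample the mode is below the median, which is below the mean, so the three measures do not coincide.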

15 Measures of Variability: Variability is the spread in the data. There are four measures of it: Range, Average deviation, Variance and Standard deviation.
Range: The distance from the lowest to the highest score. It is reported either by giving both the lowest and the highest score, or just the difference between them.
Average deviation: The arithmetic average of the absolute distance of each score from the mean.

16 Variance: The average squared distance from the mean. It is computed by summing the squared distances from the mean and dividing by the number of scores minus 1 (called the degrees of freedom). The variance is in squared units of the original scores, whereas the mean is in the original units. To obtain a measure of spread in the original units, we take the square root of the variance. This is called the standard deviation.
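The four measures of variability described above can be sketched in a few lines of Python. The scores are invented for illustration; the variance divides by n - 1, as the slide specifies.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
n = len(scores)
mean = sum(scores) / n

# Range: distance from the lowest to the highest score.
score_range = max(scores) - min(scores)

# Average deviation: mean absolute distance from the mean.
average_deviation = sum(abs(x - mean) for x in scores) / n

# Variance: summed squared distances divided by n - 1 (degrees of freedom).
variance = sum((x - mean) ** 2 for x in scores) / (n - 1)

# Standard deviation: square root of the variance, back in the original units.
standard_deviation = math.sqrt(variance)

print(score_range, average_deviation)  # 7 1.5
print(round(variance, 3), round(standard_deviation, 3))  # 4.571 2.138
```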

17 Measuring relative performance: It is often useful to observe how a subject scored relative to the rest of the sample. To do so, we normalize each score by reporting its deviation from the sample mean divided by the standard deviation. This is usually called the Z score. Note that the Z score is dimensionless, and is positive if the subject scored above the mean and negative if the subject scored below it. The value of Z shows how many standard deviations the score lies from the mean.
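As a sketch of the normalization just described, the following Python computes Z = (score - sample mean) / sample standard deviation for each score. The data are invented for illustration.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
mean = statistics.mean(scores)
s = statistics.stdev(scores)  # sample standard deviation (divides by n - 1)

# One Z score per subject: deviation from the mean in units of s.
z_scores = [(x - mean) / s for x in scores]

for x, z in zip(scores, z_scores):
    print(f"score {x}: Z = {z:+.2f}")
```

Scores above the mean (here 7 and 9) get positive Z values, scores below it get negative ones, and the Z scores themselves have mean 0 and standard deviation 1.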

18 Imagine a nationwide standardized test such as the SATs. We can always calculate the mean and the standard deviation of the scores. Taken together, the students' Z scores span a range of numbers, say from -n to +m (n and m may differ, but usually neither is larger than 6). In a perfect sample, there would be as many people who score higher than the mean as those who score lower. In fact, in such a sample, for any person scoring x above the mean there would be one who scored x below it. In a distribution that has these characteristics, the mode, the median and the mean coincide. In a typical sample, however, there is a tendency for individuals to be "average": a much greater number of individuals score at or near the average than significantly above or below.

19 As we move away from the average, therefore, the number of students with a given score decreases. In closed samples (where there is a minimum and a maximum score), the distribution lies between the minimum and the maximum Z score. In open samples, the distribution is asymptotic. The mathematical abstraction that describes such a curve is called the normal distribution. As mentioned before, a perfect (or bilaterally symmetrical) normal curve is one in which the mean, the median and the mode coincide, and the right-hand and left-hand sides of the curve are identical in shape. A negatively (positively) skewed curve is one in which the median and the mode are larger (smaller) than the mean.

20 Many (in fact infinitely many) other distributions exist. The normal distribution, or one of its skewed forms, is however the one that tends to describe the behavior of populations and samples whose scores range freely and naturally. In other words, the normal distribution is the distribution of chance occurrence, or random, natural variability. Looking at a normal curve, one can see that there are as many scores above the mean as there are below it. So the probability of getting a score above (or below) the mean is p = 0.5. As we move to the right, the probability of getting a higher score diminishes: the probability of scoring more than one standard deviation above the mean (1s) is about 0.16. At 2s it is about 0.023, and at 3s about 0.001.
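The tail probabilities quoted above can be checked with the standard normal distribution in Python's `statistics` module. This is a minimal sketch; the helper name `upper_tail` is an assumption for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def upper_tail(z):
    """Probability of a score more than z standard deviations above the mean."""
    return 1 - std_normal.cdf(z)


for z in (0, 1, 2, 3):
    print(f"{z}s: {upper_tail(z):.4f}")
# 0s: 0.5000
# 1s: 0.1587
# 2s: 0.0228
# 3s: 0.0013
```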

21 So if your SAT score was 3s above the mean, only about one in a thousand of your peers did as well as you did. It is important to recognize that samples are drawn from populations. They represent the population (to a degree), but they are not identical to it unless they contain data on every member of the population (a 1:1 sample). The interesting question then is: given a sample from a given population, to what degree are we confident that the sample represents the population?

22 Confidence Intervals: Given a sample of 1, multiple measures of X with respect to that one sample will provide a wide range of values centered at [symbol not preserved in this transcript]. Given a large sample, say of size n, multiple measures of X with respect to each of the n will provide a narrower range of values centered at [symbol not preserved in this transcript]. Without proof, the measure of this range is [equation not preserved in this transcript], which, for large n, is a good estimate of the variability in the population.

23 Now, we had: [equation not preserved in this transcript], which shows the deviation of a measure within a sample. We now want to see how far a given measure deviates from the mean of the entire population: [equation not preserved in this transcript]. Substituting the sample-derived variability estimate for the population estimate: [equation not preserved in this transcript].

24 Rearranging, we get: [equation not preserved in this transcript]. As the deviation could be in either direction: [equation not preserved in this transcript]. Different values of Z (also called the critical value) correspond to different probabilities that X belongs to the population.
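The equations on slides 22 to 24 did not survive in this transcript. The block below is the standard textbook derivation that matches the surrounding wording. It is offered as a reconstruction under that assumption, and the notation (X for a single score, \bar{X} for the sample mean, s for the sample standard deviation, n for the sample size, \mu for the population mean) may differ from the author's.

```latex
% Assumed symbols: X = a single score, \bar{X} = sample mean,
% s = sample standard deviation, n = sample size, \mu = population mean.
\begin{align*}
Z &= \frac{X - \bar{X}}{s}
  && \text{(deviation of a score within a sample)} \\
s_{\bar{X}} &= \frac{s}{\sqrt{n}}
  && \text{(estimated variability of the sample mean)} \\
Z &= \frac{\bar{X} - \mu}{s / \sqrt{n}}
  && \text{(deviation of the sample mean from the population mean)} \\
\mu &= \bar{X} \pm Z \, \frac{s}{\sqrt{n}}
  && \text{(rearranged, allowing deviation in either direction)}
\end{align*}
```

Read this way, a larger n shrinks s / \sqrt{n}, which is the narrowing of the range described on slide 22.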

25
Confidence Level    90%      95%      99%
Critical Value      1.645    1.960    2.576
So, for example, there will be 90% statistical confidence that a measure that is [expression not preserved in this transcript] from the population mean belongs to the population.
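The critical values in the table can be reproduced from the standard normal distribution, and then used to build an interval around a sample mean. The sketch below assumes the usual large-sample form, mean ± Z · s / sqrt(n). The sample values and the function names `critical_value` and `confidence_interval` are invented for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def critical_value(confidence):
    """Two-sided critical Z value for a confidence level such as 0.90."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def confidence_interval(sample, confidence):
    """Large-sample interval: sample mean +/- Z * s / sqrt(n)."""
    n = len(sample)
    centre = mean(sample)
    half_width = critical_value(confidence) * stdev(sample) / math.sqrt(n)
    return centre - half_width, centre + half_width


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: Z = {critical_value(level):.3f}")
# 90%: Z = 1.645
# 95%: Z = 1.960
# 99%: Z = 2.576

sample = [52, 48, 50, 47, 53, 51, 49, 50, 46, 54]  # hypothetical measurements
low, high = confidence_interval(sample, 0.95)
print(f"95% interval: {low:.2f} to {high:.2f}")
```

A higher confidence level uses a larger critical value and so produces a wider interval around the same sample mean.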

26 Measures of Relationship: When we need to know how one variable relates to another, we need a statistic that is a measure of relationship. This relationship, or association, is best indexed with a correlation coefficient. Correlation is a descriptive statistic that involves at least two variables and provides an index of their relationship. With score data, the Pearson product-moment correlation should be used. With ordered (ordinal) data, the Spearman rank-order correlation is the choice.

27 Pearson product-moment correlation: It can range from -1.00 to +1.00. A correlation of +1.00 means that the two variables are perfectly related in the positive direction (as one variable increases, so does the other, by a predictable amount). A correlation of -1.00 is a perfect negative correlation. In the formula (not preserved in this transcript), X and Y are the two variables and r is the correlation.
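Since the slide's formula is not available, the sketch below uses one common form of the Pearson coefficient: the sum of the products of the deviations of X and Y from their means, divided by the square root of the product of their summed squared deviations. Whether the lecture used this form or an algebraically equivalent one is an assumption, and the data are invented.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation in deviation-score form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


experience = [1, 2, 3, 4, 5]        # hypothetical years of experience
income = [30, 35, 41, 44, 52]       # hypothetical income, in thousands

print(round(pearson_r(experience, income), 3))
```

The function returns +1 for data lying exactly on a rising straight line and -1 for data lying exactly on a falling one.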

28 Important Note: The Pearson product-moment correlation is an index of the degree of linear relationship between two variables. For a non-linear relationship, the Pearson correlation is not only unhelpful but can also be misleading. This is why producing a scatter plot is always an excellent idea.

29 Spearman rank-order correlation: For ordinal scale data, the Spearman rank-order correlation provides a measure of correlation. As with the Pearson correlation, a value of -1.00 is a perfect negative correlation and a value of +1.00 is a perfect positive one. In the formula (not preserved in this transcript), the variable d represents the difference between the ranks of X and Y for each subject.
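A minimal sketch of the rank-difference computation follows. It assumes the usual formula built on d, namely 1 - 6 · Σd² / (n(n² - 1)), which holds when there are no tied ranks. The slide's own formula is not available, and the helper names `ranks` and `spearman_rho` are invented for illustration.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman rank-order correlation via rank differences d (no ties)."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


x = [1, 2, 3, 4, 5]
print(spearman_rho(x, [10, 20, 30, 40, 50]))  # 1.0
print(spearman_rho(x, [50, 40, 30, 20, 10]))  # -1.0
print(spearman_rho(x, [1, 4, 9, 16, 25]))     # 1.0
```

The last call shows that only rank order matters: the relationship is not a straight line, yet the ranks agree perfectly.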

30 Linear regression: Predicting the value of one variable from the value of another is called regression. When the relationship is assumed to be linear, as it is when the Pearson correlation is used, we have linear regression. The line of best fit on a scatter plot represents this concept. The line predicts Y through the linear equation Y = bX + a, where the formulas for b and a are [not preserved in this transcript].
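The coefficients of Y = bX + a are usually obtained by least squares. The sketch below assumes that method, since the slide's own formula is not available: b is the sum of the products of deviations divided by the summed squared deviations of X, and a is the mean of Y minus b times the mean of X. The data and the function name `fit_line` are invented.

```python
def fit_line(xs, ys):
    """Least-squares slope b and intercept a for the line Y = b*X + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    b = sum_xy / sum_xx
    a = mean_y - b * mean_x
    return b, a


xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # lies exactly on Y = 2X + 1

b, a = fit_line(xs, ys)
print(b, a)       # 2.0 1.0
print(b * 5 + a)  # predicted Y for X = 5: 11.0
```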

31 [Four scatter plots, not preserved in this transcript: perfect positive correlation, r = 1.00; perfect negative correlation, r = -1.00; no correlation, r = 0.00; strong positive correlation, r = 0.96.]

32 But note: [scatter plot not preserved in this transcript] a very weak correlation, r = 0.13, can accompany a very strong non-linear relationship.
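The point can be reproduced with a small constructed example. The data below are not the slide's (its plot, with r = 0.13, is not available). Here Y is exactly X squared over a symmetric range of X, so Y is completely determined by X, yet the Pearson correlation comes out as zero.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation in deviation-score form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # a perfect, but non-linear, relationship

r = pearson_r(xs, ys)
print(r)  # 0.0
```

A scatter plot of these points would show the parabola immediately, which is why plotting the data before trusting r is worthwhile.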

