1 Multiple-choice example
2 Answer No, it's zero. A is wrong. No, the standard deviation is 1. B is wrong. Yes, it is zero. C is correct. We have our answer – it's C.
3 Second example
4 Answer No, there is only one standard normal distribution, to which any other normal distribution can be transformed. Yes, just one. No, it is bell-shaped and symmetrical. We have our answer: it's B.
5 SPSS exercise 1 Open the SPSS data file containing 4000 IQs. Use the Compute procedure (in the Transform menu) to standardise the scores. Use Descriptives to obtain the mean and standard deviation of z.
6 What you should find You should find that when you STANDARDISE the IQs by subtracting the mean from each score and dividing the difference by the standard deviation, the mean of the new distribution is zero and the SD is 1.
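For readers without SPSS, the same check can be sketched in Python. The scores below are simulated stand-ins, not the 4000 IQs in the lecture's data file, and the variable names are illustrative only. The sketch uses the sample standard deviation (n - 1 denominator).

```python
import random
import statistics

# Hypothetical stand-in for the 4000 IQs in the SPSS file (simulated, not the lecture's data).
random.seed(1)
iqs = [random.gauss(100, 15) for _ in range(4000)]

mean_iq = statistics.mean(iqs)
sd_iq = statistics.stdev(iqs)  # sample SD (n - 1 denominator)

# Standardise: subtract the mean from each score, then divide by the SD.
z = [(x - mean_iq) / sd_iq for x in iqs]

z_mean = statistics.mean(z)
z_sd = statistics.stdev(z)
print(round(z_mean, 5), round(z_sd, 5))
```

Whatever the mean and SD of the original scores, the standardised scores should come out with a mean of zero and an SD of 1, apart from rounding error.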
7 SPSS exercise 2 Assuming that height is normally distributed with a mean of 69 inches and an SD of 2.6 inches, what is the probability of a person having a height between 74.2 inches and 76.8 inches? Solve by using the CDF to find the cumulative probabilities directly and subtracting. Transform the heights to z and compare the cumulative probabilities you obtain with those you obtained using the first approach.
8 A data set? Several of you e-mailed me saying that, in order to solve this problem, you needed data on height. In fact, to solve this one, you only need ONE value in your data set – and that can be any number you like! However, for other purposes, I have also put a set of heights on my website.
9 A small data set You only need one number in the data set. Any value will do. In Variable View, I defined one variable, Nine and, in Data View, entered a single value: 9.
10 Find the equivalent z values To transform the values 74.2 and 76.8 into z, subtract the mean (69) from each value of X and divide the difference by the standard deviation (2.6). The z values are 2 and 3, respectively.
11 Cumulative probability The CUMULATIVE PROBABILITY of a particular value is the probability of a value LESS THAN OR EQUAL TO the value. The cumulative probability of a value at the 30th percentile is 0.30. The cumulative probability of a value at the 70th percentile is 0.70.
12 Last week's example [Figure: normal curves showing CumProb(65), CumProb(70) and the probability of a height between 65 and 70.] Subtract the two cumulative probabilities to obtain the probability of a height between 65 and 70.
13 Finding the cumulative probability of 76.8 Adjust the Decimals setting in Variable View to 5; otherwise the probability in Data View will appear as 1.00. We are specifying a normal distribution with mean 69 and SD 2.6.
14 Adjust the decimals in Variable View To see the whole Name, widen the Name column: click on the vertical bar between the Name and Type columns and drag it with the mouse. Adjust the Decimals to 5; otherwise, the probability will appear as 1.00.
15 Finding the cumulative probability of 74.2 Adjust the Decimals setting in Variable View.
16 Probability of a height between 74.2 and 76.8 inches Adjust the Decimals setting to 5.
17 The answer To see the complete variable names, you will have to click and drag the boundaries with the mouse as before. The probability of a height between 74.2 and 76.8 = 0.02
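The first approach (cumulative probabilities found directly from the distribution of heights) can also be sketched outside SPSS. This Python version uses the standard library's `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

# Normal distribution of heights with mean 69 inches and SD 2.6 inches.
heights = NormalDist(mu=69, sigma=2.6)

p_upper = heights.cdf(76.8)  # cumulative probability of 76.8
p_lower = heights.cdf(74.2)  # cumulative probability of 74.2

# Probability of a height between 74.2 and 76.8 inches.
p_between = p_upper - p_lower
print(f"{p_upper:.5f} {p_lower:.5f} {p_between:.5f}")
```

Printing five decimal places mirrors the Decimals setting of 5: with only two decimals, both cumulative probabilities would be hard to tell apart from 1.00.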
18 Using the equivalent values of z Now we shall see whether the same operations with the equivalent z values of 2 and 3 produce the same results.
19 The cumulative probability of z = 3 Specifies the normal distribution with mean = 0 and SD = 1. This is the standard normal distribution.
20 The cumulative probability of z = 2
21 Probability z between 2 and 3
22 The results are exactly the same. As before, adjust the Decimals setting in Variable View to 5 in order to see the probabilities. A question about the probability of a specified range of values of a normal variable X can always be translated into one about the probability of the equivalent range of the standard normal variable z.
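The equivalence of the two approaches can be checked with a short Python sketch, again using `statistics.NormalDist` from the standard library. The names are illustrative only, and the comparison allows for tiny floating-point differences.

```python
import math
from statistics import NormalDist

heights = NormalDist(mu=69, sigma=2.6)  # distribution of heights
standard = NormalDist()                 # standard normal: mean 0, SD 1

# Transform the two heights to z: subtract the mean, divide by the SD.
z_low = (74.2 - 69) / 2.6
z_high = (76.8 - 69) / 2.6

# The same probability, computed from X and from z.
p_from_x = heights.cdf(76.8) - heights.cdf(74.2)
p_from_z = standard.cdf(z_high) - standard.cdf(z_low)

same = math.isclose(p_from_x, p_from_z, abs_tol=1e-9)
print(round(z_low, 6), round(z_high, 6), same)
```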
23 Lecture 8 THE PEARSON CORRELATION
24 Three strategies in psychological research 1. Experimental 2. Correlational 3. Observational
25 1. Experimental research In experimental research, the experimenter MANIPULATES one variable to demonstrate that it has a CAUSAL EFFECT upon another, which is MEASURED or recorded during the course of the experiment. Such manipulation is the hallmark of a true experiment.
26 2. Correlational research In CORRELATIONAL RESEARCH, all variables are measured as they occur in the participants. There is NO MANIPULATED VARIABLE. All variables are properties or characteristics of the people we are studying.
27 Screen violence and actual violence Does watching violent material promote actual violence in real life? Ethical and practical considerations make it difficult to manipulate the amount of violent material that children watch. It is easier to measure children on the amount of screen violence to which they are exposed (or choose to view) and upon their actual violence.
28 Method We devise a scale measuring actual violence. We devise another scale measuring amount of exposure to violent programmes. We measure some children on both Exposure to screen violence and Actual violence.
29 Number of measured variables It is useful to classify data sets according to the number of MEASURED variables they contain. In the data from the Caffeine experiment, there is only one measured variable – Performance. Other kinds of research (particularly correlational studies) involve two or more measured variables.
30 Classification In UNIVARIATE data sets, there is only ONE measured variable. In BIVARIATE data sets, there are TWO measured variables. In MULTIVARIATE data sets, there are THREE OR MORE measured variables.
31 A bivariate data set If we measure people on TWO variables, we shall obtain a BIVARIATE data set. In the Violence study, we have measured children on TWO variables: Exposure (to violence) and Actual violence.
32 Correlation A statistical ASSOCIATION or CORRELATION is a tendency for events or values to occur together. If exposure to screen violence promotes actual violence, we should expect those who watch more violence to be more violent and those who watch less to be less violent. We can expect an ASSOCIATION or CORRELATION between Exposure and Actual violence.
33 A scatterplot Here is a picture of the results of our study. In this SCATTERPLOT, each point represents one of the children. Richard got a score of 2 on Exposure and 4 on Actual. John got 9 on Exposure and 8 on Actual. Jim got scores of 5 on both Exposure and Actual. [Figure: scatterplot with the points for Richard, John and Jim labelled.]
34 The meaning of linear The Violence scatterplot shows evidence of a LINEAR association between Exposure to screen violence and Actual violence. Here "linear" means "of the nature of a straight line".
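35 The equation of a straight line $F = \frac{9}{5}C + 32$, where $F$ is the temperature in degrees Fahrenheit and $C$ is the temperature in degrees Celsius. [Figure: graph of Degrees Fahrenheit against Degrees Celsius, showing the origin (0, 0), the intercept at 32 and two points, P and Q, on the line.]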
36 The slope of the line The COEFFICIENT 9/5 in front of the Celsius variable is the SLOPE of the straight line. When the Celsius temperature increases by FIVE degrees, the Fahrenheit temperature increases by NINE degrees. When the Celsius temperature increases by one degree, the Fahrenheit temperature increases by 1.8 degrees.
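The meaning of the slope can be checked numerically. This sketch uses the standard conversion (slope 9/5, intercept 32); the function name and the starting temperature of 20 degrees are arbitrary choices for illustration.

```python
def celsius_to_fahrenheit(celsius):
    """Straight line with slope 9/5 and intercept 32."""
    return 9 / 5 * celsius + 32

# Increase in Fahrenheit when Celsius rises by five degrees, and by one degree.
rise_for_5 = celsius_to_fahrenheit(25) - celsius_to_fahrenheit(20)
rise_for_1 = celsius_to_fahrenheit(21) - celsius_to_fahrenheit(20)

print(round(rise_for_5, 6), round(rise_for_1, 6))
```

Because the relationship is a straight line, the same increases are obtained from any starting temperature, not just 20 degrees.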
37 Linear equations
38 Linear, but imperfect association If the scatterplot is elliptical in shape, a linear association is indicated. In psychology, all measurement is subject to random error. No association between measured variables is ever perfect. That is why the points do not all lie on a straight line.
39 A negative association … It is sometimes arbitrary whether the numerical scale increases in the direction of the dimension concerned. If the direction of the Exposure scale is reversed, so is the slope of the long axis of the ellipse.
40 Independence Select a large sample at random from a population and place the values in a column. Select another sample from the same population at random and place those values alongside the values of the first sample. The two samples are independent, because the data are not paired in any meaningful sense. There should be no association between the two variables and the scatterplot will reflect this independence.
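The procedure just described can be simulated. In this Python sketch the population (normal, mean 100, SD 15), the sample size of 1000 and the helper `pearson_r` are all assumptions made for illustration; they are not part of the lecture.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation: sum of products over the root of the product of the sums of squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

random.seed(42)

# Two samples drawn independently from the same (hypothetical) population.
first_sample = [random.gauss(100, 15) for _ in range(1000)]
second_sample = [random.gauss(100, 15) for _ in range(1000)]

# Placing the samples side by side pairs the values arbitrarily.
r = pearson_r(first_sample, second_sample)
print(round(r, 3))
```

The correlation should be close to zero, though it will rarely be exactly zero in any one pair of samples.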
41 Scatterplot indicating no association When the cloud of points is circular, there is NO ASSOCIATION between the variables.
42 The Pearson correlation (r) The PEARSON CORRELATION is a measure of a supposed linear association between two variables.
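43 The Pearson correlation $r = \frac{SP}{\sqrt{SS_X \, SS_Y}}$, where $SP = \sum (X - M_X)(Y - M_Y)$ is the sum of products, and $SS_X = \sum (X - M_X)^2$ and $SS_Y = \sum (Y - M_Y)^2$ are the sums of squares.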
44 Explanation The numerator of r is known as a SUM OF PRODUCTS (SP). It is the sum of products that captures the extent to which X and Y are associated, or CO-VARY. The sums of squares in the denominator merely constrain the range of variation of r.
45 The sum of products captures covariation Points in the upper right quadrant have positive deviation products; points in the lower left also have positive deviation products (a minus times a minus is a plus). Points in the other two quadrants have negative products. Since the positive products predominate, we can expect the sum of products, and hence the covariance, to be large and positive. The negative products are small, because those points lie near the intersection of the mean lines. [Figure: scatterplot divided into quadrants by the lines Mean Preference score and Mean Actual Violence score.]
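The deviation products can be worked through by hand or in a few lines of Python. This sketch uses only the three children whose scores were quoted with the scatterplot (Richard, John and Jim). That is not the full Violence data set, so the value of r obtained here is purely illustrative.

```python
import math

# The three scores quoted with the scatterplot: (Exposure, Actual).
children = {"Richard": (2, 4), "John": (9, 8), "Jim": (5, 5)}

exposure = [x for x, _ in children.values()]
actual = [y for _, y in children.values()]

mean_x = sum(exposure) / len(exposure)
mean_y = sum(actual) / len(actual)

# Deviation product for each child: (X - mean of X) * (Y - mean of Y).
products = {name: (x - mean_x) * (y - mean_y) for name, (x, y) in children.items()}

sp = sum(products.values())                   # sum of products (numerator of r)
ss_x = sum((x - mean_x) ** 2 for x in exposure)  # sum of squares for Exposure
ss_y = sum((y - mean_y) ** 2 for y in actual)    # sum of squares for Actual
r = sp / math.sqrt(ss_x * ss_y)

for name, product in products.items():
    print(name, round(product, 3))
print("SP =", round(sp, 3), "r =", round(r, 3))
```

All three children lie in the upper right or lower left quadrant, so every deviation product is positive and the sum of products is positive.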
46 A strong correlation A correlation is evident from the scatterplot. When the shape of a scatterplot is a narrow ellipse like this, a strong correlation is indicated. This correlation is +0.85.
47 A negative association … This scatterplot shows an association as strong as the first one. The value of the Pearson correlation is -0.85. The sign of r indicates only the pattern (positive or negative): it does not express the STRENGTH of the association. A correlation of -0.85 is as strong an association as one of +0.85. A correlation of -0.6 is stronger than one of +0.3.
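That the sign carries direction and not strength can be seen by reversing the direction of one scale. The scores in this sketch are invented for illustration, and the Exposure scale is assumed to run from 0 to 10; `pearson_r` is a small helper written for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: sum of products over the root of the product of the sums of squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical scores, invented for illustration.
exposure = [1, 3, 4, 6, 8]
actual = [2, 3, 5, 6, 9]

# Reverse the direction of the Exposure scale (assumed to run from 0 to 10).
reversed_exposure = [10 - x for x in exposure]

r_original = pearson_r(exposure, actual)
r_reversed = pearson_r(reversed_exposure, actual)
print(round(r_original, 3), round(r_reversed, 3))
```

The two correlations have the same absolute value and opposite signs: the association is equally strong either way.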
48 Perfect linear association If two variables have a PERFECT linear relationship, the graph of one against the other is a straight line. All the points lie along the line. The graph of temperature in degrees Fahrenheit against the equivalent Celsius temperature is a straight line. The Pearson correlation between Celsius and Fahrenheit is +1.
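The Celsius and Fahrenheit case can be verified directly. The list of temperatures in this sketch is arbitrary, and `pearson_r` is a helper written for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: sum of products over the root of the product of the sums of squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Arbitrary Celsius temperatures and their exact Fahrenheit equivalents.
celsius = [-10, 0, 10, 20, 30, 40]
fahrenheit = [9 / 5 * c + 32 for c in celsius]

r = pearson_r(celsius, fahrenheit)
print(round(r, 6))
```

Every point lies exactly on the line, so r equals +1 apart from floating-point rounding.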
49 A warning The Pearson correlation r is a measure of a SUPPOSED linear association between two variables. A Pearson correlation, however, cannot be taken at its face value: the assumption of linearity of association must be verified. This is done by ROUTINELY examining the scatterplot.
50 Always explore your data Examine your data to make sure your statistics make sense. SPSS can easily calculate a correlation and report a p-value in a test of significance. With small samples, OUTLIERS and INAPPROPRIATE DISTRIBUTIONS can create problems for a statistical analysis. This is as true for a correlation as it is for the results of a t-test.
51 Draw pictures of the data A GRAPH can be of great assistance in getting to know your data set. A glance at a SCATTERPLOT will show immediately whether a Pearson correlation is an appropriate measure of the strength of association between two variables.
52 What to look for in a scatterplot The cloud of points should either be ELLIPTICAL or CIRCULAR. An ellipse indicates a linear relationship; a circular cloud indicates INDEPENDENCE. In either case, the Pearson correlation tells the true story.
53 Question We have been told of a bivariate data set, from which the calculated Pearson correlation is ZERO: r = 0. From this information alone, can we conclude that the two variables are independent, that is, there is no association between them?
54 The scatterplot
55 A perfect (but non-linear) association
56 Perfect association, zero correlation The Pearson correlation is a measure of a supposed LINEAR relationship. Here we have a perfect, but NON-LINEAR association. The correlation will be zero, suggesting that there is no relationship.
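A small example makes the point concrete. The data in this sketch are an assumption chosen for illustration (they are not the data plotted in the lecture): Y is exactly the square of X, with the X values placed symmetrically about zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: sum of products over the root of the product of the sums of squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Y is completely determined by X, but the relationship is a curve, not a line.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

r = pearson_r(x, y)
print(r)
```

The positive deviation products on one side of the curve cancel the negative products on the other, so the sum of products, and therefore r, is zero even though Y can be predicted perfectly from X.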
57 Anscombe's data set Many years ago, Francis Anscombe (American Statistician, 1973) published a famous paper warning readers of the pitfalls awaiting the unwary user of information about correlations. There were four bivariate data sets, all of which produced a Pearson correlation with a value of +.82.
58 An elliptical scatterplot This is fine. The elliptical scatterplot indicates that there is indeed a basically linear relationship between variable Y1 and variable X1.
59 A non-linear relationship There is indeed a perfect association between variable Y2 and variable X1. This relationship, however, is non-linear and is understated by the value of r.
60 Another understatement by r There is a substantial correlation. The scatterplot, however, is not elliptical. Basically there is a perfect linear relationship between Y3 and X1. The outlier (a typo?) has depressed the value of r.
61 No association There is NO association between Z and Y. The high value of r is driven solely by the presence of a single OUTLIER.
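The four correlations can be reproduced from Anscombe's published values. In this sketch the variable names (`x123`, `y1` to `y4`, `x4`) follow the usual presentation of the quartet and may differ from the labels on the slides; `pearson_r` is a helper written for the example. The last step removes the outlier from the third data set to show how far it depresses r.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: sum of products over the root of the product of the sums of squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Anscombe's (1973) four data sets; the first three share the same X values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

correlations = [
    pearson_r(x123, y1),
    pearson_r(x123, y2),
    pearson_r(x123, y3),
    pearson_r(x4, y4),
]
print([round(r, 2) for r in correlations])

# Third data set with its single outlier (X = 13, Y = 12.74) removed.
kept = [(x, y) for x, y in zip(x123, y3) if y != 12.74]
r3_without_outlier = pearson_r([x for x, _ in kept], [y for _, y in kept])
print(round(r3_without_outlier, 4))
```

The four values of r agree to two decimal places, yet only a scatterplot reveals that the four data sets tell quite different stories.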
62 Anscombe's rule When you examine a scatterplot (something you should ALWAYS do when interpreting a correlation), ask yourself the following question: Would the removal of one or two points at random affect the basically elliptical shape of the scatterplot? If the shape would remain essentially the same, the value of r accurately reflects the association between the variables.
63 Summary The Pearson correlation r is a measure of the strength of a supposed linear relationship between 2 variables. It is one of the most widely used of statistical measures; but it is also one of the most misused. Wherever possible, a value of r should be interpreted in the context of the scatterplot.
64 Exercise I have placed the Violence data on my website. Order a scatterplot of these data and obtain the Pearson correlation.