S3: Chapter 5 – Regression and Correlation Dr J Frost Last modified: 13th February 2015

What this chapter is mostly about
[Figure: NVR Score against Avg AS point score. Disclaimer: these values are made up!]
One question that naturally arises amongst teachers at Tiffin is whether the 11+ tests are an effective predictor of academic success later on. Tiffin has recently dropped its Non-Verbal/Verbal 11+ tests in favour of English/Maths tests. It will be interesting to see to what extent the correlation between 11+ scores and later metrics (e.g. average A2 point scores) increases. We could just calculate the PMCC of the two variables, but we might only be interested in comparing the rankings.

Tiffin Data Fun Facts
Fun True Fact: The PMCC between the 11+ ranks of students in the current L6 and their last JMC score ranks is …
Fun True Fact: The PMCC between the NVR ranks and C1 test ranks (when taken in Year 11) is … (for Year 11s in …)
Fun True Fact: The PMCC between Year 7 end-of-year test rank and Year 9 test rank is … (for Year 11s in …)
Fun True Fact: The PMCC between Year 8 end-of-year test rank and Year 9 test rank is … (for Year 11s in …)
Fun True Fact: The PMCC between NVR + Year 9 test rank: …; VR + Year 9 test rank: 0.16 (for Year 9s in …)
NVR% / VR%: All Tiffinians 84%; Oxbridge Tiffs 89% / 85%
Year 7s in 2007

RECAP: Product Moment Correlation Coefficient
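As a reminder, the PMCC is usually written in the form below. The notation ($S_{xx}$, $S_{yy}$, $S_{xy}$) is assumed here to follow the usual formula-booklet convention and may differ from the symbols used in class.

```latex
r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}, \qquad
S_{xy} = \sum xy - \frac{\sum x \sum y}{n}, \quad
S_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n}, \quad
S_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n}
```

The value of $r$ always lies between $-1$ and $1$.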

Spearman’s rank correlation coefficient
However, if we’re simply interested in how the rankings are correlated, we might discard the original data and use the rankings instead.
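One common way to write the coefficient, assuming there are no tied ranks, uses the difference $d$ between the two ranks given to each item. The symbols $r_s$, $d$ and $n$ are the conventional ones and are assumed here.

```latex
r_s = 1 - \frac{6 \sum d^2}{n\left(n^2 - 1\right)}
```

Here $n$ is the number of ranked pairs. With no ties, this should give the same value as calculating the PMCC directly on the ranks.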

Rankings in perfect agreement: r_s = 1.
Ranks in reverse order: r_s = -1.
No correlation in rankings: r_s = 0.
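The two extreme cases can be checked numerically. The following is a minimal sketch, assuming no tied values and using made-up scores; the function names `ranks` and `spearman` are illustrative only.

```python
def ranks(values):
    """Rank values from 1 (largest) to n. Assumes there are no ties."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman(xs, ys):
    """Spearman's rank correlation via 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    rank_x, rank_y = ranks(xs), ranks(ys)
    # d is the difference between the two ranks of each item
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Made-up data, purely for illustration
scores_a = [55, 72, 61, 90, 48]
same_order = [5, 8, 6, 9, 3]        # ranks agree exactly with scores_a
reverse_order = [9, 4, 7, 2, 10]    # ranks are exactly reversed

print(spearman(scores_a, same_order))     # 1.0
print(spearman(scores_a, reverse_order))  # -1.0
```

Only the ordering of the values matters: the raw numbers in `same_order` look nothing like `scores_a`, yet the coefficient is still 1.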

(Not in textbook/exam)

Test Your Understanding
Edexcel S3 June 2011 Q2

(Bro Exam Tip: This can be tested!)
Spearman’s Rank: Makes no assumption about the original data; the relationship need not be linear.
PMCC: We can only do a hypothesis test if the variables are (jointly) normally distributed. (We’ll do hypothesis testing in a sec.)

Exercise 5A

Hypothesis Testing
What would be a suitable null hypothesis when analysing the correlation of two variables?
The null hypothesis in general is that the data is random, i.e. in this case, that there is no linear correlation between the variables.
Now suppose the two variables were each normally distributed.
See Demo > (File Ref: PMCC_Correlation_Model)

Questions from Demo
Given the points were randomly generated, what do we expect the correlation to be?
0: if the data was randomly generated and the variables were independent, there’s no inherent connection between them.
Is it possible that for some randomly generated independent data, the correlation may be high?
Yes: just by chance they could show either positive or negative correlation.
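Both answers can be illustrated with a small simulation. This is a rough sketch and not the original demo file: it assumes 10 points per sample, 2000 repetitions, independent standard normal variables, and an arbitrary threshold of 0.6 for a "high" correlation.

```python
import math
import random


def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return sxy / math.sqrt(sxx * syy)


def simulate(n_points=10, trials=2000, seed=1):
    """Return the PMCC of many samples of independent normal data."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    results = []
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n_points)]
        ys = [rng.gauss(0, 1) for _ in range(n_points)]
        results.append(pmcc(xs, ys))
    return results


rs = simulate()
mean_r = sum(rs) / len(rs)
# Proportion of samples whose correlation looks "high" purely by chance
share_large = sum(abs(r) > 0.6 for r in rs) / len(rs)

print(f"mean r = {mean_r:.3f}")
print(f"share with |r| > 0.6 = {share_large:.3f}")
```

The mean should sit close to 0, while a small but non-zero share of samples exceeds the threshold in either direction. That is why a hypothesis test needs a critical value rather than just checking whether r is non-zero.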

Correlation Coefficient Table

Example Hypothesis Test
Null/Alternative Hypotheses?
Critical Region?
Conclusion?
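The three steps can be sketched as a small decision function. This is an illustrative sketch only: the function name `reject_null` and the sample values of r are hypothetical, and the critical value 0.6319 (n = 10, 2.5% in each tail) is an assumed table value that should be checked against the formula booklet.

```python
def reject_null(r, critical_value, alternative="two-sided"):
    """Return True if r lies in the critical region for H0: rho = 0.

    critical_value: the positive value read from the correlation table
        for the sample size and the significance level in each tail.
    alternative: "greater" for H1: rho > 0, "less" for H1: rho < 0,
        "two-sided" for H1: rho != 0.
    """
    if alternative == "greater":
        return r > critical_value
    if alternative == "less":
        return r < -critical_value
    if alternative == "two-sided":
        return abs(r) > critical_value
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


# Assumed table value: n = 10, two-tailed test at 5% (2.5% per tail)
CRITICAL = 0.6319

# Hypothetical sample correlations, for illustration only
print(reject_null(0.71, CRITICAL))             # True: evidence of correlation
print(reject_null(0.45, CRITICAL))             # False: not enough evidence
print(reject_null(-0.71, CRITICAL, "greater"))  # False: wrong tail for H1
```

For a two-tailed test the significance level is split between the tails, so a 5% test uses the 2.5% column of the table.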

Test Your Understanding
The table shows the BMI (Body Mass Index) of a number of people along with their age.
a) What assumption are we making about the data in order to carry out a hypothesis test on the Product Moment Correlation Coefficient?
b) Carry out a suitable hypothesis test at the 5% level to determine whether age and BMI are correlated.
[Table columns: Age, BMI]

Hypothesis Testing with Spearman’s Rank
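The hypotheses take the same shape as for the PMCC. The notation $\rho_s$ for the population rank correlation is an assumption here; some mark schemes simply write $\rho$.

```latex
H_0 : \rho_s = 0 \qquad
H_1 : \rho_s > 0 \ \text{(or } \rho_s < 0 \text{, or } \rho_s \neq 0 \text{ for a two-tailed test)}
```

The sample value $r_s$ is then compared with the critical value from the Spearman table for the given $n$, which is a separate table from the one used for the PMCC.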

Example

Test Your Understanding
Edexcel S3 June 2011 Q2

