17 Proximate variables dominate
Conclusion
- Pupils vary enormously
- Teachers have the greatest impact
- Schools are relevant
- Authorities hardly vary at all
- Proximate variables dominate
18 Hypothesis
The best teachers will be best at judging their students.
19 What is bias?
- Bias appears in a test when part of an assessment is harder for a particular group.
- Or when an assessor systematically downgrades a group or an individual for construct-irrelevant reasons.
23 Teacher reliability
How should reliability be assessed?
- By looking at the internal consistency of judgements?
- By looking at the link to external assessments?
- By comparing over time?
- By comparing one teacher with others?
Facets model within Rasch measurement
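The first option above, internal consistency, can be illustrated with a minimal sketch. It assumes Cronbach's alpha as the index; the slides do not say which statistic is intended, and the scores below are hypothetical, made up purely for illustration.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a pupils-by-items table of judgements.

    scores: list of rows, one row per pupil, one column per judged item.
    Assumption: alpha is used here as one common internal-consistency
    index; it is not named in the slides.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two pupils")
    # Sample variance of each item (column) across pupils.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each pupil's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: one teacher's marks for five pupils on three tasks.
judgements = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 4],
]
alpha = cronbach_alpha(judgements)
print(round(alpha, 3))
```

A high value only shows that the teacher's judgements hang together across tasks. It says nothing about agreement with external assessments or with other teachers, which is why the other options in the list need their own checks.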
24 Trusting teachers’ judgement
Harlen (2005): “The findings of the review by no means constitute a ringing endorsement of teachers’ assessment; there was evidence of low reliability and bias in teachers’ judgements.”
25 5-14, Portfolios & single level tests
- 5-14 assessments
- What about portfolios?
  - Inter-rater reliability very low for maths and writing
- English teacher levels in SATs
  - Early 1990s: “considerable error”
  - Later: quite common to find teacher = test results
- Single level tests compromised by teacher judgement
26 How does the power to grade affect relationships?
- Is it OK for teachers to assess their own pupils for high-stakes exams?
- How does the power to grade affect relationships?
- Would you give McEnroe a B?
27 Exam/test reliability
Typically around 0.9 but …
Distinguish the assessment of
- Convergent questions
- Divergent questions
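To give a feel for what a reliability of around 0.9 implies for an individual score, here is a small sketch using the classical test theory formula for the standard error of measurement, SEM = SD × sqrt(1 − reliability). The classical model and the score scale (SD of 15) are assumptions for illustration, not taken from the slides.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory SEM: sd * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical scale with a standard deviation of 15 points.
sem = standard_error_of_measurement(sd=15.0, reliability=0.9)
print(round(sem, 2))
```

Under these assumptions, a test with reliability 0.9 still leaves an individual's score uncertain by several points on such a scale.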
28 Exam/test bias
- Pre-tests are often used to address issues of bias.
- But we put much reliance on judgement.
- England’s major exams are largely not pre-tested.
29 Are exams inappropriate snapshots?
- Issue 1: Questions must be representative samples of the course under exam conditions.
- Issue 2: Constraint on the nature of the assessment
  - Multi-method multi-trait challenge
- Issue 3: Impact of stress on performance
  - Positive & negative (links to introversion)
31 We need to match format to content
Some things must be assessed by judgement:
- Social interactions
- Quality of research
- Poetry
- Art
Some things are best left to tests:
- Mental arithmetic
- Spelling
- Phonological awareness
- Diagnostic assessments (e.g. INCAS)
Even so, perhaps there is a final arbiter.
32 Predictive validity
- Developed ability test (MidYIS/IQ/etc.)
- Attainment test (Std Grade/Highers)
- Teacher grade
- Later success: degree, salary, etc.
33 We need the evidence but …
- Prediction is often poor
- Two major reasons
39 So, poor prediction because of
- Prior selection
- Variable outcome measures
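If "prior selection" is read as restriction of range (an interpretation, since the intervening slides are not reproduced here), its effect on a predictive correlation can be sketched with the standard correction for direct range restriction, often called Thorndike's Case 2. The numbers are hypothetical.

```python
import math


def correct_for_range_restriction(r_restricted, sd_ratio):
    """Estimate the correlation in the unselected group.

    r_restricted: correlation observed in the selected group.
    sd_ratio: predictor SD in the full group divided by its SD in the
              selected group (greater than 1 when selection narrows the range).
    Assumption: direct selection on the predictor (Thorndike Case 2).
    """
    u = sd_ratio
    return (r_restricted * u) / math.sqrt(1.0 + r_restricted**2 * (u**2 - 1.0))


# Hypothetical case: r = 0.3 among selected students, whose predictor
# scores are half as spread out as in the full cohort.
r_full = correct_for_range_restriction(r_restricted=0.3, sd_ratio=2.0)
print(round(r_full, 3))
```

On this reading, a weak correlation among selected students is compatible with a noticeably stronger relationship in the full cohort, which is one way prior selection can make prediction look poor.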
40 Conclusion: Judgements or tests?
- Should we do both? (Profiles)
- But how do we ensure that judgements and tests are independent?
- How can judgements be kept free from bias?
  - Virtually impossible in high-stakes tests
  - Essential for formative work
42 References
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56.
Cooper, B. (1998). Using Bernstein and Bourdieu to understand children's difficulties with "realistic" mathematics testing: an exploratory study. Qualitative Studies in Education, 11(4).
Eysenck, H. J. (2006). The Biological Basis of Personality. Transaction Publishers.
Harlen, W. (2005). Trusting teachers' judgement: research evidence of reliability and validity of teachers' assessment used for summative purposes. Research Papers in Education, 20(3).
Johnson, S., Hennessy, E., Smith, R., Trikic, R., Wolke, D., & Marlow, N. (2009). The EPICure Study: Academic attainment and special educational needs in extremely preterm children at 11 years. London: Nottingham/London/Warwick.
Koretz, D., Stecher, B. M., Klein, S. P., & McCaffrey, D. (1994). The Vermont Portfolio Assessment Program: findings and implications. Educational Measurement: Issues & Practice, 13, 5–16.
Tymms, P. (1997). Value-added Key Stage 1 to Key Stage 2. London: School Curriculum and Assessment Authority.
Tymms, P., Jones, P., Albone, S., & Henderson, B. (2009). The first seven years at school. Educational Assessment, Evaluation and Accountability, 21.
Tymms, P., Merrell, C., Heron, T., Jones, P., Albone, S., & Henderson, B. (2008). The importance of districts. School Effectiveness and School Improvement, 19(3).
Tymms, P., Merrell, C., & Jones, P. (2004). Using baseline assessment data to make international comparisons. British Educational Research Journal, 30(5).
Willms, J. D. (1987). Differences between Scottish educational authorities in their examinations attainment. Oxford Review of Education, 13(2).