Teacher Assessment versus Exams

1 Teacher Assessment versus Exams
Peter Tymms CEM, Durham University

2 Overview
The Issue
The importance of LAs, Schools and teachers
Fairness and bias
Coverage and sampling
Teacher assessment
Exams and tests
Predictive validity
Conclusions

3 The Issue
Teacher assessment is unfair because it is unreliable and biased.
Exams are simply snapshots and are unrepresentative of the work that has really been done.

4 Which matters most? LA School Teacher Pupil

5 Newcastle Commission: Data Sources
Several national datasets including ASPECTS, PIPS, MidYIS & YELLIS
KS1, KS2, KS3 & GCSE
Looked at value-added using 3-level multilevel models
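
For readers unfamiliar with the term, a three-level value-added model of the general kind mentioned here can be sketched as below. This is a generic textbook form, not necessarily the specification used for the Newcastle Commission. The choice of levels (pupil i, school j, LA k) and all symbols are assumptions introduced for illustration.

```latex
% Outcome y for pupil i in school j in LA k, adjusted for prior attainment x
y_{ijk} = \beta_0 + \beta_1 x_{ijk} + v_k + u_{jk} + e_{ijk},
\qquad
v_k \sim N(0, \sigma_v^2),\quad
u_{jk} \sim N(0, \sigma_u^2),\quad
e_{ijk} \sim N(0, \sigma_e^2)

% Share of the unexplained variance lying at the LA level
\rho_{\mathrm{LA}} = \frac{\sigma_v^2}{\sigma_v^2 + \sigma_u^2 + \sigma_e^2}
```

Under this sketch, "which matters most?" becomes a question about the relative sizes of the three variance components.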

6 Example using KS2 English







13 Willms’ Diagram Willms, J. D. (1992). Monitoring School Performance: A Guide for Educators. Lewes, Falmer Press.

14 The Teacher Effect


16 Which matters most? LA School  Teacher    Pupil     

17 Proximate variables dominate
Conclusion
Pupils vary enormously
Teachers have the greatest impact
Schools are relevant
Authorities hardly vary at all
Proximate variables dominate

18 Hypothesis The best teachers will be best at judging their students

19 What is bias?
Bias appears in a test when part of an assessment is harder for a particular group, or when an assessor systematically downgrades a group or an individual for construct-irrelevant reasons.

20 Example of item bias Pigeon Turtle

21 Examples of teacher bias
Anecdote
By sex (e.g. baseline & page 17 Harlen)
By ability – judgement anchored by experience
By ethnicity – assault experiments
By social class
By behaviour (origin of ability testing. Binet)
By age – (EPICure study)
By incident – e.g. spilling a glass of water. The halo (or horns) effect (e.g. P scales)

22 P Scales in 2004

23 Teacher reliability
How should reliability be assessed?
By looking at the internal consistency of judgements?
By looking at the link to external assessments?
By comparing over time?
By comparing one teacher with others?
Facets model within Rasch measurement
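
As an illustration of the first option, internal consistency is commonly summarised with Cronbach's alpha. The sketch below is a minimal version under stated assumptions: the function name `cronbach_alpha` and the judgement data are hypothetical, and the slides do not say which statistic was actually used.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    scores: list of rows, one row per pupil, each row holding the same
    number k of separate judgements (items) for that pupil.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two pupils")
    # Sample variance of each item (column) across pupils
    item_vars = [variance(col) for col in zip(*scores)]
    # Sample variance of each pupil's total score
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: five pupils, each judged on three criteria (1 to 5)
judgements = [
    [3, 4, 3],
    [5, 5, 4],
    [2, 2, 3],
    [4, 4, 4],
    [1, 2, 1],
]

print(round(cronbach_alpha(judgements), 2))
```

A high alpha only shows that a teacher's judgements hang together. It says nothing about bias, which is why the other options (external assessments, other teachers) are listed separately.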

24 Trusting teachers’ judgement Harlen 2005
“The findings of the review by no means constitute a ringing endorsement of teachers’ assessment; there was evidence of low reliability and bias in teachers’ judgements”

25 5-14, Portfolios & single level tests
5-14 assessments
What about portfolios? Inter-rater reliability very low for maths and writing
English teacher levels in SATs: early 1990s “considerable error”; later quite common to find teacher = test results
Single level tests compromised by teacher judgement

26 How does the power to grade affect relationships?
Is it OK for teachers to assess their own pupils for high stakes exams?
How does the power to grade affect relationships?
Would you give McEnroe a B?

27 Exam/test reliability
Typically around 0.9 but …
Distinguish the assessment of:
Convergent questions
Divergent questions

28 Exam/test bias
Pre-tests are often used to address issues of bias.
But we put much reliance on judgment. England’s major exams are largely not pre-tested.

29 Are Exams inappropriate snapshots?
Issue 1: Questions must be representative samples of the course under exam conditions.
Issue 2: Constraint on the nature of the assessment. Multi-method multi-trait challenge.
Issue 3: Impact of stress on performance. Positive & negative (links to introversion).

30 Introvert and Extrovert
Effort Stimulus

31 We need to match format to content
Some things must be assessed by judgement:
Social interactions
Quality of research
Poetry
Art
Some things are best left to tests:
Mental arithmetic
Spelling
Phonological awareness
Diagnostic assessments (e.g. INCAS)
Even so, perhaps there is a final arbiter

32 Predictive validity
Developed ability test (MidYIS/IQ/etc)
Attainment test (Std Grade/Highers)
Teacher Grade
Later success – degree, salary etc

33 We need the evidence but ..
Prediction is often poor
Two major reasons

34 Prediction of Educational Achievement

35 Correlation = 0.7

36 Select top 15%

37 Correlation = 0.39

38 Cream top 3%; r=0.19

39 So, poor prediction because of
Prior selection
Variable outcome measures
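
The effect of prior selection shown on slides 35 to 38 can be reproduced with a small simulation. This is a sketch under stated assumptions, not the analysis behind the original figures: it assumes bivariate normal scores with a population correlation of 0.7, and it reads "cream top 3%" as removing the top 3% of the population from the selected top 15%. The function names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_range_restriction(n=100_000, r=0.7, seed=0):
    """Correlation between predictor and outcome before and after selection."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # predictor, e.g. an earlier test score
        # Outcome constructed so that its population correlation with x is r
        y = r * x + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        pairs.append((x, y))

    pairs.sort(key=lambda p: p[0])  # ascending by predictor
    lo = n * 85 // 100  # index where the top 15% starts
    hi = n * 97 // 100  # index where the top 3% starts

    groups = {
        "everyone": pairs,
        "top 15%": pairs[lo:],
        "top 15% minus top 3%": pairs[lo:hi],
    }
    return {
        name: pearson([p[0] for p in group], [p[1] for p in group])
        for name, group in groups.items()
    }


for name, value in simulate_range_restriction().items():
    print(f"{name}: r = {value:.2f}")
```

The correlation falls at each step even though the underlying relationship between predictor and outcome is unchanged. Only the range of the predictor has been narrowed.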

40 Conclusion: Judgements or tests?
Should we do both? (Profiles)
But, how do we ensure that judgements and tests are independent?
How can judgements be kept free from bias?
Virtually impossible in high stakes tests
Essential for formative work

41 No easy solutions Thank you

42 References
Campbell, D. T., & Fiske, D. W. (1959). Convergent and Discriminant Validation by the Multitrait-Multimethod Matrix. Psychological Bulletin, 56.
Cooper, B. (1998). Using Bernstein and Bourdieu to understand children's difficulties with "realistic" mathematics testing: an exploratory study. Qualitative Studies in Education, 11(4).
Eysenck, H. J. (2006). The Biological Basis of Personality. Transaction Publishers.
Harlen, W. (2005). Trusting teachers' judgement: research evidence of reliability and validity of teachers' assessment used for summative purposes. Research Papers in Education, 20(3).
Johnson, S., Hennessy, E., Smith, R., Trikic, R., Wolke, D., & Marlow, N. (2009). The EPICure Study: Academic attainment and special educational needs in extremely preterm children at 11 years. London: Nottingham/London/Warwick.
Koretz, D., Stecher, B. M., Klein, S. P., & McCaffrey, D. (1994). The Vermont Portfolio Assessment Program: findings and implications. Educational Measurement: Issues & Practice, 13, 5–16.
Tymms, P. (1997). Value-added Key Stage 1 to Key Stage 2. London: School Curriculum and Assessment Authority.
Tymms, P., Jones, P., Albone, S., & Henderson, B. (2009). The first seven years at school. Educational Assessment, Evaluation and Accountability, 21.
Tymms, P., Merrell, C., Heron, T., Jones, P., Albone, S., & Henderson, B. (2008). The importance of districts. School Effectiveness and School Improvement, 19(3).
Tymms, P., Merrell, C., & Jones, P. (2004). Using baseline assessment data to make international comparisons. British Educational Research Journal, 30(5).
Willms, J. D. (1987). Differences Between Scottish Educational Authorities in their Examinations Attainment. Oxford Review of Education, 13(2).
