Impact, Washback and Consequences of Large-scale Testing

Liying Cheng (Ph.D.)
Queen’s University
chengl@educ.queensu.ca

Overview
- Define the research terms: washback, impact and consequences
- Discuss this phenomenon in relation to test validity and social consequences
- Argue for gathering further empirical evidence beyond Alderson & Wall (1993) and Cheng et al. (2004)
- Illustrate a series of empirical studies using different methodologies
- Focus on the influence of testing on students

Impact, washback, and consequences
- There is a set of relationships, intended and unintended, positive and negative, between teaching, learning and testing (Alderson & Wall, 1993).
- Related terms in the literature include measurement-driven instruction (e.g. Popham, 1987), test-curriculum alignment (Shepard, 1990), and consequences (Cizek, 2001); see Cheng & Curtis (2004) for a detailed review.

Impact, washback, and consequences (cont’d)
- Bachman & Palmer (1996), McNamara (2000) and Wall (1997) distinguish two terms:
  - Test impact: the effects of tests on macro-levels of education and society.
  - Washback: the effects of language tests on micro-levels of language teaching and learning, i.e. inside the classroom.
- Hamp-Lyons (1997) takes a view of test influence falling between the narrow one of washback and the all-encompassing one of impact.

Validity (theoretical models)
- Washback is ‘only one form of testing consequences that need to be weighed in evaluating validity’ (Messick, 1996, p.243). Messick promotes examining the two threats to test validity, construct under-representation and construct-irrelevant variance, to decide the possible consequences that a test can have on teaching and learning.
- Bachman (2005) proposes a framework with a set of principles and procedures for linking test scores and score-based inferences to test use and the consequences of test use.

Social consequences (philosophical models)
- Critical language testing: the political uses and abuses of language tests (Shohamy, 2001).
- Fairness framework (Kunnan, 2004): draws on research in ethics to link validity and consequences; tests as instruments of social policy and control.
- An encompassing ethics framework for examining the consequences of testing on language learning at the classroom level as well as at the educational, social and political levels (Hamp-Lyons, 1997).

Model of Washback

Washback studies
Recent washback studies fall into two areas:
- those relating to ‘traditional’ or existing tests, which are thought to stifle innovative teaching
- those relating to cases where a test has been specifically changed in order to encourage innovation in the classroom
Methods:
- Survey methods: interviews and questionnaires
- Classroom observations
Source: Cheng, L., & Watanabe, Y., with Curtis, A. (Eds.) (2004). Washback in language testing: Research contexts and methods. Mahwah, New Jersey: Lawrence Erlbaum Associates, Inc.

Major work on washback

Alderson & Wall (1993): 15 hypotheses
1. A test will influence teaching*.
2. A test will influence learning?.
3. A test will influence what teachers teach*.
4. A test will influence how teachers teach*.
5. A test will influence what learners learn*.
6. A test will influence how learners learn?.
7. A test will influence the rate and sequence of teaching?.
8. A test will influence the rate and sequence of learning?.

Alderson & Wall (1993): 15 hypotheses (cont’d)
9. A test will influence the degree and depth of teaching?.
10. A test will influence the degree and depth of learning?.
11. A test will influence attitudes to the content, method, etc., of teaching and learning*?.
12. Tests that have important consequences will have washback*.
13. Tests that do not have important consequences will have no washback*?.
14. Tests will have washback on all learners and teachers?.
15. Tests will have washback effects for some learners and some teachers, but not for others?.

Why study the impact on students?
“… of the many millions of people who will sit down to take (English) tests …, virtually none will have participated in the test’s design, in writing test items, in critiquing the test methods, in setting cut scores or in writing or commenting on the performance descriptions that tie to their all-important score. Of all stakeholders in testing events, test takers surely have the highest stake of all” (Hamp-Lyons, 2000, p. 581).

The Impact Study of Ontario Secondary School Literacy Test

Impact of test/task types, skills and strategies on Students

Impact of test formats on students

Impact of L2 test takers’ characteristics and test performance

Focus group (cognitive lab)
The research questions:
- How do L1 and L2 test takers’ accounts of the OSSLT differ?
- Is the construct the same for both L1 and L2 groups, or does it change in important ways in relation to language background?
- Do these differences pose a threat to the inferences drawn from the test results?
- What other differences are evident in these test takers’ accounts of the OSSLT?
Source: Fox, J. & Cheng, L. (in press). Did we take the same test? Differing accounts of the Ontario Secondary School Literacy Test by first and second language test takers. Assessment in Education: Principles, Policy & Practice, 14(1).

Initial findings
Key differences between English speakers and ESL/ELD students in behaviour and accounts of:
- Test knowledge: knowledge of the test genre (formats; space; what is expected; what raters want or will reward).
- Test-wiseness: strategic vs. non-strategic responses, which connect with background in test taking.
- The construct, or what is being tested: language proficiency or writing? Problems: prompts that rely on a key word for response (“junk food”; “invention”); how L2 test takers engage with the test (the importance of pictures and other cues vs. reliance on texts).
- Affect/investment: emotional investment (anxiety, sadness, confidence, perceptions of difficulty).

Students’ attitudes and their CET performance (Zhao & Cheng, 2006)
- What are the attitudes of students toward the CET-4?
- What relationships exist between students’ attitudes and their performance on the CET-4?
- Do sex differences exist in attitudes and in their relation to test performance?
- What attitudes differentiate high achievers (who score above the 80th percentile) from low achievers (below the 20th percentile)? What is the relationship between the two?
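The high/low achiever split in the last question can be illustrated with a short sketch. It assumes cut points at the 20th and 80th percentiles as computed by Python's `statistics.quantiles`; the slide does not say how the study computed its percentiles, the function name `split_achievers` is hypothetical, and the scores are made up.

```python
from statistics import quantiles

def split_achievers(scores):
    """Return (low, high): scores below the 20th and above the 80th percentile."""
    # n=5 yields four cut points: the 20th, 40th, 60th and 80th percentiles.
    cuts = quantiles(scores, n=5)
    p20, p80 = cuts[0], cuts[-1]
    low = [s for s in scores if s < p20]
    high = [s for s in scores if s > p80]
    return low, high

# Made-up scores 1..100, for illustration only (not data from the study).
low, high = split_achievers(list(range(1, 101)))
print(len(low), len(high))
```

Each group then holds roughly a fifth of the sample, which is the kind of subgroup the later regression slides analyse separately.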

Four attitudinal factors (N=212)

Factor                                        Items                                                  Mean   SD
1 Test-taking Anxiety/Lack of Concentration   2,3,4,8,10,19,20,23,24,26,27,29,32,34,35,36,38         3.87   .72
2 Test-taking Motivation                      5,7,9,22,37                                            2.79   .58
3 Belief in CET-4                             6,13,17,21,25,39                                       2.43   .58
4 Test Ease                                   12,14,30                                               2.35   .75
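A minimal sketch of how per-factor scores like these could be derived from questionnaire responses. It assumes a respondent's factor score is the mean of their ratings on the items listed for that factor; the slide does not state the scoring rule or the rating scale, so the function `factor_scores` and the example ratings are hypothetical.

```python
from statistics import mean

# Item-to-factor assignment copied from the factor table.
FACTOR_ITEMS = {
    "Test-taking Anxiety/Lack of Concentration": [
        2, 3, 4, 8, 10, 19, 20, 23, 24, 26, 27, 29, 32, 34, 35, 36, 38,
    ],
    "Test-taking Motivation": [5, 7, 9, 22, 37],
    "Belief in CET-4": [6, 13, 17, 21, 25, 39],
    "Test Ease": [12, 14, 30],
}

def factor_scores(responses):
    """Map each factor name to the mean rating over its items.

    responses: dict of item number -> rating (assumed scoring rule).
    """
    return {
        name: mean(responses[item] for item in items)
        for name, items in FACTOR_ITEMS.items()
    }

# Hypothetical respondent: rates every anxiety item 4 and every other item 2.
example = {item: 2 for item in range(1, 40)}
for item in FACTOR_ITEMS["Test-taking Anxiety/Lack of Concentration"]:
    example[item] = 4

scores = factor_scores(example)
print(scores)
```

Averaging these per-respondent scores across the sample would give one mean and SD per factor, in the form the table reports.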

Multiple Regression

Multiple regression (cont’d)

Females’ attitudes toward CET-4 and their test performance (N=145)
Model   Factor                                      β      t       p       R²
1       Test-taking Anxiety/Lack of Concentration   -.28   -3.44   .001    .070
2       Test-taking Anxiety/Lack of Concentration   -.26   -.327   .001    .104
        Test-taking Motivation                      -.20   2.55    .012

Males’ attitudes toward CET-4 and their test performance (N=63)
Model   Factor                                      β      t       p       R²
1       Belief in CET-4                             .46    4.02    <.001   .197

Multiple regression (cont’d)

High achievers’ attitudes toward CET-4 and their test performance (N=42)
Factor                     β      t      p       R²
2 Test-taking Motivation   .58    4.18   <.001   .338

Low achievers’ attitudes toward CET-4 and their test performance (N=42)
Factor                     β      t      p       R²
2 Test-taking Motivation   .32    2.11   .041    .322
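For readers unfamiliar with the columns in these tables, here is a sketch of how β, t and R² relate when a model has a single predictor. In that case the standardized β equals the Pearson correlation r, R² equals r², and t = r·sqrt((n − 2)/(1 − r²)). The function name and the data are made up for illustration; this is not the authors' analysis, and the slides do not say whether the reported R² values are adjusted.

```python
import math

def simple_regression_summary(x, y):
    """Standardized beta, t, R^2 and adjusted R^2 for a one-predictor model."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)           # with one predictor, beta equals r
    r2 = r * r
    t = r * math.sqrt((n - 2) / (1 - r2))    # t statistic with n - 2 degrees of freedom
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return {"beta": r, "t": t, "r2": r2, "adj_r2": adj_r2}

# Made-up attitude ratings (x) and test scores (y); not data from the study.
summary = simple_regression_summary([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(summary)
```

With two or more predictors, as in Model 2 for the female group, these identities no longer hold and the coefficients have to be estimated jointly.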

Strategy use (Song & Cheng, 2006)

Students’ strategy use and their CET performance

How to establish the relationship between testing and its impact?
- Work backward from the test items (test design)
- Explore test-takers’ characteristics in relation to testing:
  - learners’ academic background
  - L1 (native language), culture, ethnicity
  - gender, age
  - learning strategies
  - learning styles and personality (field in/dependence)
  - test anxiety
  - motivation
- Conduct longitudinal/cross-group studies
- Link the effects to student test performance using higher-level analysis

Qualities of language tests
- Bachman and Palmer’s test usefulness framework (1996): Reliability + Construct Validity + Authenticity + Interactiveness + Impact + Practicality
- Kunnan’s test fairness framework (2004): Validity + Absence of Bias + Access + Administration + Social Consequences

Future directions
- Washback/impact researchers need to fully analyze the test under study and understand its test use.
- ‘The extensive research on validity and validation has tended to ignore test use, on the one hand, while discussions of test use and consequences have tended to ignore validity, on the other’. It is, then, essential for us to establish the link between test validity and test consequences (Bachman, 2005, p.7).
- Therefore, it is imperative that washback/impact researchers work together with other language testing researchers, as well as educational policy makers and test agencies, to address the issue of validity, in particular the fairness and ethics of our tests.

Reflections
- It is clear that “testing is never a neutral process and always has consequences” (Stobart, 2003, p. 140).
- Tests are a differentiating ritual for students: “for every one who advances there will be some who stay behind” (Wall, 2000, p. 500). This is particularly true of large-scale language tests.
- Assessment (testing) is central to the teaching and learning process.

References
- Alderson, J. C., & Wall, D. (1993). Does washback exist? Applied Linguistics, 14, 115-129.
- Bachman, L. F. (2005). Building and supporting a case for test use. Language Assessment Quarterly, 2(1), 1-34.
- Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford: Oxford University Press.
- Bailey, K. M. (1996). Working for washback: A review of the washback concept in language testing. Language Testing, 13, 257-279.
- Cheng, L. (2005). Changing language teaching through language testing: A washback study. Studies in Language Testing, Volume 21. Cambridge: Cambridge University Press.
- Cheng, L., & Curtis, A. (2004). Washback or backwash: A review of the impact of testing on teaching and learning. In L. Cheng & Y. Watanabe, with A. Curtis (Eds.), Washback in language testing: Research contexts and methods. Mahwah, New Jersey: Lawrence Erlbaum Associates.
- Cheng, L., & Watanabe, Y., with Curtis, A. (Eds.) (2004). Washback in language testing: Research contexts and methods. Mahwah, New Jersey: Lawrence Erlbaum Associates, Inc.

References (cont’d)
- Cheng, L., Klinger, D., & Zheng, Y. (2007). The challenges of the Ontario Secondary School Literacy Test for second language students. Language Testing, 24(2).
- Cizek, G. J. (2001). More unintended consequences of high-stakes testing. Educational Measurement: Issues and Practice, 23(3), 1-17.
- Hamp-Lyons, L. (1997). Washback, impact and validity: Ethical concerns. Language Testing, 14(3), 295-303.
- Hawkey, R. (2006). Impact theory and practice: Studies of the IELTS test and Progetto Lingue 2000. Cambridge: Cambridge University Press.
- Klinger, D., Cheng, L., & Zheng, Y. (under review). Factors influencing ESL/ELD students’ performance on the Ontario Secondary School Literacy Test. Educational Assessment.
- Kunnan, A. J. (2004). Test fairness. In M. Milanovic, C. Weir, & S. Bolton (Eds.), European language testing in a global context: Selected papers from the ALTE conference in Barcelona. Cambridge: Cambridge University Press.
- Messick, S. (1996). Validity and washback in language testing. Language Testing, 13, 243-256.
- Popham, W. J. (1987). The merits of measurement-driven instruction. Phi Delta Kappan, 68, 679-682.

References (cont’d)
- Qi, L. (2005). Stakeholders’ conflicting aims undermine the washback function of a high-stakes test. Language Testing, 22, 142-173.
- Shepard, L. A. (1990). Inflated test score gains: Is the problem old norms or teaching the test? Educational Measurement: Issues and Practice, 9, 15-22.
- Shohamy, E. (2001). The power of tests: A critical perspective on the uses of language tests. Essex, England: Longman.
- Song, X., & Cheng, L. (2006). Language learner strategy use and test performance of Chinese learners of English. Language Assessment Quarterly: An International Journal, 3(3), 241-266.
- Wall, D. (1997). Impact and washback in language testing. In C. Clapham & D. Corson (Eds.), Encyclopedia of Language and Education (pp. 291-302).
- Zhao, J., & Cheng, L. (2006, May). Exploring the relationship between students’ attitudes toward testing and their test performance. Paper presented at the Canadian Society for the Study of Education, Toronto, Ontario.
- Zheng, Y., Cheng, L., & Klinger, D. (under review). Do test formats in reading comprehension affect ESL/ELD and non-ESL/ELD students’ test performance differently? TESL Canada.