Chap. 2 Principles of Language Assessment


Practicality A practical test: (1) is not excessively expensive, (2) stays within appropriate time constraints, (3) is relatively easy to administer, and (4) has a scoring/evaluation procedure that is specific and time-efficient.

Reliability A reliable test is consistent and dependable: given on two different occasions, or scored by different people, it should yield similar results. Student-related unreliability may be caused by temporary illness, fatigue, a "bad day", anxiety, and other physical or psychological factors.

Rater Reliability Human error, subjectivity, and bias may enter into the scoring process. Inter-rater unreliability occurs when two or more scorers yield inconsistent scores on the same test (because of unclear scoring criteria, inexperience, inattention, or preconceived biases).
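Inter-rater consistency is commonly quantified with a correlation coefficient between two raters' scores for the same set of performances. The slides do not give a formula, so the following is only a minimal sketch with hypothetical scores; the function and data names are my own.

```python
# Quantifying inter-rater reliability as the Pearson correlation
# between two raters' scores for the same six essays.
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical scores from two raters on the same essays.
rater_a = [78, 85, 62, 90, 71, 88]
rater_b = [75, 88, 60, 93, 70, 85]

r = pearson_r(rater_a, rater_b)
print(f"inter-rater correlation: {r:.2f}")  # values near 1.0 suggest consistent scoring
```

A low coefficient would signal exactly the sources of unreliability named above: unclear criteria, inexperience, or bias.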

Intra-rater unreliability arises from unclear scoring criteria, fatigue, bias toward "good" and "bad" students, or carelessness. Test Administration Reliability Unreliability may also result from the conditions in which the test is administered: for example, street noise, temperature, uncomfortable desks and chairs, or the amount of light.

Reliability & Validity Test Reliability The test itself can cause measurement errors: for example, a test that is too long, a timed test, ambiguous items, or an item with more than one correct answer. Validity: the degree to which a test measures what it is supposed to measure, or can be used successfully for the purposes for which it is intended.

Validity For example, a valid test of reading ability actually measures reading ability. Five types of validity: content validity, criterion-related validity, construct validity, consequential validity, and face validity. Content validity: the test adequately and sufficiently measures the particular skills/behavior it sets out to measure.

Validity Examples: a test that requires the learner actually to speak within an authentic context has content validity; an oral test that asks students to answer multiple-choice questions requiring grammatical judgments does not. Direct testing involves the test-taker in actually performing the target task, e.g. producing target words orally.

Validity Indirect testing assesses the learner with a task that is related to the target task. For example, in a test of oral production, marking the stressed syllables in a list of written words is only indirect testing. Criterion-related validity: a form of validity in which a test is compared or correlated with an outside criterion measure.

Criterion-Related Validity Concurrent validity: a test has concurrent validity if its results are supported by other concurrent performance beyond the assessment itself. For example, a high score on the final exam will be substantiated by actual proficiency in the language. Predictive validity: a test accurately predicts future performance, e.g. a language aptitude test predicts second/foreign language ability.
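Predictive validity is usually investigated by relating earlier test scores to a later criterion measure. The slides give no procedure, so this is only an illustrative sketch with hypothetical data: a least-squares line fitted to aptitude scores and proficiency scores measured later, then used to forecast a new student's performance.

```python
# Predictive validity sketch: fit an ordinary least-squares line relating
# aptitude-test scores to proficiency scores observed a year later
# (hypothetical data), then predict future performance from a new score.

def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

aptitude = [40, 55, 60, 72, 85]      # aptitude-test scores
proficiency = [48, 60, 63, 75, 90]   # proficiency measured later

slope, intercept = fit_line(aptitude, proficiency)
predicted = slope * 65 + intercept   # forecast for an aptitude score of 65
print(f"predicted proficiency: {predicted:.1f}")
```

If the fitted relationship is strong, the aptitude test has predictive validity for the later criterion; a flat or noisy fit would call that claim into question.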

Construct Validity A construct is any theory, hypothesis, or model that attempts to explain observed phenomena in our universe of perceptions. For example, "proficiency" and "communicative competence" are linguistic constructs. Construct validity: the test items reflect the essential aspects of the theory on which the test is based (e.g. the relationship between a test of communicative competence and the theory of communicative competence).

Construct Validity The scoring analysis for an interview may include pronunciation, fluency, grammatical accuracy, vocabulary use, and sociolinguistic appropriateness. If a proficiency interview evaluates only pronunciation and grammar, its construct validity is questionable. (TOEFL)

Consequential Validity Consequential validity encompasses all the consequences of a test: its accuracy in measuring the intended criteria, its impact on the preparation of test-takers, its effect on the learner, and the social consequences of the test's interpretation and use.

Face Validity Face validity refers to the degree to which a test looks right, and appears to measure the knowledge or abilities it claims to measure, based on subjective judgment. Face validity means that the students perceive the test to be valid. (Does the test, on the face of it, appear from the learner's perspective to test what it is designed to test?)

Authenticity The language is as natural as possible. Items are contextualized rather than isolated. Topics are meaningful (relevant, interesting). Some thematic organization to items is provided. Tasks represent, or closely approximate, real-world tasks.

Washback Washback is the effect of testing on teaching and learning. It generally refers to the effects tests have on instruction in terms of how students prepare for the test. Students' correct and incorrect responses, and their strategies for success, can serve as learning devices. Comment generously and specifically on students' test performance.

Washback In reality, letter grades and numerical scores give no information of intrinsic interest to the student. Instead, give praise for strengths and offer constructive criticism of weaknesses. Formative tests provide washback in the form of information to the learner on progress toward goals. In summative tests, teachers tend to offer no washback beyond the grade itself.

Applying Principles to the Evaluation (1) Are the test procedures practical? (administrative details, time frame, smooth administration, materials and equipment, cost, scoring system, reporting results) (2) Is the test reliable? (clean test sheets, audible sound amplification, equally visible video input, lighting, temperature, objective scoring procedures)

Intra-rater reliability guidelines: (consistent sets of criteria, uniform attention, double-checking for consistency, the same standards for all, avoidance of fatigue) (3) Does the procedure demonstrate content validity? (two steps) A: Are classroom objectives identified and appropriately framed?

B: Are lesson objectives represented in the form of test specifications? (4) Is the procedure face-valid and "biased for best"? Conditions for face validity: a. Directions are clear. b. The structure of the test is organized logically. c. Its difficulty level is appropriately pitched.

d. The test has no surprises. e. Timing is appropriate. (5) Are the test tasks as authentic as possible? a. as natural as possible b. as contextualized as possible c. interesting, enjoyable, and/or humorous d. thematic organization e. real-world tasks

(6) Does the test offer beneficial washback to the learner? (content validity, preparation time before the test, reviewing after the test, self-assessment, and peer discussion of the test results)
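The six questions above amount to an evaluation checklist that a teacher can walk through for any classroom test. As a small illustration (the criterion labels and example verdicts below are my own, not from the slides), the checklist can be kept as a simple mapping and audited programmatically:

```python
# A minimal checklist sketch for auditing a classroom test against the
# six principles above. The verdicts here are hypothetical examples.
checklist = {
    "practical (time, cost, easy administration and scoring)": True,
    "reliable (good conditions, objective scoring procedures)": True,
    "content-valid (objectives framed and covered by test specs)": False,
    "face-valid (clear directions, logical structure, no surprises)": True,
    "authentic (contextualized, thematic, real-world tasks)": True,
    "beneficial washback (feedback, review, self-assessment)": False,
}

# Collect the principles this test fails to satisfy.
unmet = [criterion for criterion, ok in checklist.items() if not ok]
for criterion in unmet:
    print("revisit:", criterion)
```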