1 Using the Common European Framework of Reference to Report Language Test Scores
Spiros Papageorgiou, University of Michigan
2 Overview
- The Common European Framework of Reference (CEFR)
- The Manual for relating language examinations to the CEFR
- Standard setting
- An example of a CEFR standard setting study in Colombia
3 The CEFR
- Reference document, not prescriptive
- Basis for the elaboration of language syllabi, curricula, examinations, and textbooks
- Language objectives: description of what language learners have to learn to do in order to use a language for communication
- Six main levels of proficiency: A1 (lowest), A2, B1, B2, C1, C2 (highest)
4 The Manual for Relating Examinations to the CEFR It aims to “help the providers of examinations to develop, apply and report transparent, practical procedures in a cumulative process of continuing improvement in order to situate their examination(s) in relation to the Common European Framework” (p. 1).
5 Stages for Relating Test Content and Test Scores to the CEFR
- Familiarization
- Specification
- Standardization training and benchmarking
- Standard setting
- Validation
6 Standard Setting
- The decision making process of classifying examination results in a number of successive levels
- Performance Level Descriptions (PLD): statements describing what learners can do with language (e.g., CEFR descriptors)
- Performance Level Labels (PLL): labels of PLD (e.g., A1–C2)
- Cut scores: the boundary between two successive levels
- Participation of expert judges (panelists)
7 PLL and PLD
C2: Can write clear, smoothly flowing, complex texts in an appropriate and effective style and a logical structure which helps the reader to find significant points.
C1: Can write clear, well-structured texts of complex subjects, underlining the relevant salient issues, expanding and supporting points of view at some length with subsidiary points, reasons and relevant examples, and rounding off with an appropriate conclusion.
B2: Can write clear, detailed texts on a variety of subjects related to his field of interest, synthesising and evaluating information and arguments from a number of sources.
B1: Can write straightforward connected texts on a range of familiar subjects within his field of interest, by linking a series of shorter discrete elements into a linear sequence.
A2: Can write a series of simple phrases and sentences linked with simple connectors like “and”, “but” and “because”.
A1: Can write simple isolated phrases and sentences.
8 An Example of a Standard Setting Study in Colombia
- Reporting scores for the Michigan English Test on the CEFR levels
- 13 participants from the 9 Binational centers in Colombia
- Familiarization with the CEFR
- Training with item difficulty (Pilot Form B)
- Angoff standard setting method
- First round of judgments
- Pilot Form A statistical information
- Second round of judgments
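The slides name the Angoff method but do not say which variant was used. As a rough sketch only, assuming the classic probability-based variant (each judge estimates, per item, the probability that a borderline test taker answers correctly), a panel cut score could be computed as follows. The function name and the ratings are hypothetical and are not data from the study.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Return (panel_cut, judge_cuts) for a probability-based Angoff panel.

    ratings[j][i] is judge j's estimated probability that a borderline
    test taker answers item i correctly (hypothetical input format).
    """
    # Each judge's implied cut score is the sum of that judge's item probabilities.
    judge_cuts = [sum(judge) for judge in ratings]
    # The panel cut score is the mean of the judges' cut scores.
    return mean(judge_cuts), judge_cuts


# Hypothetical ratings: 3 judges x 4 items (not taken from the study).
ratings = [
    [0.50, 0.75, 0.25, 0.50],
    [0.75, 0.75, 0.50, 0.50],
    [0.25, 0.50, 0.25, 0.50],
]
panel_cut, judge_cuts = angoff_cut_score(ratings)
print(panel_cut, judge_cuts)
```

In a two-round design like the one listed above, the same calculation would simply be repeated on the second-round ratings.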
9 Standard Setting Validity Evidence
- Procedural validity: examining whether the procedures followed were practical and implemented properly; that feedback given to the judges was effective; and that documentation was sufficiently compiled.
- Internal validity: addressing issues of accuracy and consistency of the standard setting results.
- External validation: collecting evidence from independent sources that support the outcome of the standard setting meeting.
11 Procedural Validity: Internalization of the CEFR
Correlation of descriptor level judgments with the CEFR during the Familiarization stage
[Table: one column per judge (J1 to J13), one row per descriptor set (Listening, Reading, Vocabulary, Grammar); the cell values were garbled in extraction and cannot be recovered.]
12 Internal Validity: Method Consistency
Standard error of judgments (SEj) should be ≤ ½ of the standard error of the test (Section I: 1.71; Section II: 1.74)
Cut score       SEj incl. extreme ratings   SEj excl. extreme ratings
Section I B1    1.97                        1.57
Section I B2    1.34
Section I C1    1.69
Section II B1   2.00                        1.71
Section II B2   2.30                        1.62
Section II C1   2.57
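A minimal sketch of this check, assuming SEj is computed in the usual way as the standard deviation of the judges' individual cut scores divided by the square root of the number of judges. The function names and the input values are hypothetical illustrations, not the study's data.

```python
from math import sqrt
from statistics import stdev


def standard_error_of_judgments(judge_cuts):
    """SEj = sample SD of the judges' cut scores / sqrt(number of judges)."""
    return stdev(judge_cuts) / sqrt(len(judge_cuts))


def meets_half_sem_criterion(se_j, se_test):
    """True when SEj is no larger than half the standard error of the test."""
    return se_j <= 0.5 * se_test


# Hypothetical cut scores from four judges and a hypothetical test standard error.
se_j = standard_error_of_judgments([38, 40, 42, 40])
print(round(se_j, 3), meets_half_sem_criterion(se_j, se_test=2.0))
```

Dropping extreme ratings before calling `standard_error_of_judgments` would give the "excl. extreme ratings" figure; how "extreme" is defined is not stated in the slides.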
13 Internal Validity: Decision Consistency
Calculating agreement coefficient rho (p0; max .98) and kappa (k; max .71)
Cut score       p0    k
Section I B1    .90   .68
Section I B2    .88   .70
Section I C1    .97   .61
Section II B1   .95   .64
Section II B2   .86   .71
Section II C1   .94   .65
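To make the two indices concrete, here is a small sketch of their textbook definitions: p0 is the proportion of test takers classified the same way twice, and kappa corrects p0 for chance agreement. It assumes two observed sets of classifications are available; the slides do not say how the study estimated these values (single-administration estimation methods also exist), so this illustrates the definitions and is not the author's procedure. The function name and data are hypothetical.

```python
from collections import Counter


def agreement_and_kappa(first, second):
    """Return (p0, kappa) for two parallel lists of classifications."""
    n = len(first)
    # p0: observed proportion of consistent classifications.
    p0 = sum(a == b for a, b in zip(first, second)) / n
    # pc: agreement expected by chance, from the marginal proportions.
    m1, m2 = Counter(first), Counter(second)
    pc = sum(m1[c] * m2[c] for c in m1.keys() | m2.keys()) / (n * n)
    # kappa: agreement beyond chance, relative to the maximum possible.
    kappa = (p0 - pc) / (1 - pc)
    return p0, kappa


# Hypothetical classifications of 10 test takers at one cut score.
first = ["pass"] * 6 + ["fail"] * 4
second = ["pass"] * 5 + ["fail"] + ["fail"] * 3 + ["pass"]
p0, kappa = agreement_and_kappa(first, second)
print(round(p0, 2), round(kappa, 2))
```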
14 Internal Validity: Intra-judge Consistency
Correlation of mean of judgments with empirical item difficulty
MET section/round of judgments   Correlation
Section I, Round 1               .42
Section I, Round 2               .83
Section II, Round 1              .73
Section II, Round 2              .92
We based the final decision on the Round 2 judgments.
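As an illustration of this kind of check, the sketch below correlates the panel's mean judgment per item with an empirical difficulty value per item. It assumes a Pearson correlation and assumes difficulty is expressed as proportion correct; the slides specify neither, and the numbers are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical values for four items (not from the study).
mean_judgments = [0.8, 0.6, 0.4, 0.2]        # mean judged probability per item
empirical_difficulty = [0.9, 0.5, 0.5, 0.1]  # observed proportion correct per item
r = pearson_r(mean_judgments, empirical_difficulty)
print(round(r, 2))
```

A higher correlation in the second round, as in the table above, would indicate that judgments moved closer to the empirical ordering of the items.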
15 Internal Validity: Inter-judge Consistency
Indices of agreement and consistency (Index: Section I, Section II)
ICC: .94
W: .80, .76
Alpha:
16 External Validity: Reasonableness of the Cut Scores
Classification of Pilot Form A test takers (N = 660) into CEFR levels
Level   Section I      Section II
A2      105 (15.91%)   55 (8.33%)
B1      408 (61.81%)   323 (48.94%)
B2      95 (14.39%)    214 (32.43%)
C1      52 (7.88%)     68 (10.30%)
17 External Validity: Comparison of Level Classifications
Exact and adjacent level agreement of classifications (N = 302) provided by a test center and the cut score
Agreement        Section I      Section II
Exact level      122 (40.40%)   92 (30.46%)
Within 1 level   290 (96.03%)   264 (87.42%)
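A short sketch of how exact and adjacent ("within 1 level") agreement can be computed from two sets of level classifications for the same test takers. The function name and sample data are hypothetical; only the six CEFR level labels come from the slides.

```python
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def level_agreement(center_levels, test_levels):
    """Return (exact, within_one) agreement proportions for paired levels."""
    rank = {level: i for i, level in enumerate(CEFR_LEVELS)}
    # Distance in levels between the two classifications of each test taker.
    diffs = [abs(rank[a] - rank[b]) for a, b in zip(center_levels, test_levels)]
    n = len(diffs)
    exact = sum(d == 0 for d in diffs) / n
    within_one = sum(d <= 1 for d in diffs) / n
    return exact, within_one


# Hypothetical classifications for five test takers.
center = ["A2", "B1", "B1", "B2", "C1"]
by_cut_score = ["B1", "B1", "B2", "B2", "A2"]
exact, within_one = level_agreement(center, by_cut_score)
print(exact, within_one)
```

Note that the "within 1 level" figure includes the exact matches, which is consistent with the counts in the table above.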
18 Final Stage Before Reporting Test Scores: Equating
- A statistical procedure used to allow for comparisons of scores obtained on different test forms
- Adjustment of differences in test form difficulty (but not content)
- Scaled scores, not percentages
- Examinee position on the language ability scale
- Scores are comparable across different administrations
- Linked to the CEFR cut scores
19 Reported Scores
Both section scores should be taken into account when interpreting the test results for use in decision-making
CEFR Level   MET Section I scores / MET Section II scores
C1           64 and above
B2           53–63
B1           40–52
A2           39 or below
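The score bands on this slide can be read as a simple lookup from a scaled section score to a CEFR level. The sketch below assumes integer scaled scores and assumes the same bands apply to both sections, which the flattened slide suggests but does not state outright; the function and constant names are hypothetical.

```python
from bisect import bisect_right

# Lowest scaled score for B1, B2, and C1, taken from the bands on the slide.
CUT_SCORES = [40, 53, 64]
LEVELS = ["A2", "B1", "B2", "C1"]  # "A2" here covers "39 or below"


def cefr_level(scaled_score):
    """Map a scaled section score to the CEFR band reported on the slide."""
    # bisect_right counts how many cut scores the score has reached or passed.
    return LEVELS[bisect_right(CUT_SCORES, scaled_score)]


for score in (39, 40, 52, 53, 63, 64):
    print(score, cefr_level(score))
```

Because the slide asks for both section scores to be considered, a caller would apply `cefr_level` to each section score separately and not average them first.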
20 For more information visit www.lsa.umich.edu/eli/testing