Perceptions on L2 fluency-perspectives of untrained raters

Presentation transcript:

Perceptions on L2 fluency: perspectives of untrained raters. By The Foreign Language Assessment Group. CFP meeting, 06 March 2009

Human Raters’ Perceptions (workflow diagram; labels only): Classroom recording (video + audio); Annotation of Speaker Turn; Transcription; Random Selection; Extracted Audio files; Automated Phone Segmentation; Manual Checking; Human Raters’ Perceptions; Data Analysis

Status of the two rating studies
Summer & Fall 2008, 1st rating study: 38 untrained raters from different Mandarin-speaking regions. Data collection completed; data analysis currently underway.
Spring 2009, 2nd rating study: target of 10-20 participants (i.e. trained raters) from linguistics-related disciplines.
Goals: determine the factors that affect raters’ perception of L2 fluency, the visual impacts on rating performance, and the influence of the variety of Mandarin.

Research Purpose
- To explore untrained raters’ rating patterns and their perceptions of L2 fluency
- To compare rating performance between untrained and trained raters

Research Questions
Q1. What kinds of rating patterns do untrained raters show?
Q2. Which assessment criteria predict L2 fluency?
Q3. Are there any interaction effects between visual and audio inputs, based on the rating results of untrained raters?
Q4. What are the implications of these results for the automated speech recognition tool?

Research Procedures (1)
Target raters: 38 native speakers of Chinese, “untrained” in rating.
Mini-training session: a 1.5-hour familiarization workshop with brief directions on the rating scale descriptors and the rating procedures.

Research Procedures (2)
The second rating took place two weeks after the first.
Session         Audio      Video
First Rating    R1-R19     R20-R38
Second Rating   R20-R38    R1-R19
Each cell of the slide's table also lists the value 120.
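The crossover layout on this slide can be sketched in code. This is a minimal illustration, assuming that raters R1-R19 rate audio in the first session and video two weeks later, while R20-R38 do the reverse; the function name `counterbalanced_schedule` and the labels "audio" and "video" are hypothetical and are not taken from the project's web-rating tool.

```python
def counterbalanced_schedule(n_raters=38, split=19):
    """Map each rater ID to (first-session input, second-session input).

    Assumption: raters up to `split` get audio first and video second;
    the remaining raters get the reverse order.
    """
    schedule = {}
    for i in range(1, n_raters + 1):
        rater = f"R{i}"
        if i <= split:
            schedule[rater] = ("audio", "video")
        else:
            schedule[rater] = ("video", "audio")
    return schedule


schedule = counterbalanced_schedule()
for rater in ("R1", "R19", "R20", "R38"):
    print(rater, schedule[rater])
```

Swapping the input order between the two halves of the rater pool is what lets the later analysis separate a session effect from an input-type effect.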

Rater Procedures
38 native speakers of Chinese rated the samples with a web-rating tool.
6 or 7 assessment criteria were used, depending on the method type (audio or video).
Criteria: Disfluency, Pronunciation, Nativeness, Communication, Syntax, Lexicon, Gesture.

Web-Rating Frame used

Methodology
Target raters: rating results of 33 untrained raters were analyzed.
Data analyses: descriptive statistics; correlation analysis; repeated-measures analysis to look at the audio/visual interaction effects; logistic regression.

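Q1. What kinds of rating patterns do untrained raters show?
Mean ratings by session and input type (Dis = Disfluency, Pron = Pronunciation, Natn = Nativeness, Comn = Communication, Synta = Syntax, Lexic = Lexicon, Gest = Gesture):
Session        Type     Dis    Pron   Natn   Comn   Synta  Lexic  Gest
First Rating   Audio    2.89   3.01   2.51   2.98   3.10   2.83
First Rating   Visual   2.80   2.97   2.55   2.99   3.04   2.81   2.60
Remaining values as extracted (row and column alignment not recoverable):
First Rating, Mean: 2.85 2.53 3.07 2.82
Second Rating: 2.76 2.48 2.92 2.91 2.77 2.87 2.68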

Severity level (Mean of ratings)

Correlation Analysis (1): First Round
Variables: Fluent2, Disqual, Pron, Natness, Comm, Syntax, Lexicon, Gesture
Lower-triangle coefficients as extracted, each marked (**) on the slide; some cells were lost in extraction, so the rows cannot be realigned:
1 .689 .660 .698 .646 .709 .760 .691 .752 .806 .739 .633 .697 .743 .700 .775 .748 .750 .822 .783 .425 .495 .526 .503 .577 .567 .580

Correlation Analysis (2): Second Round
Variables: Fluent, Disqual, Pron, Natness, Comm, Syntax, Lexicon, Gesture
Lower-triangle coefficients as extracted, each marked (**) on the slide; some cells were lost in extraction, so the rows cannot be realigned:
1 .705 .659 .729 .660 .715 .769 .694 .762 .799 .746 .672 .725 .776 .726 .835 .679 .740 .738 .808 .805 .373 .446 .471 .439 .501 .482 .484
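For readers who want to see how one cell of such a correlation table is obtained, here is a minimal sketch. The slides do not say which coefficient was used, so Pearson's r is an assumption here, and the two rating lists are hypothetical values invented for illustration, not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical ratings on a 1-4 scale (NOT data from the study).
hypothetical_fluency = [4, 3, 3, 2, 1, 4, 2, 3]
hypothetical_gesture = [3, 3, 2, 3, 2, 3, 2, 4]

r = pearson_r(hypothetical_fluency, hypothetical_gesture)
print(f"r = {r:.3f}")
```

Repeating the call for every pair of criteria fills the lower triangle of the matrix.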

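Q2. Which assessment criteria can predict fluency (Logistic_1)? Video samples
Group 1 rated video in the second session; Group 2 rated video in the first session. An asterisk marks a significant predictor; n/r marks a cell that was not recovered from the slide.
Criterion   G1 Exp(B)   G1 Sig.    G2 Exp(B)   G2 Sig.
Disqual     4.552       0.000 *    4.293       n/r
Pron        1.351       0.079      2.077       n/r
Natness     2.361       n/r        1.902       n/r
Comm        1.728       0.004 *    2.092       n/r
Syntax      n/r         0.001 *    1.819       n/r
Lexicon     1.567       0.005 *    2.351       n/r
Gesture     1.108       0.391      0.932       0.578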

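Q2. Which assessment criteria can predict fluency (1_2)? Video samples
Group 1 (S2): Y = -11.358 + 1.515X1 + 0.301X2 + 0.859X3 + 0.547X4 + 0.643X5 + 0.449X6 + 0.102X7
Group 2 (S1): Y = -12.620 + 1.457X1 + 0.731X2 + 0.643X3 + 0.738X4 + 0.598X5 + 0.855X6 - 0.070X7
X1 to X7 are the criteria in table order (Disqual, Pron, Natness, Comm, Syntax, Lexicon, Gesture); exponentiating each coefficient gives the corresponding Exp(B) value in the logistic-regression table.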
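The link between these equations and the Exp(B) column of the logistic-regression table can be checked with a short sketch: Exp(B) is the exponential of the coefficient B, i.e. the odds ratio for a one-point increase in that criterion. The mapping of X1-X7 to Disqual, Pron, Natness, Comm, Syntax, Lexicon and Gesture is an inference from matching values, not something the slide states, and the names `CRITERIA`, `VIDEO_COEFS` and `odds_ratios` are hypothetical.

```python
import math

# Assumed order of X1..X7 (inferred by matching exp(B) to the Exp(B) table).
CRITERIA = ["Disqual", "Pron", "Natness", "Comm", "Syntax", "Lexicon", "Gesture"]

# Coefficients copied from the two video-sample equations on the slide.
VIDEO_COEFS = {
    "Group 1 (S2)": [1.515, 0.301, 0.859, 0.547, 0.643, 0.449, 0.102],
    "Group 2 (S1)": [1.457, 0.731, 0.643, 0.738, 0.598, 0.855, -0.070],
}


def odds_ratios(coefs):
    """Return {criterion: exp(B)} rounded to three decimals."""
    return {name: round(math.exp(b), 3) for name, b in zip(CRITERIA, coefs)}


video_odds = {group: odds_ratios(coefs) for group, coefs in VIDEO_COEFS.items()}
for group, ratios in video_odds.items():
    print(group, ratios)
```

Because the slide's coefficients are rounded to three decimals, the recomputed odds ratios can differ from the tabled Exp(B) values in the last digit.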

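Q2. Which assessment criteria can predict fluency (Logistic_1)? Audio samples
Group 1 rated audio in the first session; Group 2 rated audio in the second session. An asterisk marks a significant predictor; n/r marks a cell that was not recovered from the slide.
Criterion   G1 Exp(B)   G1 Sig.    G2 Exp(B)   G2 Sig.
Disqual     3.632       0.000 *    6.085       n/r
Pron        1.476       0.004 *    5.002       n/r
Natness     2.419       n/r        1.051       0.715
Comm        1.500       0.008 *    1.914       n/r
Syntax      1.817       n/r        1.361       0.069
Lexicon     1.584       0.002 *    1.918       n/r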

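Q2. Which assessment criteria can predict fluency (2_2)? Audio samples
Group 1 (S1): Y = -10.188 + 1.290X1 + 0.389X2 + 0.883X3 + 0.405X4 + 0.597X5 + 0.460X6
Group 2 (S2): Y = -12.606 + 1.806X1 + 1.610X2 + 0.050X3 + 0.649X4 + 0.308X5 + 0.651X6
X1 to X6 are the criteria in table order (Disqual, Pron, Natness, Comm, Syntax, Lexicon); the audio models have no Gesture term.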
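To show how such an equation turns criterion ratings into a prediction, here is a minimal sketch. It assumes that Y is the log-odds of a binary "fluent" outcome (the usual reading of a logistic regression, though the slide does not define Y) and that ratings lie on the rubric's 1-4 scale. The rating vector is hypothetical, and `fluent_probability` is an illustrative name.

```python
import math

# Intercept and coefficients copied from the two audio-sample equations.
AUDIO_MODELS = {
    "Group 1 (S1)": (-10.188, [1.290, 0.389, 0.883, 0.405, 0.597, 0.460]),
    "Group 2 (S2)": (-12.606, [1.806, 1.610, 0.050, 0.649, 0.308, 0.651]),
}


def fluent_probability(intercept, coefs, ratings):
    """Logistic transform of Y = intercept + sum(B_i * X_i).

    Assumption: Y is the log-odds of the sample being judged fluent.
    """
    if len(ratings) != len(coefs):
        raise ValueError("one rating per coefficient is required")
    y = intercept + sum(b * x for b, x in zip(coefs, ratings))
    return 1.0 / (1.0 + math.exp(-y))


# Hypothetical sample rated 3 on every criterion (NOT data from the study).
hypothetical_ratings = [3, 3, 3, 3, 3, 3]
probabilities = {
    group: fluent_probability(intercept, coefs, hypothetical_ratings)
    for group, (intercept, coefs) in AUDIO_MODELS.items()
}
for group, p in probabilities.items():
    print(f"{group}: P(fluent) = {p:.3f}")
```

Raising any single rating raises the predicted probability, since every audio coefficient is positive.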

Q3. Are there any interaction effects between visual and audio inputs based on the rating results by untrained raters?
Within-subject effects (an asterisk marks a significant F value):
Source                          F          Sig.
Main effect: Session            7.190 *    0.007
Main effect: Input Type         6.127 *    0.013
Interaction: Session * Type     12.252 *   0.000

Findings (1) Gesture shows a relatively low correlation with Fluency in both rating sessions. Gesture and Pronunciation do not predict the fluency level in G1 (video samples). Nativeness and Syntax do not predict the fluency level in G2 (audio samples).

Findings (2) The interaction effects are significant, which implies that raters show different rating patterns depending on the rating session and the input type (audio/video).

2nd Rating Study of Trained Raters
- Compare with the rating results of the untrained raters in the 1st study
- Find differences in ratings between the two groups

Methodology of 2nd rating study
Rating scale: the 6-7 assessment criteria used in the 1st study; the same speech samples used in the 1st study.
Visual/audio effects on rating: the same input types and the same rating procedures used in the 1st study.
Individual raters’ rating patterns.

Raters
Target raters: 10-20 native speakers of Chinese with teaching experience at UIUC and in other areas.
Propose a rater training model for trained raters.

Rater Training Model (diagram; labels only): Training Workshop; Practice & Discussion Session I; Practice & Discussion Session II; Actual Ratings. Step labels recovered from the slide: STEP 1, STEP 2, STEP 4.

Rater Training
Training materials: the same rating scale descriptors and rating procedures as used in the 1st study.
One-day workshop (3-4 hours), with more practice during on-site workshops.
Familiarization and norming sessions through lecture, practice, and discussion.

Methodology in the 2nd Study
For trained raters: repeated measures to look at the audio/visual interaction effect; logistic regression analysis; correlation analysis.
For comparison of the two groups: t-test for two-group mean differences on the assessment criteria; FACETS analysis.
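The planned two-group comparison can be sketched as follows. The slide does not say which t-test variant is intended, so Welch's unequal-variance form is an assumption, and the two samples are hypothetical ratings invented for illustration. The sketch returns only the t statistic and degrees of freedom; turning them into a p-value needs a t-distribution table or a statistics package.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, df


# Hypothetical mean ratings on one criterion (NOT data from the study).
hypothetical_untrained = [2.5, 3.0, 2.0, 3.5, 3.0, 2.5]
hypothetical_trained = [3.0, 3.5, 3.0, 4.0, 3.5, 3.0]

t_stat, df = welch_t(hypothetical_untrained, hypothetical_trained)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

A negative t here simply reflects that the first (untrained) sample has the lower mean.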

Validity is the underlying objective: to validate the measures developed in this project.
Messick (1989): validity is a unitary concept that includes test use and consequences. “Validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment”.
Kane (2006): validation covers the procedures that connect test scores and score-based inferences to test use and the consequences of test use. “To validate an interpretation or use of measurements is to evaluate the rationale, or argument, for the claims being made, and this in turn requires a clear statement of the proposed interpretations and uses and a critical evaluation of these interpretations and uses”

Given the above definitions of validity, further refinement of the rating scale may be suggested, based on:
- Empirical evidence to support changes made to the rating scale
- A new rating tool for untrained raters
- Wording of descriptors (e.g. Phonological Control)
- The 7 assessment criteria used in the study
- A proposed training model for the trained-rater group

Thank you for your time!!

CHINESE FLUENCY PROJECT–RATING RUBRICS Version 0.43
Criteria: Speech Flow, Phonological Control, Grammatical & Lexical Accuracy, Delivery Skills

Fluency Level 4
Speech Flow: Speech is clear, with fluid and easy expression; no inappropriate use of filler words, repetitions, or self-correction, and no noticeable errors that affect overall intelligibility.
Phonological Control: Phonological features (tones, rhythm, stress) are used properly and are easily understood by speakers unaccustomed to dealing with non-native speakers of Chinese.
Grammatical & Lexical Accuracy: Excellent use of grammatical structures. Wide range of vocabulary usage. Intended meanings can be understood easily.
Delivery Skills: Style of delivery (eye contact, gestures, facial expressions) effectively enhances the communication of the message.

Fluency Level 3
Speech Flow: Speech is clear, though occasional listener effort is needed; meaning may be obscured from time to time by a few errors and some inappropriate use of filler words, repetitions, or self-correction.
Phonological Control: Phonological features are sometimes used improperly but can be understood, according to the context, by speakers accustomed to dealing with non-native speakers of Chinese.
Grammatical & Lexical Accuracy: Although there are some errors in grammatical accuracy and selection of vocabulary, these generally do not interfere with communication.
Delivery Skills: Style of delivery (eye contact, gestures, facial expressions) adequately enhances the communication of the message.

Fluency Level 2
Speech Flow: Speech is somewhat comprehensible despite difficulties in formulating ideas; pausing for grammatical and lexical planning and repair is very evident.
Phonological Control: Phonological features are often used so improperly that even speakers accustomed to dealing with non-native speakers of Chinese receive limited clues from the context and understand fragmented utterances only.
Grammatical & Lexical Accuracy: Use of vocabulary and grammatical structure somewhat causes confusion, and intended meanings are not fully delivered.
Delivery Skills: Style of delivery (eye contact, gestures, facial expressions) is fragmented and inconsistent, and this frequently impedes the communication of the message.

Fluency Level 1
Speech Flow: Speech contains frequent or long pausing to search for expressions; only isolated and short phrases may be understood, through considerable listener effort.
Phonological Control: Phonological features are mostly misused, so that even Chinese instructors accustomed to dealing with non-native speakers of Chinese understand very few phrases.
Grammatical & Lexical Accuracy: Grammatical and vocabulary control is limited to simple, short, and familiar sentences and words only.
Delivery Skills: Style of delivery (eye contact, gestures, facial expressions) impedes communication of the message.

Correlation Analysis (Audio 1): First Round
Variables: Fluent2, Disqual, Pron, Natness, Comm, Syntax, Lexicon
Lower-triangle coefficients as extracted, each marked (**) on the slide; some cells were lost in extraction, so the rows cannot be realigned:
1 .680 .639 .682 .642 .661 .731 .660 .734 .751 .666 .712 .732 .813 .677 .754 .722 .743 .792 .775

Correlation Analysis (Audio 2): Second Round
Variables: Fluent2, Disqual, Pron, Natness, Comm, Syntax, Lexicon
Lower-triangle coefficients as extracted, each marked (**) on the slide; some cells were lost in extraction, so the rows cannot be realigned:
1 .697 .684 .743 .600 .736 .803 .681 .783 .804 .747 .631 .750 .763 .833 .669 .795 .764 .746 .832 .822