Ordinate Corporation Menlo Park, California 1 Workable Models of Standard Performance in English & Spanish 2 June 2005 EALTA Voss, Norway J. Bernstein,

Presentation transcript:

Slide 1: Workable Models of Standard Performance in English & Spanish. 2 June 2005, EALTA, Voss, Norway. J. Bernstein, J. Balogh, M. Lennig, E. Rosenfeld, Ordinate Corporation, Menlo Park, California.

Slide 2: Presentation. What does it mean to speak language X? Practical problem: measure listening and speaking in a particular language (English or Spanish). Describe the development and evaluation of workable models of language performance.

Slide 3: Application dictates Technology. The requirement for large volumes (>100/day) and for fairness suggests fully automatic methods. Fully automatic testing dictates explicit, simple models of language (to implement and train). New models and methods require evaluation.

Slide 4: Types of Spoken Language Test
Language Proficiency Interview (LPI)
– Fully human, operational construct definition
– ILR OPI, ACTFL OPI, … TSE
Automatic spoken language test
– Fully automatic tests with facility construct
– PhonePass SET-10, SST

Slide 5: Types of Spoken Language Model
Oral Proficiency in Communication
– Structure: applied linguistics research literature
– Content: iterative expert judgment
Performance with Language
– Structure: general-purpose statistical estimation
– Content: iterative training on performance data

Slide 6: Applied Linguistics ~ SLP. [Diagram labels: Applied Linguistics; Spoken Language Processing; ?]

Slide 7: SLP History. Spoken Language Processing: simplicity (+ data) swamps insight. Practical goal: human-machine dialog. The original mainstream method was to implement expert meta-cognitive strategies. Jelinek and others redefined the critical task as decoding speech to text on a statistical basis.

Slide 8: First Overly Simple Model (1980s). Levels: utterance, words, phonemes, acoustics. Grammar: p(w3 | w1, w2), the probability of word w3 given the two preceding words w1 and w2. Trained on large data sets, this model out-performed expert models based on insight. The whole field changed, and statistical methods took over natural language processing.
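The trigram grammar named on this slide can be illustrated with a minimal sketch. It assumes plain maximum-likelihood counts with no smoothing; the toy corpus, the padding symbols, and the function names are invented for illustration and are not taken from the presentation.

```python
from collections import Counter

def train_trigram(sentences):
    """Count trigrams and their two-word histories from tokenized sentences."""
    tri, bi = Counter(), Counter()
    for words in sentences:
        # Hypothetical padding symbols so the first words also have a history.
        padded = ["<s>", "<s>"] + list(words) + ["</s>"]
        for w1, w2, w3 in zip(padded, padded[1:], padded[2:]):
            tri[(w1, w2, w3)] += 1
            bi[(w1, w2)] += 1
    return tri, bi

def trigram_prob(tri, bi, w1, w2, w3):
    """Maximum-likelihood estimate of p(w3 | w1, w2); 0.0 for unseen histories."""
    history = bi[(w1, w2)]
    return tri[(w1, w2, w3)] / history if history else 0.0

# Toy corpus (an assumption, not data from the talk).
corpus = [
    "bill was late".split(),
    "bill was early".split(),
    "bill was late again".split(),
]
tri, bi = train_trigram(corpus)
p_late = trigram_prob(tri, bi, "bill", "was", "late")  # 2 of 3 histories continue with "late"
```

A production recognizer would add smoothing or back-off for unseen word sequences; the sketch leaves that out to keep the counting step visible.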

Slide 9: Construct Comparison
COMMUNICATIVE COMPETENCE (Bachman); taxonomic
– Organization: Grammar (V, M, S, P); Text (Coh, Rh)
– Pragmatics: Illocution (Ideat, Manip, Heur, Imag); Socioling. (Dial, Reg, Nat, Cult)
LANGUAGE FACILITY (SET-10); FSMs, HMMs; metric in time
– Grammar (V, M, S, P)
– Skill (Rate, Fluency)

Slide 10: Construct Comparison. OPI construct: oral proficiency as manifest in an Oral Proficiency Interview, but often with reference to communicative competence as reflected in the functional level and/or complexity of content accurately produced. SET-10 construct: facility in spoken English, that is, the ability to understand spoken English and speak appropriately in response at a native-like pace on everyday topics.

Slide 11: SET-10 Format
Test number (PIN)
Part A: reading, 8 items
Part B: repeat Ss, 16 items
Part C: short Qs, 24 items
Part D: build Ss, 10 items
Part E: open Qs, 3 items

Slide 12: SET-10 Task Structure. [Timeline, 10 minutes: Read, Repeat Sentence, Answer Short Question, Build Sentence, Open Qs. Grey items not scored.] Integrated listen-speak items. Items require real-time processing.

Slide 13: SLP Paradigm in SET & SST
Integrated model of linguistic performance
Embedded phoneme, word, and phrase networks
Quantitative models of criterion judgment and data-driven performance criteria
Corpus-based content and scoring
Content is restricted by corpus occurrence
Explicit model of target interlocutor
Explicit, metric combination of score elements

Slide 14: How SET, SST model a language
Hidden Markov Model framework (FSM, HMM)
Embedded stochastic networks
Lexicon; metric phrase & clause networks
Prosodic and segmental performance models
Scoring is inherently disjunctive
Item Response Theory
Logistic regression (data-driven implicature)
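The slide names Item Response Theory without saying which IRT model the tests use. As a hedged illustration only, the sketch below uses the one-parameter (Rasch) form, in which the probability of a correct response depends on the difference between a test taker's ability and the item's difficulty. The choice of the Rasch form, the function name, and the example values are assumptions.

```python
import math

def rasch_p_correct(ability, difficulty):
    """Rasch (one-parameter logistic) probability of a correct response.

    Both arguments are on the same logit scale. This is a generic IRT
    sketch, not the scoring model described in the presentation.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Hypothetical values: an item matched to the test taker's ability,
# then the same test taker on an easier and a harder item.
p_matched = rasch_p_correct(ability=0.0, difficulty=0.0)
p_easier = rasch_p_correct(ability=0.0, difficulty=-1.0)
p_harder = rasch_p_correct(ability=0.0, difficulty=1.0)
```

When ability equals difficulty the probability is one half, and it rises as the item gets easier relative to the test taker.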

Slide 15: Construct and Model. Facility in spoken English: the ability to track what is said, extract meaning in real time, and formulate and produce relevant, intelligible responses at a conversational pace. [Diagram labels: Decoding language structures; Spoken turn; social, declarative, discourse (shown twice); Encoding language structures; (Real time process)]

Slide 16: Phoneme & Word Alignment. [Figure: waveform, spectrum, phone segmentation, and aligned words w1 to w5, annotated with a Words/Min rate and 5.8 Phones/Sec.]
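The figure on this slide annotates an aligned response with rate measures (Words/Min and Phones/Sec). A minimal sketch of how such rates could be derived from alignment time stamps follows. It assumes the rate is taken over the whole span from the first word onset to the last word offset, pauses included; the function name and the sample time stamps are hypothetical.

```python
def speaking_rates(word_spans, phone_count):
    """Compute words per minute and phones per second from an alignment.

    word_spans: list of (start_sec, end_sec) tuples, one per aligned word.
    phone_count: number of aligned phone segments in the same response.
    Assumption: duration runs from first word onset to last word offset.
    """
    if not word_spans:
        raise ValueError("need at least one aligned word")
    start = min(s for s, _ in word_spans)
    end = max(e for _, e in word_spans)
    duration = end - start
    if duration <= 0:
        raise ValueError("alignment has no positive duration")
    return {
        "words_per_min": 60.0 * len(word_spans) / duration,
        "phones_per_sec": phone_count / duration,
    }

# Hypothetical alignment: four words over two seconds, twelve phones.
alignment = [(0.0, 0.4), (0.5, 0.9), (1.0, 1.5), (1.6, 2.0)]
rates = speaking_rates(alignment, phone_count=12)
```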

Slide 17: Simplified Response Network. [Figure: word network over I, don't, know, half, an, hour, thirty, minutes, Bill, was, late, with SIL and FIL nodes.]

Slide 18: Item Development Process
1. Bound lexicon to the first 7,000 lemmas in Switchboard
2. Sample sentences from N. American text or spoken transcripts; edit to fit within lexical bounds
3. Review text form in US, UK, Australia
4. Recitation recordings from diverse N. Americans
5. Pilot items on a sample of >= 50 natives per item (US, UK). If less than 90% correct, exclude the item

Slide 19: Spanish Item Process
1. Bound by lexicon to LDC counts (Sp, Ar, Mx)
2. Sample sentences from Argentine developer
3. Review text form; intersect Puerto Rico, Mexico, Venezuela, Spain, Argentina, and Ecuador, e.g. "Aquellos eran otros tiempos." ("Those were different times.") "Algunas veces se quedaba dormido." ("Sometimes he would fall asleep.")
4. Recitation recordings from diverse Latinos
5. Pilot items on a sample of >= 50 natives (Argentina, Mexico, Puerto Rico, Colombia, …). If less than 80% correct, exclude the item
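The final piloting step on the two item-development slides is a simple threshold rule: an item is dropped when too few native speakers answer it correctly (90% for the English items, 80% for the Spanish items). A minimal sketch of that rule is below. The function name, the data layout, and the decision to reject items piloted on fewer than 50 natives are assumptions made for illustration.

```python
def keep_item(native_correct, native_total, min_rate, min_sample=50):
    """Return True if a piloted item passes the native-speaker screen.

    min_rate is the required share of correct native responses
    (0.90 on the English slide, 0.80 on the Spanish slide).
    Assumption: items piloted on fewer than min_sample natives are held back.
    """
    if native_total < min_sample:
        return False
    return native_correct / native_total >= min_rate

# Hypothetical pilot counts: item id -> (natives correct, natives tested).
pilot = {"item_a": (48, 50), "item_b": (43, 50), "item_c": (30, 30)}

english_kept = [k for k, (c, n) in pilot.items() if keep_item(c, n, min_rate=0.90)]
spanish_kept = [k for k, (c, n) in pilot.items() if keep_item(c, n, min_rate=0.80)]
```

With these made-up counts, item_b (86% correct) survives only under the 80% threshold, and item_c is held back because its pilot sample is too small.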

Slide 20: SST Development and Validation. [Flow diagram labels: Native Test Developers; Test Spec; Ordinate: SST Content; Recorded Items; Spanish Natives; Spanish Learners; Native Scribes; Native Judges; Criteria; 52,000 transcripts; 29,000 scale scores; Scale Estimates; SST Scores; Validation: concurrent ILR or ACTFL interviews; OPI Scores; stages marked 1st and 2nd.]

Slide 21: 1st Validation. [Diagram: spoken responses (Repeats, Short answers, S Builds) follow two paths. Human path: Human Transcribers produce transcriptions and Human Raters produce human scores; PPass Scoring sums (Σ) Vocabulary, Pronunciation, Fluency, and Sentence Mastery into an Overall score. Machine path: Augmented ASR produces Machine Estimates; PPass Scoring sums (Σ) the same four subscores into an Overall score. The two Overall scores are compared by correlation r.]
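The Σ in this diagram combines four subscores into an overall score. The transcript does not give the combination weights, so the sketch below assumes a normalized weighted sum with equal weights as a placeholder; the weights, the function name, and the example subscores are all hypothetical.

```python
SUBSCORES = ("sentence_mastery", "vocabulary", "fluency", "pronunciation")

def overall_score(subscores, weights=None):
    """Combine subscores into one overall score as a normalized weighted sum.

    weights defaults to equal weighting, which is an assumption; the
    presentation does not state the actual subscore weights.
    """
    if weights is None:
        weights = {name: 1.0 for name in SUBSCORES}
    total_weight = sum(weights[name] for name in SUBSCORES)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(weights[name] * subscores[name] for name in SUBSCORES)
    return weighted / total_weight

# Hypothetical subscores on a common scale.
example = {
    "sentence_mastery": 60.0,
    "vocabulary": 70.0,
    "fluency": 50.0,
    "pronunciation": 60.0,
}
overall = overall_score(example)
```

Keeping the weights as a parameter matches the later remark in the deck that subscore combination is one of the parameters that can be re-estimated periodically.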

Slide 22: 1st Machine-Human Comparison. Human scoring compared to machine scoring: correlation = 0.94, N = 288.
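The comparison on this slide is summarized by a single correlation coefficient. As a reminder of what that number measures, here is a minimal sketch of the Pearson product-moment correlation between paired human and machine scores. The transcript does not say which correlation statistic was used, so Pearson is an assumption, and the score lists are invented; they are not the 288 data points from the slide.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical paired scores for five test takers.
human_scores = [32.0, 45.0, 51.0, 60.0, 74.0]
machine_scores = [35.0, 44.0, 55.0, 58.0, 71.0]
r = pearson_r(human_scores, machine_scores)
```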

Slide 23: 2nd Validation: Human ~ Machine Scores. [Diagram: responses to the SST tasks (Read, Repeat Sentence, Opposite, Short Question, Build S, OQ, St R) yield both ILR-SPT and CEF Scale Estimates (2 human raters per) and SST Machine Scores; both are set against ILR-SPT and ACTFL Human Interview Scores.]

Slide 24: 2nd Validation: Spanish Data (SST). U.S. Government OPI Interviews.
1. OPI A-Raters ~ A-Raters Estimate: same raters, different material
2. OPI A-Raters ~ B-Raters Estimate: two rater pairs, different material
3. OPI A-Raters ~ Machine score: machine ~ two raters, different material
Correlations shown: r = 0.94, r = 0.92

Slide 25: Comparisons to CEF. [Plots: ILR Estimate-DLI ~ CEF (two rater pairs, same material); SST ~ CEF (machine ~ two raters, different material).]

Slide 26: ACTFL Interviews. [Plots: ILR Estimate-DLI ~ ACTFL (two rater pairs, different material); SST ~ ACTFL (machine ~ two raters, different material).]

Slide 27: Model Fits New Dialect Performances. [Plot: CDF by country.]

Slide 28: Item-specific models are sharper

Slide 29: SST Summary & Conclusions
SST (Spoken Spanish Test) contains material sufficient for an ILR or ACTFL estimate
49 constrained responses are adequate
Six 30-second responses are also adequate
– Automatic scoring: strong predictor from 49 responses
– SST consistently assigns high scores to natives
– SST distributes learners of Spanish over a wide range
Useful alignment with ILR, CEF, ACTFL levels
– SST scores can estimate >80% of the variance of CEF scores
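The ">80% variance" statement can be related to a correlation coefficient through the standard identity that, for a simple linear relation, the share of variance explained is the square of r. The sketch below applies it to 0.92 and 0.94, the correlations quoted on the Spanish validation slide for other comparisons; the SST-to-CEF correlation itself is not given in this transcript, so these inputs are illustrative only, and the function names are hypothetical.

```python
import math

def variance_explained(r):
    """Share of variance accounted for by a linear relation with correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    return r * r

def min_abs_r_for(share):
    """Smallest |r| whose square reaches the given share of variance."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must lie in [0, 1]")
    return math.sqrt(share)

share_092 = variance_explained(0.92)  # about 0.846
share_094 = variance_explained(0.94)  # about 0.884
needed_r = min_abs_r_for(0.80)        # about 0.894
```

Under this identity, any correlation above roughly 0.89 corresponds to more than 80% of variance explained.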

Slide 30: 2nd Validation Performance Puzzle
LANGUAGE FACILITY: Grammar (V, M, S, P); Skill (Rate, Fluency)
~80% of variance
COMMUNICATIVE COMPETENCE (Bachman): Organization: Grammar (V, M, S, P), Text (Coh, Rh); Pragmatics: Illocution (Ideat, Manip, Heur, Imag), Socioling. (Dial, Reg, Nat, Cult)
SET tests contain sufficient material for equivalent rating
Automatic scoring matches test-retest performance of criterion instruments

Slide 31: Cross-Construct Puzzle. The communicative frameworks (e.g. ILR, CEF) generally look for the maximum complexity level of material or function that can be expressed (without time constraint). SET-10 measures automaticity of perception and production for relatively simple material. Yet SET-10 predicts communicative measures at or near their reliability limit.

Slide 32: Message complexity depends, in part, on automaticity. If one measures communicative competence by the functional level or relative complexity of the messages that are communicated, what are the bases of this complexity?
1. Adequate language-independent cognition
2. Adequate control of the language system
3. In listening and speaking, adequate automaticity of encoding and decoding

Slide 33: Linguistics Reconceived. Read Chapter 1 in: C. Manning & H. Schütze (1999), Foundations of Statistical Natural Language Processing, MIT Press. The question is "what might a person say?" rather than "what is the structure of the language?" Linguistics may be coming back to language use, but not through the lens of Hymes's communicative competence.

Slide 34: Model Characteristics
Explicit and predictive
Language focused; not IQ, not social skills
Advantages of this kind of modeling
– Equivalent scoring across time and location
– Expandable capacity: up to 1000s of tests per day
– Open to continuous audit: reliability & accuracy
– Periodic re-estimation of parameters, e.g. item difficulty, subscore combination

Slide 35: Automatic Spoken Language Testing. SET-10 and SST build models of native and high-proficiency non-native behavior. The tests work because models of proficiency-dependent aspects of performance spread the L2 speakers but don't differentiate L1 speakers (even new dialect samples). Sentence-level structural differences >> social and supra-sentential differences for common language pairs (hypothesis).

Slide 36: ...