Time for Multi-State Models of Vocabulary Acquisition? Rob Waring

Assessing vocabulary acquisition
- Which components do we assess? Receptive? Productive? Use? Form? Meaning?
- Which test type? Multiple choice? Translation? Vocabulary knowledge scales? Other?
- Which words? General? Technical? Low or high frequency?
- Stakes: high (formal assessment, grades, etc.) or low (research)?

Problems with our current test battery: translation (L1 -> L2 or L2 -> L1)
- Low volume: only a few dozen words at best before we kill the subjects :)
- Notoriously difficult to score
  - Inter-rater reliability issues
  - Criteria for a successful answer: award half points?
  - Easy to make arbitrary choices about what is correct
- Not sensitive to partial knowledge
- Scores can be affected by test-strategy use
- Learners may not know the L1 word for an L2 equivalent
- Not all words can be easily translated
- Etc.

Problems with our current test battery: multiple choice
- Low volume: only a few dozen words at best before we kill the subjects :)
- Notoriously difficult to make
  - Which distractors?
  - Sensitive or not?
  - Contextualized or not?
  - Equating the vocabulary frequency of the distractors and the target
  - Use of definitions, synonyms, or ????
- 25% is a giveaway on four-option items, since blind guessing alone earns it (unless there's an 'I don't know' option)
- Often need correction for guessing
- Scores can be affected by test-strategy use
- Etc.
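
The "correction for guessing" point can be made concrete with the classic textbook formula R - W/(k - 1), where R is the number right, W the number wrong and k the number of options per item. The talk does not specify a formula, so this is a sketch of that standard correction only, and the scores are hypothetical. On a 40-item, four-option test, a learner guessing blindly expects 10 right and 30 wrong, which corrects to zero.

```python
def corrected_score(right: int, wrong: int, options: int) -> float:
    """Classic correction for guessing: score = R - W / (k - 1).

    right, wrong: counts of correct and incorrect answers.
    options: number of choices per item (k).
    Omitted items (e.g. "I don't know") count as neither right nor wrong.
    """
    if options < 2:
        raise ValueError("each item needs at least two options")
    return right - wrong / (options - 1)


# Hypothetical 40-item, four-option test.
print(corrected_score(25, 12, 4))  # 21.0: 25 right, 12 wrong, 3 omitted
print(corrected_score(10, 30, 4))  # 0.0: the expected result of blind guessing
```

Note that the correction only removes the expected gain from blind guessing; it does nothing about partially informed guessing, which is one reason scores still depend on test strategy.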

Problems with our current test battery: knowledge scales, e.g. Wesche and Paribakht (1996)

Problems with our current test battery: knowledge scales, e.g. Wesche and Paribakht (1993)
A *?%$#>& mess!
- using ordinal data as if it were interval data
- multiple aspects of knowledge at the same level: internal inconsistency
- productive and receptive knowledge all mixed up
- totally arbitrary scoring
- unclear what gains mean (e.g. t1 mean 2.5, t2 mean 2.7, t3 mean 2.8)
- how do we compare S1 + vs. S2 II +?
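
To illustrate why a mean such as 2.5 or 2.7 is hard to interpret, here is a small sketch with two hypothetical groups rated on a five-level knowledge scale. Both groups have the same mean, yet their distributions are very different; counting learners per level keeps the ordinal information that averaging throws away. The data and the five-level range are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical knowledge-scale ratings (levels 1-5) for two groups of ten learners.
group_a = [2, 2, 3, 3, 3, 3, 3, 3, 4, 4]
group_b = [1, 1, 1, 1, 3, 3, 5, 5, 5, 5]


def average(scores):
    """Arithmetic mean: only meaningful if the levels are equally spaced (interval data)."""
    return sum(scores) / len(scores)


def level_counts(scores, levels=range(1, 6)):
    """Frequency of each level: a summary that respects the ordinal nature of the data."""
    counts = Counter(scores)
    return {level: counts.get(level, 0) for level in levels}


print(average(group_a), average(group_b))  # 3.0 3.0
print(level_counts(group_a))  # {1: 0, 2: 2, 3: 6, 4: 2, 5: 0}
print(level_counts(group_b))  # {1: 4, 2: 0, 3: 2, 4: 0, 5: 4}
```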

Assumptions underlying scales
We move from receptive to productive knowledge (receptive → productive). But this assumes receptive knowledge is complete before we can produce a word. Huh?

Assumptions underlying scales
We could have separate scales for receptive and productive knowledge. But any gains on the receptive scale are not seen on the productive scale, and vice versa. Huh?

Assumptions underlying scales
We start with a threshold of receptive knowledge, after which separate productive and receptive scales develop. But any gains on the productive scale still aren't seen on the receptive scale, and vice versa. Huh?

A solution
- See each of the stages as states of knowledge, not as points on a scale
- Recognize the data are ordinal, not nominal
- Develop linear scales of a single aspect of vocabulary knowledge

Simple state model
0  I do not understand (the meaning of) this word
1  I understand (the meaning of) this word a little
2  I understand (the meaning of) this word quite well
3  I understand (the meaning of) this word very well

Test design:
Word       Understand    Can use in a sentence
Apple
Book
Curtain
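
As a sketch of how the response sheet above might be stored digitally, the four states can be an integer enum, so they stay ordered and out-of-range answers are rejected. The names `State`, `ASPECTS` and `record` are hypothetical, the responses are invented, and rating the "can use in a sentence" column on the same 0-3 states is an assumption (the slide only labels the understanding states).

```python
from enum import IntEnum


class State(IntEnum):
    """The four self-report states of the simple state model (labels from the slide)."""
    DO_NOT_UNDERSTAND = 0  # I do not understand (the meaning of) this word
    A_LITTLE = 1           # I understand (the meaning of) this word a little
    QUITE_WELL = 2         # I understand (the meaning of) this word quite well
    VERY_WELL = 3          # I understand (the meaning of) this word very well


# The two columns of the test design. Rating "use" on the same 0-3 states is an assumption.
ASPECTS = ("understand", "use_in_sentence")


def record(sheet, word, aspect, state):
    """Store one self-report; State(state) raises ValueError for values outside 0-3."""
    if aspect not in ASPECTS:
        raise ValueError(f"unknown aspect: {aspect!r}")
    sheet.setdefault(word, {})[aspect] = State(state)


# Hypothetical responses from one learner for the slide's three words.
sheet = {}
for word, understand, use in [("apple", 3, 3), ("book", 3, 2), ("curtain", 2, 0)]:
    record(sheet, word, "understand", understand)
    record(sheet, word, "use_in_sentence", use)

print(sheet["curtain"]["understand"].name)  # QUITE_WELL
```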

Build a matrix
[Figure: a 4 × 4 grid with Understand states (0-3) on one axis and Use states (0-3) on the other; each x marks one word.]
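
Building the matrix amounts to counting how many words fall into each (Understand, Use) cell. This sketch assumes rows are Use states and columns are Understand states, printed with the highest Use state on top; that orientation, the function names and the four ratings are all assumptions for illustration.

```python
def build_matrix(ratings, n_states=4):
    """Count words in each cell of an Understand x Use matrix.

    ratings maps word -> (understand_state, use_state).
    matrix[use][understand] holds the number of words in that cell.
    """
    matrix = [[0] * n_states for _ in range(n_states)]
    for understand, use in ratings.values():
        matrix[use][understand] += 1
    return matrix


def show(matrix):
    """Print the highest Use state on top, with Understand states along the bottom."""
    for use in reversed(range(len(matrix))):
        print(use, " ".join(str(n) for n in matrix[use]))
    print("  " + " ".join(str(u) for u in range(len(matrix))), "<- Understand")


# Hypothetical ratings for four words.
ratings = {"apple": (3, 3), "book": (3, 2), "curtain": (2, 0), "dwindle": (1, 0)}
show(build_matrix(ratings))
```

Storing counts rather than word lists is enough for a picture of the whole lexicon; keeping the word in each cell instead would let individual words be tracked, as the next slide does.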

Track data over time
[Figure: the Understand × Use matrix at t1 and t2 for eight words, a to h. By row (state 3 down to 0), t1: 3: h; 2: e, f, g; 1: c, d; 0: a, b. t2: 3: g, h, e; 2: c, f; 1: b, d; 0: a.]
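
One way to summarize change between t1 and t2 is to cross-tabulate each word's state at the two times and group words by whether they moved up, stayed, or moved down. This sketch uses the row positions of words a to h read off the slide's two matrices; treating those rows as the states of a single aspect is an assumption, and the function names are hypothetical. Dividing each row of the transition counts by its total would give transition proportions, one possible (not the talk's) approach to the analysis question.

```python
from collections import Counter

# Row states of words a-h at t1 and t2, as read from the slide's two matrices.
# Treating the rows as one aspect's state (0-3) is our assumption.
t1 = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2, "f": 2, "g": 2, "h": 3}
t2 = {"a": 0, "b": 1, "c": 2, "d": 1, "e": 3, "f": 2, "g": 3, "h": 3}


def transition_counts(before, after):
    """Count how many words moved from each state at t1 to each state at t2."""
    return Counter((before[w], after[w]) for w in before.keys() & after.keys())


def direction_of_change(before, after):
    """Group words by whether their state went up, stayed the same, or went down."""
    groups = {"up": [], "same": [], "down": []}
    for w in sorted(before.keys() & after.keys()):
        if after[w] > before[w]:
            groups["up"].append(w)
        elif after[w] == before[w]:
            groups["same"].append(w)
        else:
            groups["down"].append(w)
    return groups


print(direction_of_change(t1, t2))
# {'up': ['b', 'c', 'e', 'g'], 'same': ['a', 'd', 'f', 'h'], 'down': []}
print(transition_counts(t1, t2)[(2, 3)])  # 2: e and g moved from 2 to 3
```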

3D representations

Advantages of state models
- Any words, phrases, collocations, etc. can be tested
- Fast data collection: hundreds of responses per hour (esp. if collected digitally)
- Direct access to knowledge (the subject reports what they know)
  - Knowledge is not mediated through assumptions about what a test is assessing
  - E.g. sensitive vs. insensitive targets, with or without context
- Can track a single word or multiple words over time
  - E.g. verbs vs. nouns vs. adjectives
  - Can see how derivative knowledge develops
  - Can see at what stage learners can use systemic knowledge, e.g. inflectional knowledge
- Allows us to see changes or development over time
- Allows us to see patterns in development
- Allows us to look at whole lexicons, not just words
- Any variable (subject to declarative knowledge) can be used on the axes (meaning, use, pronunciation, etc.)

Issues with state models
- Not suitable for high-stakes testing
- Assumes subjects have access to declarative knowledge
- Unclear what math to use for analysis (to me at least!)
- Adding levels to get finer detail leads to
  - massive increases in the data needed for reliability
  - a need for clear labels for each state
- A three-state model is crude (I don't know, I think I know, I know); a seven-state model is too fine (I don't know, ?, ?, ?, ?, ?, I know perfectly)
- Polysemes need careful attention (the various meanings of bank might need contextualizing)
- Labels for states determine what you are testing
  - I can use it vs. I can use it in a sentence vs. I can use it in speech
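
The claim that extra levels inflate the data needed can be made concrete by counting cells. With two aspects (as in the Understand × Use matrix) and k states each, there are k² cells, and tracking change between two times gives (k²)² possible transitions. This is only cell-counting arithmetic under those assumptions, not a reliability calculation; for example, averaging even five observations per transition cell in a seven-state model would take 2401 × 5 = 12,005 observations.

```python
def matrix_cells(n_states, n_aspects=2):
    """Cells in a matrix that crosses n_aspects aspects, each with n_states states."""
    return n_states ** n_aspects


def transition_cells(n_states, n_aspects=2):
    """Possible (cell at t1, cell at t2) pairs when tracking change between two times."""
    return matrix_cells(n_states, n_aspects) ** 2


for k in (3, 4, 7):
    print(k, matrix_cells(k), transition_cells(k))
# 3 9 81
# 4 16 256
# 7 49 2401
```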

Issues with state models
- Hard to do for listening
- We may need to adjust the data for accuracy of reporting
- Hard to validate self-reports
  - Can use non-words to validate reports (splonk, merd, thyde)
  - Will need to validate any test instrument with a pilot population before mass use, e.g. give the pilot population an oral check test (e.g. m/c, translation) to confirm that what they rate as, say, state 2 actually is state 2
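
A sketch of how non-words could be used to adjust self-reports: compare how often a learner claims knowledge of real words (hit rate) with how often they claim knowledge of non-words (false-alarm rate), then apply the simple correction (h - f)/(1 - f), which is commonly used with yes/no vocabulary tests. That correction is swapped in here, not taken from the talk, and counting any rating of 1 or above as a "claim" is an assumption. The ratings are hypothetical, and with only the slide's three non-words the false-alarm rate is very coarse.

```python
# Non-words from the slide; real use would need many more.
NON_WORDS = {"splonk", "merd", "thyde"}


def claim_rates(ratings, non_words=NON_WORDS, min_state=1):
    """Return (hit rate, false-alarm rate).

    A rating of min_state or above counts as claiming some knowledge of the item.
    """
    real = [s for w, s in ratings.items() if w not in non_words]
    fake = [s for w, s in ratings.items() if w in non_words]
    if not real or not fake:
        raise ValueError("need ratings for both real words and non-words")
    hit = sum(s >= min_state for s in real) / len(real)
    false_alarm = sum(s >= min_state for s in fake) / len(fake)
    return hit, false_alarm


def corrected_rate(hit, false_alarm):
    """Simple correction (h - f) / (1 - f); can go negative if f exceeds h."""
    if false_alarm >= 1:
        return 0.0
    return (hit - false_alarm) / (1 - false_alarm)


# Hypothetical self-reports (states 0-3) from one learner.
ratings = {"apple": 3, "book": 2, "curtain": 0, "window": 1,
           "splonk": 0, "merd": 1, "thyde": 0}
hit, fa = claim_rates(ratings)
print(round(hit, 3), round(fa, 3), round(corrected_rate(hit, fa), 3))  # 0.75 0.333 0.625
```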

Issues with state models
- Need to validate that knowledge reports are not random
  - Give subjects several tests a few days apart, including a subset of test A words in test B
  - Pilot the test instrument with some subjects first
- We should find most data fall in the orange cells (same knowledge at t1 and t2)
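
To check that reports are not random, the repeated words from test A and test B can be compared. Exact agreement is the share of words on the diagonal (the "orange" cells). Quadratic-weighted Cohen's kappa is one standard chance-corrected agreement statistic for ordinal ratings that penalizes a jump from 0 to 3 more than a jump from 1 to 2; choosing it is our suggestion, not something the talk specifies. The word states below are hypothetical.

```python
def exact_agreement(test_a, test_b):
    """Proportion of shared words given the same state in both tests (the 'orange' cells)."""
    common = test_a.keys() & test_b.keys()
    if not common:
        raise ValueError("the two tests share no words")
    return sum(test_a[w] == test_b[w] for w in common) / len(common)


def weighted_kappa(test_a, test_b, n_states=4):
    """Quadratic-weighted Cohen's kappa for ordinal states 0..n_states-1 on shared words."""
    common = sorted(test_a.keys() & test_b.keys())
    n = len(common)
    observed = [[0] * n_states for _ in range(n_states)]
    for w in common:
        observed[test_a[w]][test_b[w]] += 1
    row = [sum(observed[i]) for i in range(n_states)]
    col = [sum(observed[i][j] for i in range(n_states)) for j in range(n_states)]
    disagree_obs = disagree_exp = 0.0
    for i in range(n_states):
        for j in range(n_states):
            weight = (i - j) ** 2 / (n_states - 1) ** 2  # bigger penalty for bigger jumps
            disagree_obs += weight * observed[i][j]
            disagree_exp += weight * row[i] * col[j] / n  # expected by chance
    if disagree_exp == 0:
        raise ValueError("kappa is undefined when all ratings fall in one state")
    return 1.0 - disagree_obs / disagree_exp


# Hypothetical states for the same four words in test A and, days later, test B.
test_a = {"apple": 3, "book": 2, "curtain": 1, "window": 0}
test_b = {"apple": 3, "book": 2, "curtain": 2, "window": 0}
print(exact_agreement(test_a, test_b))  # 0.75
print(round(weighted_kappa(test_a, test_b), 3))  # 0.9
```

Kappa near 1 means reports are stable across tests; values near 0 mean agreement is no better than chance, which is what random reporting would produce.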

Questions for you…
- What other ways could a state model of vocabulary be used?
- Is there an application in your own area?
- What math would be appropriate to use on these data?

Thanks for your time! Rob Waring