Children’s Oral Reading Corpus (CHOREC): Description & Assessment of Annotator Agreement. L. Cleuren, J. Duchateau, P. Ghesquière, H. Van hamme. The SPACE project.

Overview
1. The SPACE project
2. Development of a reading tutor
3. Development of CHOREC
4. Annotation procedure
5. Annotation agreement
6. Conclusions

1. The SPACE project
SPACE = SPeech Algorithms for Clinical & Educational applications
Main goals:
–Demonstrate the benefits of speech-technology-based tools for two applications: an automated reading tutor and a recognizer for pathological speech (e.g. dysarthria)
–Improve automatic speech recognition and speech synthesis so that they can be used in these tools

2. Development of a reading tutor
Main goals:
–Computerized assessment of word decoding skills
–Computerized training for slow and/or inaccurate readers
Accurate speech recognition is needed to accurately detect reading errors.

3. Development of CHOREC
To improve the recognizer’s reading error detection abilities, CHOREC is being developed:
–CHOREC = Children’s Oral Reading Corpus
–a Dutch database of recorded, transcribed, and annotated children’s oral readings
Participants:
–400 Dutch-speaking children
–6-12 years old
–without (n = 274, regular schools) or with (n = 126, special schools) reading difficulties

3. Development of CHOREC (b)
Reading material:
–existing REAL WORDS
–non-existing but well-pronounceable words (i.e. PSEUDOWORDS)
–STORIES
Recordings:
–22050 Hz, 2 microphones
–42 GB or 130 hours of speech

4. Annotation procedure
Segmentations, transcriptions, and annotations are made with the Praat software.

4. Annotation procedure (b)
Pass 1 → p-files:
–Orthographic transcriptions
–Broad-phonetic transcriptions
–Utterances made by the examiner
–Background noise
Pass 2 → f-files (only for those words that contain reading errors or hesitations):
–Reading strategy labelling
–Reading error labelling

4. Annotation procedure (c)
Expected: Els zoekt haar schoen onder het bed. [Els looks for her shoe under the bed.]
Observed: Als (says ‘something’) zoekt haar sch…schoen onder bed. [Als (says ‘something’) looks for her sh…shoe under bed.]
Per-word annotation tiers (‘het’ is omitted in the observed reading):
  Text:        Els | zoekt | haar | schoen | onder | het | bed
  Orthography: *(als) *s | zoekt | haar | * schoen | onder | (omitted) | bed
  Phonetics:   Als *s | zukt | har | sx …
  Strategy:    f *s | g | g | a g | g | O | g
  Error:       e/4
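The per-word tier structure in the example above can be sketched as a simple data structure. This is an illustrative sketch only: the field names and label strings are taken from the slide, but they do not reflect the actual CHOREC/Praat TextGrid file format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WordAnnotation:
    """One word of the target text with its annotation tiers.

    Illustrative only; not the actual CHOREC file format.
    """
    target: str                  # expected word from the reading material
    orthography: str             # orthographic transcription of what was read
    phonetics: Optional[str]     # broad-phonetic transcription
    strategy: Optional[str]      # reading strategy label(s)
    error: Optional[str] = None  # reading error label, if any

# First words of the slide's example: 'Els' is misread as 'Als'.
utterance = [
    WordAnnotation("Els", "*(als) *s", "Als *s", "f *s", "e/4"),
    WordAnnotation("zoekt", "zoekt", "zukt", "g"),
    WordAnnotation("haar", "haar", "har", "g"),
]

# Words needing the second (f-file) annotation pass are those with errors.
needs_pass2 = [w.target for w in utterance if w.error is not None]
```

This mirrors the two-pass procedure: the p-file tiers are filled for every word, while the strategy/error tiers are only revisited for words flagged with reading errors.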

5. Annotation agreement
The quality of the annotations relies heavily on various annotator characteristics (e.g. motivation) and external influences (e.g. time pressure).
→ Analysis of inter- and intra-annotator agreement to measure the quality of the annotations:
–INTER: triple p-annotations by 3 different annotators for 30% of the corpus (p01, p02, p03)
–INTRA: double f-annotations by the same annotator for 10% of the corpus (f01, f01b, f02)

5. Annotation agreement (b)
Remark about the double f-annotations:
–f01 = p01 + reading strategy & error tiers
–f01b = f01 – reading strategy & error tiers + newly added reading strategy & error tiers
–f02 = p02 + reading strategy & error tiers
Agreement metrics:
–Percentage agreement + 95% CI
–Kappa statistic + 95% CI
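The two metrics on this slide can be computed as follows. This is a minimal sketch for nominal labels (no weighting), with a normal-approximation confidence interval for the percentage agreement; the label values in the toy example are invented, not CHOREC data.

```python
from math import sqrt

def percent_agreement(a, b):
    """Proportion of items on which two annotations assign the same label."""
    assert len(a) == len(b) and a
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for the chance agreement
    expected from each annotator's marginal label frequencies."""
    n = len(a)
    po = percent_agreement(a, b)
    labels = set(a) | set(b)
    pe = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    return (po - pe) / (1.0 - pe)

def proportion_ci(p, n, z=1.96):
    """Approximate 95% CI for a proportion (normal approximation)."""
    se = sqrt(p * (1.0 - p) / n)
    return (p - z * se, p + z * se)

# Toy example: word-level correct ('c') / error ('e') judgments
# by two annotators over ten words.
a = ["c", "c", "c", "e", "c", "c", "e", "c", "c", "c"]
b = ["c", "c", "e", "e", "c", "c", "e", "c", "c", "c"]
po = percent_agreement(a, b)
k = cohens_kappa(a, b)
lo, hi = proportion_ci(po, len(a))
```

For a real analysis one would use a kappa-specific standard error rather than the simple proportion CI, but the chance-correction step shown in `cohens_kappa` is the part that matters for the slides that follow.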

5. Annotation agreement (c): all data

  Reading error detection (RED):    95.96%
  Orthographic transcription (OT):  90.79%
  Phonetic transcription (PT):      86.37%
  Reading strategy labelling (RSL): f01-f01b (1): 98.64%; f01-f02 (2): 91.50%
  Reading error labelling (REL):    f01-f01b (1): 97.77%; f01-f02 (2): 94.14%

Overall high agreement (κ from 0.717 up, % from 86.37 up).
INTER: κ: PT > RED*; %: RED > OT > PT*
INTRA: κ: RSL > REL*, (1) > (2)*; %: RSL > REL* for (1), RSL < REL* for (2), (1) > (2)*
* p < .05

5. Annotation agreement (d): by school type

                      Regular             Special
  RED                 96.32% (κ = 0.779)  95.21%
  OT                  92.13%              87.93%
  PT                  88.51% (κ = 0.937)  82.18%
  RSL f01-f01b (1)    98.72% (κ = 0.961)  98.45%
  RSL f01-f02 (2)     93.09% (κ = 0.802)  88.38%
  REL f01-f01b (1)    98.01% (κ = 0.899)  97.22%
  REL f01-f02 (2)     95.39% (κ = 0.722)  91.71%

Overall high agreement (κ from 0.706 up, % from 82.18 up).
% agreement: regular > special* (except for the f01-f01b comparison).
Kappa: no systematic or significant differences: RED: regular < special*; PT: regular > special*; RSL (2): regular > special*.
* p < .05

5. Annotation agreement (e): by task type

                      Real words (RW)     Pseudowords (PW)    Stories (S)
  RED                 95.20% (κ = 0.735)  90.59% (κ = 0.776)  98.37%
  OT                  88.92%              80.50%              95.56%
  PT                  78.87% (κ = 0.907)  68.45% (κ = 0.888)  94.34%
  RSL (1)             98.35% (κ = 0.960)  96.75% (κ = 0.966)  99.26%
  RSL (2)             91.25% (κ = 0.774)  77.79% (κ = 0.733)  95.96%
  REL (1)             97.55% (κ = 0.896)  92.55% (κ = 0.575)  99.32%
  REL (2)             94.57% (κ = 0.709)  80.88% (κ = 0.660)  98.24%

Overall substantial agreement (κ from 0.575 up, % from 68.45 up).
% agreement: S > RW > PW*.
Kappa: best agreement always for S (except for RSL: no significant difference, or RW > S* in case of (2)); no systematic or significant differences between RW and PW: RED: RW < PW*; PT: RW > PW*; RSL: RW = PW; REL: RW > PW (1) or RW = PW (2).
* p < .05

5. Annotation agreement (f)
Remarkable finding: the systematic differences in % agreement disappear when looking at the kappa values.
Explanation: these differences go hand in hand with differences in the number of errors made:
–children from special schools make more errors than children from regular schools
–pseudowords are harder to read than real words, which are in turn harder to read than words embedded in a text
→ Kappa, which corrects for the agreement expected by chance, is better suited to assess annotation quality.
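Why the % differences vanish under kappa can be shown with a toy 2x2 calculation (illustrative counts, not CHOREC data): when one label dominates because few errors are made, much of the raw agreement is already expected by chance, so the same % agreement yields a much lower kappa.

```python
def kappa_from_counts(both_ok, both_err, only_a, only_b):
    """Cohen's kappa for two annotators' binary ok/error judgments,
    given the four cells of their confusion table."""
    n = both_ok + both_err + only_a + only_b
    po = (both_ok + both_err) / n            # observed agreement
    pa = (both_err + only_a) / n             # annotator A's 'error' rate
    pb = (both_err + only_b) / n             # annotator B's 'error' rate
    pe = pa * pb + (1 - pa) * (1 - pb)       # chance agreement
    return (po - pe) / (1 - pe)

# Skewed case (few reading errors): 92% raw agreement,
# but most of it is expected by chance.
skewed = kappa_from_counts(both_ok=91, both_err=1, only_a=4, only_b=4)

# Balanced case (many reading errors): the same 92% raw agreement
# gives a much higher kappa.
balanced = kappa_from_counts(both_ok=46, both_err=46, only_a=4, only_b=4)
```

In the skewed case kappa drops to roughly 0.16 despite 92% agreement, while the balanced case reaches 0.84, which is the pattern behind the school-type and task-type tables above.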

6. Conclusions
The SPACE project:
–SPeech Algorithms for Clinical and Educational applications
CHOREC:
–a Dutch database of recorded, transcribed, and annotated children’s oral readings
Assessment of annotator agreement:
–high overall agreement → reliable annotations
–kappa is better suited to assess annotation quality