
1 Children’s Oral Reading Corpus (CHOREC): Description & Assessment of Annotator Agreement
L. Cleuren, J. Duchateau, P. Ghesquière, H. Van hamme
The SPACE project

2 Overview of the presentation
1. The SPACE project
2. Development of a reading tutor
3. Development of CHOREC
4. Annotation procedure
5. Annotation agreement
6. Conclusions

3 1. The SPACE project
SPACE = SPeech Algorithms for Clinical & Educational applications
http://www.esat.kuleuven.be/psi/spraak/projects/SPACE
Main goals:
– Demonstrate the benefits of speech-technology-based tools for:
  – an automated reading tutor
  – a pathological speech recognizer (e.g. dysarthria)
– Improve automatic speech recognition and speech synthesis so that they can be used in these tools

4 2. Development of a reading tutor
Main goals:
– Computerized assessment of word decoding skills
– Computerized training for slow and/or inaccurate readers
Accurate speech recognition is needed to reliably detect reading errors.

5 3. Development of CHOREC
To improve the recognizer’s reading error detection abilities, CHOREC is being developed:
= Children’s Oral Reading Corpus
= a Dutch database of recorded, transcribed, and annotated children’s oral readings
Participants:
– 400 Dutch-speaking children
– 6-12 years old
– without (n = 274, regular schools) or with (n = 126, special schools) reading difficulties

6 3. Development of CHOREC (b)
Reading material:
– existing REAL WORDS
– non-existing but pronounceable words (i.e. PSEUDOWORDS)
– STORIES
Recordings:
– 22050 Hz, 2 microphones
– 42 GB or 130 hours of speech
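As a rough sanity check on these figures, the sketch below recomputes the corpus size from the recording parameters. The 16-bit sample depth and the assumption that both microphone channels are stored uncompressed are not stated on the slide; they are assumptions.

```python
# Back-of-the-envelope check: 130 hours at 22050 Hz is roughly 42 GB
# if samples are 16-bit PCM and both microphone channels are stored
# uncompressed (both assumptions, not stated in the slides).
hours = 130
sample_rate = 22_050      # Hz
bytes_per_sample = 2      # 16-bit PCM (assumption)
channels = 2              # two microphones (assumption: both kept)

total_bytes = hours * 3600 * sample_rate * bytes_per_sample * channels
print(f"{total_bytes / 1e9:.1f} GB")  # -> 41.3 GB, consistent with the reported 42 GB
```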

7 4. Annotation procedure
Segmentations, transcriptions, and annotations are made with PRAAT (http://www.Praat.org).
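Praat stores segmentations and annotations of this kind as TextGrid files, with one tier per annotation layer. The sample below is a minimal, illustrative TextGrid only; the tier names, times, and labels are invented and are not CHOREC's actual conventions.

```
File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 1.6
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "orthography"
        xmin = 0
        xmax = 1.6
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.7
            text = "els"
        intervals [2]:
            xmin = 0.7
            xmax = 1.6
            text = "zoekt"
    item [2]:
        class = "IntervalTier"
        name = "phonetic"
        xmin = 0
        xmax = 1.6
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 1.6
            text = "Als zukt"
```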

8 4. Annotation procedure (b)
1. Pass 1 → p-files
  – Orthographic transcriptions
  – Broad-phonetic transcriptions
  – Utterances made by the examiner
  – Background noise
2. Pass 2 → f-files (only for those words that contain reading errors or hesitations)
  – Reading strategy labelling
  – Reading error labelling
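Purely as an illustration of how the two passes relate, here is a minimal in-memory sketch; the class and field names are invented, and the real p- and f-files are Praat annotations rather than Python objects.

```python
# Minimal sketch of the two annotation passes described above.
# Names are invented for illustration; real p-/f-files are Praat annotations.
from dataclasses import dataclass, field
from typing import List

@dataclass
class PAnnotation:                     # pass 1 -> "p-file"
    orthographic: str                  # orthographic transcription
    phonetic: str                      # broad-phonetic transcription
    examiner_utterances: List[str] = field(default_factory=list)
    background_noise_events: List[str] = field(default_factory=list)

@dataclass
class FAnnotation(PAnnotation):        # pass 2 -> "f-file": a p-file plus two extra
    reading_strategy: str = ""         # tiers, made only for words with reading
    reading_error: str = ""            # errors or hesitations
```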

9 4. Annotation procedure (c)
Expected: Els zoekt haar schoen onder het bed. [Els looks for her shoe under the bed.]
Observed: Als (says ‘something’) zoekt haar sch…schoen onder bed. [Als (says ‘something’) looks for her sh…shoe under bed.]

Target:      Els zoekt haar schoen onder het bed.
Orthography: *(als) *s zoekt haar * schoen onder bed
Phonetics:   Als *s zukt har sx sxun Ond@r bEt
Strategy:    f *s g g a g g g O g
Error:       e/4

10 5. Annotation agreement
Quality of annotations relies heavily on various annotator characteristics (e.g. motivation) and external influences (e.g. time pressure).
→ Analysis of inter- and intra-annotator agreement to measure quality of annotations
→ INTER: triple p-annotations by 3 different annotators for 30% of the corpus (p01, p02, p03)
→ INTRA: double f-annotations by the same annotator for 10% of the corpus (f01, f01b, f02)

11 5. Annotation agreement (b)
Remark about the double f-annotations:
– f01 = p01 + reading strategy & error tiers
– f01b = f01 with the reading strategy & error tiers removed and re-annotated from scratch
– f02 = p02 + reading strategy & error tiers
Agreement metrics:
– Percentage agreement + 95% CI
– Kappa statistic + 95% CI
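To make the two metrics concrete, here is a minimal pure-Python sketch of percentage agreement, Cohen's kappa, and a bootstrap 95% CI. The toy labels are invented, and the percentile bootstrap is just one common way to obtain an interval; it is not necessarily the CI method used in the CHOREC analysis.

```python
import random

def percent_agreement(a, b):
    """Fraction of items the two annotations label identically."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = percent_agreement(a, b)
    # chance agreement from each annotation's marginal label distribution
    p_e = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in set(a) | set(b))
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)

def bootstrap_ci(metric, a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval over items."""
    rng, n, stats = random.Random(seed), len(a), []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([a[i] for i in idx], [b[i] for i in idx]))
    stats.sort()
    return stats[int(n_boot * alpha / 2)], stats[int(n_boot * (1 - alpha / 2))]

# Toy per-word "reading error?" labels from two annotations (invented data).
pairs = ([("correct", "correct")] * 93 + [("correct", "error")] * 2
         + [("error", "error")] * 3 + [("error", "correct")] * 2)
ann1, ann2 = [p[0] for p in pairs], [p[1] for p in pairs]

print(f"% agreement: {percent_agreement(ann1, ann2):.3f}")      # 0.960
print(f"kappa:       {cohens_kappa(ann1, ann2):.3f}")           # 0.579
lo, hi = bootstrap_ci(cohens_kappa, ann1, ann2)
print(f"kappa 95% CI: [{lo:.3f}, {hi:.3f}]")
```

Note how 96% raw agreement on mostly error-free data corresponds to a much lower kappa; this is the effect discussed on slide 15.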

12 5. Annotation agreement (c): all data

                                    % agreement   κ
Reading error detection (RED)          95.96%     0.796
Orthographic transcriptions (OT)       90.79%
Phonetic transcriptions (PT)           86.37%     0.930
Reading strategy labelling (RSL)
  f01-f01b (1)                         98.64%     0.966
  f01-f02 (2)                          91.50%     0.779
Reading error labelling (REL)
  f01-f01b (1)                         97.77%     0.911
  f01-f02 (2)                          94.14%     0.717

Overall high agreement!
  κ: 0.717 – 0.966
  %: 86.37 – 98.64
INTER:
  κ: PT > RED *
  %: RED > OT > PT *
INTRA:
  κ: RSL > REL *, (1) > (2) *
  %: RSL > REL * for (1), RSL < REL * for (2), (1) > (2) *
* p < .05

13 5. Annotation agreement (d): school type

                                    Regular               Special
Reading error detection (RED)       96.32%  κ = 0.779     95.21%  κ = 0.816
Orthographic transcr. (OT)          92.13%                87.93%
Phonetic transcriptions (PT)        88.51%  κ = 0.937     82.18%  κ = 0.917
Reading strategy labelling (RSL)
  f01-f01b (1)                      98.72%  κ = 0.961     98.45%  κ = 0.971
  f01-f02 (2)                       93.09%  κ = 0.802     88.38%  κ = 0.744
Reading error labelling (REL)
  f01-f01b (1)                      98.01%  κ = 0.899     97.22%  κ = 0.921
  f01-f02 (2)                       95.39%  κ = 0.722     91.71%  κ = 0.706

Overall high agreement!
  κ: 0.706 – 0.971
  %: 82.18 – 98.72
When looking at % agreement scores: regular > special * (except for the f01-f01b comparisons)
However, when looking at kappa values, there are no systematic or significant differences:
  RED: regular < special *
  PT: regular > special *
  RSL (2): regular > special *
* p < .05

14 5. Annotation agreement (e): task type

                                    Real words (RW)       Pseudowords (PW)      Stories (S)
Reading error detection (RED)       95.20%  κ = 0.735     90.59%  κ = 0.776     98.37%  κ = 0.794
Orthographic transcr. (OT)          88.92%                80.50%                95.56%
Phonetic transcriptions (PT)        78.87%  κ = 0.907     68.45%  κ = 0.888     94.34%  κ = 0.964
Reading strategy labelling (RSL)
  f01-f01b (1)                      98.35%  κ = 0.960     96.75%  κ = 0.966     99.26%  κ = 0.956
  f01-f02 (2)                       91.25%  κ = 0.774     77.79%  κ = 0.733     95.96%  κ = 0.711
Reading error labelling (REL)
  f01-f01b (1)                      97.55%  κ = 0.896     92.55%  κ = 0.575     99.32%  κ = 0.933
  f01-f02 (2)                       94.57%  κ = 0.709     80.88%  κ = 0.660     98.24%  κ = 0.848

Overall substantial agreement!
  κ: 0.575 – 0.966
  %: 68.45 – 99.32
When looking at % agreement scores: S > RW > PW *
However, when looking at kappa values:
  Agreement is always best for S (except for RSL: no significant difference, or RW > S * in the case of (2))
  No systematic or significant differences between RW and PW:
    RED: RW < PW *
    PT: RW > PW *
    RSL: RW = PW
    REL: RW > PW (1) or RW = PW (2)
* p < .05

15 5. Annotation agreement (f)
Remarkable finding: the systematic differences in % agreement disappear when looking at kappa values!
Explanation: the differences go hand in hand with differences in the number of errors made:
– children from special schools make more errors than children from regular schools
– pseudowords are harder to read than real words, which in turn are harder to read than words embedded in a text
When errors are rare, annotators already agree on most words by chance, which inflates % agreement; kappa corrects for this chance agreement (see the worked example below).
→ Kappa is better suited to assess annotation quality
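A small worked example of this effect, assuming binary per-word labels and two independent annotators; the error rates and the 96% agreement figure below are invented round numbers, not CHOREC results.

```python
# Invented round numbers to illustrate why kappa and % agreement can diverge.
def chance_agreement(error_rate):
    """Expected agreement of two independent annotators who each label a word
    'error' with probability error_rate and 'correct' otherwise."""
    return error_rate ** 2 + (1 - error_rate) ** 2

def kappa(observed_agreement, error_rate):
    p_e = chance_agreement(error_rate)
    return (observed_agreement - p_e) / (1 - p_e)

print(chance_agreement(0.05))        # 0.905: few reading errors (e.g. stories)
print(chance_agreement(0.30))        # 0.580: many reading errors (e.g. pseudowords)
print(round(kappa(0.96, 0.05), 3))   # 0.579
print(round(kappa(0.96, 0.30), 3))   # 0.905
```

The same 96% raw agreement thus corresponds to very different kappa values depending on how many errors the children make, so differences in % agreement across school types or task types can shrink, vanish, or even reverse once chance agreement is taken into account.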

16 6. Conclusions
The SPACE project
– SPeech Algorithms for Clinical and Educational applications
– http://www.esat.kuleuven.be/psi/spraak/projects/SPACE
CHOREC
– Dutch database of recorded, transcribed, and annotated children’s oral readings
Assessment of annotator agreement
– High overall agreement → reliable annotations
– Kappa better suited to assess annotation quality

