The Detection and Correction of Real-word Errors in Dyslexic Text Jenny Pedler School of Computer Science & Information Systems Birkbeck College Research.

1 The Detection and Correction of Real-word Errors in Dyslexic Text Jenny Pedler School of Computer Science & Information Systems Birkbeck College Research Presentation July 2002

2 Real-word errors
However I gave (have) no idea what it represents.
Now go to the macros button as shown bellow (below).
Fred is away form (from) the 5-5-89 and this leaves us vary (very) exposed.
there is no evidence on Bill Gates that I have herd (heard) of

3 Work completed to date
Dyslexic error corpus
Investigation of possible approaches
–Syntactic anomaly
–Confusion set
Part-of-speech tag collocation experiment
Dictionary update

4 The Dyslexic Error Corpus
Sentences: 1395
Words: 21524
Non-word errors: 1681
Word-boundary errors: 152
Real-word errors: 842
Total errors: 2675

5 Possible Approaches
Syntactic anomaly
–I haven't do (done) any in a long time.
Confusion set
–{their, there, they're} {form, from} {weather, whether} {were, where, we're} {collage, college} {loose, lose}

6 Confusion Sets
{their, there, they're} {form, from} {weather, whether} {were, where, we're} {loose, lose} {collage, college}
{their, there} {form, from} {weather, whether} {were, where} {loose, lose}
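A confusion-set checker needs a quick way to go from a word in the text to the alternatives it might be confused with. The sketch below is one possible representation, not the one used in the research: the sets are copied from the slide, while the names `CONFUSION_SETS`, `build_lookup` and `candidates` are hypothetical.

```python
from typing import Dict, FrozenSet, List

# Confusion sets as listed on the slide. Storing them as frozensets in a
# word -> set dictionary is an assumption made for this illustration.
CONFUSION_SETS: List[FrozenSet[str]] = [
    frozenset({"their", "there", "they're"}),
    frozenset({"form", "from"}),
    frozenset({"weather", "whether"}),
    frozenset({"were", "where", "we're"}),
    frozenset({"loose", "lose"}),
    frozenset({"collage", "college"}),
]


def build_lookup(sets: List[FrozenSet[str]]) -> Dict[str, FrozenSet[str]]:
    """Map every member word to the confusion set that contains it."""
    lookup: Dict[str, FrozenSet[str]] = {}
    for confusion_set in sets:
        for word in confusion_set:
            lookup[word] = confusion_set
    return lookup


LOOKUP = build_lookup(CONFUSION_SETS)


def candidates(word: str) -> FrozenSet[str]:
    """Return the confusion set for a word, or an empty set if it has none."""
    return LOOKUP.get(word.lower(), frozenset())
```

A word outside every set returns an empty set, so the checker can skip it without further work.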

7 Tag collocation experiment
Syntactic approach
Minimal information
–part-of-speech tag of preceding and succeeding word
Provide baseline for comparison with future approaches

8 Calculating word|tag probabilities
Count occurrences of each tag,word pair
Calculate probability for immediately preceding and succeeding tags
–P(tp|w), P(ts|w)
Use Bayes rule to calculate probability of word occurring given the tag
–P(w|tp), P(w|ts)
Store for use at run-time
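The counting and Bayes steps can be sketched as follows for the preceding-tag case; the succeeding-tag table would be built the same way with the pair order reversed. This is a minimal illustration under assumptions, not the original implementation: the corpus format (sentences as lists of `(word, tag)` pairs), the function name `preceding_tag_probs` and the toy corpus are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Sentence = List[Tuple[str, str]]  # assumed format: (word, part-of-speech tag)


def preceding_tag_probs(corpus: List[Sentence]) -> Dict[Tuple[str, str], float]:
    """Return P(w | tp) for every (word, preceding tag) pair seen in the corpus."""
    pair_counts: Counter = Counter()  # occurrences of word w preceded by tag tp
    word_counts: Counter = Counter()  # occurrences of w that have a preceding word
    tag_counts: Counter = Counter()   # occurrences of tp as a preceding tag
    total = 0
    for sentence in corpus:
        for (_, prev_tag), (word, _) in zip(sentence, sentence[1:]):
            w = word.lower()
            pair_counts[(w, prev_tag)] += 1
            word_counts[w] += 1
            tag_counts[prev_tag] += 1
            total += 1

    probs: Dict[Tuple[str, str], float] = {}
    for (w, t), n in pair_counts.items():
        p_t_given_w = n / word_counts[w]  # P(tp | w)
        p_w = word_counts[w] / total      # P(w)
        p_t = tag_counts[t] / total       # P(tp)
        probs[(w, t)] = p_t_given_w * p_w / p_t  # Bayes rule: P(w | tp)
    return probs


# Toy tagged corpus, invented for illustration only.
toy_corpus: List[Sentence] = [
    [("unlike", "PRP"), ("their", "DPS"), ("adult", "AJ0"), ("friends", "NN2")],
    [("in", "PRP"), ("there", "AV0")],
    [("of", "PRP"), ("their", "DPS"), ("own", "DT0")],
]
table = preceding_tag_probs(toy_corpus)
```

In this toy corpus PRP precedes three words, two of them "their", so `table[("their", "PRP")]` comes out at two thirds.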

9 Using Bayes Rule
{w_1, ..., w_n}: set of words
{t_1, ..., t_m}: set of tags
|t_j, w_i|: the number of occurrences of word w_i collocating with tag t_j
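The formula itself is not preserved in the transcript. Under the definitions above, and assuming each probability is estimated by relative frequency over the collocation counts (with N the total number of counted tag,word collocations, a symbol introduced here), Bayes rule would work out as:

```latex
\begin{align*}
P(t_j \mid w_i) &= \frac{|t_j, w_i|}{\sum_{l=1}^{m} |t_l, w_i|}, \qquad
P(w_i) = \frac{\sum_{l=1}^{m} |t_l, w_i|}{N}, \qquad
P(t_j) = \frac{\sum_{k=1}^{n} |t_j, w_k|}{N} \\[4pt]
P(w_i \mid t_j) &= \frac{P(t_j \mid w_i)\, P(w_i)}{P(t_j)}
                 = \frac{|t_j, w_i|}{\sum_{k=1}^{n} |t_j, w_k|}
\end{align*}
```

This is a reconstruction under those assumptions and may differ in detail from the slide's own formula.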

10 Run-time
Retrieve part-of-speech tags for preceding word
Assign highest P(w|tp) value to each member of confusion set
Retrieve part-of-speech tags for succeeding word
Assign highest P(w|ts) value to each member of confusion set
P(w|tp) * P(w|ts) gives final value to each member
Select member with highest value
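The run-time steps above can be sketched as a small scoring function. It is a sketch under assumptions: the probability tables are plain dictionaries keyed by `(word, tag)`, the names `best_prob` and `choose` are hypothetical, and the `floor` value used for unseen pairs is an addition not mentioned on the slide. The numbers in the usage lines are invented.

```python
from typing import Dict, Iterable, List, Tuple

Probs = Dict[Tuple[str, str], float]  # assumed table format: (word, tag) -> probability


def best_prob(word: str, tags: Iterable[str], table: Probs, floor: float = 1e-9) -> float:
    """Highest stored probability for the word over the neighbour's possible tags.

    `floor` stands in for unseen (word, tag) pairs; it is an assumption.
    """
    return max((table.get((word, t), floor) for t in tags), default=floor)


def choose(
    confusion_set: Iterable[str],
    prev_tags: List[str],
    next_tags: List[str],
    p_w_tp: Probs,
    p_w_ts: Probs,
) -> Tuple[str, Dict[str, float]]:
    """Score each member as max P(w|tp) * max P(w|ts) and return the best one."""
    scores = {
        w: best_prob(w, prev_tags, p_w_tp) * best_prob(w, next_tags, p_w_ts)
        for w in confusion_set
    }
    best = max(scores, key=scores.get)
    return best, scores


# Invented probabilities, for illustration only.
p_w_tp: Probs = {("their", "PRP"): 0.2, ("there", "PRP"): 0.01, ("there", "AJ0"): 0.02}
p_w_ts: Probs = {("their", "NN1"): 0.3, ("there", "NN1"): 0.005}
best, scores = choose({"their", "there"}, ["AJ0", "PRP"], ["NN1", "AJ0"], p_w_tp, p_w_ts)
```

If the selected member differs from the word in the text, the checker would flag a possible real-word error.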

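11 unlike their adult
Tags for preceding word "unlike": AJ0, PRP
P(w|tp) values for {there, their}: 0.005774, 0.005832, 0.001514, 0.175066
Highest P(w|tp): there 0.005832, their 0.175066
Tags for succeeding word "adult": NN1, AJ0
P(w|ts) values for {there, their}: 0.002079, 0.195774, 0.084908, 0.001805
Highest P(w|ts): there 0.002079, their 0.195774
P(w|tp) * P(w|ts): there 0.000012, their 0.034723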

12 Initial Results

13 Modifications Reduced tagset Combined probabilities

14 Results using reduced tagset

15 Results using combined tag probabilities

16 Problems
Target not in confusion set
–the lose (loss) of {loose, lose}
Errors in the immediate context
–grauate form (from) harved (graduate from Harvard)
–in their teems (in their teens)
Probabilities based on rare uses of a word

17 Dictionary Update
CUV2
–70,000+ entries
More precise word-frequency information
Part-of-speech tags corresponding to BNC
Additional entries
–words occurring frequently in BNC but not in CUV2

18 Further work
Word collocation
–weather: hot, wet, dry, warm, severe, heavy, adverse, warmer, windy, better
–collage: paper, sticking, colourful, sound, brand, blue, postmodern, hessian, marble, cloth
Increase the number of confusion sets
Final testing
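Collocate lists like those for "weather" and "collage" could be gathered by counting the words that appear near each target word. The sketch below shows one simple way to do that; the window size, the tokenised-sentence input and the name `collocates` are assumptions, and the slide does not say how its lists were derived.

```python
from collections import Counter
from typing import List


def collocates(sentences: List[List[str]], target: str, window: int = 2) -> Counter:
    """Count words occurring within `window` tokens either side of `target`."""
    counts: Counter = Counter()
    for tokens in sentences:
        lowered = [t.lower() for t in tokens]
        for i, tok in enumerate(lowered):
            if tok == target:
                start = max(0, i - window)
                # Words before and after the target, excluding the target itself.
                context = lowered[start:i] + lowered[i + 1 : i + 1 + window]
                counts.update(context)
    return counts


# Invented sentences, for illustration only.
sample = [["the", "wet", "weather", "continued"], ["hot", "weather", "ahead"]]
nearby = collocates(sample, "weather", window=1)
```

Calling `nearby.most_common(10)` would then give the ten most frequent neighbours of the target word.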

