# The Detection and Correction of Real-word Errors in Dyslexic Text Jenny Pedler School of Computer Science & Information Systems Birkbeck College Research.


Jenny Pedler, School of Computer Science & Information Systems, Birkbeck College. Research Presentation, July 2002.

## Real-word errors

Intended words are shown in parentheses.

- However I gave (have) no idea what it represents.
- Now go to the macros button as shown bellow (below).
- Fred is away form (from) the 5-5-89 and this leaves us vary (very) exposed.
- there is no evidence on Bill Gates that I have herd (heard) of

## Work completed to date

- Dyslexic error corpus
- Investigation of possible approaches
  - Syntactic anomaly
  - Confusion set
- Part-of-speech tag collocation experiment
- Dictionary update

## The Dyslexic Error Corpus

| Item | Count |
|---|---:|
| Sentences | 1395 |
| Words | 21524 |
| Non-word errors | 1681 |
| Word-boundary errors | 152 |
| Real-word errors | 842 |
| Total errors | 2675 |
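The counts above are internally consistent. The three error categories sum to the total, real-word errors make up nearly a third of all errors, and there are roughly 12 errors per 100 words of text:

$$1681 + 152 + 842 = 2675, \qquad \frac{842}{2675} \approx 0.315, \qquad \frac{2675}{21524} \approx 0.124$$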

## Possible Approaches

- Syntactic anomaly: "I haven't do (done) any in a long time."
- Confusion set: {their, there, they're}, {form, from}, {weather, whether}, {were, where, we're}, {collage, college}, {loose, lose}

## Confusion Sets

- {their, there, they're}, {form, from}, {weather, whether}, {were, where, we're}, {loose, lose}, {collage, college}
- Two-member sets: {their, there}, {form, from}, {weather, whether}, {were, where}, {loose, lose}
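The slides give no code. What follows is a minimal Python sketch of how confusion sets can be indexed so that every occurrence of a member word is flagged as a candidate for checking. It assumes whitespace tokenisation and lower-casing. `build_index` and `find_candidates` are hypothetical names. In the corpus sentence used here, *form* is flagged, but *vary* is not, because it belongs to none of these sets.

```python
# The two-member confusion sets listed on the slide.
CONFUSION_SETS = [
    {"their", "there"},
    {"form", "from"},
    {"weather", "whether"},
    {"were", "where"},
    {"loose", "lose"},
]


def build_index(sets):
    """Map each member word to the confusion set that contains it."""
    index = {}
    for members in sets:
        frozen = frozenset(members)
        for word in frozen:
            index[word] = frozen
    return index


def find_candidates(tokens, index):
    """Return (position, token, confusion set) for every token that is a confusion-set member."""
    return [
        (i, tok, index[tok.lower()])
        for i, tok in enumerate(tokens)
        if tok.lower() in index
    ]


index = build_index(CONFUSION_SETS)
tokens = "Fred is away form the 5-5-89 and this leaves us vary exposed".split()
print(find_candidates(tokens, index))
```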

## Tag collocation experiment

- Syntactic approach
- Minimal information: the part-of-speech tag of the preceding and succeeding word
- Provides a baseline for comparison with future approaches

## Calculating word|tag probabilities

1. Count the occurrences of each tag-word pair.
2. Calculate the probability of the immediately preceding and succeeding tags given the word, $P(t_p \mid w)$ and $P(t_s \mid w)$.
3. Use Bayes' rule to calculate the probability of the word occurring given the tag, $P(w \mid t_p)$ and $P(w \mid t_s)$.
4. Store these values for use at run-time.
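The four steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the original implementation. The tagged corpus is a hypothetical list of sentences of `(word, tag)` pairs. The priors $P(w)$ and $P(t)$ are estimated from the same collocation counts; the slides do not say how they were obtained. `word_given_tag` is a hypothetical name.

```python
from collections import Counter


def word_given_tag(tagged_sentences, offset):
    """
    Estimate P(w | t) for the tag t found `offset` positions away from w
    (offset=-1: preceding tag t_p; offset=+1: succeeding tag t_s).
    Each sentence is a list of (word, tag) pairs.
    """
    # Step 1: count |t, w|, the occurrences of word w collocating with tag t.
    pair_counts = Counter()
    for sentence in tagged_sentences:
        for i, (word, _tag) in enumerate(sentence):
            j = i + offset
            if 0 <= j < len(sentence):
                pair_counts[(sentence[j][1], word.lower())] += 1

    total = sum(pair_counts.values())
    word_totals = Counter()
    tag_totals = Counter()
    for (tag, word), n in pair_counts.items():
        word_totals[word] += n
        tag_totals[tag] += n

    probs = {}
    for (tag, word), n in pair_counts.items():
        p_tag_given_word = n / word_totals[word]  # Step 2: P(t | w)
        p_word = word_totals[word] / total         # prior P(w), from the same counts
        p_tag = tag_totals[tag] / total            # prior P(t), from the same counts
        probs[(word, tag)] = p_tag_given_word * p_word / p_tag  # Step 3: Bayes' rule
    return probs  # Step 4: keep for run-time lookup


# Hypothetical toy corpus tagged with BNC C5 tags.
corpus = [
    [("unlike", "PRP"), ("their", "DPS"), ("adult", "NN1")],
    [("over", "PRP"), ("there", "AV0")],
    [("on", "PRP"), ("their", "DPS")],
]
p_prec = word_given_tag(corpus, -1)
print(p_prec[("their", "PRP")], p_prec[("there", "PRP")])
```

With these toy sentences, *their* follows a PRP tag twice and *there* once, so $P(\textit{their} \mid \text{PRP}) = 2/3$ and $P(\textit{there} \mid \text{PRP}) = 1/3$.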

## Using Bayes' Rule

- $\{w_1, \ldots, w_n\}$: the set of words
- $\{t_1, \ldots, t_m\}$: the set of tags
- $|t_j, w_i|$: the number of occurrences of word $w_i$ collocating with tag $t_j$
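With these definitions written out, the estimates take the form below. This assumes, though the slides do not say so, that all probabilities are relative frequencies of the collocation counts:

$$P(t_j \mid w_i) = \frac{|t_j, w_i|}{\sum_{k=1}^{m} |t_k, w_i|}, \qquad P(w_i \mid t_j) = \frac{P(t_j \mid w_i)\,P(w_i)}{P(t_j)}$$

Take $N = \sum_{j=1}^{m}\sum_{l=1}^{n} |t_j, w_l|$, $P(w_i) = \frac{1}{N}\sum_{k=1}^{m} |t_k, w_i|$ and $P(t_j) = \frac{1}{N}\sum_{l=1}^{n} |t_j, w_l|$. Bayes' rule then reduces to

$$P(w_i \mid t_j) = \frac{|t_j, w_i|}{\sum_{l=1}^{n} |t_j, w_l|}$$

The counts are kept separately for preceding tags ($t_p$) and succeeding tags ($t_s$).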

## Run-time

1. Retrieve the possible part-of-speech tags for the preceding word.
2. Assign each member of the confusion set its highest $P(w \mid t_p)$ over those tags.
3. Retrieve the possible part-of-speech tags for the succeeding word.
4. Assign each member of the confusion set its highest $P(w \mid t_s)$ over those tags.
5. $P(w \mid t_p) \times P(w \mid t_s)$ gives the final value for each member.
6. Select the member with the highest value.
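The run-time steps can be sketched in Python as below. `choose` is a hypothetical name. Probabilities are looked up in dictionaries keyed by `(word, tag)`, and a missing entry is treated as 0. The usage data are the values from the "unlike their adult" example, with the preceding word *unlike* tagged AJ0 or PRP and the succeeding word *adult* tagged NN1 or AJ0.

```python
def choose(confusion_set, prev_tags, next_tags, p_prec, p_succ):
    """Score each confusion-set member and return (best member, scores)."""
    scores = {}
    for w in confusion_set:
        # Highest P(w | t_p) over the tags the preceding word can take.
        best_prev = max((p_prec.get((w, t), 0.0) for t in prev_tags), default=0.0)
        # Highest P(w | t_s) over the tags the succeeding word can take.
        best_next = max((p_succ.get((w, t), 0.0) for t in next_tags), default=0.0)
        scores[w] = best_prev * best_next
    return max(scores, key=scores.get), scores


# Values from the "unlike their adult" example.
p_prec = {("there", "AJ0"): 0.005774, ("there", "PRP"): 0.005832,
          ("their", "AJ0"): 0.001514, ("their", "PRP"): 0.175066}
p_succ = {("there", "NN1"): 0.002079, ("there", "AJ0"): 0.001805,
          ("their", "NN1"): 0.195774, ("their", "AJ0"): 0.084908}

best, scores = choose(["there", "their"], ["AJ0", "PRP"], ["NN1", "AJ0"], p_prec, p_succ)
print(best, scores)
```

This selects *their*. Multiplying its two maxima gives about 0.034273, whereas the example slide prints 0.034723, apparently with two digits transposed. The decision is the same either way.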

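## Example: "unlike their adult"

Preceding word *unlike* (tags AJ0, PRP):

| w | AJ0 | PRP | P(w\|tp) |
|---|---|---|---|
| there | 0.005774 | 0.005832 | 0.005832 |
| their | 0.001514 | 0.175066 | 0.175066 |

Succeeding word *adult* (tags NN1, AJ0):

| w | NN1 | AJ0 | P(w\|ts) |
|---|---|---|---|
| there | 0.002079 | 0.001805 | 0.002079 |
| their | 0.195774 | 0.084908 | 0.195774 |

P(w|tp) × P(w|ts): there 0.000012, their 0.034723, so *their* is selected.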

## Initial Results

## Modifications

- Reduced tagset
- Combined probabilities
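The slides do not say how the tagset was reduced. The sketch below is a purely hypothetical illustration of the general idea. Fine-grained BNC C5 tags (such as AJ0, NN1 and PRP in the example) are collapsed to coarser classes, and the tag-word counts are merged before the probabilities are recomputed. The two-character prefix rule is a crude assumption, not the rule used in this work.

```python
from collections import Counter


def reduce_tag(tag):
    """Hypothetical reduction: keep the first two characters of a C5 tag (AJ0, AJC, AJS -> AJ)."""
    return tag[:2]


def reduce_counts(pair_counts):
    """Merge |t, w| counts whose tags collapse to the same reduced tag."""
    reduced = Counter()
    for (tag, word), n in pair_counts.items():
        reduced[(reduce_tag(tag), word)] += n
    return reduced


counts = Counter({("AJ0", "their"): 1, ("AJC", "their"): 2, ("PRP", "their"): 5})
print(reduce_counts(counts))
```

Fewer, broader tags mean more observations per tag and so less sparse estimates, at the cost of finer syntactic distinctions.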

## Results using reduced tagset

## Results using combined tag probabilities

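## Problems

- Target not in confusion set: in "the lose (loss) of", *lose* belongs to {loose, lose}, but the intended word *loss* does not.
- Errors in the immediate context: in "grauate form (from) harved" (graduate from Harvard) and "in their teems" (in their teens), the neighbouring words whose tags the method relies on are themselves misspelt.
- Probabilities based on rare uses of a word.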

## Dictionary Update

- CUV2: 70,000+ entries
- More precise word-frequency information
- Part-of-speech tags corresponding to the British National Corpus (BNC)
- Additional entries: words occurring frequently in the BNC but not in CUV2
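Selecting the additional entries amounts to a frequency-thresholded set difference. The sketch below assumes the corpus frequencies are available as a word-to-count mapping and the dictionary as a set of headwords. The function name, the threshold and the figures are hypothetical; they are not real BNC counts or CUV2 entries.

```python
def missing_frequent_words(corpus_freq, dictionary_words, min_count):
    """
    Return (word, count) pairs for words that occur at least `min_count` times
    in the corpus but are absent from the dictionary, most frequent first.
    """
    missing = [
        (word, count)
        for word, count in corpus_freq.items()
        if count >= min_count and word not in dictionary_words
    ]
    return sorted(missing, key=lambda item: (-item[1], item[0]))


# Toy figures for illustration only.
corpus_freq = {"the": 6000, "website": 300, "email": 500, "zyx": 1}
dictionary_words = {"the"}
print(missing_frequent_words(corpus_freq, dictionary_words, min_count=100))
```

Here the call returns `[('email', 500), ('website', 300)]`. *zyx* falls below the threshold, and *the* is already in the dictionary.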

## Further work

- Word collocation, e.g.
  - weather: hot, wet, dry, warm, severe, heavy, adverse, warmer, windy, better
  - collage: paper, sticking, colourful, sound, brand, blue, postmodern, hessian, marble, cloth
- Increase the number of confusion sets
- Final testing
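One way the collocate lists could be used is sketched below. Each member of a confusion set is scored by how many of its collocates appear near the target word, and the highest-scoring member is preferred. The ±3-word window and the count of distinct collocates are assumptions rather than the method planned here. Only *weather* and *collage* have collocate lists on the slide, so *whether* scores 0.

```python
# Collocate lists from the slide.
COLLOCATES = {
    "weather": {"hot", "wet", "dry", "warm", "severe", "heavy",
                "adverse", "warmer", "windy", "better"},
    "collage": {"paper", "sticking", "colourful", "sound", "brand",
                "blue", "postmodern", "hessian", "marble", "cloth"},
}


def collocation_scores(tokens, position, confusion_set, collocates, window=3):
    """For each member, count its distinct collocates within `window` words of `position`."""
    lo = max(0, position - window)
    hi = min(len(tokens), position + window + 1)
    # Distinct context words within the window, excluding the target itself.
    context = {tokens[i].lower() for i in range(lo, hi) if i != position}
    return {word: len(context & collocates.get(word, set())) for word in confusion_set}


tokens = "it was hot and the whether was windy".split()
print(collocation_scores(tokens, 5, ["weather", "whether"], COLLOCATES))
```

This prints `{'weather': 2, 'whether': 0}`, because *hot* and *windy* both fall within three words of the misused *whether*.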
